Gong Qingfeng | Principles

principles


breiman

  • conversation
    • moved sf -> la -> caltech (physics) -> columbia (math) -> berkeley (math)
    • info theory + gambling
    • CART, ACE, probability book, bagging
    • ucla prof., then consultant, then founded stat computing at berkeley
    • lots of cool outside activities
      • ex. selling ice in mexico
  • 2 cultures paper
    1. generative (Breiman's "data modeling" culture) - data are generated by a given stochastic model
      • stat does this too much and needs to move to 2
      • ex. assume y = f(x, noise, parameters)
      • validation: goodness-of-fit and residuals
    2. predictive (Breiman's "algorithmic modeling" culture) - use an algorithmic model and treat the data mechanism as unknown
      • assume nothing about x and y
      • ex. predict y from x with a neural net
      • validation: prediction accuracy
    • axioms
      1. Occam
      2. Rashomon - lots of different good models, which explains best? - ex. rf is not robust at all
      3. Bellman - curse of dimensionality - might actually want to increase dimensionality (ex. svms embedded in higher dimension)
    • industry was problem-solving, academia had too much culture
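
The Bellman point about deliberately increasing dimensionality can be sketched with a toy example. This is a minimal illustration, not taken from Breiman's paper: the data, the helper `separable_by_threshold`, and the choice of x**2 as the extra coordinate are all assumptions made for the sketch.

```python
# Toy sketch: made-up data and helper names, chosen only for illustration.

def separable_by_threshold(values, labels):
    """Return True if a single threshold on one feature splits the two classes."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    # Separable iff every value of one class lies strictly below the other class.
    return max(neg) < min(pos) or max(pos) < min(neg)

# One-dimensional inputs: class 1 sits on both sides of class 0.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, 0, 0, 0, 1, 1]

# In the original 1-D space no single threshold separates the classes.
separable_1d = separable_by_threshold(xs, ys)

# Lift each point to two dimensions, (x, x**2); the added coordinate
# alone is now enough to separate the classes with one threshold.
lifted = [(x, x * x) for x in xs]
separable_lifted = separable_by_threshold([p[1] for p in lifted], ys)

print(separable_1d, separable_lifted)
```

An SVM kernel plays a similar role to the hand-picked x**2 feature here, but it supplies the higher-dimensional embedding implicitly.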

box + tukey

  • questions
    1. what points are relevant and irrelevant today in both papers?
      • relevant
        • box
          • thoughts on scientific method
          • solns should be simple
          • necessity for developing experimental design
          • flaws (cookbookery, mathematistry)
        • tukey
          • separating data analysis and stats
          • all models have flaws
          • no best models
          • lots of good old techniques (e.g. LSR)
      • irrelevant
        • some of the data techniques (I think)
        • tukey multiple-response data has been better attacked (graphical models)
    2. how do you think the personal traits of Tukey and Box relate to the scientific opinions expressed in their papers?
      • probably both pretty critical of the science at the time
      • box - great respect for Fisher
      • both very curious in different fields of science
    3. what is the most valuable msg that you get from each paper?
      • box - data analysis is a science
      • tukey - models must be useful
      • no best models
      • find data that is useful
  • box_76 “science and statistics”
    • scientific method - iteration between theory and practice
      • learning - discrepancy between theory and practice
      • solns should be simple
    • fisher - founder of statistics (early 1900s)
      • couples math with applications
      • data analysis - subiteration between tentative model and tentative analysis
      • develops experimental design
    • flaws
      • cookbookery - forcing all problems into 1 or 2 routine techniques
      • mathematistry - development of theory for theory’s sake
  • tukey_62 “the future of data analysis”
    • general considerations
      • data analysis - different from statistics, is a science
      • lots of techniques are very old (LS - Gauss, 1803)
      • all models have flaws
      • no best models
      • must teach multiple data analysis methods
    • spotty data - lots of irregularly non-constant variability
      • could just trim highest and lowest values
        • winsorizing - replace suspect values with the nearest values that are not suspect
      • must decide when to use new techniques, even when not fully understood
      • want some automation
      • FUNOP - full normal plot
        • can be visualized in table
    • spotty data in more complex situations
      • FUNOR-FUNOM - full normal rejection, full normal modification
    • multiple-response data
      • understudied except for factor analysis
      • multiple-response procedures have been modeled upon how early single-response procedures were supposed to have been used, rather than upon how they were in fact used
      • factor analysis
        1. reduce dimensionality with new coordinates
        2. rotate to find meaningful coordinates
          • can use multiple regression factors as one factor if they are very correlated
      • regression techniques always offer hopes of learning more from less data than do variance-component techniques
    • flexibility of attack
      • ex. what unit to measure in
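
The two spotty-data remedies noted above, trimming and winsorizing, can be compared with a small sketch. It assumes the "suspect" values are simply the g lowest and g highest observations, a simplification of how suspect values would actually be picked out; the function names and the sample data are hypothetical.

```python
# Minimal sketch: assumes the suspect values are the g smallest and g largest.

def trim(values, g):
    """Drop the g lowest and g highest values."""
    ordered = sorted(values)
    if g < 0 or 2 * g >= len(ordered):
        raise ValueError("g must satisfy 0 <= 2*g < len(values)")
    return ordered[g:len(ordered) - g]

def winsorize(values, g):
    """Replace the g lowest and g highest values with their nearest kept neighbors."""
    ordered = sorted(values)
    if g < 0 or 2 * g >= len(ordered):
        raise ValueError("g must satisfy 0 <= 2*g < len(values)")
    low, high = ordered[g], ordered[len(ordered) - g - 1]
    # Clamp every value into [low, high]; the sample size is unchanged.
    return [min(max(v, low), high) for v in ordered]

def mean(values):
    return sum(values) / len(values)

# Made-up sample with one wild value at the top.
data = [2.1, 2.4, 2.5, 2.6, 2.8, 3.0, 9.7]

trimmed = trim(data, 1)
winsorized = winsorize(data, 1)

print(trimmed)
print(winsorized)
print(mean(data), mean(trimmed), mean(winsorized))
```

Trimming shrinks the sample, while winsorizing keeps the sample size and only pulls the extremes inward; both means sit much closer to the bulk of the data than the raw mean does.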