Books for Learning the Fundamental Methods of Data Science

The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2009)

Seven major machine (statistical) learning methods are described:

  • Logistic Regression (Hastie, Tibshirani and Friedman, 2009, Chapters 4 and 5)
  • Support Vector Machine (Hastie, Tibshirani and Friedman, 2009, Chapter 12)
  • Neural Network (Hastie, Tibshirani and Friedman, 2009, Chapter 11)
  • Tree-Based Methods (Random Forests and Decision Trees) (Hastie, Tibshirani and Friedman, 2009, Chapter 15)
  • Principal Components Analysis (PCA) (Hastie, Tibshirani and Friedman, 2009, Chapter 14)
  • Clustering Methods (K-Means and Hierarchical) (Hastie, Tibshirani and Friedman, 2009, Chapter 14)
  • Association Rules [the biggest success in data mining] (Hastie, Tibshirani and Friedman, 2009, Chapter 14)

For a more practical, implementation-oriented treatment, see:

  • An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie and Tibshirani, 2013)

Feature Engineering for Machine Learning (Zheng and Casari, 2018)

Before modeling, raw data must be turned into features in order to gain insight; however, the practice of this process, ‘feature engineering’, had not been organized systematically. This book addresses that gap, stating that ‘Good features should not only represent salient aspects of the data, but also conform to the assumptions of the model’.
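As a small illustration of a feature that "conforms to the assumptions of the model", the sketch below log-transforms a feature spanning several orders of magnitude so that its relationship with the target becomes linear. The data (`views`, `rating`) and the helper `pearson` are hypothetical and chosen only for illustration; they are not taken from the book.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical raw feature spanning several orders of magnitude
views = [1, 10, 100, 1000, 10000]
# Hypothetical target that grows with the order of magnitude of the feature
rating = [1.0, 2.0, 3.0, 4.0, 5.0]

# Engineered feature: the log transform makes the relationship linear
log_views = [math.log10(v) for v in views]

raw_corr = pearson(views, rating)
log_corr = pearson(log_views, rating)
print(f"correlation with raw feature: {raw_corr:.2f}")
print(f"correlation with log feature: {log_corr:.2f}")
```

A linear model fitted on `log_views` would match this target far better than one fitted on `views`, even though both features carry the same information.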

Mostly Harmless Econometrics: An Empiricist’s Companion (Angrist and Pischke, 2008)

In ideal situations, the best way to establish the relationship among variables is experimental random assignment, which eliminates selection bias; however, because resources are scarce, it is rarely feasible in the real world.
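The effect of selection bias can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the data-generating process, the unobserved `ability` variable, the true effect of 2.0 and all function names are hypothetical and serve only to contrast self-selection with random assignment.

```python
import random

def mean(values):
    return sum(values) / len(values)

def estimate_effect(assign, n=20000, true_effect=2.0, seed=0):
    """Difference in mean outcomes between treated and untreated units.

    `assign(ability, rng)` returns True if the unit receives the treatment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # unobserved factor that also drives the outcome
        d = assign(ability, rng)
        outcome = true_effect * d + ability + rng.gauss(0, 1)
        (treated if d else control).append(outcome)
    return mean(treated) - mean(control)

def self_selection(ability, rng):
    # Units with high ability choose the treatment themselves
    return ability > 0

def random_assignment(ability, rng):
    # A coin flip decides treatment, independently of ability
    return rng.random() < 0.5

naive = estimate_effect(self_selection)
randomized = estimate_effect(random_assignment)
print(f"self-selected groups: {naive:.2f}")
print(f"randomly assigned groups: {randomized:.2f}")
```

Under self-selection the simple comparison of means overstates the true effect, because the treated group also has higher ability; under random assignment the two groups are comparable and the estimate lands close to 2.0.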

When experimental random assignment is not available, there are three approaches, besides (linear) regression, to assess the relationship: instrumental variables, regression discontinuity designs and differences-in-differences.

For a more practical, implementation-oriented treatment, see:

  • Mastering ‘Metrics: The Path from Cause to Effect (Angrist and Pischke, 2014)

  • Instrumental variables (IV) (Angrist and Pischke, 2008, Chapter 4): ‘method harnesses partial or incomplete random assignment, whether naturally occurring or generated by researchers’
  • Regression discontinuity designs (RD) (Angrist and Pischke, 2008, Chapter 6): ‘The RD design exploits abrupt changes in treatment status that arise when treatment is determined by a cutoff’
  • Differences-in-differences (DD) (Angrist and Pischke, 2008, Chapter 5): ‘in the absence of random assignment, treatment and control groups are likely to differ for many reasons’

These definitions are taken from Angrist and Pischke (2014).
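The differences-in-differences approach named above can be sketched with simple arithmetic on four group means. The numbers below are hypothetical, and the sketch relies on the usual assumption that, without treatment, both groups would have followed parallel trends.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Differences-in-differences estimate from four group means."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    # The control group's change stands in for what would have happened
    # to the treated group without treatment (parallel-trends assumption).
    return treated_change - control_change

# Hypothetical mean outcomes before and after a policy change
effect = diff_in_diff(
    treated_before=10.0, treated_after=15.0,
    control_before=8.0, control_after=10.0,
)
print(effect)  # 3.0
```

The two groups start at different levels (10.0 versus 8.0), which is fine here: the method compares changes rather than levels.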

Econometric Analysis of Cross Section and Panel Data (Wooldridge, 2002)

For a more practical, implementation-oriented treatment, see:

  • Introductory Econometrics: A Modern Approach (Wooldridge, 2000)

Time Series Analysis (Hamilton, 1994)

Time series analysis can be regarded as part of econometrics; however, it is characterized by autocorrelation, and it has some important frameworks of its own:

  • Box–Jenkins Method (mainly ARIMA and SARIMAX) (Hamilton, 1994, Chapters 3-5)
  • State Space Model (such as the Kalman filter) (Hamilton, 1994, Chapter 13)
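The autocorrelation that characterizes time series can be illustrated with a first-order autoregressive process, AR(1), one of the simplest building blocks of ARIMA models. This is a minimal sketch: the coefficient 0.8, the series length and the function names are assumptions made for illustration only.

```python
import random

def simulate_ar1(phi, n, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise e[t]."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0, 1))
    return series

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    numer = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return numer / denom

ar1 = simulate_ar1(phi=0.8, n=5000)
noise = simulate_ar1(phi=0.0, n=5000)  # phi = 0 gives plain white noise

print(f"lag-1 autocorrelation, AR(1): {autocorrelation(ar1, 1):.2f}")
print(f"lag-1 autocorrelation, white noise: {autocorrelation(noise, 1):.2f}")
```

For the AR(1) series the lag-1 autocorrelation should come out near the chosen coefficient, while for white noise it should be near zero; this dependence between neighboring observations is what ordinary cross-sectional methods do not model.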

For a more practical, implementation-oriented treatment, see:

  • Time Series Analysis and Its Applications With R Examples (Shumway and Stoffer, 2016)

Recently, Facebook developed ‘The Prophet Forecasting Model’ for time series analysis (Taylor and Letham, 2017).

References

  • Hastie, T., Tibshirani, R. and Friedman, J., 2009. The Elements of Statistical Learning
  • James, G., Witten, D., Hastie, T. and Tibshirani, R., 2013. An Introduction to Statistical Learning with Applications in R
  • Zheng, A. and Casari, A., 2018. Feature Engineering for Machine Learning
  • Angrist, J.D. and Pischke, J.-S., 2008. Mostly Harmless Econometrics: An Empiricist’s Companion
  • Angrist, J.D. and Pischke, J.-S., 2014. Mastering ‘Metrics: The Path from Cause to Effect
  • Wooldridge, J.M., 2002. Econometric Analysis of Cross Section and Panel Data
  • Wooldridge, J.M., 2000. Introductory Econometrics: A Modern Approach
  • Hamilton, J.D., 1994. Time Series Analysis
  • Shumway, R.H. and Stoffer, D.S., 2016. Time Series Analysis and Its Applications With R Examples
  • Taylor, S.J. and Letham, B., 2017. Forecasting at scale. PeerJ Preprints 5:e3190v2 https://doi.org/10.7287/peerj.preprints.3190v2