##### My small collection of books and scientific papers related to Data Science, Data Mining, Machine Learning, and Statistics.

### Data Science

- The Art of Data Science
*[Matthew J. Graham]*
- An Introduction to Data Science
*[Jeffrey Stanton]*
- Field Guide to Data Science
*[Booz | Allen | Hamilton]*

### Statistics

- Think Bayes, Bayesian Statistics Made Simple
*[Allen B. Downey]*
- Probabilistic Programming & Bayesian Methods for Hackers
*[Cam Davidson-Pilon]*
- Frequentism and Bayesianism: A Python-driven Primer
*[Jake VanderPlas]*
- Statistical Modeling: The Two Cultures
*[Leo Breiman]*

### Data Modelling

- Modeling with Data
*[Ben Klemens]*

### Data Mining

- A Programmer's Guide to Data Mining
*[Ron Zacharski]*
- Data Mining and Analysis: Fundamental Concepts and Algorithms
*[Mohammed J. Zaki]*
- Data Mining and Business Analytics with R
*[Johannes Ledolter]*
- Mining of Massive Datasets
*[Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman]*
- Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery
*[Graham Williams]*

### Machine Learning / Statistical Learning

- Large-Scale Machine Learning on Heterogeneous Distributed Systems
*[Google Research]*
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition
*[Trevor Hastie, Robert Tibshirani, Jerome Friedman]*
- An Introduction to Statistical Learning with Applications in R
*[Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani]*
- Bayesian Reasoning and Machine Learning
*[David Barber]*
- Gaussian Processes for Machine Learning
*[C. E. Rasmussen, C. K. I. Williams]*
- Information Theory, Inference, and Learning Algorithms
*[David MacKay]*
- Introduction to Machine Learning
*[Amnon Shashua]*
- Machine Learning
*[Abdelhamid Mellouk, Abdennacer Chebira]*
- Machine Learning, Neural and Statistical Classification
*[D. Michie, D.J. Spiegelhalter, C.C. Taylor]*
- Pattern Recognition and Machine Learning
*[Christopher M. Bishop]*
- Reinforcement Learning: An Introduction
*[Richard S. Sutton, Andrew G. Barto]*
- A Few Useful Things to Know about Machine Learning
*[Pedro Domingos]*
- Dive into Machine Learning
*[@hangtwentyy]*
- An Introduction to Machine Learning Theory and its Applications
*[Irina Papuc]*
- MLlib: Machine Learning in Apache Spark
*[Databricks et al.]*
- Gradient-Based Learning Applied to Document Recognition
*[Yann LeCun et al.]*
- Machine Learning at Scale
*[Sergei Izrailev, Jeremy M. Stanley]*
- SparkNet: Training Deep Networks in Spark
*[Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan]*
- Convergent Learning: Do different neural networks learn the same representations?
*[Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft]*
- Human-level concept learning through probabilistic program induction
*[Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum]*
- Hidden Technical Debt in Machine Learning Systems
*[D. Sculley, Gary Holt, Daniel Golovin et al.]*

### Papers

- The Structural Virality of Online Diffusion
*[Sharad Goel, Ashton Anderson, Jake Hofman, Duncan J. Watts]*
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
*[Matei Zaharia et al.]*
- Spark: Cluster Computing with Working Sets
*[Matei Zaharia et al.]*
- Lightweight Asynchronous Snapshots for Distributed Dataflows
*[dataArtisans et al.]*

### Cheatsheets

- Probability Cheatsheet v2.0
*[William Chen, Joe Blitzstein]*