My small collection of books and scientific papers related to Data Science, Data Mining, Machine Learning, and Statistics.
Data Science
- The Art of Data Science [Matthew J. Graham]
- An Introduction to Data Science [Jeffrey Stanton]
- Field Guide to Data Science [Booz Allen Hamilton]
Statistics
- Think Bayes: Bayesian Statistics Made Simple [Allen B. Downey]
- Probabilistic Programming & Bayesian Methods for Hackers [Cam Davidson-Pilon]
- Frequentism and Bayesianism: A Python-driven Primer [Jake VanderPlas]
- Statistical Modeling: The Two Cultures [Leo Breiman]
Data Modelling
- Modeling with Data [Ben Klemens]
Data Mining
- A Programmer's Guide to Data Mining [Ron Zacharski]
- Data Mining and Analysis: Fundamental Concepts and Algorithms [Mohammed J. Zaki, Wagner Meira Jr.]
- Data Mining and Business Analytics with R [Johannes Ledolter]
- Mining of Massive Datasets [Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman]
- Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery [Graham Williams]
Machine Learning / Statistical Learning
- Large-Scale Machine Learning on Heterogeneous Distributed Systems [Google Research]
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition [Trevor Hastie, Robert Tibshirani, Jerome Friedman]
- An Introduction to Statistical Learning with Applications in R [Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani]
- Bayesian Reasoning and Machine Learning [David Barber]
- Gaussian Processes for Machine Learning [C. E. Rasmussen, C. K. I. Williams]
- Information Theory, Inference, and Learning Algorithms [David MacKay]
- Introduction to Machine Learning [Amnon Shashua]
- Machine Learning [Abdelhamid Mellouk, Abdennacer Chebira]
- Machine Learning, Neural and Statistical Classification [D. Michie, D.J. Spiegelhalter, C.C. Taylor]
- Pattern Recognition and Machine Learning [Christopher M. Bishop]
- Reinforcement Learning: An Introduction [Richard S. Sutton, Andrew G. Barto]
- A Few Useful Things to Know about Machine Learning [Pedro Domingos]
- Dive into Machine Learning [@hangtwentyy]
- An Introduction to Machine Learning Theory and its Applications [Irina Papuc]
- MLlib: Machine Learning in Apache Spark [Databricks et al.]
- Gradient-Based Learning Applied to Document Recognition [Yann LeCun et al.]
- Machine Learning at Scale [Sergei Izrailev, Jeremy M. Stanley]
- SparkNet: Training Deep Networks in Spark [Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan]
- Convergent Learning: Do different neural networks learn the same representations? [Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft]
- Human-level concept learning through probabilistic program induction [Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum]
- Hidden Technical Debt in Machine Learning Systems [D. Sculley, Gary Holt, Daniel Golovin et al.]
Papers
- The Structural Virality of Online Diffusion [Sharad Goel, Ashton Anderson, Jake Hofman, Duncan J. Watts]
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing [Matei Zaharia et al.]
- Spark: Cluster Computing with Working Sets [Matei Zaharia et al.]
- Lightweight Asynchronous Snapshots for Distributed Dataflows [dataArtisans et al.]
Cheatsheets
- Probability Cheatsheet v2.0 [William Chen, Joe Blitzstein]