Data Science for Everyone
Data Science for Everyone: course text
1. Introduction
1.1. Data science in this course
1.2. Who can do data science?
1.3. Why have a special course in data science?
1.4. Data science is a science and an art
2. Thinking like a scientist
2.1. Case Study: John Snow
2.2. Establishing causality is hard
2.3. Observational studies
2.4. More causality terminology
3. Programming 1
3.1. What is programming?
3.2. Why Python?
3.3. Building blocks
3.4. Data types
3.5. Libraries
3.6. Tables
4. Statistics 1
4.2. Population parameters and sample statistics
4.3. Empirical distributions
4.4. Testing hypotheses: Swain vs. Alabama
4.5. Testing hypotheses: Null vs. Alt.
4.6. Comparing two samples
4.7. Error probabilities
5. Statistics 2
5.2. Percentiles
5.3. Bootstrapping
5.4. Hypothesis tests with confidence intervals
5.5. Working with the mean
5.6. Normal distribution
5.7. Central Limit Theorem
6. Working with data
6.1. Finding data
6.2. Measurement
6.3. Common errors in data
6.4. Inspecting, cleaning, and organizing data in Python
6.5. Variable types
7. Programming 2
7.1. Defining functions
7.2. Loops
7.3. Conditional statements
8. Checking in
9. Prediction 1
9.2. An early prediction example
9.3. Linear association: correlation
9.4. Being careful with correlation
9.5. Correlation, prediction, and linear functions
9.6. Linear regression: fitting and interpreting
9.7. Linear regression: prediction and diagnostics
9.8. Inference and prediction for linear regression
10. Prediction 2
10.2. Classification examples
10.3. k-nearest neighbors (kNN)
10.4. Evaluating kNN
10.5. Making kNN more general
10.6. Predicting continuous outcomes
11. Machine learning
11.1. Stats vs. ML
11.2. Supervised ML
11.3. Unsupervised ML
12. Frontiers
12.1. Reinforcement learning
12.2. The k-armed bandit
12.3. Implementing k-armed bandit in Python
12.4. Natural language processing
12.5. From text to data
12.6. From text to data in Python
13. Inferences
13.1. Bayesian inference
13.2. Base rate fallacy
13.3. Conditioning on a collider
14. Ethics
14.1. Tuskegee Syphilis Study, 1932-1972
14.2. Four Principles of Research Ethics
14.3. Ethics and digital data
14.4. Algorithmic “fairness”
15. Conclusion
Index