Every machine learning algorithm is a bet — a set of assumptions about structure hidden in data. We learn to make those bets consciously, derive them mathematically, and ask who bears the cost when they're wrong.
18 weeks of depth
3 adventure tracks
0 exams, all projects
∇ first principles first
✊ liberatory pedagogy
Our Foundation
ML Isn't Neutral. Neither Is This Course.
Machine learning algorithms are not mathematical facts of nature. They are choices — choices about what to optimize, what to measure, whose data counts. We learn the math and the politics simultaneously. You cannot do one without the other.
⚙️
Most ML Courses Teach You This
Import sklearn. Call .fit(). Get accuracy. Submit homework. Graduate. Never understand what the library is actually doing. Never ask who designed these defaults and why.
🌱
This Course Teaches You This
Derive the loss function. Implement the gradient update from scratch. Then use the library knowing what it's doing. Then ask: what are the assumptions baked into this algorithm, who does it serve when it's right, and who does it harm when it's wrong?
// 01
Root Before Branch
You implement linear regression before you touch scikit-learn. You derive backprop before PyTorch. Amy Ko's research: reading before writing creates more robust, transferable learning.
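In that spirit, linear regression via gradient descent fits in a dozen lines of NumPy. A minimal sketch, not the course's starter code — all names here are ours:

```python
import numpy as np

# From-scratch linear regression trained by gradient descent on MSE.
def fit_linear(X, y, lr=0.1, steps=2000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        resid = X @ w + b - y              # prediction error
        w -= lr * (2 / n) * (X.T @ resid)  # dJ/dw for MSE loss
        b -= lr * (2 / n) * resid.sum()    # dJ/db
    return w, b

# Sanity check on data with a known answer: y = 3x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 4.0, 7.0, 10.0])
w, b = fit_linear(X, y)
```

Only after something like this passes do you get to call the library version — and by then you know exactly what it is doing.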
// 02
Choose Your Depth
Three tracks share core concepts, diverge on depth. Novice: visual intuition + Colab notebooks. Builder: NumPy implementations. Architect: full mathematical derivations + research papers.
// 03
Uncertainty Is the Lesson
Amy Ko: "Understanding ML means understanding uncertainty." Every model is uncertain. Every prediction has a confidence interval. We learn to communicate uncertainty, not pretend it doesn't exist.
// 04
No Portfolio, No Grade
Adapted from Jeff Anderson's ungrading practice: you assign your final grade with evidence from your portfolio. Grades are tools for self-assessment, not performance metrics for the instructor's comfort.
Choose Your Adventure
Three Tracks. One Truth.
All three tracks cover the same core ML concepts every week. They diverge on project depth, mathematical rigor, and tooling. You self-select your track. You can move up at any point. Jeff Anderson's rule applies: spend as much time as possible at the edge of your productive struggle.
Track I · Novice
The Data Explorer
"You have questions about the world. Data has answers. Let's find them."
You'll use real-world datasets that connect to issues you care about — community health, education, housing, music. You'll learn each algorithm through a concrete problem, visualizing what the model is learning before writing a line of sklearn. Amy Ko's principle: read and understand before you write and execute.
🎯 Final: A data story — pick a public dataset, run a full ML analysis, write a plain-language report for your community explaining what you found, what the model can't see, and what people should know.
// Key Projects
P1 · Housing Price Predictor: linear regression on local Zillow data. Visual gradient descent before one line of sklearn.
P2 · Heart Disease Classifier: logistic regression. Precision/recall tradeoffs. When is a false positive worse than a false negative?
P3 · Spotify Genre Clusterer: K-means on audio features. Does the algorithm agree with your ears?
P4 · Movie Sentiment Analyzer: Naive Bayes on reviews. Test on reviews from your own cultural context.
P5 · Midterm: Bias Audit — pick a public ML model, document 5 failure modes, propose a technically grounded fix.
Track II · Builder
The Implementer
"You know Python. Now build the algorithms from the ground up."
Prerequisites: Python + some math
Tools: NumPy, Pandas, sklearn
Math: Calculus + Linear Algebra
// What You'll Do
You implement every algorithm from scratch before using the library version. The rule is immutable: you cannot call sklearn.linear_model.LinearRegression() until your NumPy version passes unit tests. You'll see the abstraction layer and understand exactly what it's hiding.
🎯 Final: A full ML pipeline — problem framing, data cleaning, feature engineering, 3+ model comparison, bias evaluation, and a written practitioner brief explaining your choices and their societal implications.
// Key Projects
P1 · Linear Regression from scratch: MSE loss, gradient descent, NumPy matrix ops. Must match sklearn within 0.001.
P3 · Decision Tree from scratch: information gain, CART algorithm, depth-limited pruning.
P4 · Neural Network: 3-layer MLP, backprop in NumPy, trained on tabular real-world data.
P5 · Midterm: Kaggle competition — custom pipeline only. No AutoML. Document every decision.
Track III · Architect
The Theorist
"You want the proofs, the edge cases, and the open research questions."
Prerequisites: Calc, Lin Alg, Stats
Tools: NumPy, PyTorch, LaTeX
Math: Real Analysis + Probability
// What You'll Do
You engage with the mathematical foundations directly. Proofs, derivations, and published research papers are the currency. You'll implement autograd, understand PAC learning theory, and engage critically with current research on fairness and robustness. Amy Ko: "studying computing requires human-centered methods — even for theory."
🎯 Final: A research-grade project — novel research question, literature review, implementation, experiments, and a preprint-style write-up with critical analysis of limitations and societal implications.
// Key Projects
P1 · Derive MLE from first principles for Gaussian, Bernoulli, and multinomial distributions.
P2 · Prove gradient descent convergence for L-smooth, μ-strongly convex functions. Implement and verify empirically.
P3 · Implement mini-autograd (like Karpathy's micrograd). Extend to support arbitrary compute graphs.
P5 · Midterm: Paper replication — replicate a fairness-in-ML result from NeurIPS/ICML. Document every gap between paper and reality.
Core Concepts
The ML Stack — From the Root Up
These are the foundational concepts we master across all 18 weeks. Every algorithm we study is a variation on these themes. If you understand these deeply, you can learn any new ML technique in days.
// 01
Loss Functions
J(θ) = (1/n)Σᵢ ℒ(yᵢ, f(xᵢ;θ))
Every learning algorithm is an optimization problem. The loss function is the compass. We study MSE, cross-entropy, hinge loss — not as recipes but as choices that encode assumptions about the world.
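Written out by hand rather than imported, two of those choices look like this (a sketch; function names are ours, not a library's):

```python
import numpy as np

# MSE assumes symmetric, additive errors; cross-entropy punishes
# confident wrong answers far more harshly. Different loss, different bet.
def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, p, eps=1e-12):
    # y in {0, 1}; p = predicted P(y = 1); eps guards against log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```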
// 02
Gradient Descent
θ := θ - α · ∇J(θ)
The engine of learning. From batch to stochastic to mini-batch. Convergence guarantees. Adaptive methods. We build intuition first, then rigor. You'll understand why Adam works before you use it.
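The rule is easiest to trust after watching it work once. A toy run on J(θ) = (θ − 2)², whose gradient is 2(θ − 2) and whose minimum is θ = 2:

```python
# The update θ := θ - α·∇J(θ), applied repeatedly to a convex toy objective.
theta, alpha = 0.0, 0.1
for _ in range(100):
    grad = 2 * (theta - 2)        # ∇J(θ) for J(θ) = (θ - 2)²
    theta = theta - alpha * grad  # step downhill
```

After a hundred steps, θ has converged to the minimizer. Everything from SGD to Adam is an elaboration of this loop.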
// 03
Bias-Variance Tradeoff
E[L] = Bias² + Variance + σ²
The central tension of all statistical learning. Underfitting, overfitting, regularization. Understanding why regularization works is more valuable than knowing how to call it.
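For squared loss, the decomposition is a short calculation. Fix an input x, let y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and let ŷ denote the prediction of a model trained on a random training set:

```latex
\begin{aligned}
\mathbb{E}\!\left[(y - \hat y)^2\right]
  &= \mathbb{E}\!\left[(f(x) + \varepsilon - \hat y)^2\right]
   = \mathbb{E}\!\left[(f(x) - \hat y)^2\right] + \sigma^2 \\
  &= \underbrace{\left(f(x) - \mathbb{E}[\hat y]\right)^2}_{\text{Bias}^2}
   + \underbrace{\mathbb{E}\!\left[(\hat y - \mathbb{E}[\hat y])^2\right]}_{\text{Variance}}
   + \sigma^2 .
\end{aligned}
```

The cross terms vanish because ε is independent of ŷ and has mean zero — which is why the σ² floor is irreducible no matter how good the model gets.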
// 04
Maximum Likelihood
θ̂_MLE = argmax P(data | θ)
Why do we minimize MSE for regression and cross-entropy for classification? Because they're the same thing: maximum likelihood estimation under different distributional assumptions. This unifies everything.
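For the regression half, the derivation is short. Assume yᵢ = f(xᵢ; θ) + εᵢ with Gaussian noise εᵢ ~ 𝒩(0, σ²):

```latex
\begin{aligned}
\log P(\text{data} \mid \theta)
  &= \sum_{i=1}^{n} \log \mathcal{N}\big(y_i \mid f(x_i;\theta),\, \sigma^2\big) \\
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \big(y_i - f(x_i;\theta)\big)^2 .
\end{aligned}
```

The first term does not depend on θ, so maximizing the likelihood is exactly minimizing the sum of squared errors. The Bernoulli case yields cross-entropy by the same argument.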
// 05
Probabilistic Thinking
P(y|x) = P(x|y)·P(y) / P(x)
All of ML is probability. Bayes, conditional independence, the generative vs. discriminative distinction. Ko: "understanding ML means understanding uncertainty." We make uncertainty visible, not hidden.
// 06
Feature Engineering
φ: x → ℝᵈ (good representation)
Often the highest-leverage skill in applied ML. What you feed the model matters more than which model you choose. And what features exist in a dataset encodes who was worth measuring.
// 07
Model Evaluation
CV risk = (1/k)Σₛ ℒ(f̂₋ₛ, Dₛ)
Accuracy is a lie when classes are imbalanced. We study precision, recall, AUC, calibration, and fairness metrics — understanding why they conflict mathematically and ethically.
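A concrete version of the "accuracy is a lie" claim, computed from raw confusion counts (function name is ours):

```python
import numpy as np

# Metrics from scratch, to see why accuracy misleads on imbalanced classes.
def metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# 95% negatives: predicting all-zero scores 95% accuracy with zero recall.
y_true = [1] * 5 + [0] * 95
all_zero = metrics(y_true, [0] * 100)
```

The all-zero "model" looks excellent by accuracy and is useless by recall — exactly the conflict this unit studies.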
// 08
Neural Networks
h = σ(Wₗ·...·σ(W₁x + b₁) + bₗ)
Universal approximation, but at a cost. Backpropagation, activation functions, architectures. We implement from scratch before PyTorch. The abstraction is earned, not given.
// 09
Algorithmic Fairness
P(ŷ=1|A=0) ≠ P(ŷ=1|A=1)
Fairness is not one thing — it's a family of mathematical definitions that are mutually incompatible. Understanding why they conflict is the most important result in the course. There is no free lunch, ethically or mathematically.
Research-Grounded Pedagogy
What the Research Actually Says
This course is designed in direct conversation with Dr. Amy Ko's 25 years of computing education research and Jeff Anderson's classroom practice. Every pedagogical choice has a justification.
Amy Ko Research
Read Before You Write
Ko's research shows that teaching program reading before program writing creates more robust understanding. We apply this to ML: you analyze, trace, and critique existing algorithms before implementing them. Understanding precedes production.
Amy Ko Research
CS Assessments Aren't Fair
Ko's work using psychometrics to study CS assessments found they often disadvantage students not because of knowledge gaps but because of assessment design. Portfolio + ungrading sidesteps this entirely — your evidence speaks, not your performance under pressure.
Amy Ko Research
Scaffolded Struggle Works
Ko's Gidget research: framing challenges as authentic practice (not failure) keeps learners engaged through a deliberate, step-by-step process. Our "2-minute question rule" and tiered project scaffolding are direct applications of this finding.
Jeff Anderson
Conquering College
Jeff Anderson's framework for first-generation students: the meta-skills of college — how to ask for help, form study groups, manage time, communicate with instructors — are taught explicitly, not assumed. This is part of our curriculum.
Jeff Anderson
Deep Learning vs. Surface
Anderson distinguishes deep learning (understanding that transfers) from surface learning (passing the test). Every course design choice — first principles, projects over exams, portfolio over grades — is oriented toward deep learning that serves you in 10 years.
Freire + Ko
Justice-Focused CS Requires Agency
Ko's research found justice-focused CS education is empowering but requires student trust and agency. We build that trust through transparent course design, flexible deadlines, and making the power dynamics of the classroom explicit. You are a co-creator, not a consumer.
18-Week Calendar
The Journey, Week by Week
We begin with the statistical foundations that unify all of ML — probability, distributions, estimation. We build upward through supervised, unsupervised, and deep learning. We end with the open problems: fairness, robustness, and alignment.
Wk
Theme & Question
Core Concepts + Math
Lab / Project
Unit 1 — Statistical Foundations: Probability, Estimation, Loss
01 · Unit 1
What Is Learning From Data?
The ML frame: inputs, outputs, loss, optimization. How does this differ from programming? Why is statistical learning hard?
Foundational
Probability Review + MLE Intro
Random variables, expectations, variance. MLE as a principle: find parameters that maximize probability of observed data. Gaussian MLE derived.
Math Root
Lab: Fit a Gaussian by Hand
Derive μ and σ² from MLE. Then use NumPy. Verify they match. Novice: visual interactive. Architect: full derivation with regularity conditions.
Project
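The lab's punchline in code: the Gaussian MLE is the sample mean and the biased (divide-by-n) variance — which is exactly NumPy's default. A sketch with simulated data:

```python
import numpy as np

# Simulate draws from a known Gaussian, then recover its parameters by MLE.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu_hat = x.sum() / len(x)                     # μ̂ = (1/n) Σ xᵢ
var_hat = ((x - mu_hat) ** 2).sum() / len(x)  # σ̂² = (1/n) Σ (xᵢ - μ̂)²

# np.var defaults to ddof=0 — the biased MLE — so the two should agree.
```

The divide-by-n versus divide-by-(n−1) distinction is the kind of detail the derivation makes unforgettable.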
02 · Unit 1
Supervised Learning Setup
Training set, test set, generalization. The fundamental question: why does training performance predict test performance? Introduce the i.i.d. assumption — and when it breaks.
Foundational
Loss Functions as Probabilistic Choices
MSE = MLE under Gaussian noise. Cross-entropy = MLE under Bernoulli. This is the unifying theorem. Understand it and every supervised algorithm makes sense.
Math Root
Lab: Loss Landscape Explorer
Visualize MSE and cross-entropy surfaces. Understand what "minimizing" means geometrically. Then: whose data was used to train a real model you interact with daily?
Critical
Unit 2 — Linear Methods: Regression, Classification, Regularization
03 · Unit 2
Linear Regression: The Foundation
The model: ŷ = wᵀx + b. MSE loss. Gradient derivation. Analytical solution (normal equations). The line between "fitting" and "overfitting." This is the root of all ML.
Predict Bay Area housing prices. From scratch before sklearn. Then: what variables does the model rely on? What does it encode about whose neighborhoods matter?
Critical
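The analytical solution mentioned in the week's description — the normal equations ŵ = (XᵀX)⁻¹Xᵀy — takes four lines, here on toy data that lies exactly on y = 3x + 1:

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 4.0, 7.0, 10.0])           # exactly y = 3x + 1
Xb = np.hstack([X, np.ones((len(X), 1))])     # column of ones absorbs the intercept
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)      # solve the normal equations
```

Note the `solve` rather than an explicit matrix inverse — numerically better behaved, and the same habit the from-scratch projects ask for.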
04 · Unit 2
Regularization: Bias vs. Variance
Ridge (L2) and Lasso (L1) as Bayesian priors. The bias-variance decomposition. Why more complex models aren't always better. Cross-validation as honest evaluation.
Foundational
Logistic Regression + Sigmoid
From regression to binary classification. The sigmoid function. Binary cross-entropy loss. Decision boundaries as hyperplanes. What does the model think the "boundary" means?
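The mechanics in a few lines: the sigmoid squashes the linear score wᵀx + b into a probability, and the decision boundary sits exactly where that score crosses zero. The weights and input here are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0]), 0.5
x = np.array([1.0, 2.5])
p = sigmoid(w @ x + b)   # score is 0 for this x, so p is exactly 0.5
```

A point with score zero lies on the hyperplane itself — the model is maximally uncertain there, which is what "boundary" means.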
Lab: What Does "Accurate" Mean?
Precision, recall, F1, AUC-ROC, confusion matrix. Analyze a hiring classifier. Document where false positives vs. false negatives hit harder — and who bears the cost.
Critical
05 · Unit 2
Naive Bayes + Generative Models
The generative vs. discriminative distinction. Naive Bayes: where the "naive" assumption is, when it works, and when it catastrophically fails. Text classification.
Feature Engineering
Encoding categorical variables. Polynomial features. Normalization. The choices embedded in feature engineering. What gets measured, and why, encodes power.
Critical
Project: Spam/Sentiment Classifier
Build Naive Bayes text classifier. Test on text from your own cultural context (AAVE, Spanish, code-switching). Document failures. Propose a data collection fix.
Project
06 · Unit 2
Decision Trees + Ensembles
Information gain, CART. Random forests as variance reduction. Gradient boosting as sequential error correction. Interpretability vs. performance tension.
Model Selection + Pipelines
Hyperparameter tuning. Grid search. The multiple comparisons problem. Why your cross-validated results are probably still too optimistic.
Lab: Recidivism Prediction Audit
Analyze COMPAS-style decision tree on ProPublica data. Compute accuracy by race. Discuss: why do different fairness definitions conflict mathematically? What should policy do?
Critical
Unit 3 — Unsupervised Learning: Clustering, Dimensionality, Representation
07 · Unit 3
Learning Without Labels
Why is unsupervised learning harder to evaluate? K-means: Lloyd's algorithm, convergence, choosing k. The Elbow Method. What does it mean for the algorithm to "decide" what's similar?
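One pass of Lloyd's algorithm, written out (a sketch with our own names; it omits the empty-cluster edge case):

```python
import numpy as np

# Assign each point to its nearest centroid, then move each centroid
# to the mean of its assigned points. Repeat until assignments settle.
def lloyd_step(X, centroids):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    new_c = np.array([X[labels == k].mean(axis=0) for k in range(len(centroids))])
    return labels, new_c

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, c = lloyd_step(X, np.array([[0.0, 0.0], [10.0, 10.0]]))
```

The "decide what's similar" question lives entirely in that distance computation — change the metric and the clusters change.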
Gaussian Mixture Models
Soft clustering. The EM algorithm as alternating optimization. K-means is a special case of GMM. Novice: visual. Builder: implement EM. Architect: derive ELBO.
Math Root
Lab: Market Segmentation
K-means on customer behavior data. Compare algorithm clusters to demographic patterns. Discuss: when does "segment" become "discriminate"?
Critical
08 · Unit 3
PCA + Dimensionality Reduction
The curse of dimensionality. PCA as variance maximization and as finding the data's "natural axes." SVD connection. What information is lost in compression?
Foundational
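The SVD connection in code, under the usual convention of centering first (names are ours):

```python
import numpy as np

# PCA via SVD: the top-k right singular vectors of the centered data
# are its "natural axes"; projecting onto them compresses the data.
def pca(X, k):
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]   # projected data, principal components

# Data varying almost only along y = x: one component captures nearly all of it.
rng = np.random.default_rng(1)
t = rng.normal(size=(200, 1))
X = np.hstack([t, t]) + 0.01 * rng.normal(size=(200, 2))
Z, comps = pca(X, 1)
```

The recovered component points along (1, 1)/√2 — the direction of maximum variance — and everything orthogonal to it is what compression throws away.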
Word Embeddings + Representation
word2vec, GloVe — geometry of meaning. King - Man + Woman = Queen. The bias embedded in word vectors. Who decided what relationship was "analogous"?
Critical
Project: Embed Your Community
Train word embeddings on text from your community (Reddit, Twitter, local news). Visualize the geometry. Document what biases the algorithm learned and from where.
09 · Unit 3
Community showcase. Present your first major project: the technical work, the critical analysis, and what you'd do differently. Peer + guest feedback.
Exhibition
Critical ML Guest Panel
Practitioner from healthcare, criminal justice, or hiring technology. Class debates: "What should ML never be allowed to decide?" Structured discussion, not lecture.
Critical
Final Project Pitch
Students pitch final project topics. Peer + instructor structured feedback. Learning partnerships form around shared domains. Begin iteration cycle.
Project
Unit 4 — Deep Learning: Neural Networks, Backpropagation, Architectures
10 · Unit 4
Neural Networks: The Full Picture
From linear regression to a 3-layer MLP in one step. Activation functions (sigmoid, ReLU, tanh) and why they exist. The universal approximation theorem — and its limits.
Implement a 3-layer neural network in NumPy. Train on a social issue dataset. Compare to logistic regression. Understand what the extra complexity buys you.
Project
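A compressed sketch of what "backprop in NumPy" involves — one hidden layer rather than the project's three, all names ours. The finite-difference comparison at the end is the standard sanity check for hand-derived gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])         # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # output layer

def sig(z):
    return 1 / (1 + np.exp(-z))

def loss(W1, b1, W2, b2):
    h = sig(X @ W1 + b1)                       # hidden activations
    out = sig(h @ W2 + b2)                     # output probabilities
    return np.mean((out - y) ** 2), h, out

def grads(W1, b1, W2, b2):
    _, h, out = loss(W1, b1, W2, b2)
    d_out = 2 * (out - y) / len(X) * out * (1 - out)  # MSE + sigmoid chain rule
    d_h = (d_out @ W2.T) * h * (1 - h)                # one layer further back
    return X.T @ d_h, d_h.sum(0), h.T @ d_out, d_out.sum(0)

# Check the analytic gradient against a central finite difference on W2[0, 0]
g_W2 = grads(W1, b1, W2, b2)[2]
eps = 1e-6
W2p, W2m = W2.copy(), W2.copy()
W2p[0, 0] += eps
W2m[0, 0] -= eps
num = (loss(W1, b1, W2p, b2)[0] - loss(W1, b1, W2m, b2)[0]) / (2 * eps)
```

If the two numbers disagree beyond ~1e-5, the derivation has a bug. This check catches silent shape and sign errors before any training run — exactly the kind of failure that otherwise hides behind a loss curve.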
11 · Unit 4
Convolutional Neural Networks
Translation invariance. Filters, pooling, feature maps. How spatial structure is exploited. From edge detection to face recognition — including its dangers.
Training Deep Networks
Vanishing gradients, batch normalization, dropout. Why deep learning is hard to train — and why modern tricks work. The gap between theory and practice.
Lab: Interpret What the Model Sees
Use saliency maps and Grad-CAM to visualize what a CNN is paying attention to. Does it match your expectations? Document the gap. Buolamwini: "Gender Shades" paper.
Critical
12 · Unit 4
Sequence Models + Transformers
RNNs, the vanishing gradient problem, LSTMs. Then self-attention as a solution. The transformer architecture — demystified. How does attention work mathematically?
Transfer Learning
Pre-training + fine-tuning. What gets transferred and why. The implicit world model in a pre-trained model. Who built that model, on whose data, for whose benefit?
Critical
Lab: Fine-Tune for Your Use Case
Fine-tune a small pre-trained model (BERT-tiny or similar) on a task meaningful to your community. Document what the pre-trained knowledge helps and hurts.
Project
Unit 5 — Advanced Topics: Fairness, Robustness, Bayesian Methods, RL
13 · Unit 5
Bayesian Machine Learning
Priors as assumptions. Posteriors as updated beliefs. Bayesian linear regression. The full Bayesian approach vs. MAP vs. MLE. When does being Bayesian matter?
Calibration + Uncertainty
A model that says "70% confident" should be right 70% of the time. Calibration plots. Temperature scaling. Ko: "uncertainty is the lesson." Most production ML ignores this.
Math Root
Lab: Is Your Model Calibrated?
Compute calibration curves for a medical prediction model. Analyze: what happens when an overconfident model informs a doctor's decision?
Critical
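The lab's computation, stripped to its core: bucket predictions by confidence, then compare each bucket's mean confidence with its observed hit rate. A sketch with our own names; sklearn's calibration_curve implements the same idea with more binning options:

```python
import numpy as np

def calibration(probs, y, n_bins=5):
    probs, y = np.asarray(probs), np.asarray(y)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((probs[mask].mean(), y[mask].mean()))
    return rows  # (mean confidence, observed frequency) per bucket

# Perfectly calibrated toy data: "0.8 confident" is right 80% of the time.
probs = np.array([0.8] * 10)
y = np.array([1] * 8 + [0] * 2)
rows = calibration(probs, y)
```

An overconfident model shows buckets where mean confidence sits well above observed frequency — the exact failure the medical-prediction discussion is about.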
14 · Unit 5
Algorithmic Fairness
Demographic parity, equalized odds, predictive parity — and the impossibility theorem proving they can't all hold simultaneously. This is not a failure of engineering. It's a political choice.
Critical
Adversarial ML
FGSM attacks. The brittleness of deep learning. Certified robustness. Adversarial examples as a mirror: the model didn't learn what we thought it learned.
Foundational
Lab: Attack Your Own Model
Generate adversarial examples against a model you built earlier. Document what the attack reveals about what the model actually learned. Propose a defense and test it.
Project
15 · Unit 5
Reinforcement Learning Intro
MDPs, Q-learning, policy gradients. The reward function as a value system. Who defines the reward? Goodhart's Law. The alignment problem as an ML problem.
Human-in-the-Loop ML
Active learning, RLHF, annotation pipelines. Who are the annotators? What are they paid? Whose judgments define "correct" in supervised learning?
Critical
Lab: Design a Reward Function
Implement Q-learning on a gridworld. Design the reward yourself. Observe what the agent optimizes for. Reflect: what did you accidentally teach it to value?
Project
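A minimal version of the lab's setup, assuming a 1-D "gridworld" of five states with reward only at the rightmost one. All constants are illustrative:

```python
import numpy as np

n_states = 5                         # states 0..4; state 4 is terminal
Q = np.zeros((n_states, 2))          # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.2    # step size, discount, exploration rate
rng = np.random.default_rng(0)

for _ in range(500):                 # episodes
    s = 0
    while s != 4:
        # ε-greedy action selection
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == 4 else 0.0  # the reward function IS the value system
        # Q-learning update: bootstrap off the best next action
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)            # greedy policy: should point right
```

Changing the single line that defines `r` changes what the agent "values" — that one line is the reflection the lab asks for.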
// Final Weeks — Synthesis, Exhibition, Celebration
16 · Final
ML in the Wild
The gap between research ML and production ML. Data pipelines, model monitoring, distribution shift. What happens when the world changes and the model doesn't know?
The Practitioner's Responsibility
Jeff Anderson: navigate vs. transform harmful systems. As a future ML practitioner, you will be asked to build things that harm people. How will you respond? What is your line?
Critical
Final Project Studio
Dedicated studio time. Learning conference check-ins. Peer feedback sessions. Document your process, not just your final results.
Project
17 · Final
Final Exhibition — Day 1
Community showcase. Present your complete project: the problem, the pipeline, the results, the failures, and the implications. Invited guests from the community.
Exhibition
Portfolio Learning Conferences
Individual meetings: you present your full portfolio and assign your grade with evidence. This is the most important evaluation moment of the course.
Portfolio Due
All projects, concept notes, critical essays, reflections, and final self-evaluation. The evidence of your learning journey, not just its destination.
18 · Finale
Final Exhibition — Day 2 + Celebration
Remaining presentations. Celebration of learning. Reflection on the semester. What does this course leave with you — and what do you leave in it?
Exhibition
Where Do You Go From Here?
Transfer pathways, careers in ethical ML, open-source contribution, communities doing justice-focused data work. You are a practitioner. Act accordingly.
Course Co-Evaluation
You evaluate the course. Your feedback co-creates the next version. Freire's dialogic education: the student teaches the teacher. This is that moment.
Portfolio + Ungrading
Evidence Over Performance. Always.
Adapted from Jeff Anderson's ungrading practice. The goal is not to perform mastery for a grade. The goal is to build real, transferable understanding you can use in 10 years. Evidence of learning — not test scores — is the currency.
📂
Your Portfolio
A living document of your learning journey. Process over product. First attempts, failures, revised understanding — all included. Ko: "we study programming with human-centered methods." Apply that to your own learning.
All project code with documented thinking
Concept notes in your own language
Critical essays: who built this, for whom?
Bi-weekly learning reflections
Evidence of peer teaching + feedback given
"Conquering College" meta-learning log
🔁
Three Feedback Loops
You receive feedback from three sources. The instructor is the smallest source — mirroring professional practice where self-assessment and peer review are the primary quality signals.
All feedback is specific and forward-looking, not evaluative
✍️
You Assign Your Grade
At the end of the course, you write a final self-evaluation with evidence from your portfolio. You assign your grade. The instructor reviews and recommends. If the evidence is there, it's confirmed. This is not a trick — it's how trust-based education works.
A: Deep engagement with all major concepts + strong projects
B: Solid understanding, some gaps, consistent effort
No D/F for students who show up and engage honestly
// Learning Strategy
How to Succeed in ML
These aren't hacks — they're how people actually learn hard math and code. Especially if you're first-gen in a field that wasn't built for you.
01
Derive Before You Cite
Before writing θ := θ - α∇J, derive it. What is J? Why do we subtract? What does the gradient point toward? Derivation is the deepest form of understanding.
02
Implement Before Import
You cannot call sklearn.linear_model.LinearRegression() until your NumPy version passes unit tests. This rule is the core of the course. Every abstraction is earned, not given.
03
Plot Everything
Loss curves. Decision boundaries. Confusion matrices. Calibration plots. In ML, seeing IS understanding. If you haven't plotted it, you haven't understood it yet.
04
Sanity Check with n=5
Before running on 10,000 rows, run on 5. Hand-verify every output. Does the loss go down? Are the shapes right? Bugs in ML fail silently — you have to check.
05
Speak the Math Aloud
Explain gradient descent out loud without writing a formula. If you can't, you don't understand it yet. Talk to a classmate. Talk to yourself. Verbalization catches gaps that code hides.
06
Ask Who Bears the Cost
Every time a model "works," ask: who was in the training set, and who wasn't? What does high accuracy hide? This question is not optional — it is part of the assignment.
// How It All Connects
The ML Concept Map
Every algorithm in this course connects back to the same root question: what does it mean for a machine to learn from data?
The Root
Loss Function J(θ)
↓
MLE Derivation
↓
Gradient Descent
↓
Every Supervised Model
Linear Methods
Linear Regression
↓
Logistic Regression
↓
Regularization
↓
SVMs / Decision Trees