Every machine learning algorithm is a bet — a set of assumptions about structure hidden in data. We learn to make those bets consciously, derive them mathematically, and ask who bears the cost when they're wrong.
18 weeks of depth
3 adventure tracks
0 exams, all projects
∇ first principles first
✊ liberatory pedagogy
Our Foundation
ML Isn't Neutral. Neither Is This Course.
Machine learning algorithms are not mathematical facts of nature. They are choices — choices about what to optimize, what to measure, whose data counts. We learn the math and the politics simultaneously. You cannot do one without the other.
⚙️
Most ML Courses Teach You This
Import sklearn. Call .fit(). Get accuracy. Submit homework. Graduate. Never understand what the library is actually doing. Never ask who designed these defaults and why.
🌱
This Course Teaches You This
Derive the loss function. Implement the gradient update from scratch. Then use the library knowing what it's doing. Then ask: what are the assumptions baked into this algorithm, who does it serve when it's right, and who does it harm when it's wrong?
// 01
Root Before Branch
You implement linear regression before you touch scikit-learn. You derive backprop before PyTorch. Amy Ko's research: reading before writing creates more robust, transferable learning.
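In that spirit, linear regression via gradient descent fits in a dozen lines of NumPy. A minimal sketch, not the course's starter code — all names here are ours:

```python
import numpy as np

# From-scratch linear regression trained by gradient descent on MSE.
def fit_linear(X, y, lr=0.1, steps=2000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        resid = X @ w + b - y              # prediction error
        w -= lr * (2 / n) * (X.T @ resid)  # dJ/dw for MSE loss
        b -= lr * (2 / n) * resid.sum()    # dJ/db
    return w, b

# Sanity check on data with a known answer: y = 3x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 4.0, 7.0, 10.0])
w, b = fit_linear(X, y)
```

Only after something like this passes do you get to call the library version — and by then you know exactly what it is doing.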
// 02
Choose Your Depth
Three tracks share core concepts, diverge on depth. Novice: visual intuition + Colab notebooks. Builder: NumPy implementations. Architect: full mathematical derivations + research papers.
// 03
Uncertainty Is the Lesson
Amy Ko: "Understanding ML means understanding uncertainty." Every model is uncertain. Every prediction has a confidence interval. We learn to communicate uncertainty, not pretend it doesn't exist.
// 04
No Portfolio, No Grade
Adapted from Jeff Anderson's ungrading practice: you assign your final grade with evidence from your portfolio. Grades are tools for self-assessment, not performance metrics for the instructor's comfort.
Choose Your Adventure
Three Tracks. One Truth.
All three tracks cover the same core ML concepts every week. They diverge on project depth, mathematical rigor, and tooling. You self-select your track. You can move up at any point. Jeff Anderson's rule applies: spend as much time as possible at the edge of your productive struggle.
Track I · Novice
The Data Explorer
"You have questions about the world. Data has answers. Let's find them."
You'll use real-world datasets that connect to issues you care about — community health, education, housing, music. You'll learn each algorithm through a concrete problem, visualizing what the model is learning before writing a line of sklearn. Amy Ko's principle: read and understand before you write and execute.
🎯 Final: A data story — pick a public dataset, run a full ML analysis, write a plain-language report for your community explaining what you found, what the model can't see, and what people should know.
// Key Projects
P1 · Housing Price Predictor: linear regression on local Zillow data. Visual gradient descent before one line of sklearn.
P2 · Heart Disease Classifier: logistic regression. Precision/recall tradeoffs. When is a false positive worse than a false negative?
P3 · Spotify Genre Clusterer: K-means on audio features. Does the algorithm agree with your ears?
P4 · Movie Sentiment Analyzer: Naive Bayes on reviews. Test on reviews from your own cultural context.
P5 · Midterm: Bias Audit — pick a public ML model, document 5 failure modes, propose a technically grounded fix.
Track II · Builder
The Implementer
"You know Python. Now build the algorithms from the ground up."
Prerequisites: Python + some math
Tools: NumPy, Pandas, sklearn
Math: Calculus + Linear Algebra
// What You'll Do
You implement every algorithm from scratch before using the library version. The rule is immutable: you cannot call sklearn.linear_model.LinearRegression() until your NumPy version passes unit tests. You'll see the abstraction layer and understand exactly what it's hiding.
🎯 Final: A full ML pipeline — problem framing, data cleaning, feature engineering, 3+ model comparison, bias evaluation, and a written practitioner brief explaining your choices and their societal implications.
// Key Projects
P1 · Linear Regression from scratch: MSE loss, gradient descent, NumPy matrix ops. Must match sklearn within 0.001.
P3 · Decision Tree from scratch: information gain, CART algorithm, depth-limited pruning.
P4 · Neural Network: 3-layer MLP, backprop in NumPy, trained on tabular real-world data.
P5 · Midterm: Kaggle competition — custom pipeline only. No AutoML. Document every decision.
Track III · Architect
The Theorist
"You want the proofs, the edge cases, and the open research questions."
Prerequisites: Calc, Lin Alg, Stats
Tools: NumPy, PyTorch, LaTeX
Math: Real Analysis + Probability
// What You'll Do
You engage with the mathematical foundations directly. Proofs, derivations, and published research papers are the currency. You'll implement autograd, understand PAC learning theory, and engage critically with current research on fairness and robustness. Amy Ko: "studying computing requires human-centered methods — even for theory."
🎯 Final: A research-grade project — novel research question, literature review, implementation, experiments, and a preprint-style write-up with critical analysis of limitations and societal implications.
// Key Projects
P1 · Derive MLE from first principles for Gaussian, Bernoulli, and multinomial distributions.
P2 · Prove gradient descent convergence for L-smooth, μ-strongly convex functions. Implement and verify empirically.
P3 · Implement mini-autograd (like Karpathy's micrograd). Extend to support arbitrary compute graphs.
P5 · Midterm: Paper replication — replicate a fairness-in-ML result from NeurIPS/ICML. Document every gap between paper and reality.
Core Concepts
The ML Stack — From the Root Up
These are the foundational concepts we master across all 18 weeks. Every algorithm we study is a variation on these themes. If you understand these deeply, you can learn any new ML technique in days.
// 01
Loss Functions
J(θ) = (1/n)Σᵢ ℒ(yᵢ, f(xᵢ;θ))
Every learning algorithm is an optimization problem. The loss function is the compass. We study MSE, cross-entropy, hinge loss — not as recipes but as choices that encode assumptions about the world.
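Written out by hand rather than imported, two of those choices look like this (a sketch; function names are ours, not a library's):

```python
import numpy as np

# MSE assumes symmetric, additive errors; cross-entropy punishes
# confident wrong answers far more harshly. Different loss, different bet.
def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, p, eps=1e-12):
    # y in {0, 1}; p = predicted P(y = 1); eps guards against log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```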
// 02
Gradient Descent
θ := θ - α · ∇J(θ)
The engine of learning. From batch to stochastic to mini-batch. Convergence guarantees. Adaptive methods. We build intuition first, then rigor. You'll understand why Adam works before you use it.
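The rule is easiest to trust after watching it work once. A toy run on J(θ) = (θ − 2)², whose gradient is 2(θ − 2) and whose minimum is θ = 2:

```python
# The update θ := θ - α·∇J(θ), applied repeatedly to a convex toy objective.
theta, alpha = 0.0, 0.1
for _ in range(100):
    grad = 2 * (theta - 2)        # ∇J(θ) for J(θ) = (θ - 2)²
    theta = theta - alpha * grad  # step downhill
```

After a hundred steps, θ has converged to the minimizer. Everything from SGD to Adam is an elaboration of this loop.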
// 03
Bias-Variance Tradeoff
E[L] = Bias² + Variance + σ²
The central tension of all statistical learning. Underfitting, overfitting, regularization. Understanding why regularization works is more valuable than knowing how to call it.
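For squared loss, the decomposition is a short calculation. Fix an input x, let y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and let ŷ denote the prediction of a model trained on a random training set:

```latex
\begin{aligned}
\mathbb{E}\!\left[(y - \hat y)^2\right]
  &= \mathbb{E}\!\left[(f(x) + \varepsilon - \hat y)^2\right]
   = \mathbb{E}\!\left[(f(x) - \hat y)^2\right] + \sigma^2 \\
  &= \underbrace{\left(f(x) - \mathbb{E}[\hat y]\right)^2}_{\text{Bias}^2}
   + \underbrace{\mathbb{E}\!\left[(\hat y - \mathbb{E}[\hat y])^2\right]}_{\text{Variance}}
   + \sigma^2 .
\end{aligned}
```

The cross terms vanish because ε is independent of ŷ and has mean zero — which is why the σ² floor is irreducible no matter how good the model gets.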
// 04
Maximum Likelihood
θ̂_MLE = argmax P(data | θ)
Why do we minimize MSE for regression and cross-entropy for classification? Because they're the same thing: maximum likelihood estimation under different distributional assumptions. This unifies everything.
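For the regression half, the derivation is short. Assume yᵢ = f(xᵢ; θ) + εᵢ with Gaussian noise εᵢ ~ 𝒩(0, σ²):

```latex
\begin{aligned}
\log P(\text{data} \mid \theta)
  &= \sum_{i=1}^{n} \log \mathcal{N}\big(y_i \mid f(x_i;\theta),\, \sigma^2\big) \\
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \big(y_i - f(x_i;\theta)\big)^2 .
\end{aligned}
```

The first term does not depend on θ, so maximizing the likelihood is exactly minimizing the sum of squared errors. The Bernoulli case yields cross-entropy by the same argument.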
// 05
Probabilistic Thinking
P(y|x) = P(x|y)·P(y) / P(x)
All of ML is probability. Bayes, conditional independence, the generative vs. discriminative distinction. Ko: "understanding ML means understanding uncertainty." We make uncertainty visible, not hidden.
// 06
Feature Engineering
φ: x → ℝᵈ (good representation)
Often the highest-leverage skill in applied ML. What you feed the model matters more than which model you choose. And what features exist in a dataset encodes who was worth measuring.
// 07
Model Evaluation
CV risk = (1/k)Σₛ ℒ(f̂₋ₛ, Dₛ)
Accuracy is a lie when classes are imbalanced. We study precision, recall, AUC, calibration, and fairness metrics — understanding why they conflict mathematically and ethically.
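A concrete version of the "accuracy is a lie" claim, computed from raw confusion counts (function name is ours):

```python
import numpy as np

# Metrics from scratch, to see why accuracy misleads on imbalanced classes.
def metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# 95% negatives: predicting all-zero scores 95% accuracy with zero recall.
y_true = [1] * 5 + [0] * 95
all_zero = metrics(y_true, [0] * 100)
```

The all-zero "model" looks excellent by accuracy and is useless by recall — exactly the conflict this unit studies.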
// 08
Neural Networks
h = σ(Wₗ·...·σ(W₁x + b₁) + bₗ)
Universal approximation, but at a cost. Backpropagation, activation functions, architectures. We implement from scratch before PyTorch. The abstraction is earned, not given.
// 09
Algorithmic Fairness
P(ŷ=1|A=0) ≠ P(ŷ=1|A=1)
Fairness is not one thing — it's a family of mathematical definitions that are mutually incompatible. Understanding why they conflict is the most important result in the course. There is no free lunch, ethically or mathematically.
Research-Grounded Pedagogy
What the Research Actually Says
This course is designed in direct conversation with Dr. Amy Ko's 25 years of computing education research and Jeff Anderson's classroom practice. Every pedagogical choice has a justification.
Amy Ko Research
Read Before You Write
Ko's research shows that teaching program reading before program writing creates more robust understanding. We apply this to ML: you analyze, trace, and critique existing algorithms before implementing them. Understanding precedes production.
Amy Ko Research
CS Assessments Aren't Fair
Ko's work using psychometrics to study CS assessments found they often disadvantage students not because of knowledge gaps but because of assessment design. Portfolio + ungrading sidesteps this entirely — your evidence speaks, not your performance under pressure.
Amy Ko Research
Scaffolded Struggle Works
Ko's Gidget research: framing challenges as authentic practice (not failure) keeps learners engaged through a deliberate, step-by-step process. Our "2-minute question rule" and tiered project scaffolding are direct applications of this finding.
Jeff Anderson
Conquering College
Jeff Anderson's framework for first-generation students: the meta-skills of college — how to ask for help, form study groups, manage time, communicate with instructors — are taught explicitly, not assumed. This is part of our curriculum.
Jeff Anderson
Deep Learning vs. Surface
Anderson distinguishes deep learning (understanding that transfers) from surface learning (passing the test). Every course design choice — first principles, projects over exams, portfolio over grades — is oriented toward deep learning that serves you in 10 years.
Freire + Ko
Justice-Focused CS Requires Agency
Ko's research found justice-focused CS education is empowering but requires student trust and agency. We build that trust through transparent course design, flexible deadlines, and making the power dynamics of the classroom explicit. You are a co-creator, not a consumer.
18-Week Calendar
The Journey, Week by Week
We begin with the statistical foundations that unify all of ML — probability, distributions, estimation. We build upward through supervised, unsupervised, and deep learning. We end with the open problems: fairness, robustness, and alignment.
Wk
Theme & Question
Core Concepts + Math
Lab / Project
Unit 1 — Statistical Foundations: Probability, Estimation, Loss
01 · Unit 1
What Is Learning From Data?
The ML frame: inputs, outputs, loss, optimization. How does this differ from programming? Why is statistical learning hard?
Foundational
Probability Review + MLE Intro
Random variables, expectations, variance. MLE as a principle: find parameters that maximize probability of observed data. Gaussian MLE derived.
Math Root
Lab: Fit a Gaussian by Hand
Derive μ and σ² from MLE. Then use NumPy. Verify they match. Novice: visual interactive. Architect: full derivation with regularity conditions.
Project
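The lab's punchline in code: the Gaussian MLE is the sample mean and the biased (divide-by-n) variance — which is exactly NumPy's default. A sketch with simulated data:

```python
import numpy as np

# Simulate draws from a known Gaussian, then recover its parameters by MLE.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu_hat = x.sum() / len(x)                     # μ̂ = (1/n) Σ xᵢ
var_hat = ((x - mu_hat) ** 2).sum() / len(x)  # σ̂² = (1/n) Σ (xᵢ - μ̂)²

# np.var defaults to ddof=0 — the biased MLE — so the two should agree.
```

The divide-by-n versus divide-by-(n−1) distinction is the kind of detail the derivation makes unforgettable.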
02 · Unit 1
Supervised Learning Setup
Training set, test set, generalization. The fundamental question: why does training performance predict test performance? Introduce the i.i.d. assumption — and when it breaks.
Foundational
Loss Functions as Probabilistic Choices
MSE = MLE under Gaussian noise. Cross-entropy = MLE under Bernoulli. This is the unifying theorem. Understand it and every supervised algorithm makes sense.
Math Root
Lab: Loss Landscape Explorer
Visualize MSE and cross-entropy surfaces. Understand what "minimizing" means geometrically. Then: whose data was used to train a real model you interact with daily?
Critical
Unit 2 — Linear Methods: Regression, Classification, Regularization
03 · Unit 2
Linear Regression: The Foundation
The model: ŷ = wᵀx + b. MSE loss. Gradient derivation. Analytical solution (normal equations). The line between "fitting" and "overfitting." This is the root of all ML.
Predict Bay Area housing prices. From scratch before sklearn. Then: what variables does the model rely on? What does it encode about whose neighborhoods matter?
Critical
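The analytical solution mentioned in the week's description — the normal equations ŵ = (XᵀX)⁻¹Xᵀy — takes four lines, here on toy data that lies exactly on y = 3x + 1:

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 4.0, 7.0, 10.0])           # exactly y = 3x + 1
Xb = np.hstack([X, np.ones((len(X), 1))])     # column of ones absorbs the intercept
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)      # solve the normal equations
```

Note the `solve` rather than an explicit matrix inverse — numerically better behaved, and the same habit the from-scratch projects ask for.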
04 · Unit 2
Regularization: Bias vs. Variance
Ridge (L2) and Lasso (L1) as Bayesian priors. The bias-variance decomposition. Why more complex models aren't always better. Cross-validation as honest evaluation.
Foundational
Logistic Regression + Sigmoid
From regression to binary classification. The sigmoid function. Binary cross-entropy loss. Decision boundaries as hyperplanes. What does the model think the "boundary" means?
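The mechanics in a few lines: the sigmoid squashes the linear score wᵀx + b into a probability, and the decision boundary sits exactly where that score crosses zero. The weights and input here are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0]), 0.5
x = np.array([1.0, 2.5])
p = sigmoid(w @ x + b)   # score is 0 for this x, so p is exactly 0.5
```

A point with score zero lies on the hyperplane itself — the model is maximally uncertain there, which is what "boundary" means.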
Lab: What Does "Accurate" Mean?
Precision, recall, F1, AUC-ROC, confusion matrix. Analyze a hiring classifier. Document where false positives vs. false negatives hit harder — and who bears the cost.
Critical
05 · Unit 2
Naive Bayes + Generative Models
The generative vs. discriminative distinction. Naive Bayes: where the "naive" assumption is, when it works, and when it catastrophically fails. Text classification.
Feature Engineering
Encoding categorical variables. Polynomial features. Normalization. The choices embedded in feature engineering. What gets measured, and why, encodes power.
Critical
Project: Spam/Sentiment Classifier
Build Naive Bayes text classifier. Test on text from your own cultural context (AAVE, Spanish, code-switching). Document failures. Propose a data collection fix.
Project
06 · Unit 2
Decision Trees + Ensembles
Information gain, CART. Random forests as variance reduction. Gradient boosting as sequential error correction. Interpretability vs. performance tension.
Model Selection + Pipelines
Hyperparameter tuning. Grid search. The multiple comparisons problem. Why your cross-validated results are probably still too optimistic.
Lab: Recidivism Prediction Audit
Analyze COMPAS-style decision tree on ProPublica data. Compute accuracy by race. Discuss: why do different fairness definitions conflict mathematically? What should policy do?
Critical
Unit 3 — Unsupervised Learning: Clustering, Dimensionality, Representation
07 · Unit 3
Learning Without Labels
Why is unsupervised learning harder to evaluate? K-means: Lloyd's algorithm, convergence, choosing k. The Elbow Method. What does it mean for the algorithm to "decide" what's similar?
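One pass of Lloyd's algorithm, written out (a sketch with our own names; it omits the empty-cluster edge case):

```python
import numpy as np

# Assign each point to its nearest centroid, then move each centroid
# to the mean of its assigned points. Repeat until assignments settle.
def lloyd_step(X, centroids):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    new_c = np.array([X[labels == k].mean(axis=0) for k in range(len(centroids))])
    return labels, new_c

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, c = lloyd_step(X, np.array([[0.0, 0.0], [10.0, 10.0]]))
```

The "decide what's similar" question lives entirely in that distance computation — change the metric and the clusters change.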
Gaussian Mixture Models
Soft clustering. The EM algorithm as alternating optimization. K-means is a special case of GMM. Novice: visual. Builder: implement EM. Architect: derive ELBO.
Math Root
Lab: Market Segmentation
K-means on customer behavior data. Compare algorithm clusters to demographic patterns. Discuss: when does "segment" become "discriminate"?
Critical
08 · Unit 3
PCA + Dimensionality Reduction
The curse of dimensionality. PCA as variance maximization and as finding the data's "natural axes." SVD connection. What information is lost in compression?
Foundational
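The SVD connection in code, under the usual convention of centering first (names are ours):

```python
import numpy as np

# PCA via SVD: the top-k right singular vectors of the centered data
# are its "natural axes"; projecting onto them compresses the data.
def pca(X, k):
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]   # projected data, principal components

# Data varying almost only along y = x: one component captures nearly all of it.
rng = np.random.default_rng(1)
t = rng.normal(size=(200, 1))
X = np.hstack([t, t]) + 0.01 * rng.normal(size=(200, 2))
Z, comps = pca(X, 1)
```

The recovered component points along (1, 1)/√2 — the direction of maximum variance — and everything orthogonal to it is what compression throws away.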
Word Embeddings + Representation
word2vec, GloVe — geometry of meaning. King - Man + Woman = Queen. The bias embedded in word vectors. Who decided what relationship was "analogous"?
Critical
Project: Embed Your Community
Train word embeddings on text from your community (Reddit, Twitter, local news). Visualize the geometry. Document what biases the algorithm learned and from where.
09 · Unit 3
Community showcase. Present your first major project: the technical work, the critical analysis, and what you'd do differently. Peer + guest feedback.
Exhibition
Critical ML Guest Panel
Practitioner from healthcare, criminal justice, or hiring technology. Class debates: "What should ML never be allowed to decide?" Structured discussion, not lecture.
Critical
Final Project Pitch
Students pitch final project topics. Peer + instructor structured feedback. Learning partnerships form around shared domains. Begin iteration cycle.
Project
Unit 4 — Deep Learning: Neural Networks, Backpropagation, Architectures
10 · Unit 4
Neural Networks: The Full Picture
From linear regression to a 3-layer MLP in one step. Activation functions (sigmoid, ReLU, tanh) and why they exist. The universal approximation theorem — and its limits.
Implement a 3-layer neural network in NumPy. Train on a social issue dataset. Compare to logistic regression. Understand what the extra complexity buys you.
Project
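A compressed sketch of what "backprop in NumPy" involves — one hidden layer rather than the project's three, all names ours. The finite-difference comparison at the end is the standard sanity check for hand-derived gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])         # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # output layer

def sig(z):
    return 1 / (1 + np.exp(-z))

def loss(W1, b1, W2, b2):
    h = sig(X @ W1 + b1)                       # hidden activations
    out = sig(h @ W2 + b2)                     # output probabilities
    return np.mean((out - y) ** 2), h, out

def grads(W1, b1, W2, b2):
    _, h, out = loss(W1, b1, W2, b2)
    d_out = 2 * (out - y) / len(X) * out * (1 - out)  # MSE + sigmoid chain rule
    d_h = (d_out @ W2.T) * h * (1 - h)                # one layer further back
    return X.T @ d_h, d_h.sum(0), h.T @ d_out, d_out.sum(0)

# Check the analytic gradient against a central finite difference on W2[0, 0]
g_W2 = grads(W1, b1, W2, b2)[2]
eps = 1e-6
W2p, W2m = W2.copy(), W2.copy()
W2p[0, 0] += eps
W2m[0, 0] -= eps
num = (loss(W1, b1, W2p, b2)[0] - loss(W1, b1, W2m, b2)[0]) / (2 * eps)
```

If the two numbers disagree beyond ~1e-5, the derivation has a bug. This check catches silent shape and sign errors before any training run — exactly the kind of failure that otherwise hides behind a loss curve.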
11 · Unit 4
Convolutional Neural Networks
Translation invariance. Filters, pooling, feature maps. How spatial structure is exploited. From edge detection to face recognition — including its dangers.
Training Deep Networks
Vanishing gradients, batch normalization, dropout. Why deep learning is hard to train — and why modern tricks work. The gap between theory and practice.
Lab: Interpret What the Model Sees
Use saliency maps and Grad-CAM to visualize what a CNN is paying attention to. Does it match your expectations? Document the gap. Buolamwini: "Gender Shades" paper.
Critical
12 · Unit 4
Sequence Models + Transformers
RNNs, the vanishing gradient problem, LSTMs. Then self-attention as a solution. The transformer architecture — demystified. How does attention work mathematically?
Transfer Learning
Pre-training + fine-tuning. What gets transferred and why. The implicit world model in a pre-trained model. Who built that model, on whose data, for whose benefit?
Critical
Lab: Fine-Tune for Your Use Case
Fine-tune a small pre-trained model (BERT-tiny or similar) on a task meaningful to your community. Document what the pre-trained knowledge helps and hurts.
Project
Unit 5 — Advanced Topics: Fairness, Robustness, Bayesian Methods, RL
13 · Unit 5
Bayesian Machine Learning
Priors as assumptions. Posteriors as updated beliefs. Bayesian linear regression. The full Bayesian approach vs. MAP vs. MLE. When does being Bayesian matter?
Calibration + Uncertainty
A model that says "70% confident" should be right 70% of the time. Calibration plots. Temperature scaling. Ko: "uncertainty is the lesson." Most production ML ignores this.
Math Root
Lab: Is Your Model Calibrated?
Compute calibration curves for a medical prediction model. Analyze: what happens when an overconfident model informs a doctor's decision?
Critical
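The lab's computation, stripped to its core: bucket predictions by confidence, then compare each bucket's mean confidence with its observed hit rate. A sketch with our own names; sklearn's calibration_curve implements the same idea with more binning options:

```python
import numpy as np

def calibration(probs, y, n_bins=5):
    probs, y = np.asarray(probs), np.asarray(y)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((probs[mask].mean(), y[mask].mean()))
    return rows  # (mean confidence, observed frequency) per bucket

# Perfectly calibrated toy data: "0.8 confident" is right 80% of the time.
probs = np.array([0.8] * 10)
y = np.array([1] * 8 + [0] * 2)
rows = calibration(probs, y)
```

An overconfident model shows buckets where mean confidence sits well above observed frequency — the exact failure the medical-prediction discussion is about.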
14 · Unit 5
Algorithmic Fairness
Demographic parity, equalized odds, predictive parity — and the impossibility theorem proving they can't all hold simultaneously. This is not a failure of engineering. It's a political choice.
Critical
Adversarial ML
FGSM attacks. The brittleness of deep learning. Certified robustness. Adversarial examples as a mirror: the model didn't learn what we thought it learned.
Foundational
Lab: Attack Your Own Model
Generate adversarial examples against a model you built earlier. Document what the attack reveals about what the model actually learned. Propose a defense and test it.
Project
15 · Unit 5
Reinforcement Learning Intro
MDPs, Q-learning, policy gradients. The reward function as a value system. Who defines the reward? Goodhart's Law. The alignment problem as an ML problem.
Human-in-the-Loop ML
Active learning, RLHF, annotation pipelines. Who are the annotators? What are they paid? Whose judgments define "correct" in supervised learning?
Critical
Lab: Design a Reward Function
Implement Q-learning on a gridworld. Design the reward yourself. Observe what the agent optimizes for. Reflect: what did you accidentally teach it to value?
Project
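A minimal version of the lab's setup, assuming a 1-D "gridworld" of five states with reward only at the rightmost one. All constants are illustrative:

```python
import numpy as np

n_states = 5                         # states 0..4; state 4 is terminal
Q = np.zeros((n_states, 2))          # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.2    # step size, discount, exploration rate
rng = np.random.default_rng(0)

for _ in range(500):                 # episodes
    s = 0
    while s != 4:
        # ε-greedy action selection
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == 4 else 0.0  # the reward function IS the value system
        # Q-learning update: bootstrap off the best next action
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)            # greedy policy: should point right
```

Changing the single line that defines `r` changes what the agent "values" — that one line is the reflection the lab asks for.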
// Final Weeks — Synthesis, Exhibition, Celebration
16 · Final
ML in the Wild
The gap between research ML and production ML. Data pipelines, model monitoring, distribution shift. What happens when the world changes and the model doesn't know?
The Practitioner's Responsibility
Jeff Anderson: navigate vs. transform harmful systems. As a future ML practitioner, you will be asked to build things that harm people. How will you respond? What is your line?
Critical
Final Project Studio
Dedicated studio time. Learning conference check-ins. Peer feedback sessions. Document your process, not just your final results.
Project
17 · Final
Final Exhibition — Day 1
Community showcase. Present your complete project: the problem, the pipeline, the results, the failures, and the implications. Invited guests from the community.
Exhibition
Portfolio Learning Conferences
Individual meetings: you present your full portfolio and assign your grade with evidence. This is the most important evaluation moment of the course.
Portfolio Due
All projects, concept notes, critical essays, reflections, and final self-evaluation. The evidence of your learning journey, not just its destination.
18 · Finale
Final Exhibition — Day 2 + Celebration
Remaining presentations. Celebration of learning. Reflection on the semester. What does this course leave with you — and what do you leave in it?
Exhibition
Where Do You Go From Here?
Transfer pathways, careers in ethical ML, open-source contribution, communities doing justice-focused data work. You are a practitioner. Act accordingly.
Course Co-Evaluation
You evaluate the course. Your feedback co-creates the next version. Freire's dialogic education: the student teaches the teacher. This is that moment.
Portfolio + Ungrading
Evidence Over Performance. Always.
Adapted from Jeff Anderson's ungrading practice. The goal is not to perform mastery for a grade. The goal is to build real, transferable understanding you can use in 10 years. Evidence of learning — not test scores — is the currency.
📂
Your Portfolio
A living document of your learning journey. Process over product. First attempts, failures, revised understanding — all included. Ko: "we study programming with human-centered methods." Apply that to your own learning.
All project code with documented thinking
Concept notes in your own language
Critical essays: who built this, for whom?
Bi-weekly learning reflections
Evidence of peer teaching + feedback given
"Conquering College" meta-learning log
🔁
Three Feedback Loops
You receive feedback from three sources. The instructor is the smallest source — mirroring professional practice where self-assessment and peer review are the primary quality signals.
All feedback is specific and forward-looking, not evaluative
✍️
You Assign Your Grade
At the end of the course, you write a final self-evaluation with evidence from your portfolio. You assign your grade. The instructor reviews and recommends. If the evidence is there, it's confirmed. This is not a trick — it's how trust-based education works.
A: Deep engagement with all major concepts + strong projects
B: Solid understanding, some gaps, consistent effort
No D/F for students who show up and engage honestly
// Learning Strategy
How to Succeed in ML
These aren't hacks — they're how people actually learn hard math and code. Especially if you're first-gen in a field that wasn't built for you.
01
Derive Before You Cite
Before writing θ := θ - α∇J, derive it. What is J? Why do we subtract? What does the gradient point toward? Derivation is the deepest form of understanding.
02
Implement Before Import
You cannot call sklearn.linear_model.LinearRegression() until your NumPy version passes unit tests. This rule is the core of the course. Every abstraction is earned, not given.
03
Plot Everything
Loss curves. Decision boundaries. Confusion matrices. Calibration plots. In ML, seeing IS understanding. If you haven't plotted it, you haven't understood it yet.
04
Sanity Check with n=5
Before running on 10,000 rows, run on 5. Hand-verify every output. Does the loss go down? Are the shapes right? Bugs in ML fail silently — you have to check.
05
Speak the Math Aloud
Explain gradient descent out loud without writing a formula. If you can't, you don't understand it yet. Talk to a classmate. Talk to yourself. Verbalization catches gaps that code hides.
06
Ask Who Bears the Cost
Every time a model "works," ask: who was in the training set, and who wasn't? What does high accuracy hide? This question is not optional — it is part of the assignment.
// How It All Connects
The ML Concept Map
Every algorithm in this course connects back to the same root question: what does it mean for a machine to learn from data?
The Root
Loss Function J(θ)
↓
MLE Derivation
↓
Gradient Descent
↓
Every Supervised Model
Linear Methods
Linear Regression
↓
Logistic Regression
↓
Regularization
↓
SVMs / Decision Trees