Arrays. Trees. Graphs. Sorting. These aren't abstract puzzles. They're the tools behind every app, every search engine, every social media feed — and every surveillance system. We learn them by building things with them, for our communities.
A hash table doesn't just store data — it makes assumptions about what data is "equal." A priority queue doesn't just order items — it defines what "matters most." These are not neutral technical choices. They reflect values, and the people who program them carry those values forward.
This course treats each data structure as a mini-engineering challenge with a real-world application. You won't just implement a linked list. You'll build a music playlist system and ask: what happens when the playlist runs out? Who decides what plays next?
We draw on liberatory pedagogy throughout. The goal isn't to prepare you to pass a whiteboard interview — though you'll be ready for that too. The goal is to give you the tools to build technology that serves your community rather than exploits it.
No isolated drills. Every data structure is introduced through a real application you build and present.
Novice uses the structure as a library. Intermediate builds it from scratch. Advanced analyzes its tradeoffs and extends it.
Big-O is introduced through human stories — "What happens when your city doubles in size?" — before it's introduced as math.
Who does your algorithm hurt when it's slow? Who benefits from a fast lookup? Power is encoded in performance decisions.
Click any card to see the three-track project, complexity requirements, and key concepts for that data structure or algorithm.
Build a class election voting system using Python lists. Add candidates, count votes, announce results. Reflect: how does the order of the array affect who "wins" ties?
Implement a dynamic array in Python with manual resizing (double when full). Profile append performance. Plot amortized cost. Show why 2x growth achieves O(1) amortized.
Use ctypes to inspect Python list memory layout. Compare row-major vs column-major storage. Benchmark cache performance. Implement a numpy-style stride array with custom indexing.
Build a playlist app using Python's deque (which is a doubly linked list). Add songs, skip to next/prev, loop. Reflect: when a playlist "forgets" a song, what's actually happening in memory?
Implement Node class with prev/next pointers. Build full doubly linked list with insert, delete, search, reverse. Build a browser history system on top of it. Add LRU cache behavior.
Implement a skip list — a probabilistic linked list that achieves O(log n) search. Compare to binary search tree. Prove expected complexity. When would you use this over a BST?
Build a simple text editor with undo (stack) and a print queue simulation (queue). Run the classroom printer simulation: who gets printed first? Is FIFO fair? What if some jobs take 10x longer?
Implement a stack to check balanced parentheses, brackets, and braces in real code files. Extend to check valid HTML. Then implement a call stack simulator that traces recursive function calls.
Implement the "next greater element" and "largest rectangle in histogram" using a monotonic stack. Build a sliding window max using a deque. Profile space-time tradeoff against brute force.
Build a student directory using Python dicts. Add, remove, search students by ID. Now try: what if two students have the same ID? Run a "birthday paradox" demo with the class to feel collision probability.
Implement hash table with chaining and open addressing. Write your own hash function. Profile load factor vs performance. Build a word frequency counter on a 10MB text file. Beat Python dict on your custom benchmark.
Implement a Bloom filter for approximate set membership. Show space vs false positive tradeoff. Apply to a "safe browsing" use case. Then: how do browsers hash URLs to check them without sending them to Google?
Build a family tree with a visual tool. Navigate ancestor/descendant relationships. Run in-order traversal "by hand" on the board. Connect: how does a file system use a tree? Why is `/home/user` a "path"?
Implement a BST to store and search a library of books by ISBN. Write all three traversals. Build a visualization with matplotlib. Show what happens when you insert sorted data — and why trees go bad.
Implement BST with all operations. Then implement a simple decision tree classifier that shares the same node structure. Show: every decision tree in ML is a BST with rules instead of numbers. What does that mean for fairness?
Simulate an emergency room using Python's heapq. Patients arrive with severity scores. Use a priority queue to always treat the most critical first. Ask: is "most critical first" always fair? Who decides severity?
Implement a binary heap array in Python. Write sift-up and sift-down. Implement heapsort. Show that heapify is O(n) not O(n log n). Apply to a real-time event scheduler for a community calendar app.
Solve the "streaming median" problem using two heaps. Prove it achieves O(log n) per update. Apply to real-time income inequality analysis on a streaming dataset. Connect to Dijkstra's algorithm.
Draw a social graph of your class (who knows who from before). Run BFS by hand to find the "six degrees" path. Now: Facebook, Instagram, LinkedIn are all graphs. Who profits from knowing your edges?
Model Bay Area transit as a weighted graph using adjacency lists. Implement BFS, DFS, and Dijkstra. Find the shortest path between two points. Compare to Google Maps. Analyze who gets underserved routes.
Implement PageRank from scratch on a real Twitter/academic citation graph. Detect communities with BFS. Run cycle detection. Show how PageRank amplifies existing power. Read about its use in predictive policing.
Use a visual sorting simulator (sort.visualized.io). Race bubble sort vs merge sort on 1000 elements. Watch them. Time them. Ask: when does the order you start with change the winner? What's the "worst case" for quicksort?
Implement merge sort and quicksort from scratch. Apply to a real Census dataset. Sort by income, by zip code, by race. Profile with timeit. Show that sort stability matters when sorting by multiple keys.
Implement radix sort on integers and strings. Prove it beats the comparative lower bound. Profile on 10M elements. Show why radix sort isn't always "better." Write a rigorous proof of the comparative lower bound.
Play 20 questions / number guessing with optimal binary search. Visualize how each guess eliminates half the space. Apply to a real scenario: searching a dictionary. Count steps vs linear search. What does Google use?
Build a sorted patient database. Implement binary search. Handle: first occurrence, last occurrence, rotated sorted array. Use two-pointer to find all patients in an age range. Profile vs linear scan.
Solve "search on monotonic answer space" problems: minimize max, maximize min, k-th smallest in matrix. Prove these are binary search problems in disguise. Apply to a resource allocation problem for your community.
Given a limited community center budget, choose programs to maximize community impact. Solve by hand with a table. This IS the knapsack problem. Discover DP by solving it bottom-up on a whiteboard together.
Implement Longest Common Subsequence to build a git-diff style tool. Show "edit distance" between two versions of a document. Apply to DNA sequences. Visualize the DP table filling bottom-up.
Implement 5 classic DP problems (coin change, longest increasing subseq, word break, matrix chain, edit distance). For each: prove optimal substructure, derive recurrence, analyze complexity. When is DP overkill?
Build a trie-based autocomplete for a language of your choice using Python dicts. Add words from a corpus in your language. Type a prefix, get suggestions. What happens when your language has different alphabet rules?
Implement an AVL tree with rotations. Show that insertion always maintains O(log n) height. Compare to unbalanced BST on sorted input. Build a simple in-memory database index on top of it.
Implement a compressed (Patricia) trie. Build an inverted index for a small document corpus. Support prefix search, wildcard search, and ranked results. Compare memory vs standard trie. How does Elasticsearch do this at scale?
Plot O(1), O(log n), O(n), O(n²), O(2ⁿ) on a spreadsheet up to n=1000. Tell the story in plain English: "If your city doubles, how much longer does this take?" Connect to real apps you use every day.
Write a formal complexity analysis of every algorithm you've built this semester. For each: derive Big-O, give best/worst/average case, measure empirically and compare to theory. Produce a 2-page written report.
Prove that the Traveling Salesman Problem is NP-Hard by reduction from 3-SAT. Find 2 approximation algorithms. Apply to a real scheduling problem. Discuss: what problems in your domain might be NP-hard in disguise?
| Wk | Structure / Algo | Novice Project | Intermediate Project | Advanced Project | Deliverable |
|---|---|---|---|---|---|
01 | Big-O FoundationsWhy complexity matters | Speed story plotting | Profiling your code | Formal complexity proofs | |
02 | Arrays & Dynamic ArraysMemory, indexing, amortization | Voting tally system | ArrayList from scratch | Memory layout explorer | P1 DUE |
03 | Linked ListsNodes, pointers, traversal | Playlist app | Doubly linked list + LRU | Skip list implementation | |
04 | Stacks & QueuesLIFO, FIFO, scheduling | Undo/redo editor | Balanced brackets | Monotonic stack problems | P2 DUE |
05 | Hash TablesHash functions, collisions | Student directory | Hash table from scratch | Bloom filters + privacy | |
06 | Hash Tables LabLoad factors, privacy implications | Word frequency counter | Open addressing variant | Password hashing research | P3 DUE |
07 | Binary Trees & BSTsTraversals, BST property | Family tree visualizer | Book search BST | BST → decision tree bridge | |
08 | Heaps & Priority QueuesHeapify, scheduling | ER triage simulator | Heap from scratch | Median maintenance | P4 DUE |
09 | Sorting AlgorithmsMerge, quick, radix, stability | Sorting race visualizer | Sort the census | Beat O(n log n) | |
10 | Search AlgorithmsBinary search, two pointers | Guess my number | Medical record lookup | Binary search on answer | P5 DUE |
11 | Graphs IRepresentations, BFS, DFS | Six degrees campus | Transit BFS/DFS | Cycle detection + topo sort | |
12 | Graphs IIDijkstra, PageRank, social nets | Friend network map | Route optimizer Dijkstra | PageRank from scratch | P6 DUE |
13 | Dynamic Programming IMemoization, tabulation | Community budget planner | Text diff tool | DP complexity proofs | |
14 | Balanced Trees & TriesAVL, red-black, autocomplete | Cultural autocomplete | AVL tree implementation | Compressed trie + search engine | P7 DUE |
15 | Complexity Deep DiveAmortized, NP intro | Algorithm comparison deck | Full semester audit | NP-hardness reduction | |
16 | 🎓 Portfolio DayShowcase + celebrate | Community demo day | GitHub portfolio review | Research lightning talks | FINAL |
We use a portfolio model tied to contract grading. You choose your grade commitment at week 1. Projects are submitted in a GitHub portfolio — your permanent proof of skill that transfers anywhere you go.
12 project-based assessments, one per data structure/algorithm. Graded on correctness, reflection, and communication — not speed.
Written Big-O analysis for each project. Show your work. Prove, don't just state. Empirical benchmarking counts equally with theoretical proof.
An end-to-end system combining at least 3 data structures. Built for a real community need. Presented live with a code walkthrough.
4 written reflections asking: who does this data structure serve? What assumptions does it make? What community could build something better?
Priority queues make choices about who matters. Hash tables make choices about what "same" means. Sorting algorithms make choices about what "order" means. We name these choices explicitly so students can change them.
Based on Seymour Papert's constructionism and Buck Institute for Education PBL principles: you learn DSA by building real things with DSA. Each project is the lesson — not an application of the lesson.
Bay Area transit networks. Student election systems. Emergency room triage. These aren't cute examples — they're systems that affect the lives of students in this room. Real context = real motivation.
Every project is GitHub-hosted. Every complexity analysis is written. The portfolio approach aligns with SJSU, Cal Poly, UC Berkeley expectations. Students leave with a portfolio, not just a grade.
The three-track system isn't tracking students — it's giving every student access to the same powerful ideas at their own entry point. Students choose their depth. Students can move tracks anytime.
An algorithm that hits worst-case O(n²) isn't a failure — it's information. We build a classroom culture where debugging, wrong answers, and failed experiments are the most educational moments of the week.