Every data structure is an argument about the world — a claim that some operations matter more than others. Every algorithm is a strategy, a tradeoff, a value judgment. We study them from the root and build something real with each one.
Every design choice in this course is grounded in research on how people actually learn to program — not how universities have traditionally tested programming. This means projects over exams, depth over breadth, and critical questions about why each data structure was invented.
Every major data structure culminates in a mini project where you build something real with it. A stack becomes a browser history system. A graph becomes a campus navigation app. You don't just implement — you deploy and reflect.
Same core concept every week, different depth and project scope. Novice Explorer focuses on building and understanding. Builder adds implementation rigor. Architect engages with proofs and research. Tracks are starting points, not ceilings.
For every new data structure, you trace through existing implementations and predict their behavior before writing a single line. Ko's research shows this "read before write" approach dramatically improves debugging ability and structural understanding.
Who invented the hash table, and why? What social problem did the graph data structure first solve? Who gets credit for foundational algorithms — and who doesn't? Knowing the history makes the structures meaningful, not arbitrary.
No traditional exams. A living portfolio of every implementation, project, and reflection. You assign your grade at the end, with evidence. The 2-minute question rule: before asking for help, write a precise question that demonstrates what you already understand.
Ko's Gidget research: framing difficulty as authentic practice (not failure) keeps learners engaged. Each project has three layers: a working solution (required), an efficient solution (encouraged), and an "edge cases and analysis" layer (for those ready for it).
You've used arrays without thinking about what "contiguous memory" means. Here, we think about it. We build a dynamic array (like Python's list or Java's ArrayList) from scratch, understanding the doubling strategy, amortized O(1) pushes, and why index access is fast but insertion in the middle is slow.
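The doubling strategy can be sketched in a few lines. This `DynamicArray` class and its internals are illustrative assumptions, not the course's reference implementation:

```python
class DynamicArray:
    """Minimal dynamic array sketch: fixed-capacity backing store, doubling on growth."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._data = [None] * self._capacity  # stand-in for a raw memory block

    def push(self, value):
        if self._size == self._capacity:      # full: allocate double, copy everything over
            self._capacity *= 2
            new_data = [None] * self._capacity
            for i in range(self._size):
                new_data[i] = self._data[i]
            self._data = new_data
        self._data[self._size] = value
        self._size += 1

    def __getitem__(self, i):                 # O(1): just an offset into the block
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._data[i]

    def __len__(self):
        return self._size
```

Each resize copies every element, but because capacity doubles, n pushes trigger at most log2(n) resizes whose total copy work stays under 2n — that is the amortized O(1) argument in miniature.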
A simplified CSV processor — read a CSV file, store data in a 2D dynamic array, implement sort by any column, filter by value, compute column statistics. Why is a spreadsheet the right metaphor for an array? What are its limits?
An array is a neighborhood where houses have addresses in order. A linked list is a treasure hunt — each house has a note pointing to the next one. Fast at the front, slow everywhere else. We build singly-linked, doubly-linked, and circular. We ask: when is this actually better than an array?
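The "treasure hunt" picture maps directly onto a node-and-pointer sketch (class names here are hypothetical):

```python
class Node:
    """One stop on the treasure hunt: a value plus a note pointing to the next stop."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): no shifting, just rewire one pointer
        self.head = Node(value, self.head)

    def to_list(self):
        # O(n): follow the notes from house to house
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out
```

Note the asymmetry with an array: `push_front` is O(1) here but O(n) for an array, while indexed access reverses those costs.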
Build a music player's playlist engine using a doubly-linked list. Support: play, skip, go back, shuffle (Fisher-Yates), loop. Why is a doubly-linked list perfect for "previous track"? What does comparing it with an array-based playlist reveal about the tradeoffs?
You use stacks every time you hit Cmd-Z. Queues run every print job, task scheduler, and message broker. These abstract data types are everywhere — we implement them multiple ways, understand why the abstraction matters more than the implementation, and deploy them in something real.
Part 1: Build a command pattern text editor with full undo/redo using two stacks. Part 2: Build a priority task queue for a simulated hospital triage system. Why does triage require a priority queue, not a regular queue?
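The two-stack undo/redo idea in Part 1 can be sketched as below. The `Editor` class and its snapshot-per-edit design are simplifying assumptions; a fuller command pattern would store command objects rather than whole-text snapshots:

```python
class Editor:
    """Undo/redo via two stacks of text snapshots."""

    def __init__(self):
        self.text = ""
        self.undo_stack = []   # states we can go back to
        self.redo_stack = []   # states we undid and can replay

    def type(self, s):
        self.undo_stack.append(self.text)
        self.redo_stack.clear()        # a new edit invalidates the redo history
        self.text += s

    def undo(self):
        if self.undo_stack:
            self.redo_stack.append(self.text)
            self.text = self.undo_stack.pop()

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.text)
            self.text = self.redo_stack.pop()
```

The key invariant: every edit pushes the old state onto the undo stack and empties the redo stack, so redo is only valid immediately after undos.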
Hash tables are everywhere: Python dicts, Java HashMaps, database indexes, every set you've ever used. But how do they actually work? We build one from scratch, grapple with collisions (chaining vs. open addressing), understand load factors, and build something that makes O(1) real.
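A minimal chaining hash table with a load-factor-triggered resize might look like this (the class name and the 0.75 threshold are illustrative choices):

```python
class ChainedHashTable:
    """Hash table with separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key exists: overwrite in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.buckets) > 0.75:   # load factor check
            self._resize()

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self):
        # Double the bucket count and rehash everything: positions depend on capacity.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self._bucket(k).append((k, v))
```

With a bounded load factor and a decent hash, chains stay short on average, which is where the expected O(1) lookup comes from.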
Build a text analysis tool that finds word frequency across a large text file (novel, song lyrics, congressional record) using your own hash table. Compare performance against a sorted array approach. Extend: detect plagiarism between two texts. Critical question: whose words get counted as "common"?
A binary search tree keeps data sorted for free — every insertion maintains the invariant. We implement from scratch, study all three traversals (and know WHY each traversal order matters), understand the degenerate case, and marvel at how balance changes everything.
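A sketch of insertion and the in-order traversal, the one that yields keys in sorted order (pre-order is the one that records a tree's shape; post-order visits children before their parent). Function names are illustrative:

```python
class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key, maintaining the BST invariant: left subtree < key < right subtree."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                # duplicates are ignored in this sketch

def inorder(root):
    """Yield keys in sorted order: left subtree, node, right subtree."""
    if root is not None:
        yield from inorder(root.left)
        yield root.key
        yield from inorder(root.right)
```

Inserting already-sorted keys produces the degenerate case: every node has one child, and the "tree" is a linked list with O(n) search.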
Build a command-line file system explorer (like a mini tree command) using a BST to index all files. Support: search by name, sort by size, find all files of type X, print directory tree. How is a real file system different from a BST? What happens at scale?
A heap answers one question faster than anything else: "what's the minimum (or maximum)?" Not arbitrary access — just the best element. We build a binary heap stored in an array (the beautiful trick!), implement heapsort, and build a priority queue that solves a real scheduling problem.
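The array trick: a node at index i has children at 2i+1 and 2i+2 and a parent at (i-1)//2, so no pointers are needed. A minimal min-heap sketch (class name assumed):

```python
class MinHeap:
    """Binary min-heap stored flat in a list; children of i at 2i+1 and 2i+2."""

    def __init__(self):
        self.a = []

    def push(self, x):
        self.a.append(x)
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:   # sift up
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2

    def pop(self):
        a = self.a
        a[0], a[-1] = a[-1], a[0]       # move min to the end, last element to the root
        smallest = a.pop()
        i = 0
        while True:                      # sift down the displaced element
            l, r, m = 2 * i + 1, 2 * i + 2, i
            if l < len(a) and a[l] < a[m]:
                m = l
            if r < len(a) and a[r] < a[m]:
                m = r
            if m == i:
                break
            a[i], a[m] = a[m], a[i]
            i = m
        return smallest
```

Both operations walk one root-to-leaf path, so push and pop are O(log n) while peeking at the minimum (`a[0]`) is O(1).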
Simulate an ER triage system. Patients arrive with severity scores. Your priority queue ensures the most critical patient is always seen next. Handle re-prioritization (decrease-key). Analyze: how does this system encode values about whose emergency "counts more"? What if severity scores encode bias?
Graphs model everything: social networks, road maps, the internet, recommendation systems, supply chains, biological pathways. We study adjacency matrix vs. list, directed vs. undirected, weighted vs. unweighted. Then we implement BFS, DFS, Dijkstra, and topological sort — understanding what each one finds and why.
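A compact Dijkstra sketch over an adjacency-list dict, using Python's heapq in the common lazy-deletion style (stale queue entries are skipped rather than removed):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source.

    graph: dict mapping node -> list of (neighbor, weight) pairs,
    with non-negative weights (Dijkstra's precondition).
    """
    dist = {source: 0}
    pq = [(0, source)]                   # (distance-so-far, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                     # stale entry: u already settled with a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd             # relax the edge
                heapq.heappush(pq, (nd, v))
    return dist
```

Swap the priority queue for a plain FIFO queue and drop the weights and this becomes BFS, which is exactly why BFS finds shortest paths in unweighted graphs.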
Part 1: Build a campus (or city block) navigation system using Dijkstra's algorithm. Find shortest paths, identify bottlenecks. Part 2: Model a social network. Find connected components, detect communities, compute degree distribution. Critical question: what does a recommendation algorithm's graph encode about who connects to whom?
These structures solve problems that arrays, trees, and hash tables can't touch efficiently. A Trie makes autocomplete instant. Union-Find detects cycles in amortized O(α(n)) per operation, effectively constant time. A segment tree handles range queries in O(log n). We study them when we need them, not as abstract exercises.
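Union-Find's cycle detection fits in a short sketch, shown here with path halving and union by rank, the combination behind the amortized α(n) bound:

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False               # same component: this edge would close a cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx           # attach shorter tree under taller (union by rank)
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

Feed a graph's edges through `union` and the first `False` is your cycle — the core of Kruskal's minimum spanning tree algorithm.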
Build a Trie-based autocomplete system trained on a large word list (English dictionary + a corpus meaningful to you — medical terms, cooking vocabulary, your language). Support prefix search, ranked suggestions by frequency, and deletion. Compare to a hash-based approach for prefix queries. Critical: whose vocabulary does the default word list center?
We implement and deeply understand six sorting algorithms: Bubble, Selection, Insertion, Merge, Quick, and Counting/Radix. Each one is not just an algorithm — it's a strategy with a story: who invented it, what problem it solved, and what it costs on different input shapes.
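As one representative of the six, merge sort shows the divide-and-merge strategy, and why stability matters (the `<=` in the merge keeps equal keys in their original order):

```python
def merge_sort(a):
    """Stable O(n log n) sort: split in half, sort each half, merge."""
    if len(a) <= 1:
        return a[:]
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])

    # Merge two sorted halves into one sorted list.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= prefers the left copy: stability
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])               # one of these is empty; append the rest
    out.extend(right[j:])
    return out
```

The story fits the strategy: merging was mechanical long before it was algorithmic — merge sort's usual attribution is to von Neumann in 1945, designed for sorting data larger than fast memory.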
The information-theoretic lower bound is the jewel: every comparison sort requires Ω(n log n) comparisons in the worst case. This is a theorem about the universe, not about programming skill. Understanding why it's true is one of the most beautiful moments in this course.
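The why, in one counting argument: a comparison sort is a decision tree whose leaves must cover all n! possible input orderings, so its height — the worst-case number of comparisons h — satisfies

```latex
h \;\ge\; \log_2(n!) \;=\; \sum_{k=1}^{n} \log_2 k \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2} \;=\; \Omega(n \log n)
```

The last inequality keeps only the larger half of the terms in the sum, each of which is at least log2(n/2). Counting sort and radix sort escape the bound precisely because they are not comparison sorts: they inspect key digits directly.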
Build an animated sorting visualizer (web-based or terminal). Show each algorithm's steps. Then: given a real dataset (NBA stats, census data, your music library), benchmark all six. Document when each wins and loses. Present your findings as a one-page "algorithm recommendation guide" for a non-technical audience.
DP is not a data structure — it's a way of thinking. The key insight: if a problem has optimal substructure and overlapping subproblems, you can solve it once and cache the answer. We study DP through real problems: edit distance, longest common subsequence, 0/1 knapsack, coin change.
Every DP problem follows the same template: define your subproblem, find the recurrence, add memoization, convert to bottom-up tabulation. Once you see the pattern, it never leaves you.
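The four-step template applied to coin change (fewest coins summing to an amount). Function names are illustrative; both versions compute the same recurrence:

```python
from functools import lru_cache

def coin_change_memo(coins, amount):
    """Top-down: subproblem best(amt) = fewest coins making amt; memoized recursion."""
    @lru_cache(maxsize=None)
    def best(amt):
        if amt == 0:
            return 0
        if amt < 0:
            return float("inf")        # unreachable amount
        return 1 + min(best(amt - c) for c in coins)
    ans = best(amount)
    return -1 if ans == float("inf") else ans

def coin_change_tab(coins, amount):
    """Bottom-up tabulation of the same recurrence, smallest amounts first."""
    dp = [0] + [float("inf")] * amount
    for amt in range(1, amount + 1):
        for c in coins:
            if c <= amt:
                dp[amt] = min(dp[amt], dp[amt - c] + 1)
    return -1 if dp[amount] == float("inf") else dp[amount]
```

Same subproblem, same recurrence; only the evaluation order differs. The tabulated version also makes the space cost visible: one table entry per subproblem.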
Part 1: Build a spell checker using edit distance (Levenshtein) to suggest corrections. Part 2: Implement the Needleman-Wunsch algorithm for DNA sequence alignment. Both are DP. Critical question: whose language norms do the spell checker's "correct" spellings encode? This matters for correctness in medical, legal, and multilingual contexts.
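The core of Part 1 is the Levenshtein recurrence; a minimal table-filling sketch (function name assumed):

```python
def edit_distance(a, b):
    """Levenshtein distance: d[i][j] = fewest edits turning a[:i] into b[:j]."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                    # base case: delete all i characters of a
    for j in range(n + 1):
        d[0][j] = j                    # base case: insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # substitute, or free match
    return d[m][n]
```

Needleman-Wunsch in Part 2 fills the same kind of table; it swaps the min for a max over alignment scores with a gap penalty.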
We build from the ground up — arrays and memory first, then structure, then abstraction. Every unit culminates in a mini project exhibition. The final four weeks belong to your capstone: a substantial system built with your choice of structures.