What this is and isn't. The instance below is synthetic and small enough that brute force finishes instantly, so you can see greedy side-by-side with the true optimum. In P4 the real question is whether MVC-distance — the number of "redundant" modules in an actual curriculum beyond the greedy solution on an instructor-annotated objectives graph — predicts student persistence and grade outcomes. This tool exists to make the algorithmic setup concrete; it does not claim the metric is validated.

Universe — 10 learning objectives


Candidate modules — each is a subset of objectives

    The algorithm, in three sentences

    While any objective is still uncovered, pick the module that covers the largest number of currently uncovered objectives, add it to the solution, and repeat. Ties are broken by lexicographic order on module name. That's it: the whole loop is a handful of lines in any reasonable language.
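
    The loop described above can be sketched in Python. This is a minimal illustration; the function name, the input shapes (universe as a set, modules as a dict of name to set), and the error handling are choices of this sketch, not the visualizer's actual code.

```python
def greedy_cover(universe, modules):
    # universe: set of objectives; modules: dict name -> set of objectives
    uncovered = set(universe)
    solution = []
    while uncovered:
        # Most new coverage wins; ties fall to the lexicographically
        # smaller module name, as the text specifies.
        name = min(modules, key=lambda m: (-len(modules[m] & uncovered), m))
        if not modules[name] & uncovered:
            raise ValueError("remaining objectives cannot be covered")
        solution.append(name)
        uncovered -= modules[name]
    return solution

# Hypothetical instance: "A" is picked first (3 new objectives),
# then "C" (2 new) beats "B" (1 new).
print(greedy_cover({1, 2, 3, 4, 5},
                   {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}))
# prints ['A', 'C']
```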

    Why this is an approximation

    Set Cover is NP-hard (its decision version is NP-complete; the canonical reduction is from Vertex Cover, see Karp 1972), and it is hard to approximate to within (1 − ε) ln n for any ε > 0 unless P = NP (Feige 1998, with the assumption tightened to P ≠ NP by Dinur & Steurer 2014). The greedy algorithm achieves an approximation ratio of H(n) ≤ ln n + 1, where n is the size of the universe, so up to lower-order terms no polynomial-time algorithm can do better; on many small or structured instances greedy also finds the true optimum, though nothing guarantees that. That near-tight bound is the reason greedy is the right baseline for MVC: any improvement over greedy, on a real curriculum, is strong evidence that the structure being exploited is pedagogically meaningful rather than an artifact of the algorithm.
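
    To see concretely that the guarantee is not "greedy finds the optimum", here is a tiny hand-built instance (hypothetical, not the one in the visualizer) where greedy returns three modules but exhaustive search finds a two-module cover:

```python
from itertools import combinations

# Optimum is {A, B}, but greedy grabs C first because it covers
# 4 objectives at once, then needs both A and B to mop up.
universe = {1, 2, 3, 4, 5, 6}
modules = {"A": {1, 3, 5}, "B": {2, 4, 6}, "C": {1, 2, 3, 4}}

def greedy(universe, modules):
    uncovered, picked = set(universe), []
    while uncovered:
        name = min(modules, key=lambda m: (-len(modules[m] & uncovered), m))
        picked.append(name)
        uncovered -= modules[name]
    return picked

def brute_force(universe, modules):
    # Try every subset of modules, smallest first; the first full
    # cover found is therefore optimal.
    for k in range(1, len(modules) + 1):
        for combo in combinations(sorted(modules), k):
            if set().union(*(modules[m] for m in combo)) >= universe:
                return list(combo)

print(greedy(universe, modules))       # prints ['C', 'A', 'B']
print(brute_force(universe, modules))  # prints ['A', 'B']
```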

    What MVC-distance would actually measure

    In P4 the research question is: given a real community-college CS curriculum annotated with learning objectives, how many modules in the current offering are not in the greedy-MVC solution? That gap is the curriculum's "redundant load" — content that, under the graph-coverage interpretation, could be removed without losing any learning objective. The hypothesis is that curricula with higher MVC-distance also show higher DFW rates, longer time-to-completion, and lower belonging scores, because redundant load falls heaviest on students with the least slack. The hypothesis could be wrong. That's what the project would measure.
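
    Under the graph-coverage interpretation described above, that gap could be operationalized as follows. All module names and objective labels here are hypothetical placeholders for instructor annotations, and treating "redundant" as "outside the greedy solution" is one possible choice, not a validated definition.

```python
def greedy_cover(objectives, modules):
    # modules: dict module_name -> set of objective ids it covers
    uncovered, keep = set(objectives), []
    while uncovered:
        name = min(modules, key=lambda m: (-len(modules[m] & uncovered), m))
        if not modules[name] & uncovered:
            raise ValueError("remaining objectives cannot be covered")
        keep.append(name)
        uncovered -= modules[name]
    return keep

def mvc_distance(objectives, offering):
    """Modules in the current offering that fall outside the greedy cover."""
    keep = set(greedy_cover(objectives, offering))
    return [m for m in sorted(offering) if m not in keep]

# Hypothetical annotated mini-curriculum: "loops2" adds no objective
# that "intro" does not already cover.
offering = {
    "intro":   {"vars", "loops"},
    "loops2":  {"loops"},
    "structs": {"records", "memory"},
}
print(mvc_distance({"vars", "loops", "records", "memory"}, offering))
# prints ['loops2']
```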

    Limits of this visualizer