What this is and isn't. The instance below is synthetic and small enough that brute force finishes instantly, so you can see greedy side-by-side with the true optimum. In P4 the real question is whether MVC-distance — the number of "redundant" modules in an actual curriculum beyond the greedy solution on an instructor-annotated objectives graph — predicts student persistence and grade outcomes. This tool exists to make the algorithmic setup concrete; it does not claim the metric is validated.

Universe — 10 learning objectives


Candidate modules — each is a subset of objectives

    The algorithm, in three sentences

    While any objective is still uncovered, pick the module that covers the largest number of currently uncovered objectives, add it to the solution, and repeat. Ties are broken by lexicographic order on module name. That's it: the whole loop is a handful of lines in any reasonable language.
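
    The loop described above can be sketched in Python. This is a minimal illustration; the function name, the input shapes (universe as a set, modules as a dict of name to set), and the error handling are choices of this sketch, not the visualizer's actual code.

```python
def greedy_cover(universe, modules):
    # universe: set of objectives; modules: dict name -> set of objectives
    uncovered = set(universe)
    solution = []
    while uncovered:
        # Most new coverage wins; ties fall to the lexicographically
        # smaller module name, as the text specifies.
        name = min(modules, key=lambda m: (-len(modules[m] & uncovered), m))
        if not modules[name] & uncovered:
            raise ValueError("remaining objectives cannot be covered")
        solution.append(name)
        uncovered -= modules[name]
    return solution

# Hypothetical instance: "A" is picked first (3 new objectives),
# then "C" (2 new) beats "B" (1 new).
print(greedy_cover({1, 2, 3, 4, 5},
                   {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}))
# prints ['A', 'C']
```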

    Why this is an approximation

    Set Cover is NP-hard (its decision version is NP-complete; the canonical reduction is from Vertex Cover, see Karp 1972), and it is hard to approximate to within (1 − ε) ln n for any ε > 0 unless P = NP (Feige 1998, with the assumption tightened to P ≠ NP by Dinur & Steurer 2014). The greedy algorithm achieves an approximation ratio of H(n) ≤ ln n + 1, where n is the size of the universe, so up to lower-order terms no polynomial-time algorithm can do better; on many small or structured instances greedy also finds the true optimum, though nothing guarantees that. That near-tight bound is the reason greedy is the right baseline for MVC: any improvement over greedy, on a real curriculum, is strong evidence that the structure being exploited is pedagogically meaningful rather than an artifact of the algorithm.
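
    To see concretely that the guarantee is not "greedy finds the optimum", here is a tiny hand-built instance (hypothetical, not the one in the visualizer) where greedy returns three modules but exhaustive search finds a two-module cover:

```python
from itertools import combinations

# Optimum is {A, B}, but greedy grabs C first because it covers
# 4 objectives at once, then needs both A and B to mop up.
universe = {1, 2, 3, 4, 5, 6}
modules = {"A": {1, 3, 5}, "B": {2, 4, 6}, "C": {1, 2, 3, 4}}

def greedy(universe, modules):
    uncovered, picked = set(universe), []
    while uncovered:
        name = min(modules, key=lambda m: (-len(modules[m] & uncovered), m))
        picked.append(name)
        uncovered -= modules[name]
    return picked

def brute_force(universe, modules):
    # Try every subset of modules, smallest first; the first full
    # cover found is therefore optimal.
    for k in range(1, len(modules) + 1):
        for combo in combinations(sorted(modules), k):
            if set().union(*(modules[m] for m in combo)) >= universe:
                return list(combo)

print(greedy(universe, modules))       # prints ['C', 'A', 'B']
print(brute_force(universe, modules))  # prints ['A', 'B']
```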

    What MVC-distance would actually measure

    In P4 the research question is: given a real community-college CS curriculum annotated with learning objectives, how many modules in the current offering are not in the greedy-MVC solution? That gap is the curriculum's "redundant load" — content that, under the graph-coverage interpretation, could be removed without losing any learning objective. The hypothesis is that curricula with higher MVC-distance also show higher DFW rates, longer time-to-completion, and lower belonging scores, because redundant load falls heaviest on students with the least slack. The hypothesis could be wrong. That's what the project would measure.
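
    Under the graph-coverage interpretation described above, that gap could be operationalized as follows. All module names and objective labels here are hypothetical placeholders for instructor annotations, and treating "redundant" as "outside the greedy solution" is one possible choice, not a validated definition.

```python
def greedy_cover(objectives, modules):
    # modules: dict module_name -> set of objective ids it covers
    uncovered, keep = set(objectives), []
    while uncovered:
        name = min(modules, key=lambda m: (-len(modules[m] & uncovered), m))
        if not modules[name] & uncovered:
            raise ValueError("remaining objectives cannot be covered")
        keep.append(name)
        uncovered -= modules[name]
    return keep

def mvc_distance(objectives, offering):
    """Modules in the current offering that fall outside the greedy cover."""
    keep = set(greedy_cover(objectives, offering))
    return [m for m in sorted(offering) if m not in keep]

# Hypothetical annotated mini-curriculum: "loops2" adds no objective
# that "intro" does not already cover.
offering = {
    "intro":   {"vars", "loops"},
    "loops2":  {"loops"},
    "structs": {"records", "memory"},
}
print(mvc_distance({"vars", "loops", "records", "memory"}, offering))
# prints ['loops2']
```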

    Limits of this visualizer