CS Education Research · McNair Scholar · 2025–2026
CS Education Researcher
I study why students leave STEM — and what computational tools can do about it. My work focuses on help-seeking behavior, persistence, and the structural features of introductory CS courses that shape whether students believe the field is for them.
Central Research Questions
"What structural features of introductory CS courses predict whether students ask for help when they need it, and how does that vary by student background?"
Research focus · 2025–2026
What observable features of introductory CS course design — assignment framing, office hours culture, peer collaboration norms — predict whether students seek help when stuck? Do these predictors differ across demographic groups, and can they be measured from LMS and discussion forum data?
Methods: Learning analytics · LMS log analysis · Qualitative coding (Seymour & Hunter taxonomy)
Can a graph-based representation of CS curriculum structure reveal which dependency patterns create avoidable confusion bottlenecks? Do community college CS curricula show structural features that research university curricula don't, and do those features predict DFW (D, F, or withdrawal) rates?
Methods: Curriculum graph analysis · Institutional data · NLP analysis of syllabi
Do the departure reasons documented in Talking About Leaving Revisited — teaching quality, weed-out culture, belonging, help-seeking suppression — replicate at community colleges? Are there CC-specific departure pathways not captured in the original taxonomy?
Methods: Semi-structured interviews · Qualitative coding · Grounded theory · IRB study
Can NLP-based analysis of course materials reliably identify structural features associated with poor student outcomes — specifically features that suppress help-seeking or create motivational debt? Can such a tool be validated to the point of being useful to instructors?
Methods: NLP · Annotation studies · Tool validation · Instructional design literature
Most CS education researchers arrive at questions about student success from the outside — from data, from literature, from the lab. I arrived at them from inside the institution, from five different roles in which I watched the same students encounter the same invisible structural barriers from completely different vantage points.
In financial aid, I saw students leave STEM because a funding gap at the wrong moment made a hard class feel unsurvivable. In counseling, I heard students describe themselves as "not a math person" — a belief constructed from curricula that never explained why anything mattered. In the Teaching and Learning Center, I watched students spend hours on content their course objectives barely required. In learning communities and affinity groups, I saw students discover for the first time that their peers were struggling too — and that this shared knowledge was what let them stay.
The research questions I'm pursuing are not abstractions. They are formalized versions of things I watched happen to real students. I do not come to this research with the most advanced technical portfolio in my cohort. What I bring is an unusually complete picture of how students actually move through institutions — and an unusually strong conviction that the problems are structural, solvable, and worth a career.
This research program takes Seymour and Hunter's landmark finding seriously: departure from STEM correlates with teaching quality and structural factors, not academic ability. The computational question is whether we can build tools that make those structural factors visible and actionable for instructors.
The goal is not a better prediction model. The goal is curriculum and institutional design that builds students' belief in themselves, their tolerance for hard problems, and their capacity for deep learning — by design, not by accident.
Five research projects in various stages of development. Status is shown honestly: what exists, what's in progress, what's proposed. Each targets a specific venue.
■ PROPOSED ■ IN PROGRESS ■ SUBMITTED
What behavioral signals in LMS and discussion forum data predict whether students seek help when stuck — and do these signals differ by demographic group? This project builds a validated feature set for help-seeking suppression from Canvas logs and Piazza/Ed discussion data across 3–5 intro CS sections at Foothill College and SJSU. The technical core is NLP classification of question types combined with time-series analysis of help-seeking latency. The output is an open-source instructor dashboard.
Research Question
Which LMS behavioral features predict help-seeking suppression in CS1, and does the feature importance differ across first-generation, underrepresented minority (URM), and continuing-generation students?
Status
Proposed. IRB protocol in preparation. Data access conversation with Foothill College STEM faculty underway.
Can automated analysis of syllabi and assignment descriptions reliably identify structural features associated with poor student outcomes? This project builds a corpus of 100+ CS1/CS2 syllabi, develops an annotation schema grounded in Harel's necessity principle, and fine-tunes a lightweight NLP classifier on expert annotations. The "Pedagogical Debt" idea on this site is the theoretical motivation; this project is the empirical grounding. Inter-rater reliability will be measured on real annotated data, not simulated data.
Research Question
Can automated analysis of CS course materials identify motivational structure features that predict student help-seeking rates, and can this analysis be validated against instructor expert judgment (Cohen's κ ≥ 0.65)?
Status
In progress. Corpus construction underway (47 syllabi collected). Annotation schema drafted. Expert annotator recruitment planned for Spring 2026.
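The κ ≥ 0.65 target in the research question above can be checked with a few lines of Python. This is a minimal sketch, not the project's pipeline: the function name `cohens_kappa` and the yes/no labels are hypothetical, standing in for two annotators coding the same ten syllabus passages.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: does the passage motivate a tool before introducing it?
annotator_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
annotator_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}, meets 0.65 target: {kappa >= 0.65}")
# prints: kappa = 0.60, meets 0.65 target: False
```

In this made-up example the annotators agree on 8 of 10 passages, yet κ is only 0.60 because half of that agreement is expected by chance. That gap between raw agreement and κ is the reason the target is stated in κ rather than percent agreement.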
Seymour and Hunter documented that students who leave STEM are not academically weaker than those who stay. They leave for structural reasons: teaching quality, weed-out culture, help-seeking suppression, belonging. Their landmark studies were conducted at research universities. This project asks whether those findings replicate at community colleges, which serve a demographically different population and have a different institutional structure. The design is 20–30 semi-structured interviews, qualitative coding against the S&H taxonomy, and open coding for CC-specific themes.
Research Question
Do the departure reasons documented by Seymour & Hunter at research universities replicate at community colleges, and are there CC-specific structural departure pathways not captured in the original taxonomy?
Why This First
This is the empirical foundation the other projects need. Before building computational tools to address structural problems, we need to confirm which structural problems actually exist in this context. This is also the most IRB-feasible first study.
A graph-based representation of CS curriculum structure, built as an annotation tool for instructors. The theoretical motivation is the Typed Dependency Grammar (TDG) framework developed in this research agenda, but this project approaches it empirically, not formally: instructors annotate their curriculum, the tool computes structural statistics (longest path, fan-in/fan-out, cycle detection), and preliminary correlations with DFW rates are examined.
Research Question
Can a graph-based representation of CS curriculum dependency structure reveal bottleneck patterns that predict student confusion and DFW rates, and do these patterns vary between community colleges and research universities?
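The structural statistics named in the project description (longest path, fan-in/fan-out, cycle detection) can be sketched in plain Python. This is an illustration under assumptions, not the annotation tool itself: the function name `curriculum_stats`, the untyped `(prerequisite, dependent)` edge format, and the CS1 objectives are all hypothetical.

```python
from collections import defaultdict, deque


def curriculum_stats(edges):
    """edges: list of (prerequisite, dependent) pairs between objectives."""
    successors = defaultdict(list)
    fan_in = defaultdict(int)
    nodes = set()
    for pre, dep in edges:
        successors[pre].append(dep)
        fan_in[dep] += 1
        nodes.update((pre, dep))

    # Kahn's algorithm: repeatedly remove nodes with no unmet prerequisites.
    remaining = {n: fan_in[n] for n in nodes}
    queue = deque(sorted(n for n in nodes if remaining[n] == 0))
    order = []
    depth = {n: 0 for n in nodes}  # longest chain of edges ending at each node
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            depth[nxt] = max(depth[nxt], depth[node] + 1)
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                queue.append(nxt)

    # Nodes that never reach zero unmet prerequisites sit on or behind a cycle.
    has_cycle = len(order) < len(nodes)
    return {
        "fan_in": {n: fan_in[n] for n in sorted(nodes)},
        "fan_out": {n: len(successors[n]) for n in sorted(nodes)},
        "has_cycle": has_cycle,
        "longest_path": None if has_cycle else max(depth.values(), default=0),
    }


# Hypothetical CS1 dependency annotations.
edges = [
    ("variables", "conditionals"),
    ("variables", "loops"),
    ("conditionals", "loops"),
    ("loops", "lists"),
    ("functions", "recursion"),
    ("conditionals", "recursion"),
]
stats = curriculum_stats(edges)
print(stats["longest_path"], stats["has_cycle"], stats["fan_in"]["loops"])
# prints: 3 False 2
```

Longest path is reported as `None` when a cycle is present, since a circular dependency has no well-defined longest chain. A cycle in an instructor's annotation is itself a finding worth showing them.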
Which observable features of introductory CS course design predict students' sense of belonging — and can those features be reliably coded from course materials alone? This project develops a coding scheme grounded in Walton & Brady's belonging literature and Margolis & Fisher's Unlocking the Clubhouse, applies it to a corpus of CS1 materials, and validates coding against belonging survey data from an IRB-approved prospective study.
Research Question
Which codeable features of CS1 course design predict student belonging scores (Walton 3-item scale), and can a reliable coding instrument be developed that instructors can use to audit their own materials?
Three prototype tools that operationalize ideas from the research agenda above. These are proof-of-concept implementations — the theoretical frameworks they demonstrate need empirical validation against real data. They are offered here as interactive illustrations, not as validated instruments.
A live Typed Dependency Grammar for an introductory linear algebra curriculum. Each node is a learning objective; each typed edge encodes one of four dependency relationships. This is a synthetic example — the empirical question in P4 is whether real instructor-annotated graphs show predictive patterns for student outcomes.
Drag nodes to rearrange · Click any node to inspect it · Click legend items to filter by dependency type
Paste any course description or syllabus excerpt below. This prototype scores it for motivational, scaffolding, and verification structure using heuristics grounded in Harel's necessity principle. Important: these are heuristic scores, not validated instrument outputs. P2 is the project to validate them.
Try the preset examples to see the difference between high-debt and low-debt instructional writing.
Watch a greedy approximation algorithm for the Minimum Viable Curriculum (MVC) run step by step on a synthetic curriculum graph. The key theoretical result: finding the optimal MVC is NP-hard (by polynomial reduction from Set Cover), so the visualizer runs a greedy approximation rather than an exact solver. It illustrates the algorithm on a small example. The research question in P4 is whether MVC-distance from real instructor-annotated graphs predicts student outcomes.
Step Forward to advance the algorithm one step at a time
Press "Step Forward" to begin the Minimum Viable Curriculum greedy algorithm. The red nodes are terminal objectives — what students must be able to do by the end.
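For readers without the interactive visualizer, the greedy idea can be sketched as the classic greedy heuristic for Set Cover. The text above states only that the Minimum Viable Curriculum problem reduces from Set Cover. The encoding below is an assumption made for illustration, with each module supporting a set of terminal objectives, and may differ from the visualizer's graph-based formulation. The name `greedy_cover` and the linear algebra modules are hypothetical.

```python
def greedy_cover(terminal_objectives, modules):
    """Pick modules greedily until every terminal objective is covered.

    modules: dict mapping a module name to the set of objectives it supports.
    """
    uncovered = set(terminal_objectives)
    chosen = []
    while uncovered:
        # Choose the module covering the most still-uncovered objectives;
        # iterating in sorted order breaks ties alphabetically.
        best = max(
            sorted(modules),
            key=lambda m: len(modules[m] & uncovered),
            default=None,
        )
        gained = modules[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError(f"no module covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical synthetic data, in the spirit of the linear algebra demo.
terminal = {"solve_linear_system", "least_squares_fit", "interpret_eigenvectors"}
modules = {
    "gaussian_elimination": {"solve_linear_system"},
    "matrix_factorizations": {"solve_linear_system", "least_squares_fit"},
    "projections": {"least_squares_fit"},
    "eigen_decomposition": {"interpret_eigenvectors"},
}
print(greedy_cover(terminal, modules))
# prints: ['matrix_factorizations', 'eigen_decomposition']
```

Greedy selection does not always find the smallest cover, but for Set Cover it is the standard approximation, with a logarithmic bound on how far it can be from optimal. That trade-off is why a greedy method is a reasonable choice when the exact problem is intractable.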
Three project designs for an introductory CS course, grounded in Harel's necessity principle as it appears in Jeff Anderson's criteria, and built so students are authors rather than consumers. Each is designed to activate intrinsic motivation before introducing the technical tool.
Students build a searchable map of local support resources for their own campus or city. The research memo comes before the code. The necessity hook: students survey three peers about resources they couldn't find.
Necessity Hook
Students first survey 3 peers about resource gaps they've experienced.
The search problem exists before a line of code is written.
Anderson Criteria Met
Necessity · Authenticity
Active Learning
Low-floor / High-ceiling
No Paywall (all open APIs)
Students build a scheduling app, habit tracker, or question-capture system: tools that mirror the Conquering College assignments in Anderson's own class. The meta dimension is the point: students simultaneously learn to code and reflect on how they learn.
Necessity Hook
Week 1: students audit their current study system.
The gaps they find become the spec for what they build.
Connection to Research
Motivational debt ↓ when students build tools they actually need.
Self-efficacy measured before and after.
Students pick a local issue, collect or find public data, and build a visualization with a written narrative. Motivated directly by Anderson's "abuelita test": can you explain your findings to a non-technical family member?
Necessity Hook
Students write a 1-paragraph argument about their issue using only their intuition.
Then they find the data. The gap between the two is the whole course.
Abuelita Test
Final deliverable must include an explanation a non-technical family member could read.
If they can't explain it, they don't understand it.
The conceptual vocabulary underlying the research agenda. These are theoretical constructs — some have roots in existing literature, some are proposed extensions. The empirical projects above are designed to test whether these constructs are measurable and useful.
Projects are sequenced so that P3 (interviews) produces the conceptual foundation that P1 and P2 need. No project claims results it hasn't earned yet.
Submit IRB for P3 interview study. Collect 100+ syllabi for P2 corpus. Develop annotation schema. PhD program applications.
Conduct 20–30 semi-structured interviews with students who left STEM at Foothill. Code against S&H taxonomy. Begin grounded theory analysis.
Recruit expert annotators for P2. Begin NLP classifier training. In parallel, analyze LMS data for P1 help-seeking feature extraction.
Submit P3 (interview study) to ICER 2027. Submit P2 (SyllabusAudit) to Learning @ Scale 2027. Pilot P4 instructor annotation study.
Integrate P1–P5 into a coherent dissertation on structural predictors of help-seeking and STEM departure at community colleges. Advisor conversations underway.
I am a first-generation college student, McNair Scholar at San Jose State University, and Research Lead at the Foothill College Science Learning Institute. I am applying to PhD programs in CS education, learning sciences, and human-centered computing for Fall 2026.
Before this research, I spent years working in student services at Foothill College — financial aid, academic counseling, tutoring, learning communities, affinity groups — watching the same students encounter the same invisible structural barriers from every angle the institution offers. That experience is the origin of this work, not a detour from it.
My intellectual touchstones: Seymour & Hunter's Talking About Leaving Revisited, Jeff Anderson's applied linear algebra curriculum, and the SIGCSE community's sustained attention to who CS education is actually designed for. I am interested in research groups working at the intersection of learning analytics, qualitative CS education research, and tool-building for educational equity.
This work is dedicated to Jeff Anderson, whose Applied Linear Algebra Fundamentals textbook and twelve modeling criteria are evidence that the problem is solvable — and that solving it is worth a career.
// Research repository
git clone https://github.com/fansofhenry/cs-ed-research