CS Education Research · McNair Scholar · 2025–2026

Henry Fan

CS Education Researcher

I study why students leave STEM — and what computational tools can do about it. My work focuses on help-seeking behavior, persistence, and the structural features of introductory CS courses that shape whether students believe the field is for them.

SJSU McNair Scholar · Foothill College Science Learning Institute, Research Lead · Applying to PhD programs, Fall 2026

Central Research Questions

"What structural features of introductory CS courses predict whether students ask for help when they need it — and how does that vary by student background?"
Research focus · 2025–2026
Intellectual Influences
Seymour & Hunter, Talking About Leaving Revisited
Learning analytics · Educational data mining
Human-centered computing · SIGCSE · ICER
Harel's necessity principle · Jeff Anderson
01
Research Agenda

Four Questions. One Thread.

My research sits at the intersection of CS education, learning analytics, and human-centered computing. The connecting thread is structural equity: the idea that students don't leave STEM because they can't do the work, but because the structures around them were not designed with them in mind.

I'm building toward a research program that combines qualitative methods (interview studies grounded in Seymour & Hunter), quantitative learning analytics, and computational tool-building. These are early-stage questions — they need data, methods, and collaboration to answer well.

Target venues: SIGCSE · ICER · EDM · LAK · Learning @ Scale

Q1

Help-Seeking and Course Structure

What observable features of introductory CS course design — assignment framing, office hours culture, peer collaboration norms — predict whether students seek help when stuck? Do these predictors differ across demographic groups, and can they be measured from LMS and discussion forum data?

Methods: Learning analytics · LMS log analysis · Qualitative coding (Seymour & Hunter taxonomy)

Q2

Curriculum Structure and Student Departure

Can a graph-based representation of CS curriculum structure reveal which dependency patterns create avoidable confusion bottlenecks? Do community college CS curricula show structural features that research university curricula don't — and do those features predict DFW rates?

Methods: Curriculum graph analysis · Institutional data · NLP analysis of syllabi

Q3

Replicating Seymour & Hunter at Community Colleges

Do the departure reasons documented in Talking About Leaving Revisited — teaching quality, weed-out culture, belonging, help-seeking suppression — replicate at community colleges? Are there CC-specific departure pathways not captured in the original taxonomy?

Methods: Semi-structured interviews · Qualitative coding · Grounded theory · IRB study

Q4

Computational Tools for Instructional Auditing

Can NLP-based analysis of course materials reliably identify structural features associated with poor student outcomes — specifically features that suppress help-seeking or create motivational debt? Can such a tool be validated to the point of being useful to instructors?

Methods: NLP · Annotation studies · Tool validation · Instructional design literature

02
Where These Questions Come From

The Institution Seen from Every Angle

"The research isn't despite the student services work. The research is the student services work, made formal."

Most CS education researchers arrive at questions about student success from the outside — from data, from literature, from the lab. I arrived at them from inside the institution, from five different roles in which I watched the same students encounter the same invisible structural barriers from completely different vantage points.

In financial aid, I saw students leave STEM because a funding gap at the wrong moment made a hard class feel unsurvivable. In counseling, I heard students describe themselves as "not a math person" — a belief constructed from curricula that never explained why anything mattered. In the Teaching and Learning Center, I watched students spend hours on content their course objectives barely required. In learning communities and affinity groups, I saw students discover for the first time that their peers were struggling too — and that this shared knowledge was what let them stay.

The research questions I'm pursuing are not abstractions. They are formalized versions of things I watched happen to real students. I do not come to this research with the most advanced technical portfolio in my cohort. What I bring is an unusually complete picture of how students actually move through institutions — and an unusually strong conviction that the problems are structural, solvable, and worth a career.

Financial Aid
Watched students withdraw from STEM not for lack of ability, but because financial precarity made any obstacle feel final. Learned that structural interventions consistently outperform individual counseling.
Academic Counseling
Heard students articulate fixed beliefs about their own mathematical ability — beliefs formed from curricula that never established why the material mattered. Introduced to Harel's necessity principle not from theory but from watching what happened when it was violated.
Teaching & Learning Center
Tutored students navigating content their course objectives barely required. Began asking: how much of what students are struggling with is actually necessary for where they are trying to go? This question shapes Q2 directly.
Learning Communities
Saw peer context transform students' relationship to difficulty. A student who believes they are uniquely confused is fragile. One who learns confusion is shared and structural becomes resilient. This is the mechanism behind help-seeking suppression that Q1 aims to measure.
Affinity Groups
Worked with students told in both explicit and subtle ways that STEM was not for them. Saw how belonging-based interventions changed persistence more reliably than content-based ones. This grounds the Seymour & Hunter replication in Q3.
"The students we lose are not students who couldn't do the work. They are students for whom the work was never designed."

This research program takes Seymour and Hunter's landmark finding seriously: departure from STEM correlates with teaching quality and structural factors, not academic ability. The computational question is whether we can build tools that make those structural factors visible and actionable for instructors.

The goal is not a better prediction model. The goal is curriculum and institutional design that builds students' belief in themselves, their tolerance for hard problems, and their capacity for deep learning — by design, not by accident.

Help-seeking · Structural equity · STEM persistence · Learning analytics · Open science
03
Flagship Projects · 2025–2027

Five research projects in various stages of development. Status is shown honestly: what exists, what's in progress, what's proposed. Each targets a specific venue.

■ PROPOSED ■ IN PROGRESS ■ SUBMITTED

P1
Learning Analytics · NLP · LMS Data
PROPOSED · TARGET SIGCSE 2027

HelpMap: Detecting Help-Seeking Suppression in Introductory CS

What behavioral signals in LMS and discussion forum data predict whether students seek help when stuck — and do these signals differ by demographic group? This project builds a validated feature set for help-seeking suppression from Canvas logs and Piazza/Ed discussion data across 3–5 intro CS sections at Foothill College and SJSU. The technical core is NLP classification of question types combined with time-series analysis of help-seeking latency. The output is an open-source instructor dashboard.

▶  Research design details

Research Question

Which LMS behavioral features predict help-seeking suppression in CS1, and does the feature importance differ across first-generation, URM, and continuing-generation students?

Status

Proposed. IRB protocol in preparation. Data access conversation with Foothill College STEM faculty underway.

Methods

  • Retrospective LMS log analysis (Canvas)
  • NLP classification of Piazza/Ed posts by question type
  • Logistic regression + gradient boosting
  • Qualitative coding of stratified sample (Seymour & Hunter taxonomy)
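As an illustration of the help-seeking latency idea, here is a minimal pandas sketch on synthetic event logs. The schema (`user_id`, `event`, `ts`) and the event names are invented stand-ins for illustration, not a real Canvas or Piazza export format.

```python
# Illustrative sketch: help-seeking latency from synthetic LMS-style
# event logs. Column and event names are hypothetical, not a real
# Canvas export schema.
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "event":   ["error", "forum_post", "error", "forum_post", "error"],
    "ts": pd.to_datetime([
        "2026-01-10 10:00", "2026-01-10 13:30",  # student 1 asks after 3.5 h
        "2026-01-10 09:00", "2026-01-12 09:00",  # student 2 waits 2 days
        "2026-01-11 08:00",                      # student 3 never asks
    ]),
})

# For each student: time from first observed struggle signal ("error")
# to first help request ("forum_post"). Index alignment leaves NaT for
# students who never posted — the suppression-candidate label.
first_stuck = events[events.event == "error"].groupby("user_id").ts.min()
first_ask = events[events.event == "forum_post"].groupby("user_id").ts.min()
latency = (first_ask - first_stuck).rename("help_seeking_latency")

print(latency)
```

Students with no forum post surface as missing values rather than zeros, which is exactly the distinction a suppression feature set needs to preserve.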

Datasets

  • Foothill College CS course logs (IRB required)
  • Public PSLC DataShop CS datasets
  • Publicly available CS1 discussion forums

Technical Stack

  • Python: pandas, scikit-learn, HuggingFace
  • D3.js instructor dashboard
  • R for statistical analysis

Expected Contribution

  • Validated feature set for help-seeking suppression
  • Open-source dashboard for instructors
  • Extension of Seymour & Hunter to behavioral data
P2
NLP · Static Analysis · Instructional Design
IN PROGRESS · TARGET LEARNING @ SCALE 2027

SyllabusAudit: NLP-Based Structural Analysis of CS Course Materials

Can automated analysis of syllabi and assignment descriptions reliably identify structural features associated with poor student outcomes? This project builds a corpus of 100+ CS1/CS2 syllabi, develops an annotation schema grounded in Harel's necessity principle, and fine-tunes a lightweight NLP classifier on expert annotations. The "Pedagogical Debt" idea on this site is the theoretical motivation — this project is the empirical grounding. Inter-rater reliability is measured on real annotated data, not simulated.

▶  Research design details

Research Question

Can automated analysis of CS course materials identify motivational structure features that predict student help-seeking rates, and can this analysis be validated against instructor expert judgment (Cohen's κ ≥ 0.65)?

Current Status

In progress. Corpus construction underway (47 syllabi collected). Annotation schema drafted. Expert annotator recruitment planned for Spring 2026.

Methods

  • Corpus: 100+ CS1/CS2 syllabi (public + IRB-approved)
  • Annotation schema based on necessity principle
  • Fine-tune BERT-class model on annotations
  • Validate against student outcome data
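The κ ≥ 0.65 validation target above reduces to a standard Cohen's kappa computation over paired annotator labels. A minimal sketch with invented binary labels (a hypothetical "motivational debt present?" judgment, not real annotation data):

```python
# Illustrative sketch of the inter-rater reliability check behind the
# kappa >= 0.65 target. Labels are invented, not real annotations.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
annotator_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa = {kappa:.2f}")

# P2's acceptance criterion for the annotation schema:
meets_threshold = kappa >= 0.65
```

Here observed agreement is 9/10 and chance agreement is 0.5, giving κ = 0.8; the same call generalizes to the full multi-label schema.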

Annotation Scheme

  • Motivational debt: formalism before need
  • Scaffolding regularity: dependency load per unit
  • Verification opportunities: self-check affordances
  • Belonging signals: framing and example selection

Technical Stack

  • HuggingFace Transformers
  • Label Studio for annotation
  • Python NLP pipeline
  • R for inter-rater reliability

Open Science Plan

  • Annotation schema released as open standard
  • Annotated corpus released (IRB-permitting)
  • Model weights open-source
  • Web tool for instructor upload
P3
Qualitative Research · Interview Study · STEM Departure
PROPOSED · TARGET ICER 2027

Why They Left: A Seymour & Hunter Replication at Community Colleges

Seymour and Hunter documented that students who leave STEM are not academically weaker than those who stay — they leave for structural reasons: teaching quality, weed-out culture, help-seeking suppression, belonging. Their landmark studies were conducted at research universities. This project asks whether those findings replicate at community colleges, which serve a demographically different population and a different institutional structure. 20–30 semi-structured interviews, qualitative coding against the S&H taxonomy, open coding for CC-specific themes.

▶  Research design details

Research Question

Do the departure reasons documented by Seymour & Hunter at research universities replicate at community colleges, and are there CC-specific structural departure pathways not captured in the original taxonomy?

Why This First

This is the empirical foundation the other projects need. Before building computational tools to address structural problems, we need to confirm which structural problems actually exist in this context. This is also the most IRB-feasible first study.

Methods

  • 20–30 semi-structured interviews
  • Participants: students who left STEM at Foothill within 2 years
  • Coding against S&H taxonomy (deductive)
  • Open coding for emergent themes (inductive)
  • Grounded theory analysis + member-checking

Key Literature

  • Seymour & Hunter (1997, 2019)
  • Margolis & Fisher, Unlocking the Clubhouse
  • Walton & Brady on belonging
  • SEISMIC Consortium data
  • PERTS Mindset Meter

IRB Requirements

  • Full IRB protocol (not exempt)
  • Informed consent for recording
  • Participant confidentiality procedures
  • Deidentification before analysis

Expected Contribution

  • CC-specific extension of S&H taxonomy
  • Open coding manual for replication
  • Empirical grounding for P1 feature selection
  • Public dataset (deidentified transcripts, IRB-permitting)
P4
Graph Theory · Curriculum Design · Tool Development
PROPOSED · TARGET SIGCSE 2027

CurriculumGraph: Mapping and Analyzing CS Dependency Structures

A graph-based representation of CS curriculum structure, built as an annotation tool for instructors. The theoretical motivation is the Typed Dependency Grammar framework developed in this research agenda — but this project approaches it empirically, not formally. Instructors annotate their curriculum; the tool computes structural statistics (longest path, fan-in/fan-out, cycle detection); preliminary correlations with DFW rates are examined. This is the empirical version of the TDG idea.

▶  Research design details

Research Question

Can a graph-based representation of CS curriculum dependency structure reveal bottleneck patterns that predict student confusion and DFW rates, and do these patterns vary between community colleges and research universities?

Methods

  • Annotation tool: React-based graph builder
  • Pilot with 5–10 instructors at Foothill and SJSU
  • Graph structural analysis (NetworkX)
  • Correlation with grade distribution and DFW rates
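The structural statistics named above can be sketched in a few lines of NetworkX; the toy dependency graph below is invented for illustration, not an annotated curriculum.

```python
# Illustrative sketch of the structural statistics P4 would compute on
# an instructor-annotated curriculum graph. The toy edges are invented.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("variables", "loops"),
    ("variables", "functions"),
    ("loops", "arrays"),
    ("functions", "arrays"),
    ("arrays", "sorting"),
])

assert nx.is_directed_acyclic_graph(G)       # cycle detection
longest = nx.dag_longest_path_length(G)      # depth of the curriculum (edges)
fan_in = max(d for _, d in G.in_degree())    # bottleneck candidates
fan_out = max(d for _, d in G.out_degree())  # branching load

print(f"longest path: {longest}, max fan-in: {fan_in}, max fan-out: {fan_out}")
```

High fan-in nodes ("arrays" here) are the bottleneck candidates the project would correlate with DFW rates.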

Technical Stack

  • React annotation interface
  • Python NetworkX for analysis
  • D3.js for visualization
  • Statistical analysis in R
P5
Belonging · Course Design · Survey Methods
PROPOSED · TARGET ICER 2027

BelongingSignals: Coding Course Materials for Structural Belonging Features

Which observable features of introductory CS course design predict students' sense of belonging — and can those features be reliably coded from course materials alone? This project develops a coding scheme grounded in Walton & Brady's belonging literature and Margolis & Fisher's Unlocking the Clubhouse, applies it to a corpus of CS1 materials, and validates coding against belonging survey data from an IRB-approved prospective study.

▶  Research design details

Research Question

Which codeable features of CS1 course design predict student belonging scores (Walton 3-item scale), and can a reliable coding instrument be developed that instructors can use to audit their own materials?

Methods

  • Coding scheme development (Walton + Margolis & Fisher)
  • Corpus: CS1 materials from willing instructors
  • Prospective belonging survey (IRB-approved)
  • Validate coding against survey outcomes

Expected Output

  • Open coding instrument
  • Validated belonging feature set
  • Actionable instructor guidance
  • Connection to P1 (help-seeking) and P3 (departure)
04
Prototype Tools · Under Development

Three prototype tools that operationalize ideas from the research agenda above. These are proof-of-concept implementations — the theoretical frameworks they demonstrate need empirical validation against real data. They are offered here as interactive illustrations, not as validated instruments.

Drag nodes to rearrange · Click to inspect · All data is synthetic

04a
Curriculum Dependency Visualizer · Prototype for P4

A live Typed Dependency Grammar for an introductory linear algebra curriculum. Each node is a learning objective; each typed edge encodes one of four dependency relationships. This is a synthetic example — the empirical question in P4 is whether real instructor-annotated graphs show predictive patterns for student outcomes.

Drag nodes to rearrange · Click any node to inspect it · Click legend items to filter by dependency type

Dependency Types
Conceptual
Understanding A is logically necessary to understand B
Procedural
Fluency with A is needed to perform tasks in B
Motivational
Encountering A creates intellectual need for B
Social
Collaboration around A creates context for B

04b
Pedagogical Debt Analyzer · Prototype for P2

Paste any course description or syllabus excerpt below. This prototype scores it for motivational, scaffolding, and verification structure using heuristics grounded in Harel's necessity principle. Important: these are heuristic scores, not validated instrument outputs. P2 is the project to validate them.

Try the preset examples to see the difference between high-debt and low-debt instructional writing.
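To make "heuristic scores" concrete, here is a toy keyword-counting sketch in the same spirit. The cue lists are invented stand-ins, not the prototype's actual rules, and real scoring would need far more than keyword matching.

```python
# Toy illustration of heuristic scoring of course-material text.
# The keyword lists are invented, not the prototype's real heuristics.
import re

MOTIVATION_CUES = ["why", "problem", "need", "question", "real-world"]
VERIFICATION_CUES = ["check", "test", "verify", "self-assess"]

def heuristic_scores(text: str) -> dict:
    """Score text by the fraction of words matching each cue list."""
    words = re.findall(r"[a-z\-]+", text.lower())
    n = max(len(words), 1)
    return {
        "motivational": sum(w in MOTIVATION_CUES for w in words) / n,
        "verification": sum(w in VERIFICATION_CUES for w in words) / n,
    }

high_debt = "Chapter 3 covers matrix operations. Memorize the formulas."
low_debt = ("We start from a real-world problem and ask why the formulas "
            "work, then check our answers.")
print(heuristic_scores(high_debt), heuristic_scores(low_debt))
```

The point of the contrast: formalism-first prose scores zero on motivational cues, which is the signal P2 would try to validate against expert annotation.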

04c
Minimum Viable Curriculum Visualizer · Prototype for P4

Watch a greedy MVC approximation algorithm run step by step on a synthetic curriculum graph. The key theoretical result: finding an optimal MVC is NP-hard (by a polynomial reduction from Set Cover). This visualizer illustrates the algorithm on a small example. The research question in P4 is whether MVC-distance from real instructor-annotated graphs predicts student outcomes.
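The greedy step the visualizer animates can be sketched as a Set-Cover-style loop. The curriculum below and its coverage sets are synthetic; `coverage` maps each candidate objective to the terminal objectives it (transitively) supports.

```python
# Minimal greedy Set-Cover-style sketch of the MVC approximation.
# The curriculum objectives and coverage sets below are synthetic.

def greedy_mvc(coverage: dict, terminals: set) -> list:
    """Pick objectives until every terminal objective is covered."""
    uncovered = set(terminals)
    chosen = []
    while uncovered:
        # Greedy step: take the objective covering the most uncovered terminals.
        best = max(coverage, key=lambda o: len(coverage[o] & uncovered))
        if not coverage[best] & uncovered:
            break  # remaining terminals are unreachable
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

coverage = {
    "intro":     {"t1"},
    "recursion": {"t1", "t2"},
    "graphs":    {"t3"},
}
print(greedy_mvc(coverage, {"t1", "t2", "t3"}))  # picks recursion, then graphs
```

This is the standard greedy Set Cover heuristic, which is where the ln|T| approximation guarantee comes from.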

Step Forward to advance the algorithm one step at a time


Algorithm State

Press "Step Forward" to begin the Minimum Viable Curriculum greedy algorithm. The red nodes are terminal objectives — what students must be able to do by the end.

Theorem (P4 proposal): Deciding whether an MVC of a given size exists is NP-complete in general.
Reduction: Set Cover → MVC.

The greedy algorithm achieves a ln|T|-approximation ratio, matching the inapproximability lower bound inherited from Set Cover.

The empirical question: does MVC-distance predict DFW rates in real curricula? That's what P4 aims to test.
05
Applied Curriculum · Community College CS

Three project designs for an introductory CS course — grounded in Harel's necessity principle as enacted in Jeff Anderson's curriculum, built so students are authors rather than consumers. Each is designed to activate intrinsic motivation before introducing the technical tool.

Proj 01
HTML · CSS · JS · Community · Intro CS · 4–6 weeks

Community Resource Aggregator

Students build a searchable map of local support resources for their own campus or city. The research memo comes before the code. The necessity hook: students survey three peers about resources they couldn't find.

▶  Expand project details

Necessity Hook

Students first survey 3 peers about resource gaps they've experienced.

The search problem exists before a line of code is written.

Anderson Criteria Met

Necessity · Authenticity
Active Learning
Low-floor / High-ceiling
No Paywall (all open APIs)

Technical Scope

  • HTML forms and DOM manipulation
  • Fetch API for reading a Google Sheet as JSON
  • Leaflet.js or Google Maps embed for map display
  • Basic search/filter logic in vanilla JS

The Deep Learning Layer

  • Students write a 1-page research memo before coding
  • Design decisions are documented and defended
  • Peer testing session mid-project with real users
  • Final reflection: what did the data reveal vs. what you assumed?
Proj 02
JavaScript · LocalStorage · Meta-Learning · Intro CS · 3–5 weeks

Personal Learning System Builder

Students build a scheduling app, habit tracker, or question-capture system — tools that mirror the conquering college assignments in Anderson's own class. The meta dimension is the point: students simultaneously learn to code and reflect on how they learn.

▶  Expand project details

Necessity Hook

Week 1: students audit their current study system.

The gaps they find become the spec for what they build.

Connection to Research

Motivational debt ↓ when students build tools they actually need.

Self-efficacy measured before and after.

Technical Scope

  • HTML/CSS layout and responsive design
  • JS event handling and DOM updates
  • LocalStorage for data persistence
  • Optional: export to JSON or Google Sheets sync

The Deep Learning Layer

  • Students read one chapter of Ultralearning before building
  • They write a feature spec: what does this tool need to do and why?
  • Mid-project: use your own tool for one week, then refactor
  • Final: present the delta between v1 and v2 to the class
Proj 03
Data · APIs · Civic Tech · D3 or Chart.js · Intro CS · 5–7 weeks

Neighborhood Data Story

Students pick a local issue, collect or find public data, and build a visualization with a written narrative. Motivated directly by Anderson's "abuelita test": can you explain your findings to a non-technical family member?

▶  Expand project details

Necessity Hook

Students write a 1-paragraph argument about their issue using only their intuition.

Then they find the data. The gap between the two is the whole course.

Abuelita Test

Final deliverable must include an explanation a non-technical family member could read.

If they can't explain it, they don't understand it.

Technical Scope

  • Fetch API with a public dataset (Census, data.gov, local open data)
  • CSV or JSON parsing
  • Chart.js or D3.js for visualization
  • HTML/CSS narrative layout wrapping the charts

The Deep Learning Layer

  • Students must justify their data source selection in writing
  • At least one visualization must be revised after peer feedback
  • Written narrative required: what does this mean for real people?
  • Presentation to the class with Q&A — defend your choices
07
Theoretical Framework · Key Concepts

The conceptual vocabulary underlying the research agenda. These are theoretical constructs — some have roots in existing literature, some are proposed extensions. The empirical projects above are designed to test whether these constructs are measurable and useful.

G
P4 Proposal
Typed Dependency Grammar
A directed graph G = (L, D) encoding four types of curriculum dependency. The theoretical language for P4's empirical curriculum graph study.
M
P4 · P2
Motivational Reachability
Every advanced objective should be reachable via a chain of motivational edges. Formalizes Harel's necessity principle in graph terms.
N
All Projects
The Necessity Principle
"If math is the medicine, what is the headache?" From Harel — students must experience intellectual need before receiving the tool that answers it.
D
P2
Pedagogical Debt
The accumulated cost to student understanding of instructional shortcuts. Named by analogy with technical debt — a theoretical construct awaiting empirical validation.
P2
Static Analysis of Materials
Examining course materials without needing student data — just as a type checker examines source code without running the program.
*
P4
Minimum Viable Curriculum
The smallest instructionally sufficient curriculum for a given terminal objective set. NP-hard to find optimally; a greedy approximation is available.
Δ
P4
MVC-Distance
How far a real curriculum is from its minimum viable form. The empirical hypothesis: higher MVC-distance predicts lower persistence. Needs testing.
→?
P1
Help-Seeking Suppression
The behavioral pattern in which students who need help do not seek it. Predicted by Seymour & Hunter to be structurally caused. P1 operationalizes this as a measurable LMS feature.
σ
P3 · P5
Structural Belonging
The degree to which course design signals that students of varying backgrounds are expected to succeed. Distinct from interpersonal belonging — it is measurable from materials alone.
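The motivational-reachability construct above can be sketched with NetworkX, under the assumption that each edge carries a `type` attribute; the toy graph, node names, and attribute name are illustrative, not part of a validated instrument.

```python
# Sketch of the motivational-reachability check. Assumes edges carry a
# "type" attribute; the toy graph and names are invented illustrations.
import networkx as nx

G = nx.DiGraph()
G.add_edge("counting", "probability", type="motivational")
G.add_edge("probability", "bayes", type="motivational")
G.add_edge("sets", "bayes", type="conceptual")

# Subgraph of motivational edges only.
M = nx.DiGraph([(u, v) for u, v, d in G.edges(data=True)
                if d["type"] == "motivational"])

roots = {"counting"}  # entry points that need no prior motivation
reachable = set().union(*(nx.descendants(M, r) | {r} for r in roots))

# Objectives with no motivational chain back to a root violate the
# reachability condition — the graph form of Harel's necessity principle.
unmotivated = set(G.nodes) - reachable
print("objectives with no motivational chain:", unmotivated)
```

In this toy example "sets" is taught without any motivational path leading to it, which is precisely the pattern the framework flags.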
08
Execution Roadmap · 2025–2027

A Realistic Schedule

Projects are sequenced so that P3 (interviews) produces the conceptual foundation that P1 and P2 need. No project claims results it hasn't earned yet.

Now — Spring 2026

IRB Protocol + Corpus Construction

Submit IRB for P3 interview study. Collect 100+ syllabi for P2 corpus. Develop annotation schema. PhD program applications.

Summer 2026

P3: Departure Interviews

Conduct 20–30 semi-structured interviews with students who left STEM at Foothill. Code against S&H taxonomy. Begin grounded theory analysis.

Fall 2026

P2: Annotation Study + P1: LMS Analysis

Recruit expert annotators for P2. Begin NLP classifier training. In parallel, analyze LMS data for P1 help-seeking feature extraction.

Spring 2027

First Submissions

Submit P3 (interview study) to ICER 2027. Submit P2 (SyllabusAudit) to Learning @ Scale 2027. Pilot P4 instructor annotation study.

PhD Program

Dissertation Research

Integrate P1–P5 into a coherent dissertation on structural predictors of help-seeking and STEM departure at community colleges. Advisor conversations underway.

About the Researcher

I am a first-generation college student, McNair Scholar at San Jose State University, and Research Lead at the Foothill College Science Learning Institute. I am applying to PhD programs in CS education, learning sciences, and human-centered computing for Fall 2026.

Before this research, I spent years working in student services at Foothill College — financial aid, academic counseling, tutoring, learning communities, affinity groups — watching the same students encounter the same invisible structural barriers from every angle the institution offers. That experience is the origin of this work, not a detour from it.

My intellectual touchstones: Seymour & Hunter's Talking About Leaving Revisited, Jeff Anderson's applied linear algebra curriculum, and the SIGCSE community's sustained attention to who CS education is actually designed for. I am interested in research groups working at the intersection of learning analytics, qualitative CS education research, and tool-building for educational equity.

This work is dedicated to Jeff Anderson, whose Applied Linear Algebra Fundamentals textbook and twelve modeling criteria are evidence that the problem is solvable — and that solving it is worth a career.

"I do not come to this research with the most advanced technical portfolio. What I bring is an unusually complete picture of how students actually move through institutions."
Institution
San Jose State University · McNair Scholar
Research Role
Foothill College Science Learning Institute · Research Lead
Research Area
CS Education · Learning Analytics · STEM Persistence · Help-Seeking
Target Venues
SIGCSE · ICER · EDM · LAK · ACM Learning @ Scale
PhD Applications
UW Paul G. Allen · Georgia Tech · Stanford HAI · UMass Amherst · UC Irvine · CU Boulder

// Research repository

git clone https://github.com/fansofhenry/cs-ed-research