CS Education Research · McNair Scholar · 2025–2026

Henry Fan

CS Education Researcher

I study why students leave STEM — and what computational tools can do about it. My work focuses on help-seeking behavior, persistence, and the structural features of introductory CS courses that shape whether students believe the field is for them.

SJSU McNair Scholar · Foothill College Science Learning Institute, Research Lead · Applying to PhD programs, Fall 2026

Central Research Questions

"What structural features of introductory CS courses predict whether students ask for help when they need it — and how does that vary by student background?"
Research focus · 2025–2026
Intellectual Influences
Seymour & Hunter, Talking About Leaving Revisited
Learning analytics · Educational data mining
Human-centered computing · SIGCSE · ICER
Harel's necessity principle · Jeff Anderson
01
Research Agenda

Four Questions. One Thread.

My research sits at the intersection of CS education, learning analytics, and human-centered computing. The connecting thread is structural equity: the idea that students don't leave STEM because they can't do the work, but because the structures around them were not designed with them in mind.

I'm building toward a research program that combines qualitative methods (interview studies grounded in Seymour & Hunter), quantitative learning analytics, and computational tool-building. These are early-stage questions — they need data, methods, and collaboration to answer well.

Target venues: SIGCSE · ICER · EDM · LAK · Learning @ Scale

Q1

Help-Seeking and Course Structure

What observable features of introductory CS course design — assignment framing, office hours culture, peer collaboration norms — predict whether students seek help when stuck? Do these predictors differ across demographic groups, and can they be measured from LMS and discussion forum data?

Methods: Learning analytics · LMS log analysis · Qualitative coding (Seymour & Hunter taxonomy)

Q2

Curriculum Structure and Student Departure

Can a graph-based representation of CS curriculum structure reveal which dependency patterns create avoidable confusion bottlenecks? Do community college CS curricula show structural features that research university curricula don't — and do those features predict DFW rates?

Methods: Curriculum graph analysis · Institutional data · NLP analysis of syllabi

Q3

Replicating Seymour & Hunter at Community Colleges

Do the departure reasons documented in Talking About Leaving Revisited — teaching quality, weed-out culture, belonging, help-seeking suppression — replicate at community colleges? Are there CC-specific departure pathways not captured in the original taxonomy?

Methods: Semi-structured interviews · Qualitative coding · Grounded theory · IRB study

Q4

Computational Tools for Instructional Auditing

Can NLP-based analysis of course materials reliably identify structural features associated with poor student outcomes — specifically features that suppress help-seeking or create motivational debt? Can such a tool be validated to the point of being useful to instructors?

Methods: NLP · Annotation studies · Tool validation · Instructional design literature

02
Where These Questions Come From

The Institution Seen from Every Angle

"The research isn't despite the student services work. The research is the student services work, made formal."

Most CS education researchers arrive at questions about student success from the outside — from data, from literature, from the lab. I arrived at them from inside the institution, from five different roles in which I watched the same students encounter the same invisible structural barriers from completely different vantage points.

In financial aid, I saw students leave STEM because a funding gap at the wrong moment made a hard class feel unsurvivable. In counseling, I heard students describe themselves as "not a math person" — a belief constructed from curricula that never explained why anything mattered. In the Teaching and Learning Center, I watched students spend hours on content their course objectives barely required. In learning communities and affinity groups, I saw students discover for the first time that their peers were struggling too — and that this shared knowledge was what let them stay.

The research questions I'm pursuing are not abstractions. They are formalized versions of things I watched happen to real students. I do not come to this research with the most advanced technical portfolio in my cohort. What I bring is an unusually complete picture of how students actually move through institutions — and an unusually strong conviction that the problems are structural, solvable, and worth a career.

Financial Aid
Watched students withdraw from STEM not for lack of ability, but because financial precarity made any obstacle feel final. Learned that structural interventions consistently outperform individual counseling.
Academic Counseling
Heard students articulate fixed beliefs about their own mathematical ability — beliefs formed from curricula that never established why the material mattered. Introduced to Harel's necessity principle not from theory but from watching what happened when it was violated.
Teaching & Learning Center
Tutored students navigating content their course objectives barely required. Began asking: how much of what students are struggling with is actually necessary for where they are trying to go? This question shapes Q2 directly.
Learning Communities
Saw peer context transform students' relationship to difficulty. A student who believes they are uniquely confused is fragile. One who learns confusion is shared and structural becomes resilient. This is the mechanism behind help-seeking suppression that Q1 aims to measure.
Affinity Groups
Worked with students told in both explicit and subtle ways that STEM was not for them. Saw how belonging-based interventions changed persistence more reliably than content-based ones. This grounds the Seymour & Hunter replication in Q3.
"The students we lose are not students who couldn't do the work. They are students for whom the work was never designed."

This research program takes Seymour and Hunter's landmark finding seriously: departure from STEM correlates with teaching quality and structural factors, not academic ability. The computational question is whether we can build tools that make those structural factors visible and actionable for instructors.

The goal is not a better prediction model. The goal is curriculum and institutional design that builds students' belief in themselves, their tolerance for hard problems, and their capacity for deep learning — by design, not by accident.

Help-seeking · Structural equity · STEM persistence · Learning analytics · Open science
03
Flagship Projects · 2025–2027

Five research projects in various stages of development. Status is shown honestly: what exists, what's in progress, what's proposed. Each targets a specific venue.

■ PROPOSED ■ IN PROGRESS ■ SUBMITTED

P1
Learning Analytics · NLP · LMS Data
PROPOSED · TARGET SIGCSE 2027

HelpMap: Detecting Help-Seeking Suppression in Introductory CS

What behavioral signals in LMS and discussion forum data predict whether students seek help when stuck — and do these signals differ by demographic group? This project builds a validated feature set for help-seeking suppression from Canvas logs and Piazza/Ed discussion data across 3–5 intro CS sections at Foothill College and SJSU. The technical core is NLP classification of question types combined with time-series analysis of help-seeking latency. The output is an open-source instructor dashboard.

▶  Research design details

Research Question

Which LMS behavioral features predict help-seeking suppression in CS1, and does the feature importance differ across first-generation, URM, and continuing-generation students?

Status

Proposed. IRB protocol in preparation. Data access conversation with Foothill College STEM faculty underway.

Methods

  • Retrospective LMS log analysis (Canvas)
  • NLP classification of Piazza/Ed posts by question type
  • Logistic regression + gradient boosting
  • Qualitative coding of stratified sample (Seymour & Hunter taxonomy)
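As an illustration of the help-seeking latency idea, here is a minimal pandas sketch on synthetic event logs. The schema (`user_id`, `event`, `ts`) and the event names are invented stand-ins for illustration, not a real Canvas or Piazza export format.

```python
# Illustrative sketch: help-seeking latency from synthetic LMS-style
# event logs. Column and event names are hypothetical, not a real
# Canvas export schema.
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "event":   ["error", "forum_post", "error", "forum_post", "error"],
    "ts": pd.to_datetime([
        "2026-01-10 10:00", "2026-01-10 13:30",  # student 1 asks after 3.5 h
        "2026-01-10 09:00", "2026-01-12 09:00",  # student 2 waits 2 days
        "2026-01-11 08:00",                      # student 3 never asks
    ]),
})

# For each student: time from first observed struggle signal ("error")
# to first help request ("forum_post"). Index alignment leaves NaT for
# students who never posted — the suppression-candidate label.
first_stuck = events[events.event == "error"].groupby("user_id").ts.min()
first_ask = events[events.event == "forum_post"].groupby("user_id").ts.min()
latency = (first_ask - first_stuck).rename("help_seeking_latency")

print(latency)
```

Students with no forum post surface as missing values rather than zeros, which is exactly the distinction a suppression feature set needs to preserve.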

Datasets

  • Foothill College CS course logs (IRB required)
  • Public PSLC DataShop CS datasets
  • Publicly available CS1 discussion forums

Technical Stack

  • Python: pandas, scikit-learn, HuggingFace
  • D3.js instructor dashboard
  • R for statistical analysis

Expected Contribution

  • Validated feature set for help-seeking suppression
  • Open-source dashboard for instructors
  • Extension of Seymour & Hunter to behavioral data
P2
NLP · Static Analysis · Instructional Design
IN PROGRESS · TARGET LEARNING @ SCALE 2027

SyllabusAudit: NLP-Based Structural Analysis of CS Course Materials

Can automated analysis of syllabi and assignment descriptions reliably identify structural features associated with poor student outcomes? This project builds a corpus of 100+ CS1/CS2 syllabi, develops an annotation schema grounded in Harel's necessity principle, and fine-tunes a lightweight NLP classifier on expert annotations. The "Pedagogical Debt" idea on this site is the theoretical motivation — this project is the empirical grounding. Inter-rater reliability is measured on real annotated data, not simulated.

▶  Research design details

Research Question

Can automated analysis of CS course materials identify motivational structure features that predict student help-seeking rates, and can this analysis be validated against instructor expert judgment (Cohen's κ ≥ 0.65)?

Current Status

In progress. Corpus construction underway (47 syllabi collected). Annotation schema drafted. Expert annotator recruitment planned for Spring 2026.

Methods

  • Corpus: 100+ CS1/CS2 syllabi (public + IRB-approved)
  • Annotation schema based on necessity principle
  • Fine-tune BERT-class model on annotations
  • Validate against student outcome data
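The κ ≥ 0.65 validation target above reduces to a standard Cohen's kappa computation over paired annotator labels. A minimal sketch with invented binary labels (a hypothetical "motivational debt present?" judgment, not real annotation data):

```python
# Illustrative sketch of the inter-rater reliability check behind the
# kappa >= 0.65 target. Labels are invented, not real annotations.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
annotator_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa = {kappa:.2f}")

# P2's acceptance criterion for the annotation schema:
meets_threshold = kappa >= 0.65
```

Here observed agreement is 9/10 and chance agreement is 0.5, giving κ = 0.8; the same call generalizes to the full multi-label schema.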

Annotation Scheme

  • Motivational debt: formalism before need
  • Scaffolding regularity: dependency load per unit
  • Verification opportunities: self-check affordances
  • Belonging signals: framing and example selection

Technical Stack

  • HuggingFace Transformers
  • Label Studio for annotation
  • Python NLP pipeline
  • R for inter-rater reliability

Open Science Plan

  • Annotation schema released as open standard
  • Annotated corpus released (IRB-permitting)
  • Model weights open-source
  • Web tool for instructor upload
P3
Qualitative Research · Interview Study · STEM Departure
PROPOSED · TARGET ICER 2027

Why They Left: A Seymour & Hunter Replication at Community Colleges

Seymour and Hunter documented that students who leave STEM are not academically weaker than those who stay — they leave for structural reasons: teaching quality, weed-out culture, help-seeking suppression, belonging. Their landmark studies were conducted at research universities. This project asks whether those findings replicate at community colleges, which serve a demographically different population and a different institutional structure. 20–30 semi-structured interviews, qualitative coding against the S&H taxonomy, open coding for CC-specific themes.

▶  Research design details

Research Question

Do the departure reasons documented by Seymour & Hunter at research universities replicate at community colleges, and are there CC-specific structural departure pathways not captured in the original taxonomy?

Why This First

This is the empirical foundation the other projects need. Before building computational tools to address structural problems, we need to confirm which structural problems actually exist in this context. This is also the most IRB-feasible first study.

Methods

  • 20–30 semi-structured interviews
  • Participants: students who left STEM at Foothill within 2 years
  • Coding against S&H taxonomy (deductive)
  • Open coding for emergent themes (inductive)
  • Grounded theory analysis + member-checking

Key Literature

  • Seymour & Hunter (1997, 2019)
  • Margolis & Fisher, Unlocking the Clubhouse
  • Walton & Brady on belonging
  • SEISMIC Consortium data
  • PERTS Mindset Meter

IRB Requirements

  • Full IRB protocol (not exempt)
  • Informed consent for recording
  • Participant confidentiality procedures
  • Deidentification before analysis

Expected Contribution

  • CC-specific extension of S&H taxonomy
  • Open coding manual for replication
  • Empirical grounding for P1 feature selection
  • Public dataset (deidentified transcripts, IRB-permitting)
P4
Graph Theory · Curriculum Design · Tool Development
PROPOSED · TARGET SIGCSE 2027

CurriculumGraph: Mapping and Analyzing CS Dependency Structures

A graph-based representation of CS curriculum structure, built as an annotation tool for instructors. The theoretical motivation is the Typed Dependency Grammar framework developed in this research agenda — but this project approaches it empirically, not formally. Instructors annotate their curriculum; the tool computes structural statistics (longest path, fan-in/fan-out, cycle detection); preliminary correlations with DFW rates are examined. This is the empirical version of the TDG idea.

▶  Research design details

Research Question

Can a graph-based representation of CS curriculum dependency structure reveal bottleneck patterns that predict student confusion and DFW rates, and do these patterns vary between community colleges and research universities?

Methods

  • Annotation tool: React-based graph builder
  • Pilot with 5–10 instructors at Foothill and SJSU
  • Graph structural analysis (NetworkX)
  • Correlation with grade distribution and DFW rates
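The structural statistics named above can be sketched in a few lines of NetworkX; the toy dependency graph below is invented for illustration, not an annotated curriculum.

```python
# Illustrative sketch of the structural statistics P4 would compute on
# an instructor-annotated curriculum graph. The toy edges are invented.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("variables", "loops"),
    ("variables", "functions"),
    ("loops", "arrays"),
    ("functions", "arrays"),
    ("arrays", "sorting"),
])

assert nx.is_directed_acyclic_graph(G)       # cycle detection
longest = nx.dag_longest_path_length(G)      # depth of the curriculum (edges)
fan_in = max(d for _, d in G.in_degree())    # bottleneck candidates
fan_out = max(d for _, d in G.out_degree())  # branching load

print(f"longest path: {longest}, max fan-in: {fan_in}, max fan-out: {fan_out}")
```

High fan-in nodes ("arrays" here) are the bottleneck candidates the project would correlate with DFW rates.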

Technical Stack

  • React annotation interface
  • Python NetworkX for analysis
  • D3.js for visualization
  • Statistical analysis in R
P5
Belonging · Course Design · Survey Methods
PROPOSED · TARGET ICER 2027

BelongingSignals: Coding Course Materials for Structural Belonging Features

Which observable features of introductory CS course design predict students' sense of belonging — and can those features be reliably coded from course materials alone? This project develops a coding scheme grounded in Walton & Brady's belonging literature and Margolis & Fisher's Unlocking the Clubhouse, applies it to a corpus of CS1 materials, and validates coding against belonging survey data from an IRB-approved prospective study.

▶  Research design details

Research Question

Which codeable features of CS1 course design predict student belonging scores (Walton 3-item scale), and can a reliable coding instrument be developed that instructors can use to audit their own materials?

Methods

  • Coding scheme development (Walton + Margolis & Fisher)
  • Corpus: CS1 materials from willing instructors
  • Prospective belonging survey (IRB-approved)
  • Validate coding against survey outcomes

Expected Output

  • Open coding instrument
  • Validated belonging feature set
  • Actionable instructor guidance
  • Connection to P1 (help-seeking) and P3 (departure)
04
Prototype Tools · Under Development

Three prototype tools that operationalize ideas from the research agenda above. These are proof-of-concept implementations — the theoretical frameworks they demonstrate need empirical validation against real data. They are offered here as interactive illustrations, not as validated instruments.

Drag nodes to rearrange · Click to inspect · All data is synthetic

04a
Curriculum Dependency Visualizer · Prototype for P4

A live Typed Dependency Grammar for an introductory linear algebra curriculum. Each node is a learning objective; each typed edge encodes one of four dependency relationships. This is a synthetic example — the empirical question in P4 is whether real instructor-annotated graphs show predictive patterns for student outcomes.

Drag nodes to rearrange · Click any node to inspect it · Click legend items to filter by dependency type

Dependency Types
Conceptual
Understanding A is logically necessary to understand B
Procedural
Fluency with A is needed to perform tasks in B
Motivational
Encountering A creates intellectual need for B
Social
Collaboration around A creates context for B

04b
Pedagogical Debt Analyzer · Prototype for P2

Paste any course description or syllabus excerpt below. This prototype scores it for motivational, scaffolding, and verification structure using heuristics grounded in Harel's necessity principle. Important: these are heuristic scores, not validated instrument outputs. P2 is the project to validate them.

Try the preset examples to see the difference between high-debt and low-debt instructional writing.
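To make "heuristic scores" concrete, here is a toy keyword-counting sketch in the same spirit. The cue lists are invented stand-ins, not the prototype's actual rules, and real scoring would need far more than keyword matching.

```python
# Toy illustration of heuristic scoring of course-material text.
# The keyword lists are invented, not the prototype's real heuristics.
import re

MOTIVATION_CUES = ["why", "problem", "need", "question", "real-world"]
VERIFICATION_CUES = ["check", "test", "verify", "self-assess"]

def heuristic_scores(text: str) -> dict:
    """Score text by the fraction of words matching each cue list."""
    words = re.findall(r"[a-z\-]+", text.lower())
    n = max(len(words), 1)
    return {
        "motivational": sum(w in MOTIVATION_CUES for w in words) / n,
        "verification": sum(w in VERIFICATION_CUES for w in words) / n,
    }

high_debt = "Chapter 3 covers matrix operations. Memorize the formulas."
low_debt = ("We start from a real-world problem and ask why the formulas "
            "work, then check our answers.")
print(heuristic_scores(high_debt), heuristic_scores(low_debt))
```

The point of the contrast: formalism-first prose scores zero on motivational cues, which is the signal P2 would try to validate against expert annotation.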

04c
Minimum Viable Curriculum Visualizer · Prototype for P4

Watch a greedy MVC approximation algorithm run step by step on a synthetic curriculum graph. The key theoretical result: finding an optimal MVC is NP-hard (by a polynomial reduction from Set Cover). This visualizer illustrates the algorithm on a small example. The research question in P4 is whether MVC-distance from real instructor-annotated graphs predicts student outcomes.
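The greedy step the visualizer animates can be sketched as a Set-Cover-style loop. The curriculum below and its coverage sets are synthetic; `coverage` maps each candidate objective to the terminal objectives it (transitively) supports.

```python
# Minimal greedy Set-Cover-style sketch of the MVC approximation.
# The curriculum objectives and coverage sets below are synthetic.

def greedy_mvc(coverage: dict, terminals: set) -> list:
    """Pick objectives until every terminal objective is covered."""
    uncovered = set(terminals)
    chosen = []
    while uncovered:
        # Greedy step: take the objective covering the most uncovered terminals.
        best = max(coverage, key=lambda o: len(coverage[o] & uncovered))
        if not coverage[best] & uncovered:
            break  # remaining terminals are unreachable
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

coverage = {
    "intro":     {"t1"},
    "recursion": {"t1", "t2"},
    "graphs":    {"t3"},
}
print(greedy_mvc(coverage, {"t1", "t2", "t3"}))  # picks recursion, then graphs
```

This is the standard greedy Set Cover heuristic, which is where the ln|T| approximation guarantee comes from.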

Step Forward to advance the algorithm one step at a time


Algorithm State

Press "Step Forward" to begin the Minimum Viable Curriculum greedy algorithm. The red nodes are terminal objectives — what students must be able to do by the end.

Theorem (P4 proposal): Deciding whether an MVC of a given size exists is NP-complete in general.
Reduction: Set Cover → MVC.

The greedy algorithm achieves a ln|T|-approximation ratio, matching the inapproximability lower bound inherited from Set Cover.

The empirical question: does MVC-distance predict DFW rates in real curricula? That's what P4 aims to test.
05
Applied Curriculum · Community College CS

Three project designs for an introductory CS course — grounded in Harel's necessity principle as enacted in Jeff Anderson's curriculum, built so students are authors rather than consumers. Each is designed to activate intrinsic motivation before introducing the technical tool.

Proj 01
HTML · CSS · JS · Community · Intro CS · 4–6 weeks

Community Resource Aggregator

Students build a searchable map of local support resources for their own campus or city. The research memo comes before the code. The necessity hook: students survey three peers about resources they couldn't find.

▶  Expand project details

Necessity Hook

Students first survey 3 peers about resource gaps they've experienced.

The search problem exists before a line of code is written.

Anderson Criteria Met

Necessity · Authenticity
Active Learning
Low-floor / High-ceiling
No Paywall (all open APIs)

Technical Scope

  • HTML forms and DOM manipulation
  • Fetch API for reading a Google Sheet as JSON
  • Leaflet.js or Google Maps embed for map display
  • Basic search/filter logic in vanilla JS

The Deep Learning Layer

  • Students write a 1-page research memo before coding
  • Design decisions are documented and defended
  • Peer testing session mid-project with real users
  • Final reflection: what did the data reveal vs. what you assumed?
Proj 02
JavaScript · LocalStorage · Meta-Learning · Intro CS · 3–5 weeks

Personal Learning System Builder

Students build a scheduling app, habit tracker, or question-capture system — tools that mirror the conquering college assignments in Anderson's own class. The meta dimension is the point: students simultaneously learn to code and reflect on how they learn.

▶  Expand project details

Necessity Hook

Week 1: students audit their current study system.

The gaps they find become the spec for what they build.

Connection to Research

Motivational debt ↓ when students build tools they actually need.

Self-efficacy measured before and after.

Technical Scope

  • HTML/CSS layout and responsive design
  • JS event handling and DOM updates
  • LocalStorage for data persistence
  • Optional: export to JSON or Google Sheets sync

The Deep Learning Layer

  • Students read one chapter of Ultralearning before building
  • They write a feature spec: what does this tool need to do and why?
  • Mid-project: use your own tool for one week, then refactor
  • Final: present the delta between v1 and v2 to the class
Proj 03
Data · APIs · Civic Tech · D3 or Chart.js · Intro CS · 5–7 weeks

Neighborhood Data Story

Students pick a local issue, collect or find public data, and build a visualization with a written narrative. Motivated directly by Anderson's "abuelita test": can you explain your findings to a non-technical family member?

▶  Expand project details

Necessity Hook

Students write a 1-paragraph argument about their issue using only their intuition.

Then they find the data. The gap between the two is the whole course.

Abuelita Test

Final deliverable must include an explanation a non-technical family member could read.

If they can't explain it, they don't understand it.

Technical Scope

  • Fetch API with a public dataset (Census, data.gov, local open data)
  • CSV or JSON parsing
  • Chart.js or D3.js for visualization
  • HTML/CSS narrative layout wrapping the charts

The Deep Learning Layer

  • Students must justify their data source selection in writing
  • At least one visualization must be revised after peer feedback
  • Written narrative required: what does this mean for real people?
  • Presentation to the class with Q&A — defend your choices
07
Theoretical Framework · Key Concepts

The conceptual vocabulary underlying the research agenda. These are theoretical constructs — some have roots in existing literature, some are proposed extensions. The empirical projects above are designed to test whether these constructs are measurable and useful.

G
P4 Proposal
Typed Dependency Grammar
A directed graph G = (L, D) encoding four types of curriculum dependency. The theoretical language for P4's empirical curriculum graph study.
M
P4 · P2
Motivational Reachability
Every advanced objective should be reachable via a chain of motivational edges. Formalizes Harel's necessity principle in graph terms.
N
All Projects
The Necessity Principle
"If math is the medicine, what is the headache?" From Harel — students must experience intellectual need before receiving the tool that answers it.
D
P2
Pedagogical Debt
The accumulated cost to student understanding of instructional shortcuts. Named by analogy with technical debt — a theoretical construct awaiting empirical validation.
P2
Static Analysis of Materials
Examining course materials without needing student data — just as a type checker examines source code without running the program.
*
P4
Minimum Viable Curriculum
The smallest instructionally sufficient curriculum for a given terminal objective set. NP-hard to find optimally; a greedy approximation is available.
Δ
P4
MVC-Distance
How far a real curriculum is from its minimum viable form. The empirical hypothesis: higher MVC-distance predicts lower persistence. Needs testing.
→?
P1
Help-Seeking Suppression
The behavioral pattern in which students who need help do not seek it. Predicted by Seymour & Hunter to be structurally caused. P1 operationalizes this as a measurable LMS feature.
σ
P3 · P5
Structural Belonging
The degree to which course design signals that students of varying backgrounds are expected to succeed. Distinct from interpersonal belonging — it is measurable from materials alone.
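The motivational-reachability construct above can be sketched with NetworkX, under the assumption that each edge carries a `type` attribute; the toy graph, node names, and attribute name are illustrative, not part of a validated instrument.

```python
# Sketch of the motivational-reachability check. Assumes edges carry a
# "type" attribute; the toy graph and names are invented illustrations.
import networkx as nx

G = nx.DiGraph()
G.add_edge("counting", "probability", type="motivational")
G.add_edge("probability", "bayes", type="motivational")
G.add_edge("sets", "bayes", type="conceptual")

# Subgraph of motivational edges only.
M = nx.DiGraph([(u, v) for u, v, d in G.edges(data=True)
                if d["type"] == "motivational"])

roots = {"counting"}  # entry points that need no prior motivation
reachable = set().union(*(nx.descendants(M, r) | {r} for r in roots))

# Objectives with no motivational chain back to a root violate the
# reachability condition — the graph form of Harel's necessity principle.
unmotivated = set(G.nodes) - reachable
print("objectives with no motivational chain:", unmotivated)
```

In this toy example "sets" is taught without any motivational path leading to it, which is precisely the pattern the framework flags.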
08
Execution Roadmap · 2025–2027

A Realistic Schedule

Projects are sequenced so that P3 (interviews) produces the conceptual foundation that P1 and P2 need. No project claims results it hasn't earned yet.

Now — Spring 2026

IRB Protocol + Corpus Construction

Submit IRB for P3 interview study. Collect 100+ syllabi for P2 corpus. Develop annotation schema. PhD program applications.

Summer 2026

P3: Departure Interviews

Conduct 20–30 semi-structured interviews with students who left STEM at Foothill. Code against S&H taxonomy. Begin grounded theory analysis.

Fall 2026

P2: Annotation Study + P1: LMS Analysis

Recruit expert annotators for P2. Begin NLP classifier training. In parallel, analyze LMS data for P1 help-seeking feature extraction.

Spring 2027

First Submissions

Submit P3 (interview study) to ICER 2027. Submit P2 (SyllabusAudit) to Learning @ Scale 2027. Pilot P4 instructor annotation study.

PhD Program

Dissertation Research

Integrate P1–P5 into a coherent dissertation on structural predictors of help-seeking and STEM departure at community colleges. Advisor conversations underway.

About the Researcher

I am a first-generation college student, McNair Scholar at San Jose State University, and Research Lead at the Foothill College Science Learning Institute. I am applying to PhD programs in CS education, learning sciences, and human-centered computing for Fall 2026.

Before this research, I spent years working in student services at Foothill College — financial aid, academic counseling, tutoring, learning communities, affinity groups — watching the same students encounter the same invisible structural barriers from every angle the institution offers. That experience is the origin of this work, not a detour from it.

My intellectual touchstones: Seymour & Hunter's Talking About Leaving Revisited, Jeff Anderson's applied linear algebra curriculum, and the SIGCSE community's sustained attention to who CS education is actually designed for. I am interested in research groups working at the intersection of learning analytics, qualitative CS education research, and tool-building for educational equity.

This work is dedicated to Jeff Anderson, whose Applied Linear Algebra Fundamentals textbook and twelve modeling criteria are evidence that the problem is solvable — and that solving it is worth a career.

"I do not come to this research with the most advanced technical portfolio. What I bring is an unusually complete picture of how students actually move through institutions."
Institution
San Jose State University · McNair Scholar
Research Role
Foothill College Science Learning Institute · Research Lead
Research Area
CS Education · Learning Analytics · STEM Persistence · Help-Seeking
Target Venues
SIGCSE · ICER · EDM · LAK · ACM Learning @ Scale
PhD Applications
UW Paul G. Allen · Georgia Tech · Stanford HAI · UMass Amherst · UC Irvine · CU Boulder

// Research repository

git clone https://github.com/fansofhenry/cs-ed-research