Research Projects
Five Projects, 2025–2027
Projects are listed in the order I intend to execute them. P3 (interviews) produces the conceptual foundation that P1 and P2 need. Status is shown honestly: what exists, what's in progress, what's proposed. No project claims results it hasn't earned yet.
P3 · Why They Left: A Seymour & Hunter Replication at Community Colleges
Seymour and Hunter documented that students who leave STEM are not academically weaker than those who stay — they leave for structural reasons: teaching quality, weed-out culture, help-seeking suppression, belonging. Their landmark studies were conducted at research universities. This project asks whether those findings replicate at community colleges, which serve a demographically different population within a different institutional structure.
Full research design
Research Question
Do the departure reasons documented by Seymour & Hunter at research universities replicate at community colleges, and are there CC-specific structural departure pathways not captured in the original taxonomy?
Why This First
This is the empirical foundation the other projects need. Before building computational tools to address structural problems, I need to confirm which structural problems actually exist in this context. It is also the most feasible study to run first from an IRB standpoint.
Methods
20–30 semi-structured interviews with students who left STEM at Foothill College within 2 years. Coding against the Seymour & Hunter taxonomy (deductive) plus open coding for emergent themes (inductive). Grounded theory analysis with member-checking.
Key Literature
Seymour & Hunter (1997, 2019); Margolis & Fisher, Unlocking the Clubhouse; Walton & Brady on belonging; SEISMIC Consortium data; PERTS Mindset Meter.
IRB Requirements
Full IRB protocol (not exempt). Informed consent for recording. Participant confidentiality procedures. Deidentification before analysis.
Expected Contribution
CC-specific extension of the Seymour & Hunter taxonomy. Open coding manual for replication. Empirical grounding for P1 feature selection. Public dataset (deidentified transcripts, IRB-permitting).
Addresses Research Question 3
P2 · SyllabusAudit: NLP-Based Structural Analysis of CS Course Materials
Can automated analysis of syllabi and assignment descriptions reliably identify structural features associated with poor student outcomes? This project builds a corpus of 100+ CS1/CS2 syllabi, develops an annotation schema grounded in Harel's necessity principle, and trains a lightweight NLP classifier on expert annotations. The validation question is central: can the classifier match expert judgment with sufficient reliability?
Full research design
Research Question
Can automated analysis of CS course materials identify motivational structure features that predict student help-seeking rates, and can this analysis be validated against instructor expert judgment (Cohen's κ ≥ 0.65)?
Current Status
Corpus construction is in its early stages. A public-syllabus seed set is being assembled from the public course listings of California community colleges; the full target is n = 120, with IRB clearance covering the non-public portion. The annotation schema has been drafted against the Tool-2 rubric (four dimensions, rule-level audit trail). Annotator recruitment is planned for Spring 2026, once the seed set is stable.
Methods
Target corpus of ~120 CS1/CS2 syllabi (public + IRB-approved). Annotation schema grounded in Harel's necessity principle, the Karabenick and Newman help-seeking literature, and the Walton & Cohen belonging instrument. Three trained annotators score each syllabus on four dimensions against a written rubric; the rule-based Tool-2 analyzer is then scored against the mean human score per dimension. Primary test: Cohen's κ ≥ 0.65 on all four dimensions and Spearman ρ ≥ 0.70 between rule-based and human scores. The analysis will be pre-registered before annotation begins.
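The reliability gate above can be sketched in Python (the stack lists R for the IRR analysis, but a quick check mirrors cleanly in the pipeline language). This is a minimal sketch, not project code: the dimension names, the `passes_reliability` helper, and its exact inputs are illustrative assumptions.

```python
# Sketch of the planned reliability check: the rule-based analyzer's scores
# must reach Cohen's kappa >= 0.65 and Spearman rho >= 0.70 against the mean
# human score on every dimension. All names below are illustrative.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import spearmanr

DIMENSIONS = ["motivational_debt", "scaffolding", "verification", "belonging"]

def passes_reliability(tool_scores, human_scores, kappa_min=0.65, rho_min=0.70):
    """tool_scores / human_scores: dicts mapping dimension -> array of scores,
    one entry per syllabus (human side holds the mean of three annotators,
    rounded to the nearest rubric level for the kappa comparison)."""
    for dim in DIMENSIONS:
        tool = np.asarray(tool_scores[dim])
        human = np.asarray(human_scores[dim])
        kappa = cohen_kappa_score(tool, np.rint(human).astype(int))
        rho, _ = spearmanr(tool, human)
        if kappa < kappa_min or rho < rho_min:
            return False  # fail the gate if any dimension falls short
    return True
```

The gate is deliberately conjunctive: a single weak dimension fails the whole tool, which matches the pre-registered "across all four dimensions" wording.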
Annotation Scheme
Four dimensions: motivational debt (formalism introduced before student need), scaffolding regularity (dependency load per unit), verification opportunities (self-check affordances), and belonging signals (framing and example selection).
Technical Stack
HuggingFace Transformers, Label Studio for annotation, Python NLP pipeline, R for inter-rater reliability analysis.
Open Science Plan
Annotation schema released as open standard. Annotated corpus released (IRB-permitting). Model weights open-source. Web tool for instructor upload.
Addresses Research Question 4
P1 · HelpMap: Detecting Help-Seeking Suppression in Introductory CS
What behavioral signals in LMS and discussion forum data predict whether students seek help when stuck — and do these signals differ by demographic group? This project builds a validated feature set for help-seeking suppression from Canvas logs and discussion forum data across intro CS sections at Foothill College and SJSU.
Full research design
Research Question
Which LMS behavioral features predict help-seeking suppression in CS1, and does the feature importance differ across first-generation, URM, and continuing-generation students?
Analytical Approach
I hypothesize that help-seeking suppression will be detectable through three feature families: temporal patterns (latency between assignment access and first forum post), linguistic patterns (question specificity and hedging language in posts), and engagement patterns (ratio of reading to posting behavior). I chose logistic regression as the primary model for interpretability — the goal is instructor-actionable features, not prediction accuracy. Gradient boosting serves as a comparison to test whether nonlinear interactions among features substantially improve detection, which would suggest the underlying phenomenon is more complex than a linear model captures.
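As a concrete illustration, the three feature families could be computed from per-student event logs roughly as follows. The event schema, field names, and hedge list are hypothetical placeholders, not the real Canvas or forum schema.

```python
# Illustrative extraction of the three feature families: temporal (latency),
# linguistic (hedging in posts), engagement (read-to-post ratio).
# Schema and hedge list are invented for this sketch.
from datetime import datetime

HEDGES = ("maybe", "i think", "sorry", "probably", "not sure")

def extract_features(events):
    """events: list of dicts with 'ts' (datetime), 'kind'
    ('assignment_open' | 'forum_read' | 'forum_post'), optional 'text'."""
    opens = [e for e in events if e["kind"] == "assignment_open"]
    reads = [e for e in events if e["kind"] == "forum_read"]
    posts = [e for e in events if e["kind"] == "forum_post"]

    # Temporal: hours from first assignment access to first forum post.
    latency_hrs = None
    if opens and posts:
        latency_hrs = (min(p["ts"] for p in posts)
                       - min(o["ts"] for o in opens)).total_seconds() / 3600

    # Linguistic: share of posts containing hedging language.
    hedged = sum(1 for p in posts
                 if any(h in p.get("text", "").lower() for h in HEDGES))
    hedge_rate = hedged / len(posts) if posts else 0.0

    # Engagement: reading-to-posting ratio (lurking vs. asking).
    read_post_ratio = len(reads) / max(len(posts), 1)

    return {"latency_hrs": latency_hrs,
            "hedge_rate": hedge_rate,
            "read_post_ratio": read_post_ratio}
```

Each feature is deliberately simple enough to explain to an instructor in one sentence, which is the interpretability constraint that motivates logistic regression as the primary model.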
Datasets
Foothill College CS course logs (IRB required). Public PSLC DataShop CS datasets. Publicly available CS1 discussion forums.
Expected Contribution
Validated feature set for help-seeking suppression. Open-source instructor dashboard. Extension of Seymour & Hunter's qualitative findings to behavioral data.
Addresses Research Question 1. Feature selection informed by P3 interview findings.
P4 · CurriculumGraph: Mapping and Analyzing CS Dependency Structures
A graph-based representation of CS curriculum structure, built as an annotation tool for instructors. Instructors annotate their curriculum with typed dependencies; the tool computes structural statistics (longest path, fan-in/fan-out, bottleneck detection). The empirical question is whether these structural features predict DFW rates — and whether community college curricula show different structural signatures than research university curricula.
Full research design
Research Question
Can a graph-based representation of CS curriculum dependency structure reveal bottleneck patterns that predict student confusion and DFW rates, and do these patterns vary between community colleges and research universities?
Methods
React-based annotation tool for instructors. Pilot with 5–10 instructors at Foothill and SJSU. Graph structural analysis using NetworkX. Correlation with grade distribution and DFW rates.
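The NetworkX statistics named above can be sketched on a toy prerequisite graph. The concept names and edges below are invented for illustration; a real annotated curriculum would also carry the typed-dependency labels.

```python
# Toy dependency DAG: edge (a, b) means concept a is a prerequisite of b.
# Structural stats sketched: longest path, fan-in/fan-out, bottlenecks.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("variables", "loops"), ("variables", "functions"),
    ("loops", "arrays"), ("functions", "arrays"),
    ("arrays", "recursion"), ("functions", "recursion"),
])

# Longest dependency chain (well-defined only because the graph is acyclic).
longest = nx.dag_longest_path(G)

# Fan-in / fan-out per concept. High fan-in flags a potential bottleneck:
# many prior ideas must all be in place before the topic is teachable.
fan_in = dict(G.in_degree())
fan_out = dict(G.out_degree())
bottlenecks = [n for n, d in fan_in.items() if d >= 2]
```

On this toy graph the longest chain runs from "variables" to "recursion", and "arrays" and "recursion" surface as fan-in bottlenecks, which is the kind of output the correlation with DFW rates would consume.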
Connection to Existing Work
Extends existing prerequisite-chain research (e.g., Auvinen et al.) by adding typed dependencies — conceptual, procedural, motivational, and social — which allow finer-grained analysis of where curricula create unnecessary confusion versus necessary productive struggle.
Addresses Research Question 2. Working interactive tool available on the tools page.
P5 · BelongingSignals: Coding Course Materials for Structural Belonging Features
Which observable features of introductory CS course design predict students' sense of belonging — and can those features be reliably coded from course materials alone? This project develops a coding scheme grounded in Walton & Brady's belonging literature and Margolis & Fisher's Unlocking the Clubhouse, applies it to CS1 materials, and validates coding against belonging survey data.
Full research design
Research Question
Which codeable features of CS1 course design predict student belonging scores (Walton 3-item scale), and can a reliable coding instrument be developed that instructors can use to audit their own materials?
Methods
Coding scheme development (Walton + Margolis & Fisher). Corpus of CS1 materials from willing instructors. Prospective belonging survey (IRB-approved). Validation of coding against survey outcomes.
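In outline, the validation step correlates a coded course-material feature with section-level mean belonging scores. The numbers below are invented placeholders, not collected data, and the real analysis would also need to handle section-level clustering and multiple comparisons.

```python
# Hedged sketch: correlate one coded belonging-signal score per course
# section with that section's mean Walton 3-item score (items on 1-5).
# All values are illustrative placeholders.
from scipy.stats import pearsonr

coded_scores = [1, 2, 2, 3, 4, 4, 5]          # coder's rubric score per section
survey_means = [2.1, 2.4, 2.6, 3.0, 3.6, 3.9, 4.4]  # mean of three Walton items

r, p = pearsonr(coded_scores, survey_means)
```

A reliable instrument would show a strong, significant positive correlation here; a weak one would send the coding scheme back for revision before any instructor-facing release.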
Expected Output
Open coding instrument. Validated belonging feature set. Actionable instructor guidance. Connection to P1 (help-seeking) and P3 (departure).
Addresses Research Questions 1 and 3.
Last updated: April 2026
Curriculum Design Projects
In parallel with the empirical research above, I design curriculum that operationalizes the same structural equity principles. These are instructional designs, not courses I have taught as instructor of record — they are the intervention side of the same question as the research, and each is written to be a potential research site once implemented.
Curriculum Design · Constructionism · Completed
Build a Computer from Scratch: A 20-Week Cross-STEM Signature Project
Teams of community college students build a working 8-bit breadboard computer from logic gates. Seven learning science frameworks. Explicit STEM bridges to physics, discrete math, linear algebra, differential equations, and chemistry. Three-track agency system. Portfolio assessment with student-proposed grades. Designed as both a curriculum and a future research site for studying the effects of physical computing on belonging, help-seeking, and persistence in introductory CS.
Platform Design · Next.js · Supabase · Design artifact
ProjectBridge: A Project-Based Learning Platform for Community Colleges
A design artifact for a campus-wide platform where students would discover, join, lead, and document meaningful projects. The redesign document covers platform architecture, data models (users, projects, milestones, updates, collaborator requests), accountability systems, AI integration strategy, and a phased adoption playbook. Intended as the intervention side of the same structural-equity questions the research investigates — not a deployed product.
Curriculum Framework · 6 Courses · 12 Pages · Completed
Teaching Computing Differently: A Six-Course CS Curriculum for Community Colleges
A complete curriculum framework: CS 180 (AI), CS 185 (ML), CS 210 (Data Structures), CS 175 (How Things Work), Math 2B (Linear Algebra), ENGR 11 (MATLAB). All courses: no exams, no required textbooks, portfolio assessment, three-track system, equity as design. Grounded in Papert, Ko, Anderson, Freire, hooks, Knuth, Noble, and Benjamin. Open-access.