Research Projects
Five Projects, 2025–2027
Projects are listed in the order I intend to execute them. P3 (interviews) produces the conceptual foundation that P1 and P2 need. Status is shown honestly: what exists, what's in progress, what's proposed. No project claims results it hasn't earned yet.
P3 · Why They Left: A Seymour & Hunter Replication at Community Colleges
Seymour and Hewitt (1997), and later Seymour and Hunter (2019), documented that students who leave STEM are not academically weaker than those who stay. They leave for structural reasons: teaching quality, weed-out culture, help-seeking suppression, and belonging. These landmark studies were conducted at research universities. This project asks whether those findings replicate at community colleges, which serve a demographically different population within a different institutional structure.
Research Question
Do the departure reasons documented by Seymour & Hunter at research universities replicate at community colleges, and are there CC-specific structural departure pathways not captured in the original taxonomy?
Why This First
This is the empirical foundation the other projects need. Before building computational tools to address structural problems, I need to confirm which structural problems actually exist in this context. This is also the most IRB-feasible first study.
Methods
20–30 semi-structured interviews with students who left STEM at Foothill College within 2 years. Coding against the Seymour & Hunter taxonomy (deductive) plus open coding for emergent themes (inductive). Grounded theory analysis with member-checking.
Key Literature
Seymour & Hewitt (1997); Seymour & Hunter (2019); Margolis & Fisher, Unlocking the Clubhouse; Walton & Brady on belonging; SEISMIC Consortium data; PERTS Mindset Meter.
IRB Requirements
Full IRB protocol (not exempt). Informed consent for recording. Participant confidentiality procedures. Deidentification before analysis.
Expected Contribution
CC-specific extension of the Seymour & Hunter taxonomy. Open coding manual for replication. Empirical grounding for P1 feature selection. Public dataset (deidentified transcripts, IRB-permitting).
Addresses Research Question 3
P2 · SyllabusAudit: NLP-Based Structural Analysis of CS Course Materials
Can automated analysis of syllabi and assignment descriptions reliably identify structural features associated with poor student outcomes? This project builds a corpus of 100+ CS1/CS2 syllabi, develops an annotation schema grounded in Harel's necessity principle, and trains a lightweight NLP classifier on expert annotations. The validation question is central: can the classifier's labels agree with expert judgment at the target reliability level?
Research Question
Can automated analysis of CS course materials identify motivational structure features that predict student help-seeking rates, and can this analysis be validated against instructor expert judgment (Cohen's κ ≥ 0.65)?
Current Status
Corpus construction underway: 47 syllabi collected as of February 2026. Annotation schema drafted. Expert annotator recruitment planned for Spring 2026.
Methods
Corpus of 100+ CS1/CS2 syllabi (public + IRB-approved). Annotation schema based on the necessity principle. Fine-tune a BERT-class model on expert annotations. Validate against student outcome data. Inter-rater reliability measured on real annotated data.
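The reliability check named in the research question (Cohen's κ ≥ 0.65) can be illustrated with a short sketch. The project's stack lists R for this analysis, so the Python below is only an illustration of the statistic, not the planned pipeline. The function name, the "high"/"low" labels, and the ten example ratings are all hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for ten syllabus passages on a single dimension.
expert = ["high", "high", "low", "low", "high", "low", "low", "high", "low", "low"]
model = ["high", "high", "low", "low", "low", "low", "low", "high", "low", "high"]

kappa = cohens_kappa(expert, model)
print(f"kappa = {kappa:.3f}, meets 0.65 threshold: {kappa >= 0.65}")
```

With these made-up labels the two raters agree on 8 of 10 items, chance agreement is 0.52, and κ comes out near 0.58, which would fall short of the 0.65 target even though raw agreement is 80%.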
Annotation Scheme
Four dimensions: motivational debt (formalism introduced before student need), scaffolding regularity (dependency load per unit), verification opportunities (self-check affordances), and belonging signals (framing and example selection).
Technical Stack
HuggingFace Transformers, Label Studio for annotation, Python NLP pipeline, R for inter-rater reliability analysis.
Open Science Plan
Annotation schema released as open standard. Annotated corpus released (IRB-permitting). Model weights open-source. Web tool for instructor upload.
Addresses Research Question 4
P1 · HelpMap: Detecting Help-Seeking Suppression in Introductory CS
What behavioral signals in LMS and discussion forum data predict whether students seek help when stuck — and do these signals differ by demographic group? This project builds a validated feature set for help-seeking suppression from Canvas logs and discussion forum data across intro CS sections at Foothill College and SJSU.
Research Question
Which LMS behavioral features predict help-seeking suppression in CS1, and does feature importance differ across first-generation, URM, and continuing-generation students?
Analytical Approach
I hypothesize that help-seeking suppression will be detectable through three feature families: temporal patterns (latency between assignment access and first forum post), linguistic patterns (question specificity and hedging language in posts), and engagement patterns (ratio of reading to posting behavior). I chose logistic regression as the primary model for interpretability — the goal is instructor-actionable features, not prediction accuracy. Gradient boosting serves as a comparison to test whether nonlinear interactions among features substantially improve detection, which would suggest the underlying phenomenon is more complex than a linear model captures.
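Two of the three feature families above (temporal and engagement) can be sketched from a plain event log. This is a minimal sketch under stated assumptions: the `(student, timestamp, kind)` tuple format and the event names `assignment_access`, `forum_read`, and `forum_post` are hypothetical and do not reflect the actual Canvas export schema. Linguistic features are not covered here.

```python
from collections import defaultdict
from datetime import datetime


def engagement_features(events):
    """Per-student posting latency (hours) and read-to-post ratio.

    `events` is an iterable of (student, ISO timestamp, kind) tuples;
    this layout is an assumption made for illustration.
    """
    by_student = defaultdict(list)
    for student, timestamp, kind in events:
        by_student[student].append((datetime.fromisoformat(timestamp), kind))

    features = {}
    for student, rows in by_student.items():
        rows.sort(key=lambda row: row[0])
        first_access = next((t for t, k in rows if k == "assignment_access"), None)
        # First forum post made at or after the first assignment access.
        first_post = next(
            (t for t, k in rows
             if k == "forum_post" and first_access is not None and t >= first_access),
            None,
        )
        reads = sum(k == "forum_read" for _, k in rows)
        posts = sum(k == "forum_post" for _, k in rows)
        features[student] = {
            "latency_hours": (
                (first_post - first_access).total_seconds() / 3600
                if first_post is not None else None
            ),
            "read_post_ratio": reads / posts if posts else None,
        }
    return features


# Hypothetical log: s1 posts six hours after opening the assignment, s2 only reads.
events = [
    ("s1", "2026-01-12T09:00:00", "assignment_access"),
    ("s1", "2026-01-12T10:00:00", "forum_read"),
    ("s1", "2026-01-12T15:00:00", "forum_post"),
    ("s1", "2026-01-13T08:00:00", "forum_read"),
    ("s2", "2026-01-12T09:30:00", "assignment_access"),
    ("s2", "2026-01-12T11:00:00", "forum_read"),
    ("s2", "2026-01-14T11:00:00", "forum_read"),
    ("s2", "2026-01-14T12:00:00", "forum_read"),
]

features = engagement_features(events)
print(features)
```

Students who read but never post get `None` for both features in this sketch. How to encode that case for a logistic regression is a design decision the sketch leaves open, and it matters because never-posting students may be the ones of most interest.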
Datasets
Foothill College CS course logs (IRB required). Public PSLC DataShop CS datasets. Publicly available CS1 discussion forums.
Expected Contribution
Validated feature set for help-seeking suppression. Open-source instructor dashboard. Extension of Seymour & Hunter's qualitative findings to behavioral data.
Addresses Research Question 1. Feature selection informed by P3 interview findings.
P4 · CurriculumGraph: Mapping and Analyzing CS Dependency Structures
A graph-based representation of CS curriculum structure, built as an annotation tool for instructors. Instructors annotate their curriculum with typed dependencies; the tool computes structural statistics (longest path, fan-in/fan-out, bottleneck detection). The empirical question is whether these structural features predict DFW rates — and whether community college curricula show different structural signatures than research university curricula.
Research Question
Can a graph-based representation of CS curriculum dependency structure reveal bottleneck patterns that predict student confusion and DFW rates, and do these patterns vary between community colleges and research universities?
Methods
React-based annotation tool for instructors. Pilot with 5–10 instructors at Foothill and SJSU. Graph structural analysis using NetworkX. Correlation with grade distribution and DFW rates.
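The structural statistics can be sketched on a toy dependency graph. The version below uses only the standard library so it runs without dependencies; in the planned NetworkX analysis, `networkx.dag_longest_path` and `DiGraph.in_degree` / `DiGraph.out_degree` cover the same ground. The concept names and edges are hypothetical, the graph is assumed to be acyclic, and bottleneck detection is left out because the source does not define the metric.

```python
from collections import defaultdict
from functools import lru_cache


def curriculum_stats(edges):
    """Fan-in, fan-out, and longest dependency chain for an acyclic graph.

    `edges` is an iterable of (prerequisite, dependent) pairs.
    """
    successors = defaultdict(list)
    fan_in = defaultdict(int)
    nodes = set()
    for prerequisite, dependent in edges:
        successors[prerequisite].append(dependent)
        fan_in[dependent] += 1
        nodes.update((prerequisite, dependent))

    @lru_cache(maxsize=None)
    def chain_from(node):
        # Longest chain starting at `node`; a cycle would recurse forever.
        best = max((chain_from(nxt) for nxt in successors[node]), key=len, default=())
        return (node,) + best

    longest = max((chain_from(node) for node in nodes), key=len, default=())
    return {
        "fan_in": {node: fan_in[node] for node in sorted(nodes)},
        "fan_out": {node: len(successors[node]) for node in sorted(nodes)},
        "longest_path": list(longest),
    }


# Hypothetical CS1 concept dependencies (prerequisite, dependent).
edges = [
    ("variables", "conditionals"),
    ("conditionals", "loops"),
    ("variables", "functions"),
    ("loops", "recursion"),
    ("functions", "recursion"),
    ("loops", "arrays"),
    ("arrays", "sorting"),
]

stats = curriculum_stats(edges)
print(stats["longest_path"])
print(stats["fan_in"]["recursion"], stats["fan_out"]["loops"])
```

In this toy graph the longest chain runs five concepts deep, from variables to sorting, and recursion has the highest fan-in because it depends on both loops and functions.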
Connection to Existing Work
Extends existing prerequisite-chain research (e.g., Auvinen et al.) by adding typed dependencies — conceptual, procedural, motivational, and social — which allow finer-grained analysis of where curricula create unnecessary confusion versus necessary productive struggle.
Addresses Research Question 2. Interactive prototype available on the prototypes page.
P5 · BelongingSignals: Coding Course Materials for Structural Belonging Features
Which observable features of introductory CS course design predict students' sense of belonging — and can those features be reliably coded from course materials alone? This project develops a coding scheme grounded in Walton & Brady's belonging literature and Margolis & Fisher's Unlocking the Clubhouse, applies it to CS1 materials, and validates coding against belonging survey data.
Research Question
Which codeable features of CS1 course design predict student belonging scores (Walton 3-item scale), and can a reliable coding instrument be developed that instructors can use to audit their own materials?
Methods
Coding scheme development (Walton + Margolis & Fisher). Corpus of CS1 materials from willing instructors. Prospective belonging survey (IRB-approved). Validation of coding against survey outcomes.
Expected Output
Open coding instrument. Validated belonging feature set. Actionable instructor guidance. Connection to P1 (help-seeking) and P3 (departure).
Addresses Research Questions 1 and 3.
Last updated: March 2026
Curriculum Design Projects
In parallel with the empirical research above, I design and teach curriculum that operationalizes the same structural equity principles. These are not separate from the research — they are the intervention side of the same question, and each is a potential research site.
Curriculum Design · Constructionism · Completed
Build a Computer from Scratch: A 20-Week Cross-STEM Signature Project
Teams of community college students build a working 8-bit breadboard computer from logic gates. Seven learning science frameworks. Explicit STEM bridges to physics, discrete math, linear algebra, differential equations, and chemistry. Three-track agency system. Portfolio assessment with student-proposed grades. Designed as both a curriculum and a future research site for studying the effects of physical computing on belonging, help-seeking, and persistence in introductory CS.
Platform Design · Next.js · Supabase · Completed
ProjectBridge: A Project-Based Learning Platform for Community Colleges
A campus-wide platform where students discover, join, lead, and document meaningful projects. Includes project directory, collaborator marketplace, build journals, milestone tracking, weekly check-ins, and faculty dashboards. Designed with first-generation community college students as the primary user. Built with Next.js 14, TypeScript, Tailwind CSS, Supabase (PostgreSQL + Row Level Security). Deployed on Vercel.
Curriculum Framework · 6 Courses · 12 Pages · Completed
Teaching Computing Differently: A Six-Course CS Curriculum for Community Colleges
A complete curriculum framework: CS 180 (AI), CS 185 (ML), CS 210 (Data Structures), CS 175 (How Things Work), Math 2B (Linear Algebra), ENGR 11 (MATLAB). All courses: no exams, no required textbooks, portfolio assessment, three-track system, equity as design. Grounded in Papert, Ko, Anderson, Freire, hooks, Knuth, Noble, and Benjamin. Open-access.