Research Projects
Five Projects, 2025–2027
Projects are listed in the order I intend to execute them. P3 (interviews) produces the conceptual foundation that P1 and P2 need. Status is shown honestly: what exists, what's in progress, what's proposed. No project claims results it hasn't earned yet.
P3 · Why They Left: A Seymour & Hunter Replication at Community Colleges
Seymour and Hunter documented that students who leave STEM are not academically weaker than those who stay — they leave for structural reasons: teaching quality, weed-out culture, help-seeking suppression, belonging. Their landmark studies were conducted at research universities. This project asks whether those findings replicate at community colleges, which serve a demographically different population within a different institutional structure.
Full research design
Research Question
Do the departure reasons documented by Seymour & Hunter at research universities replicate at community colleges, and are there CC-specific structural departure pathways not captured in the original taxonomy?
Why This First
This is the empirical foundation the other projects need. Before building computational tools to address structural problems, I need to confirm which structural problems actually exist in this context. It is also the most feasible study to run first from an IRB standpoint.
Methods
20–30 semi-structured interviews with students who left STEM at Foothill College within 2 years. Coding against the Seymour & Hunter taxonomy (deductive) plus open coding for emergent themes (inductive). Grounded theory analysis with member-checking.
Key Literature
Seymour & Hunter (1997, 2019); Margolis & Fisher, Unlocking the Clubhouse; Walton & Brady on belonging; SEISMIC Consortium data; PERTS Mindset Meter.
IRB Requirements
Full IRB protocol (not exempt). Informed consent for recording. Participant confidentiality procedures. Deidentification before analysis.
Expected Contribution
CC-specific extension of the Seymour & Hunter taxonomy. Open coding manual for replication. Empirical grounding for P1 feature selection. Public dataset (deidentified transcripts, IRB-permitting).
Addresses Research Question 3
P2 · SyllabusAudit: NLP-Based Structural Analysis of CS Course Materials
Can automated analysis of syllabi and assignment descriptions reliably identify structural features associated with poor student outcomes? This project builds a corpus of 100+ CS1/CS2 syllabi, develops an annotation schema grounded in Harel's necessity principle, and trains a lightweight NLP classifier on expert annotations. The validation question is central: can the classifier match expert judgment with sufficient reliability?
Full research design
Research Question
Can automated analysis of CS course materials identify motivational structure features that predict student help-seeking rates, and can this analysis be validated against instructor expert judgment (Cohen's κ ≥ 0.65)?
Current Status
Corpus construction is in its early stages. A public-syllabus seed set is being assembled from the public course listings of California community colleges; the full target is n = 120, with IRB clearance covering the non-public portion. The annotation schema has been drafted against the Tool-2 rubric (four dimensions, rule-level audit trail). Annotator recruitment is planned for Spring 2026, once the seed set is stable.
Methods
Target corpus of ~120 CS1/CS2 syllabi (public + IRB-approved). Annotation schema grounded in Harel's necessity principle, the Karabenick and Newman help-seeking literature, and the Walton & Cohen belonging instrument. Three trained annotators score each syllabus on four dimensions against a written rubric; the rule-based Tool-2 analyzer is then scored against the mean human score per dimension. Primary test: Cohen's κ ≥ 0.65 on all four dimensions and Spearman ρ ≥ 0.70 between rule-based and human scores. The analysis will be pre-registered before annotation begins.
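The reliability gate above can be sketched in Python (the stack lists R for the IRR analysis, but a quick check mirrors cleanly in the pipeline language). This is a minimal sketch, not project code: the dimension names, the `passes_reliability` helper, and its exact inputs are illustrative assumptions.

```python
# Sketch of the planned reliability check: the rule-based analyzer's scores
# must reach Cohen's kappa >= 0.65 and Spearman rho >= 0.70 against the mean
# human score on every dimension. All names below are illustrative.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import spearmanr

DIMENSIONS = ["motivational_debt", "scaffolding", "verification", "belonging"]

def passes_reliability(tool_scores, human_scores, kappa_min=0.65, rho_min=0.70):
    """tool_scores / human_scores: dicts mapping dimension -> array of scores,
    one entry per syllabus (human side holds the mean of three annotators,
    rounded to the nearest rubric level for the kappa comparison)."""
    for dim in DIMENSIONS:
        tool = np.asarray(tool_scores[dim])
        human = np.asarray(human_scores[dim])
        kappa = cohen_kappa_score(tool, np.rint(human).astype(int))
        rho, _ = spearmanr(tool, human)
        if kappa < kappa_min or rho < rho_min:
            return False  # fail the gate if any dimension falls short
    return True
```

The gate is deliberately conjunctive: a single weak dimension fails the whole tool, which matches the pre-registered "across all four dimensions" wording.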
Annotation Scheme
Four dimensions: motivational debt (formalism introduced before student need), scaffolding regularity (dependency load per unit), verification opportunities (self-check affordances), and belonging signals (framing and example selection).
Technical Stack
HuggingFace Transformers, Label Studio for annotation, Python NLP pipeline, R for inter-rater reliability analysis.
Open Science Plan
Annotation schema released as open standard. Annotated corpus released (IRB-permitting). Model weights open-source. Web tool for instructor upload.
Addresses Research Question 4
P1 · HelpMap: Detecting Help-Seeking Suppression in Introductory CS
What behavioral signals in LMS and discussion forum data predict whether students seek help when stuck — and do these signals differ by demographic group? This project builds a validated feature set for help-seeking suppression from Canvas logs and discussion forum data across intro CS sections at Foothill College and SJSU.
Full research design
Research Question
Which LMS behavioral features predict help-seeking suppression in CS1, and does the feature importance differ across first-generation, URM, and continuing-generation students?
Analytical Approach
I hypothesize that help-seeking suppression will be detectable through three feature families: temporal patterns (latency between assignment access and first forum post), linguistic patterns (question specificity and hedging language in posts), and engagement patterns (ratio of reading to posting behavior). I chose logistic regression as the primary model for interpretability — the goal is instructor-actionable features, not prediction accuracy. Gradient boosting serves as a comparison to test whether nonlinear interactions among features substantially improve detection, which would suggest the underlying phenomenon is more complex than a linear model captures.
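As a concrete illustration, the three feature families could be computed from per-student event logs roughly as follows. The event schema, field names, and hedge list are hypothetical placeholders, not the real Canvas or forum schema.

```python
# Illustrative extraction of the three feature families: temporal (latency),
# linguistic (hedging in posts), engagement (read-to-post ratio).
# Schema and hedge list are invented for this sketch.
from datetime import datetime

HEDGES = ("maybe", "i think", "sorry", "probably", "not sure")

def extract_features(events):
    """events: list of dicts with 'ts' (datetime), 'kind'
    ('assignment_open' | 'forum_read' | 'forum_post'), optional 'text'."""
    opens = [e for e in events if e["kind"] == "assignment_open"]
    reads = [e for e in events if e["kind"] == "forum_read"]
    posts = [e for e in events if e["kind"] == "forum_post"]

    # Temporal: hours from first assignment access to first forum post.
    latency_hrs = None
    if opens and posts:
        latency_hrs = (min(p["ts"] for p in posts)
                       - min(o["ts"] for o in opens)).total_seconds() / 3600

    # Linguistic: share of posts containing hedging language.
    hedged = sum(1 for p in posts
                 if any(h in p.get("text", "").lower() for h in HEDGES))
    hedge_rate = hedged / len(posts) if posts else 0.0

    # Engagement: reading-to-posting ratio (lurking vs. asking).
    read_post_ratio = len(reads) / max(len(posts), 1)

    return {"latency_hrs": latency_hrs,
            "hedge_rate": hedge_rate,
            "read_post_ratio": read_post_ratio}
```

Each feature is deliberately simple enough to explain to an instructor in one sentence, which is the interpretability constraint that motivates logistic regression as the primary model.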
Datasets
Foothill College CS course logs (IRB required). Public PSLC DataShop CS datasets. Publicly available CS1 discussion forums.
Expected Contribution
Validated feature set for help-seeking suppression. Open-source instructor dashboard. Extension of Seymour & Hunter's qualitative findings to behavioral data.
Addresses Research Question 1. Feature selection informed by P3 interview findings.
P4 · CurriculumGraph: Mapping and Analyzing CS Dependency Structures
A graph-based representation of CS curriculum structure, built as an annotation tool for instructors. Instructors annotate their curriculum with typed dependencies; the tool computes structural statistics (longest path, fan-in/fan-out, bottleneck detection). The empirical question is whether these structural features predict DFW rates — and whether community college curricula show different structural signatures than research university curricula.
Full research design
Research Question
Can a graph-based representation of CS curriculum dependency structure reveal bottleneck patterns that predict student confusion and DFW rates, and do these patterns vary between community colleges and research universities?
Methods
React-based annotation tool for instructors. Pilot with 5–10 instructors at Foothill and SJSU. Graph structural analysis using NetworkX. Correlation with grade distribution and DFW rates.
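The NetworkX statistics named above can be sketched on a toy prerequisite graph. The concept names and edges below are invented for illustration; a real annotated curriculum would also carry the typed-dependency labels.

```python
# Toy dependency DAG: edge (a, b) means concept a is a prerequisite of b.
# Structural stats sketched: longest path, fan-in/fan-out, bottlenecks.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("variables", "loops"), ("variables", "functions"),
    ("loops", "arrays"), ("functions", "arrays"),
    ("arrays", "recursion"), ("functions", "recursion"),
])

# Longest dependency chain (well-defined only because the graph is acyclic).
longest = nx.dag_longest_path(G)

# Fan-in / fan-out per concept. High fan-in flags a potential bottleneck:
# many prior ideas must all be in place before the topic is teachable.
fan_in = dict(G.in_degree())
fan_out = dict(G.out_degree())
bottlenecks = [n for n, d in fan_in.items() if d >= 2]
```

On this toy graph the longest chain runs from "variables" to "recursion", and "arrays" and "recursion" surface as fan-in bottlenecks, which is the kind of output the correlation with DFW rates would consume.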
Connection to Existing Work
Extends existing prerequisite-chain research (e.g., Auvinen et al.) by adding typed dependencies — conceptual, procedural, motivational, and social — which allow finer-grained analysis of where curricula create unnecessary confusion versus necessary productive struggle.
Addresses Research Question 2. Working interactive tool available on the tools page.
P5 · BelongingSignals: Coding Course Materials for Structural Belonging Features
Which observable features of introductory CS course design predict students' sense of belonging — and can those features be reliably coded from course materials alone? This project develops a coding scheme grounded in Walton & Brady's belonging literature and Margolis & Fisher's Unlocking the Clubhouse, applies it to CS1 materials, and validates coding against belonging survey data.
Full research design
Research Question
Which codeable features of CS1 course design predict student belonging scores (Walton 3-item scale), and can a reliable coding instrument be developed that instructors can use to audit their own materials?
Methods
Coding scheme development (Walton + Margolis & Fisher). Corpus of CS1 materials from willing instructors. Prospective belonging survey (IRB-approved). Validation of coding against survey outcomes.
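In outline, the validation step correlates a coded course-material feature with section-level mean belonging scores. The numbers below are invented placeholders, not collected data, and the real analysis would also need to handle section-level clustering and multiple comparisons.

```python
# Hedged sketch: correlate one coded belonging-signal score per course
# section with that section's mean Walton 3-item score (items on 1-5).
# All values are illustrative placeholders.
from scipy.stats import pearsonr

coded_scores = [1, 2, 2, 3, 4, 4, 5]          # coder's rubric score per section
survey_means = [2.1, 2.4, 2.6, 3.0, 3.6, 3.9, 4.4]  # mean of three Walton items

r, p = pearsonr(coded_scores, survey_means)
```

A reliable instrument would show a strong, significant positive correlation here; a weak one would send the coding scheme back for revision before any instructor-facing release.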
Expected Output
Open coding instrument. Validated belonging feature set. Actionable instructor guidance. Connection to P1 (help-seeking) and P3 (departure).
Addresses Research Questions 1 and 3.
Last updated: April 2026
Curriculum Design Projects
In parallel with the empirical research above, I design curriculum that operationalizes the same structural equity principles. These are instructional designs, not courses I have taught as instructor of record — they are the intervention side of the same question as the research, and each is written to be a potential research site once implemented.
Curriculum Design · Constructionism · Completed
Build a Computer from Scratch: A 20-Week Cross-STEM Signature Project
Teams of community college students build a working 8-bit breadboard computer from logic gates. Seven learning science frameworks. Explicit STEM bridges to physics, discrete math, linear algebra, differential equations, and chemistry. Three-track agency system. Portfolio assessment with student-proposed grades. Designed as both a curriculum and a future research site for studying the effects of physical computing on belonging, help-seeking, and persistence in introductory CS.
Platform Design · Next.js · Supabase · Design artifact
ProjectBridge: A Project-Based Learning Platform for Community Colleges
A design artifact for a campus-wide platform where students would discover, join, lead, and document meaningful projects. The redesign document covers platform architecture, data models (users, projects, milestones, updates, collaborator requests), accountability systems, AI integration strategy, and a phased adoption playbook. Intended as the intervention side of the same structural-equity questions the research investigates — not a deployed product.
Curriculum Framework · 6 Courses · 12 Pages · Completed
Teaching Computing Differently: A Six-Course CS Curriculum for Community Colleges
A complete curriculum framework: CS 180 (AI), CS 185 (ML), CS 210 (Data Structures), CS 175 (How Things Work), Math 2B (Linear Algebra), ENGR 11 (MATLAB). All courses: no exams, no required textbooks, portfolio assessment, three-track system, equity as design. Grounded in Papert, Ko, Anderson, Freire, hooks, Knuth, Noble, and Benjamin. Open-access.