BrittleBench Research Plan
This document is the navigational outline for the full project. The canonical, pre-registerable research protocol is ../PROTOCOL.md.
Phase R1 — Problem Definition
Reference: PROTOCOL.md Section 1
- R1.1 Problem statement — COMPLETED in PROTOCOL.md Section 1.1.
- R1.2 Why this matters — COMPLETED in PROTOCOL.md Section 1.2.
- R1.3 Prior work survey — COMPLETED in PROTOCOL.md Section 1.3.
- R1.4 Gap this study fills — COMPLETED in PROTOCOL.md Section 1.4.
- R1.5 Scope boundaries — COMPLETED in PROTOCOL.md Section 1.5.
Phase R2 — Research Questions
Reference: PROTOCOL.md Section 2
- R2.1 Primary research question — COMPLETED in PROTOCOL.md Section 2.1.
- R2.2 Secondary research questions — COMPLETED in PROTOCOL.md Section 2.2.
- R2.3 Question hierarchy — COMPLETED in PROTOCOL.md Section 2.3.
- R2.4 Falsifiability check — COMPLETED in PROTOCOL.md Section 2.4.
Phase R3 — Hypotheses
Reference: PROTOCOL.md Section 3
- R3.1 Hypotheses per research question — COMPLETED in PROTOCOL.md Section 3.1.
- R3.2 Null hypotheses — COMPLETED in PROTOCOL.md Section 3.2.
- R3.3 Expected effect sizes — COMPLETED in PROTOCOL.md Section 3.3.
- R3.4 Pre-registered predictions — COMPLETED in PROTOCOL.md Section 3.4.
Phase R4 — Definitions and Operationalization
Reference: PROTOCOL.md Section 4
- R4.1 Detection rule definition — COMPLETED in PROTOCOL.md Section 4.1.
- R4.2 Functional equivalence definition — COMPLETED in PROTOCOL.md Section 4.2.
- R4.3 Robustness score definition — COMPLETED in PROTOCOL.md Section 4.3.
- R4.4 Brittleness pattern definition — COMPLETED in PROTOCOL.md Section 4.4.
- R4.5 Unit of analysis — COMPLETED in PROTOCOL.md Section 4.5.
Phase R5 — Methodology Design
Reference: PROTOCOL.md Section 5
- R5.1 Methodological approach — COMPLETED in PROTOCOL.md Section 5.1.
- R5.2 Sampling strategy — COMPLETED in PROTOCOL.md Section 5.2.
- R5.3 Independent variables — COMPLETED in PROTOCOL.md Section 5.3.
- R5.4 Dependent variables — COMPLETED in PROTOCOL.md Section 5.4.
- R5.5 Control variables — COMPLETED in PROTOCOL.md Section 5.5.
- R5.6 Confounders and mitigation — COMPLETED in PROTOCOL.md Section 5.6.
- R5.7 Statistical methods — COMPLETED in PROTOCOL.md Section 5.7.
- R5.8 Estimation precision plan — COMPLETED in PROTOCOL.md Section 5.8.
Phase R6 — Evidence and Validation
Reference: PROTOCOL.md Section 6
- R6.1 Evidence standards — COMPLETED in PROTOCOL.md Section 6.1.
- R6.2 Internal validity threats and mitigations — COMPLETED in PROTOCOL.md Section 6.2.
- R6.3 External validity and generalizability — COMPLETED in PROTOCOL.md Section 6.3.
- R6.4 Construct validity — COMPLETED in PROTOCOL.md Section 6.4.
- R6.5 Reliability strategy — COMPLETED in PROTOCOL.md Section 6.5.
Phase R7 — Threats to Validity
Reference: PROTOCOL.md Section 7 and threats-to-validity.md
- R7.1 Conclusion validity threats — COMPLETED in PROTOCOL.md Section 7.1 and threats-to-validity.md.
- R7.2 Internal validity threats — COMPLETED in PROTOCOL.md Section 7.2 and threats-to-validity.md.
- R7.3 Construct validity threats — COMPLETED in PROTOCOL.md Section 7.3 and threats-to-validity.md.
- R7.4 External validity threats — COMPLETED in PROTOCOL.md Section 7.4 and threats-to-validity.md.
- R7.5 Ethical validity threats — COMPLETED in PROTOCOL.md Section 7.5 and threats-to-validity.md.
- R7.6 Replication threats — COMPLETED in PROTOCOL.md Section 7.6 and threats-to-validity.md.
Phase R8 — Ethics and Responsible Research
Reference: PROTOCOL.md Section 8
- R8.1 Defender benefit greater than attacker benefit — COMPLETED in PROTOCOL.md Section 8.1.
- R8.2 Disclosure approach — COMPLETED in PROTOCOL.md Section 8.2.
- R8.3 No novel-attack policy — COMPLETED in PROTOCOL.md Section 8.3.
- R8.4 Dataset sanitization — COMPLETED in PROTOCOL.md Section 8.4.
- R8.5 Tone policy — COMPLETED in PROTOCOL.md Section 8.5.
Phase R9 — Protocol Status
Reference: PROTOCOL.md Section 9
- R9.1 Lock status — COMPLETED; currently UNLOCKED pending final review and pre-registration.
- R9.2 Lock date — COMPLETED; not yet established, to be recorded on lock.
- R9.3 Falsification criteria — COMPLETED in PROTOCOL.md Section 9.3.
- R9.4 Public pre-registration link — COMPLETED; not yet submitted, URL to be inserted on lock.
Phase E1 — Corpus Collection
Execution-phase work. Do not start until PROTOCOL.md Section 9.1 reads LOCKED.
Phase E2 — Ground-Truth Sample Acquisition
Execution-phase work. Do not start until PROTOCOL.md Section 9.1 reads LOCKED.
Phase E3 — Mutation Generation
Execution-phase work. Do not start until PROTOCOL.md Section 9.1 reads LOCKED.
Phase E4 — Evaluation Pipeline
Execution-phase work. Do not start until PROTOCOL.md Section 9.1 reads LOCKED.
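The lock gate shared by Phases E1 through E4 could be enforced mechanically rather than by convention. The following is a minimal sketch, not part of the protocol: it assumes PROTOCOL.md records the status on a single line that begins with the section number 9.1 and contains the word LOCKED or UNLOCKED. That line format, and the function names `lock_status` and `may_start_execution`, are hypothetical and would need to be adapted to the real file.

```python
import re

# Assumed (hypothetical) line format in PROTOCOL.md, e.g.:
#   "9.1 Lock status: UNLOCKED"   or   "## 9.1 Lock status: LOCKED"
_LOCK_LINE = re.compile(
    r"^\s*#*\s*9\.1\b.*?\b(UNLOCKED|LOCKED)\b",
    re.MULTILINE,
)


def lock_status(protocol_text: str) -> str:
    """Return 'LOCKED', 'UNLOCKED', or 'UNKNOWN' for the Section 9.1 line."""
    match = _LOCK_LINE.search(protocol_text)
    return match.group(1) if match else "UNKNOWN"


def may_start_execution(protocol_text: str) -> bool:
    """Execution-phase work is allowed only when Section 9.1 reads LOCKED."""
    # UNKNOWN is treated like UNLOCKED: fail closed if the line is missing.
    return lock_status(protocol_text) == "LOCKED"
```

A pipeline entry point might read the protocol file (for example with `pathlib.Path("PROTOCOL.md").read_text()`) and exit early when `may_start_execution` returns `False`. Failing closed on a missing or unrecognized status line keeps an edited or malformed protocol from silently opening the gate.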
Phase A1 — Data Cleaning and Quality Checks
Analysis-phase work. Requires completed execution-phase artifacts.
Phase A2 — Statistical Analysis
Analysis-phase work. Must distinguish pre-registered confirmatory analyses from exploratory analyses.
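One way to keep the confirmatory and exploratory split explicit is to label each analysis by whether it maps to a pre-registered hypothesis. The sketch below is illustrative only: the `Analysis` record, the `label` function, and the hypothesis IDs such as "H1" are hypothetical and do not come from PROTOCOL.md.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Analysis:
    """One planned or ad hoc analysis (hypothetical record shape)."""
    name: str
    hypothesis_id: Optional[str] = None  # None means no pre-registered hypothesis


def label(analysis: Analysis, preregistered: FrozenSet[str]) -> str:
    """Label an analysis 'confirmatory' only if its hypothesis was pre-registered."""
    if analysis.hypothesis_id in preregistered:
        return "confirmatory"
    # Anything without a matching pre-registered hypothesis is exploratory.
    return "exploratory"
```

For instance, with `preregistered = frozenset({"H1", "H2"})`, an analysis tied to "H1" is labeled confirmatory, while one with no hypothesis ID, or with an ID added after lock, is labeled exploratory. Deriving the label from the locked hypothesis list, rather than setting it by hand, makes it harder to relabel an exploratory result after the fact.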
Phase A3 — Findings and Paper Draft
Publication-preparation work. Findings must map back to pre-registered questions and hypotheses.
Phase P1 — Internal Review
Review phase for protocol adherence, reproducibility, and responsible disclosure obligations.
Phase P2 — External or Community Review
Review phase for feedback from trusted reviewers before public release.
Phase P3 — Dataset and Artifact Release
Release phase for sanitized public artifacts and any restricted-access procedure.
Phase P4 — Publication
Final publication phase, including paper, citation metadata, DOI updates, and changelog closure.