BrittleBench Research Plan

This document is the navigational outline for the full project. The canonical, pre-registerable research protocol is ../PROTOCOL.md.

Phase R1 — Problem Definition

Reference: PROTOCOL.md Section 1

  • R1.1 Problem statement — COMPLETED in PROTOCOL.md Section 1.1.
  • R1.2 Why this matters — COMPLETED in PROTOCOL.md Section 1.2.
  • R1.3 Prior work survey — COMPLETED in PROTOCOL.md Section 1.3.
  • R1.4 Gap this study fills — COMPLETED in PROTOCOL.md Section 1.4.
  • R1.5 Scope boundaries — COMPLETED in PROTOCOL.md Section 1.5.

Phase R2 — Research Questions

Reference: PROTOCOL.md Section 2

  • R2.1 Primary research question — COMPLETED in PROTOCOL.md Section 2.1.
  • R2.2 Secondary research questions — COMPLETED in PROTOCOL.md Section 2.2.
  • R2.3 Question hierarchy — COMPLETED in PROTOCOL.md Section 2.3.
  • R2.4 Falsifiability check — COMPLETED in PROTOCOL.md Section 2.4.

Phase R3 — Hypotheses

Reference: PROTOCOL.md Section 3

  • R3.1 Hypotheses per research question — COMPLETED in PROTOCOL.md Section 3.1.
  • R3.2 Null hypotheses — COMPLETED in PROTOCOL.md Section 3.2.
  • R3.3 Expected effect sizes — COMPLETED in PROTOCOL.md Section 3.3.
  • R3.4 Pre-registered predictions — COMPLETED in PROTOCOL.md Section 3.4.

Phase R4 — Definitions and Operationalization

Reference: PROTOCOL.md Section 4

  • R4.1 Detection rule definition — COMPLETED in PROTOCOL.md Section 4.1.
  • R4.2 Functional equivalence definition — COMPLETED in PROTOCOL.md Section 4.2.
  • R4.3 Robustness score definition — COMPLETED in PROTOCOL.md Section 4.3.
  • R4.4 Brittleness pattern definition — COMPLETED in PROTOCOL.md Section 4.4.
  • R4.5 Unit of analysis — COMPLETED in PROTOCOL.md Section 4.5.

Phase R5 — Methodology Design

Reference: PROTOCOL.md Section 5

  • R5.1 Methodological approach — COMPLETED in PROTOCOL.md Section 5.1.
  • R5.2 Sampling strategy — COMPLETED in PROTOCOL.md Section 5.2.
  • R5.3 Independent variables — COMPLETED in PROTOCOL.md Section 5.3.
  • R5.4 Dependent variables — COMPLETED in PROTOCOL.md Section 5.4.
  • R5.5 Control variables — COMPLETED in PROTOCOL.md Section 5.5.
  • R5.6 Confounders and mitigation — COMPLETED in PROTOCOL.md Section 5.6.
  • R5.7 Statistical methods — COMPLETED in PROTOCOL.md Section 5.7.
  • R5.8 Estimation precision plan — COMPLETED in PROTOCOL.md Section 5.8.

Phase R6 — Evidence and Validation

Reference: PROTOCOL.md Section 6

  • R6.1 Evidence standards — COMPLETED in PROTOCOL.md Section 6.1.
  • R6.2 Internal validity threats and mitigations — COMPLETED in PROTOCOL.md Section 6.2.
  • R6.3 External validity and generalizability — COMPLETED in PROTOCOL.md Section 6.3.
  • R6.4 Construct validity — COMPLETED in PROTOCOL.md Section 6.4.
  • R6.5 Reliability strategy — COMPLETED in PROTOCOL.md Section 6.5.

Phase R7 — Threats to Validity

Reference: PROTOCOL.md Section 7 and threats-to-validity.md

  • R7.1 Conclusion validity threats — COMPLETED in PROTOCOL.md Section 7.1 and threats-to-validity.md.
  • R7.2 Internal validity threats — COMPLETED in PROTOCOL.md Section 7.2 and threats-to-validity.md.
  • R7.3 Construct validity threats — COMPLETED in PROTOCOL.md Section 7.3 and threats-to-validity.md.
  • R7.4 External validity threats — COMPLETED in PROTOCOL.md Section 7.4 and threats-to-validity.md.
  • R7.5 Ethical validity threats — COMPLETED in PROTOCOL.md Section 7.5 and threats-to-validity.md.
  • R7.6 Replication threats — COMPLETED in PROTOCOL.md Section 7.6 and threats-to-validity.md.

Phase R8 — Ethics and Responsible Research

Reference: PROTOCOL.md Section 8

  • R8.1 Defender benefit greater than attacker benefit — COMPLETED in PROTOCOL.md Section 8.1.
  • R8.2 Disclosure approach — COMPLETED in PROTOCOL.md Section 8.2.
  • R8.3 No novel-attack policy — COMPLETED in PROTOCOL.md Section 8.3.
  • R8.4 Dataset sanitization — COMPLETED in PROTOCOL.md Section 8.4.
  • R8.5 Tone policy — COMPLETED in PROTOCOL.md Section 8.5.

Phase R9 — Protocol Status

Reference: PROTOCOL.md Section 9

  • R9.1 Lock status — COMPLETED; currently UNLOCKED pending final review and pre-registration.
  • R9.2 Lock date — COMPLETED; no date set yet, to be recorded when the protocol is locked.
  • R9.3 Falsification criteria — COMPLETED in PROTOCOL.md Section 9.3.
  • R9.4 Public pre-registration link — COMPLETED; pre-registration not yet submitted, URL to be inserted on lock.

Phase E1 — Corpus Collection

Execution-phase work. Do not start until PROTOCOL.md Section 9.1 reads LOCKED.

Phase E2 — Ground-Truth Sample Acquisition

Execution-phase work. Do not start until PROTOCOL.md Section 9.1 reads LOCKED.

Phase E3 — Mutation Generation

Execution-phase work. Do not start until PROTOCOL.md Section 9.1 reads LOCKED.

Phase E4 — Evaluation Pipeline

Execution-phase work. Do not start until PROTOCOL.md Section 9.1 reads LOCKED.
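
The lock gate shared by phases E1 through E4 could be enforced mechanically rather than by convention. The following is a minimal sketch, not part of the protocol itself: the function name `protocol_is_locked` is hypothetical, and it assumes the status word (LOCKED or UNLOCKED) appears on the same line as the "9.1" section label in PROTOCOL.md, which this outline does not confirm.

```python
import re

# Assumption: the Section 9.1 line carries the status word on the same line,
# e.g. "9.1 Lock status: UNLOCKED". The real PROTOCOL.md layout may differ.
_SECTION_9_1 = re.compile(r"(?<![\d.])9\.1(?!\d)")
_STATUS = re.compile(r"\b(UNLOCKED|LOCKED)\b")


def protocol_is_locked(protocol_text: str) -> bool:
    """Return True only if a Section 9.1 line reads LOCKED.

    Hypothetical helper: missing or ambiguous status is treated as not locked,
    so execution-phase work fails closed.
    """
    for line in protocol_text.splitlines():
        if not _SECTION_9_1.search(line):
            continue
        match = _STATUS.search(line)
        if match:
            # The alternation matches whole words only, so "UNLOCKED"
            # is never mistaken for "LOCKED".
            return match.group(1) == "LOCKED"
    return False
```

An execution-phase script might call this on the contents of PROTOCOL.md and exit early when it returns False, so that no corpus collection or mutation generation starts before the lock.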

Phase A1 — Data Cleaning and Quality Checks

Analysis-phase work. Requires completed execution-phase artifacts.

Phase A2 — Statistical Analysis

Analysis-phase work. Must distinguish pre-registered confirmatory analyses from exploratory analyses.

Phase A3 — Findings and Paper Draft

Publication-preparation work. Findings must map back to pre-registered questions and hypotheses.

Phase P1 — Internal Review

Review phase for protocol adherence, reproducibility, and responsible disclosure obligations.

Phase P2 — External or Community Review

Review phase for feedback from trusted reviewers before public release.

Phase P3 — Dataset and Artifact Release

Release phase for sanitized public artifacts and any restricted-access procedure.

Phase P4 — Publication

Final publication phase, including paper, citation metadata, DOI updates, and changelog closure.