What the validator does
- C1: Structural gate. Every driver must declare a typed claim citation: Table N, row X, column Y, reported value V, plus the literal quoted text and PDF page number. Drivers that don't measure the paper's exact protocol must declare protocol_match = "proxy" (microslice eval, community fine-tune, etc.) or "unknown" (measuring a metric the paper doesn't report). Only "exact" drivers can publish a public WRONG.
- C2: Textual gate. The validator fetches the cited paper's PDF, extracts text with pdfjs-dist, and verifies that the cited quoted text appears within ±200 characters of the cited reported value. Citations that don't match the PDF (made-up, mis-cited, or category-confused) are rejected at C2.
- Never upgrades. The validator can only downgrade. A driver that proposes not_reproduced with a non-exact protocol gets downgraded to partial (or pending for unknown protocols). It never goes the other way.
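The two gates can be sketched roughly as follows. This is a minimal illustration, not the code in src/lib/verdict-validator.ts or src/lib/claim-citation.ts: the ClaimCitation field names, both function names, and the reading of "within ±200 characters" (the whole quote must fall inside a window extending 200 characters to either side of an occurrence of the reported value) are assumptions made for the example. Text extraction with pdfjs-dist is left out; the functions take already-extracted text.

```typescript
// Hypothetical citation shape; field names are illustrative only.
type ProtocolMatch = "exact" | "proxy" | "unknown";

interface ClaimCitation {
  table: string;         // e.g. "Table 2"
  row: string;
  column: string;
  reportedValue: string; // the value as printed in the paper
  quotedText: string;    // literal text quoted from the PDF
  pdfPage: number;       // 1-based page number
  protocolMatch: ProtocolMatch;
}

// C1 (structural): every field must be present and well-formed.
function passesStructuralGate(c: Partial<ClaimCitation>): c is ClaimCitation {
  const nonEmpty = (s: unknown): boolean =>
    typeof s === "string" && s.trim().length > 0;
  return (
    nonEmpty(c.table) &&
    nonEmpty(c.row) &&
    nonEmpty(c.column) &&
    nonEmpty(c.reportedValue) &&
    nonEmpty(c.quotedText) &&
    typeof c.pdfPage === "number" &&
    Number.isInteger(c.pdfPage) &&
    c.pdfPage >= 1 &&
    (c.protocolMatch === "exact" ||
      c.protocolMatch === "proxy" ||
      c.protocolMatch === "unknown")
  );
}

// C2 (textual): the quoted text must sit near some occurrence of the
// reported value in the extracted text. Whitespace is collapsed first
// because PDF extraction tends to break lines unpredictably.
function passesTextualGate(
  extractedText: string,
  c: ClaimCitation,
  window = 200,
): boolean {
  const norm = (s: string): string => s.replace(/\s+/g, " ").trim();
  const text = norm(extractedText);
  const quote = norm(c.quotedText);
  const value = norm(c.reportedValue);
  if (quote.length === 0 || value.length === 0) return false;

  // Try every occurrence of the value, not just the first one.
  for (let v = text.indexOf(value); v !== -1; v = text.indexOf(value, v + 1)) {
    const start = Math.max(0, v - window);
    const end = Math.min(text.length, v + value.length + window);
    if (text.slice(start, end).includes(quote)) return true;
  }
  return false;
}
```

A citation that fails passesStructuralGate would never reach the textual check. One that passes C1 but quotes text far from the cited value, or text absent from the PDF, would be rejected by passesTextualGate.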
See src/lib/verdict-validator.ts and src/lib/claim-citation.ts for the implementation. The 7-WRONG retraction post-mortem lives at docs/red-team/2026-05-13-summary.md.
Live numbers
Counts below are live queries against the production database — every verdict currently visible on the platform.
Protocol-match breakdown
A driver's protocol-match tier tells you how close the reproduction is to the paper's exact protocol. Only exact can publish a public WRONG.
| Tier | Verdicts | Share | Can publish WRONG? |
|---|---|---|---|
| exact | 3 | 5.5% | Yes — gated by C2. |
| proxy | 34 | 61.8% | No — auto-downgraded to PARTIAL. |
| unknown | 13 | 23.6% | No — auto-downgraded to PENDING. |
| — | 5 | 9.1% | Pre-validator rows (legacy / not_attempted-only drivers). |
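The downgrade-only rule behind this table can be sketched as a small pure function. The function name and the Verdict union are assumptions for illustration; the sketch also assumes that a public WRONG corresponds to a proposed not_reproduced verdict and that all other proposed verdicts pass through unchanged.

```typescript
// Hypothetical verdict and tier names, mirroring the statuses on this page.
type Verdict =
  | "reproduced"
  | "partial"
  | "pending"
  | "not_reproduced"
  | "not_attempted";
type ProtocolMatch = "exact" | "proxy" | "unknown";

// Caps a proposed verdict by protocol tier. It can only downgrade:
// no input is ever mapped to not_reproduced unless it was proposed as such.
function gateVerdict(proposed: Verdict, protocol: ProtocolMatch): Verdict {
  if (proposed !== "not_reproduced") return proposed;
  if (protocol === "exact") return "not_reproduced"; // still subject to C2
  return protocol === "proxy" ? "partial" : "pending";
}
```

Under these assumptions, gateVerdict("not_reproduced", "proxy") yields "partial" and gateVerdict("not_reproduced", "unknown") yields "pending", matching the tiers above.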
Recent verdicts gated by the validator
Every row below has a populated claim_citation column — the driver passed through C1/C2 before the verdict landed.
| Paper | Status | Protocol | Run |
|---|---|---|---|
| 2 OLMo 2 Furious (2501.00656) | PARTIAL | unknown | 2026-05-15 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach (1907.11692) | REPRODUCED | proxy | 2026-05-15 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces (2312.00752) | REPRODUCED | proxy | 2026-05-15 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (1910.01108) | REPRODUCED | proxy | 2026-05-15 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (1810.04805) | REPRODUCED | exact | 2026-05-15 |
| Stable LM 2 1.6B Technical Report (2402.17834) | PARTIAL | unknown | 2026-05-15 |
| OLMoE: Open Mixture-of-Experts Language Models (2409.02060) | PARTIAL | unknown | 2026-05-15 |
| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2501.12948) | PARTIAL | unknown | 2026-05-15 |
| SmolLM2: When Smol Goes Big — Data-Centric Training of a Small Language Model (2502.02737) | PARTIAL | unknown | 2026-05-15 |
| Qwen2.5 Technical Report (2412.15115) | PARTIAL | unknown | 2026-05-15 |
What the validator prevents
The 2026-05-13 red-team rollup identified 7 then-public WRONG verdicts as false positives. Every one was a citation problem: the paper headline the driver was comparing against was wrong (made-up, mis-cited table, mis-cited row, or a category that the paper doesn't even report). The full audit lives at /legal/retractions. The validator's C1 and C2 gates are how the platform self-corrected: a driver cannot publish a WRONG today without a structurally typed citation that has been verified against the actual PDF.
Three sibling surfaces complete the self-correction quadrant: /anchors (runtime drift probe across the execution stack), /legal/retractions (historical false positives, append-only log), and /skipped (refusal transparency — papers paperiswrong deliberately did not reproduce).
For developers
Every reproduction driver under scripts/run-reproduction-*.ts is required by a CI lint (tests/unit/scripts/validator-wiring-lint.test.ts) to wire the validator. Adding a new driver without a citation fails the build. Drivers that legitimately never publish a non-not_attempted verdict (closed-weights papers, retracted reproductions, image generation) are on an explicit allowlist with a written justification.
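A wiring lint of this kind could look roughly like the sketch below. It is not the contents of tests/unit/scripts/validator-wiring-lint.test.ts: the function name, the AllowlistEntry shape, and the idea of detecting wiring by searching each driver's source for the string "verdict-validator" are all assumptions. The function takes file contents as a plain map so the check stays independent of the file system.

```typescript
// Hypothetical allowlist entry: a driver path plus its written justification.
interface AllowlistEntry {
  file: string;
  justification: string;
}

// Returns the driver files that neither reference the validator nor appear
// on the allowlist with a non-empty justification.
function findUnwiredDrivers(
  sources: Record<string, string>, // path -> file contents
  allowlist: AllowlistEntry[],
  marker = "verdict-validator",    // assumed import-path fragment
): string[] {
  const allowed = new Set(
    allowlist
      .filter((entry) => entry.justification.trim().length > 0)
      .map((entry) => entry.file),
  );
  return Object.entries(sources)
    .filter(([file]) => /^scripts\/run-reproduction-.*\.ts$/.test(file))
    .filter(([file, text]) => !allowed.has(file) && !text.includes(marker))
    .map(([file]) => file)
    .sort();
}
```

A test would then assert that the returned list is empty. An allowlist entry with a blank justification is treated as absent here, so the driver it names is still reported.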
The public API exposes the citation and protocol-match columns on every verdict at /api/v1/verdicts and /api/v1/papers/:arxivId. The on-page ClaimCitationCard renders the same data inline with every verdict on /p/[arxivId].
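For client code, the two endpoint paths can be built with small helpers like the ones below. The helper names and the baseUrl parameter are assumptions, since the host is not stated here. The response schema is not specified on this page either, so the sketch stops at URL construction.

```typescript
// Builds the list endpoint URL. baseUrl is a placeholder for the real host.
function verdictsUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/v1/verdicts`;
}

// Builds the per-paper endpoint URL. The arXiv ID is percent-encoded so it
// always occupies a single path segment.
function paperUrl(baseUrl: string, arxivId: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/v1/papers/${encodeURIComponent(arxivId)}`;
}
```

The result can be passed to fetch or any other HTTP client. Callers should inspect the actual response to find the citation and protocol-match fields instead of assuming their names.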