Before / after
Before — original verdict
WRONG
agent_version
v0.1.0-xlm-r-xnli-zeroshot
After — current verdict
PARTIAL
The reproduction was re-run under a corrected protocol and its verdict is now PARTIAL. See the paper page for the live evidence and claim citation.
Why the original verdict was incorrect
The driver loaded joeddav/xlm-roberta-large-xnli, a checkpoint fine-tuned on XNLI-train plus MNLI. The paper's 89.1 English-XNLI headline is a zero-shot transfer number from a model fine-tuned on English MNLI only, so the driver was measuring a different model. The rolled-back row records the result under the correct zero-shot protocol as PARTIAL.
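The mismatch described above can be illustrated with a small pre-flight check. This is only a sketch under stated assumptions: `Protocol`, `protocol_violations`, and the dataset labels are hypothetical names introduced here, not part of the actual driver, and the fine-tuning sets are declared by hand rather than read from any model hub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """Fine-tuning data that a claim's protocol permits (hypothetical structure)."""
    name: str
    allowed_finetune_sets: frozenset


def protocol_violations(declared_finetune_sets, protocol):
    """Return the fine-tuning sets a checkpoint used that the protocol does not allow."""
    return sorted(set(declared_finetune_sets) - protocol.allowed_finetune_sets)


# The headline protocol as described above: fine-tuned on English MNLI only.
ZERO_SHOT = Protocol("zero-shot-from-mnli", frozenset({"mnli"}))

# Declared by hand for illustration; labels mirror "XNLI-train plus MNLI".
loaded_checkpoint_sets = ["xnli-train", "mnli"]

violations = protocol_violations(loaded_checkpoint_sets, ZERO_SHOT)
if violations:
    print(f"protocol mismatch: checkpoint was also fine-tuned on {violations}")
```

A check of this shape would stop a driver before evaluation rather than after a verdict is published, but it depends on someone declaring the checkpoint's training data accurately.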
Evidence trail
- Audit thread — long-form post-mortem covering all seven 2026-05-13 retractions, including this one.
- Rollback PR — the GitHub pull request that landed the corrected driver and flipped the verdict row to is_current=false.
- All public retractions — the append-only retraction log under PRD §17.X.8(d).
- Verdict Validator — the C1/C2 gates that prevent this class of mistake from shipping again.
What changed structurally
The 2026-05-13 retraction rollup landed two structural fixes so the citation-side failure that caused the original incorrect verdict cannot ship the same way again:
- Typed claim citation per verdict. Every reproduction driver now declares a structured CLAIM_CITATION(Table, row, column, reported value, quoted text, PDF page) before its Modal job runs. The original verdict on this paper was published against a non-citable headline; that path is now closed by the build-failing validator-wiring lint.
- PDF-verified textual gate. The Verdict Validator fetches the cited paper's PDF and checks that the quoted text appears within ±200 characters of the cited reported value. Made-up, mis-cited, or category-confused citations fail this gate and the verdict is auto-downgraded.
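The two fixes above can be sketched roughly as follows. This is not the Verdict Validator's code: `ClaimCitation`, `passes_textual_gate`, and the sample strings are hypothetical, PDF fetching and text extraction are left out, and "within ±200 characters" is interpreted here as the whole quote falling inside a window that extends 200 characters to either side of an occurrence of the reported value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimCitation:
    """Hypothetical typed citation mirroring the CLAIM_CITATION fields."""
    table: str
    row: str
    column: str
    reported_value: str
    quoted_text: str
    pdf_page: int


def passes_textual_gate(citation, page_text, window=200):
    """True if quoted_text lies within `window` chars of some occurrence of reported_value."""
    quote, value = citation.quoted_text, citation.reported_value
    if not quote or not value:
        return False
    start = page_text.find(value)
    while start != -1:
        lo = max(0, start - window)
        hi = min(len(page_text), start + len(value) + window)
        if quote in page_text[lo:hi]:
            return True
        # Try the next occurrence of the reported value on the page.
        start = page_text.find(value, start + 1)
    return False


# Placeholder citation; only the value 89.1 comes from the text above.
citation = ClaimCitation(
    table="Table N", row="placeholder row", column="placeholder column",
    reported_value="89.1", quoted_text="placeholder quote about accuracy",
    pdf_page=1,
)

near_page = "intro text. placeholder quote about accuracy: 89.1 on the test set."
far_page = "placeholder quote about accuracy" + "." * 500 + "89.1"

print(passes_textual_gate(citation, near_page))  # quote sits next to the value
print(passes_textual_gate(citation, far_page))   # quote is 500 characters away
```

A proximity check like this catches quotes that exist somewhere in the paper but do not belong to the cited number. It cannot tell whether the number is reported under the protocol the driver actually ran.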