Validator — paperiswrong

What the validator does

Pins one reviewed claim. A trusted policy identifies the exact arXiv paper, PDF page, table header, row, column, printed value, metric, and full quoted row. A loose number or sentence fragment is not enough.
Checks the real PDF locator. The validator extracts text from the cited PDF with pdfjs-dist, then requires the reviewed page, header, row, column position, quote, and value to agree. Values copied from a sibling row or column fail.
Binds the measured outcome. The v3 receipt records the job, paper version, agent version, protocol tier, target value, measured value, supported outcome, policy hash, citation hash, and PDF hash. It validates only while all of those fields still match.
Fails closed. Missing policy, changed locator, incomplete context, unsupported numeric outcome, or unreadable PDF becomes pending with no public score or citation. A validated proxy reproduction is shown conservatively as PARTIAL, and a proxy can never publish WRONG. Only a receipt-backed exact protocol can support NOT REPRODUCED.
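The locator agreement check can be sketched as follows. This is a minimal illustration, not the code in src/lib/claim-citation.ts: it assumes the PDF text has already been extracted (for example with pdfjs-dist) and grouped into table rows, and the `TableRow` and `ClaimLocator` shapes and the `checkLocator` function are hypothetical. The value must sit at the reviewed column position of the reviewed row, under the reviewed header, on the reviewed page.

```typescript
// Hypothetical shapes; the real validator's types may differ.
interface TableRow {
  page: number;     // 1-based PDF page the row was extracted from
  header: string[]; // column headers of the table this row belongs to
  cells: string[];  // cell text for this row, in column order
}

interface ClaimLocator {
  page: number;
  rowLabel: string;    // first-cell label identifying the reviewed row
  column: string;      // reviewed column header
  columnIndex: number; // reviewed column position (0-based)
  value: string;       // printed value, e.g. "71.3"
  quote: string;       // full quoted row as reviewed
}

type LocatorResult = { ok: true } | { ok: false; reason: string };

// Collapse whitespace so PDF extraction spacing does not cause false mismatches.
const norm = (s: string): string => s.replace(/\s+/g, " ").trim();

function checkLocator(rows: TableRow[], loc: ClaimLocator): LocatorResult {
  const onPage = rows.filter((r) => r.page === loc.page);
  if (onPage.length === 0) return { ok: false, reason: "page not found" };

  const row = onPage.find((r) => norm(r.cells[0] ?? "") === norm(loc.rowLabel));
  if (!row) return { ok: false, reason: "row not found on page" };

  // The header at the reviewed position must be the reviewed column.
  if (norm(row.header[loc.columnIndex] ?? "") !== norm(loc.column)) {
    return { ok: false, reason: "column header does not match position" };
  }
  // The printed value must sit in exactly that cell, not in a sibling cell.
  if (norm(row.cells[loc.columnIndex] ?? "") !== norm(loc.value)) {
    return { ok: false, reason: "value not at reviewed row and column" };
  }
  // The quoted row must match the extracted row text in full.
  if (norm(row.cells.join(" ")) !== norm(loc.quote)) {
    return { ok: false, reason: "quote does not match extracted row" };
  }
  return { ok: true };
}
```

With a row `Ours 71.3 55.0` above a sibling row `Baseline 68.2 48.1`, a locator for Ours / MMLU / 71.3 passes, while the same locator carrying 68.2 (sibling row) or 55.0 (sibling column) fails.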

See src/lib/verdict-validator.ts and src/lib/claim-citation.ts for the implementation. The 7-WRONG retraction post-mortem lives at docs/red-team/2026-05-13-summary.md.
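A minimal sketch of how a receipt can bind its fields, assuming the policy, citation, and PDF are each hashed with SHA-256. The `Receipt` and `CurrentContext` shapes, their field names, and the outcome labels are hypothetical, not the claim-outcome-v3 schema, and tier-based downgrades are left out. The idea it illustrates: the receipt is re-checked against the current inputs, so any drift or missing input yields PENDING rather than a stale score.

```typescript
import { createHash } from "node:crypto";

// Hypothetical receipt shape; mirrors the fields listed above, not the real schema.
interface Receipt {
  jobId: string;
  paperVersion: string;
  agentVersion: string;
  tier: "exact" | "proxy" | "unknown";
  targetValue: number;
  measuredValue: number;
  outcome: "REPRODUCED" | "PARTIAL" | "NOT_REPRODUCED";
  policyHash: string;
  citationHash: string;
  pdfHash: string;
}

// What the validator can observe right now for the same claim.
interface CurrentContext {
  jobId: string;
  paperVersion: string;
  agentVersion: string;
  policy?: string;   // serialized trusted policy; undefined if missing
  citation?: string; // serialized citation locator; undefined if missing
  pdf?: Uint8Array;  // PDF bytes; undefined if unreadable
}

const sha256 = (data: string | Uint8Array): string =>
  createHash("sha256").update(data).digest("hex");

// True only if every field bound into the receipt still matches the current inputs.
function receiptStillValid(r: Receipt, ctx: CurrentContext): boolean {
  if (ctx.policy === undefined || ctx.citation === undefined || ctx.pdf === undefined) {
    return false; // missing policy, citation, or readable PDF: fail closed
  }
  if (!Number.isFinite(r.targetValue) || !Number.isFinite(r.measuredValue)) {
    return false; // unsupported numeric outcome: fail closed
  }
  return (
    r.jobId === ctx.jobId &&
    r.paperVersion === ctx.paperVersion &&
    r.agentVersion === ctx.agentVersion &&
    r.policyHash === sha256(ctx.policy) &&
    r.citationHash === sha256(ctx.citation) &&
    r.pdfHash === sha256(ctx.pdf)
  );
}

// Anything that does not validate is shown as PENDING, with no score or citation.
function publicStatus(r: Receipt | undefined, ctx: CurrentContext): string {
  if (!r || !receiptStillValid(r, ctx)) return "PENDING";
  return r.outcome;
}
```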

Live numbers

Counts reflect current publicly visible POST rows. Rows without a public decisive result are separate from receipt-backed claim results.

Current POST verdicts: publicly visible rows with is_current=true.

Receipt-backed results: no decisive claim results are currently public.

No public decisive result: pending, not attempted, or failed closed.

Public WRONG results: current and receipt-backed.
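The counts above can be sketched as a single pass over rows. This is a minimal sketch: the `PostRow` shape and its status values are hypothetical, and it assumes a decisive row whose receipt no longer validates is counted under "no public decisive result", consistent with the fail-closed rule.

```typescript
// Hypothetical row shape for a POST verdict; real column names may differ.
type RowStatus = "decisive" | "pending" | "not_attempted" | "failed_closed";

interface PostRow {
  isCurrent: boolean;    // mirrors is_current
  publiclyVisible: boolean;
  status: RowStatus;
  receiptValid: boolean; // current claim-outcome receipt still validates
  verdict?: "WRONG" | "PARTIAL" | "REPRODUCED";
}

interface LiveCounts {
  currentPost: number;      // "Current POST verdicts"
  receiptBacked: number;    // "Receipt-backed results"
  noDecisiveResult: number; // "No public decisive result"
  publicWrong: number;      // "Public WRONG results"
}

function liveCounts(rows: PostRow[]): LiveCounts {
  const counts: LiveCounts = { currentPost: 0, receiptBacked: 0, noDecisiveResult: 0, publicWrong: 0 };
  for (const row of rows) {
    // Only current, publicly visible rows are counted.
    if (!row.isCurrent || !row.publiclyVisible) continue;
    counts.currentPost += 1;
    if (row.status === "decisive" && row.receiptValid) {
      counts.receiptBacked += 1;
      if (row.verdict === "WRONG") counts.publicWrong += 1;
    } else {
      // Pending, not attempted, failed closed, or a decisive row whose receipt no longer validates.
      counts.noDecisiveResult += 1;
    }
  }
  return counts;
}
```

Under these assumptions, receiptBacked plus noDecisiveResult always equals currentPost, and publicWrong can never exceed receiptBacked.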

Protocol-match breakdown

Counts cover all current POST rows, including pending and not-attempted rows. A tier describes the intended protocol; it does not mean the claim passed receipt validation. Only the exact tier can publish a public WRONG.

Tier	Verdicts	Share	Can publish WRONG?
exact	3	5.6%	Yes — with a valid claim-outcome-v3 receipt.
proxy	34	63.0%	No — auto-downgraded to PARTIAL.
unknown	13	24.1%	No — auto-downgraded to PENDING.
—	4	7.4%	No; no protocol tier is recorded (usually older or not-attempted rows).
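The tier rules and the Share column can be sketched together. The `gateVerdict` and `shares` helpers are hypothetical; the sketch assumes a validated proxy result is always shown as PARTIAL, an unknown or missing tier is shown as PENDING, and any row without a valid receipt fails closed to PENDING regardless of tier.

```typescript
type Tier = "exact" | "proxy" | "unknown" | null; // null: no tier recorded
type Shown = "WRONG" | "PARTIAL" | "PENDING" | "REPRODUCED";

// Hypothetical gate: what a measured verdict may be shown as, per tier.
function gateVerdict(tier: Tier, measured: Shown, hasValidV3Receipt: boolean): Shown {
  if (!hasValidV3Receipt) return "PENDING"; // fail closed before looking at the tier
  if (tier === "exact") return measured;    // only exact can surface WRONG
  if (tier === "proxy") return "PARTIAL";   // proxy results are shown conservatively
  return "PENDING";                         // unknown or missing tier
}

// Share of each tier, rounded to one decimal place as in the table.
function shares(counts: Record<string, number>): Record<string, string> {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const out: Record<string, string> = {};
  for (const [tier, n] of Object.entries(counts)) {
    out[tier] = total === 0 ? "0.0%" : `${((n / total) * 100).toFixed(1)}%`;
  }
  return out;
}
```

Applied to the counts in the table, `shares({ exact: 3, proxy: 34, unknown: 13, none: 4 })` gives 5.6%, 63.0%, 24.1%, and 7.4% of 54 rows. The rounded shares add up to 100.1%; the extra 0.1 is rounding, not a miscount.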

Recent receipt-backed claim results

Every row below carries a current, citation-bound claim_citation.validation receipt binding the reviewed PDF locator and measured outcome.

No receipt-backed claim results are currently public. Pending and not-attempted rows do not expose unchecked scores or citations.

What the validator prevents

The 2026-05-13 red-team rollup identified 8 then-public WRONG verdicts as false positives. Every one was a citation problem: the paper headline value the driver compared against was wrong (fabricated, cited from the wrong table or row, or for a category the paper does not report at all). The full audit lives at /legal/retractions. Receipt-bound claim checks are how the platform self-corrected: a decisive claim cannot appear today unless its citation, PDF locator, job context, measurement, and outcome all validate together.

Three sibling surfaces complete the self-correction quadrant: /anchors (runtime drift probe across the execution stack), /legal/retractions (historical false positives, append-only log), and /skipped (refusal transparency — papers paperiswrong deliberately did not reproduce).

For developers

Every reproduction driver under scripts/run-reproduction-*.ts is required by a CI lint (tests/unit/scripts/validator-wiring-lint.test.ts) to wire the validator. Adding a new driver without a citation fails the build. Drivers that legitimately never publish any verdict other than not_attempted (closed-weights papers, retracted reproductions, image generation) are on an explicit allowlist. CI also proves each executable exemption is hard-disabled before credentials, Modal, or database code can run.
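The wiring lint can be approximated as below. This is a hypothetical sketch, not the contents of tests/unit/scripts/validator-wiring-lint.test.ts: it assumes a driver counts as wired when it imports from a module path ending in `verdict-validator`, and it does not reproduce the separate check that exempt drivers are hard-disabled.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical wiring marker; the real lint may check something stricter.
const VALIDATOR_IMPORT = /from\s+["'][^"']*verdict-validator["']/;
const DRIVER_NAME = /^run-reproduction-.*\.ts$/;

// Load driver sources from a directory such as scripts/.
function loadDrivers(dir: string): Map<string, string> {
  const files = new Map<string, string>();
  for (const name of readdirSync(dir)) {
    if (DRIVER_NAME.test(name)) files.set(name, readFileSync(join(dir, name), "utf8"));
  }
  return files;
}

// Return the drivers that neither wire the validator nor appear on the allowlist.
function lintDrivers(files: Map<string, string>, allowlist: Set<string>): string[] {
  const failures: string[] = [];
  files.forEach((source, name) => {
    if (!DRIVER_NAME.test(name)) return;        // not a reproduction driver
    if (allowlist.has(name)) return;            // explicit exemption
    if (!VALIDATOR_IMPORT.test(source)) failures.push(name);
  });
  return failures.sort();
}
```

A CI test would call `lintDrivers(loadDrivers("scripts"), allowlist)` and fail when the returned list is non-empty.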

The public API exposes receipt-gated claim results at /api/v1/claims and receipt coverage at /api/v1/validator. The on-page ClaimCitationCard renders the same data inline with every verdict on /p/[arxivId].
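A minimal client sketch for the claims endpoint, assuming it returns a top-level JSON array over a plain unauthenticated GET. The `ClaimResult` shape (`arxivId`, `verdict`, `receiptValid`) and the base URL argument are hypothetical; check the actual responses before relying on any field. The parser drops anything that does not look like a receipt-backed claim, mirroring the page's fail-closed stance.

```typescript
// Hypothetical response item; the real /api/v1/claims payload may differ.
interface ClaimResult {
  arxivId: string;
  verdict: string;
  receiptValid: boolean;
}

function endpoint(base: string, path: "/api/v1/claims" | "/api/v1/validator"): string {
  return new URL(path, base).toString();
}

// Defensive parse: keep only well-formed, receipt-backed items.
function parseClaims(body: unknown): ClaimResult[] {
  if (!Array.isArray(body)) return [];
  return body.filter(
    (x): x is ClaimResult =>
      typeof x === "object" &&
      x !== null &&
      typeof (x as ClaimResult).arxivId === "string" &&
      typeof (x as ClaimResult).verdict === "string" &&
      (x as ClaimResult).receiptValid === true,
  );
}

// Requires a runtime with global fetch (for example Node 18+).
async function fetchClaims(base: string): Promise<ClaimResult[]> {
  const res = await fetch(endpoint(base, "/api/v1/claims"));
  if (!res.ok) throw new Error(`GET /api/v1/claims failed: ${res.status}`);
  return parseClaims(await res.json());
}
```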