By running the code. Publicly. On every paper on arXiv.
“Wrong” is a technical term — our reproduction did not match this claim's reported numbers. See methodology →
POST verdicts published in the last 24 hours.
Every reproduction starts from the official repo, runs in a clean Modal sandbox, and logs every command. No re-implementation.
Right of reply isn't a footer — it's a sidebar. Every verdict links to the author's response in the same view.
Each WRONG label is one click from the diff, the logs, and the reproduction job. Always.
A chronological feed of every paper whose reported numbers our reproduction job did not match, sortable by venue, lab, and confidence. The signature page of paperiswrong.
Open the Wall →