Quiet day · no new POST verdicts in 24h

We tell you which ML papers are wrong.

By running the code. Publicly. On every paper on arXiv.

“Wrong” is a technical term: our reproduction did not match the numbers reported for a claim. See methodology →
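As a rough illustration of that definition, the sketch below labels a single claim by comparing its reported number with the reproduced one. The function name `label_claim`, the `MATCH` label, and the 1% relative tolerance are assumptions made for this example; the actual matching rule is whatever the methodology page specifies.

```python
import math
from typing import Optional


def label_claim(reported: float,
                reproduced: Optional[float],
                rel_tol: float = 0.01) -> str:
    """Label one claim (hypothetical helper, not the site's real rule).

    rel_tol is an assumed relative tolerance; the real threshold is
    defined by the methodology, not by this sketch.
    """
    # No reproduced number yet: the verdict stays pending.
    if reproduced is None:
        return "pending"
    # Numbers agree within the assumed tolerance.
    if math.isclose(reported, reproduced, rel_tol=rel_tol):
        return "MATCH"
    # Otherwise the reproduction did not match the reported number.
    return "WRONG"
```

A tolerance is used here because reruns rarely match reported figures to the last digit; how wide it should be is a methodological choice this sketch does not settle.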

Try: 1810.04805 2312.00752 2407.21783

21k papers indexed

186 reproductions run

0 public WRONG since the validator landed · 0 of 54 verdicts claim-validated · 0/10 anchors healthy

Latest verdicts

Current POST verdicts, most recent first.

See the Wall of Wrong →

2 OLMo 2 Furious

· arXiv 2024 · cs.CL

reported → reproduced— → pending

RoBERTa: A Robustly Optimized BERT Pretraining Approach

· arXiv preprint · cs.CL

reported → reproduced— → pending

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

· COLM 2024 · cs.LG

reported → reproduced— → pending

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

· NeurIPS 2019 EMC^2 Workshop · cs.CL

reported → reproduced— → pending

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

· NAACL 2019 · cs.CL

reported → reproduced— → pending

Stable LM 2 1.6B Technical Report

· arXiv 2024 · cs.CL

reported → reproduced— → pending

OLMoE: Open Mixture-of-Experts Language Models

· arXiv 2024 · cs.CL

reported → reproduced— → pending

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

· arXiv 2025 · cs.CL

reported → reproduced— → pending

SmolLM2: When Smol Goes Big — Data-Centric Training of a Small Language Model

· arXiv 2025 · cs.CL

reported → reproduced— → pending

Qwen2.5 Technical Report

· arXiv 2024 · cs.CL

reported → reproduced— → pending

We run your code

Every reproduction starts from the official repo, runs in a clean Modal sandbox, and logs every command. No re-implementation.
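One way to picture "logs every command" is a runner that records each command together with its exit code and output. The following is a minimal sketch built on Python's standard `subprocess` module, not the Modal-based pipeline itself; `run_logged` and the log record fields are hypothetical names chosen for the example.

```python
import subprocess
import sys
from typing import Dict, List


def run_logged(commands: List[List[str]]) -> List[Dict]:
    """Run commands in order and keep a record of each one.

    Hypothetical helper: stops at the first failing command so the
    log ends exactly where the reproduction broke.
    """
    log: List[Dict] = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        log.append({
            "cmd": cmd,                       # the exact argv that was run
            "returncode": result.returncode,  # 0 means success
            "stdout": result.stdout,
            "stderr": result.stderr,
        })
        if result.returncode != 0:
            break
    return log


if __name__ == "__main__":
    # Uses the current interpreter so the example runs anywhere.
    for entry in run_logged([[sys.executable, "-c", "print('ok')"]]):
        print(entry["cmd"], entry["returncode"])
```

Keeping the full argv and both output streams for each step is what lets a reader replay a job from the log alone.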

Authors get the last word

Right of reply isn't a footer — it's a sidebar. Every verdict links to the author's response in the same view.

Evidence in one click

Each WRONG label is one click from the diff, the logs, and the reproduction job. Always.

The Wall of Wrong

A chronological feed of every paper whose reported numbers our reproduction job did not match, sortable by venue, lab, and confidence. The signature page of paperiswrong.

Open the Wall →