Counters
- Reproductions run: 186 (every POST verdict in the database)
- Papers indexed: 11k (arXiv ingest)
- Verdicts (24h): 0 (POST verdicts with computed_at > now − 24h)
- Reproduce rate: 75%, computed as (REPRODUCED + 0.5·PARTIAL) / total current decisive POST
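The reproduce-rate formula above can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: it assumes the input has already been filtered down to current decisive POST verdicts (the page does not define how that filter works), and the function name and sample labels are hypothetical.

```python
from collections import Counter

# Weights taken from the formula above. Any label other than REPRODUCED
# or PARTIAL is assumed to contribute 0 to the numerator.
WEIGHTS = {"REPRODUCED": 1.0, "PARTIAL": 0.5}


def reproduce_rate(labels):
    """Return (REPRODUCED + 0.5 * PARTIAL) / total, or None if there are no verdicts.

    `labels` is assumed to hold one label per current decisive POST verdict.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return None  # avoid dividing by zero when nothing has been scored
    score = sum(WEIGHTS.get(label, 0.0) * n for label, n in counts.items())
    return score / total


# Hypothetical sample, not platform data: 5 REPRODUCED, 2 PARTIAL, 1 WRONG.
sample = ["REPRODUCED"] * 5 + ["PARTIAL"] * 2 + ["WRONG"]
print(reproduce_rate(sample))  # (5 + 0.5 * 2) / 8 = 0.75
```

A PARTIAL verdict counts as half a reproduction, so a set containing only PARTIAL verdicts scores 50%.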
Trust signals
The headline trust signal isn't the reproduce rate. It's how the platform handles its own errors.
- 7 retractions on file — read the full log (every retracted verdict preserved with date, reason, and audit thread). The 2026-05-13 rollup retraction of all seven then-public WRONG verdicts is the credibility test the platform actively chose to take.
- Every reproduction driver attaches a typed citation to the paper claim it's comparing against — see the audit thread at /legal/retractions/2026-05-13 for what happens without one.
- The headline verdict label WRONG is a technical term: “our reproduction did not match this paper's reported numbers.” See /methodology/wrong.
Recent verdicts (10)
- 2 OLMo 2 Furious
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Stable LM 2 1.6B Technical Report
- OLMoE: Open Mixture-of-Experts Language Models
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- SmolLM2: When Smol Goes Big — Data-Centric Training of a Small Language Model
- Qwen2.5 Technical Report
Programmatic access
Every number above is reachable through the public REST API:
- GET /api/v1/verdicts?limit=100: paginated verdict listing with claim_citation + protocol_match on every row.
- GET /api/v1/papers/<arxivId>: per-paper detail including the most-recent POST + PRE verdicts.
- GET /api/v1/health: health check.
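As a rough sketch of how a client might call these endpoints, the Python below builds the three URLs and fetches one of them. Only the paths and the limit parameter come from the list above. The host in BASE_URL is a placeholder, the helper names are hypothetical, and the code assumes responses are JSON, which this page does not state.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Placeholder host: substitute the platform's real base URL.
BASE_URL = "https://example.org"


def verdicts_url(limit=100, base=BASE_URL):
    """URL for the paginated verdict listing."""
    return f"{base}/api/v1/verdicts?{urlencode({'limit': limit})}"


def paper_url(arxiv_id, base=BASE_URL):
    """URL for per-paper detail; the arXiv id is percent-encoded ('/' is kept)."""
    return f"{base}/api/v1/papers/{quote(arxiv_id)}"


def health_url(base=BASE_URL):
    """URL for the health check."""
    return f"{base}/api/v1/health"


def fetch_json(url, timeout=10):
    """GET a URL and parse the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example usage (requires network access and a real BASE_URL):
#   listing = fetch_json(verdicts_url(limit=100))
#   paper = fetch_json(paper_url("1810.04805"))
```

Keeping URL construction separate from the network call makes the request paths easy to check without contacting the server.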