50 verdicts on this page
Earlier (50)
PARTIAL
2 OLMo 2 Furious
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.6627
conf 0.55
REPRODUCED
RoBERTa: A Robustly Optimized BERT Pretraining Approach
· arXiv preprint · cs.CL
reported → reproduced: — → 0.9053
conf 0.85
REPRODUCED
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
· COLM 2024 · cs.LG
reported → reproduced: — → 34.8765
conf 0.65
REPRODUCED
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
· NeurIPS 2019 EMC^2 Workshop · cs.CL
reported → reproduced: — → 0.9150
conf 0.80
REPRODUCED
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
· NAACL 2019 · cs.CL
reported → reproduced: — → 0.9467
conf 0.80
PARTIAL
Stable LM 2 1.6B Technical Report
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.6064
conf 0.55
PARTIAL
OLMoE: Open Mixture-of-Experts Language Models
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.6466
conf 0.55
PARTIAL
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
· arXiv 2025 · cs.CL
reported → reproduced: — → 0.5482
conf 0.55
PARTIAL
SmolLM2: When Smol Goes Big — Data-Centric Training of a Small Language Model
· arXiv 2025 · cs.CL
reported → reproduced: — → 0.6345
conf 0.55
PARTIAL
Qwen2.5 Technical Report
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.5562
conf 0.55
REPRODUCED
Yi: Open Foundation Models by 01.AI
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.7077
conf 0.80
PARTIAL
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.6948
conf 0.55
PARTIAL
Searching for MobileNetV3
· ICCV 2019 · cs.CV
reported → reproduced: — → 0.9900
conf 0.50
REPRODUCED
Mistral 7B
· arXiv 2023 · cs.CL
reported → reproduced: — → 0.7980
conf 0.80
PARTIAL
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
· COLM 2024 · cs.CL
reported → reproduced: — → 0.2923
conf 0.55
PARTIAL
Gemma: Open Models Based on Gemini Research and Technology
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.6760
conf 0.60
PARTIAL
TinyLlama: An Open-Source Small Language Model
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.5710
conf 0.60
REPRODUCED
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
· ICLR 2023 · cs.CL
reported → reproduced: — → 0.9134
conf 0.85
REPRODUCED
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
· arXiv 2022 · cs.CL
reported → reproduced: — → 0.3433
conf 0.80
PARTIAL
Code Llama: Open Foundation Models for Code
· arXiv 2023 · cs.CL
reported → reproduced: — → 1.7093
conf 0.60
PARTIAL
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
· arXiv 2024 · cs.SE
reported → reproduced: — → 2.3338
conf 0.60
PARTIAL
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
· ICML 2023 · cs.CL
reported → reproduced: — → 0.6166
conf 0.60
REPRODUCED
OPT: Open Pre-trained Transformer Language Models
· arXiv 2022 · cs.CL
reported → reproduced: — → 0.5896
conf 0.80
PARTIAL
Swin Transformer V2: Scaling Up Capacity and Resolution
· CVPR 2022 · cs.CV
reported → reproduced: — → 0.9076
conf 0.55
REPRODUCED
XLNet: Generalized Autoregressive Pretraining for Language Understanding
· NeurIPS 2019 · cs.CL
reported → reproduced: — → 0.8721
conf 0.85
REPRODUCED
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
· EMNLP 2019 · cs.CL
reported → reproduced: — → 0.8496
conf 0.80
PARTIAL
StarCoder: may the source be with you!
· arXiv 2023 · cs.CL
reported → reproduced: — → 3.3910
conf 0.60
REPRODUCED
LoRA: Low-Rank Adaptation of Large Language Models
· ICLR 2022 · cs.CL
reported → reproduced: — → 0.8900
conf 0.80
PARTIAL
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
· ICML 2021 · cs.CV
reported → reproduced: — → 1.0000
conf 0.55
REPRODUCED
Qwen2 Technical Report
· arXiv 2024 · cs.CL
reported → reproduced: — → 0.5175
conf 0.80
PARTIAL
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
· ICML 2019 · cs.LG
reported → reproduced: — → 0.9859
conf 0.50
PARTIAL
The Falcon Series of Open Language Models
· arXiv 2023 · cs.CL
reported → reproduced: — → 0.7540
conf 0.60
PARTIAL
Textbooks Are All You Need II: phi-1.5 technical report
· arXiv 2023 · cs.CL
reported → reproduced: — → 0.7129
conf 0.60
PARTIAL
Big Bird: Transformers for Longer Sequences
· NeurIPS 2020 · cs.LG
reported → reproduced: — → 153.5130
conf 0.50
PARTIAL
Scaling Instruction-Finetuned Language Models
· arXiv 2022 · cs.LG
reported → reproduced: — → 0.4056
conf 0.60
PARTIAL
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
· ICML 2023 · cs.CV
reported → reproduced: — → 0.3039
conf 0.60
REPRODUCED
DINOv2: Learning Robust Visual Features without Supervision
· TMLR 2024 · cs.CV
reported → reproduced: — → 0.9967
conf 0.80
REPRODUCED
Emerging Properties in Self-Supervised Vision Transformers
· ICCV 2021 · cs.CV
reported → reproduced: — → 0.9733
conf 0.80
PENDING
A ConvNet for the 2020s
· CVPR 2022 · cs.CV
reported → reproduced: — → pending
conf —
PARTIAL
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
· ACL 2020 · cs.CL
reported → reproduced: — → 0.4214
conf 0.60
REPRODUCED
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
· ICLR 2021 · cs.CL
reported → reproduced: — → 0.9170
conf 0.85
REPRODUCED
Learning Transferable Visual Models From Natural Language Supervision
· ICML 2021 · cs.CV
reported → reproduced: — → 0.8867
conf 0.75
REPRODUCED
Robust Speech Recognition via Large-Scale Weak Supervision
· arXiv preprint (Whisper) · cs.CL
reported → reproduced: — → 0.0905
conf 0.75
REPRODUCED
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
· ICLR 2021 · cs.CV
reported → reproduced: — → 0.9767
conf 0.80
PARTIAL
Deep Residual Learning for Image Recognition
· CVPR 2016 · cs.CV
reported → reproduced: — → 0.4200
conf 0.45
REPRODUCED
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
· ICLR 2020 · cs.CL
reported → reproduced: — → 0.8767
conf 0.80
REPRODUCED
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
· JMLR 2020 · cs.LG
reported → reproduced: — → 0.8150
conf 0.80
REPRODUCED
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
· ICLR 2020 · cs.CL
reported → reproduced: — → 0.9067
conf 0.80
REPRODUCED
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
· ACL 2020 · cs.CL
reported → reproduced: — → 0.8383
conf 0.80
PENDING
Llama 2: Open Foundation and Fine-Tuned Chat Models
· arXiv preprint · cs.CL
reported → reproduced: — → pending
conf —
“WRONG” is a technical term: it means that a headline numerical claim did not reproduce in our reproduction job, within the published tolerance. Authors have a right of reply, displayed prominently.
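
As a rough illustration of the definition above, the check can be sketched as a comparison between a reported and a reproduced value. This page does not state how the tolerance is applied, so the absolute-difference rule, the function name `is_wrong`, and the example numbers below are all assumptions made for illustration, not the site's actual procedure.

```python
def is_wrong(reported: float, reproduced: float, tolerance: float) -> bool:
    """Hypothetical check: True when the reproduced value misses the
    reported value by more than the tolerance.

    Assumption: the tolerance is an absolute difference. The real
    definition may use a relative or per-metric tolerance instead.
    """
    return abs(reported - reproduced) > tolerance


# Made-up numbers, not taken from any verdict on this page.
print(is_wrong(reported=0.90, reproduced=0.80, tolerance=0.02))  # gap exceeds tolerance
print(is_wrong(reported=0.90, reproduced=0.91, tolerance=0.02))  # gap within tolerance
```

A relative tolerance (for example via `math.isclose` with `rel_tol`) would be an equally plausible reading; the sketch only shows the shape of the comparison.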