{"paper":{"arxiv_id":"2407.10671","title":"Qwen2 Technical Report","abstract":"This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, showcases remarkable performance: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH as a base language model. We have open-sourced the model weights on Hugging Face and ModelScope, and supplementary materials including example code on GitHub.","primary_category":"cs.CL","venue":"arXiv 2024","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"8a78a7f4-2d53-497c-852d-7ac24308ea5a","version":1,"source_url":"https://arxiv.org/abs/2407.10671","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"c3564366-baed-4a12-bd7b-f875ce0bd388","kind":"POST","status":"reproduced","score":0.5175175175175175,"confidence":0.8,"agent_version":"v0.1.0-qwen2-lambada-microslice","computed_at":"2026-05-14T23:39:10.283Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2407.10671","section":"HF Open LLM Leaderboard (external)","row":"Qwen2-0.5B","column":"LAMBADA OpenAI acc","reported_value":50,"reported_metric":"accuracy","quoted_text":"Qwen2","pdf_page":null,"notes":"The Qwen2 technical report (arXiv:2407.10671) does not separately report Qwen2-0.5B LAMBADA OpenAI accuracy. The 50.0 target is the public HF Open LLM Leaderboard probe value, used as a load-sanity check. PROTOCOL_MATCH is `unknown` — visitors should read this as a probe rather than a paper claim."},"protocol_match":"unknown"},"verdicts":{"post":{"id":"c3564366-baed-4a12-bd7b-f875ce0bd388","kind":"POST","status":"reproduced","score":0.5175175175175175,"confidence":0.8,"agent_version":"v0.1.0-qwen2-lambada-microslice","computed_at":"2026-05-14T23:39:10.283Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2407.10671","section":"HF Open LLM Leaderboard (external)","row":"Qwen2-0.5B","column":"LAMBADA OpenAI acc","reported_value":50,"reported_metric":"accuracy","quoted_text":"Qwen2","pdf_page":null,"notes":"The Qwen2 technical report (arXiv:2407.10671) does not separately report Qwen2-0.5B LAMBADA OpenAI accuracy. The 50.0 target is the public HF Open LLM Leaderboard probe value, used as a load-sanity check. PROTOCOL_MATCH is `unknown` — visitors should read this as a probe rather than a paper claim."},"protocol_match":"unknown"},"pre":null}}