{"paper":{"arxiv_id":"2412.15115","title":"Qwen2.5 Technical Report","abstract":"In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. Open-weight offerings include base and instruction-tuned models, with quantized versions available.","primary_category":"cs.CL","venue":"arXiv 2024","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"17a45169-2b38-4d0a-8858-57903ee9b7ca","version":1,"source_url":"https://arxiv.org/abs/2412.15115","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"a623fa57-5b71-423c-9dd1-617b10b7efd3","kind":"POST","status":"partial","score":0.5562248995983935,"confidence":0.55,"agent_version":"v0.1.0-qwen25-winogrande-microslice","computed_at":"2026-05-15T16:17:44.682Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2412.15115","section":"Results","row":"Qwen2.5-0.5B-Instruct","column":"MMLU 5-shot","reported_value":47.5,"reported_metric":"accuracy","quoted_text":"47.5","pdf_page":8,"notes":"Qwen2.5 paper (arXiv:2412.15115) reports Qwen2.5-0.5B-Instruct MMLU 5-shot ~ 47.5 in its results tables. Driver measures WinoGrande zero-shot on `Qwen/Qwen2.5-0.5B-Instruct` instead — paper does not report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"verdicts":{"post":{"id":"a623fa57-5b71-423c-9dd1-617b10b7efd3","kind":"POST","status":"partial","score":0.5562248995983935,"confidence":0.55,"agent_version":"v0.1.0-qwen25-winogrande-microslice","computed_at":"2026-05-15T16:17:44.682Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2412.15115","section":"Results","row":"Qwen2.5-0.5B-Instruct","column":"MMLU 5-shot","reported_value":47.5,"reported_metric":"accuracy","quoted_text":"47.5","pdf_page":8,"notes":"Qwen2.5 paper (arXiv:2412.15115) reports Qwen2.5-0.5B-Instruct MMLU 5-shot ~ 47.5 in its results tables. Driver measures WinoGrande zero-shot on `Qwen/Qwen2.5-0.5B-Instruct` instead — paper does not report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"pre":null}}