{"paper":{"arxiv_id":"2501.00656","title":"2 OLMo 2 Furious","abstract":"We present OLMo 2, the next generation of our fully open language models. OLMo 2 includes dense autoregressive models with improved architecture and training recipe, pretraining data mixtures, and instruction tuning recipes. Our modified model architecture and training recipe achieve both better training stability and improved per-token efficiency. Our updated pretraining data mixture introduces a new, specialized data mix called Dolmino Mix 1124, which significantly improves model capabilities across many downstream task benchmarks when introduced via late-stage curriculum training. Finally, we incorporate best practices from Tülu 3 to develop OLMo 2-Instruct, focusing on permissive data and extending our final-stage reinforcement learning with verifiable rewards (RLVR). Our OLMo 2 base models sit at the Pareto frontier of performance to compute, often matching or outperforming open-weight only models like Llama 3.1 and Qwen 2.5 while using fewer FLOPs and with fully transparent training data, code, and recipe.","primary_category":"cs.CL","venue":"arXiv 2024","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"96be666d-baff-46e7-83c8-0434a4f246c7","version":1,"source_url":"https://arxiv.org/abs/2501.00656","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"f13205f6-aa84-42df-9ef6-4427659aca0e","kind":"POST","status":"partial","score":0.6626506024096385,"confidence":0.55,"agent_version":"v0.1.0-olmo2-winogrande-microslice","computed_at":"2026-05-15T19:56:36.638Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2501.00656","section":"Table 8","row":"OLMo-2-7B-Instruct","column":"MMLU 5-shot","reported_value":61.2,"reported_metric":"accuracy","quoted_text":"61.2","pdf_page":16,"notes":"OLMo 2 paper (arXiv:2501.00656) Table 8 reports OLMo-2-7B-Instruct MMLU 5-shot ~ 61.2. Driver measures WinoGrande zero-shot on `allenai/OLMo-2-1124-7B-Instruct` instead — paper does not report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"verdicts":{"post":{"id":"f13205f6-aa84-42df-9ef6-4427659aca0e","kind":"POST","status":"partial","score":0.6626506024096385,"confidence":0.55,"agent_version":"v0.1.0-olmo2-winogrande-microslice","computed_at":"2026-05-15T19:56:36.638Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2501.00656","section":"Table 8","row":"OLMo-2-7B-Instruct","column":"MMLU 5-shot","reported_value":61.2,"reported_metric":"accuracy","quoted_text":"61.2","pdf_page":16,"notes":"OLMo 2 paper (arXiv:2501.00656) Table 8 reports OLMo-2-7B-Instruct MMLU 5-shot ~ 61.2. Driver measures WinoGrande zero-shot on `allenai/OLMo-2-1124-7B-Instruct` instead — paper does not report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"pre":null}}