{"paper":{"arxiv_id":"2409.02060","title":"OLMoE: Open Mixture-of-Experts Language Models","abstract":"We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, define and analyze new routing properties showing high specialization in our model, and open-source all our work: model weights, training data, code, and logs.","primary_category":"cs.CL","venue":"arXiv 2024","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"d86596e7-17c7-4ad9-aadf-c6591c3e051f","version":1,"source_url":"https://arxiv.org/abs/2409.02060","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"3b503726-e995-4557-8009-b6d5119c6e44","kind":"POST","status":"partial","score":0.6465863453815262,"confidence":0.55,"agent_version":"v0.1.0-olmoe-winogrande-microslice","computed_at":"2026-05-15T17:51:44.730Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2409.02060","section":"Table 4","row":"OLMoE-1B-7B-Instruct","column":"MMLU 5-shot","reported_value":54.1,"reported_metric":"accuracy","quoted_text":"54.1","pdf_page":8,"notes":"OLMoE paper (arXiv:2409.02060) Table 4 reports OLMoE-1B-7B-Instruct MMLU 5-shot ~ 54.1. Driver measures WinoGrande zero-shot on `allenai/OLMoE-1B-7B-0125-Instruct` instead — paper doesn't report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"verdicts":{"post":{"id":"3b503726-e995-4557-8009-b6d5119c6e44","kind":"POST","status":"partial","score":0.6465863453815262,"confidence":0.55,"agent_version":"v0.1.0-olmoe-winogrande-microslice","computed_at":"2026-05-15T17:51:44.730Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2409.02060","section":"Table 4","row":"OLMoE-1B-7B-Instruct","column":"MMLU 5-shot","reported_value":54.1,"reported_metric":"accuracy","quoted_text":"54.1","pdf_page":8,"notes":"OLMoE paper (arXiv:2409.02060) Table 4 reports OLMoE-1B-7B-Instruct MMLU 5-shot ~ 54.1. Driver measures WinoGrande zero-shot on `allenai/OLMoE-1B-7B-0125-Instruct` instead — paper doesn't report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"pre":null}}