{"paper":{"arxiv_id":"2502.02737","title":"SmolLM2: When Smol Goes Big — Data-Centric Training of a Small Language Model","abstract":"Large Language Models (LLMs) have come to dominate machine learning, but their accessibility and deployment remain challenging due to their scale. The emergence of small language models (SLMs) presents a promising direction, offering capable performance with reduced compute requirements. In this paper, we introduce SmolLM2, a state-of-the-art small (1.7B parameter) language model trained on ~11 trillion tokens of data using a multi-stage training process that mixes web text with specialized math, code, and instruction-following data. We employ a manual rebalancing process to inform stage transitions and create new specialized datasets (FineMath, Stack-Edu, and SmolTalk) at the stages where existing datasets were problematically small or low-quality. SmolLM2 demonstrates superior performance compared to other recent small LMs including Qwen2.5-1.5B and Llama3.2-1B. To facilitate future research on LM development as well as applications of small LMs, we release both SmolLM2 as well as the datasets we prepared in the course of this project.","primary_category":"cs.CL","venue":"arXiv 2025","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"3b9655e7-3a42-4b7f-840b-e75b1938a52a","version":1,"source_url":"https://arxiv.org/abs/2502.02737","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"626b641c-21df-4405-bb07-cd1e5f5f95bd","kind":"POST","status":"partial","score":0.6345381526104418,"confidence":0.55,"agent_version":"v0.1.0-smollm2-winogrande-microslice","computed_at":"2026-05-15T16:33:10.626Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2502.02737","section":"Table 4","row":"SmolLM2-1.7B-Instruct","column":"MMLU 5-shot","reported_value":50.8,"reported_metric":"accuracy","quoted_text":"50.8","pdf_page":8,"notes":"SmolLM2 paper (arXiv:2502.02737) reports SmolLM2-1.7B-Instruct MMLU 5-shot ~ 50.8 in its final-stage evaluation tables. Driver measures WinoGrande zero-shot on `HuggingFaceTB/SmolLM2-1.7B-Instruct` instead — paper does not report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"verdicts":{"post":{"id":"626b641c-21df-4405-bb07-cd1e5f5f95bd","kind":"POST","status":"partial","score":0.6345381526104418,"confidence":0.55,"agent_version":"v0.1.0-smollm2-winogrande-microslice","computed_at":"2026-05-15T16:33:10.626Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2502.02737","section":"Table 4","row":"SmolLM2-1.7B-Instruct","column":"MMLU 5-shot","reported_value":50.8,"reported_metric":"accuracy","quoted_text":"50.8","pdf_page":8,"notes":"SmolLM2 paper (arXiv:2502.02737) reports SmolLM2-1.7B-Instruct MMLU 5-shot ~ 50.8 in its final-stage evaluation tables. Driver measures WinoGrande zero-shot on `HuggingFaceTB/SmolLM2-1.7B-Instruct` instead — paper does not report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"pre":null}}