{"paper":{"arxiv_id":"2402.17834","title":"Stable LM 2 1.6B Technical Report","abstract":"We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this report, we present the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT-Bench fine-tuning benchmark. At the time of publishing this report, StableLM 2 1.6B was the state-of-the-art open model under 2B parameters by a significant margin. Given its appealing small size, we also provide thorough analysis of various characteristics important for safe and reliable deployment.","primary_category":"cs.CL","venue":"arXiv 2024","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"3304c4fd-c818-4771-9359-b40a3c1aaea1","version":1,"source_url":"https://arxiv.org/abs/2402.17834","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"c87ffd32-9207-4da4-b040-5806c8c2bcba","kind":"POST","status":"partial","score":0.606425702811245,"confidence":0.55,"agent_version":"v0.1.0-stablelm2-winogrande-microslice","computed_at":"2026-05-15T18:26:56.768Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2402.17834","section":"Table 3","row":"StableLM-2-1.6B-Chat","column":"MMLU 5-shot","reported_value":38.8,"reported_metric":"accuracy","quoted_text":"38.8","pdf_page":9,"notes":"Stable LM 2 1.6B Technical Report (arXiv:2402.17834) Table 3 reports StableLM-2-1.6B-Chat MMLU 5-shot ~ 38.8. Driver measures WinoGrande zero-shot on `stabilityai/stablelm-2-1_6b-chat` instead — paper does not report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"verdicts":{"post":{"id":"c87ffd32-9207-4da4-b040-5806c8c2bcba","kind":"POST","status":"partial","score":0.606425702811245,"confidence":0.55,"agent_version":"v0.1.0-stablelm2-winogrande-microslice","computed_at":"2026-05-15T18:26:56.768Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2402.17834","section":"Table 3","row":"StableLM-2-1.6B-Chat","column":"MMLU 5-shot","reported_value":38.8,"reported_metric":"accuracy","quoted_text":"38.8","pdf_page":9,"notes":"Stable LM 2 1.6B Technical Report (arXiv:2402.17834) Table 3 reports StableLM-2-1.6B-Chat MMLU 5-shot ~ 38.8. Driver measures WinoGrande zero-shot on `stabilityai/stablelm-2-1_6b-chat` instead — paper does not report comparable zero-shot WinoGrande. PROTOCOL_MATCH = `unknown` because the metric measured differs from the metric cited. Validator C1 gate prevents publication of WRONG regardless of measurement."},"protocol_match":"unknown"},"pre":null}}