{"paper":{"arxiv_id":"2309.05463","title":"Textbooks Are All You Need II: phi-1.5 technical report","abstract":"We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate ``textbook quality'' data as a way to enhance the learning process compared to traditional web data. We follow the ``Textbooks Are All You Need'' approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to ``think step by step'' or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.","primary_category":"cs.CL","venue":"arXiv 2023","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"41a3c959-29a9-48f1-91a6-17cc4e04474b","version":1,"source_url":"https://arxiv.org/abs/2309.05463","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"d6a35880-1264-4a30-b6b1-721b8a3f370b","kind":"POST","status":"partial","score":0.7128514056224899,"confidence":0.6,"agent_version":"v0.1.0-phi-winogrande-microslice","computed_at":"2026-05-14T23:31:48.994Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2309.05463","section":"Table 3","row":"phi-1.5","column":"WinoGrande 0-shot","reported_value":73.4,"reported_metric":"accuracy","quoted_text":"phi-1.5 73.4","pdf_page":5,"notes":"Table 3 of arXiv:2309.05463 reports phi-1.5 (1.3B params) zero-shot WinoGrande = 73.4. Driver evaluates the same `microsoft/phi-1_5` checkpoint on a WinoGrande micro-slice. PROTOCOL_MATCH is `proxy` (dataset-size)."},"protocol_match":"proxy"},"verdicts":{"post":{"id":"d6a35880-1264-4a30-b6b1-721b8a3f370b","kind":"POST","status":"partial","score":0.7128514056224899,"confidence":0.6,"agent_version":"v0.1.0-phi-winogrande-microslice","computed_at":"2026-05-14T23:31:48.994Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2309.05463","section":"Table 3","row":"phi-1.5","column":"WinoGrande 0-shot","reported_value":73.4,"reported_metric":"accuracy","quoted_text":"phi-1.5 73.4","pdf_page":5,"notes":"Table 3 of arXiv:2309.05463 reports phi-1.5 (1.3B params) zero-shot WinoGrande = 73.4. Driver evaluates the same `microsoft/phi-1_5` checkpoint on a WinoGrande micro-slice. PROTOCOL_MATCH is `proxy` (dataset-size)."},"protocol_match":"proxy"},"pre":null}}