{"paper":{"arxiv_id":"2307.09288","title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","abstract":"In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models.","primary_category":"cs.CL","venue":"arXiv preprint","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"7eea894e-978b-4bc4-a47d-8b65c5e5316d","version":1,"source_url":"https://arxiv.org/abs/2307.09288","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"2f6c0cd7-d4a1-48f0-a495-b1c54b5d4776","kind":"POST","status":"not_attempted","score":null,"confidence":null,"agent_version":"v0.1.0-llama2-hellaswag-microslice","computed_at":"2026-05-14T22:40:41.389Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2307.09288","section":"Table 3","row":"Llama 2 7B","column":"HellaSwag 0-shot","reported_value":77.2,"reported_metric":"accuracy","quoted_text":"Llama 2 7B 77.2","pdf_page":9,"notes":"Table 3 of arXiv:2307.09288 reports Llama 2 7B HellaSwag 0-shot = 77.2. Driver evaluates the same `meta-llama/Llama-2-7b-hf` checkpoint on a HellaSwag micro-slice. PROTOCOL_MATCH is `proxy` (dataset-size)."},"protocol_match":"proxy"},"verdicts":{"post":{"id":"2f6c0cd7-d4a1-48f0-a495-b1c54b5d4776","kind":"POST","status":"not_attempted","score":null,"confidence":null,"agent_version":"v0.1.0-llama2-hellaswag-microslice","computed_at":"2026-05-14T22:40:41.389Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2307.09288","section":"Table 3","row":"Llama 2 7B","column":"HellaSwag 0-shot","reported_value":77.2,"reported_metric":"accuracy","quoted_text":"Llama 2 7B 77.2","pdf_page":9,"notes":"Table 3 of arXiv:2307.09288 reports Llama 2 7B HellaSwag 0-shot = 77.2. Driver evaluates the same `meta-llama/Llama-2-7b-hf` checkpoint on a HellaSwag micro-slice. PROTOCOL_MATCH is `proxy` (dataset-size)."},"protocol_match":"proxy"},"pre":null}}