{"paper":{"arxiv_id":"1910.01108","title":"DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter","abstract":"As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster.","primary_category":"cs.CL","venue":"NeurIPS 2019 EMC^2 Workshop","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"bce656f2-1426-4416-98d9-5a4313107b83","version":1,"source_url":"https://arxiv.org/abs/1910.01108","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"74115808-6f35-468f-a4d6-2088eed2ddf4","kind":"POST","status":"reproduced","score":0.915,"confidence":0.8,"agent_version":"v0.1.0-distilbert-sst2-microslice","computed_at":"2026-05-15T19:19:17.785Z","is_current":true,"claim_citation":{"paper_arxiv_id":"1910.01108","section":"Table 2","row":"DistilBERT","column":"SST-2","reported_value":91.3,"reported_metric":"accuracy","quoted_text":"DistilBERT 77.0 51.3 91.3 85.5 59.9 86.9 56.1 89.2","pdf_page":4,"notes":"Table 2 (Development set results on the dev sets of the GLUE benchmark). Numbers are accuracy on SST-2 dev. The matching HuggingFace checkpoint is `distilbert-base-uncased-finetuned-sst-2-english`."},"protocol_match":"proxy"},"verdicts":{"post":{"id":"74115808-6f35-468f-a4d6-2088eed2ddf4","kind":"POST","status":"reproduced","score":0.915,"confidence":0.8,"agent_version":"v0.1.0-distilbert-sst2-microslice","computed_at":"2026-05-15T19:19:17.785Z","is_current":true,"claim_citation":{"paper_arxiv_id":"1910.01108","section":"Table 2","row":"DistilBERT","column":"SST-2","reported_value":91.3,"reported_metric":"accuracy","quoted_text":"DistilBERT 77.0 51.3 91.3 85.5 59.9 86.9 56.1 89.2","pdf_page":4,"notes":"Table 2 (Development set results on the dev sets of the GLUE benchmark). Numbers are accuracy on SST-2 dev. The matching HuggingFace checkpoint is `distilbert-base-uncased-finetuned-sst-2-english`."},"protocol_match":"proxy"},"pre":null}}