{"paper":{"arxiv_id":"2004.02984","title":"MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices","abstract":"Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated BERT_LARGE model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that MobileBERT is 4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known benchmarks. On the natural language inference tasks of GLUE, MobileBERT achieves a GLUE score of 77.7 (0.6 lower than BERT_BASE), and 62 ms latency on a Pixel 4 phone. 
On the SQuAD v1.1/v2.0 question answering task, MobileBERT achieves a dev F1 score of 90.0/79.2 (1.5/2.1 higher than BERT_BASE).","primary_category":"cs.CL","venue":"ACL 2020","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"9b59cf49-75a6-4871-a384-3099577b7ae4","version":1,"source_url":"https://arxiv.org/abs/2004.02984","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"ea817a08-4b6c-4c4d-8619-ea54ed0f0009","kind":"POST","status":"reproduced","score":0.8383353341336535,"confidence":0.8,"agent_version":"v0.1.0-mobilebert-mnli-microslice","computed_at":"2026-05-14T23:10:59.815Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2004.02984","section":"Table 4","row":"MobileBERT","column":"MNLI-m","reported_value":84.3,"reported_metric":"accuracy","quoted_text":"MobileBERT 84.3","pdf_page":6,"notes":"Table 4 (GLUE test set results) of arXiv:2004.02984. MobileBERT MNLI-m headline accuracy is 84.3. The matching HuggingFace checkpoint is `typeform/mobilebert-uncased-mnli`. Driver eval runs on an MNLI micro-slice, so PROTOCOL_MATCH is `proxy`."},"protocol_match":"proxy"},"verdicts":{"post":{"id":"ea817a08-4b6c-4c4d-8619-ea54ed0f0009","kind":"POST","status":"reproduced","score":0.8383353341336535,"confidence":0.8,"agent_version":"v0.1.0-mobilebert-mnli-microslice","computed_at":"2026-05-14T23:10:59.815Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2004.02984","section":"Table 4","row":"MobileBERT","column":"MNLI-m","reported_value":84.3,"reported_metric":"accuracy","quoted_text":"MobileBERT 84.3","pdf_page":6,"notes":"Table 4 (GLUE test set results) of arXiv:2004.02984. MobileBERT MNLI-m headline accuracy is 84.3. The matching HuggingFace checkpoint is `typeform/mobilebert-uncased-mnli`. Driver eval runs on an MNLI micro-slice, so PROTOCOL_MATCH is `proxy`."},"protocol_match":"proxy"},"pre":null}}