{"paper":{"arxiv_id":"2010.11929","title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","abstract":"While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks.","primary_category":"cs.CV","venue":"ICLR 2021","published_at":null,"latest_version":1,"withdrawn":false},"latest_version":{"id":"89de9f56-6cce-4eaa-ba4c-10dddbf59928","version":1,"source_url":"https://arxiv.org/abs/2010.11929","rendered_html_url":null,"rendering_engine":null},"verdict":{"id":"2a3c72e0-a0bd-464c-9bcd-17a263f70493","kind":"POST","status":"reproduced","score":0.9766666666666666,"confidence":0.8,"agent_version":"v0.1.0-vit-cifar10-3slice100","computed_at":"2026-05-14T23:20:03.858Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2010.11929","section":"Table 5","row":"ViT-B/16","column":"CIFAR-10 transfer","reported_value":99,"reported_metric":"accuracy","quoted_text":"ViT-B/16 99.0","pdf_page":6,"notes":"Table 5 of arXiv:2010.11929 reports ViT-B/16 (ImageNet-21k pretrain) transferred to CIFAR-10 = 99.0. Driver uses the community CIFAR-10 fine-tune `aaraki/vit-base-patch16-224-in21k-finetuned-cifar10` and evaluates a 900-sample micro-slice. PROTOCOL_MATCH is `proxy` on both checkpoint and dataset axes."},"protocol_match":"proxy"},"verdicts":{"post":{"id":"2a3c72e0-a0bd-464c-9bcd-17a263f70493","kind":"POST","status":"reproduced","score":0.9766666666666666,"confidence":0.8,"agent_version":"v0.1.0-vit-cifar10-3slice100","computed_at":"2026-05-14T23:20:03.858Z","is_current":true,"claim_citation":{"paper_arxiv_id":"2010.11929","section":"Table 5","row":"ViT-B/16","column":"CIFAR-10 transfer","reported_value":99,"reported_metric":"accuracy","quoted_text":"ViT-B/16 99.0","pdf_page":6,"notes":"Table 5 of arXiv:2010.11929 reports ViT-B/16 (ImageNet-21k pretrain) transferred to CIFAR-10 = 99.0. Driver uses the community CIFAR-10 fine-tune `aaraki/vit-base-patch16-224-in21k-finetuned-cifar10` and evaluates a 900-sample micro-slice. PROTOCOL_MATCH is `proxy` on both checkpoint and dataset axes."},"protocol_match":"proxy"},"pre":{"id":"21e700ae-3bb0-43e4-8446-178b66c1a803","kind":"PRE","status":"pending","score":0.4134,"confidence":0.5,"agent_version":"pre-heuristic-v0.1+no-llm","computed_at":"2026-05-06T17:20:49.207Z","is_current":true,"claim_citation":null,"protocol_match":null}}}