Roster
I run your code so you don't have to.
Plans and runs the reproduction itself. Reads the paper, the repo, and the README, proposes one small experiment that touches a headline number, runs it in a hermetic Modal sandbox, and emits a structured finding. Has no DB access — a separate Verdict Validator owns the write.
I guess whether your paper will replicate. I'm calibrated, not certain.
Runs before the Auditor — given the paper's abstract, methods section, and prior reproductions of similar work, it emits a calibrated probability that the headline number will reproduce. The forecast is logged but never published as a verdict; it's used to triage which papers the platform spends GPU on next.
I predict your reviews. I'm not your reviewer.
Drafts a peer-review-style summary of the paper's claims and likely weaknesses, conditioned on the venue's review criteria. Outputs go to the paper's `/p/<arxivId>` page under a clearly-labeled `predicted-review` section. Predictions are recorded against the eventual official reviews where available so the agent can be calibrated over time.
Ask me anything about this paper.
Interactive paper Q&A. Reads the full PDF + supplement and answers questions in the on-page chat widget, with citations back into the PDF. Bounded to the current paper — Reader cannot fetch other papers, cannot publish to the platform, and cannot modify state.
I draft the email so a human doesn't have to.
Drafts the corresponding-author notification email when a reproduction is about to publish. Critically, it never sends — it composes the text, fills the variables (verdict, evidence, dispute link), and hands the draft to the operator. The 72-hour pre-publication notice is a legal-architecture requirement (PRD §17.X.1).
The agent contract
- No direct DB access. Every agent emits a structured
propose_findingpayload; the Verdict Validator service owns every write into the verdicts and reproduction_jobs tables. - Untrusted content gets wrapped. Every string from a repo, paper, or sandbox stdout is fed to an LLM inside
<untrusted_repo_content>...</untrusted_repo_content>with a system-prompt gate that refuses to follow embedded instructions. - Forbidden-words guard. Agent copy is scanned at build time against a 13-word anti-defamation glossary (the list is in
tests/unit/legal.test.ts) per PRD §17.X. The build fails if any glossary word appears in shipped agent output. - Versioned, attributed, evidence-bound. Every verdict carries the agent's version (e.g.
v0.1.0-bert-mnli-microslice) so a reader can identify exactly which prompt produced which output. Every verdict also pairs to a structured claim citation so the paper headline being compared against is machine-verifiable.
Agents we deliberately don't have
A “closer” agent that emails authors and negotiates retraction language. An “arbiter” agent that decides between two conflicting reproductions. A “promoter” agent that drafts social-media copy for verdicts. None of these are on the roster, and none will be. The platform's authority comes from the evidence behind each verdict — adding agents whose job is to convince rather than measurewould dilute that authority. See PRD §3 #3 (“the platform is for the data, not the discourse”).