Agents — paperiswrong

Roster

I run your code so you don't have to.

Plans and runs the reproduction itself. Reads the paper, the repo, and the README, proposes one small experiment that touches a headline number, runs it in a hermetic Modal sandbox, and emits a structured finding. Has no DB access — a separate Verdict Validator owns the write.

Claude Opus 4.7Read persona →

Predictor

Reproduction triage (PRD §16.2)

I guess whether your paper will replicate. I'm calibrated, not certain.

Runs before the Auditor — given the paper's abstract, methods section, and prior reproductions of similar work, it emits a calibrated probability that the headline number will reproduce. The forecast is logged but never published as a verdict; it's used to triage which papers the platform spends GPU on next.

Claude Sonnet 4.6Read persona →

Reviewer

Pre-publication review (PRD §16.4)

I predict your reviews. I'm not your reviewer.

Drafts a peer-review-style summary of the paper's claims and likely weaknesses, conditioned on the venue's review criteria. Outputs go to the paper's `/p/<arxivId>` page under a clearly-labeled `predicted-review` section. Predictions are recorded against the eventual official reviews where available so the agent can be calibrated over time.

Claude Opus 4.7Read persona →

Reader

Paper navigation (PRD §16.5)

Ask me anything about this paper.

Interactive paper Q&A. Reads the full PDF + supplement and answers questions in the on-page chat widget, with citations back into the PDF. Bounded to the current paper — Reader cannot fetch other papers, cannot publish to the platform, and cannot modify state.

Claude Haiku 4.5Read persona →

Notifier

Author outreach (PRD §16.3, §17.X.4)

I draft the email so a human doesn't have to.

Drafts the corresponding-author notification email when a reproduction is about to publish. Critically, it never sends — it composes the text, fills the variables (verdict, evidence, dispute link), and hands the draft to the operator. The 72-hour pre-publication notice is a legal-architecture requirement (PRD §17.X.1).

Claude Haiku 4.5Read persona →

The agent contract

No direct DB access. Every agent emits a structured propose_finding payload; the Verdict Validator service owns every write into the verdicts and reproduction_jobs tables.
Untrusted content gets wrapped. Every string from a repo, paper, or sandbox stdout is fed to an LLM inside <untrusted_repo_content>...</untrusted_repo_content> with a system-prompt gate that refuses to follow embedded instructions.
Forbidden-words guard. Agent copy is scanned at build time against a 13-word anti-defamation glossary (the list is intests/unit/legal.test.ts) per PRD §17.X. The build fails if any glossary word appears in shipped agent output.
Versioned, attributed, evidence-bound. Every verdict carries the agent's version (e.g. v0.1.0-bert-mnli-microslice) so a reader can identify exactly which prompt produced which output. Every verdict also pairs to a structured claim citation so the paper headline being compared against is machine-verifiable.

Agents we deliberately don't have

A “closer” agent that emails authors and negotiates retraction language. An “arbiter” agent that decides between two conflicting reproductions. A “promoter” agent that drafts social-media copy for verdicts. None of these are on the roster, and none will be. The platform's authority comes from the evidence behind each verdict — adding agents whose job is to convince rather than measurewould dilute that authority. See PRD §3 #3 (“the platform is for the data, not the discourse”).