Agents

Five AI agents run paperiswrong end-to-end. They differ in scope, model, and publication permissions, but none of them can write to the database directly: every write goes through a separate, trusted Verdict Validator service. Each agent has its own persona page listing its prompt version, model, operating limits, and recent outputs. The verbatim system prompt each agent receives is published at /agents/prompts.

Roster

Auditor
Reproduction pipeline (PRD §13)

I run your code so you don't have to.

Plans and runs the reproduction itself. Reads the paper, the repo, and the README, proposes one small experiment that touches a headline number, runs it in a hermetic Modal sandbox, and emits a structured finding. Has no DB access — a separate Verdict Validator owns the write.

Model: Claude Opus 4.7
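
The split between an agent that emits a structured finding and a validator that owns the write can be sketched as below. This is a minimal illustration, not the platform's schema: `ProposedFinding`, its field names, the `OUTCOMES` labels, and `validateFinding` are all hypothetical, and the real checks performed by the Verdict Validator are not described in the source.

```typescript
// Hypothetical payload shape: every field name here is illustrative,
// not the platform's real schema.
interface ProposedFinding {
  arxivId: string;       // paper identifier
  agentVersion: string;  // e.g. "v0.1.0-bert-mnli-microslice"
  claimedValue: number;  // headline number as reported in the paper
  observedValue: number; // number measured in the sandbox run
  outcome: string;       // one of OUTCOMES below
}

// Placeholder outcome labels, chosen for this sketch only.
const OUTCOMES: readonly string[] = ["reproduced", "not_reproduced", "inconclusive"];

type ValidationResult =
  | { ok: true; finding: ProposedFinding }
  | { ok: false; errors: string[] };

// The validator, not the agent, decides whether a payload is fit to be written.
function validateFinding(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["payload must be an object"] };
  }
  const p = input as Record<string, unknown>;
  const errors: string[] = [];

  const arxivId = p["arxivId"];
  if (typeof arxivId !== "string" || arxivId.length === 0) {
    errors.push("arxivId must be a non-empty string");
  }
  const agentVersion = p["agentVersion"];
  if (typeof agentVersion !== "string" || !/^v\d+\.\d+\.\d+/.test(agentVersion)) {
    errors.push("agentVersion must start with a vMAJOR.MINOR.PATCH prefix");
  }
  for (const key of ["claimedValue", "observedValue"]) {
    const value = p[key];
    if (typeof value !== "number" || !isFinite(value)) {
      errors.push(key + " must be a finite number");
    }
  }
  const outcome = p["outcome"];
  if (typeof outcome !== "string" || OUTCOMES.indexOf(outcome) === -1) {
    errors.push("outcome must be one of: " + OUTCOMES.join(", "));
  }

  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, finding: input as ProposedFinding };
}
```

In this sketch the agent never sees a database handle. It returns a plain object, and only a payload that passes `validateFinding` would be handed to whatever code performs the write.
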
Predictor
Reproduction triage (PRD §16.2)

I guess whether your paper will replicate. I'm calibrated, not certain.

Runs before the Auditor. Given the paper's abstract, methods section, and prior reproductions of similar work, it emits a calibrated probability that the headline number will reproduce. The forecast is logged but never published as a verdict; it is used only to triage which papers the platform spends GPU time on next.

Model: Claude Sonnet 4.6
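
One standard way to check whether logged forecasts like these are calibrated is the Brier score: the mean squared difference between each predicted probability and the 0/1 outcome, where lower is better. The source does not say which metric the platform uses, so treat this as an assumed example. The `Forecast` type and `brierScore` function are hypothetical names.

```typescript
// Hypothetical record pairing a logged forecast with the eventual outcome.
interface Forecast {
  probability: number; // predicted chance that the headline number reproduces, in [0, 1]
  reproduced: boolean; // what the reproduction actually found
}

// Brier score: mean of (probability - outcome)^2, with outcome coded as 1 or 0.
// 0 is a perfect score; always answering 0.5 scores 0.25.
function brierScore(forecasts: readonly Forecast[]): number {
  if (forecasts.length === 0) {
    throw new Error("need at least one forecast");
  }
  let sum = 0;
  for (const f of forecasts) {
    if (!(f.probability >= 0 && f.probability <= 1)) {
      throw new Error("probability must lie in [0, 1]");
    }
    const outcome = f.reproduced ? 1 : 0;
    const diff = f.probability - outcome;
    sum += diff * diff;
  }
  return sum / forecasts.length;
}
```

A score that falls as more forecast/outcome pairs accumulate would suggest the forecasts are improving. The score alone does not separate calibration from sharpness, so a reliability plot is a common companion.
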
Reviewer
Pre-publication review (PRD §16.4)

I predict your reviews. I'm not your reviewer.

Drafts a peer-review-style summary of the paper's claims and likely weaknesses, conditioned on the venue's review criteria. Outputs go to the paper's `/p/<arxivId>` page under a clearly labeled `predicted-review` section. Where official reviews later become available, predictions are recorded against them so the agent can be calibrated over time.

Model: Claude Opus 4.7

Reader
Paper navigation (PRD §16.5)

Ask me anything about this paper.

Interactive paper Q&A. Reads the full PDF and supplement and answers questions in the on-page chat widget, with citations back into the PDF. Bounded to the current paper: Reader cannot fetch other papers, cannot publish to the platform, and cannot modify state.

Model: Claude Haiku 4.5

Notifier
Author outreach (PRD §16.3, §17.X.4)

I draft the email so a human doesn't have to.

Drafts the corresponding-author notification email when a reproduction is about to publish. Critically, it never sends — it composes the text, fills the variables (verdict, evidence, dispute link), and hands the draft to the operator. The 72-hour pre-publication notice is a legal-architecture requirement (PRD §17.X.1).

Model: Claude Haiku 4.5
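
The draft-then-hand-off flow can be sketched as a pure function that fills the three variables named above and returns text, plus a helper for the 72-hour window. This is a sketch under assumptions: the `{{name}}` placeholder syntax, `DraftVariables`, `fillDraft`, and `earliestPublishTime` are hypothetical, and the source does not say when the 72-hour clock starts. Here it is assumed to start when the operator sends the notice.

```typescript
// Assumed length of the pre-publication notice window, taken from the text above.
const NOTICE_HOURS = 72;

// Hypothetical variable set; the names mirror "verdict, evidence, dispute link".
interface DraftVariables {
  verdict: string;
  evidence: string;
  disputeLink: string;
}

// Fills {{name}} placeholders and returns the text. There is deliberately
// no send step here: the result is handed to a human operator.
function fillDraft(template: string, vars: DraftVariables): string {
  const lookup: Record<string, string> = {
    verdict: vars.verdict,
    evidence: vars.evidence,
    disputeLink: vars.disputeLink,
  };
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string): string => {
    if (!Object.prototype.hasOwnProperty.call(lookup, name)) {
      throw new Error("unknown placeholder: " + name);
    }
    return lookup[name];
  });
}

// Earliest publication time, assuming the clock starts when the notice is sent.
function earliestPublishTime(noticeSentAt: Date): Date {
  return new Date(noticeSentAt.getTime() + NOTICE_HOURS * 60 * 60 * 1000);
}
```

Throwing on an unknown placeholder is a design choice for the sketch: a draft with an unfilled variable fails loudly instead of reaching the operator half-complete.
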

The agent contract

  1. No direct DB access. Every agent emits a structured propose_finding payload; the Verdict Validator service owns every write into the verdicts and reproduction_jobs tables.
  2. Untrusted content gets wrapped. Every string from a repo, paper, or sandbox stdout is fed to an LLM inside <untrusted_repo_content>...</untrusted_repo_content> with a system-prompt gate that refuses to follow embedded instructions.
  3. Forbidden-words guard. Agent copy is scanned at build time against a 13-word anti-defamation glossary (the list is in tests/unit/legal.test.ts) per PRD §17.X. The build fails if any glossary word appears in shipped agent output.
  4. Versioned, attributed, evidence-bound. Every verdict carries the agent's version (e.g. v0.1.0-bert-mnli-microslice) so a reader can identify exactly which prompt produced which output. Every verdict also pairs to a structured claim citation so the paper headline being compared against is machine-verifiable.
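
Rules 2 and 3 lend themselves to a short sketch. The code below is an illustration under assumptions, not the platform's implementation: `wrapUntrusted` and `findForbiddenWords` are hypothetical names, escaping embedded delimiter tags is one possible hardening step that the source does not mention, and the glossary is passed in as a parameter because the real 13-word list lives in the test file and is not reproduced here.

```typescript
const OPEN_TAG = "<untrusted_repo_content>";
const CLOSE_TAG = "</untrusted_repo_content>";

// Wraps untrusted text in the delimiter tags. Any copy of the tags inside
// the content is escaped first (an assumed hardening step), so the content
// cannot close the wrapper early and smuggle text outside it.
function wrapUntrusted(raw: string): string {
  const neutralized = raw.replace(
    /<(\/?)untrusted_repo_content>/gi,
    "&lt;$1untrusted_repo_content&gt;"
  );
  return OPEN_TAG + "\n" + neutralized + "\n" + CLOSE_TAG;
}

// Returns every glossary word that appears in the copy as a whole word,
// case-insensitively. A build step could fail when the result is non-empty.
function findForbiddenWords(copy: string, glossary: readonly string[]): string[] {
  const hits: string[] = [];
  for (const word of glossary) {
    // Escape regex metacharacters so glossary entries are matched literally.
    const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    if (new RegExp("\\b" + escaped + "\\b", "i").test(copy)) {
      hits.push(word);
    }
  }
  return hits;
}
```

Wrapping only marks the boundary. The refusal to follow embedded instructions still has to come from the system-prompt gate described in rule 2. For the scan, whole-word matching avoids flagging a longer word that merely contains a glossary entry.
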

Agents we deliberately don't have

A “closer” agent that emails authors and negotiates retraction language. An “arbiter” agent that decides between two conflicting reproductions. A “promoter” agent that drafts social-media copy for verdicts. None of these are on the roster, and none will be. The platform's authority comes from the evidence behind each verdict; adding agents whose job is to convince rather than measure would dilute that authority. See PRD §3 #3 (“the platform is for the data, not the discourse”).