arXiv Hits Researchers With 1-Year Ban for AI Slop

ArXiv is issuing one-year submission bans to authors whose papers contain verifiable evidence of unvetted AI output, as the rate of fabricated academic citations has risen tenfold since 2023.

Science's preprint backbone is drawing a harder line. ArXiv - the repository that hosts nearly 2.4 million scholarly papers and processes hundreds of thousands of new submissions each year - announced Thursday that authors whose papers contain verifiable evidence of unchecked AI-created content will face a one-year ban from the platform. After that ban expires, every future submission must clear peer review at a journal or conference before arXiv will accept it.

The announcement came from Thomas G. Dietterich, Distinguished Professor Emeritus at Oregon State University and chair of arXiv's computer science section, who posted the policy update on social media after months of escalating complaints about what the platform describes as an "influx of AI-generated materials masquerading as rigorous science."

TL;DR

  • 1-year submission bans for papers with "incontrovertible evidence" of unchecked AI output - hallucinated references, LLM prompts visible in text, unfilled data table placeholders
  • Fabricated citations in academic papers grew from 1 in 2,828 (2023) to 1 in 277 (early 2026) - a tenfold jump, according to a Lancet study from Columbia University researchers
  • At NeurIPS 2025, 100 hallucinated citations across 53 accepted papers got through despite 3-5 expert reviewers per paper
  • The ban extends an earlier rule requiring peer review for all CS survey and position papers before arXiv will host them

How Bad the Problem Actually Is

The Lancet's Numbers

A Columbia University research team published findings in The Lancet on May 7 that put hard numbers on the problem. Analyzing more than 2 million papers and 97 million citations, they identified roughly 4,000 fabricated citations across 2,800 papers - references that "do not reference real papers."

The growth curve is steep. In 2023, 1 in 2,828 papers contained at least one fabricated reference. By 2025, that was 1 in 458 - a sixfold increase. In the first seven weeks of 2026 alone, the rate reached 1 in 277. Generative AI tools are the likely driver, according to the Columbia team. More than a third of all fabricated citations traced back to two large open-access publishers.
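The fold increases follow directly from the "1 in N" figures. A minimal sketch of the arithmetic, using only the three rates reported above; the function names and the per-10,000 framing are illustrative choices, not something taken from the study:

```python
# Reported rates: one paper in N contains at least one fabricated reference.
ONE_IN = {"2023": 2828, "2025": 458, "early 2026": 277}


def fold_increase(one_in_before: int, one_in_after: int) -> float:
    """Ratio of the later rate (1/after) to the earlier rate (1/before)."""
    return one_in_before / one_in_after


def per_10k(one_in: int) -> float:
    """Express a '1 in N' rate as affected papers per 10,000."""
    return 10_000 / one_in


baseline = ONE_IN["2023"]
for label, n in ONE_IN.items():
    print(
        f"{label}: {per_10k(n):.1f} per 10,000 papers, "
        f"{fold_increase(baseline, n):.1f}x the 2023 rate"
    )
```

Rounded to one decimal place, the 2025 rate comes out at about 6.2 times the 2023 rate and the early-2026 rate at about 10.2 times, which matches the "sixfold" and "tenfold" descriptions.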

"This is one of the first papers telling us something about the quality of what's being produced with LLMs, and it's a signal of slop," Misha Teplitskiy, a science sociologist at the University of Michigan, told STAT News.

Image: the arXiv website as it appears in May 2026. ArXiv hosts nearly 2.4 million scholarly papers and now faces hundreds of AI-produced submissions monthly. Source: arxiv.org

When Elite Conferences Miss It

If the volume problem were limited to open-access publishers, one could argue peer review would catch it. The record says otherwise.

GPTZero scanned 4,841 accepted papers from NeurIPS 2025 and found 100 hallucinated citations across 53 papers. Each paper had been reviewed by three to five expert researchers. NeurIPS confirmed that reviewers had been instructed to flag hallucinations, but the citations still passed. Hallucinated citations included fabricated author names, fake DOIs, and real paper titles combined with invented publication details.

GPTZero then ran a pass on ICLR 2026 - scanning just 300 of around 20,000 submissions - and found 50 more hallucinations that had cleared peer review. At that rate, the total across the full submission pool would run into the thousands, not the hundreds.
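The extrapolation is proportional scaling. A minimal sketch, assuming the 300 scanned submissions are a representative random sample (the article does not say how they were chosen) and that the 50 findings scale linearly with the number of submissions; the function name is illustrative:

```python
def extrapolate(found: int, sampled: int, population: int) -> float:
    """Scale a count found in a sample up to the full population.

    Assumes the sample is representative; a biased sample would make
    this estimate too high or too low.
    """
    if sampled <= 0 or population < sampled or found < 0:
        raise ValueError("need found >= 0 and 0 < sampled <= population")
    return found * population / sampled


estimate = extrapolate(found=50, sampled=300, population=20_000)
print(f"Estimated hallucinated citations in the full pool: about {estimate:,.0f}")
```

Under those assumptions the estimate is roughly 3,300. It is an order-of-magnitude figure rather than a measurement, since a sample of 300 leaves a wide margin of error.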

The Nikkei newspaper found something more deliberate: 17 preprints containing hidden prompts specifically designed to instruct AI-powered reviewers to recommend acceptance.

What ArXiv's Policy Actually Says

What Gets You Banned

Dietterich was specific about what "incontrovertible evidence" means in practice. This isn't a judgment call about whether a paper sounds AI-generated. ArXiv is looking for things that leave no room for doubt.

"If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s)."

  • Thomas G. Dietterich, chair, arXiv CS section

The clearest examples: hallucinated references pointing to papers that don't exist, and LLM meta-comments left in the final text - phrases like "here is a 200 word summary; would you like me to make any changes?" or data table placeholders reading "the data in this table is illustrative, fill it in with the real numbers from your experiments."
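Leftovers of this kind are easy for authors to screen for before submitting. The sketch below is a hypothetical pre-submission check, not arXiv's moderation tooling: the pattern list is an assumption modeled on the two phrases quoted above, and a real check would need a longer list and a human reading every hit.

```python
import re

# Hypothetical patterns modeled on the examples quoted in the article;
# these are NOT arXiv's actual checks.
LEFTOVER_PATTERNS = [
    re.compile(r"\bwould you like me to\b", re.IGNORECASE),
    re.compile(r"\bhere is a \d+[- ]word summary\b", re.IGNORECASE),
    re.compile(r"\bthe data in this table is illustrative\b", re.IGNORECASE),
    re.compile(r"\bfill it in with the real numbers\b", re.IGNORECASE),
]


def find_leftovers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for each line matching any pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in LEFTOVER_PATTERNS):
            hits.append((number, line.strip()))
    return hits


draft = """We evaluate on three benchmarks.
Here is a 200 word summary; would you like me to make any changes?
Table 2: the data in this table is illustrative, fill it in with the real numbers from your experiments.
"""

for number, line in find_leftovers(draft):
    print(f"line {number}: {line}")
```

A check like this only catches verbatim residue. Hallucinated references, the other clear-cut violation, cannot be found by pattern matching and have to be verified against the cited sources themselves.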

The Penalty Structure

Moderators flag potential violations. Section chairs review the evidence and confirm before any penalty is imposed. Authors can appeal. The structure is:

Offense | Penalty
First violation | 1-year ban from all arXiv submissions
After ban ends | All future submissions must first pass peer review at a reputable venue

ArXiv, which is hosted by Cornell University, had previously announced that it will no longer accept CS review and position papers unless they have already passed peer review at a conference or journal.

Image: an open notebook and pen on a wooden desk. ArXiv is clear that AI use isn't the violation - submitting AI output you haven't verified is. Source: unsplash.com

A Pattern of Escalating Rules

This is the third wave of AI restrictions arXiv has put in place. Six months ago it required peer review for any CS survey paper, a category that had been flooded with AI-produced reviews summarizing existing literature without adding original analysis. Then it moved to require endorsement from established researchers for first-time submitters. The one-year ban targeting individual authors is the first policy that goes after people rather than paper categories.

What This Policy Doesn't Ban

ArXiv isn't telling researchers to stop using AI tools. The policy is explicit. Authors can use LLMs to draft, edit, or restructure papers - the requirement is that they verify what comes out. Our guide on using AI for academic research covers how to integrate AI writing tools without losing control over citations and factual claims.

Mohammad Hosseini from Northwestern University put the underlying issue directly: "Citation practices are changing with generative AI use... people simply use their hunches to prompt ChatGPT... that is not a healthy practice."

The distinction matters for most researchers who use AI legitimately. The ban targets the extreme end - authors who submitted whatever a model produced without checking it. The hallucination benchmarks that AI labs publish measure model failure rates in controlled settings. The NeurIPS and ICLR findings suggest real-world citation hallucination rates are far higher once you account for the selection pressure to publish.

Image: papers and documents spread across a research desk. The Lancet study analyzed 97 million citations across 2 million papers to quantify how fast fabricated references are spreading. Source: statnews.com


ArXiv is mid-transition. In March 2026, it announced it'd separate from Cornell University and become an independent nonprofit on July 1, 2026. Enforcement capacity - how many moderators the platform can deploy against a submission backlog that now includes hundreds of AI-produced papers monthly - is the open question. One-year bans are meaningful, but they only work if violations are actually caught.

About the author: Elena Marchetti, Senior AI Editor & Investigative Journalist

Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.