GPT-Rosalind: OpenAI's Life Sciences Reasoning Model

OpenAI's first domain-specific reasoning model for biology and drug discovery, launched April 16, 2026 as a US-only research preview with a 0.751 BixBench score.

Overview

OpenAI launched GPT-Rosalind on April 16, 2026 as its first domain-specific reasoning model and the opening entry in a new Life Sciences series. It targets the multi-step work that dominates early drug discovery: literature synthesis, hypothesis generation, experimental design, and agentic analysis over genomics, protein engineering, and chemistry data.

TL;DR

  • Purpose-built reasoning model for biology, chemistry, and drug discovery workflows
  • Scores 0.751 on BixBench, ahead of GPT-5.4 (0.732) and Gemini 3.1 Pro (0.550)
  • Gated US-only research preview, free to qualified enterprise customers, with pricing undisclosed

OpenAI describes GPT-Rosalind as a reasoning model trained for biology from the architecture up, not a fine-tuned wrapper on GPT-5.4. A free Codex Life Sciences plugin exposes more than 50 scientific databases and tools to the agent loop. The launch came two days after OpenAI's strategic AI alliance with Novo Nordisk, and in the same quarter as Isomorphic Labs' proprietary IsoDDE pipeline and Anthropic's Coefficient Bio acquisition. Access is the real story: the model runs only inside a Trusted Access Program for qualified US enterprise research teams.

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | OpenAI |
| Model Family | GPT Life Sciences (series debut) |
| Architecture | Reasoning model (not disclosed publicly) |
| Parameters | Not disclosed |
| Context Window | Not disclosed |
| Input Modalities | Text |
| Output Modality | Text |
| Release Date | April 16, 2026 |
| Availability | ChatGPT Enterprise, Codex, API (all gated) |
| License | Proprietary, Trusted Access Program |
| Pricing | Not disclosed; free during research preview |
| Codex Plugin | Free Life Sciences plugin, 50+ scientific data sources |

Architecture, parameter count, and context window are all undisclosed at launch. That matches GPT-5.4 and GPT-5.3 practice but blocks independent architecture analysis.

Benchmark Performance

OpenAI published three benchmarks at launch. Only BixBench comes with cross-model numbers, and those are the ones worth focusing on.

| Benchmark | GPT-Rosalind | GPT-5.4 | GPT-5 | Grok 4.2 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- | --- |
| BixBench (Pass@1) | 0.751 | 0.732 | 0.728 | 0.698 | 0.550 |
| LABBench2 (task families won) | 6 of 11 vs GPT-5.4 | baseline | - | - | - |
| Dyno Therapeutics RNA prediction | >95th percentile of human experts | - | - | - | - |
| Dyno Therapeutics RNA generation | 84th percentile of human experts | - | - | - | - |

[Image: A bioinformatician analyzing genomic data, the kind of workflow GPT-Rosalind targets. GPT-Rosalind's Codex plugin wires the model into 50+ of these pipelines so agents can run multi-step analyses without switching tools. Source: unsplash.com]

BixBench, built by FutureHouse and maintained by Edison Scientific, hands an agent an empty Jupyter notebook and 53 real-world bioinformatics scenarios covering 296 questions. GPT-Rosalind's 0.751 pass rate is 1.9 points over GPT-5.4 and 20.1 points over Gemini 3.1 Pro. Clear lead, though the bar is open-answer agent performance in a notebook, not wet-lab output.
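The Pass@1 arithmetic behind a score like 0.751 is simple enough to sketch. The 296-question count comes from the article; the count of correct first attempts below is invented illustration data, chosen only so the ratio lands near the reported figure:

```python
# Hedged sketch of BixBench-style Pass@1 scoring.
# n_questions matches the article (296); n_correct_first_try is a
# hypothetical value for illustration, not a real evaluation result.
n_questions = 296
n_correct_first_try = 222  # hypothetical: answered correctly on the single allowed attempt

# Pass@1 is simply the fraction of questions the agent answers
# correctly on its first (and only) try.
pass_at_1 = n_correct_first_try / n_questions
print(f"Pass@1 = {pass_at_1:.3f}")  # prints "Pass@1 = 0.750"
```

The metric says nothing about partial credit or retries, which is why a 1.9-point gap on it can matter more than it looks: every point is a whole question answered correctly in one shot.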

LABBench2 is broader. The 2026 update spans roughly 1,900 tasks across literature retrieval, database access, sequence manipulation, protocol troubleshooting, and experiment planning. OpenAI reports GPT-Rosalind beats GPT-5.4 on 6 of 11 task families, with the largest jump on CloningQA. Per-task scores aren't published.

The Dyno Therapeutics result is the most interesting and hardest to replicate. The gene therapy company gave the model unpublished RNA sequences that couldn't have appeared in training. GPT-Rosalind's best-of-ten submissions ranked above the 95th percentile of human experts on sequence-to-function prediction and around the 84th percentile on sequence generation. Dyno supplied the data, so it's harder to write off as contamination. It also isn't reproducible outside Dyno.
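A percentile-rank claim like Dyno's reduces to a small computation: take the model's best submission out of ten and count what fraction of the human expert scores it exceeds. The scores below are invented illustration data (the real Dyno sequences and scores are unpublished):

```python
# Hedged sketch of a best-of-ten percentile comparison, Dyno-style.
# All scores here are hypothetical illustration data.
from bisect import bisect_left

# Hypothetical sequence-to-function accuracy scores for ten human experts.
human_scores = sorted([0.41, 0.48, 0.52, 0.55, 0.58, 0.60, 0.63, 0.66, 0.70, 0.74])

# Hypothetical scores for the model's ten submissions; best-of-ten is what counts.
model_submissions = [0.51, 0.58, 0.62, 0.66, 0.68, 0.69, 0.70, 0.71, 0.715, 0.72]
best_of_ten = max(model_submissions)

# Percentile rank: fraction of human experts the best submission strictly beats.
rank = bisect_left(human_scores, best_of_ten)
percentile = 100 * rank / len(human_scores)
print(f"best-of-10 score {best_of_ten} beats {percentile:.0f}% of experts")
```

Note the best-of-ten framing: the model gets ten tries per task while each expert is a single data point, a detail worth keeping in mind when reading ">95th percentile".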

For the bigger picture, see our scientific reasoning LLM leaderboard and reasoning benchmarks leaderboard.

Key Capabilities

Biological reasoning. GPT-Rosalind is tuned for multi-step inference across molecules, proteins, genes, and disease-relevant biology. Workflows include target discovery and validation, genomics interpretation, pathway analysis, literature synthesis, and hypothesis generation. OpenAI positions the model for long-horizon tool-heavy tasks, not short conversational exchanges.

Codex Life Sciences plugin. The free plugin is the more broadly useful piece of the launch. It connects Codex to 50+ scientific tools and data sources covering human genetics, functional genomics, protein structure, biochemistry, clinical evidence, and public study discovery. Critically, it works with general-purpose models like GPT-5.4 too, which matters because most researchers will never qualify for the Trusted Access Program.

Agentic workflows. Allen Institute CTO Andy Hickl says the model makes "manual steps like finding and aligning data more consistent and repeatable in an agentic workflow." Literature reads, database queries, sequence analyses, and protocol drafts run inside one Codex session instead of across many tools.

Safety training. OpenAI included biosecurity refusal training and a governance review in the qualification flow, citing dual-use pathogen design concerns. Organizations must show legitimate research purposes and strong internal controls before provisioning.

Pricing and Availability

GPT-Rosalind is a US-only research preview. No self-serve onboarding, no developer playground, no published token price. Qualified enterprise customers use the model during preview without consuming ChatGPT Enterprise credits or paid API tokens. OpenAI says it will publish pricing and broader availability "as the program expands" without committing to a date.

| Access Tier | Availability | Cost |
| --- | --- | --- |
| Trusted Access Program (API) | Qualified US enterprise research teams | Free during preview |
| ChatGPT Enterprise | Same qualified customers | Free during preview |
| Codex (model) | Same qualified customers | Free during preview |
| Codex Life Sciences plugin | Public, works with any Codex model | Free |
| General API / ChatGPT Plus | Not available | Not available |
| International access | Not available | Not available |

[Image: A technician handling sample vials during DNA genotyping at a cancer genomics laboratory. OpenAI's qualification process restricts GPT-Rosalind to research teams with similar institutional governance and biosecurity controls. Source: unsplash.com]

Launch partners named across coverage: Amgen, Moderna, Thermo Fisher Scientific, the Allen Institute, Oracle Health and Life Sciences, NVIDIA, Benchling, UCSF School of Pharmacy, Los Alamos National Laboratory, and Dyno Therapeutics. Each had early access ahead of the announcement, which is how the Dyno RNA evaluation and partner quotes were produced.

"GPT-Rosalind represents an important step in helping scientific teams use advanced AI to reason across complex biological evidence, data, and workflows," said Moderna CEO Stéphane Bancel.

The free plugin is the practical contrast. Any Codex user can install the Life Sciences plugin today, point it at GPT-5.4 or GPT-5.3 Codex, and get programmatic access to the same 50+ databases. For labs outside the partner set, that's the real shipped product.

Strengths

  • Top BixBench score. 0.751 Pass@1 leads GPT-5.4 (0.732) and crushes Gemini 3.1 Pro (0.550) on agentic bioinformatics tasks
  • Dyno RNA result is contamination-resistant. The 95th-percentile prediction score used unpublished sequences, which is rare among vendor-published biology benchmarks
  • Purpose-built for long-horizon workflows. Literature synthesis, hypothesis generation, and experiment planning chain inside one Codex session
  • Codex Life Sciences plugin ships free. 50+ scientific data sources, usable with general-purpose models, not gated
  • Partner lineup validates the target. Amgen, Moderna, Thermo Fisher, Allen Institute, and Los Alamos are serious research shops, not marquee logos
  • Biosecurity framing is explicit. Governance review and refusal training are documented parts of the access flow

Weaknesses

  • Gated US-only research preview. No self-serve access, no international availability, and no GA timeline
  • Pricing undisclosed. Budgeting around GPT-Rosalind is impossible today
  • Parameters, architecture, context window all undisclosed. Independent architecture analysis isn't possible
  • LABBench2 reporting is thin. 6-of-11 task families beat GPT-5.4 is the headline, with no per-task scores
  • All public benchmarks are vendor-selected. Independent verification isn't possible outside OpenAI and its partner environment
  • "Drug discovery" claims outrun the evidence. Ranking above human experts on an RNA prediction task isn't the same as advancing a molecule to the clinic

Sources

Last verified April 21, 2026

About the author

James, AI Benchmarks & Tools Analyst, is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.