Diffusion Models:
The New Paradigm

How iterative refinement is replacing sequential generation across text, proteins, molecules, and images — reshaping AI from the ground up

February 25, 2026 · Q Research

The Paradigm Shift

For seven years, autoregressive (AR) models have dominated language AI: predict one token, append it, repeat. In February 2026, the first production diffusion-based reasoning LLM shattered that monopoly. Simultaneously, diffusion models are designing proteins, discovering drugs, and generating images. This is the diffusion paradigm.

• 1,009 tok/s — Mercury 2, faster than AR speed-optimized models
• $2.2B — Generate:Bio valuation
• 4 — domains disrupted

What is Diffusion?

Instead of generating content one piece at a time (left-to-right), diffusion models start from noise or masked tokens and iteratively refine everything simultaneously. Think of it as an editor revising a full draft rather than a typewriter producing character by character.

(Interactive demo not reproduced: watch diffusion refine from noise to coherent output in a few steps.)

Why Now?

Scaling Laws Discovered

Recent papers (LLaDA, MDLM, Mercury) show diffusion LMs follow similar scaling laws to AR models — more compute = better quality. The gap is closing fast.

Hardware Alignment

Diffusion's parallel generation maps naturally onto modern GPU architectures (NVIDIA Blackwell). AR decoding leaves much of that parallelism idle, because each token must wait for the previous one.

Cross-Domain Validation

Diffusion already dominates images (Stable Diffusion), proteins (RFDiffusion), and molecules (DiffDock). Text was the last frontier — Mercury 2 just breached it.

Domain Map

Text / Code

Language Generation

Mercury 2, LLaDA, MDLM, SEDD. Masked and continuous diffusion replacing AR decoding. 1,000+ tok/s with reasoning-grade quality.

Proteins

Protein Design

Chroma (Generate:Bio), RFDiffusion (Baker Lab), FrameDiff, Genie. Design novel protein structures with specified function from scratch.

Molecules

Drug Discovery

DiffDock, DiffSBDD, Torsional Diffusion, TargetDiff. Predict binding poses, generate drug-like molecules for specific protein targets.

Images / Video

Visual Generation

Stable Diffusion 3, DALL-E 3, Imagen 3, Sora, Kling. The original diffusion success story, now the backbone of all visual AI.

Architecture Comparison

Three paradigms for content generation. Autoregressive has dominated text; diffusion is proving there's a better way.

Autoregressive (AR)

Step 1: [The] → predict next
Step 2: [The][cat] → predict next
Step 3: [The][cat][sat] → ...
⏱ Sequential: N steps for N tokens
  • One token at a time, left-to-right
  • Each token depends on all previous
  • Latency scales linearly with length
  • Cannot revise earlier tokens
  • Examples: GPT-5, Claude, Llama

Masked Diffusion (MDM)

Step 1: [_][_][_][_][_][_] → all masked
Step 2: [The][_][_][sat][_][_] → unmask some
Step 3: [The][cat][_][sat][on][_] → refine
Step 4: [The][cat][sat][on][the][mat] ✓
⚡ Parallel: ~10-20 steps for any length
  • Start fully masked, iteratively reveal
  • All positions attend to all others
  • Latency nearly constant regardless of length
  • Self-correction: can revise any position
  • Examples: Mercury 2, LLaDA, MDLM
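The masked-diffusion loop above can be sketched in a few lines. This is a toy sampler, not any production model's algorithm: `toy_predictor` is a stand-in that always proposes the correct token, where a real denoising network would predict a distribution over the vocabulary at every position.

```python
import random

MASK = "_"

def toy_predictor(tokens, target):
    # Stand-in for the denoising network: proposes the target token at
    # every masked position. A real model predicts all positions at once.
    return [target[i] if t == MASK else t for i, t in enumerate(tokens)]

def masked_diffusion_decode(target, steps=4, seed=0):
    """Toy masked-diffusion sampler: start fully masked and reveal a
    growing fraction of positions each step until none remain."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    for step in range(steps):
        proposal = toy_predictor(tokens, target)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Reveal ~1/(steps - step) of the remaining masks, so every
        # position is committed by the final step.
        k = max(1, len(masked) // (steps - step))
        for i in rng.sample(masked, min(k, len(masked))):
            tokens[i] = proposal[i]
    return tokens

print(" ".join(masked_diffusion_decode(
    ["The", "cat", "sat", "on", "the", "mat"])))  # The cat sat on the mat
```

Note that the number of loop iterations is fixed by `steps`, not by sequence length — the structural source of the speed advantage discussed below.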

Continuous Diffusion

Step 1: x₀ ~ N(0, I) → random noise
Step 2: x₁ = denoise(x₀) → structure emerges
Step 3: x₂ = denoise(x₁) → details form
Step K: xₖ = denoise(xₖ₋₁) → final output
🧬 Used for structures, images, molecules
  • Operate in continuous embedding space
  • Gradual denoising from Gaussian noise
  • Natural fit for 3D structures
  • Score-matching training objective
  • Examples: RFDiffusion, Stable Diffusion
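A minimal numeric sketch of continuous denoising, in one dimension. It assumes an oracle that predicts the clean value `x0_true` exactly — the part a real network learns — and applies the standard DDPM posterior-mean update at each reverse step:

```python
import math, random

def denoise_trajectory(x0_true=3.0, steps=50, seed=0):
    """Toy 1-D continuous diffusion: start from Gaussian noise and apply
    DDPM-style reverse updates with an oracle x0 prediction."""
    rng = random.Random(seed)
    # Linear beta schedule, as in the original DDPM paper.
    betas = [1e-4 + (0.02 - 1e-4) * t / (steps - 1) for t in range(steps)]
    alphas = [1 - b for b in betas]
    abar, prod = [], 1.0
    for a in alphas:
        prod *= a
        abar.append(prod)  # cumulative product ᾱ_t
    x = rng.gauss(0, 1)  # x_T ~ N(0, 1)
    for t in reversed(range(steps)):
        abar_prev = abar[t - 1] if t > 0 else 1.0
        # Posterior mean of q(x_{t-1} | x_t, x0), with oracle x0.
        coef_x0 = math.sqrt(abar_prev) * betas[t] / (1 - abar[t])
        coef_xt = math.sqrt(alphas[t]) * (1 - abar_prev) / (1 - abar[t])
        mean = coef_x0 * x0_true + coef_xt * x
        var = betas[t] * (1 - abar_prev) / (1 - abar[t])
        noise = rng.gauss(0, 1) if t > 0 else 0.0
        x = mean + math.sqrt(var) * noise
    return x

print(round(denoise_trajectory(), 2))  # 3.0
```

Swapping the oracle for a trained network (and the scalar for pixels, atom coordinates, or token embeddings) recovers the models in the domain map above.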

The Speed Advantage

Autoregressive models generate 100 tokens by running the model 100 times sequentially. Diffusion models generate 100 tokens by running the model ~10-20 times in parallel. The advantage compounds with output length.
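The back-of-envelope arithmetic looks like this. The 20 ms per-pass cost is a hypothetical round number, and treating both paradigms' passes as equally expensive flatters AR slightly (a diffusion pass does more parallel work per call):

```python
def decode_passes(n_tokens, paradigm, refine_steps=15):
    # AR needs one sequential model call per token; diffusion needs a
    # roughly constant number of refinement passes regardless of length.
    return n_tokens if paradigm == "ar" else refine_steps

PASS_MS = 20  # hypothetical latency of one forward pass

for n in (100, 500, 2000):
    ar_ms = decode_passes(n, "ar") * PASS_MS
    dm_ms = decode_passes(n, "diffusion") * PASS_MS
    print(f"{n} tokens: AR {ar_ms} ms vs diffusion {dm_ms} ms ({ar_ms // dm_ms}x)")
```

At 100 tokens the gap is a single-digit multiple; at 2,000 tokens it is two orders of magnitude of sequential calls — which is why the advantage compounds with output length.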

Key Architectural Innovations

Bidirectional Attention

Unlike AR models' causal masks, diffusion models use full bidirectional attention. Every token can see every other token at every step, enabling richer contextual understanding.
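The difference between the two attention patterns is just the mask. A quick sketch counting how many token pairs are allowed to attend to each other:

```python
def causal_mask(n):
    # AR decoding: position i may attend only to positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    # Diffusion denoising: every position attends to every other,
    # at every refinement step.
    return [[True] * n for _ in range(n)]

n = 4
print(sum(map(sum, causal_mask(n))),
      "vs", sum(map(sum, bidirectional_mask(n))))  # 10 vs 16
```

For n tokens the causal mask permits n(n+1)/2 attention pairs, the bidirectional mask all n².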

Noise Schedule Design

The masking/noise schedule controls the generation process. Cosine schedules, linear schedules, and learned schedules trade off quality vs speed. Mercury 2 uses a tunable approach.
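Two illustrative mask-rate schedules, sketched as the fraction of positions still masked at step t. These are generic textbook shapes, not Mercury 2's actual (undisclosed) schedule:

```python
import math

def linear_mask_rate(t, T):
    # Fraction of positions still masked after step t (t = 0: all masked).
    return 1 - t / T

def cosine_mask_rate(t, T):
    # Cosine schedule: keeps more tokens masked early, unmasks faster late.
    return math.cos(0.5 * math.pi * t / T)

T = 10
rates = [(round(linear_mask_rate(t, T), 2), round(cosine_mask_rate(t, T), 2))
         for t in range(T + 1)]
print(rates[0], rates[5], rates[10])  # (1.0, 1.0) (0.5, 0.71) (0.0, 0.0)
```

Both start fully masked and end fully revealed; the shape in between is what trades off quality against step count.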

Classifier-Free Guidance

Borrowed from image diffusion: interpolate between conditional and unconditional generation to boost quality. Applied to text via prompt conditioning.
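The guidance rule itself is one line: extrapolate from the unconditional prediction toward the conditional one. The logit values here are hypothetical, purely to show the arithmetic:

```python
def cfg(cond, uncond, scale):
    """Classifier-free guidance: push predictions away from the
    unconditional distribution toward the prompt-conditioned one.
    scale = 1 recovers plain conditional generation; scale > 1
    sharpens prompt adherence."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

cond = [2.0, 0.5, -1.0]   # hypothetical logits given the prompt
uncond = [1.0, 1.0, 0.0]  # logits with the prompt dropped

print(cfg(cond, uncond, 1.0))  # [2.0, 0.5, -1.0]
print(cfg(cond, uncond, 3.0))  # [4.0, -0.5, -3.0]
```

Training randomly drops the prompt so one network can produce both predictions at sampling time.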

AR Initialization

Recent work shows initializing diffusion LMs from pretrained AR model weights dramatically improves quality. The AR model provides a strong starting point for the diffusion process.

Diffusion Math (Simplified)

Forward process (add noise):
q(x_t | x_0) = mask x_0 with probability β_t

Reverse process (denoise):
p_θ(x_{t-1} | x_t) = neural_network(x_t, t)

Training objective:
L = E[-log p_θ(x_0 | x_t)] — predict clean from noisy

Key insight: same math works for tokens, pixels, atoms, residues
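The training objective above can be made concrete with a toy version: apply the forward process (mask each token with probability β_t), then average the model's negative log-likelihood of the clean token at masked positions. The `p_clean` dictionary is a hypothetical stand-in for a learned network's confidences:

```python
import math, random

def masked_diffusion_loss(clean, p_clean, mask_prob, seed=0):
    """Toy L = E[-log p_theta(x0 | x_t)]: mask tokens with probability
    beta_t = mask_prob, then score the clean token at masked positions."""
    rng = random.Random(seed)
    losses = [-math.log(p_clean[tok])        # reverse model's job
              for tok in clean
              if rng.random() < mask_prob]   # forward process: mask it
    return sum(losses) / max(len(losses), 1)

p_clean = {"The": 0.9, "cat": 0.8, "sat": 0.7}  # hypothetical confidences
print(round(masked_diffusion_loss(["The", "cat", "sat"],
                                  p_clean, mask_prob=1.0), 4))  # 0.2284
```

Replace tokens with pixels or atomic coordinates (and masking with Gaussian noise) and the same objective trains the image, molecule, and protein models below.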

Mercury 2: First Production Diffusion LLM

Launched February 24, 2026 by Inception Labs. The first diffusion-based reasoning model to reach production quality, running at 1,009 tokens/second on NVIDIA Blackwell.

• 1,009 tokens per second
• $0.25 per 1M input tokens
• $0.75 per 1M output tokens
• 128K context window

How It Works

Mercury 2 generates responses through parallel refinement — producing multiple tokens simultaneously and converging over a small number of steps. "Less typewriter, more editor revising a full draft at once." It features tunable reasoning depth, native tool use, and schema-aligned JSON output.


Benchmark Performance

| Model | Architecture | Speed (tok/s) | Price (out/1M) | Quality Tier |
|---|---|---|---|---|
| Mercury 2 | Diffusion | 1,009 | $0.75 | Speed-optimized |
| GPT-5.2 mini | AR | ~200 | $0.60 | Speed-optimized |
| Claude Haiku 4 | AR | ~180 | $1.25 | Speed-optimized |
| Gemini Flash 2.5 | AR | ~220 | $0.30 | Speed-optimized |
| Llama 4 Scout | AR MoE | ~150 | Self-hosted | Open-source |
| GLM-5 | AR | ~100 | Free | Open-source |

Production Use Cases

🔄 Agentic Loops

Agent workflows chain dozens of inference calls. 5× latency reduction per call compounds across every step. More budget for reasoning depth.

💻 Code Autocomplete

Real-time suggestions that "feel like part of your own thinking" (Zed). Diffusion eliminates the pause that breaks developer flow.

🎤 Voice Interfaces

Tightest latency budget in AI. Mercury 2 makes reasoning-quality responses viable within natural speech cadences.

🔍 Search & RAG

Multi-hop retrieval stacks latency fast. Diffusion lets you add reasoning to the search loop without blowing the latency budget.

Who Built It

Inception Labs — Founded by three professors from Stanford, Cornell, and UCLA. Backed by NVIDIA. The Mercury lineage: Mercury Coder (Oct 2025) → Mercury 2 (Feb 2026). OpenAI API compatible — a drop-in replacement.

Source: inceptionlabs.ai · HN Discussion (221 pts)

Diffusion in Biology

Diffusion models didn't start with text — they conquered protein design first. The same mathematical framework that generates images from noise can generate novel protein structures, drug molecules, and binding poses from scratch.

The Protein Diffusion Pipeline

1. Specify Intent — define the target function, binding site, or therapeutic goal
2. Noise → Backbone — diffusion generates a 3D protein backbone from Gaussian noise
3. Sequence Design — ProteinMPNN or another inverse-folding model assigns an amino acid sequence
4. Validate — AlphaFold/ESMFold confirms the structure; wet-lab testing follows
5. Therapeutic — the optimized protein enters clinical development
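The computational stages of the pipeline above can be sketched as a small orchestration loop. Every function here is a hypothetical stand-in: a real pipeline would call RFDiffusion or Chroma for backbones, ProteinMPNN for sequences, and an AlphaFold/ESMFold-style predictor for validation.

```python
def generate_backbone(spec):
    # Stand-in for a diffusion model sampling a backbone from noise.
    return {"spec": spec, "backbone": "3d-coordinates"}

def design_sequence(candidate):
    # Stand-in for inverse folding; residues here are placeholders.
    return {**candidate, "sequence": "MKVLT"}

def validate(candidate):
    # Stand-in for a structure predictor; plDDT value is a placeholder.
    return {**candidate, "plddt": 0.91}

def design_pipeline(spec, plddt_cutoff=0.80):
    """Intent → backbone → sequence → validation; only candidates
    above the confidence cutoff advance to wet-lab testing."""
    candidate = validate(design_sequence(generate_backbone(spec)))
    return candidate if candidate["plddt"] >= plddt_cutoff else None

hit = design_pipeline({"target": "TSLP", "goal": "binder"})
print(hit is not None)  # True
```

In practice each stage is sampled many times and filtered, so the loop runs over thousands of candidates rather than one.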

Key Models & Companies

Protein Design

Chroma — Generate:Biomedicines

Flagship Pioneering · IPO Feb 27, 2026 · $2.2B valuation

Diffusion model for programmable protein design. Generates novel protein structures conditioned on desired properties — function, shape, symmetry. Powers GB-0895 (anti-TSLP antibody, Phase 3 severe asthma) and oncology pipeline.

Nature 2023 · generatebiomedicines.com

Protein Design

RFDiffusion — Baker Lab / UW

David Baker (Nobel 2024) · Open source

Fine-tuned from the RoseTTAFold structure-prediction network for protein design via denoising diffusion. Designs binders, symmetric assemblies, and motif scaffolds. Used by hundreds of labs worldwide.

Nature 2023 · GitHub

Molecule

DiffDock — MIT CSAIL

Corso et al. · ICLR 2023

Diffusion model for molecular docking — predicts how drug molecules bind to protein targets. Treats docking as a generative problem over SE(3), generating binding poses through iterative refinement.

arXiv:2210.01776 · GitHub

Molecule

Torsional Diffusion

MIT · NeurIPS 2022

Generates molecular conformations by diffusing over torsion angles. Respects molecular geometry naturally. Key component in modern drug discovery pipelines for 3D molecule generation.

arXiv:2206.01729

Protein Design

FrameDiff / SE(3) Diffusion

Yim et al. · MIT / HHMI

Diffusion on SE(3) frames for protein backbone generation. Produces diverse, designable structures. Foundation for many subsequent protein diffusion methods.

arXiv:2302.02277

Protein Design

Genie / Genie 2 — Microsoft

Lin & AlQuraishi · Columbia/MSR

Denoising diffusion over protein backbone angles (φ, ψ). Genie 2 adds sequence co-design — simultaneously generates structure and sequence in a single diffusion process.

arXiv:2301.12485

Generate:Biomedicines — IPO Deep Dive

| Program | Target | Indication | Stage | IPO Allocation |
|---|---|---|---|---|
| GB-0895 | Anti-TSLP mAb | Severe asthma | Phase 3 | ~$300M |
| GB-0895 | Anti-TSLP mAb | COPD | Phase 1b | ~$100M |
| GB-4362 | Undisclosed | Oncology | Phase 1 | ~$7.5M |
| GB-5267 | Undisclosed | Oncology | Phase 1 | ~$7.5M |

Ticker: GENB · Nasdaq · Pricing week of Feb 23 · Partners: Novartis, Amgen · VC: Flagship Pioneering (Moderna's backer)

Cross-Domain Benchmarks

Diffusion is competitive or dominant in every domain it enters. Here's the data.

(Charts not reproduced: text generation — diffusion vs AR; protein design — success rates.)

Molecular Docking: DiffDock vs Traditional

| Method | Type | Top-1 (RMSD < 2 Å) | Top-5 (RMSD < 2 Å) | Speed |
|---|---|---|---|---|
| DiffDock | Diffusion | 38.2% | 43.7% | ~10 s |
| GNINA | Scoring | 22.4% | 27.1% | ~60 s |
| SMINA | Docking | 19.3% | 24.8% | ~120 s |
| GLIDE | Docking | 21.7% | 26.3% | ~180 s |
| TANKBind | Geometric DL | 20.1% | 24.5% | ~5 s |

(Chart not reproduced: image generation — FID scores over time.)

Market & Investment Signals

Diffusion is no longer just a research paradigm — it's driving billions in enterprise value and reshaping biotech IPOs.

Companies Building on Diffusion

IPO

Generate:Biomedicines (GENB)

IPO: ~Feb 27 · $2.2B valuation · $425M raise

Flagship Pioneering's crown jewel. AI-designed protein therapeutics via Chroma diffusion model. Lead asset GB-0895 in Phase 3 for severe asthma. Novartis + Amgen partnerships. Largest biotech IPO attempt of 2026.

BioSpace IPO Tracker

AI/LLM

Inception Labs

Private · NVIDIA-backed · Stanford/Cornell/UCLA

Mercury lineage: first production diffusion LLMs. Mercury 2 at 1,009 tok/s with reasoning. OpenAI API compatible. Customers: Zed, Viant, Skyvern, SearchBlox.

inceptionlabs.ai

Visual AI

Stability AI

$1B+ valuation · Stable Diffusion 3

Pioneer of open-source image diffusion. SD3 uses rectified flow transformers. Powers massive ecosystem of creative tools, enterprise APIs, and fine-tuned models.

Biotech

David Baker Lab / IPD

Nobel Prize 2024 · UW Seattle

RFDiffusion — the most cited protein diffusion method. Open source. Spun out multiple companies (Xaira, Monod Bio). Foundation of modern computational protein design.

Drug Discovery

Recursion (RXRX)

Nasdaq · Q4 results Feb 25

TechBio platform using diffusion-based generative chemistry alongside phenotypic screening. NVIDIA GTC feature with HighRes. 8 clinical programs.

Yahoo Finance

Biotech

Eikon Therapeutics (EIKN)

IPO: Feb 5 · $381M raise

AI-driven drug discovery using live-cell imaging + ML. Led by Merck veteran Roger Perlmutter. EIK1001 in Phase 2/3 for melanoma. 2nd largest biotech IPO of 2026.

BioSpace

Investment Thesis

2026 Biotech IPO Landscape

| Company | Raise | Valuation | Focus | AI/Diffusion? |
|---|---|---|---|---|
| Generate:Bio | $425M | $2.2B | AI protein design | ✓ Chroma |
| Eikon Therapeutics | $381M | ~$2B | Live-cell AI | ML-driven |
| AgomAb | $212M | ~$1B | ALK5 inhibitors | — |
| SpyGlass Pharma | $150M | — | Ophthalmology | — |
| Aktis Oncology | $318M | — | Radiopharma | — |

Key Papers & Resources

The essential reading list for understanding diffusion across domains.

Text / Language Models

Feb 2026
Mercury 2 — First production diffusion reasoning LLM. 1,009 tok/s. Inception Labs
Feb 2026
Scaling Beyond Masked Diffusion Language Models — Uniform-state diffusion can outperform masked diffusion. arXiv:2602.15014
Aug 2025
Survey: Parallel Text Generation — Comprehensive survey from parallel decoding to diffusion LMs. arXiv:2508.08712
Feb 2025
LLaDA — Large Language Diffusion with mAsking. First competitive large-scale masked diffusion LM. arXiv:2502.09992
2024
MDLM — Masked Diffusion Language Model with simplified training. arXiv:2406.07524
2023
SEDD — Score Entropy Discrete Diffusion for text. arXiv:2310.16834

Protein Design

2023
Chroma — Illuminating protein space with generative model. Generate:Biomedicines. Nature (2023)
2023
RFDiffusion — De novo design of protein structure and function with RFdiffusion. Baker Lab. Nature (2023)
2023
FrameDiff — SE(3) diffusion model for protein backbone generation. arXiv:2302.02277
2023
Genie — Diffusion-based generative model for protein backbone design. arXiv:2301.12485

Molecular Design & Docking

2023
DiffDock — Diffusion steps, twists, and turns for molecular docking. MIT. arXiv:2210.01776 · ICLR 2023
2022
Torsional Diffusion — Molecular conformer generation. arXiv:2206.01729 · NeurIPS 2022
2023
DiffSBDD — Structure-based drug design with equivariant diffusion. arXiv:2210.13695

Foundation / Theory

2020
DDPM — Denoising Diffusion Probabilistic Models. The paper that started it all. Ho et al. arXiv:2006.11239
2021
Score-Based Generative Modeling — Unifying score matching and diffusion. Song et al. arXiv:2011.13456
2022
Flow Matching — Simplified training for continuous normalizing flows. Lipman et al. arXiv:2210.02747