The Paradigm Shift
For seven years, autoregressive (AR) models have dominated language AI: predict one token, append it, repeat. In February 2026, the first production diffusion-based reasoning LLM shattered that monopoly. Simultaneously, diffusion models are designing proteins, discovering drugs, and generating images. This is the diffusion paradigm.
What is Diffusion?
Instead of generating content one piece at a time (left-to-right), diffusion models start from noise or masked tokens and iteratively refine everything simultaneously. Think of it as an editor revising a full draft rather than a typewriter producing character by character.
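The refinement loop is simple to sketch. Below is a toy, pure-Python illustration (not any production model's code): the "denoiser" here is a stand-in that proposes the correct token at each masked position, where a real model would predict a distribution with a neural network.

```python
import random

MASK = "_"

def toy_denoiser(seq, target):
    # Stand-in for a neural network: propose the target token at every
    # masked position. A real model predicts a token distribution instead.
    return [target[i] if tok == MASK else tok for i, tok in enumerate(seq)]

def diffusion_generate(target, steps=4, seed=0):
    """Start fully masked; each step, commit a fraction of positions."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    for step in range(steps):
        proposal = toy_denoiser(seq, target)
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Reveal roughly an equal share of still-masked positions per step
        k = max(1, len(masked) // (steps - step))
        for i in rng.sample(masked, min(k, len(masked))):
            seq[i] = proposal[i]
    # Final pass: commit anything left masked
    return "".join(toy_denoiser(seq, target))

print(diffusion_generate("hello world"))  # → "hello world" after 4 refinement steps
```

The key property to notice: every position is proposed at every step, and the sampler chooses which proposals to commit, which is what lets later steps revise context around earlier choices.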
Why Now?
Scaling Laws Discovered
Recent papers (LLaDA, MDLM, Mercury) show diffusion LMs follow similar scaling laws to AR models — more compute = better quality. The gap is closing fast.
Hardware Alignment
Diffusion's parallel generation maps naturally onto modern GPU architectures (NVIDIA Blackwell). AR decoding leaves that parallel hardware underutilized because each token must wait for the previous one.
Cross-Domain Validation
Diffusion already dominates images (Stable Diffusion), proteins (RFDiffusion), and molecules (DiffDock). Text was the last frontier — Mercury 2 just breached it.
Domain Map
Language Generation
Mercury 2, LLaDA, MDLM, SEDD. Masked and continuous diffusion replacing AR decoding. 1,000+ tok/s with reasoning-grade quality.
Protein Design
Chroma (Generate:Bio), RFDiffusion (Baker Lab), FrameDiff, Genie. Design novel protein structures with specified function from scratch.
Drug Discovery
DiffDock, DiffSBDD, Torsional Diffusion, TargetDiff. Predict binding poses, generate drug-like molecules for specific protein targets.
Visual Generation
Stable Diffusion 3, DALL-E 3, Imagen 3, Sora, Kling. The original diffusion success story, now the backbone of all visual AI.
Architecture Comparison
Three paradigms for content generation. Autoregressive has dominated text; diffusion is proving there's a better way.
Autoregressive (AR)
- One token at a time, left-to-right
- Each token depends on all previous
- Latency scales linearly with length
- Cannot revise earlier tokens
- Examples: GPT-5, Claude, Llama
Masked Diffusion (MDM)
- Start fully masked, iteratively reveal
- All positions attend to all others
- Latency nearly constant regardless of length
- Self-correction: can revise any position
- Examples: Mercury 2, LLaDA, MDLM
Continuous Diffusion
- Operate in continuous embedding space
- Gradual denoising from Gaussian noise
- Natural fit for 3D structures
- Score-matching training objective
- Examples: RFDiffusion, Stable Diffusion
The Speed Advantage
Autoregressive models generate 100 tokens by running the model 100 times sequentially. Diffusion models generate 100 tokens by running the model ~10-20 times in parallel. The advantage compounds with output length.
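The compounding claim is just arithmetic, and worth making concrete. A minimal sketch, assuming equal per-pass cost and a fixed diffusion step count (both simplifications; real systems differ):

```python
def forward_passes(n_tokens, paradigm, diffusion_steps=16):
    """Count model invocations needed to emit n_tokens.
    Assumes one sequential AR pass per token vs a fixed number of
    parallel diffusion refinement steps (both simplifications)."""
    if paradigm == "ar":
        return n_tokens          # one sequential pass per token
    return diffusion_steps       # step count is independent of length

for n in (100, 1000, 10000):
    ar = forward_passes(n, "ar")
    diff = forward_passes(n, "diffusion")
    print(f"{n} tokens: AR {ar} passes vs diffusion {diff} ({ar / diff:.1f}x fewer)")
```

At 100 tokens the gap is about 6x; at 10,000 tokens it is over 600x, which is why the advantage compounds with output length.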
Key Architectural Innovations
Bidirectional Attention
Unlike AR models' causal masks, diffusion models use full bidirectional attention. Every token can see every other token at every step, enabling richer contextual understanding.
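The difference between the two attention patterns is easiest to see as mask matrices. An illustrative construction (in practice these are implemented as additive attention biases, not 0/1 lists):

```python
def causal_mask(n):
    """AR attention: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Diffusion attention: every position attends to every position."""
    return [[1] * n for _ in range(n)]

n = 4
visible_ar = sum(map(sum, causal_mask(n)))            # 1+2+3+4 = 10 visible pairs
visible_diff = sum(map(sum, bidirectional_mask(n)))   # 4*4 = 16 visible pairs
print(visible_ar, visible_diff)  # → 10 16
```

The causal mask sees n(n+1)/2 token pairs; the bidirectional mask sees all n² of them, which is where the richer contextual signal comes from.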
Noise Schedule Design
The masking/noise schedule controls the generation process. Cosine schedules, linear schedules, and learned schedules trade off quality vs speed. Mercury 2 uses a tunable approach.
Classifier-Free Guidance
Borrowed from image diffusion: interpolate between conditional and unconditional generation to boost quality. Applied to text via prompt conditioning.
AR Initialization
Recent work shows initializing diffusion LMs from pretrained AR model weights dramatically improves quality. The AR model provides a strong starting point for the diffusion process.
Diffusion Math (Simplified)
Forward process (mask):
q(x_t | x_0): mask each token of x_0 independently with probability β_t
Reverse process (denoise):
p_θ(x_{t-1} | x_t) = neural_network(x_t, t)
Training objective:
L = E[-log p_θ(x_0 | x_t)] — predict the clean sequence from the noisy one
Key insight: same math works for tokens, pixels, atoms, residues
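The training objective above can be estimated in a few lines. A toy Monte Carlo sketch of one loss sample, with a stand-in `predict_probs` callable where a real model would be a neural network:

```python
import math
import random

def masked_diffusion_loss(x0, predict_probs, beta_t, seed=0):
    """One Monte Carlo sample of L = E[-log p_theta(x0 | x_t)]:
    mask each token independently with probability beta_t, then score
    the model's probability of the true token at each masked position."""
    rng = random.Random(seed)
    xt = [tok if rng.random() > beta_t else None for tok in x0]  # None = [MASK]
    loss, n_masked = 0.0, 0
    for i, tok in enumerate(xt):
        if tok is None:
            loss += -math.log(predict_probs(x0, i))  # p_theta(x0_i | x_t)
            n_masked += 1
    return loss / max(n_masked, 1)

# Toy "model": assigns probability 0.8 to the true token everywhere.
loss = masked_diffusion_loss(list("hello"), lambda x0, i: 0.8, beta_t=0.5)
print(round(loss, 4))  # → 0.2231 (= -ln 0.8)
```

Only masked positions contribute to the loss; unmasked tokens are already known, so there is nothing to predict there.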
Mercury 2: First Production Diffusion LLM
Launched February 24, 2026 by Inception Labs. The first diffusion-based reasoning model to reach production quality, running at 1,009 tokens/second on NVIDIA Blackwell.
How It Works
Mercury 2 generates responses through parallel refinement — producing multiple tokens simultaneously and converging over a small number of steps. "Less typewriter, more editor revising a full draft at once." It features tunable reasoning depth, native tool use, and schema-aligned JSON output.
Benchmark Performance
| Model | Architecture | Speed (tok/s) | Output price ($/1M tok) | Quality Tier |
|---|---|---|---|---|
| Mercury 2 | Diffusion | 1,009 | $0.75 | Speed-optimized |
| GPT-5.2 mini | AR | ~200 | $0.60 | Speed-optimized |
| Claude Haiku 4 | AR | ~180 | $1.25 | Speed-optimized |
| Gemini Flash 2.5 | AR | ~220 | $0.30 | Speed-optimized |
| Llama 4 Scout | AR MoE | ~150 | Self-hosted | Open-source |
| GLM-5 | AR | ~100 | Free | Open-source |
Production Use Cases
🔄 Agentic Loops
Agent workflows chain dozens of inference calls. 5× latency reduction per call compounds across every step. More budget for reasoning depth.
💻 Code Autocomplete
Real-time suggestions that "feel like part of your own thinking" (Zed). Diffusion eliminates the pause that breaks developer flow.
🎤 Voice Interfaces
Tightest latency budget in AI. Mercury 2 makes reasoning-quality responses viable within natural speech cadences.
🔍 Search & RAG
Multi-hop retrieval stacks latency fast. Diffusion lets you add reasoning to the search loop without blowing the latency budget.
Who Built It
Inception Labs — Founded by three professors from Stanford, Cornell, and UCLA. Backed by NVIDIA. The Mercury lineage: Mercury Coder (Oct 2025) → Mercury 2 (Feb 2026). OpenAI API compatible — drop-in replacement.
Diffusion in Biology
Diffusion models didn't start with text — they conquered protein design first. The same mathematical framework that generates images from noise can generate novel protein structures, drug molecules, and binding poses from scratch.
Key Models & Companies
Chroma — Generate:Biomedicines
Diffusion model for programmable protein design. Generates novel protein structures conditioned on desired properties — function, shape, symmetry. Powers GB-0895 (anti-TSLP antibody, Phase 3 severe asthma) and oncology pipeline.
RFDiffusion — Baker Lab / UW
Fine-tuned from RoseTTAFold structure prediction network for protein design via denoising diffusion. Designs binders, symmetric assemblies, motif scaffolding. Used by hundreds of labs worldwide.
DiffDock — MIT CSAIL
Diffusion model for molecular docking — predicts how drug molecules bind to protein targets. Treats docking as a generative problem over SE(3), generating binding poses through iterative refinement.
Torsional Diffusion
Generates molecular conformations by diffusing over torsion angles. Respects molecular geometry naturally. Key component in modern drug discovery pipelines for 3D molecule generation.
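The idea of diffusing over torsion angles can be illustrated with a wrapped-Gaussian forward step. This is a sketch of the concept only; the actual Torsional Diffusion implementation operates on full molecular conformers with cheminformatics tooling:

```python
import math
import random

def wrap_angle(theta):
    """Wrap a torsion angle into [-pi, pi], respecting its periodicity."""
    return math.atan2(math.sin(theta), math.cos(theta))

def noise_torsions(torsions, sigma, seed=0):
    """One forward-diffusion step over torsion angles: add Gaussian
    noise, then wrap — the 'wrapped normal' perturbation used by
    torsion-space diffusion methods."""
    rng = random.Random(seed)
    return [wrap_angle(t + rng.gauss(0.0, sigma)) for t in torsions]

angles = [0.0, math.pi - 0.1, -math.pi / 2]
noisy = noise_torsions(angles, sigma=0.5)
print([round(a, 3) for a in noisy])
```

Wrapping is the whole point: a torsion of π - 0.1 plus a small noise kick should land near -π, not off the end of the real line, and the reverse (denoising) model must respect the same geometry.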
FrameDiff / SE(3) Diffusion
Diffusion on SE(3) frames for protein backbone generation. Produces diverse, designable structures. Foundation for many subsequent protein diffusion methods.
Genie / Genie 2 — Microsoft
Denoising diffusion over protein backbone angles (φ, ψ). Genie 2 adds sequence co-design — simultaneously generates structure and sequence in a single diffusion process.
Generate:Biomedicines — IPO Deep Dive
| Program | Target | Indication | Stage | IPO Allocation |
|---|---|---|---|---|
| GB-0895 | Anti-TSLP mAb | Severe asthma | Phase 3 | ~$300M |
| GB-0895 | Anti-TSLP mAb | COPD | Phase 1b | ~$100M |
| GB-4362 | Undisclosed | Oncology | Phase 1 | ~$7.5M |
| GB-5267 | Undisclosed | Oncology | Phase 1 | ~$7.5M |
Ticker: GENB · Nasdaq · Pricing week of Feb 23 · Partners: Novartis, Amgen · VC: Flagship Pioneering (Moderna's backer)
Cross-Domain Benchmarks
Diffusion is competitive or dominant in every domain it enters. Here's the data.
Molecular Docking: DiffDock vs Traditional
| Method | Type | Top-1 (RMSD < 2 Å) | Top-5 (RMSD < 2 Å) | Speed |
|---|---|---|---|---|
| DiffDock | Diffusion | 38.2% | 43.7% | ~10s |
| GNINA | Scoring | 22.4% | 27.1% | ~60s |
| SMINA | Docking | 19.3% | 24.8% | ~120s |
| GLIDE | Docking | 21.7% | 26.3% | ~180s |
| TANKBind | Geometric DL | 20.1% | 24.5% | ~5s |
Market & Investment Signals
Diffusion is no longer just a research paradigm — it's driving billions in enterprise value and reshaping biotech IPOs.
Companies Building on Diffusion
Generate:Biomedicines (GENB)
Flagship Pioneering's crown jewel. AI-designed protein therapeutics via Chroma diffusion model. Lead asset GB-0895 in Phase 3 for severe asthma. Novartis + Amgen partnerships. Largest biotech IPO attempt of 2026.
Inception Labs
Mercury lineage: first production diffusion LLMs. Mercury 2 at 1,009 tok/s with reasoning. OpenAI API compatible. Customers: Zed, Viant, Skyvern, SearchBlox.
Stability AI
Pioneer of open-source image diffusion. SD3 uses rectified flow transformers. Powers massive ecosystem of creative tools, enterprise APIs, and fine-tuned models.
David Baker Lab / IPD
RFDiffusion — the most cited protein diffusion method. Open source. Spun out multiple companies (Xaira, Monod Bio). Foundation of modern computational protein design.
Recursion (RXRX)
TechBio platform using diffusion-based generative chemistry alongside phenotypic screening. NVIDIA GTC feature with HighRes. 8 clinical programs.
Eikon Therapeutics (EIKN)
AI-driven drug discovery using live-cell imaging + ML. Led by Merck veteran Roger Perlmutter. EIK1001 in Phase 2/3 for melanoma. 2nd largest biotech IPO of 2026.
2026 Biotech IPO Landscape
| Company | Raise | Valuation | Focus | AI/Diffusion? |
|---|---|---|---|---|
| Generate:Bio | $425M | $2.2B | AI protein design | ✓ Chroma |
| Eikon Therapeutics | $381M | ~$2B | Live-cell AI | ML-driven |
| AgomAb | $212M | ~$1B | ALK5 inhibitors | — |
| SpyGlass Pharma | $150M | — | Ophthalmology | — |
| Aktis Oncology | $318M | — | Radiopharma | — |