The Paradigm Shift
For seven years, autoregressive (AR) models have dominated language AI: predict one token, append it, repeat. In February 2026, the first production diffusion-based reasoning LLM shattered that monopoly. Simultaneously, diffusion models are designing proteins, discovering drugs, and generating images. This is the diffusion paradigm.
What is Diffusion?
Instead of generating content one piece at a time (left-to-right), diffusion models start from noise or masked tokens and iteratively refine everything simultaneously. Think of it as an editor revising a full draft rather than a typewriter producing character by character.
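The refinement loop is simple to sketch. Below is a toy, pure-Python illustration (not any production model's code): the "denoiser" here is a stand-in that proposes the correct token at each masked position, where a real model would predict a distribution with a neural network.

```python
import random

MASK = "_"

def toy_denoiser(seq, target):
    # Stand-in for a neural network: propose the target token at every
    # masked position. A real model predicts a token distribution instead.
    return [target[i] if tok == MASK else tok for i, tok in enumerate(seq)]

def diffusion_generate(target, steps=4, seed=0):
    """Start fully masked; each step, commit a fraction of positions."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    for step in range(steps):
        proposal = toy_denoiser(seq, target)
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Reveal roughly an equal share of still-masked positions per step
        k = max(1, len(masked) // (steps - step))
        for i in rng.sample(masked, min(k, len(masked))):
            seq[i] = proposal[i]
    # Final pass: commit anything left masked
    return "".join(toy_denoiser(seq, target))

print(diffusion_generate("hello world"))  # → "hello world" after 4 refinement steps
```

The key property to notice: every position is proposed at every step, and the sampler chooses which proposals to commit, which is what lets later steps revise context around earlier choices.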
Why Now?
Scaling Laws Discovered
Recent papers (LLaDA, MDLM, Mercury) show diffusion LMs follow similar scaling laws to AR models — more compute = better quality. The gap is closing fast.
Hardware Alignment
Diffusion's parallel generation maps naturally onto modern GPU architectures (NVIDIA Blackwell). AR decoding leaves that parallel hardware underutilized because each token must wait for the previous one.
Cross-Domain Validation
Diffusion already dominates images (Stable Diffusion), proteins (RFDiffusion), and molecules (DiffDock). Text was the last frontier — Mercury 2 just breached it.
Domain Map
Language Generation
Mercury 2, LLaDA, MDLM, SEDD. Masked and continuous diffusion replacing AR decoding. 1,000+ tok/s with reasoning-grade quality.
Protein Design
Chroma (Generate:Bio), RFDiffusion (Baker Lab), FrameDiff, Genie. Design novel protein structures with specified function from scratch.
Drug Discovery
DiffDock, DiffSBDD, Torsional Diffusion, TargetDiff. Predict binding poses, generate drug-like molecules for specific protein targets.
Visual Generation
Stable Diffusion 3, DALL-E 3, Imagen 3, Sora, Kling. The original diffusion success story, now the backbone of all visual AI.
Architecture Comparison
Three paradigms for content generation. Autoregressive has dominated text; diffusion is proving there's a better way.
Autoregressive (AR)
- One token at a time, left-to-right
- Each token depends on all previous
- Latency scales linearly with length
- Cannot revise earlier tokens
- Examples: GPT-5, Claude, Llama
Masked Diffusion (MDM)
- Start fully masked, iteratively reveal
- All positions attend to all others
- Latency nearly constant regardless of length
- Self-correction: can revise any position
- Examples: Mercury 2, LLaDA, MDLM
Continuous Diffusion
- Operate in continuous embedding space
- Gradual denoising from Gaussian noise
- Natural fit for 3D structures
- Score-matching training objective
- Examples: RFDiffusion, Stable Diffusion
The Speed Advantage
Autoregressive models generate 100 tokens by running the model 100 times sequentially. Diffusion models generate 100 tokens by running the model ~10-20 times in parallel. The advantage compounds with output length.
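The compounding claim is just arithmetic, and worth making concrete. A minimal sketch, assuming equal per-pass cost and a fixed diffusion step count (both simplifications; real systems differ):

```python
def forward_passes(n_tokens, paradigm, diffusion_steps=16):
    """Count model invocations needed to emit n_tokens.
    Assumes one sequential AR pass per token vs a fixed number of
    parallel diffusion refinement steps (both simplifications)."""
    if paradigm == "ar":
        return n_tokens          # one sequential pass per token
    return diffusion_steps       # step count is independent of length

for n in (100, 1000, 10000):
    ar = forward_passes(n, "ar")
    diff = forward_passes(n, "diffusion")
    print(f"{n} tokens: AR {ar} passes vs diffusion {diff} ({ar / diff:.1f}x fewer)")
```

At 100 tokens the gap is about 6x; at 10,000 tokens it is over 600x, which is why the advantage compounds with output length.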
Key Architectural Innovations
Bidirectional Attention
Unlike AR models' causal masks, diffusion models use full bidirectional attention. Every token can see every other token at every step, enabling richer contextual understanding.
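The difference between the two attention patterns is easiest to see as mask matrices. An illustrative construction (in practice these are implemented as additive attention biases, not 0/1 lists):

```python
def causal_mask(n):
    """AR attention: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Diffusion attention: every position attends to every position."""
    return [[1] * n for _ in range(n)]

n = 4
visible_ar = sum(map(sum, causal_mask(n)))            # 1+2+3+4 = 10 visible pairs
visible_diff = sum(map(sum, bidirectional_mask(n)))   # 4*4 = 16 visible pairs
print(visible_ar, visible_diff)  # → 10 16
```

The causal mask sees n(n+1)/2 token pairs; the bidirectional mask sees all n² of them, which is where the richer contextual signal comes from.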
Noise Schedule Design
The masking/noise schedule controls the generation process. Cosine schedules, linear schedules, and learned schedules trade off quality vs speed. Mercury 2 uses a tunable approach.
Classifier-Free Guidance
Borrowed from image diffusion: interpolate between conditional and unconditional generation to boost quality. Applied to text via prompt conditioning.
AR Initialization
Recent work shows initializing diffusion LMs from pretrained AR model weights dramatically improves quality. The AR model provides a strong starting point for the diffusion process.
Diffusion Math (Simplified)
Forward process (mask):
q(x_t | x_0): mask each token of x_0 independently with probability β_t
Reverse process (denoise):
p_θ(x_{t-1} | x_t) = neural_network(x_t, t)
Training objective:
L = E[-log p_θ(x_0 | x_t)] — predict the clean sequence from the noisy one
Key insight: same math works for tokens, pixels, atoms, residues
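The training objective above can be estimated in a few lines. A toy Monte Carlo sketch of one loss sample, with a stand-in `predict_probs` callable where a real model would be a neural network:

```python
import math
import random

def masked_diffusion_loss(x0, predict_probs, beta_t, seed=0):
    """One Monte Carlo sample of L = E[-log p_theta(x0 | x_t)]:
    mask each token independently with probability beta_t, then score
    the model's probability of the true token at each masked position."""
    rng = random.Random(seed)
    xt = [tok if rng.random() > beta_t else None for tok in x0]  # None = [MASK]
    loss, n_masked = 0.0, 0
    for i, tok in enumerate(xt):
        if tok is None:
            loss += -math.log(predict_probs(x0, i))  # p_theta(x0_i | x_t)
            n_masked += 1
    return loss / max(n_masked, 1)

# Toy "model": assigns probability 0.8 to the true token everywhere.
loss = masked_diffusion_loss(list("hello"), lambda x0, i: 0.8, beta_t=0.5)
print(round(loss, 4))  # → 0.2231 (= -ln 0.8)
```

Only masked positions contribute to the loss; unmasked tokens are already known, so there is nothing to predict there.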
Mercury 2: First Production Diffusion LLM
Launched February 24, 2026 by Inception Labs. The first diffusion-based reasoning model to reach production quality, running at 1,009 tokens/second on NVIDIA Blackwell.
How It Works
Mercury 2 generates responses through parallel refinement — producing multiple tokens simultaneously and converging over a small number of steps. "Less typewriter, more editor revising a full draft at once." It features tunable reasoning depth, native tool use, and schema-aligned JSON output.
Benchmark Performance
| Model | Architecture | Speed (tok/s) | Output price ($/1M tok) | Quality Tier |
|---|---|---|---|---|
| Mercury 2 | Diffusion | 1,009 | $0.75 | Speed-optimized |
| GPT-5.2 mini | AR | ~200 | $0.60 | Speed-optimized |
| Claude Haiku 4 | AR | ~180 | $1.25 | Speed-optimized |
| Gemini Flash 2.5 | AR | ~220 | $0.30 | Speed-optimized |
| Llama 4 Scout | AR MoE | ~150 | Self-hosted | Open-source |
| GLM-5 | AR | ~100 | Free | Open-source |
Production Use Cases
🔄 Agentic Loops
Agent workflows chain dozens of inference calls. 5× latency reduction per call compounds across every step. More budget for reasoning depth.
💻 Code Autocomplete
Real-time suggestions that "feel like part of your own thinking" (Zed). Diffusion eliminates the pause that breaks developer flow.
🎤 Voice Interfaces
Tightest latency budget in AI. Mercury 2 makes reasoning-quality responses viable within natural speech cadences.
🔍 Search & RAG
Multi-hop retrieval stacks latency fast. Diffusion lets you add reasoning to the search loop without blowing the latency budget.
Who Built It
Inception Labs — Founded by three professors from Stanford, Cornell, and UCLA. Backed by NVIDIA. The Mercury lineage: Mercury Coder (Oct 2025) → Mercury 2 (Feb 2026). OpenAI API compatible — drop-in replacement.
Diffusion in Biology
Diffusion models didn't start with text — they conquered protein design first. The same mathematical framework that generates images from noise can generate novel protein structures, drug molecules, and binding poses from scratch.
Key Models & Companies
Chroma — Generate:Biomedicines
Diffusion model for programmable protein design. Generates novel protein structures conditioned on desired properties — function, shape, symmetry. Powers GB-0895 (anti-TSLP antibody, Phase 3 severe asthma) and oncology pipeline.
RFDiffusion — Baker Lab / UW
Fine-tuned from RoseTTAFold structure prediction network for protein design via denoising diffusion. Designs binders, symmetric assemblies, motif scaffolding. Used by hundreds of labs worldwide.
DiffDock — MIT CSAIL
Diffusion model for molecular docking — predicts how drug molecules bind to protein targets. Treats docking as a generative problem over SE(3), generating binding poses through iterative refinement.
Torsional Diffusion
Generates molecular conformations by diffusing over torsion angles. Respects molecular geometry naturally. Key component in modern drug discovery pipelines for 3D molecule generation.
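The idea of diffusing over torsion angles can be illustrated with a wrapped-Gaussian forward step. This is a sketch of the concept only; the actual Torsional Diffusion implementation operates on full molecular conformers with cheminformatics tooling:

```python
import math
import random

def wrap_angle(theta):
    """Wrap a torsion angle into [-pi, pi], respecting its periodicity."""
    return math.atan2(math.sin(theta), math.cos(theta))

def noise_torsions(torsions, sigma, seed=0):
    """One forward-diffusion step over torsion angles: add Gaussian
    noise, then wrap — the 'wrapped normal' perturbation used by
    torsion-space diffusion methods."""
    rng = random.Random(seed)
    return [wrap_angle(t + rng.gauss(0.0, sigma)) for t in torsions]

angles = [0.0, math.pi - 0.1, -math.pi / 2]
noisy = noise_torsions(angles, sigma=0.5)
print([round(a, 3) for a in noisy])
```

Wrapping is the whole point: a torsion of π - 0.1 plus a small noise kick should land near -π, not off the end of the real line, and the reverse (denoising) model must respect the same geometry.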
FrameDiff / SE(3) Diffusion
Diffusion on SE(3) frames for protein backbone generation. Produces diverse, designable structures. Foundation for many subsequent protein diffusion methods.
Genie / Genie 2 — Microsoft
Denoising diffusion over protein backbone angles (φ, ψ). Genie 2 adds sequence co-design — simultaneously generates structure and sequence in a single diffusion process.
Generate:Biomedicines — IPO Deep Dive
| Program | Target | Indication | Stage | IPO Allocation |
|---|---|---|---|---|
| GB-0895 | Anti-TSLP mAb | Severe asthma | Phase 3 | ~$300M |
| GB-0895 | Anti-TSLP mAb | COPD | Phase 1b | ~$100M |
| GB-4362 | Undisclosed | Oncology | Phase 1 | ~$7.5M |
| GB-5267 | Undisclosed | Oncology | Phase 1 | ~$7.5M |
Ticker: GENB · Nasdaq · Pricing week of Feb 23 · Partners: Novartis, Amgen · VC: Flagship Pioneering (Moderna's backer)
Cross-Domain Benchmarks
Diffusion is competitive or dominant in every domain it enters. Here's the data.
Molecular Docking: DiffDock vs Traditional
| Method | Type | Top-1 (RMSD < 2 Å) | Top-5 (RMSD < 2 Å) | Speed |
|---|---|---|---|---|
| DiffDock | Diffusion | 38.2% | 43.7% | ~10s |
| GNINA | Scoring | 22.4% | 27.1% | ~60s |
| SMINA | Docking | 19.3% | 24.8% | ~120s |
| GLIDE | Docking | 21.7% | 26.3% | ~180s |
| TANKBind | Geometric DL | 20.1% | 24.5% | ~5s |
Market & Investment Signals
Diffusion is no longer just a research paradigm — it's driving billions in enterprise value and reshaping biotech IPOs.
Companies Building on Diffusion
Generate:Biomedicines (GENB)
Flagship Pioneering's crown jewel. AI-designed protein therapeutics via Chroma diffusion model. Lead asset GB-0895 in Phase 3 for severe asthma. Novartis + Amgen partnerships. Largest biotech IPO attempt of 2026.
Inception Labs
Mercury lineage: first production diffusion LLMs. Mercury 2 at 1,009 tok/s with reasoning. OpenAI API compatible. Customers: Zed, Viant, Skyvern, SearchBlox.
Stability AI
Pioneer of open-source image diffusion. SD3 uses rectified flow transformers. Powers massive ecosystem of creative tools, enterprise APIs, and fine-tuned models.
David Baker Lab / IPD
RFDiffusion — the most cited protein diffusion method. Open source. Spun out multiple companies (Xaira, Monod Bio). Foundation of modern computational protein design.
Recursion (RXRX)
TechBio platform using diffusion-based generative chemistry alongside phenotypic screening. NVIDIA GTC feature with HighRes. 8 clinical programs.
Eikon Therapeutics (EIKN)
AI-driven drug discovery using live-cell imaging + ML. Led by Merck veteran Roger Perlmutter. EIK1001 in Phase 2/3 for melanoma. 2nd largest biotech IPO of 2026.
2026 Biotech IPO Landscape
| Company | Raise | Valuation | Focus | AI/Diffusion? |
|---|---|---|---|---|
| Generate:Bio | $425M | $2.2B | AI protein design | ✓ Chroma |
| Eikon Therapeutics | $381M | ~$2B | Live-cell AI | ML-driven |
| AgomAb | $212M | ~$1B | ALK5 inhibitors | — |
| SpyGlass Pharma | $150M | — | Ophthalmology | — |
| Aktis Oncology | $318M | — | Radiopharma | — |