Grok 4.20 Review 2026: Everything You Need to Know

The complete breakdown of xAI's flagship multi-agent model, including its 4-agent architecture, 2M-token context window, real-time data integration, and how to access it via API today.

Quick Facts at a Glance

Grok 4.20 is a multi-agent large language model developed by xAI, released in March 2026. It features a native 4-agent collaborative architecture, a 2 million token context window, real-time integration with X (Twitter) data, and two operational modes — reasoning for depth and non-reasoning for speed.

Developer: xAI
Release Date: March 2026
Model Type: Multi-Agent LLM
Context Window: 2M tokens
Output Speed: 235 tok/s
API Input Price: $2.6 / 1M tokens
API Output Price: $7.8 / 1M tokens
Modes: Reasoning / Non-Reasoning
Input Modalities: Text, Vision
Arena Elo (Apr 2026): ~1493
Agent Count: 4 (Standard) / 16 (Heavy)
Knowledge Cutoff: Nov 2024 + Live X Data

What Is Grok 4.20?

Grok 4.20 is xAI's flagship large language model, and the most consequential release the company has shipped since Grok 4 in July 2025. Where that earlier model relied on a single unified architecture, Grok 4.20 fundamentally rethinks how inference works. Instead of one model doing everything, it deploys a team of four specialized agents that work in parallel, debate conclusions, and synthesize a final answer behind the scenes. The result is a system that feels qualitatively sharper on complex, multi-step tasks, not because it got bigger, but because it got smarter about how it uses what it already knows.

The beta launched on February 17, 2026. Full release and API access followed on March 10, 2026, at which point three model variants became available: grok-4.20-0309-reasoning, grok-4.20-0309-non-reasoning, and grok-4.20-multi-agent-0309.

Architecture: 3 Trillion Parameters (MoE)

Grok 4.20 is built on a Mixture-of-Experts backbone similar to Grok 4's, with pre-training-scale reinforcement learning applied to refine reasoning quality. The model shares weights across its four agents, keeping compute costs far below what four independent models would cost.

Two Modes, One Endpoint

Reasoning mode generates visible chain-of-thought before responding, improving accuracy on math, code, and multi-step logic. Non-reasoning mode skips the deliberation step for lower latency and cheaper token costs — ideal for production pipelines that don't need deep analysis.
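Because both modes live behind one endpoint and differ only by model name, mode selection reduces to a small request builder. A minimal sketch, assuming an OpenAI-style chat-completions payload; the variant names are the ones listed above, and the routing heuristic is illustrative rather than an xAI recommendation.

```python
# Sketch: selecting Grok 4.20's mode by model name in an OpenAI-style
# chat-completions payload. The variant names come from the release
# details above; the payload shape is an assumption (OpenAI-compatible).

REASONING = "grok-4.20-0309-reasoning"
NON_REASONING = "grok-4.20-0309-non-reasoning"

def build_request(prompt: str, deep: bool) -> dict:
    """Route multi-step analysis to reasoning mode, everything else to
    the cheaper, lower-latency non-reasoning mode."""
    return {
        "model": REASONING if deep else NON_REASONING,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Production pipelines that never need deep analysis can pin the non-reasoning variant and skip the branch entirely.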

Live Context via X Firehose

Grok has access to approximately 68 million English-language posts per day from X. This isn't just a search plugin — the signal is used for real-time grounding at millisecond latency, which is what gave an early Grok 4.20 checkpoint its edge in the Alpha Arena financial trading simulation.

Weekly Iterative Updates

Unlike models that ship and stall, Grok 4.20 follows a rapid iteration cycle. Beta 2 shipped in April 2026 with improvements to instruction following, LaTeX rendering, multi-image handling, and reduced hallucination rates. xAI publishes release notes with each update.

Key Features & Capabilities

Here's what actually matters for developers and teams evaluating Grok 4.20 for real workloads.

Native Multi-Agent Architecture

Core Differentiator

This is the headline capability. Unlike systems where multi-agent behavior is a developer-built wrapper around a single model, Grok 4.20's four-agent council — Grok (coordinator), Harper (research), Benjamin (math/code), and Lucas (synthesis/creativity) — runs natively at inference time. All four operate in parallel on shared weights and cached context. They debate intermediate results and the coordinator synthesizes the final answer. The overhead is roughly 1.5–2.5× a single call, not 4×, because of shared KV caching on xAI's Colossus infrastructure.
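The cost implication of that 1.5–2.5× overhead is easy to estimate from the published per-token prices. A back-of-envelope sketch; the overhead range comes from the paragraph above, and the token counts are hypothetical.

```python
# Back-of-envelope cost estimate for a multi-agent call, using the
# published $2.6 / $7.8 per-million-token prices and the 1.5-2.5x
# overhead range quoted above. Token counts are hypothetical.

INPUT_PRICE = 2.6 / 1_000_000   # $ per input token
OUTPUT_PRICE = 7.8 / 1_000_000  # $ per output token

def call_cost(input_tokens: int, output_tokens: int, overhead: float = 1.0) -> float:
    """Estimated dollar cost of one call at the given agent overhead."""
    base = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return base * overhead

# A 100K-token prompt with a 2K-token answer costs ~$0.28 single-agent,
# so the four-agent council lands roughly between $0.41 and $0.69.
low, high = call_cost(100_000, 2_000, 1.5), call_cost(100_000, 2_000, 2.5)
```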

2M Token Context Window

Scale Advantage

Two million tokens is roughly 3,000 pages of standard A4 text. In practical terms, you can feed an entire code repository, a full quarter of financial documents, or several hours of meeting transcripts into a single prompt. For developers building RAG pipelines, the massive context significantly reduces chunking complexity — many retrieval steps simply become unnecessary. No other flagship model currently matches this window size at this price point.
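To sanity-check whether a document set actually fits, a rough word-based token estimate is usually enough. A sketch using the common ~0.75 words-per-token heuristic for English text; exact counts would require xAI's tokenizer, which this only approximates.

```python
# Rough fit check against the 2M-token window, using the common
# heuristic of ~0.75 English words per token. This approximates the
# real tokenizer; always leave headroom for the model's response.

CONTEXT_WINDOW = 2_000_000
WORDS_PER_TOKEN = 0.75

def estimate_tokens(text: str) -> int:
    return int(len(text.split()) / WORDS_PER_TOKEN)

def fits(texts: list[str], reserve_for_output: int = 16_000) -> bool:
    """True if all texts plus an output reservation fit in one prompt."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= CONTEXT_WINDOW
```

At ~500 words per page, the same heuristic recovers the "roughly 3,000 pages" figure: 2M tokens × 0.75 words/token ÷ 500 words/page = 3,000.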

Real-Time X Data Integration

Live Grounding

The Harper agent ingests roughly 68 million English posts per day from X's firehose at millisecond-level latency. This makes Grok 4.20 genuinely useful for tasks that require current awareness: trending news analysis, live financial sentiment, breaking event summarization. The knowledge cutoff of November 2024 is effectively extended by live data for many real-world queries. This is an infrastructure moat that competitors cannot easily replicate.

Visible Chain-of-Thought Reasoning

Explainability

In reasoning mode, Grok 4.20 shows its work before delivering a final answer. This isn't just a UX feature — the intermediate steps allow developers to validate logic chains, catch errors before they propagate, and build higher-trust applications in legal, medical, and financial contexts. The approach adds latency per request but measurably improves accuracy on multi-step problems, mathematical proofs, and complex code debugging.
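Validating a logic chain programmatically only requires splitting the trace into discrete steps. A sketch assuming the trace arrives as numbered steps before a final-answer marker; these delimiters are a hypothetical convention, not a documented response format, so adapt them to what the API actually returns.

```python
import re

# Sketch: splitting a visible chain-of-thought into checkable steps.
# The "Step N:" / "Final answer:" delimiters are a hypothetical
# convention for illustration, not a documented xAI response format.

def split_trace(response: str) -> tuple[list[str], str]:
    """Return (reasoning steps, final answer) from a model response."""
    head, _, answer = response.partition("Final answer:")
    steps = [s.strip() for s in re.split(r"Step \d+:", head) if s.strip()]
    return steps, answer.strip()

steps, answer = split_trace(
    "Step 1: 12 * 12 = 144. Step 2: 144 + 1 = 145. Final answer: 145"
)
```

Each extracted step can then be checked independently, e.g. re-verifying arithmetic or citations before the answer reaches an end user.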

Vision & Multimodal Input

Multimodal

Grok 4.20 accepts both text and image inputs natively. Images discovered during search operations are charged per image token. The April 2026 Beta 2 update improved multi-image rendering accuracy and image search precision. Output remains text-only; image generation is handled separately by Grok Imagine. For vision tasks — document parsing, chart analysis, screenshot debugging — the model handles complex visual inputs alongside long text context.

Generation Speed: 235 Tokens/Second

Performance

Among flagship models, Grok 4.20 is currently the fastest — outputting approximately 235 tokens per second according to April 2026 benchmark data. That's three to four times the generation speed of some competitors at the frontier. For latency-sensitive applications like real-time copilots, customer-facing chat, and streaming interfaces, this is a genuine operational advantage, especially combined with the low API pricing.
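At a fixed output length, throughput translates directly into user-perceived latency. A quick arithmetic sketch comparing the quoted 235 tok/s against an ~80 tok/s competitor; network latency and time-to-first-token are ignored for simplicity.

```python
# Generation-time estimate from throughput alone (ignores network
# latency and time-to-first-token). Speeds are the figures quoted above.

def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    return output_tokens / tokens_per_second

# A 2,350-token answer streams in ~10s at 235 tok/s,
# versus ~29s at a competitor's ~80 tok/s.
grok = generation_seconds(2_350, 235)
other = generation_seconds(2_350, 80)
```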

Inside the 4-Agent Council

The standard Grok 4.20 model runs four specialized replicas of the underlying architecture in parallel. The Heavy tier scales this to 16 agents for extreme research workloads.

Grok — The Captain
Coordinator · Synthesizer · Arbiter
Decomposes the incoming query into sub-tasks, assigns work to the other agents, resolves conflicts when agents disagree, and assembles the final coherent output. Every response passes through here last.
Harper — The Researcher
Real-Time Data · Fact Verification
Handles all research-intensive tasks: live web search, X firehose data ingestion, source verification, and evidence integration. Ensures outputs are current rather than limited by training cutoff.
Benjamin — The Logician
Math · Code · Rigorous Reasoning
Takes on numerical computation, code generation and debugging, mathematical proofs, and step-by-step logical chains. Stress-tests strategies produced by other agents before synthesis.
Lucas — The Creator
Synthesis · Creative Drafting · Ideas
Generates novel framings, creative drafts, and polished outputs. Works with the Captain to translate analysis into clear, structured, and useful results for the end user.

The agent collaboration happens entirely at inference time; you don't need to orchestrate it manually. From an API perspective, Grok 4.20 behaves like a standard model. The multi-agent layer is invisible in the request/response format.

Benchmarks & Performance Data

Grok 4.20 holds an approximate Chatbot Arena Elo of 1,493 as of April 2026 — neck and neck with Gemini 3.1 Pro, and positioned just below GPT-5.4's composite leadership. It leads all frontier flagships on generation speed and context window size, and is the most cost-efficient option among top-tier models. On the hardest reasoning benchmarks (Humanity's Last Exam), the Grok 4 series leads the pack at 50.7%.

| Model | Arena Elo | GPQA Diamond | HLE | Speed | Context | API input ($/1M) |
|---|---|---|---|---|---|---|
| Grok 4.20 | ~1493 | ~88% | 50.7% | 235 tok/s | 2M | $2.6 |
| GPT-5.4 | ~1510 | 92.8% | — | ~80 tok/s | 128K | $3.25 |
| Claude Opus 4.6 | ~1504 | 91%+ | — | ~60 tok/s | 1M | $6.5 |
| Gemini 3.1 Pro | ~1493 | 94.3% | — | ~90 tok/s | 1M | $2.6 |
| DeepSeek V4 | ~1470 | ~89% | — | ~200 tok/s | 128K | $0.28 |

Real-World Use Cases

Financial Analysis & Live Market Intelligence

Grok 4.20 demonstrated this before it was publicly released. An early checkpoint topped the Alpha Arena stock trading simulation with roughly 10–12% returns, using X firehose data for real-time sentiment signals. For analysts building live dashboards, earnings call summarizers, or portfolio commentary tools, the live data integration plus Benjamin's rigorous numerical reasoning is a compelling combination.

Large-Scale Code Analysis & Refactoring

The 2M context window makes Grok 4.20 particularly strong for codebases too large to fit in competitors' context windows. Feed an entire repository, describe the refactoring goal, and let Benjamin handle the logic chain. Reasoning mode is worth the latency cost here — the chain-of-thought output gives developers a reviewable trace of every decision before touching production code.
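Fitting a repository into a single prompt still benefits from a packing step that respects the window. A sketch using a rough ~4-characters-per-token heuristic for source code; the budget, headroom, and greedy selection order are illustrative choices, not an xAI recommendation.

```python
# Sketch: greedily packing source files into a single long-context
# prompt. The ~4-characters-per-token heuristic and the 2M budget
# (minus headroom) are rough, illustrative assumptions.

CHARS_PER_TOKEN = 4
BUDGET = 2_000_000 - 32_000  # leave headroom for instructions + output

def pack_files(files: dict[str, str], budget: int = BUDGET) -> str:
    """Concatenate files (path header + body) until the token budget
    would be exceeded; oversized files are dropped, not truncated."""
    parts, used = [], 0
    for path, body in files.items():
        chunk = f"### {path}\n{body}\n"
        cost = len(chunk) // CHARS_PER_TOKEN + 1
        if used + cost > budget:
            continue
        parts.append(chunk)
        used += cost
    return "".join(parts)
```

For refactoring runs, ordering files by dependency (entry points last) tends to give the model better global context than alphabetical order.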

Academic Research & Literature Synthesis

Harper's fact-verification plus the 2M context window makes Grok 4.20 useful for researchers who need to synthesize large bodies of literature. Load multiple papers, ask for contradictions, gaps, and emerging themes. The reasoning trace is particularly useful for academic work; it's easier to audit and cite than a black-box response.

Agentic Pipelines & Workflow Automation

The multi-agent architecture makes Grok 4.20 naturally suited to agentic workflows where tasks need to be decomposed, parallelized, and synthesized. The xAI API's server-side tool support (code interpreter, file search, web search, image generation) gives developers a rich toolkit for building complex autonomous applications without external orchestration frameworks.

Legal & Compliance Document Review

Contract analysis, regulatory compliance checks, and cross-jurisdictional comparisons all benefit from long context and chain-of-thought explainability. Feeding an entire contract suite into a single Grok 4.20 call, rather than chunking and reassembling, reduces the risk of missed cross-references and produces more coherent analysis.

Real-Time News Monitoring & Content Tools

For media companies, newsrooms, and content teams, the X firehose integration enables use cases that static-knowledge models simply can't support: breaking story summaries, trend analysis, social sentiment monitoring. Combined with Grok Imagine for image generation, the API ecosystem supports end-to-end content production pipelines.

How Grok 4.20 Stacks Up

No single model wins everything in 2026. The right choice depends on what your application actually needs. Here's an honest comparison across the dimensions that matter most for development teams.

| Model | Best at | Context | Live data | API cost (in/out $/1M) | Speed |
|---|---|---|---|---|---|
| Grok 4.20 | Speed, context, live grounding, HLE reasoning | 2M | ✓ Native X | $2.6 / $7.8 | 235 t/s |
| GPT-5.4 | Composite benchmarks, computer use, plugins | 128K | ◑ Bing | $3.25 / $19.5 | ~80 t/s |
| Claude Opus 4.6 | Coding, nuanced writing, long instruction following | 1M | — | $6.5 / $32.5 | ~60 t/s |
| Gemini 3.1 Pro | GPQA Diamond, multimodal, scientific reasoning | 1M | ◑ Search | $2.6 / $15.6 | ~90 t/s |
| DeepSeek V4 | Cost efficiency, Python coding, open-weight option | 128K | — | $0.28 / $0.50 | ~200 t/s |
  • Bottom line: If your workload needs real-time data, very long context, or maximum throughput at low cost — Grok 4.20 is the strongest option right now. If you need best-in-class coding (Claude Opus 4.6), top GPQA scores (Gemini 3.1 Pro), or all-around benchmark leadership with computer use (GPT-5.4), those models still lead in their respective lanes.
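Once outputs dominate a workload, the input/output price split matters more than the headline input price. A sketch comparing monthly spend across the prices listed above for a hypothetical workload; the traffic figures are made up for illustration.

```python
# Monthly-spend comparison at the in/out prices listed in the table
# above, for a hypothetical workload of 500M input and 50M output
# tokens per month. Traffic figures are made up for illustration.

PRICES = {  # $ per 1M tokens: (input, output)
    "Grok 4.20": (2.6, 7.8),
    "GPT-5.4": (3.25, 19.5),
    "Claude Opus 4.6": (6.5, 32.5),
    "Gemini 3.1 Pro": (2.6, 15.6),
    "DeepSeek V4": (0.28, 0.50),
}

def monthly_cost(model: str, in_m: float = 500, out_m: float = 50) -> float:
    """Dollar cost for in_m / out_m million tokens per month."""
    p_in, p_out = PRICES[model]
    return in_m * p_in + out_m * p_out

cheapest = min(PRICES, key=monthly_cost)  # DeepSeek V4 at these rates
```

At these rates Grok 4.20 comes in at $1,690/month versus $2,080 for Gemini 3.1 Pro and $2,600 for GPT-5.4, consistent with the cost-efficiency claim among top-tier models.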

Who Should Use Grok 4.20?

Grok 4.20 is the right model if your priority is throughput, context depth, live data, or cost efficiency. At 235 tokens per second, a 2M token window, and $2.6 per million input tokens, it's the fastest and most context-capable frontier model on the market right now — and one of the cheapest to operate at scale. The native 4-agent architecture delivers measurably better results on complex, multi-step tasks without requiring any changes to your API integration.

It's probably not your first choice if you need top-tier coding benchmark scores (Claude Opus 4.6), best GPQA Diamond performance (Gemini 3.1 Pro), or a mature plugin ecosystem with computer-use capabilities (GPT-5.4). And the lack of published per-model benchmarks from xAI means you'll want to run your own evals before making it your production default on critical tasks.

For developers who want to test it right now without setting up a separate xAI account, AI/ML API is the fastest path — one API key, OpenAI-compatible format, and access to all Grok 4.20 variants alongside hundreds of other models for comparison and fallback routing.
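Because the format is OpenAI-compatible, a request can be assembled with nothing but the standard library. A sketch that builds (but does not send) the HTTP request; the base URL is an assumption drawn from the provider mentioned above, and the API key is a placeholder, so check the provider's documentation before use.

```python
import json
import urllib.request

# Sketch: building an OpenAI-compatible chat-completions request for a
# Grok 4.20 variant. The endpoint URL is an assumption for illustration;
# substitute your provider's documented base URL and a real API key.
# The request is constructed but intentionally not sent here.

BASE_URL = "https://api.aimlapi.com/v1"  # assumed; check provider docs

def make_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = make_request("grok-4.20-0309-reasoning", "Hello", "YOUR_API_KEY")
# Send with urllib.request.urlopen(req) once a real key is in place.
```

Swapping `model` between the variants listed earlier is the only change needed to compare reasoning, non-reasoning, and multi-agent behavior.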

  • What's next: Grok 5 is currently in training on xAI's Colossus supercluster, targeting a public beta potentially in May–June 2026. At a rumored 6 trillion parameters, it would be the largest publicly announced model ever. We'll update this review when it ships.

Frequently Asked Questions

What is Grok 4.20?

Grok 4.20 is xAI's flagship large language model, released in beta on February 17, 2026 and made fully available with API access on March 10, 2026. It uses a native 4-agent collaborative architecture (Standard) or 16-agent architecture (Heavy), supports a 2 million token context window, processes text and image inputs, and integrates real-time data from X (Twitter). Two modes are available: reasoning for chain-of-thought accuracy on complex tasks, and non-reasoning for fast, high-throughput generation.

How does Grok 4.20 compare to GPT-5.4?

Grok 4.20 leads on generation speed (235 vs ~80 tokens/second), context window (2M vs 128K tokens), and API cost ($2.6/$7.8 vs $3.25/$19.5 per million tokens). GPT-5.4 leads on composite benchmark scores, coding (SWE-Bench), and computer-use tasks (OSWorld). For real-time data tasks and long-document analysis, Grok 4.20 has a structural advantage. For structured reasoning and plugin ecosystem breadth, GPT-5.4 is stronger. Most teams benefit from routing different tasks to different models.

What is the context window of Grok 4.20?

Grok 4.20 supports a 2 million token context window — the largest among current frontier flagships. That's roughly 3,000 pages of standard A4 text. In multi-agent mode, all four agents share this context window, enabling comprehensive analysis of very large documents, codebases, or conversation histories without chunking.

Is Grok 4.20 open-source?

No. Grok 4.20 is a proprietary closed-weight model developed by xAI. Access is provided through grok.com, the Grok apps, and API endpoints. xAI has not announced plans to release Grok 4.20 weights publicly.


Ready to get started? Get Your API Key Now!
