256K
3.15
15.75
Chat
Active

Grok 4

Optimized for long‑form planning and robust agentic behavior, Grok 4 features a 256k context window and excels at step‑by‑step problem solving, math, logic, and instruction alignment. While multimodal capabilities are limited, Grok 4 dominates in text‑only domains and outperforms previous models across multiple SOTA evaluations.
Try it now
Testimonials

Our Clients' Voices

Grok 4Techflow Logo - Techflow X Webflow Template

Grok 4

Grok 4 is designed for advanced reasoning and complex tool‑use workflows. Built on the Grok 3 architecture with 10× more reinforcement learning compute, it sets state‑of‑the‑art scores on tasks like ARC-AGI‑2, AIME25, and Humanity’s Last Exam (HLE).

xAI Grok 4 Description

Grok 4 is the latest large language model from xAI, designed for high-level reasoning, agentic behavior, and real-world task automation. It builds upon Grok 3’s architecture, but trains reasoning with 10× more compute and integrates tool use directly into its RLHF pipeline.

Technical Specification

Performance Benchmarks

  • Context Window: 256,000 tokens
  • Max Output: ~4,096 tokens
  • Training Regime: 10× more RL compute than Grok 3
  • Tool Use: Native, with strong multi-step support

Performance Metrics

  • SOTA on ARC-AGI-2: 15.9%
  • AIME 2025: 76.9% accuracy
  • Humanity’s Last Exam (HLE):
    • With tools: 44.4% overall, 50.7% on text-only section
    • Without tools: 25.4% (vs 21.6% Gemini 2.5 Pro)
Metrics

Key Capabilities

  • Multi-step reasoning across long contexts
  • Native tool-use through real/synthetic environments
  • Deterministic outputs (non-streamed)
  • Planning with API execution
  • Robust performance on AGI-style benchmarks

API Pricing

Input: 0–128k: $3.15; 128k+: $6.3; cache: $0.75 per 1M tokens

Output: 0–128k: $15.75; 128k+: $31.5 per 1M tokens

Code Samples

Comparison with Other Models

  • vs. GPT‑4o: GPT‑4o leads in multimodality and web browsing. Grok 4 offers better reasoning performance and tool integration in AGI-style tasks.
  • vs. Claude 4 Opus: Claude 4 excels in language safety and alignment. Grok 4 outperforms on ARC-AGI-2 (15.9% vs 8.6%) and HLE, especially in tool-enabled setups.
  • vs. Gemini 2.5 Pro: Gemini is strong in speed and instruction following. Grok 4 surpasses in zero-shot reasoning and planning (HLE 25.4% vs 21.6% without tools).
  • vs. Grok 3: Grok 4 is a major upgrade over Grok 3, trained with 10× more RL compute and native tool-use instruction. It achieves 25.4% on Humanity’s Last Exam without tools (vs. Grok 3’s ~14.7%), and delivers better multi-step reasoning and factual recall.

Limitations

  • Text-only (no vision/audio support as of Grok 4)
  • Tool use not compositional (sequential only)
  • Closed-weight model
  • Seed determinism may be unreliable in streaming
  • No public inference locally or offline

API Integration

Accessible via AI/ML API. Sign up here.

Try it now

400+ AI Models

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

The Best Growth Choice
for Enterprise

Get API Key