Access the new-generation Scout and Maverick models, which seamlessly blend text, image, and video processing.
Llama 4 benchmarks showcase Meta's impressive new model family: the Scout, Maverick, and Behemoth variants use a Mixture of Experts (MoE) architecture, and Scout offers a 10M token context window. Maverick's reported metrics include 85.5 on MMLU, 61.2 on MATH, 77.6 on MBPP (coding), 85.3 on ChartQA, 91.6 on DocVQA, and 84.6 on Multilingual MMLU, often outperforming proprietary models like Gemini 2.5 Pro and GPT-4o on long-context and coding tasks.
Discover how Meta's new models change operations across industries.
Llama 4 can analyze inventory, sales, feedback, and market data to provide insights and automate inventory management. Retailers can use it to monitor stock levels, predict demand, and automatically generate purchase orders when needed.
Llama 4 Scout's 10M token window analyzes vast document collections, helping businesses extract insights from SharePoint libraries, technical manuals, and financial reports. Legal firms can use it to summarize cases and identify key clauses across thousands of pages.
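As a rough sketch of this workflow (not an official recipe), a long report could be sent to Scout through the same OpenAI-compatible endpoint used in the quick-start example below; the file name, prompt, and the meta-llama/llama-4-scout model ID here are illustrative assumptions, and very large inputs may still be subject to your plan's context limits.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="<YOUR_API_KEY>",
)

# Hypothetical local file; in practice this could be an exported SharePoint
# document, technical manual, or financial report.
with open("annual_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout",  # assumed Scout model ID on AI/ML API
    messages=[
        {"role": "system", "content": "You summarize long business documents."},
        {"role": "user", "content": f"Summarize the key findings and risks:\n\n{document}"},
    ],
)
print(response.choices[0].message.content)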
Llama 4 Maverick supports 12 languages and can analyze both text and uploaded images, enabling AI assistants that deliver 24/7 technical support. These systems identify product issues from images and provide multilingual troubleshooting guidance.
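A minimal sketch of such an assistant is shown below, assuming the endpoint accepts OpenAI-style image_url content; the image URL, question, and model ID are placeholders rather than tested values.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="<YOUR_API_KEY>",
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick",
    messages=[
        {
            "role": "system",
            "content": "You are a multilingual support agent. Reply in the user's language.",
        },
        {
            "role": "user",
            "content": [
                # A question in Spanish plus a photo of the device (hypothetical URL).
                {"type": "text", "text": "¿Qué significa la luz roja parpadeante en mi router?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/router-front-panel.jpg"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)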
Llama 4 Scout: 17B active parameters across 16 experts, outperforming all previous Llama models and competitors like Gemma 3 and Gemini 2.0 Flash on standard benchmarks. It runs on a single H100 GPU with an exceptional 10M token context window and Mixture of Experts (MoE) efficiency, making it the leading multimodal model in its class.
Llama 4 Maverick: 17B active parameters across 128 experts, surpassing GPT-4o and Gemini 2.0 Flash on key benchmarks and matching DeepSeek v3's reasoning and coding capabilities with less than half the active parameters. It delivers unmatched performance-to-cost efficiency in its class.
Llama 4 Behemoth: a 288B active-parameter model (16 experts) that powers the latest releases through distillation. Despite still being in training, it already surpasses GPT-4.5 and Claude Sonnet 3.7 on multiple STEM benchmarks, making it one of the most advanced and intelligent language models to date.
AI/ML API provides scalability, faster deployment, and access to 200+ advanced machine learning models without the need for extensive in-house expertise or infrastructure.
Our API allows seamless integration of powerful AI capabilities into your applications, regardless of your coding experience. Simply swap your API key to begin using the AI/ML API.
AI/ML API provides flexibility for business growth: you can scale resources by purchasing more tokens as needed, ensuring optimal performance and cost efficiency.
We offer flat, predictable pricing, payable by card or cryptocurrency, and keep it the lowest on the market so it stays affordable for everyone.
from openai import OpenAI

# Point the OpenAI client at the AI/ML API's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="<YOUR_API_KEY>",
)

# Ask Llama 4 Maverick a question via the Chat Completions API.
response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant who knows everything.",
        },
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

# Print the assistant's reply.
message = response.choices[0].message.content
print(f"Assistant: {message}")
Visit AI Playground to quickly try Llama 4 Scout and Maverick APIs.
For more information about technical features, please refer to the Llama 4 model cards:
• Llama 4 Scout
• Llama 4 Maverick