AI Model Picker

An opinionated capability matrix of the frontier LLMs I actually deploy with. Filter by use case, compare context windows and per-token pricing side by side. Editorial, not exhaustive.

Editorial · Last updated 2026-05-31

Model	Vendor	Context	Input $/1M tokens	Output $/1M tokens	Latency	Capabilities	Cutoff	Best for
Llama 4 Scout	Meta	10M	$0.20	$0.60	Fast	Vision, Tools	2025-03	Cheap bulk, Long context
DeepSeek V3	DeepSeek	128K	$0.27	$1.10	Fast	Vision, Tools	2024-07	Cheap bulk, Coding
Gemini 2.5 Flash	Google	1M	$0.30	$1.20	Fast	Vision, Tools	2025-04	Cheap bulk, Long context, Vision
GPT-5 mini	OpenAI	200K	$0.50	$2	Fast	Vision, Tools	2025-10	Cheap bulk, Vision
Llama 4 Maverick	Meta	1M	$0.50	$1.50	Medium	Vision, Tools	2025-03	Cheap bulk, Long context, Vision
DeepSeek R1	DeepSeek	128K	$0.55	$2.19	Medium	Vision, Tools	2024-12	Reasoning, Coding, Cheap bulk
Claude Haiku 4.5	Anthropic	200K	$1	$5	Fast	Vision, Tools	2025-08	Cheap bulk, Vision
Mistral Large 2	Mistral	128K	$2	$6	Medium	Vision, Tools	2024-07	Coding
Gemini 2.5 Pro	Google	2M	$2.50	$15	Medium	Vision, Tools	2025-04	Long context, Vision, Reasoning
Claude Sonnet 4.6	Anthropic	1M	$3	$15	Medium	Vision, Tools	2025-08	Coding, Long context, Vision
Grok 4	xAI	256K	$5	$15	Medium	Vision, Tools	2025-06	Reasoning, Coding, Vision
GPT-5	OpenAI	400K	$10	$30	Medium	Vision, Tools	2025-10	Coding, Reasoning, Vision
Claude Opus 4.7	Anthropic	1M	$15	$75	Slow	Vision, Tools	2026-01	Coding, Long context, Reasoning, Vision
o3	OpenAI	200K	$20	$80	Slow	Vision, Tools	2024-10	Reasoning, Coding

Llama 4 Scout

Meta · Llama 4

Latency: Fast
Context: 10M
In /1M: $0.20
Out /1M: $0.60
Capabilities: Vision, Tools
Cutoff: 2025-03
Best for: Cheap bulk, Long context

10M context window: niche but unmatched for chat-over-huge-corpus prototypes.

DeepSeek V3

DeepSeek · DeepSeek

Latency: Fast
Context: 128K
In /1M: $0.27
Out /1M: $1.10
Capabilities: Vision, Tools
Cutoff: 2024-07
Best for: Cheap bulk, Coding

General-purpose chat. Excellent value, weaker than R1 on hard reasoning.

Gemini 2.5 Flash

Google · Gemini 2.5

Latency: Fast
Context: 1M
In /1M: $0.30
Out /1M: $1.20
Capabilities: Vision, Tools
Cutoff: 2025-04
Best for: Cheap bulk, Long context, Vision

Cheapest long-context option on the market. Good for RAG over large corpora.

GPT-5 mini

OpenAI · GPT-5

Latency: Fast
Context: 200K
In /1M: $0.50
Out /1M: $2
Capabilities: Vision, Tools
Cutoff: 2025-10
Best for: Cheap bulk, Vision

Budget tier for OpenAI. Good cost/quality trade-off for classification and extraction.

Llama 4 Maverick

Meta · Llama 4

Latency: Medium
Context: 1M
In /1M: $0.50
Out /1M: $1.50
Capabilities: Vision, Tools
Cutoff: 2025-03
Best for: Cheap bulk, Long context, Vision

Open weights, multimodal, MoE. Self-host or use via Together / Groq / Bedrock for low $/M.

DeepSeek R1

DeepSeek · DeepSeek

Latency: Medium
Context: 128K
In /1M: $0.55
Out /1M: $2.19
Capabilities: Vision, Tools
Cutoff: 2024-12
Best for: Reasoning, Coding, Cheap bulk

Open-weight reasoning model. Strong math/code benchmarks at very low API cost. No vision.

Claude Haiku 4.5

Anthropic · Claude 4

Latency: Fast
Context: 200K
In /1M: $1
Out /1M: $5
Capabilities: Vision, Tools
Cutoff: 2025-08
Best for: Cheap bulk, Vision

Fast, cheap, surprisingly strong. Right pick for high-throughput pipelines.

Mistral Large 2

Mistral · Mistral

Latency: Medium
Context: 128K
In /1M: $2
Out /1M: $6
Capabilities: Vision, Tools
Cutoff: 2024-07
Best for: Coding

European hosting options matter for some regulated workloads.

Gemini 2.5 Pro

Google · Gemini 2.5

Latency: Medium
Context: 2M
In /1M: $2.50
Out /1M: $15
Capabilities: Vision, Tools
Cutoff: 2025-04
Best for: Long context, Vision, Reasoning

Largest production context window (2M). Strong multimodal grounding.

Claude Sonnet 4.6

Anthropic · Claude 4

Latency: Medium
Context: 1M
In /1M: $3
Out /1M: $15
Capabilities: Vision, Tools
Cutoff: 2025-08
Best for: Coding, Long context, Vision

Default workhorse for most production loads. Excellent coding scores at a fraction of Opus cost.

Grok 4

xAI · Grok

Latency: Medium
Context: 256K
In /1M: $5
Out /1M: $15
Capabilities: Vision, Tools
Cutoff: 2025-06
Best for: Reasoning, Coding, Vision

X integration is the differentiator. Reasoning competitive with mid-tier frontier.

GPT-5

OpenAI · GPT-5

Latency: Medium
Context: 400K
In /1M: $10
Out /1M: $30
Capabilities: Vision, Tools
Cutoff: 2025-10
Best for: Coding, Reasoning, Vision

OpenAI flagship. Reliable generalist with strong reasoning mode.

Claude Opus 4.7

Anthropic · Claude 4

Latency: Slow
Context: 1M
In /1M: $15
Out /1M: $75
Capabilities: Vision, Tools
Cutoff: 2026-01
Best for: Coding, Long context, Reasoning, Vision

Anthropic flagship. Strong on agentic coding, long-context reasoning, and tool use. 1M context tier.

o3

OpenAI · o-series

Latency: Slow
Context: 200K
In /1M: $20
Out /1M: $80
Capabilities: Vision, Tools
Cutoff: 2024-10
Best for: Reasoning, Coding

Reasoning model. Best when you can pay for many hidden thinking tokens.

How I'd pick

After deploying these in production across several teams, my heuristic is simple: pick two models, not one. A workhorse for the default 80 % of work, and a cheap-fast model for high-volume, low-stakes tasks. Reserve the flagship tier (Opus 4.7, GPT-5, o3, Gemini 2.5 Pro) for the hard 5 % where reasoning and reliability matter more than cost.
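To see what tiering can be worth, here is a rough Python sketch that compares a tiered mix against sending everything to the flagship. The prices are the table values for Sonnet 4.6, Haiku 4.5 and Opus 4.7. Everything else is an assumption for illustration: the 80/15/5 split (the 15 % for the cheap tier is simply the remainder), the 10,000-in / 1,000-out token profile, and the helper names `request_cost` and `blended_cost`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Prices copied from the table on this page; verify before relying on them.
PRICES = {
    "Claude Sonnet 4.6": Price(3.00, 15.00),
    "Claude Haiku 4.5": Price(1.00, 5.00),
    "Claude Opus 4.7": Price(15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-token prices."""
    p = PRICES[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def blended_cost(mix: dict, input_tokens: int, output_tokens: int) -> float:
    """Average USD cost per request when traffic is split across models.

    `mix` maps model name -> share of requests; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(
        share * request_cost(model, input_tokens, output_tokens)
        for model, share in mix.items()
    )


# Assumed split: 80 % workhorse, 15 % cheap tier, 5 % flagship.
MIX = {
    "Claude Sonnet 4.6": 0.80,
    "Claude Haiku 4.5": 0.15,
    "Claude Opus 4.7": 0.05,
}

# Assumed request profile: 10,000 input tokens, 1,000 output tokens.
tiered = blended_cost(MIX, 10_000, 1_000)
all_flagship = request_cost("Claude Opus 4.7", 10_000, 1_000)

print(f"tiered mix:   ${tiered:.4f} per request")
print(f"all flagship: ${all_flagship:.4f} per request")
```

Under these assumed numbers the tiered mix costs roughly a fifth of the all-flagship setup per request. Real workloads rarely have identical token profiles across tiers, so treat the result as an order-of-magnitude check rather than a forecast.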

What I actually look at

Input price dominates real-world bills. RAG and agent loops read 5–20× more tokens than they write.
Context window matters when it matters — but most workloads stay under 32 K. Don't overpay for headroom you won't use.
Tool use quality ≠ tool-use checkmark. Test the actual behaviour under your prompts before committing.
Latency tier is a UX decision. Chat needs fast first-token; batch jobs can wait.
Knowledge cutoff only matters if you don't ground with search or RAG.
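The first point can be checked with a few lines of arithmetic. The Python sketch below estimates what fraction of spend goes to input tokens at the 5× and 20× read/write ratios, using prices from the table. The function name `input_share` and the choice of three sample models are illustrative assumptions, and the calculation ignores caching discounts and hidden reasoning tokens.

```python
def input_share(input_price: float, output_price: float, read_write_ratio: float) -> float:
    """Fraction of total spend attributable to input tokens.

    input_price, output_price: USD per 1M tokens.
    read_write_ratio: input tokens consumed per output token produced.
    """
    input_cost = read_write_ratio * input_price
    return input_cost / (input_cost + output_price)


# (input $/1M, output $/1M) copied from the table on this page.
SAMPLE_MODELS = {
    "Gemini 2.5 Flash": (0.30, 1.20),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5": (10.00, 30.00),
}

for name, (p_in, p_out) in SAMPLE_MODELS.items():
    low = input_share(p_in, p_out, 5)    # workload reads 5x what it writes
    high = input_share(p_in, p_out, 20)  # workload reads 20x what it writes
    print(f"{name}: input is {low:.0%} to {high:.0%} of spend")
```

With these prices, input accounts for about half of spend at a 5× ratio and 80 % or more at 20×. How strongly input price dominates therefore depends on where a workload sits in that range and on how much more the vendor charges for output than for input.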

Caveats

Pricing and model availability change often. This page is hand-curated and refreshed quarterly — verify on the vendor's page before committing to anything contractual. Open-weight models (Llama, DeepSeek) are priced at common hosting providers; self-hosting changes the cost equation entirely.

Frequently Asked Questions

Which LLM should I pick for production?

Pick two: a workhorse (Sonnet 4.6 / GPT-5 / Gemini 2.5 Pro) and a cheap fast model (Haiku 4.5 / GPT-5 mini / Gemini 2.5 Flash). Reserve the top-priced tier (Opus 4.7 / o3) for the hard 5 %.

How accurate is the pricing here?

Hand-curated, refreshed quarterly. Always cross-check with the vendor pricing page before signing anything.

Why is input price the most important number?

Most production workloads (RAG, agents, code assistants) read 5–20× more tokens than they write. Input price usually dominates the bill.

Are open-weight models included?

Yes — Llama 4 and DeepSeek, priced at common API hosts (Together, Fireworks, Bedrock). Self-hosting changes the maths; use a dedicated GPU-cost comparator for that.