Gemini 3.1 Flash Lite
Fast & Cheap · ★ Best Value · Google · $0.1/M in · $0.4/M out
Estimate your monthly cost across all major AI APIs. Adjust the sliders to see what each model would cost.
Google · $0.1/M in · $0.4/M out
DeepSeek · $0.27/M in · $1.1/M out
Best price/perf ratio in frontier tier
OpenAI · $0.3/M in · $1.2/M out · $0.03/M cached
Mistral · $0.4/M in · $2/M out
Meta (via Together) · $0.88/M in · $0.88/M out
Google · $0.3/M in · $2.5/M out · $0.08/M cached
xAI · $0.5/M in · $2/M out
Anthropic · $0.8/M in · $4/M out · $0.08/M cached
OpenAI · $1.1/M in · $4.4/M out
Affordable reasoning model
Mistral · $2/M in · $6/M out
Google · $1.25/M in · $10/M out · $0.31/M cached
1M token context
Meta (via Together) · $5/M in · $5/M out
Open weights — also runnable locally for free
Anthropic · $3/M in · $15/M out · $0.3/M cached
xAI · $5/M in · $15/M out
OpenAI · $5/M in · $20/M out · $0.5/M cached
OpenAI · $10/M in · $40/M out
Reasoning model — slower, much more capable on math/logic
OpenAI · $15/M in · $60/M out
Higher reasoning, Pro/Business/Enterprise
Anthropic · $15/M in · $75/M out · $1.5/M cached
Input tokens = everything the model reads (your prompt, attached files, conversation history). Output tokens = what the model writes back.
Rule of thumb: 1 token ≈ 0.75 words. A 1,000-word document is ~1,300 tokens.
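As a rough illustration of the rule of thumb above, the conversion can be sketched in a few lines of Python. The function name `estimate_tokens` is hypothetical, and the 0.75 words-per-token figure is only the heuristic quoted here; real tokenizers vary by model and language.

```python
import math

# Heuristic from the text: 1 token is roughly 0.75 words.
# This is an approximation, not a real tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a text of `word_count` words, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens(1000))  # 1334, in line with the ~1,300 figure above
```

For budgeting, rounding up is the safer choice, since an underestimate compounds across many requests.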
The cached input ratio matters because several providers steeply discount tokens that hit their cache: in the prices listed above, cached input costs roughly 4x to 10x less than regular input. For applications with consistent system prompts or repeated context, cache hit rates of 50-90% are achievable.
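To make the effect of caching concrete, here is a minimal sketch of the cost arithmetic, assuming a simple linear pricing model (price per million tokens, with a flat cached rate for the cached share of input). The function `monthly_cost` and its parameter names are hypothetical, and the token volumes in the example are made up for illustration; the prices are the $3/M in, $15/M out, $0.3/M cached entry from the list above.

```python
from typing import Optional


def monthly_cost(
    input_tokens: float,
    output_tokens: float,
    in_price: float,
    out_price: float,
    cached_price: Optional[float] = None,
    cache_hit_ratio: float = 0.0,
) -> float:
    """Return the estimated cost in USD. All prices are in $ per million tokens.

    `cache_hit_ratio` is the fraction of input tokens billed at the cached rate.
    If the provider lists no cached price, all input is billed at `in_price`.
    """
    if cached_price is None:
        cached_price = in_price
    cached_tokens = input_tokens * cache_hit_ratio
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * in_price
        + cached_tokens * cached_price
        + output_tokens * out_price
    )
    return total / 1_000_000


# Illustrative workload: 100M input tokens and 10M output tokens per month.
no_cache = monthly_cost(100e6, 10e6, in_price=3, out_price=15)
with_cache = monthly_cost(
    100e6, 10e6, in_price=3, out_price=15, cached_price=0.3, cache_hit_ratio=0.8
)
print(round(no_cache, 2), round(with_cache, 2))  # 450.0 234.0
```

In this example an 80% cache hit rate cuts the input portion of the bill from $300 to $84, while the output portion ($150) is unaffected, so output-heavy workloads benefit less from caching.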
Pricing data is from May 2026. Providers update prices regularly, so check official pricing pages before committing to production budgets. Open-weight models like Llama 4 405B and DeepSeek V4 can also be self-hosted with no per-token fees if you have the hardware.