AWS Bedrock Pricing 2026: Claude, Llama, and Mistral Through One AWS Endpoint
AWS Bedrock gives you API access to models from Anthropic, Meta, Mistral, Cohere, and others through a single AWS endpoint. This page covers multi-provider pricing on Bedrock specifically. For Anthropic's direct API rates (not via AWS), see Anthropic API Pricing. Bedrock rates generally match direct rates, but the billing modes differ.
On-Demand
- ✓ Claude Sonnet 4.6: $3/$15 per 1M tokens
- ✓ Llama 3.1 70B: $2.65/$3.50 per 1M tokens
- ✓ Mistral Large: $4/$12 per 1M tokens
- ✓ No minimum spend or commitment
- ✓ Best for variable or unpredictable workloads
Batch Inference
- ✓ Half the cost of on-demand
- ✓ Results within 24 hours
- ✓ Same models and quality
- ✓ Best for offline processing
- ✓ Submit via S3 bucket
Provisioned Throughput
- ✓ Guaranteed throughput for production
- ✓ 1-month or 6-month commitments
- ✓ No per-token charges
- ✓ Best for steady high-volume workloads
- ✓ Custom model hosting available
Knowledge Bases / Agents
- ✓ Managed RAG pipeline
- ✓ Vector storage included
- ✓ Automatic document chunking
- ✓ Agent orchestration built in
- ✓ Additional charges on top of model costs
On-Demand Model Pricing Table
Here's what every major model costs on Bedrock's on-demand tier. These prices match (or are very close to) the providers' direct API pricing.
| Model | Input / 1M tokens | Output / 1M tokens | Context Window |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K |
| Llama 3.1 8B | $0.30 | $0.60 | 128K |
| Llama 3.1 70B | $2.65 | $3.50 | 128K |
| Llama 3.1 405B | $5.32 | $16.00 | 128K |
| Mistral Large | $4.00 | $12.00 | 128K |
| Cohere Command R+ | $2.50 | $10.00 | 128K |
| Cohere Embed v3 | $0.10 | n/a | 512 |
| Llama 4 Scout (109B MoE) | $0.27 | $0.36 | 10M |
| Llama 4 Maverick (402B MoE) | $0.50 | $0.77 | 1M |
| Mistral Large 2 | $2.00 | $6.00 | 128K |
Bedrock vs Direct API Pricing: Are You Actually Saving Money?
A common question in 2026 is whether Bedrock adds a markup over calling model providers directly. The short answer: for on-demand pricing, Bedrock's per-token rates match the providers' published prices almost exactly. Claude Sonnet 4.6 costs $3/$15 per million tokens on both Bedrock and the Anthropic API. Llama models are priced at Meta's published inference rates.
Where the math gets interesting is prompt caching and batch discounts. Anthropic's direct API offers prompt caching that reduces costs by 90% on repeated prefixes. Bedrock supports prompt caching for Claude models, but the feature availability can lag behind Anthropic's direct API by weeks. If prompt caching is critical to your cost structure, verify that the specific caching feature you need is live on Bedrock before committing.
Batch inference is 50% off on Bedrock, matching the discount available on direct APIs. The operational advantage of Bedrock batch is S3 integration: you drop your input files in an S3 bucket, Bedrock processes them, and results appear in another bucket. No webhook management, no polling. For teams already running data pipelines on AWS, this eliminates real integration work.
The hidden cost that catches teams is Knowledge Bases. The managed RAG pipeline uses OpenSearch Serverless under the hood, which has a minimum cost of roughly $700/month (4 OCUs). For a simple chatbot that only needs vector search over a few thousand documents, this floor cost makes Bedrock's managed RAG dramatically more expensive than running Pinecone ($50/month) or pgvector (free) alongside direct API calls.
Bedrock vs Direct API Access: When Bedrock Wins
Bedrock's pricing matches direct API pricing, so the decision comes down to operational benefits, not cost savings.
Bedrock wins when your team is already AWS-native. IAM authentication means no API key management. CloudWatch gives you usage metrics alongside your other AWS monitoring. VPC endpoints keep traffic off the public internet. These are significant operational advantages for enterprise teams.
Bedrock also wins for multi-model architectures. Instead of managing API keys, billing, and SDKs for Anthropic, Meta, Mistral, and Cohere separately, Bedrock gives you one endpoint. Model switching is a config change, not an integration project.
Direct APIs win when you need the providers' latest features fastest. New models and capabilities (like Anthropic's prompt caching or OpenAI's batch API improvements) often hit the direct API before they're available on Bedrock. If being on the latest version matters, direct access has less lag.
Hidden Costs & Gotchas
- ⚠ Bedrock's on-demand pricing for Claude matches Anthropic's direct API pricing. You're not paying a premium for the AWS wrapper, but you're also not getting a discount.
- ⚠ Legacy Claude models (3.5 Sonnet in Public Extended Access) now cost $6/$30 per 1M tokens, double the current Sonnet 4.6 price. If your code references old model IDs, you're overpaying.
- ⚠ Knowledge Bases adds charges for vector storage, document processing, and retrieval on top of the model inference cost. A simple RAG setup can cost $50-200/month in Bedrock-specific charges.
- ⚠ Provisioned throughput requires 1-month minimum commitments. If your traffic drops, you still pay for reserved capacity. Only commit after you have stable baseline traffic data.
- ⚠ Data transfer costs apply when moving data between AWS regions or out of AWS. These are standard AWS charges but easy to overlook when budgeting for AI.
- ⚠ Bedrock Agents adds orchestration charges per step. A multi-step agent workflow costs more than a single model invocation for the same output.
- ⚠ Model availability varies by AWS region. Not all models are available in all regions. Check your region before building a pipeline.
Which Plan Do You Need?
AWS-native team
On-demand Bedrock. If your infrastructure is already on AWS, Bedrock keeps everything in one ecosystem. IAM auth, CloudWatch metrics, and VPC endpoints work out of the box.
Multi-model application
On-demand Bedrock gives you Claude, Llama, Mistral, and Cohere through a single API endpoint. No need for separate API keys and billing from each provider.
High-volume production workload
Provisioned throughput. If you need guaranteed latency and throughput for a steady workload, provisioned capacity eliminates throttling risk at a predictable cost.
Cost-sensitive team
Consider the providers' direct APIs. Anthropic and OpenAI offer the same models at the same price, and some offer additional discounts (prompt caching, batch API) that may not be available on Bedrock.
The Bottom Line
Bedrock makes sense for teams already on AWS who want one API endpoint for multiple model providers. The pricing matches direct API costs for on-demand usage, and batch inference at 50% off is the same deal you'd get from the providers directly. The value-add is operational: IAM auth, CloudWatch, VPC endpoints, and managed RAG via Knowledge Bases. If you're not on AWS, there's no pricing reason to choose Bedrock over direct API access.
Related Resources
Frequently Asked Questions
AWS Bedrock Pricing Update Tracker (2026)
AWS Bedrock pricing and model availability changes throughout the year. We track every update so this page stays the most current source. Last reviewed: April 2026.
- April 2026: No major pricing changes. Claude 4.6 family rates hold. Amazon Nova pricing tiers unchanged. Prompt caching support expanded across more regions.
- Q1 2026: Claude Opus 4.6 and Sonnet 4.6 GA on Bedrock following the direct Anthropic API launch. Cross-region inference availability expanded.
- Q4 2025: Claude 4.6 family launched on Bedrock at the same per-token pricing as 4.5 family. Llama 3.3 70B added at competitive pricing.
- Q3 2025: Amazon Nova family launched as the in-house Bedrock-native model lineup. Mistral Large 2 added with European data residency support.
AI tool pricing changes weekly. We track all of it.
Weekly data from 22,000+ job postings. Free.
2,700+ subscribers. Unsubscribe anytime.
