![]() |
VOOZH | about |
Token counting lets you determine the number of tokens in a message before you send it to Claude. This helps you make informed decisions about your prompts and usage. With token counting, you can:
This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
The token counting endpoint accepts the same structured list of inputs for creating a message, including support for system prompts, tools, images, and PDFs. The response contains the total number of input tokens.
The token count should be considered an estimate. In some cases, the actual number of input tokens used when creating a message may differ by a small amount.
Token counts may include tokens added automatically by Anthropic for system optimizations. You are not billed for system-added tokens. Billing reflects only your content.
All active models support token counting.
client = anthropic.Anthropic()
response = client.messages.count_tokens(
model="claude-opus-4-8",
system="You are a scientist",
messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(response.json()){ "input_tokens": 14 }Server tool token counts only apply to the first sampling call.
client = anthropic.Anthropic()
response = client.messages.count_tokens(
model="claude-opus-4-8",
tools=[
{
"name": "get_weather",
"description": "Get the current weather in a given location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
},
}
],
messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
)
print(response.json()){ "input_tokens": 403 }import base64
import httpx
image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_media_type = "image/jpeg"
image_data = base64.standard_b64encode(httpx.get(image_url).content).decode("utf-8")
client = anthropic.Anthropic()
response = client.messages.count_tokens(
model="claude-opus-4-8",
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": image_media_type,
"data": image_data,
},
},
{"type": "text", "text": "Describe this image"},
],
}
],
)
print(response.json()){ "input_tokens": 1551 }See how the context window is calculated with extended thinking for more details
client = anthropic.Anthropic()
response = client.messages.count_tokens(
model="claude-sonnet-4-6",
thinking={"type": "enabled", "budget_tokens": 16000},
messages=[
{
"role": "user",
"content": "Are there an infinite number of prime numbers such that n mod 4 == 3?",
},
{
"role": "assistant",
"content": [
{
"type": "thinking",
"thinking": "This is a nice number theory question. Let's think about it step by step...",
"signature": "EuYBCkQYAiJAgCs1le6/Pol5Z4/JMomVOouGrWdhYNsH3ukzUECbB6iWrSQtsQuRHJID6lWV...",
},
{
"type": "text",
"text": "Yes, there are infinitely many prime numbers p such that p mod 4 = 3...",
},
],
},
{"role": "user", "content": "Can you write a formal proof?"},
],
)
print(response.json()){ "input_tokens": 88 }Token counting supports PDFs with the same limitations as the Messages API.
import base64
import anthropic
client = anthropic.Anthropic()
with open("document.pdf", "rb") as pdf_file:
pdf_base64 = base64.standard_b64encode(pdf_file.read()).decode("utf-8")
response = client.messages.count_tokens(
model="claude-opus-4-8",
messages=[
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_base64,
},
},
{"type": "text", "text": "Please summarize this document."},
],
}
],
)
print(response.json()){ "input_tokens": 2188 }Claude Fable 5 and Claude Mythos 5 use the tokenizer introduced with Claude Opus 4.7, which produces roughly 30% more tokens than models before Claude Opus 4.7 for the same text. The token counting endpoint returns the count under the tokenizer of the model you pass, so to measure the difference for your workload, count the same request twice: once with your current model and once with model: "claude-fable-5" (or "claude-mythos-5"), and compare the two input_tokens values.
Billing and migration: Usage and billing on Claude Fable 5 and Claude Mythos 5 reflect this tokenizer's counts. If you're migrating from a model before Claude Opus 4.7, the same content consumes roughly 30% more tokens. When migrating a workload to Claude Fable 5 and Claude Mythos 5, don't reuse token counts measured on a model before Claude Opus 4.7 to estimate costs or context window fit. Count your prompts with model: "claude-fable-5" (or "claude-mythos-5").
Token counting is free to use but subject to requests per minute rate limits based on your usage tier. If you need higher limits, contact sales through the Claude Console.
| Usage tier | Requests per minute (RPM) |
|---|---|
| 1 | 100 |
| 2 | 2,000 |
| 3 | 4,000 |
| 4 | 8,000 |
Token counting and message creation have separate and independent rate limits. Usage of one does not count against the limits of the other.
Read the full API reference for the token counting endpoint.
Use token counts to keep prompts within a model's context window.
Check token counts before you send a request to stay within your usage tier.
Reduce cost and latency on repeated prompts by caching prompt prefixes.
Was this page helpful?