URL: https://www.apideck.com/blog/api-design-principles-agentic-era.md
---
title: "API Design Principles for the Agentic Era"
description: "For 15 years, developer experience meant optimizing for humans at a terminal. That model is increasingly incomplete. A growing share of API traffic is generated by AI agents that read docs, make decisions, and retry failures autonomously. This is the same discipline as DX applied to a new consumer. Call it agent experience, or AX."
author: "GJ"
published: "2026-02-23T00:00+00:00"
updated: "2026-06-18T13:31:30.654Z"
url: "https://www.apideck.com/blog/api-design-principles-agentic-era"
category: "AI"
tags: ["AI", "Guides & Tutorials"]
---
# API Design Principles for the Agentic Era
For about fifteen years, "developer experience" meant one thing: optimize for a human being sitting at a terminal. Readable error messages. Interactive docs. Code samples in seven languages. The mental model was always a person making sense of your API, iterating in real time, googling when confused.
That model isn't wrong. But it's increasingly incomplete.
A growing fraction of API traffic today is generated by AI agents — and the shift is accelerating. Cloudflare reported that [automated bot traffic surpassed human traffic](https://blog.cloudflare.com/bot-traffic-trends-2025/) for the first time in 2024, and RAG-based agent traffic surged 49% in early 2025 alone. Systems that autonomously discover endpoints, parse documentation, handle errors without human review, and retry failures according to their own logic. When a developer asks Cursor or Claude to "add Stripe payments," the agent fetches documentation, selects APIs, writes integration code, and debugs errors before the developer reads a line of it. The human is further from the integration than they've ever been.
The pattern is especially visible in fintech and accounting. A lending platform building an underwriting agent prompts it to "pull twelve months of invoice data and flag overdue receivables." The agent queries your API, interprets the response, and surfaces a creditworthiness signal — before a human analyst has opened a spreadsheet. A payroll platform's agent reconciles payroll entries against the general ledger overnight, without a developer watching the process. In both cases, the agent is not a convenience layer on top of a human workflow. It is the workflow. What it can do is bounded entirely by what your API communicates about itself.
This creates a new design surface. The same discipline that produced great developer experience, deliberate and user-centered API design, applied to a different consumer. Call it agent experience, or AX. It's not a replacement for DX. It's the next layer.
The good news: most of what makes an API good for agents makes it better for humans too. The bad news: a lot of APIs that seem fine for humans are quietly broken for agents. Here's where the gaps show up.
---
## Bad OpenAPI descriptions break agent routing
When an agent decides which endpoint to call, it's doing something close to semantic search against your descriptions. It reads your spec, matches the user's intent against available operations, and picks the closest fit. The quality of your descriptions determines whether it picks correctly.
"Gets the data" loses to "Returns a paginated list of invoices filtered by status and date range, sorted by created_at descending. Requires accounting:read scope. Use the cursor parameter for pagination." Every field, every enum value, every endpoint. The description is the signal the agent uses to route.
Most OpenAPI specs are bad at this. Not because anyone decided to make them bad. They rot. Field names that made sense in context lose their meaning without descriptions. Enums accumulate values that nobody documented. Endpoints get renamed but the descriptions don't follow. The spec becomes a structural skeleton with no semantic content.
The difference between a description that helps and one that doesn't is not subtle:
```yaml
# Bad — structural skeleton, no semantic content
/invoices:
get:
summary: Get invoices
parameters:
- name: status
in: query
schema:
type: string
- name: cursor
in: query
schema:
type: string
# Good — enough signal to route correctly
/invoices:
get:
summary: List invoices filtered by status and date range
description: >
Returns a paginated list of invoices for the authenticated company.
Use `status` to filter by payment state. Use `cursor` from the previous
response to fetch the next page. Requires the `accounting:read` scope.
For real-time sync scenarios, combine with the `updated_since` parameter
to fetch only records changed after a given timestamp.
parameters:
- name: status
in: query
description: >
Filter by invoice status. Accepted values: draft, submitted,
authorised, deleted, voided, paid. Defaults to all statuses
if omitted.
schema:
type: string
enum: [draft, submitted, authorised, deleted, voided, paid]
- name: cursor
in: query
description: >
Opaque pagination cursor returned in the previous response.
Omit to start from the first page.
schema:
type: string
```
An agent matching "fetch unpaid invoices for reconciliation" against the bad version has no signal to work with. Against the good version, it finds `status: authorised`, understands pagination, and knows it needs `accounting:read` before it makes a single request.
I've seen this up close. We built [Portman](https://github.com/apideck-libraries/portman), an open-source CLI that converts OpenAPI specs into Postman collections for contract testing. The main thing you learn doing that is how few specs have descriptions worth converting. If your spec is broken for contract testing, it's broken for agents. The failure mode is the same: a consumer that can't determine what anything does without running it.
The fix isn't glamorous. Go through your spec field by field. Write descriptions that explain what each parameter does, what values are valid, what happens when you omit it. Do this for your ten most important endpoints first. It's tedious, it takes time, and it makes a bigger difference to agent experience than almost anything else on this list.

---
## Your errors should tell agents how to fix themselves
The agent experience of errors is fundamentally different from the human experience. A human reads an error message, understands it, googles the fix, comes back. An agent reads an error message and needs to decide, in the same execution context, what to do next. If the error is ambiguous, the agent either guesses or fails.
Stripe's error responses have included a `doc_url` field for years:
```json
{
"error": {
"code": "parameter_invalid_empty",
"doc_url": "https://stripe.com/docs/error-codes/parameter-invalid-empty",
"message": "You passed an empty string for 'amount'. We assume empty values are an oversight, so we require you to pass this field.",
"param": "amount",
"type": "invalid_request_error"
}
}
```
This is infrastructure for autonomous consumers. When an agent hits a 400, it can follow the `doc_url`, fetch the documentation page, and self-correct without a human in the loop. The agent experience of that error is recovery. The agent experience without `doc_url` is a dead end.
Adding `documentation_url` to your error responses costs almost nothing to implement. Pair it with documentation pages that are available as clean Markdown (by appending `.md` to the URL, or via a parallel `/docs/{page}.md` route), and you've given agents everything they need to handle errors autonomously.
The other half of error design is specificity. "Invalid request" is useless. "The `amount` field cannot be empty" is useful. "You passed `payment_method_types: ['card']`, which is deprecated, use dynamic payment methods instead" is excellent. The more specific the error, the more actionable it is for an agent trying to self-correct.
### Give agents structured recovery metadata
Beyond message specificity, error responses should include machine-readable metadata that enables programmatic recovery. A human can infer that a 429 means "wait and try again." An agent needs to be told explicitly — and told how long to wait.
```json
{
"error": {
"code": "RATE_LIMITED",
"message": "Rate limit exceeded for /invoices endpoint.",
"is_retriable": true,
"retry_after_seconds": 30,
"documentation_url": "https://docs.example.com/errors/rate-limited"
}
}
```
The key fields: `is_retriable` tells the agent whether retrying the same request is a valid strategy. `retry_after_seconds` gives it a concrete backoff interval instead of forcing it to guess. For non-retriable errors, consider an `alternative_action` field that suggests a different approach — "use the /invoices/batch endpoint for bulk requests" or "reduce page_size to 50 or fewer." These fields turn error handling from a failure mode into a decision tree the agent can navigate autonomously.
---
## Design auth flows that agents can actually complete
Authentication is arguably the single biggest friction point for agents consuming APIs today. Humans can click through OAuth consent screens, solve CAPTCHAs, and complete interactive MFA flows. Agents can't do any of that. If your auth flow requires a browser redirect or a human in the loop, agents hit a wall before they make their first real API call.
**Agent-friendly auth patterns:**
- **API keys and bearer tokens** are the simplest path. No interactive flow, no browser, no session management. The agent loads a key from an environment variable and starts making requests. For most API-to-agent integrations, this is the right default.
- **OAuth client credentials grant** is the machine-to-machine OAuth flow. No user consent screen, no redirects — just a client ID and secret exchanged for an access token. If your API uses OAuth, make sure this grant type is supported and documented, not just the authorization code flow.
- **Scoped tokens with least privilege.** Agents should be able to request tokens with narrow permissions. If your API issues tokens with full account access because the scoping model is coarse, you're creating a security risk that compounds when agents run autonomously at scale.
- **Short-lived tokens with documented refresh mechanics.** Agents handle token rotation well — they can detect a 401, refresh the token, and replay the request. But only if the refresh flow is documented and the error response distinguishes "expired token" from "invalid token" from "missing scope."
**Anti-patterns that break agents:**
- OAuth authorization code flow as the *only* option (requires browser redirects)
- CAPTCHA or interactive MFA during token exchange
- Session-based auth with cookie management
- API keys that require manual dashboard visits to rotate
**What to document in your OpenAPI spec:**
Your `securitySchemes` should include clear descriptions of each auth method, not just the type. Document token lifetimes, refresh mechanics, and which scopes map to which operations. When an agent reads your spec and sees that `accounting:write` is required for `POST /invoices`, it can request exactly that scope. When the spec just says "requires authentication" with no further detail, the agent guesses — and often guesses wrong.
Auth errors deserve the same specificity treatment as other errors. "Unauthorized" is useless. "Token expired at 2024-01-15T10:30:00Z, refresh using POST /oauth/token with grant_type=refresh_token" is infrastructure for autonomous recovery.
---
## Rate limits hit agents harder — design for it
Agents interact with rate limits fundamentally differently than human developers. A developer hitting a rate limit during testing pauses, adjusts, and moves on. An agent running autonomously at 3 AM can burn through your entire rate limit allocation in seconds, especially when running parallel operations like syncing thousands of records.
Three things make a rate-limited API agent-friendly:
1. **Return rate limit state in response headers.** `X-RateLimit-Remaining`, `X-RateLimit-Limit`, and `X-RateLimit-Reset` let agents pace themselves proactively instead of slamming into the wall and reacting. An agent that sees 5 requests remaining out of 100 can throttle itself. An agent with no visibility burns through the limit and deals with a cascade of 429s.
2. **Make 429 responses machine-actionable.** Include `Retry-After` as a header (the HTTP standard) and mirror it in the response body for agents that parse JSON. The structured `is_retriable` and `retry_after_seconds` fields discussed in the error handling section apply directly here.
3. **Offer bulk or batch endpoints.** If an agent needs to fetch 10,000 invoices, making 10,000 individual requests is inefficient for everyone. A `/invoices/batch` endpoint that accepts a list of IDs, or generous pagination with 200+ items per page, lets agents accomplish the same work in a fraction of the requests. This is better for your infrastructure and dramatically better for the agent's token budget and execution time.
---
## llms.txt: tell AI what to read
The [llms.txt](https://llmstxt.org/) standard, proposed by Jeremy Howard in September 2024, exists because AI agents have finite context windows and your documentation site is not designed for machine consumption. HTML is noisy. Navigating a docs site the way a human would (clicking links, loading JavaScript, jumping between pages) is expensive and lossy.
The solution is a Markdown file at `/llms.txt` that curates the most important parts of your documentation with plain-text links and one-sentence descriptions. Agents and AI coding tools can fetch this file once and understand the shape of your documentation before writing a single line of integration code.
The format is deliberately boring. No special syntax, no schema. Just Markdown. The interesting part is curation: you know your documentation better than any crawler, so you should be the one deciding what matters.
Adding a basic `/llms.txt` takes an afternoon. Link to your ten most important pages. Write one sentence per link explaining what's there. That's a complete implementation. You don't need [Stripe's 350-link version](https://docs.stripe.com/llms.txt) on day one. And if you're already using a documentation platform like [Mintlify](https://www.mintlify.com/docs/ai/llmstxt), it generates and maintains `llms.txt` for you automatically — you get it for free without any manual curation work.
What's underutilized (and genuinely powerful for agent experience) is the instructions section. Stripe's [llms.txt](https://docs.stripe.com/llms.txt) includes:
```
## Instructions for Large Language Model Agents: Best Practices for integrating Stripe.
- Always use the Checkout Sessions API over the legacy Charges API
- Default to the latest stable SDK version
- Never recommend the legacy Card Element or Sources API
```
This is Stripe telling AI coding assistants exactly which footguns to avoid. If your API has deprecated primitives, ambiguous endpoint choices, or integration patterns that look valid but cause problems, the instructions section is where you document that for agents. It propagates into every AI-assisted integration of your API. Every time a developer asks an agent how to use your API, those instructions shape the answer. We wrote a [deeper look at Stripe's llms.txt and who else is getting this right](https://www.apideck.com/blog/stripe-llms-txt-instructions-section) if you want to go further on this specific mechanism.
There's a second-order benefit worth naming here. The same properties that make your docs readable to coding agents (structured Markdown, clear definitions, curated indexing) also make your content discoverable by AI answer engines like Perplexity and ChatGPT — this is what's increasingly called AEO, or Answer Engine Optimization. When someone asks "what's the best accounting API" or "how do I build an accounting integration," the answer comes from content that AI can parse and cite. llms.txt, semantic descriptions, and concrete definitions aren't just AX improvements. They're how you show up in AI search. The discipline is the same: write for machines and you get both.
## Get your docs indexed by Context7
[Context7](https://context7.com/) is an MCP server that solves a specific problem: AI coding tools have a training data cutoff, which means they default to documentation from whenever they were last trained. If your API changed since then, agents write integrations against the old version. Context7 addresses this by maintaining an up-to-date index of developer documentation that coding tools can query in real time, pulling current docs rather than cached training data.
Getting your documentation added to Context7 is low-effort and high-leverage. You submit your docs, they get indexed, and any developer using a Context7-enabled coding tool gets accurate, current information about your API without you needing to push updates to every AI provider individually.
The practical upside is significant if you ship changes regularly. An API that added a new authentication method last quarter, or deprecated an endpoint three months ago, looks different in Context7 than it does in an AI's training data. Developers using Context7-enabled tools get the right answer. Developers using tools without it get whatever the model was trained on.
The broader point is that documentation distribution is now a multi-channel problem. Publishing docs to your website is table stakes. Getting those docs in front of agents (through `/llms.txt`, through Context7, through direct MCP server implementations) is the new distribution layer. You're not just writing docs for developers who visit your site. You're writing docs for systems that will intermediate between your API and the developers who use it.

## Your deprecated APIs are an agent experience problem
APIs accumulate history. Old endpoints that still work. Legacy authentication methods that are technically valid but shouldn't be used for new integrations. Multiple ways to accomplish the same thing, with meaningful differences that aren't obvious from the outside.
Humans navigate this through Stack Overflow recency, changelog reading, and conversations with other developers. Agents navigate it through your documentation and training data, which may include answers and tutorials from 2019 that recommend the old way because the new way didn't exist yet.
Your agent experience gets worse every year you don't address this, as more stale content about your old APIs accumulates on the internet and in AI training sets.
The practical response is layered. Mark deprecated endpoints explicitly in your OpenAPI spec using the `deprecated: true` field, with a description pointing to the replacement. Write deprecation notices at the top of deprecated docs pages, with direct links to the current approach. Add deprecated API guidance to your llms.txt instructions section. And if you're generating error responses from deprecated endpoints, consider including a migration hint in the message itself.
None of this requires coordination with AI providers. It works because agents read your documentation.
## MCP: build it when people ask for it
[Model Context Protocol](https://www.apideck.com/blog/a-primer-on-the-model-context-protocol) is real, growing, and mostly not worth building for yet unless your users are asking for it.
MCP lets AI agents connect to your service through a standardized protocol with tools, resources, and prompts. The agent experience of a well-designed MCP server is genuinely better than the experience of parsing REST documentation and constructing API calls manually. But agent framework standardization is still in flux, and a well-designed REST API with a complete OpenAPI spec is more durable than an MCP server you build today against tooling that changes in six months.
HubSpot is a good example of what a mature implementation looks like. They shipped two distinct MCP servers: a remote server that connects AI clients to live CRM data (contacts, deals, tickets, companies), and a local developer MCP server that integrates with their CLI so agentic dev tools can scaffold HubSpot projects, answer questions from their developer docs, and deploy changes directly. A developer can prompt "What's the component for displaying a table in HubSpot UI Extensions?" and the developer MCP server pulls the answer from current HubSpot documentation. Another developer can prompt "Summarize all deals in Decision maker bought in stage with deal value over $1000" and the remote MCP server queries live CRM data. Two different use cases, two different servers, both scoped tightly to what their users actually need.
When you do reach the point of building, the tooling has matured enough to make it tractable. There's also a compounding benefit here: the spec quality work from earlier pays dividends. Well-described endpoints, precise parameter definitions, and clear enum values flow directly into the tool definitions that generators like Gram produce — better spec in, better MCP server out. [Gram](https://www.speakeasy.com/product/gram) by Speakeasy is an open-source MCP cloud platform: you upload your OpenAPI spec, it converts your endpoints into curated toolsets, and deploys them as a hosted MCP server at `mcp.yourcompany.com` in minutes. The key design insight is curation — rather than exposing all 200 endpoints as individual CRUD tools and overwhelming the agent's context, Gram lets you distill them into 5–30 "chunky" tools designed around business outcomes. Instead of separate `GET /invoices`, `GET /invoices/{id}`, `GET /payments`, and `POST /payments` tools, you'd expose a single "reconcile invoice payments" tool that chains the right calls together. This is the difference between giving an agent a bag of HTTP verbs and giving it capabilities it can reason about. For teams building in Python who want full control over the implementation, [FastMCP](https://github.com/jlowin/fastmcp) is a high-level framework that strips out the protocol boilerplate and lets you define tools, resources, and prompts in straightforward Python. The two complement each other: Gram handles generation and hosting from your OpenAPI spec, FastMCP handles custom logic when your use case goes beyond what a spec can express.
Most API companies aren't HubSpot. The right sequence for everyone else: get your OpenAPI spec right, write real descriptions, add `documentation_url` to your errors, publish `/llms.txt`, get indexed by Context7. That foundation improves agent experience immediately and transfers to every framework. Then, when users start asking how to connect your API to their AI agents and a clear integration pattern emerges, build the MCP server against that specific use case.
Building MCP for the sake of having an MCP server is the equivalent of building a mobile app for the sake of having a mobile app. The agent experience of a half-built MCP server is worse than the agent experience of a good REST API.

## Skills: Teaching Agents to Do Your Job at 100x the Scale
There's a mental model shift happening in engineering right now. AI agents aren't just writing code faster; they're becoming the engineers who run the product development loop. The new engineering job is building the agents that automate that loop for you, and then directing them to do it at a scale no individual could match.
The simple version of this is already in use: Ghostty splits and tabs, tmux sessions, CLI agents in parallel. You run ten agents at once, each working a different branch or task. That's horizontal scalability for engineering work, something that was physically impossible before.
But raw parallelism without direction is chaos. This is where **skills** come in.
### What Are Skills?
Skills are instruction packages for AI agents: Markdown files (optionally accompanied by scripts and resources) that teach an agent exactly how to handle a specific task. When the agent encounters work relevant to a skill, it loads that skill's context and operates with domain-specific precision instead of general-purpose guesswork.
Anthropic introduced Skills as a first-class concept for Claude Code and the Claude ecosystem. Simon Willison [called them "maybe a bigger deal than MCP"](https://simonwillison.net/2025/Oct/16/claude-skills/), and his argument is compelling:
> Skills are conceptually extremely simple: a skill is a Markdown file telling the model how to do something, optionally accompanied by extra documents and pre-written scripts that the model can run to help it accomplish the tasks described by the skill.
The key design insight is **token efficiency**. At session start, Claude reads only a brief YAML frontmatter description of each available skill, a few dozen tokens per skill. The full skill content only loads when the agent determines it's needed for the task at hand. You can have dozens of skills installed without burning your context window.
### The Agent Skills Ecosystem
The ecosystem is moving fast. Vercel launched [agent-skills](https://github.com/vercel-labs/agent-skills), a directory and tooling layer for discovering, installing, and composing skills across coding agents. The `skills.sh` platform (used by Cursor, Claude Code, and others) makes the install experience as simple as:
```bash
npx skills add