AI Batch Processing API Guide 2026: Process Millions of Requests Efficiently#

When you need to process thousands or millions of AI requests—whether for data classification, content generation, document analysis, or embeddings—real-time API calls become impractical and expensive. Batch processing APIs offer a solution: submit large volumes of requests, get results within hours, and save up to 50% on costs.

What is AI Batch Processing?#

AI batch processing lets you submit many requests at once instead of making individual API calls. The provider processes them asynchronously, typically within a 24-hour window, and returns all results together.

Key benefits:

Up to 50% cost savings: most providers offer significant discounts for batch jobs
Higher throughput: batch jobs typically run against their own quota, separate from real-time rate limits, so you can queue very large volumes
Simplified infrastructure: the provider handles queuing and scheduling, so you need far less retry and queue logic of your own
Better resource utilization: providers can schedule batch jobs during off-peak hours

Common use cases:

Classifying millions of customer support tickets
Generating product descriptions for e-commerce catalogs
Analyzing financial documents at scale
Creating embeddings for large document collections
Translating content libraries
Evaluating LLM outputs for quality assessment

OpenAI Batch API: Complete Tutorial#

OpenAI's Batch API is a mature batch processing option, offering a 50% cost reduction for requests processed within a 24-hour window. The examples below use the OpenAI Python SDK pointed at Crazyrouter's OpenAI-compatible endpoint.

Step 1: Prepare Your Input File (JSONL)#

Each line in the JSONL file is an independent request:

python

import json

# Sample data; replace with your own list of review strings
customer_reviews = [
    "Great product, love it!",
    "Terrible service, never again.",
]

# Build one request per review; custom_id links each result back to its input
requests_data = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",
            "messages": [
                {"role": "system", "content": "Classify the sentiment as positive, negative, or neutral."},
                {"role": "user", "content": text}
            ],
            "max_tokens": 10
        }
    }
    for i, text in enumerate(customer_reviews)
]

# Write JSONL file: one JSON object per line
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for req in requests_data:
        f.write(json.dumps(req) + "\n")

print(f"Created batch with {len(requests_data)} requests")
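Before uploading, it can help to sanity-check the file. The sketch below is one possible validator, assuming the request shape shown above; `validate_batch_lines` and `REQUIRED_KEYS` are hypothetical names for illustration, not part of any SDK. It only checks that each line parses as a JSON object, carries the four top-level keys, and uses a unique `custom_id`.

```python
import json

# Top-level keys used by the request format in Step 1
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}

def validate_batch_lines(lines):
    """Return a list of human-readable problems found in JSONL request lines."""
    problems = []
    seen_ids = set()
    for lineno, line in enumerate(lines, start=1):
        try:
            req = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(req, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = REQUIRED_KEYS - req.keys()
        if missing:
            problems.append(f"line {lineno}: missing keys {sorted(missing)}")
            continue
        if req["custom_id"] in seen_ids:
            problems.append(f"line {lineno}: duplicate custom_id {req['custom_id']!r}")
        seen_ids.add(req["custom_id"])
    return problems
```

A typical call would read the file with `open("batch_input.jsonl").read().splitlines()` and refuse to upload if the returned list is non-empty.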

Step 2: Upload and Submit the Batch#

python

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Upload the input file (the context manager closes the handle afterwards)
with open("batch_input.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

print(f"Uploaded file: {batch_file.id}")

# Create the batch job
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "Sentiment classification batch"}
)

print(f"Batch job created: {batch_job.id}")
print(f"Status: {batch_job.status}")

Step 3: Monitor Progress#

python

import time

def wait_for_batch(client, batch_id, poll_interval=60):
    """Poll batch status until it reaches a terminal state"""
    while True:
        batch = client.batches.retrieve(batch_id)
        counts = batch.request_counts
        print(f"Status: {batch.status} | "
              f"Completed: {counts.completed}/{counts.total} | "
              f"Failed: {counts.failed}")

        if batch.status == "completed":
            return batch
        if batch.status in ("failed", "expired", "cancelled"):
            raise RuntimeError(f"Batch {batch.status}: {batch.errors}")

        time.sleep(poll_interval)

completed_batch = wait_for_batch(client, batch_job.id)
print(f"Output file: {completed_batch.output_file_id}")
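A polling loop like the one above runs forever if the job never reaches a terminal state. As a hedged variation, the sketch below adds a client-side deadline. `poll_until_terminal` and its parameters are hypothetical names; the status fetch is passed in as a callable so the loop can be exercised without a network call.

```python
import time

# Terminal statuses handled by the polling loop in this guide
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def poll_until_terminal(fetch_status, poll_interval=60.0, timeout=24 * 3600,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal status or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

With the client from Step 2, the callable could be `lambda: client.batches.retrieve(batch_job.id).status`.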

Step 4: Download and Process Results#

python

import json

# Download results
result_file = client.files.content(completed_batch.output_file_id)
results = result_file.text

# Parse results, skipping requests that did not return a successful response
processed = []
for line in results.strip().split("\n"):
    result = json.loads(line)
    response = result.get("response")
    if result.get("error") or not response or response.get("status_code") != 200:
        continue
    content = response["body"]["choices"][0]["message"]["content"]
    processed.append({"id": result["custom_id"], "sentiment": content})

print(f"Processed {len(processed)} results")

# Show sample
for item in processed[:5]:
    print(f"  {item['id']}: {item['sentiment']}")
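Output lines are not guaranteed to come back in the same order as the input file, so results should be matched to inputs through `custom_id` rather than by position. The following is a minimal sketch of that join, assuming the `request-{i}` ID scheme from Step 1 and the output line shape parsed above; `join_results_to_inputs` is a hypothetical helper name.

```python
import json

def join_results_to_inputs(inputs, output_jsonl, id_prefix="request-"):
    """Pair each input text with its result using custom_id, not line order."""
    by_id = {}
    for line in output_jsonl.strip().splitlines():
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            continue  # failed requests stay unmatched
        by_id[record["custom_id"]] = response["body"]["choices"][0]["message"]["content"]
    # None marks inputs that have no successful result
    return [(text, by_id.get(f"{id_prefix}{i}")) for i, text in enumerate(inputs)]
```

Inputs paired with `None` are the ones to inspect or resubmit.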

Complete Batch Processing Pipeline#

Here's a reusable pipeline that ties the four steps together:

python

import json
import time
from openai import OpenAI

class BatchProcessor:
    def __init__(self, api_key: str, base_url: str = "https://api.crazyrouter.com/v1"):
        self.client = OpenAI(api_key=api_key, base_url=base_url)

    def create_batch_file(self, items: list, system_prompt: str,
                          model: str = "gpt-5-mini", output_path: str = "batch_input.jsonl"):
        """Create JSONL input file from a list of items"""
        with open(output_path, "w", encoding="utf-8") as f:
            for i, item in enumerate(items):
                request = {
                    "custom_id": f"req-{i:06d}",
                    "method": "POST",
                    "url": "/v1/chat/completions",
                    "body": {
                        "model": model,
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            {"role": "user", "content": str(item)}
                        ],
                        "max_tokens": 256,
                        "temperature": 0
                    }
                }
                f.write(json.dumps(request) + "\n")

        print(f"Created {output_path} with {len(items)} requests")
        return output_path

    def submit_batch(self, input_path: str, description: str = "") -> str:
        """Upload file and create batch job"""
        # Upload
        with open(input_path, "rb") as f:
            uploaded = self.client.files.create(file=f, purpose="batch")

        # Create batch
        batch = self.client.batches.create(
            input_file_id=uploaded.id,
            endpoint="/v1/chat/completions",
            completion_window="24h",
            metadata={"description": description}
        )

        print(f"Batch submitted: {batch.id}")
        return batch.id

    def wait_and_download(self, batch_id: str, poll_interval: int = 60) -> list:
        """Wait for completion and download results"""
        while True:
            batch = self.client.batches.retrieve(batch_id)
            completed = batch.request_counts.completed
            total = batch.request_counts.total
            failed = batch.request_counts.failed

            print(f"\r Progress: {completed}/{total} ({failed} failed)", end="", flush=True)

            if batch.status == "completed":
                print("\n ✅ Batch completed!")
                break
            if batch.status in ("failed", "expired", "cancelled"):
                print(f"\n ❌ Batch {batch.status}")
                raise RuntimeError(f"Batch {batch.status}: {batch.errors}")

            time.sleep(poll_interval)

        # No output file means no request succeeded
        if not batch.output_file_id:
            return []

        # Download results, keeping only successful responses
        content = self.client.files.content(batch.output_file_id)
        results = []
        for line in content.text.strip().split("\n"):
            data = json.loads(line)
            response = data.get("response")
            if data.get("error") or not response or response.get("status_code") != 200:
                continue
            results.append({
                "id": data["custom_id"],
                "response": response["body"]["choices"][0]["message"]["content"],
                "tokens": response["body"]["usage"]["total_tokens"]
            })

        # Zero-padded IDs sort back into input order
        return sorted(results, key=lambda x: x["id"])

# Usage example
processor = BatchProcessor(api_key="your-crazyrouter-key")

# Classify customer reviews (two samples shown; pass your full list)
reviews = ["Great product, love it!", "Terrible service, never again."]  # Your data

processor.create_batch_file(
    items=reviews,
    system_prompt="Classify sentiment as: positive, negative, or neutral. Reply with one word only.",
    model="gpt-5-mini"
)

batch_id = processor.submit_batch("batch_input.jsonl", "Customer review sentiment analysis")
results = processor.wait_and_download(batch_id)

for r in results[:10]:
    print(f"{r['id']}: {r['response']} ({r['tokens']} tokens)")

Batch Embeddings#

Processing embeddings in batch is ideal for building search indexes:

python

import json

# Create embedding batch file
documents = ["Document text 1...", "Document text 2..."]  # Your data

with open("embedding_batch.jsonl", "w", encoding="utf-8") as f:
    for i, doc in enumerate(documents):
        request = {
            "custom_id": f"emb-{i:06d}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {
                "model": "text-embedding-3-small",
                "input": doc
            }
        }
        f.write(json.dumps(request) + "\n")

# Submit as above, but create the batch with endpoint="/v1/embeddings"
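Embedding results have a different body shape from chat completions: the vector sits under `data[0].embedding` rather than `choices[0].message.content`. The sketch below shows one way to collect the vectors, assuming the batch output wraps a standard embeddings response in the same `custom_id` / `response` envelope used for chat requests; `parse_embedding_output` is a hypothetical helper name.

```python
import json

def parse_embedding_output(output_jsonl):
    """Map custom_id -> embedding vector from the text of a batch output file."""
    vectors = {}
    for line in output_jsonl.strip().splitlines():
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            continue  # skip requests that failed
        # Each request sends a single input string, so data holds one entry
        vectors[record["custom_id"]] = response["body"]["data"][0]["embedding"]
    return vectors
```

The resulting dictionary can be written straight into a vector index keyed by `custom_id`.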

DIY Async Batch Processing#

If a provider doesn't offer a native batch API, build your own with async:

python

import asyncio
import aiohttp
from typing import List

async def process_batch_async(
    items: List[str],
    api_key: str,
    model: str = "gpt-5-mini",
    max_concurrent: int = 50,
    base_url: str = "https://api.crazyrouter.com/v1"
):
    """Process items concurrently, capping in-flight requests with a semaphore"""
    semaphore = asyncio.Semaphore(max_concurrent)
    results = [None] * len(items)

    async def process_one(session, index, text):
        async with semaphore:
            async with session.post(
                f"{base_url}/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": model,
                    "messages": [{"role": "user", "content": text}],
                    "max_tokens": 256
                }
            ) as resp:
                resp.raise_for_status()  # Surface HTTP errors instead of a KeyError below
                data = await resp.json()
                results[index] = data["choices"][0]["message"]["content"]

    async with aiohttp.ClientSession() as session:
        tasks = [process_one(session, i, item) for i, item in enumerate(items)]
        await asyncio.gather(*tasks)

    return results

# Run
results = asyncio.run(process_batch_async(
    items=["Summarize: ...", "Classify: ..."],  # Your prompts
    api_key="your-api-key",
    max_concurrent=50
))

Pricing: Batch vs Real-Time#

Provider	Model	Real-Time (USD per 1M tokens)	Batch (USD per 1M tokens)	Savings
Crazyrouter	GPT-5-mini	0.60	0.30	50%
Crazyrouter	GPT-5	9.00	4.50	50%
OpenAI	GPT-5-mini	1.00	0.50	50%
OpenAI	GPT-5	15.00	7.50	50%
Crazyrouter	Claude Sonnet	5.00	2.50	50%

With Crazyrouter, you save on both the base price (20-40% cheaper than official) AND the batch discount (additional 50%), compounding your savings.
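To make the compounding concrete, here is a small worked calculation. It assumes the table's figures are US dollars per million tokens and applies them as a single flat rate (the table does not split input and output tokens); the 500M-token workload is a hypothetical example.

```python
def cost_usd(total_tokens, price_per_million):
    """Cost of a workload at a flat per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million

# Hypothetical workload: 500M tokens on GPT-5-mini, prices taken from the table
TOKENS = 500_000_000
openai_realtime = cost_usd(TOKENS, 1.00)
openai_batch = cost_usd(TOKENS, 0.50)
crazyrouter_batch = cost_usd(TOKENS, 0.30)

# Fraction saved relative to OpenAI real-time pricing
combined_saving = 1 - crazyrouter_batch / openai_realtime

print(f"OpenAI real-time:  ${openai_realtime:,.2f}")
print(f"OpenAI batch:      ${openai_batch:,.2f}")
print(f"Crazyrouter batch: ${crazyrouter_batch:,.2f}")
print(f"Combined saving:   {combined_saving:.0%}")
```

The two discounts multiply rather than add: paying 60% of the list price and then 50% of that leaves 30% of the original cost.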

When to Use Batch vs Real-Time#

Scenario	Batch	Real-Time
Latency requirement	Hours OK	Seconds needed
Volume	1,000+ requests	1-100 requests
Cost sensitivity	High	Low
Data classification	✅ Best choice	Overkill
User-facing chatbot	❌ Too slow	✅ Required
Nightly data pipeline	✅ Perfect	❌ Wasteful
Content generation (bulk)	✅ Best choice	Can work
Embedding large corpus	✅ Best choice	Expensive

Best Practices#

1. Optimize Prompt Length#

In batch processing, every extra token multiplied by millions adds up. Keep system prompts concise:

python

# ❌ Verbose (wastes tokens × millions of requests)
system = "You are a helpful assistant. Your job is to classify the sentiment of customer reviews. Please analyze the text carefully and determine whether the overall sentiment is positive, negative, or neutral. Respond with just the classification."

# ✅ Concise (saves millions of tokens) 
system = "Classify sentiment: positive/negative/neutral. One word only."

2. Use the Cheapest Adequate Model#

For simple classification tasks, GPT-5-mini or Claude Haiku is often sufficient:

Task	Recommended Model	Why
Sentiment classification	GPT-5-mini	Simple task, cheapest
Content summarization	Claude Sonnet	Good quality/price
Complex analysis	GPT-5 or Claude Opus	Accuracy matters
Embeddings	text-embedding-3-small	Cheapest, good quality

3. Handle Failures Gracefully#

python

import json

# results_text holds the downloaded output file contents (result_file.text in Step 4)
# Check for failed requests and retry them
failed_requests = []
for line in results_text.strip().split("\n"):
    result = json.loads(line)
    if result.get("error"):
        failed_requests.append(result["custom_id"])

if failed_requests:
    print(f"Retrying {len(failed_requests)} failed requests...")
    # Rebuild JSONL with only failed requests and resubmit
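The rebuild step mentioned in the last comment can be sketched as follows. This is one possible approach, assuming the original input file is still on disk and that `custom_id` values are unique; `build_retry_file` and the default `batch_retry.jsonl` path are hypothetical names.

```python
import json

def build_retry_file(input_path, failed_ids, retry_path="batch_retry.jsonl"):
    """Copy only the requests whose custom_id failed into a new JSONL file."""
    failed = set(failed_ids)
    kept = 0
    with open(input_path, encoding="utf-8") as src, \
         open(retry_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            if json.loads(line)["custom_id"] in failed:
                dst.write(line if line.endswith("\n") else line + "\n")
                kept += 1
    return kept  # number of requests written to the retry file
```

The retry file can then be uploaded and submitted exactly like the original input file.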

Frequently Asked Questions#

How long does batch processing take?#

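Most providers target completion within a 24-hour window, but typical turnaround is 1-6 hours depending on queue depth. OpenAI's Batch API usually completes within 2-4 hours for most workloads. A batch that does not finish inside its window ends in the expired status, which is why the polling code above treats expired as terminal.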

Is batch processing cheaper than real-time?#

Yes, significantly. OpenAI offers a 50% discount for batch requests. Through Crazyrouter, base pricing is a further 20-40% lower, and the two reductions stack, making batch processing extremely cost-effective.

What's the maximum batch size?#

OpenAI's Batch API supports up to 50,000 requests per batch and 200MB per input file. For larger workloads, split into multiple batches and run them concurrently.
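A minimal sketch of that splitting step, under the limits quoted above: it treats 200MB as 200,000,000 bytes to stay on the conservative side, and `split_into_batches` is a hypothetical helper name. Each yielded chunk is a list of ready-to-write JSONL lines.

```python
import json

MAX_REQUESTS = 50_000
MAX_BYTES = 200_000_000  # conservative reading of the 200MB file limit

def split_into_batches(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Yield lists of JSONL lines, each within the request-count and size limits."""
    current, current_bytes = [], 0
    for req in requests:
        line = json.dumps(req) + "\n"
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single request exceeds the file size limit")
        # Start a new chunk when adding this line would break either limit
        if current and (len(current) >= max_requests or current_bytes + size > max_bytes):
            yield current
            current, current_bytes = [], 0
        current.append(line)
        current_bytes += size
    if current:
        yield current
```

Each chunk would be written to its own file and submitted as a separate batch job.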

Can I cancel a batch job?#

Yes, most batch APIs support cancellation. Already-completed requests within the batch will still be billed, but remaining requests will be cancelled.
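With the OpenAI Python SDK, cancellation is a single call, `client.batches.cancel`. The wrapper below is a small sketch around it; `cancel_batch` is a hypothetical helper name, and whether a gateway forwards the cancellation the same way is an assumption to verify against its documentation.

```python
def cancel_batch(client, batch_id):
    """Request cancellation and return the status the API reports afterwards."""
    batch = client.batches.cancel(batch_id)
    # Cancellation is usually asynchronous: expect an intermediate status
    # first, then "cancelled" on a later batches.retrieve() call
    return batch.status
```

After calling it, keep polling until the batch reports `cancelled` before treating the job as stopped.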

Does batch processing work with all models?#

Most text-based models support batch processing. Image generation and audio models typically don't have batch APIs—use async concurrent requests instead.

How do I handle rate limits in async batch processing?#

Use a semaphore (shown in the async example above) to limit concurrent requests. Start with 50 concurrent requests and adjust based on the provider's rate limits. Crazyrouter offers higher rate limits for batch workloads.
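A semaphore caps concurrency but does not help once the provider starts returning HTTP 429. A common complement is to retry with exponential backoff and jitter. The sketch below is one way to do that; `RateLimited` and `with_backoff` are hypothetical names, and the caller is assumed to raise `RateLimited` when it sees a 429 response.

```python
import asyncio
import random

class RateLimited(Exception):
    """Raised by the caller's request function when the provider returns HTTP 429."""

async def with_backoff(make_request, max_retries=5, base_delay=1.0,
                       max_delay=60.0, sleep=asyncio.sleep):
    """Await make_request(), retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await make_request()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries, let the caller handle it
            # Delay doubles each attempt, capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so workers do not wake in lockstep
            await sleep(delay * random.uniform(0.5, 1.0))
```

Inside the async example, the body of `process_one` could be wrapped in a coroutine function and passed to `with_backoff`, while the semaphore continues to bound the number of in-flight requests.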

Summary#

Batch processing is essential for any application that needs to process large volumes of AI requests efficiently. Whether you use OpenAI's native Batch API for 50% cost savings or build your own async pipeline, the key is matching the right approach to your latency and cost requirements.

Crazyrouter makes batch processing even more affordable by offering competitive base pricing plus support for batch APIs across multiple providers—all through a single API key. Process millions of requests across GPT-5, Claude, Gemini, and 300+ other models without managing multiple accounts.

Start batch processing at crazyrouter.com →


URL: https://crazyrouter.com/en/blog/ai-batch-processing-api-guide-2026
