
URL: https://crazyrouter.com/en/blog/ai-batch-processing-api-guide-2026




AI Batch Processing API Guide 2026: Process Millions of Requests Efficiently#

When you need to process thousands or millions of AI requests—whether for data classification, content generation, document analysis, or embeddings—real-time API calls become impractical and expensive. Batch processing APIs offer a solution: submit large volumes of requests, get results within hours, and save up to 50% on costs.

What is AI Batch Processing?#

AI batch processing lets you submit many requests at once instead of making individual API calls. The provider processes them asynchronously, typically within a 24-hour window, and returns all results together.

Key benefits:

  • 50% cost savings: Most providers offer significant discounts for batch jobs
  • Higher throughput: Batch jobs typically draw on a separate, larger quota than synchronous calls, so large volumes do not compete with your real-time rate limits
  • Simplified infrastructure: The provider handles queueing and scheduling, so you only need to resubmit the requests that fail
  • Better resource utilization: Providers can schedule batch jobs during off-peak hours

Common use cases:

  • Classifying millions of customer support tickets
  • Generating product descriptions for e-commerce catalogs
  • Analyzing financial documents at scale
  • Creating embeddings for large document collections
  • Translating content libraries
  • Evaluating LLM outputs for quality assessment

OpenAI Batch API: Complete Tutorial#

OpenAI's Batch API is a mature, widely used batch processing option, offering a 50% cost reduction on requests submitted with a 24-hour completion window.

Step 1: Prepare Your Input File (JSONL)#

Each line in the JSONL file is an independent request:

python
import json

# Create batch input file
requests_data = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-mini",
            "messages": [
                {"role": "system", "content": "Classify the sentiment as positive, negative, or neutral."},
                {"role": "user", "content": text}
            ],
            "max_tokens": 10
        }
    }
    for i, text in enumerate(customer_reviews)  # Your data
]

# Write JSONL file (one JSON object per line)
with open("batch_input.jsonl", "w") as f:
    for req in requests_data:
        f.write(json.dumps(req) + "\n")

print(f"Created batch with {len(requests_data)} requests")
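
Before uploading, it can help to sanity-check the file locally. The sketch below is one possible check and not part of any SDK: the helper name validate_batch_file is hypothetical, and it only verifies that every line is valid JSON, carries the four top-level fields used above, and has a unique custom_id.

```python
import json

# The four top-level fields used in the request objects above
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}

def validate_batch_file(path):
    """Return a list of human-readable problems found in a batch JSONL file."""
    problems = []
    seen_ids = set()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                problems.append(f"line {line_no}: empty line")
                continue
            try:
                req = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(req, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            missing = REQUIRED_KEYS - set(req)
            if missing:
                problems.append(f"line {line_no}: missing {sorted(missing)}")
            cid = req.get("custom_id")
            if cid in seen_ids:
                problems.append(f"line {line_no}: duplicate custom_id {cid!r}")
            seen_ids.add(cid)
    return problems
```

An empty list means the file passed these basic checks; the provider may still reject it for reasons this sketch does not cover.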

Step 2: Upload and Submit the Batch#

python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Upload the input file (the with-block closes the file handle afterwards)
with open("batch_input.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

print(f"Uploaded file: {batch_file.id}")

# Create the batch job
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "Sentiment classification batch"}
)

print(f"Batch job created: {batch_job.id}")
print(f"Status: {batch_job.status}")

Step 3: Monitor Progress#

python
import time

def wait_for_batch(client, batch_id, poll_interval=60):
    """Poll batch status until completion"""
    while True:
        batch = client.batches.retrieve(batch_id)
        print(f"Status: {batch.status} | "
              f"Completed: {batch.request_counts.completed}/{batch.request_counts.total} | "
              f"Failed: {batch.request_counts.failed}")

        if batch.status == "completed":
            return batch
        elif batch.status in ["failed", "expired", "cancelled"]:
            raise Exception(f"Batch {batch.status}: {batch.errors}")

        time.sleep(poll_interval)

completed_batch = wait_for_batch(client, batch_job.id)
print(f"Output file: {completed_batch.output_file_id}")
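
The loop above polls at a fixed interval and never gives up. A variant with a deadline and a growing poll interval might look like the sketch below. The function name wait_with_deadline and its default values are assumptions for illustration; it relies only on client.batches.retrieve and the status values already used above.

```python
import time

TERMINAL_FAILURES = {"failed", "expired", "cancelled"}

def wait_with_deadline(client, batch_id, timeout_s=25 * 3600,
                       poll_interval=60, max_interval=600, sleep=time.sleep):
    """Poll a batch until it completes, backing off and giving up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    interval = poll_interval
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status == "completed":
            return batch
        if batch.status in TERMINAL_FAILURES:
            raise RuntimeError(f"Batch {batch_id} ended with status {batch.status}")
        # Stop if the next sleep would take us past the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Batch {batch_id} still {batch.status} after {timeout_s}s")
        sleep(interval)
        # Double the wait each round, up to max_interval seconds
        interval = min(interval * 2, max_interval)
```

The sleep parameter exists only so the function can be exercised without real waiting; in normal use the default time.sleep applies.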

Step 4: Download and Process Results#

python
import json

# Download results
result_file = client.files.content(completed_batch.output_file_id)
results = result_file.text

# Parse results (one JSON object per line)
processed = []
for line in results.strip().split("\n"):
    result = json.loads(line)
    custom_id = result["custom_id"]
    # Skip requests that errored; they carry no usable response body
    if result.get("error") or not result.get("response"):
        continue
    response = result["response"]["body"]["choices"][0]["message"]["content"]
    processed.append({"id": custom_id, "sentiment": response})

print(f"Processed {len(processed)} results")

# Show sample
for item in processed[:5]:
    print(f" {item['id']}: {item['sentiment']}")
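
Output lines are not guaranteed to come back in the same order as the input, so results are best matched to their inputs through custom_id. The sketch below shows one way to do that join. The helper name join_results is hypothetical, and it assumes the input and output line shapes used in Steps 1 and 4.

```python
import json

def join_results(input_lines, output_lines):
    """Pair each input request with its output by custom_id.

    Returns a dict: custom_id -> (user_text, model_reply or None).
    """
    replies = {}
    for line in output_lines:
        rec = json.loads(line)
        response = rec.get("response") or {}
        choices = (response.get("body") or {}).get("choices") or []
        replies[rec["custom_id"]] = choices[0]["message"]["content"] if choices else None

    joined = {}
    for line in input_lines:
        req = json.loads(line)
        # The user message is the last entry in the messages list built in Step 1
        user_text = req["body"]["messages"][-1]["content"]
        joined[req["custom_id"]] = (user_text, replies.get(req["custom_id"]))
    return joined
```

Requests with no usable reply map to None, which makes it easy to see which inputs still need attention.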

Complete Batch Processing Pipeline#

Here's a reusable pipeline that wraps the four steps above in a single class:

python
import json
import time
from pathlib import Path
from openai import OpenAI

class BatchProcessor:
    def __init__(self, api_key: str, base_url: str = "https://api.crazyrouter.com/v1"):
        self.client = OpenAI(api_key=api_key, base_url=base_url)

    def create_batch_file(self, items: list, system_prompt: str,
                          model: str = "gpt-5-mini", output_path: str = "batch_input.jsonl"):
        """Create JSONL input file from a list of items"""
        with open(output_path, "w") as f:
            for i, item in enumerate(items):
                request = {
                    "custom_id": f"req-{i:06d}",
                    "method": "POST",
                    "url": "/v1/chat/completions",
                    "body": {
                        "model": model,
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            {"role": "user", "content": str(item)}
                        ],
                        "max_tokens": 256,
                        "temperature": 0
                    }
                }
                f.write(json.dumps(request) + "\n")

        print(f"Created {output_path} with {len(items)} requests")
        return output_path

    def submit_batch(self, input_path: str, description: str = "") -> str:
        """Upload file and create batch job"""
        # Upload
        with open(input_path, "rb") as f:
            uploaded = self.client.files.create(file=f, purpose="batch")

        # Create batch
        batch = self.client.batches.create(
            input_file_id=uploaded.id,
            endpoint="/v1/chat/completions",
            completion_window="24h",
            metadata={"description": description}
        )

        print(f"Batch submitted: {batch.id}")
        return batch.id

    def wait_and_download(self, batch_id: str, poll_interval: int = 60) -> list:
        """Wait for completion and download results"""
        while True:
            batch = self.client.batches.retrieve(batch_id)
            completed = batch.request_counts.completed
            total = batch.request_counts.total
            failed = batch.request_counts.failed

            print(f"\r Progress: {completed}/{total} ({failed} failed)", end="", flush=True)

            if batch.status == "completed":
                print("\n ✅ Batch completed!")
                break
            elif batch.status in ["failed", "expired", "cancelled"]:
                print(f"\n ❌ Batch {batch.status}")
                raise Exception(f"Batch failed: {batch.errors}")

            time.sleep(poll_interval)

        # Download results
        content = self.client.files.content(batch.output_file_id)
        results = []
        for line in content.text.strip().split("\n"):
            data = json.loads(line)
            # Skip requests that errored; they have no completion to read
            if data.get("error") or not data.get("response"):
                continue
            results.append({
                "id": data["custom_id"],
                "response": data["response"]["body"]["choices"][0]["message"]["content"],
                "tokens": data["response"]["body"]["usage"]["total_tokens"]
            })

        # Zero-padded custom_ids sort back into input order
        return sorted(results, key=lambda x: x["id"])

# Usage example
processor = BatchProcessor(api_key="your-crazyrouter-key")

# Classify 10,000 customer reviews
reviews = ["Great product, love it!", "Terrible service, never again.", ...]  # Your data

processor.create_batch_file(
    items=reviews,
    system_prompt="Classify sentiment as: positive, negative, or neutral. Reply with one word only.",
    model="gpt-5-mini"
)

batch_id = processor.submit_batch("batch_input.jsonl", "Customer review sentiment analysis")
results = processor.wait_and_download(batch_id)

for r in results[:10]:
    print(f"{r['id']}: {r['response']} ({r['tokens']} tokens)")

Batch Embeddings#

Processing embeddings in batch is ideal for building search indexes:

python
import json

# Create embedding batch file
documents = ["Document text 1...", "Document text 2...", ...]

with open("embedding_batch.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        request = {
            "custom_id": f"emb-{i:06d}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {
                "model": "text-embedding-3-small",
                "input": doc
            }
        }
        f.write(json.dumps(request) + "\n")

# Submit and process same as above (use endpoint="/v1/embeddings" when creating the batch)
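
Reading the embedding results differs slightly from the chat case, because the vector sits under data rather than choices. The sketch below assumes each output line wraps a standard embeddings response body, with the vector at response.body.data[0].embedding. The helper name parse_embedding_results is hypothetical.

```python
import json

def parse_embedding_results(output_text):
    """Map custom_id -> embedding vector for every successful line."""
    vectors = {}
    for line in output_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        response = rec.get("response")
        # Skip failed requests and non-200 responses
        if rec.get("error") or not response or response.get("status_code") != 200:
            continue
        vectors[rec["custom_id"]] = response["body"]["data"][0]["embedding"]
    return vectors
```

Keying by custom_id keeps each vector tied to its source document even if the output order differs from the input order.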

DIY Async Batch Processing#

If a provider doesn't offer a native batch API, you can approximate one by sending requests concurrently with asyncio and capping how many are in flight at once:

python
import asyncio
import aiohttp
from typing import List

async def process_batch_async(
    items: List[str],
    api_key: str,
    model: str = "gpt-5-mini",
    max_concurrent: int = 50,
    base_url: str = "https://api.crazyrouter.com/v1"
):
    """Process items concurrently, capping the number of in-flight requests"""
    # The semaphore lets at most max_concurrent requests run at a time
    semaphore = asyncio.Semaphore(max_concurrent)
    results = [None] * len(items)

    async def process_one(session, index, text):
        async with semaphore:
            async with session.post(
                f"{base_url}/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": model,
                    "messages": [{"role": "user", "content": text}],
                    "max_tokens": 256
                }
            ) as resp:
                # Surface HTTP errors (e.g. 429) instead of a KeyError below
                resp.raise_for_status()
                data = await resp.json()
                results[index] = data["choices"][0]["message"]["content"]

    async with aiohttp.ClientSession() as session:
        tasks = [process_one(session, i, item) for i, item in enumerate(items)]
        await asyncio.gather(*tasks)

    # Results are stored by index, so they line up with the input order
    return results

# Run
results = asyncio.run(process_batch_async(
    items=["Summarize: ...", "Classify: ...", ...],
    api_key="your-api-key",
    max_concurrent=50
))

Pricing: Batch vs Real-Time#

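| Provider | Model | Real-Time (per 1M tokens) | Batch (per 1M tokens) | Savings |
|---|---|---|---|---|
| Crazyrouter | GPT-5-mini | 0.60 | 0.30 | 50% |
| Crazyrouter | GPT-5 | 9.00 | 4.50 | 50% |
| OpenAI | GPT-5-mini | 1.00 | 0.50 | 50% |
| OpenAI | GPT-5 | 15.00 | 7.50 | 50% |
| Crazyrouter | Claude Sonnet | 5.00 | 2.50 | 50% |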

With Crazyrouter, you save on both the base price (20-40% cheaper than official) AND the batch discount (additional 50%), compounding your savings.
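
The two discounts multiply rather than add. As a worked example using the GPT-5-mini rows of the table, a 40% lower base price followed by a 50% batch discount takes 1.00 down to 0.30 per 1M tokens, which is 70% below the official real-time price and not 90%. The function name combined_savings below is just for illustration.

```python
def combined_savings(official_price, base_discount, batch_discount=0.50):
    """Return (final price, total fractional saving) after stacking both discounts."""
    final_price = official_price * (1 - base_discount) * (1 - batch_discount)
    return final_price, 1 - final_price / official_price

# GPT-5-mini: official real-time price 1.00, Crazyrouter base price 40% lower
price, saving = combined_savings(1.00, 0.40)
print(f"{price:.2f} per 1M tokens, {saving:.0%} below the official real-time price")
```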

When to Use Batch vs Real-Time#

| Scenario | Batch | Real-Time |
|---|---|---|
| Latency requirement | Hours OK | Seconds needed |
| Volume | 1,000+ requests | 1-100 requests |
| Cost sensitivity | High | Low |
| Data classification | ✅ Best choice | Overkill |
| User-facing chatbot | ❌ Too slow | ✅ Required |
| Nightly data pipeline | ✅ Perfect | ❌ Wasteful |
| Content generation (bulk) | ✅ Best choice | Can work |
| Embedding large corpus | ✅ Best choice | Expensive |

Best Practices#

1. Optimize Prompt Length#

In batch processing, every extra token multiplied by millions adds up. Keep system prompts concise:

python
# ❌ Verbose (wastes tokens × millions of requests)
system = "You are a helpful assistant. Your job is to classify the sentiment of customer reviews. Please analyze the text carefully and determine whether the overall sentiment is positive, negative, or neutral. Respond with just the classification."

# ✅ Concise (saves millions of tokens) 
system = "Classify sentiment: positive/negative/neutral. One word only."
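
To get a feel for the scale, the sketch below estimates how many tokens a shorter prompt saves across a whole batch. It is a rough approximation only: it assumes about 4 characters per token, whereas real tokenizers vary by model and text, and the function names are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers will differ."""
    return max(1, round(len(text) / chars_per_token))

def prompt_savings(verbose, concise, num_requests, price_per_million):
    """Return (tokens saved across the batch, estimated cost saved)."""
    saved_per_request = estimate_tokens(verbose) - estimate_tokens(concise)
    total_saved = saved_per_request * num_requests
    return total_saved, total_saved / 1_000_000 * price_per_million
```

Calling prompt_savings with the two prompts above, your request count, and your batch input price gives an order-of-magnitude figure; for exact numbers, count tokens with the tokenizer for your model.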

2. Use the Cheapest Adequate Model#

For simple classification tasks, GPT-5-mini or Claude Haiku is often sufficient:

| Task | Recommended Model | Why |
|---|---|---|
| Sentiment classification | GPT-5-mini | Simple task, cheapest |
| Content summarization | Claude Sonnet | Good quality/price |
| Complex analysis | GPT-5 or Claude Opus | Accuracy matters |
| Embeddings | text-embedding-3-small | Cheapest, good quality |

3. Handle Failures Gracefully#

python
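import json

# Check for failed requests and retry them.
# results_text is the downloaded output file content (result_file.text in Step 4).
# Failures may also be listed in a separate error file (batch.error_file_id).
failed_requests = []
for line in results_text.strip().split("\n"):
    result = json.loads(line)
    if result.get("error"):
        failed_requests.append(result["custom_id"])

if failed_requests:
    print(f"Retrying {len(failed_requests)} failed requests...")
    # Rebuild JSONL with only failed requests and resubmit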
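
The rebuild step can be sketched as a filter over the original input file. The helper name build_retry_file and the file paths are assumptions for illustration; the resulting file can be uploaded and submitted the same way as in Step 2.

```python
import json

def build_retry_file(input_path, failed_ids, retry_path):
    """Copy only the requests whose custom_id is in failed_ids; return the count."""
    wanted = set(failed_ids)
    kept = 0
    with open(input_path, encoding="utf-8") as src, \
         open(retry_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip() and json.loads(line)["custom_id"] in wanted:
                # Keep the original request unchanged, one per line
                dst.write(line if line.endswith("\n") else line + "\n")
                kept += 1
    return kept
```

Reusing the original custom_id values means the retried results can be merged with the first run without any renaming.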

Frequently Asked Questions#

How long does batch processing take?#

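Most providers target completion within a 24-hour window, and typical turnaround is 1-6 hours depending on queue depth. OpenAI's batch API usually completes within 2-4 hours for most workloads; a batch that does not finish inside the window ends with the expired status rather than completed.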

Is batch processing cheaper than real-time?#

Yes, significantly. OpenAI offers a 50% discount for batch requests. Through Crazyrouter, you save an additional 20-40% on base pricing, making batch processing extremely cost-effective.

What's the maximum batch size?#

OpenAI's Batch API supports up to 50,000 requests per batch and 200 MB per input file. For larger workloads, split the requests into multiple batches and run them concurrently.
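
Splitting can be done while serializing the requests. The sketch below groups JSONL lines so that each group stays within both limits. The function name split_into_batches is hypothetical, and treating 200 MB as 200 × 1024 × 1024 bytes is an assumption; lower max_bytes if you want a safety margin.

```python
import json

def split_into_batches(requests, max_requests=50_000, max_bytes=200 * 1024 * 1024):
    """Yield lists of JSONL lines, each within the request-count and size limits."""
    chunk, chunk_bytes = [], 0
    for req in requests:
        line = json.dumps(req) + "\n"
        size = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would break either limit
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(line)
        chunk_bytes += size
    if chunk:
        yield chunk
```

Each yielded list can be written to its own file with writelines and submitted as a separate batch job.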

Can I cancel a batch job?#

Yes, most batch APIs support cancellation. Already-completed requests within the batch will still be billed, but remaining requests will be cancelled.
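
With the OpenAI Python SDK, cancellation is a single call to client.batches.cancel. The sketch below wraps it so that already-finished batches are left alone; the wrapper name cancel_if_running is hypothetical. Cancellation is not instant, so the returned status is typically cancelling at first, and whether a gateway such as Crazyrouter forwards this call is an assumption to verify.

```python
# Statuses after which there is nothing left to cancel
TERMINAL = {"completed", "failed", "expired", "cancelled"}

def cancel_if_running(client, batch_id):
    """Cancel a batch unless it has already finished; return the resulting status."""
    batch = client.batches.retrieve(batch_id)
    if batch.status in TERMINAL:
        return batch.status
    return client.batches.cancel(batch_id).status
```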

Does batch processing work with all models?#

Most text-based models support batch processing. Image generation and audio models typically don't have batch APIs—use async concurrent requests instead.

How do I handle rate limits in async batch processing?#

Use a semaphore (shown in the async example above) to limit concurrent requests. Start with 50 concurrent requests and adjust based on the provider's rate limits. Crazyrouter offers higher rate limits for batch workloads.
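
A semaphore caps concurrency but does not react when the provider still returns a rate-limit error. One common complement is retrying with exponential backoff and jitter, sketched below using only the standard library. The RateLimited exception and the with_backoff helper are hypothetical; in the aiohttp example you would raise RateLimited when the response status is 429.

```python
import asyncio
import random

class RateLimited(Exception):
    """Hypothetical marker raised by the caller on an HTTP 429 response."""

async def with_backoff(make_call, max_retries=5, base_delay=1.0, sleep=asyncio.sleep):
    """Await make_call(), retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await make_call()
        except RateLimited:
            if attempt == max_retries:
                raise
            # Delays grow as 1s, 2s, 4s, ... plus up to 50% random jitter
            delay = base_delay * 2 ** attempt
            await sleep(delay + random.uniform(0, delay / 2))
```

The jitter spreads retries out so that many concurrent tasks do not all retry at the same instant.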

Summary#

Batch processing is essential for any application that needs to process large volumes of AI requests efficiently. Whether you use OpenAI's native Batch API for 50% cost savings or build your own async pipeline, the key is matching the right approach to your latency and cost requirements.

Crazyrouter makes batch processing even more affordable by offering competitive base pricing plus support for batch APIs across multiple providers—all through a single API key. Process millions of requests across GPT-5, Claude, Gemini, and 300+ other models without managing multiple accounts.

Start batch processing at crazyrouter.com →
