How to Build AI-Powered Applications: A Developer's Guide

Crazyrouter


Building applications with AI capabilities has never been more accessible. Whether you're adding a chatbot to your SaaS, building an AI writing assistant, or creating intelligent automation, this guide covers everything from architecture decisions to production deployment.

Understanding AI Application Architecture

Basic Architecture Pattern

Most AI applications follow this pattern:

code

User Input → Your Application → AI API → Response Processing → User Output

Key components:

Frontend: User interface for input/output
Backend: Business logic and API orchestration
AI Layer: Model selection and prompt management
Data Layer: Context storage and caching
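
The components above can be sketched in a few lines. This is a minimal illustration, not a reference design: `ChatBackend`, `ModelFn`, and `echo_model` are hypothetical names, and the AI layer is replaced by a fake model so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for the AI layer: any callable that maps a
# list of chat messages to a reply string (for example, a thin wrapper
# around an API client).
ModelFn = Callable[[list[dict]], str]


@dataclass
class ChatBackend:
    """Backend layer: orchestrates the AI layer and the data layer."""
    model_fn: ModelFn
    system_prompt: str = "You are a helpful assistant."
    history: list[dict] = field(default_factory=list)  # data layer, in memory

    def handle(self, user_input: str) -> str:
        # 1. Clean up the input received from the frontend.
        text = user_input.strip()
        # 2. AI layer: assemble the prompt and call the model.
        messages = [{"role": "system", "content": self.system_prompt}]
        messages += self.history
        messages.append({"role": "user", "content": text})
        reply = self.model_fn(messages)
        # 3. Response processing and context storage.
        reply = reply.strip()
        self.history.append({"role": "user", "content": text})
        self.history.append({"role": "assistant", "content": reply})
        return reply


def echo_model(messages: list[dict]) -> str:
    """Fake model used so the sketch runs offline."""
    return f"You said: {messages[-1]['content']}"


backend = ChatBackend(model_fn=echo_model)
print(backend.handle("  hello  "))  # You said: hello
```

Keeping the model behind a plain callable makes it easy to swap providers or to test the backend without spending tokens.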

Choosing Your AI Integration Approach

Approach	Complexity	Flexibility	Cost
Direct API calls	Low	Medium	Variable
SDK/Library	Low	High	Variable
API Gateway	Medium	Very High	Lower
Self-hosted models	High	Maximum	Fixed

Getting Started: Your First AI Feature

Step 1: Choose Your Model

For most applications, start with:

Use Case	Recommended Model	Why
Chatbot	GPT-4o Mini or Claude Haiku	Fast, cheap, good enough
Content generation	Claude Sonnet or GPT-4o	Better quality
Code assistance	Claude Sonnet or GPT-4	Strong reasoning
Document analysis	Claude (200K context)	Long context window

Step 2: Set Up API Access

Option A: Direct Provider Access

python

# OpenAI
from openai import OpenAI
openai_client = OpenAI(api_key="sk-...")

# Anthropic (a separate SDK with its own client class)
from anthropic import Anthropic
anthropic_client = Anthropic(api_key="sk-ant-...")

Option B: API Gateway (Recommended)

Using a gateway like Crazyrouter simplifies multi-model access:

python

from openai import OpenAI

# Single endpoint for all models
client = OpenAI(
    api_key="your-gateway-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Use any model with the same code
response = client.chat.completions.create(
    model="claude-3-5-sonnet",  # or "gpt-4o", "gemini-pro", etc.
    messages=[{"role": "user", "content": "Hello!"}]
)

Step 3: Basic Implementation

Here's a minimal chatbot implementation:

python

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.crazyrouter.com/v1"
)

def chat(user_message: str, conversation_history: list) -> str:
    # Record the user's turn so the model sees the full conversation
    conversation_history.append({
        "role": "user",
        "content": user_message
    })

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        max_tokens=1000
    )

    # Store the reply so the next call includes it as context
    assistant_message = response.choices[0].message.content
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })

    return assistant_message
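
Because the history list grows with every turn, long conversations eventually cost more per call and can exceed the model's context window. One possible mitigation is to send only the most recent messages. The helper below is a sketch under that assumption; `trim_history` and its `max_messages` default are illustrative, not part of any SDK.

```python
def trim_history(history: list[dict], max_messages: int = 20) -> list[dict]:
    """Return system messages plus the last `max_messages` other messages."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    # Guard the zero case: rest[-0:] would return the whole list.
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


history = [{"role": "system", "content": "Be brief."}]
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_messages=4)
print([m["content"] for m in trimmed])
# ['Be brief.', 'question 3', 'answer 3', 'question 4', 'answer 4']
```

Trimming by message count is crude. An even `max_messages` keeps user and assistant turns paired, and a token-based budget would be more precise.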

Building Production-Ready AI Features

Prompt Engineering Best Practices

1. Use System Prompts

python

messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
        - Be friendly and professional
        - If you don't know something, say so
        - Never make up information about products
        - For billing issues, direct users to billing@techcorp.com"""
    },
    {"role": "user", "content": user_input}
]

2. Structure Your Prompts

python

prompt = f"""
Task: Summarize the following article
Format: 3 bullet points, max 20 words each
Tone: Professional

Article:
{article_text}

Summary:
"""

3. Use Few-Shot Examples

python

messages = [
    {"role": "system", "content": "You classify customer feedback as positive, negative, or neutral."},
    {"role": "user", "content": "Great product, love it!"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Worst purchase ever."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": actual_feedback}
]
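
When the examples live in data rather than in code, the same message list can be assembled programmatically. The helper below is a small sketch; `build_few_shot_messages` is a hypothetical name, and the example pairs are the ones used above.

```python
def build_few_shot_messages(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
) -> list[dict]:
    """Turn (input, expected_output) pairs into alternating chat messages."""
    messages = [{"role": "system", "content": instruction}]
    for user_text, expected in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": expected})
    # The real input always goes last so the model answers it.
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("Great product, love it!", "positive"),
    ("Worst purchase ever.", "negative"),
]
messages = build_few_shot_messages(
    "You classify customer feedback as positive, negative, or neutral.",
    examples,
    "It arrived on time.",
)
print(len(messages))  # 6
```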

Error Handling and Resilience

Handle API Errors Gracefully

python

import time
from openai import OpenAI, APIError, RateLimitError

def call_ai_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages
            )
            return response.choices[0].message.content

        except RateLimitError:
            # RateLimitError subclasses APIError, so it must be caught first
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...

        except APIError:
            # Re-raise on the final attempt instead of sleeping needlessly
            if attempt == max_retries - 1:
                raise
            time.sleep(1)

    raise RuntimeError("Max retries exceeded")
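
Fixed backoff intervals can cause many clients to retry at the same moment. A common refinement is to add random jitter to each wait. The sketch below is provider-agnostic and makes some assumptions: `retry_with_jitter` and `backoff_delays` are hypothetical helpers, `max_retries` counts retries after the first attempt, and the `sleep` parameter exists only so the function can be tested without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Upper bound for each wait: base * 2**attempt, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def retry_with_jitter(
    fn: Callable[[], T],
    max_retries: int = 3,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error, wait a random time up to the bound."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Pick a random wait between 0 and the current bound
            sleep(random.uniform(0, delays[attempt]))
    raise AssertionError("unreachable for max_retries >= 0")


print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

With the OpenAI SDK, this could be called as `retry_with_jitter(lambda: client.chat.completions.create(...), retry_on=(RateLimitError,))`, assuming `client` and `RateLimitError` are set up as in the snippet above.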

Implement Fallback Models

python

MODELS = ["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"]

def call_with_fallback(messages):
    # Try each model in order until one succeeds
    for model in MODELS:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Model {model} failed: {e}")
            continue

    raise Exception("All models failed")

Streaming Responses

For better UX, stream responses instead of waiting:

python

def stream_response(messages):
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip them
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
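
When streaming, the full reply is never available as a single value, which makes it awkward to store in the conversation history. One way to handle this, sketched below, is to wrap the generator so it forwards each piece and hands the assembled text to a callback at the end. `stream_and_collect` and `on_complete` are hypothetical names, and a plain iterator stands in for the model stream.

```python
from typing import Callable, Iterable, Iterator


def stream_and_collect(
    deltas: Iterable[str],
    on_complete: Callable[[str], None],
) -> Iterator[str]:
    """Yield each text piece as it arrives, then report the full text once."""
    parts: list[str] = []
    for delta in deltas:
        parts.append(delta)
        yield delta
    # Runs only after the stream is fully consumed.
    on_complete("".join(parts))


saved: list[str] = []
for piece in stream_and_collect(iter(["Hel", "lo", "!"]), saved.append):
    print(piece, end="")
print()
print(saved)  # ['Hello!']
```

In an application, `deltas` would be the generator returned by `stream_response(messages)`, and `on_complete` could append the assistant message to the stored history.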

Frontend Integration (JavaScript)

javascript

async function streamChat(message) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    body: JSON.stringify({ message }),
    headers: { 'Content-Type': 'application/json' }
  });

  if (!response.ok) {
    throw new Error(`Chat request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    const text = decoder.decode(value, { stream: true });
    appendToChat(text);
  }
}

Advanced Patterns

RAG (Retrieval-Augmented Generation)

Combine AI with your own data:

python

from openai import OpenAI

def answer_with_context(question: str, documents: list) -> str:
    # 1. Find relevant documents (simplified)
    relevant_docs = search_documents(question, documents)

    # 2. Build context
    context = "\n\n".join([doc.content for doc in relevant_docs[:3]])

    # 3. Generate answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""Answer questions based on the provided context.
                If the answer isn't in the context, say "I don't have that information."

                Context:
                {context}"""
            },
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message.content
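
The `search_documents` call above is left undefined. As a placeholder, the sketch below ranks documents by how many words they share with the question. This is a deliberately naive stand-in: production systems usually use embeddings and a vector store instead, and the `Document` class and `top_k` parameter here are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    content: str


def _tokens(text: str) -> set[str]:
    """Lowercase word set; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_documents(question: str, documents: list[Document], top_k: int = 3) -> list[Document]:
    """Return up to top_k documents ranked by word overlap with the question."""
    query = _tokens(question)
    scored = [(len(query & _tokens(doc.content)), doc) for doc in documents]
    # Drop documents with no overlap, then sort by score (highest first).
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


docs = [
    Document("Refunds are processed within 5 days."),
    Document("Our office is in Berlin."),
    Document("Refunds require a receipt."),
]
for doc in search_documents("How are refunds processed?", docs):
    print(doc.content)
# Refunds are processed within 5 days.
# Refunds require a receipt.
```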

Function Calling / Tool Use

Let AI interact with your systems:

python

import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Handle tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    if tool_call.function.name == "get_weather":
        # Arguments arrive as a JSON string and must be parsed
        args = json.loads(tool_call.function.arguments)
        weather = get_weather(args["location"])  # your own implementation
        # Continue conversation with result...
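
The "continue conversation" step means sending the tool output back to the model as a message with role `tool` that references the call's `id`. The sketch below shows that step using plain dictionaries shaped like a Chat Completions assistant message, so it runs offline. `run_tool_calls`, `TOOL_REGISTRY`, and the stubbed `get_weather` are hypothetical.

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical stub; a real version would query a weather service."""
    return {"location": location, "temperature_c": 21, "conditions": "clear"}


# Maps tool names declared in `tools` to local Python functions.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool and build the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the request
            "content": json.dumps(fn(**args)),
        })
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
    }],
}
follow_up = [assistant_message] + run_tool_calls(assistant_message)
print(follow_up[-1]["role"], follow_up[-1]["tool_call_id"])  # tool call_1
```

Appending `follow_up` to the original messages and calling `client.chat.completions.create` again lets the model turn the tool output into a natural-language answer.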

Multi-Model Routing

Use different models for different tasks:

python

def route_to_model(task_type: str, content: str) -> str:
    model_map = {
        "simple_qa": "gpt-4o-mini",
        "complex_reasoning": "gpt-4o",
        "long_document": "claude-3-5-sonnet",
        "code_generation": "claude-3-5-sonnet",
        "creative_writing": "gpt-4o"
    }

    # Unknown task types fall back to the cheapest model
    model = model_map.get(task_type, "gpt-4o-mini")

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}]
    )

    return response.choices[0].message.content
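
The router above expects the caller to supply `task_type`. How to determine it is application-specific. The sketch below uses a few keyword and length heuristics purely as an illustration. The keywords and the 20,000-character threshold are arbitrary assumptions, and a small classifier model may work better in practice.

```python
def classify_task(content: str, long_threshold: int = 20_000) -> str:
    """Guess a task type for routing; heuristics are illustrative only."""
    text = content.lower()
    if len(content) > long_threshold:
        return "long_document"
    if any(word in text for word in ("function", "bug", "refactor", "stack trace")):
        return "code_generation"
    if any(word in text for word in ("story", "poem", "slogan")):
        return "creative_writing"
    if any(word in text for word in ("step by step", "compare", "prove")):
        return "complex_reasoning"
    return "simple_qa"


print(classify_task("Write a poem about rain"))         # creative_writing
print(classify_task("What is the capital of France?"))  # simple_qa
```

The result can be passed straight into the router, for example `route_to_model(classify_task(content), content)`.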

Cost Optimization Strategies

1. Implement Caching

python

import hashlib
import json
import redis

cache = redis.Redis()

def cached_completion(messages, model="gpt-4o-mini", ttl=3600):
    # Build a deterministic cache key from the model and messages
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    key = hashlib.sha256(payload.encode()).hexdigest()

    # Check cache
    cached = cache.get(key)
    if cached is not None:
        return cached.decode()

    # Call API
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    result = response.choices[0].message.content

    # Store in cache with an expiry of ttl seconds
    cache.setex(key, ttl, result)

    return result
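
For local development or tests where Redis is not available, the same idea can be tried with an in-memory cache. The class below is a sketch only: `TTLCache` is a hypothetical helper, entries live in a single process, and the `clock` parameter exists so that expiry can be tested without waiting.

```python
import hashlib
import json
import time
from typing import Callable, Optional


class TTLCache:
    """Minimal in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def make_key(model: str, messages: list[dict]) -> str:
        # sort_keys makes the key independent of dict key order.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


cache = TTLCache(ttl=60)
key = TTLCache.make_key("gpt-4o-mini", [{"role": "user", "content": "Hello!"}])
cache.set(key, "Hi there!")
print(cache.get(key))  # Hi there!
```

Exact-match caching only helps when identical requests repeat, such as FAQ-style prompts. It does not help with free-form conversations.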

2. Token Counting

python

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def estimate_cost(input_text: str, output_tokens: int = 500) -> float:
    input_tokens = count_tokens(input_text)

    # GPT-4o pricing in USD per 1M tokens; check current rates before relying on these
    input_cost = (input_tokens / 1_000_000) * 2.50
    output_cost = (output_tokens / 1_000_000) * 10.00

    return input_cost + output_cost
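
To make the arithmetic concrete, the sketch below applies the same GPT-4o rates to a request with 1,200 input tokens and 500 output tokens. It takes token counts directly, so it runs without `tiktoken`. `cost_usd` and `PRICES_PER_MILLION` are hypothetical names, and the prices are simply the ones quoted above.

```python
# (input, output) prices in USD per 1M tokens, taken from the snippet above.
PRICES_PER_MILLION = {"gpt-4o": (2.50, 10.00)}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request given its token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


per_request = cost_usd("gpt-4o", input_tokens=1_200, output_tokens=500)
print(round(per_request, 6))           # 0.008
print(round(per_request * 10_000, 2))  # 80.0
```

At these rates the input costs $0.003 and the output $0.005, so output tokens dominate even though there are fewer of them. Ten thousand such requests come to about $80.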

3. Use Appropriate Models

Task Complexity	Model	Cost/1M tokens
Simple	GPT-4o Mini	$0.15 input
Medium	GPT-4o	$2.50 input
Complex	GPT-4 / Claude Opus	$10-15 input

4. Batch Processing

For non-real-time tasks:

python

import asyncio

async def batch_process(items: list, batch_size: int = 10):
    results = []

    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]

        # Process batch concurrently (process_item is your own coroutine)
        tasks = [process_item(item) for item in batch]
        batch_results = await asyncio.gather(*tasks)
        results.extend(batch_results)

        # Respect rate limits: pause between batches, but not after the last one
        if i + batch_size < len(items):
            await asyncio.sleep(1)

    return results
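
Fixed-size batches wait for the slowest item in each batch before starting the next. An alternative, sketched below, caps the number of in-flight calls with a semaphore so that a new item starts as soon as a slot frees up. `process_with_limit` is a hypothetical helper, and `fake_call` stands in for a real API request.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def process_with_limit(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 5,
) -> list[R]:
    """Run worker over items with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_call(text: str) -> str:
    """Stand-in for an API request."""
    await asyncio.sleep(0.01)
    return text.upper()


print(asyncio.run(process_with_limit(["a", "b", "c"], fake_call, max_concurrency=2)))
# ['A', 'B', 'C']
```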

Security Best Practices

1. Never Expose API Keys

python

# Bad - key in code
client = OpenAI(api_key="sk-abc123...")

# Good - environment variable
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
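
Indexing `os.environ` directly raises a bare `KeyError` when the variable is missing, which can be confusing at deploy time. A small wrapper can fail early with a clearer message. `require_env` below is a hypothetical helper, not part of any SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, or fail with a clear error if unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demo only: set a throwaway variable so the example is self-contained.
os.environ["DEMO_API_KEY"] = "demo-value"
print(require_env("DEMO_API_KEY"))  # demo-value
```

A client could then be created with `OpenAI(api_key=require_env("OPENAI_API_KEY"))`, which surfaces a missing key at startup rather than on the first request.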

2. Validate and Sanitize Input

python

def sanitize_user_input(text: str) -> str:
    # Reject a few well-known prompt injection phrases.
    # A blocklist like this is easy to bypass; treat it as one layer, not a full defense.
    dangerous_patterns = [
        "ignore previous instructions",
        "disregard above",
        "new instructions:"
    ]

    text_lower = text.lower()
    for pattern in dangerous_patterns:
        if pattern in text_lower:
            raise ValueError("Invalid input detected")

    # Limit length
    return text[:10000]

3. Implement Rate Limiting

python

from functools import wraps
import time

def rate_limit(calls_per_minute: int):
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]  # list so the nested function can mutate it

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            wait_time = min_interval - elapsed

            # Block until the minimum interval has passed
            if wait_time > 0:
                time.sleep(wait_time)

            last_called[0] = time.time()
            return func(*args, **kwargs)

        return wrapper
    return decorator

@rate_limit(calls_per_minute=60)
def call_ai(message):
    # Your AI call here
    pass
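
The decorator above throttles one function globally by blocking the caller. In a multi-user service it is often more useful to reject requests per user instead of sleeping. The sliding-window limiter below is a sketch under that assumption. `SlidingWindowLimiter` is a hypothetical class, it keeps state in one process only, and the `clock` parameter is there so the behaviour can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_calls per user within the last window_seconds."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self._calls: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        calls = self._calls[user_id]
        # Drop timestamps that have left the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=2, window_seconds=60)
print([limiter.allow("alice") for _ in range(3)])  # [True, True, False]
print(limiter.allow("bob"))                        # True
```

A request handler would call `limiter.allow(user_id)` first and return an HTTP 429 response when it yields `False`.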

Monitoring and Observability

Track Key Metrics

python

import time
from dataclasses import dataclass
from functools import wraps

@dataclass
class AICallMetrics:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    success: bool
    cost: float

def track_ai_call(func):
    # log_metrics, calculate_cost, and log_error are your own helpers
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()

        try:
            result = func(*args, **kwargs)
            latency = (time.perf_counter() - start) * 1000

            # Log metrics
            log_metrics(AICallMetrics(
                model=kwargs.get('model', 'unknown'),
                input_tokens=result.usage.prompt_tokens,
                output_tokens=result.usage.completion_tokens,
                latency_ms=latency,
                success=True,
                cost=calculate_cost(result.usage)
            ))

            return result

        except Exception as e:
            log_error(e)
            raise

    return wrapper
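
Per-call records become useful once they are aggregated into figures you can alert on. The sketch below shows one possible in-memory sink that tracks call count, error rate, average latency, and total cost. `MetricsAggregator` is a hypothetical class, and a real deployment would more likely export these values to a monitoring system.

```python
from dataclasses import dataclass, field


@dataclass
class MetricsAggregator:
    """Hypothetical in-memory sink for per-call metrics."""
    latencies_ms: list[float] = field(default_factory=list)
    total_cost: float = 0.0
    failures: int = 0

    def record(self, latency_ms: float, cost: float, success: bool) -> None:
        self.latencies_ms.append(latency_ms)
        self.total_cost += cost
        if not success:
            self.failures += 1

    def summary(self) -> dict:
        calls = len(self.latencies_ms)
        if calls == 0:
            # Avoid division by zero before any call is recorded.
            return {"calls": 0, "error_rate": 0.0, "avg_latency_ms": 0.0, "total_cost": 0.0}
        return {
            "calls": calls,
            "error_rate": self.failures / calls,
            "avg_latency_ms": sum(self.latencies_ms) / calls,
            "total_cost": self.total_cost,
        }


aggregator = MetricsAggregator()
aggregator.record(latency_ms=100.0, cost=0.01, success=True)
aggregator.record(latency_ms=300.0, cost=0.02, success=False)
print(aggregator.summary()["avg_latency_ms"])  # 200.0
```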

Deployment Checklist

Before going to production:

API keys stored securely (environment variables, secrets manager)
Rate limiting implemented
Error handling and retries in place
Fallback models configured
Input validation and sanitization
Cost monitoring and alerts set up
Response caching where appropriate
Logging and observability configured
Load testing completed

Conclusion

Building AI-powered applications is straightforward with the right approach:

Start simple - Basic API calls, then add complexity
Use an API gateway - Simplifies multi-model access and reduces costs
Implement resilience - Retries, fallbacks, and error handling
Optimize costs - Caching, model routing, token management
Monitor everything - Track usage, costs, and performance

The AI landscape evolves quickly. Using an API gateway gives you flexibility to adopt new models without code changes.

Need reliable API access for your AI application? Crazyrouter provides a unified endpoint for 300+ models with built-in failover and competitive pricing. Start building today.

URL: https://crazyrouter.com/en/blog/how-to-build-ai-powered-applications-developer-guide
