URL: https://crazyrouter.com/en/blog/how-to-build-ai-powered-applications-developer-guide

How to Build AI-Powered Applications: A Developer's Guide - Crazyrouter


Building applications with AI capabilities has never been more accessible. Whether you're adding a chatbot to your SaaS, building an AI writing assistant, or creating intelligent automation, this guide covers everything from architecture decisions to production deployment.

Understanding AI Application Architecture#

Basic Architecture Pattern#

Most AI applications follow this pattern:

code
User Input → Your Application → AI API → Response Processing → User Output

Key components:

  1. Frontend: User interface for input/output
  2. Backend: Business logic and API orchestration
  3. AI Layer: Model selection and prompt management
  4. Data Layer: Context storage and caching
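
The flow above can be sketched as a small pipeline in which each stage is a plain function. The function names and the stubbed `fake_model` are illustrative assumptions, not part of any SDK; in a real application the model call would wrap an API client.

```python
from typing import Callable

def preprocess(user_input: str) -> str:
    """Application layer: normalize input before it reaches the model."""
    return user_input.strip()

def postprocess(raw_output: str) -> str:
    """Response processing: tidy the model output for display."""
    return raw_output.strip()

def handle_request(user_input: str, call_model: Callable[[str], str]) -> str:
    """User Input -> Application -> AI API -> Response Processing -> User Output."""
    prompt = preprocess(user_input)
    if not prompt:
        return "Please enter a message."
    return postprocess(call_model(prompt))

def fake_model(prompt: str) -> str:
    # Stub standing in for the AI API so the sketch runs without network access
    return f"  Echo: {prompt}\n"

print(handle_request("  Hello!  ", fake_model))
```

Passing the model call in as a parameter keeps the business logic testable without hitting a paid API.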

Choosing Your AI Integration Approach#

| Approach | Complexity | Flexibility | Cost |
|---|---|---|---|
| Direct API calls | Low | Medium | Variable |
| SDK/Library | Low | High | Variable |
| API Gateway | Medium | Very High | Lower |
| Self-hosted models | High | Maximum | Fixed |

Getting Started: Your First AI Feature#

Step 1: Choose Your Model#

For most applications, start with:

| Use Case | Recommended Model | Why |
|---|---|---|
| Chatbot | GPT-4o Mini or Claude Haiku | Fast, cheap, good enough |
| Content generation | Claude Sonnet or GPT-4o | Better quality |
| Code assistance | Claude Sonnet or GPT-4 | Strong reasoning |
| Document analysis | Claude (200K context) | Long context window |

Step 2: Set Up API Access#

Option A: Direct Provider Access

python
# OpenAI
from openai import OpenAI
openai_client = OpenAI(api_key="sk-...")

# Anthropic (separate variable so the two clients do not overwrite each other)
from anthropic import Anthropic
anthropic_client = Anthropic(api_key="sk-ant-...")

Option B: API Gateway (Recommended)

Using a gateway like Crazyrouter simplifies multi-model access:

python
from openai import OpenAI

# Single endpoint for all models
client = OpenAI(
    api_key="your-gateway-key",
    base_url="https://api.crazyrouter.com/v1"
)

# Use any model with the same code
response = client.chat.completions.create(
    model="claude-3-5-sonnet",  # or "gpt-4o", "gemini-pro", etc.
    messages=[{"role": "user", "content": "Hello!"}]
)

Step 3: Basic Implementation#

Here's a minimal chatbot implementation:

python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.crazyrouter.com/v1"
)

def chat(user_message: str, conversation_history: list) -> str:
    conversation_history.append({
        "role": "user",
        "content": user_message
    })

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_history,
        max_tokens=1000
    )

    assistant_message = response.choices[0].message.content
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })

    return assistant_message

Building Production-Ready AI Features#

Prompt Engineering Best Practices#

1. Use System Prompts

python
messages = [
    {
        "role": "system",
        "content": """You are a helpful customer support agent for TechCorp.
- Be friendly and professional
- If you don't know something, say so
- Never make up information about products
- For billing issues, direct users to billing@techcorp.com"""
    },
    {"role": "user", "content": user_input}
]

2. Structure Your Prompts

python
prompt = f"""
Task: Summarize the following article
Format: 3 bullet points, max 20 words each
Tone: Professional

Article:
{article_text}

Summary:
"""
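
A structured prompt like the one above is easier to reuse when it is produced by a small function. The following is a minimal sketch; the function name `build_summary_prompt` and its parameters are assumptions made for illustration.

```python
def build_summary_prompt(article_text: str, bullets: int = 3,
                         max_words: int = 20, tone: str = "Professional") -> str:
    """Assemble the Task / Format / Tone / Article sections into one prompt."""
    return (
        "Task: Summarize the following article\n"
        f"Format: {bullets} bullet points, max {max_words} words each\n"
        f"Tone: {tone}\n\n"
        f"Article:\n{article_text.strip()}\n\n"
        "Summary:\n"
    )

print(build_summary_prompt("AI gateways route requests to many models."))
```

Keeping the format constraints as parameters makes it straightforward to test different output shapes without editing the template by hand.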

3. Use Few-Shot Examples

python
messages = [
    {"role": "system", "content": "You classify customer feedback as positive, negative, or neutral."},
    {"role": "user", "content": "Great product, love it!"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Worst purchase ever."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": actual_feedback}
]
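
When the examples live in a list or a database, the few-shot message array can be generated rather than written out by hand. This is a sketch under that assumption; `build_few_shot_messages` is a hypothetical helper name.

```python
def build_few_shot_messages(system_prompt: str, examples: list, query: str) -> list:
    """Turn (input, label) pairs into alternating user/assistant messages."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, label in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": label})
    # The real input goes last so the model completes the pattern
    messages.append({"role": "user", "content": query})
    return messages

examples = [
    ("Great product, love it!", "positive"),
    ("Worst purchase ever.", "negative"),
]
messages = build_few_shot_messages(
    "You classify customer feedback as positive, negative, or neutral.",
    examples,
    "It arrived on time.",
)
print(len(messages))
```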

Error Handling and Resilience#

Handle API Errors Gracefully

python
import time
from openai import APIError, RateLimitError

# `client` is the OpenAI-compatible client created in Step 2

def call_ai_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages
            )
            return response.choices[0].message.content

        except RateLimitError:
            # RateLimitError subclasses APIError, so it must be caught first
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...

        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)

    raise RuntimeError("Max retries exceeded")
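
One common refinement of exponential backoff is to add random jitter so that many clients hitting a rate limit at once do not all retry at the same moment. The sketch below shows one way to compute such delays; the function name, the `base` and `cap` defaults, and the choice of "full jitter" are assumptions, not something the retry code above requires.

```python
import random
from typing import Optional

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    upper = min(cap, base * (2 ** attempt))
    source = rng if rng is not None else random
    return source.uniform(0.0, upper)

# Show the upper bound growing with each attempt until it reaches the cap
for attempt in range(6):
    print(attempt, round(backoff_delay(attempt), 2))
```

The returned value could replace the fixed `2 ** attempt` wait in a retry loop.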

Implement Fallback Models

python
MODELS = ["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"]

def call_with_fallback(messages):
    for model in MODELS:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Model {model} failed: {e}")
            continue

    raise Exception("All models failed")
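
The fallback logic can be exercised without network access by passing the model call in as a function. The following sketch assumes a hypothetical helper `first_successful` and a simulated outage; it also reports which model answered, which is useful for logging.

```python
from typing import Callable, Sequence, Tuple

def first_successful(models: Sequence[str],
                     call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model, answer) from the first success."""
    errors = {}
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors[model] = repr(exc)
    raise RuntimeError(f"All models failed: {errors}")

def flaky_call(model: str) -> str:
    # Simulated provider: the first model is down, the others respond
    if model == "gpt-4o":
        raise TimeoutError("simulated outage")
    return f"answer from {model}"

print(first_successful(["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"], flaky_call))
```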

Streaming Responses#

For better UX, stream responses instead of waiting:

python
def stream_response(messages):
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

Frontend Integration (JavaScript)

javascript
async function streamChat(message) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    body: JSON.stringify({ message }),
    headers: { 'Content-Type': 'application/json' }
  });

  if (!response.ok) {
    throw new Error(`Chat request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    const text = decoder.decode(value, { stream: true });
    appendToChat(text);
  }
}

Advanced Patterns#

RAG (Retrieval-Augmented Generation)#

Combine AI with your own data:

python
from openai import OpenAI

def answer_with_context(question: str, documents: list) -> str:
    # 1. Find relevant documents (simplified; search_documents is a
    #    placeholder for your own retrieval function)
    relevant_docs = search_documents(question, documents)

    # 2. Build context from the top three matches
    context = "\n\n".join([doc.content for doc in relevant_docs[:3]])

    # 3. Generate answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""Answer questions based on the provided context.
If the answer isn't in the context, say "I don't have that information."

Context:
{context}"""
            },
            {"role": "user", "content": question}
        ]
    )

    return response.choices[0].message.content
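
The retrieval step is left undefined above. As a stand-in, here is a deliberately naive keyword-overlap version of `search_documents`; the `Document` dataclass and the scoring rule are assumptions for illustration, and production systems typically use embedding-based vector search instead.

```python
import re
from dataclasses import dataclass

@dataclass
class Document:
    content: str

def _tokens(text: str) -> set:
    """Lowercase word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search_documents(question: str, documents: list) -> list:
    """Rank documents by how many distinct question words they contain."""
    question_words = _tokens(question)
    scored = []
    for index, doc in enumerate(documents):
        overlap = len(question_words & _tokens(doc.content))
        if overlap > 0:
            scored.append((overlap, index, doc))
    # Highest overlap first; original order breaks ties
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for _, _, doc in scored]

docs = [
    Document("Refunds are processed within 5 days."),
    Document("Our office is in Berlin."),
    Document("Refund requests need an order number."),
]
for doc in search_documents("How are refunds processed?", docs):
    print(doc.content)
```

Exact word matching misses near-variants such as "refund" versus "refunds", which is one reason semantic search is usually preferred.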

Function Calling / Tool Use#

Let AI interact with your systems:

python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Handle tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    if tool_call.function.name == "get_weather":
        # Arguments arrive as a JSON string and must be parsed
        args = json.loads(tool_call.function.arguments)
        weather = get_weather(args["location"])
        # Continue conversation with result...
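
To continue the conversation, the tool's result is sent back as a message with role `tool` that references the id of the tool call. The sketch below builds that message locally; `get_weather` here is a hypothetical stub, and `TOOL_REGISTRY`, `run_tool_call`, and `tool_result_message` are illustrative names.

```python
import json

def get_weather(location: str) -> dict:
    # Hypothetical stand-in for a real weather lookup
    return {"location": location, "temperature_c": 21, "conditions": "clear"}

# Map tool names to Python callables so dispatch is a dictionary lookup
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute a registered tool and return its result as a JSON string."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(func(**json.loads(arguments_json)))

def tool_result_message(tool_call_id: str, name: str, arguments_json: str) -> dict:
    """Build the follow-up message that carries the tool output."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": run_tool_call(name, arguments_json),
    }

print(tool_result_message("call_1", "get_weather", '{"location": "Tokyo"}'))
```

In a live exchange, the assistant message containing `tool_calls` is appended to the message list first, then this tool message, and the list is sent in a second `client.chat.completions.create` call so the model can phrase the final answer.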

Multi-Model Routing#

Use different models for different tasks:

python
def route_to_model(task_type: str, content: str) -> str:
    model_map = {
        "simple_qa": "gpt-4o-mini",
        "complex_reasoning": "gpt-4o",
        "long_document": "claude-3-5-sonnet",
        "code_generation": "claude-3-5-sonnet",
        "creative_writing": "gpt-4o"
    }

    model = model_map.get(task_type, "gpt-4o-mini")

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}]
    )

    return response.choices[0].message.content

Cost Optimization Strategies#

1. Implement Caching#

python
import hashlib
import redis

cache = redis.Redis()

def cached_completion(messages, ttl=3600):
    # Create cache key from messages
    key = hashlib.md5(str(messages).encode()).hexdigest()

    # Check cache (compare with None so an empty cached string still counts as a hit)
    cached = cache.get(key)
    if cached is not None:
        return cached.decode()

    # Call API
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    result = response.choices[0].message.content

    # Store in cache with an expiry of ttl seconds
    cache.setex(key, ttl, result)

    return result
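
For local development or tests where Redis is not available, the same idea can be tried with an in-memory cache. This is a sketch, not a replacement for Redis: `cache_key` and `TTLCache` are hypothetical names, and the key here also includes the model name so that different models do not share cached answers.

```python
import hashlib
import json
import time

def cache_key(model: str, messages: list) -> str:
    """Stable key: JSON with sorted keys, so dict ordering does not matter."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the expired entry
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)

cache = TTLCache()
key = cache_key("gpt-4o-mini", [{"role": "user", "content": "Hello!"}])
cache.set(key, "Hi there!", ttl=3600)
print(cache.get(key))
```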

2. Token Counting#

python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def estimate_cost(input_text: str, output_tokens: int = 500):
    input_tokens = count_tokens(input_text)

    # GPT-4o pricing (USD per 1M tokens)
    input_cost = (input_tokens / 1_000_000) * 2.50
    output_cost = (output_tokens / 1_000_000) * 10.00

    return input_cost + output_cost
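
As a worked example of the arithmetic, the sketch below applies the GPT-4o prices quoted above to token counts that are already known. The 1,200 input tokens, 500 output tokens, and 100,000 requests are assumed figures chosen only to make the numbers easy to follow.

```python
# Prices in USD per 1M tokens, taken from the GPT-4o figures above
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00

def cost_from_tokens(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request given its token counts."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost

per_request = cost_from_tokens(1_200, 500)  # 0.0030 input + 0.0050 output
total = per_request * 100_000               # assumed request volume

print(f"Per request: ${per_request:.4f}")
print(f"Per 100,000 requests: ${total:.2f}")
```

This prints $0.0080 per request and $800.00 for 100,000 requests. Note that output tokens dominate the cost here despite being fewer, because they are priced four times higher.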

3. Use Appropriate Models#

| Task Complexity | Model | Cost/1M tokens |
|---|---|---|
| Simple | GPT-4o Mini | $0.15 input |
| Medium | GPT-4o | $2.50 input |
| Complex | GPT-4 / Claude Opus | $10-15 input |

4. Batch Processing#

For non-real-time tasks:

python
import asyncio

# process_item is your own async function that handles a single item

async def batch_process(items: list, batch_size: int = 10):
    results = []

    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]

        # Process batch concurrently
        tasks = [process_item(item) for item in batch]
        batch_results = await asyncio.gather(*tasks)
        results.extend(batch_results)

        # Respect rate limits
        await asyncio.sleep(1)

    return results
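
An alternative to fixed-size batches is to cap the number of in-flight requests with a semaphore, so a slow item does not hold up the rest of its batch. The sketch below assumes a stubbed `process_item` that doubles its input; `process_all` and `max_concurrency` are illustrative names.

```python
import asyncio

async def process_item(item: int) -> int:
    # Hypothetical stand-in for an AI call
    await asyncio.sleep(0.01)
    return item * 2

async def process_all(items: list, max_concurrency: int = 5) -> list:
    """Run process_item over all items with at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:
            return await process_item(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))

results = asyncio.run(process_all(list(range(10))))
print(results)
```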

Security Best Practices#

1. Never Expose API Keys#

python
# Bad - key in code
client = OpenAI(api_key="sk-abc123...")

# Good - environment variable
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

2. Validate and Sanitize Input#

python
def sanitize_user_input(text: str) -> str:
    # Remove potential prompt injection attempts
    dangerous_patterns = [
        "ignore previous instructions",
        "disregard above",
        "new instructions:"
    ]

    text_lower = text.lower()
    for pattern in dangerous_patterns:
        if pattern in text_lower:
            raise ValueError("Invalid input detected")

    # Limit length
    return text[:10000]

3. Implement Rate Limiting#

python
from functools import wraps
import time

def rate_limit(calls_per_minute: int):
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            wait_time = min_interval - elapsed

            if wait_time > 0:
                time.sleep(wait_time)

            last_called[0] = time.time()
            return func(*args, **kwargs)

        return wrapper
    return decorator

@rate_limit(calls_per_minute=60)
def call_ai(message):
    # Your AI call here
    pass

Monitoring and Observability#

Track Key Metrics#

python
import time
from dataclasses import dataclass
from functools import wraps

@dataclass
class AICallMetrics:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    success: bool
    cost: float

# log_metrics, log_error, and calculate_cost are your own helpers
def track_ai_call(func):
    @wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.time()

        try:
            result = func(*args, **kwargs)
            latency = (time.time() - start) * 1000

            # Log metrics
            log_metrics(AICallMetrics(
                model=kwargs.get('model', 'unknown'),
                input_tokens=result.usage.prompt_tokens,
                output_tokens=result.usage.completion_tokens,
                latency_ms=latency,
                success=True,
                cost=calculate_cost(result.usage)
            ))

            return result

        except Exception as e:
            log_error(e)
            raise

    return wrapper
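
Once per-call metrics are collected, they can be rolled up into a few headline numbers for a dashboard or alert. The following is a sketch under assumed sample data; `summarize` is a hypothetical helper, and the dataclass is repeated so the block runs on its own.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class AICallMetrics:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    success: bool
    cost: float

def summarize(metrics: list) -> dict:
    """Aggregate call count, success rate, average latency, and total cost."""
    successes = [m for m in metrics if m.success]
    return {
        "calls": len(metrics),
        "success_rate": len(successes) / len(metrics) if metrics else 0.0,
        # Latency is averaged over successful calls only
        "avg_latency_ms": mean(m.latency_ms for m in successes) if successes else 0.0,
        "total_cost": sum(m.cost for m in metrics),
    }

# Made-up sample records for illustration
sample = [
    AICallMetrics("gpt-4o-mini", 100, 50, 200.0, True, 0.001),
    AICallMetrics("gpt-4o-mini", 120, 60, 400.0, True, 0.002),
    AICallMetrics("gpt-4o-mini", 0, 0, 1000.0, False, 0.0),
]
print(summarize(sample))
```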

Deployment Checklist#

Before going to production:

  • API keys stored securely (environment variables, secrets manager)
  • Rate limiting implemented
  • Error handling and retries in place
  • Fallback models configured
  • Input validation and sanitization
  • Cost monitoring and alerts set up
  • Response caching where appropriate
  • Logging and observability configured
  • Load testing completed

Conclusion#

Building AI-powered applications is straightforward with the right approach:

  1. Start simple - Basic API calls, then add complexity
  2. Use an API gateway - Simplifies multi-model access and reduces costs
  3. Implement resilience - Retries, fallbacks, and error handling
  4. Optimize costs - Caching, model routing, token management
  5. Monitor everything - Track usage, costs, and performance

The AI landscape evolves quickly. Routing requests through an API gateway lets you adopt new models by changing the model name rather than rewriting integration code.


Need reliable API access for your AI application? Crazyrouter provides a unified endpoint for 300+ models with built-in failover and competitive pricing. Start building today.
