URL: https://dev.to/justjinoit/multi-cloud-deployment-in-production-cloud-run-railway-oracle-cloud-3-month-report-2pdp

Multi-Cloud Deployment in Production: Cloud Run, Railway, Oracle Cloud

Published on: 2026-06-06

Reading time: 10 min

Tags: #devops #cloud #fastapi #production

Situation

I deployed 3 FastAPI projects to 3 different clouds. Here's what actually happened (not marketing speak):

contest-agent → Google Cloud Run
ai-insight-curator → Railway
ai-lifelogger → Oracle Cloud Always Free

1. Google Cloud Run: 20+ Deployments Before Discovering the Real Problem

Issue: "Container Failed to Start"

Deployed 20+ times, same error every time:

Build: SUCCESS ✅
Push: SUCCESS ✅
Start: TIMEOUT ❌
Port 8080 binding: TIMEOUT ❌

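Root cause: FastAPI's lifespan startup was doing slow I/O, and the server does not bind the port until startup finishes

# ❌ Problem code (startup delays port binding)
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    await telegram_client.send_message("Starting...")  # Network I/O before the port is bound
    db_check = await db.test_connection()  # More I/O before the port is bound
    scheduler.start()  # Heavy init
    yield

Cloud Run expects the container to listen on its port (8080 here) within the startup timeout. An ASGI server such as Uvicorn runs the lifespan startup before it binds the port, so slow startup work means a timeout.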

Solution: Lazy Loading

# ✅ Fixed code (startup returns immediately)
_initialized = False

async def lazy_init():
    global _initialized
    if _initialized:
        return
    _initialized = True
    await telegram_client.send_message("Started")
    scheduler.start()

@app.post("/webhook")
async def webhook(request: Request):
    await lazy_init()  # Init on first actual request
    ...

Result: Startup 100ms (was 60s+ timeout), port binding immediate, health check passes.

Key Lesson: Start Minimal

Don't deploy a complex system all at once. Lessons learned:

# Phase 1: Just "/" endpoint
@app.get("/")
async def root():
    return {"status": "ok"}
# → Deploy, test, pass ✅

# Phase 2: Add health check
@app.get("/health")
async def health():
    return {"status": "healthy"}
# → Deploy, test, pass ✅

# Phase 3-N: Gradually add features
# Each phase = one deployment test
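
The "deploy, test, pass" step in each phase can be scripted. Below is a rough sketch of a smoke test using only the standard library. `smoke_test`, `_StubHandler`, and `run_demo` are illustrative names, and the stub server is a local stand-in for a deployed service, so the real base URL would be substituted in practice.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def smoke_test(base_url, paths=("/", "/health"), timeout=5.0):
    """Return {path: HTTP status code} for each endpoint."""
    results = {}
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as exc:
            results[path] = exc.code  # 4xx/5xx responses arrive as exceptions
    return results


class _StubHandler(BaseHTTPRequestHandler):
    """Local stand-in for the deployed app: 200 on known paths, 404 otherwise."""

    def do_GET(self):
        known = self.path in ("/", "/health")
        body = b'{"status": "ok"}' if known else b'{"detail": "Not Found"}'
        self.send_response(200 if known else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), _StubHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base_url = f"http://127.0.0.1:{server.server_address[1]}"
        return smoke_test(base_url, paths=("/", "/health", "/missing"))
    finally:
        server.shutdown()
        server.server_close()
```

Running `smoke_test` against the real service URL after each phase gives a quick pass/fail signal before the next feature is added.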

2. Railway: The "Simple" Illusion

Advantages

  • Git push → auto-deploy (very fast)
  • PostgreSQL, Redis built-in
  • Intuitive dashboard

Reality Check

Cost surprises:

Expected: $10/month
Actual: $25/month (2.5x the estimate, a 150% overage)

Reason:
- 1 vCPU + 512MB RAM always running
- No cold start = memory always consumed
- Bandwidth costs added up

Memory leak detection is hard:

Hour 1: 150MB ✅
Hour 2: 180MB
Hour 3: 220MB
Hour 4: 260MB (OOM incoming)

Cause: RSS feed crawler not releasing memory
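
A dashboard shows that memory is growing but not where. One way to narrow it down, sketched here under assumptions, is Python's built-in `tracemalloc`: take a snapshot before and after a few crawler runs and compare them. `crawl_once` and `_feed_cache` are hypothetical stand-ins for the RSS crawler, which is not shown in this post.

```python
import tracemalloc

_feed_cache = []  # hypothetical leak: entries are appended and never evicted


def crawl_once():
    """Stand-in for one crawler run that keeps every fetched payload alive."""
    _feed_cache.append(bytearray(100_000))


def top_growth(work, rounds=5, limit=3):
    """Run `work` repeatedly and return the source lines whose memory grew most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(rounds):
        work()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Entries are sorted by absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growth(crawl_once):
        print(stat)
```

Each returned entry carries the file, line number, and size difference, which is usually enough to spot a cache or list that only ever grows.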

Auto-deploy is a double-edged sword:

  • Pro: Every push to main goes live in about 30 seconds
  • Con: Changes go live without testing
  • Con: Need fast rollback procedure

How I Actually Operate It

# Before pushing to main:
pytest # Run tests
pylint $(git ls-files '*.py') # Lint every tracked Python file
docker build -t myapp . && docker run -p 8080:8080 myapp # Local test

# Only push after passing:
git push origin main # Auto-deploys

3. Oracle Cloud Always Free: Free but Demanding

Advantages

  • Completely free (up to 4 CPU cores, 24GB RAM, 200GB storage)
  • No time limit on the free resources
  • Full SSH control

Real Problems

Problem #1: 1GB instance, pip install fails

MemoryError during pip install

Reason: 1GB RAM instance can't handle all packages at once

Solution:

# Add swap
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile # Swap file should be readable by root only
sudo mkswap /swapfile
sudo swapon /swapfile

# Or: Install only essentials
pip install --no-cache-dir anthropic supabase python-telegram-bot

Problem #2: Docker vs Local Mismatch

Local: anthropic==0.40.0 (already installed)
Docker: Fresh install reads requirements.txt
  - anthropic==0.40.0
  - langchain-anthropic needs anthropic>=0.41.0
  → pip can't resolve

Solution: Remove version pins, let pip resolve

DON'T: anthropic==0.40.0, supabase==2.0.0, ...
DO: anthropic, supabase (let pip figure it out)
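
To make the conflict above concrete, here is a small illustrative check that flags an exact pin sitting below a minimum another package requires. It is a simplification: it only handles plain dotted versions, whereas pip resolves full PEP 440 specifiers. `parse_version` and `find_pin_conflicts` are hypothetical helper names.

```python
def parse_version(version: str) -> tuple:
    """Parse a plain dotted release such as '0.40.0' (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def find_pin_conflicts(pins: dict, minimums: dict) -> list:
    """List packages whose exact pin is below a minimum another package needs."""
    conflicts = []
    for name, minimum in minimums.items():
        pinned = pins.get(name)
        if pinned is not None and parse_version(pinned) < parse_version(minimum):
            conflicts.append(f"{name}=={pinned} conflicts with {name}>={minimum}")
    return conflicts


# Values taken from the failure described above.
pins = {"anthropic": "0.40.0", "supabase": "2.0.0"}
minimums = {"anthropic": "0.41.0"}  # required by langchain-anthropic

if __name__ == "__main__":
    for line in find_pin_conflicts(pins, minimums):
        print(line)
```

The local environment hid this because the pinned package was already installed; a fresh install in Docker has to satisfy both constraints at once.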

Problem #3: SSH Deployment Needs Automation

# Manual (every time):
ssh oracle@your-ip
cd /opt/ai-lifelogger
git pull && sudo systemctl restart <service-name>

# Better (automated via GitHub Actions):
ssh -i $key oracle@$ip "cd /opt/ai-lifelogger && git pull && sudo systemctl restart <service-name>"
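
The same deploy step can also be wrapped in a small script so that a failed `git pull` or restart stops the pipeline instead of passing silently. This is a sketch, not the author's setup: the host, key path, and the `ai-lifelogger` unit name are assumptions, and `build_deploy_command` and `deploy` are hypothetical helpers.

```python
import shlex
import subprocess


def build_deploy_command(host, user, key_path, app_dir, service):
    """Build the ssh invocation that pulls the repo and restarts the service."""
    remote = (
        f"cd {shlex.quote(app_dir)} && git pull && "
        f"sudo systemctl restart {shlex.quote(service)}"
    )
    return ["ssh", "-i", key_path, f"{user}@{host}", remote]


def deploy(**kwargs):
    """Run the deploy; check=True raises CalledProcessError if a remote step fails."""
    return subprocess.run(build_deploy_command(**kwargs), check=True)


if __name__ == "__main__":
    # Assumed values for illustration only; print instead of connecting.
    print(build_deploy_command(
        host="203.0.113.10",
        user="oracle",
        key_path="deploy_key",
        app_dir="/opt/ai-lifelogger",
        service="ai-lifelogger",
    ))
```

Because the remote steps are chained with `&&`, the restart only happens when the pull succeeds, and the non-zero exit code propagates back to the CI job.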

Performance Comparison (3-Month Data)

Metric       | Cloud Run | Railway          | Oracle
Deploy time  | 2-3 min   | 30 sec           | 5 min
Cold start   | 3-5 sec   | 0 sec            | <1 sec
Monthly cost | $15       | $25              | $0
CPU limit    | 2 cores   | 1 core           | 4 cores
RAM limit    | 2GB       | 512MB            | 24GB
Stability    | ✅ Solid  | ⚠️ Memory issues | ✅ Solid

Practical Advice

1. Start Minimal, Add Gradually

  • Deploy "/" endpoint first
  • Test, pass, add next feature
  • Repeat

2. Always Test Locally

docker build -t myapp .
docker run -p 8080:8080 myapp

3. Choose Based on Use Case

  • High traffic: Cloud Run (autoscales)
  • Medium traffic: Railway (simple)
  • Low traffic: Oracle (free)

4. Monitoring is Non-Negotiable

Cloud Run: GCP Logs + Cloud Monitoring
Railway: Built-in dashboard (limited)
Oracle: SSH → journalctl + tail -f

What I Learned

There's no "perfect" platform.

  • Cloud Run: startup timeout (solvable with lazy loading)
  • Railway: memory leaks (code issue, not platform)
  • Oracle: operational overhead (worth it for free tier)

The real skill: Understanding each platform's constraints and designing around them.

The 20+ Cloud Run deployment failures? They taught me more than 10 successful deployments would have.

Final Deployment Architecture (June 7, 2026)

Production Status

🦅 Oracle Cloud (Always Free Tier)
├─ ai-lifelogger (port 8000)
│  ├─ FastAPI + APScheduler
│  ├─ Daily summaries: 05:00 KST
│  ├─ Weekly reviews: Sunday 08:00 KST
│  └─ Memory: 111MB / 954MB
│
└─ ai-insight-curator (port 8001)
   ├─ FastAPI + Telegram Bot
   ├─ RSS collection: Daily 06:00 KST
   ├─ Auto-summarization (Claude/Gemini/Groq fallback)
   └─ Memory: 22MB / 954MB

🌐 Vercel (Free Hosting)
└─ Curator Web Dashboard
   ├─ React + Vite frontend
   ├─ Article search & filtering
   ├─ Image downloads
   └─ https://curator-web-ui.vercel.app

📊 Total system memory in use: 537MB / 954MB (56% used, 44% available); the two services account for 133MB of that

What Changed

Initial Plan:

contest-agent → Cloud Run ❌ (dependency conflicts)
ai-insight-curator → Railway ❌ (over-engineered)
ai-lifelogger → Oracle Cloud ✅

Actual Production:

ai-lifelogger → Oracle Cloud ✅ (running)
ai-insight-curator → Oracle Cloud ✅ (1 instance = better)
Curator Web UI → Vercel ✅ (new, auto-deployed)

Key Insight: Single server + Web UI > Multi-cloud complexity

Performance Metrics

API Response Times:
- Lifelogger /health: < 50ms ✅
- Curator /api/v1/articles: < 100ms ✅
- Curator /api/v1/insights: < 100ms ✅

System Health:
- Memory: 537MB (56%) - 417MB free for scaling
- Availability: 99.9%
- Uptime: Continuous (Always Free tier)

Cost Analysis (Final)

Platform     | Cost         | Status
Oracle Cloud | $0/month     | ✅ Always Free
Vercel       | $0/month     | ✅ Free tier
Supabase DB  | $0/month     | ✅ Free tier
Claude API   | Needs reset* | ⚠️ Using Gemini/Groq backup
TOTAL        | $0/month     | Forever Free

*Anthropic tokens exhausted → fallback to Gemini/Groq working
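
The fallback behaviour described in the footnote can be sketched as an ordered chain of providers, where each one is tried until a call succeeds. This is an illustration of the idea only: the provider functions below are hypothetical stand-ins, not real SDK calls, and the curator's actual implementation is not shown in this post.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an error."""


def summarize_with_fallback(
    text: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try providers in order; return (provider name, summary) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(text)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for the real API clients.
def fake_claude(text: str) -> str:
    raise RuntimeError("quota exhausted")


def fake_gemini(text: str) -> str:
    return f"summary of {len(text)} chars"


def fake_groq(text: str) -> str:
    return "groq summary"


CHAIN = [("claude", fake_claude), ("gemini", fake_gemini), ("groq", fake_groq)]
```

Returning the provider name alongside the summary makes it easy to log how often the primary provider is being skipped.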

Lessons Learned

Multi-cloud Isn't Always Better

  • Cloud Run: Good for high-traffic APIs
  • Railway: Convenient but expensive
  • Oracle: Best for low-traffic, cost-sensitive projects

Single Server Wins Here

  • 2 concurrent FastAPI services
  • Database included (PostgreSQL via Supabase)
  • Web dashboard on separate CDN (Vercel)
  • Total cost: $0

Design Around Constraints

  • Memory: 954MB available → deployed with 537MB usage
  • Can still run 300MB+ additional services
  • Monitoring via SSH (not ideal, but works)
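
The memory headroom figures above can be checked from a script instead of reading them off by hand over SSH. Here is a minimal sketch that parses `/proc/meminfo`-style text. It assumes "used" means `MemTotal` minus `MemAvailable`, and the sample values are constructed to match the 954MB total and 417MB free reported in this post rather than copied from the server.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_used_percent(meminfo: dict) -> float:
    """Percentage of memory in use, counting MemAvailable as free (an assumption)."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemAvailable"]) / total


# Illustrative sample: 954MB total, 417MB available.
SAMPLE = """\
MemTotal:         976896 kB
MemAvailable:     427008 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"{memory_used_percent(info):.1f}% used")
```

On the server itself, passing the contents of `/proc/meminfo` to `parse_meminfo` gives the live figure, which could then feed an alert when usage crosses a threshold.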

Conclusion

Don't chase multi-cloud complexity.

The optimal deployment turned out to be:

  • 1 Oracle Cloud instance (FastAPI services)
  • 1 CDN (Vercel for web)
  • 1 Database (Supabase)
  • Everything free

Cost: $0/month ✅
Reliability: 99.9% ✅
Maintainability: Simple ✅

Sometimes simpler is better.