Node.js vs Bun vs Go: A Multi-Layer HTTP Benchmark
Premise
I came across a video discussing Bun's new native image support claiming "blazing fast" performance. As a software developer with 4+ years building Node services, I wanted to quantify Bun's actual performance characteristics compared to Node.js in realistic scenarios.
I included Go as a baseline representing compiled, systems-level performance to contextualize the JavaScript runtime results.
What this benchmark tests: Raw HTTP throughput serving static JSON responses
What this benchmark does NOT test: Database I/O, JSON parsing, real application logic, memory pressure, error handling, or long-tail latency under sustained load
Methodology: From Localhost to Cloud
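Localhost benchmarks often produce misleading results because traffic never leaves the machine: they measure in-memory copies and syscall overhead rather than real-world network and system constraints. To uncover performance across different bottleneck layers, I tested three environments (the cloud environment was run twice, with 1 core and with 4 cores):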
Test Phases
| Phase | Environment | What It Reveals |
|---|---|---|
| 1. Localhost | Loopback interface | Pure event loop overhead |
| 2. LAN over Tailscale | WiFi network | Network I/O constraints |
| 3. Cloud Datacenter | DigitalOcean droplets | Removes local hardware limits |
Test Configuration
Workload: GET /json → {"message": "Hello from [runtime]"}
Hardware:
- Local: Windows dev machine, Tenda U10 USB WiFi adapter (WiFi 3)
- Cloud: DigitalOcean shared droplets (burstable vCPUs), 10 Gbps datacenter network
Load generator: wrk -t2 -c200 -d30s
Important: Each test was run once. For production decisions, you'd want 5+ runs with statistical analysis (median, stddev, confidence intervals).
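As a rough sketch of what that analysis could look like, the helper below summarizes repeated runs of one configuration. The name `summarizeRuns` and the sample values are hypothetical and purely illustrative; they are not measurements from this benchmark.

```javascript
// Summarize repeated benchmark runs: median, mean, and sample standard deviation.
function summarizeRuns(rpsValues) {
  if (rpsValues.length < 2) {
    throw new Error("need at least two runs to estimate spread");
  }
  const sorted = [...rpsValues].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  // Sample variance (n - 1 in the denominator), since the runs are a sample.
  const variance =
    sorted.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (sorted.length - 1);
  return { median, mean, stddev: Math.sqrt(variance) };
}

// Hypothetical values standing in for five runs of one configuration.
const exampleRuns = [100, 104, 98, 102, 96];
console.log(summarizeRuns(exampleRuns));
```

Comparing two runtimes then means comparing medians together with their spread, rather than two single numbers.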
Phase 1: Localhost Baseline
Testing over the loopback interface to measure pure event loop and syscall overhead without network constraints.
Results: Local Memory Performance
| Configuration | Node.js | Bun | Go |
|---|---|---|---|
| 1 CPU Core | ~14,000 RPS | ~28,000 RPS | ~29,000 RPS |
| 4 CPU Cores (Single Process) | ~16,000 RPS | ~30,000 RPS | ~115,000 RPS |
| 4 CPU Cores (Multi-Process) | ~110,000 RPS | ~170,000 RPS | N/A |
Analysis
Single-threaded bottleneck: Node and Bun execute JavaScript on a single thread. Without process clustering, they max out one CPU core even with --cpus="4", leaving 3 cores idle. Go's M:N scheduler automatically spreads goroutines across all available cores in a single process.
Multi-process scaling: Once clustered (Node's cluster module, Bun with reusePort: true spawned 4 times), both JavaScript runtimes showed strong scaling. Bun's server (Zig-based, running on JavaScriptCore) delivered ~55% higher throughput than Node's (libuv event loop, running on V8).
What this tells us: For CPU-bound request handling over the loopback interface, Bun has measurably lower per-request overhead than Node. Go's native multi-core scheduling eliminates the need for manual clustering.
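One way to read the localhost table is as a scaling factor: the best 4-core throughput divided by the 1-core throughput. A small sketch using the rounded figures from the table; the object and function names are only illustrative.

```javascript
// Approximate RPS figures copied from the localhost results table.
// fourCore is the best 4-core configuration for each runtime
// (multi-process for Node and Bun, single process for Go).
const localhost = {
  node: { oneCore: 14000, fourCore: 110000 },
  bun: { oneCore: 28000, fourCore: 170000 },
  go: { oneCore: 29000, fourCore: 115000 },
};

// Ratio of 4-core throughput to 1-core throughput.
function scalingFactor({ oneCore, fourCore }) {
  return fourCore / oneCore;
}

for (const [runtime, rps] of Object.entries(localhost)) {
  console.log(`${runtime}: ${scalingFactor(rps).toFixed(2)}x`);
}
```

A factor above the core count, which the rounded Node and Bun figures give here, may hint that the 1-core runs were limited by something other than the runtime. This benchmark did not investigate that.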
Phase 2: Network-Constrained Reality (Tailscale over WiFi)
Requests now traverse a physical network: MacBook → USB WiFi adapter → Tailscale WireGuard VPN → Windows dev machine.
Results: Network-Bound I/O (4 Cores, Clustered)
| Metric | Node.js | Bun | Go |
|---|---|---|---|
| Throughput | 7,954 RPS | 12,519 RPS | 12,873 RPS |
| Avg Latency | 26.79 ms | 16.49 ms | 15.69 ms |
| Max Latency | 864.53 ms | 163.24 ms | 152.21 ms |
| Bandwidth | 1.62 MB/s | 1.76 MB/s | 1.67 MB/s |
Analysis
Network becomes the bottleneck: All three runtimes dropped from 110k-170k RPS on localhost (4 cores) to 8k-13k RPS. The WiFi 3 USB adapter (theoretical max ~54 Mbps, real-world much lower) became the limiting factor.
Node's outlier spike: The 864 ms max latency suggests either:
- A garbage collection pause under network pressure
- IPC coordination delays between the cluster primary and its workers when packets arrive in bursts
A single max-latency value cannot tell these apart; this should have been investigated with GC tracing (for example --trace-gc) and p99 analysis.
What this tells us: This phase primarily measured my WiFi adapter's limitations, not runtime performance. However, it does show that once network I/O becomes the constraint, runtime choice matters less than network hardware quality.
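To put the bandwidth row in context, the sketch below converts it to megabits per second and to bytes per response. It assumes 1 MB = 1,000,000 bytes and that the reported figure counts response bytes only; both are assumptions about how the load generator reports transfer, and the names are illustrative.

```javascript
// Throughput and transfer rate copied from the Phase 2 results table.
const phase2 = {
  node: { rps: 7954, mbPerSec: 1.62 },
  bun: { rps: 12519, mbPerSec: 1.76 },
  go: { rps: 12873, mbPerSec: 1.67 },
};

// Assumption: decimal megabytes. Binary megabytes would shift results by ~5%.
const BYTES_PER_MB = 1_000_000;

// Convert a table row into link usage and average response size on the wire.
function linkUsage({ rps, mbPerSec }) {
  const bytesPerSec = mbPerSec * BYTES_PER_MB;
  return {
    megabitsPerSec: (bytesPerSec * 8) / 1_000_000,
    bytesPerResponse: bytesPerSec / rps,
  };
}

for (const [runtime, row] of Object.entries(phase2)) {
  const { megabitsPerSec, bytesPerResponse } = linkUsage(row);
  console.log(
    `${runtime}: ${megabitsPerSec.toFixed(2)} Mbps, ${bytesPerResponse.toFixed(0)} B/response`
  );
}
```

Under those assumptions the response traffic alone is roughly 13 to 14 Mbps per runtime. Request bytes, TCP/IP headers, and WireGuard encapsulation come on top of that and are not counted here.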
Phase 3: Cloud Infrastructure (1 Core)
I moved to DigitalOcean droplets to remove local hardware constraints. The target and the load generator ran in the same datacenter.
Docker Configuration
docker run --rm --cpus="1" -m="512m" -p 3000:3000 [image]
Note: --cpus="1" uses CFS CPU quotas, not core pinning. The container's threads can still migrate between physical cores, which costs cache locality. I should have used --cpuset-cpus="0" for true single-core isolation.
Results: Cloud Single-Core
| Metric | Node.js | Go | Bun |
|---|---|---|---|
| Throughput | 11,705 RPS | 13,935 RPS | 25,444 RPS |
| Avg Latency | 29.13 ms | 22.83 ms | 7.89 ms |
| Max Latency | 2,000.00 ms | 135.97 ms | 93.27 ms |
| Failed Requests | 60 (timeout) | 0 | 0 |
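Analysis
Node's timeout failures: The 2,000 ms max latency matches wrk's default 2-second timeout, so the 60 failed requests waited at least that long and their true latency is unknown. GC pauses are one possible cause, but a single run cannot confirm it. This test should have been re-run with GC tracing and Node tuning flags (--max-old-space-size, --optimize-for-size) to determine whether the stalls are fundamental or tunable.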
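One way to check the GC hypothesis directly is to record garbage collection durations inside each Node worker during a run. The sketch below uses the `gc` entry type of Node's `perf_hooks` module; the function name `trackGcDurations` and the shape of its return value are hypothetical, and this code was not part of the benchmark.

```javascript
const { PerformanceObserver } = require("node:perf_hooks");

// Record the duration of every garbage collection event in this process.
function trackGcDurations() {
  const durationsMs = [];
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      durationsMs.push(entry.duration); // milliseconds
    }
  });
  observer.observe({ entryTypes: ["gc"] });

  return {
    // Snapshot of what has been observed so far.
    stats() {
      const maxMs = durationsMs.reduce((max, d) => Math.max(max, d), 0);
      return { count: durationsMs.length, maxMs };
    },
    // Stop observing.
    stop() {
      observer.disconnect();
    },
  };
}

// Example wiring inside a worker (left commented out so nothing keeps running):
// const gc = trackGcDurations();
// setInterval(() => console.log(process.pid, gc.stats()), 5000);
```

If the longest recorded GC event stays in the low milliseconds while the load generator still reports 2-second timeouts, GC is unlikely to be the main cause and other explanations (such as CFS throttling) deserve a look.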
Bun's single-core dominance: Nearly double Go's throughput on a single core (25,444 vs 13,935 RPS, about 1.8x) is impressive, but remember this is for a trivial JSON response of roughly 40 bytes. Real applications doing actual work may show different patterns.
Phase 4: Cloud Multi-Core (4 Cores)
Full resource allocation with clustering enabled for JavaScript runtimes.
Configuration
docker run --rm --cpus="4" -m="512m" -p 3000:3000 [image]
The Load Generator
# Attacker VM - wrk in Alpine container
docker run --rm alpine sh -c "apk add --no-cache wrk && \
wrk -t2 -c200 -d30s http://159.65.6.89:3000/json"
Results: Cloud 4-Core Maximum Throughput
| Metric | Node.js | Go | Bun |
|---|---|---|---|
| Throughput | 31,025 RPS | 37,617 RPS | 53,446 RPS |
| Total Requests (30s) | 933,074 | 1,130,171 | 1,605,818 |
| Avg Latency | 8.62 ms | 5.79 ms | 4.04 ms |
| Max Latency | 641.25 ms | 218.91 ms | 76.54 ms |
| CPU Usage | 400%+ | 340% | 383% |
CPU Efficiency Deep Dive
The raw CPU % numbers are misleading. What matters is CPU cost per request:
Efficiency Ranking (CPU % divided by RPS, lower is better):
1. Bun: 0.0072% CPU per RPS
2. Go: 0.0090% CPU per RPS
3. Node: 0.0129% CPU per RPS
Calculation:
- Node: 400% / 31,025 RPS = 0.0129% per RPS
- Go: 340% / 37,617 RPS = 0.0090% per RPS
- Bun: 383% / 53,446 RPS = 0.0072% per RPS
Since 100% CPU is one core-second of work per second, these figures correspond to roughly 129 µs (Node), 90 µs (Go), and 72 µs (Bun) of CPU time per request.
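The same division can be expressed as CPU time per request, which is easier to reason about than a percentage. A minimal sketch using the table values; the names are illustrative, and Node's "400%+" is treated as exactly 400, so its figure is a lower bound.

```javascript
// CPU usage (percent of one core) and throughput from the 4-core cloud table.
const cloud4Core = {
  node: { cpuPercent: 400, rps: 31025 }, // table says "400%+", so a lower bound
  go: { cpuPercent: 340, rps: 37617 },
  bun: { cpuPercent: 383, rps: 53446 },
};

// 100% CPU means one core-second of work per wall-clock second, so
// (cpuPercent / 100) / rps is core-seconds spent per request.
function cpuMicrosPerRequest({ cpuPercent, rps }) {
  return (cpuPercent / 100 / rps) * 1e6;
}

for (const [runtime, row] of Object.entries(cloud4Core)) {
  console.log(`${runtime}: ${cpuMicrosPerRequest(row).toFixed(1)} µs CPU per request`);
}
```

The ranking is unchanged; only the unit differs.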
Key Insight: Bun is genuinely the most efficient, but Node's higher absolute CPU usage just means all workers are busy, which is what you want under load.
Go's "idle" CPU: The 340% (leaving 60% of the 4-core budget unused) might indicate:
- GOMAXPROCS not matching the container's CPU allocation
- Network socket polling leaving CPU headroom
- Or simply more efficient syscall handling
Code Architecture Comparison
Critical Difference: The Go Code is Heavily Optimized
The Node and Bun code use default patterns, but the Go implementation uses production micro-optimizations:
Go optimizations applied:
var jsonResponse = []byte(`{"message":"Hello from Go!"}`) // Pre-rendered
w.Write(jsonResponse) // Direct byte write, no JSON encoding
Equivalent fair comparison would be:
// Fair comparison - same as Node/Bun pattern
json.NewEncoder(w).Encode(map[string]string{"message": "Hello from Go!"})
Impact: Switching to the encoder version makes Go ~15-20% slower but tests equivalent functionality. As run, the benchmark favors Go's implementation.
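The other way to level the comparison is to give the JavaScript servers the same pre-rendered body. Below is a minimal Node sketch of both variants behind one switch; `makeJsonHandler` and the `prerender` option are hypothetical names, not code from the benchmark.

```javascript
// Build a request handler that either serializes on every request (the stock
// pattern) or reuses one pre-rendered Buffer (mirroring the Go optimization).
function makeJsonHandler({ prerender }) {
  const payload = { message: "Hello from Node!" };
  const cached = Buffer.from(JSON.stringify(payload));

  return function handler(req, res) {
    if (req.method === "GET" && req.url === "/json") {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(prerender ? cached : JSON.stringify(payload));
    } else {
      res.writeHead(404);
      res.end();
    }
  };
}

// Example wiring (left commented out so the snippet does not open a port):
// require("node:http").createServer(makeJsonHandler({ prerender: true })).listen(3000);
```

Running each runtime in both modes would separate the cost of JSON serialization from the cost of the HTTP stack itself.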
Bun "Clustering" Isn't Actually Clustering
The Bun code sets reusePort: true in a single process. That option sets SO_REUSEPORT, which lets several processes bind the same port so the kernel can balance connections between them; on its own it doesn't spawn multiple processes the way Node's cluster module does.
For true architectural equivalence:
// This is what would match Node's architecture
import { spawn } from "bun";
for (let i = 0; i < 4; i++) {
  spawn(["bun", "server.js"]);
}
Note: The current test compares single-process Bun vs multi-process Node, which actually makes Bun's performance even more impressive, but it should be disclosed.
What This Benchmark Actually Tells Us
Valid Conclusions
1. Bun's event loop has lower per-request overhead than Node for simple HTTP responses
2. Bun scales efficiently to multiple cores via kernel-level socket distribution
3. Go provides predictable performance with excellent CPU efficiency
4. Network hardware matters more than runtime choice once you hit I/O limits
5. Node's clustered setup showed the worst tail latency under high load; IPC overhead is one possible cause
Invalid Conclusions
- Speed ≠ ecosystem maturity, debugger support, APM tooling
- This tests static JSON echo; database-heavy apps show different patterns
- Developer productivity, ecosystem, and deployment complexity matter
- Real apps do parsing, validation, DB queries, business logic
Limitations & What's Missing
Not Tested
- Realistic payloads: 10KB+ JSON parsing and validation
- Database I/O: Connection pooling, query performance
- Memory pressure: Behavior at 80%+ RAM utilization
- Sustained load: 24-hour endurance, memory leaks
- Error handling: Behavior under packet loss, slow clients
- Cold starts: Container spin-up time (critical for serverless)
- Long-tail latency: p95, p99, p99.9 percentiles over hours
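For the long-tail item, percentiles are straightforward to compute once per-request latencies are recorded. The sketch below uses the nearest-rank method; the function name and the sample data are illustrative, not benchmark output.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep integer percentiles exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical latencies in milliseconds: 1, 2, ..., 100.
const latenciesMs = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(percentile(latenciesMs, 50), percentile(latenciesMs, 99));
```

wrk can also print a latency distribution itself when started with the --latency flag, which is the quickest way to get p50 through p99 for a single run.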
Methodological Improvements Needed
- Statistical rigor: 5+ runs per config with statistical significance testing
- Proper CPU pinning: Use --cpuset-cpus instead of --cpus
- GC tuning for Node: Test with optimized V8 flags
- Fair code comparison: Either optimize all three or use stock patterns for all
- Proper clustering for Bun: Multi-process architecture to match Node
Production Recommendations
Choose based on your actual constraints:
Use Bun if:
- You have existing Node.js code and want drop-in performance gains
- Your workload is I/O-heavy API routing/proxying
- You're comfortable with a newer ecosystem (risk tolerance)
- Caveat: you can handle potential edge cases in package compatibility
Use Go if:
- You need predictable resource consumption for Kubernetes limits
- Your team values type safety and compile-time checks
- You're building infrastructure/platform services
- You need maximum efficiency per CPU core
- Long-term stability and tooling maturity matter
Use Node.js if:
- You have existing Node infrastructure and expertise
- Your bottleneck is database/external services (not event loop)
- Ecosystem maturity and package availability are critical
- You need battle-tested observability/APM tooling
The Real Takeaway
For a 40-byte JSON echo server, Bun is measurably faster than Node.js.
But real applications aren't JSON echo servers. Your actual bottlenecks are probably:
- Database query time
- External API latency
- Business logic complexity
- Network infrastructure
Profile your real workload before choosing a runtime based on microbenchmarks.
That said, Bun's performance characteristics are impressive and worth evaluating for I/O-heavy services where event loop overhead matters.
Full Code Listings
Node.js (Clustered)
const cluster = require('cluster');
const http = require('http');

const numCPUs = 4; // one worker per core allotted to the container

// cluster.isPrimary replaces the deprecated cluster.isMaster (Node 16+)
if (cluster.isPrimary) {
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  // Replace any worker that exits so the pool stays at numCPUs
  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died`);
    cluster.fork();
  });
} else {
  // Each worker runs its own HTTP server on the shared port
  http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/json') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ message: "Hello from Clustered Node!" }));
    } else {
      res.writeHead(404);
      res.end();
    }
  }).listen(3000);
}
Bun (Single Process with reusePort)
Bun.serve({
  port: 3000,
  reusePort: true, // Kernel-level socket load balancing
  fetch(request) {
    const url = new URL(request.url);
    if (request.method === 'GET' && url.pathname === '/json') {
      return new Response(
        JSON.stringify({ message: "Hello from Bun!" }),
        { headers: { 'Content-Type': 'application/json' } }
      );
    }
    return new Response("Not Found", { status: 404 });
  },
});
Go (Optimized - Not Fair Comparison)
Note: This version pre-renders the response and skips JSON encoding
package main

import (
	"log"
	"net/http"
)

// Pre-rendered response eliminates JSON encoding overhead
var jsonResponse = []byte(`{"message":"Hello from Go!"}`)

func jsonHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(jsonResponse) // Note: Ignoring error - not production-safe
}

func main() {
	server := &http.Server{
		Addr:    ":3000",
		Handler: http.HandlerFunc(jsonHandler), // answers every path, not only /json
	}
	log.Println("Go server running on port 3000")
	// ListenAndServe always returns a non-nil error; report it instead of dropping it
	log.Fatal(server.ListenAndServe())
}
Go (Fair Comparison - Uses JSON Encoding)
This version matches Node/Bun's approach
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type Response struct {
	Message string `json:"message"`
}

func jsonHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	// Encode on every request, like JSON.stringify in the Node and Bun servers
	if err := json.NewEncoder(w).Encode(Response{Message: "Hello from Go!"}); err != nil {
		log.Printf("encode failed: %v", err)
	}
}

func main() {
	http.HandleFunc("/json", jsonHandler)
	log.Fatal(http.ListenAndServe(":3000", nil))
}
Acknowledgments
Thanks to the readers who will inevitably point out additional issues I missed. Benchmarking is hard, and there's always room for improvement.
If you want to reproduce these tests or improve the methodology, feel free to reach out!
Found this useful? Drop a ❤️ and let me know what you'd like to see benchmarked next!
