URL: https://dev.to/uaslimcreate/streaming-ai-responses-in-react-19-building-real-time-claude-integration-without-the-ux-jank-2i4a

Streaming AI Responses in React 19: Building Real-Time Claude Integration Without the UX Jank

Loading spinners are a lie we tell ourselves. When a user asks your AI agent something, they want to see the thinking happen. They want tokens appearing on screen in real time, not a blank void for 3 seconds followed by a wall of text.

I spent two weeks building CitizenApp's chat feature before I realized most Claude streaming tutorials are incomplete. They show you how to get tokens flowing, then hand-wave away the hard parts: backpressure handling, component unmounts during active streams, network interruptions mid-response, and cancellation that doesn't leave dangling resources.

This is the post I needed when I started.

The Streaming Problem Most Tutorials Skip

Here's what happens in a naive implementation:

  1. User sends a prompt
  2. FastAPI opens an SSE connection and streams Claude tokens
  3. Browser receives a token every 50-200ms
  4. User navigates away
  5. Component unmounts
  6. Request keeps streaming in the background, consuming memory
  7. CPU spikes because React's trying to update an unmounted component
  8. You get 20 errors in the console

I prefer SSE (Server-Sent Events) over WebSockets for streaming AI responses because:

  • SSE is unidirectional (all we need)
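  • Built-in reconnection handling (when consumed through the browser's EventSource API; a fetch-based reader has to reconnect on its own)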
  • Simpler to debug than WebSocket state machines
  • Naturally pairs with HTTP error codes
  • No connection pool exhaustion at scale

But SSE isn't magic. You need to actively manage the connection lifecycle.

The FastAPI Backend: Backpressure Matters

Let me show you the right way:

import asyncio
from typing import AsyncGenerator

import anthropic
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    """JSON body sent by the frontend: {"prompt": "..."}."""
    prompt: str


def sse_event(text: str) -> str:
    """Frame text as one SSE event, with one data: line per line of text."""
    return "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"


@app.post("/api/chat/stream")
async def stream_response(body: ChatRequest) -> StreamingResponse:
    """Stream Claude response with proper resource cleanup."""

    async def generate() -> AsyncGenerator[str, None]:
        # Async client: waiting on Claude must not block the event loop
        client = anthropic.AsyncAnthropic()

        try:
            # messages.stream() yields text deltas as they arrive
            async with client.messages.stream(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1024,
                messages=[
                    {"role": "user", "content": body.prompt}
                ],
            ) as stream:
                async for text in stream.text_stream:
                    # Yield SSE format (tokens may contain newlines)
                    yield sse_event(text)

                    # Explicit yield point so cancellation is noticed promptly
                    await asyncio.sleep(0)

        except asyncio.CancelledError:
            # Client disconnected; the context manager closes the upstream request
            raise
        except anthropic.APIError as e:
            # Report the failure in-band, then end the stream cleanly
            yield sse_event(f"[ERROR] API Error: {e}")

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # Disable nginx buffering
        },
    )

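Why await asyncio.sleep(0) matters: it yields control back to the event loop between tokens, which gives the server a chance to notice that the client disconnected and cancel the generator. Without a yield point, if the user closes the tab, your server may not find out for several seconds. Strictly speaking this is cooperative scheduling rather than backpressure, but the goal is the same: don't keep producing output nobody is consuming.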

The X-Accel-Buffering: no header tells nginx not to buffer the response. Other proxies and CDNs (Cloudflare, for example) may need their own configuration to pass the stream through unbuffered. You want tokens hitting the browser immediately.

React 19: The AbortController Pattern

In React 19, I avoid useEffect for streaming logic when possible (it's not a side effect container). Instead, I treat streaming as an async operation that a user triggers:

'use client';

import { useCallback, useEffect, useRef, useState } from 'react';

interface StreamMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function ChatStream() {
  const [messages, setMessages] = useState<StreamMessage[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const abortControllerRef = useRef<AbortController | null>(null);

  // Abort any in-flight stream when the component unmounts
  useEffect(() => {
    return () => abortControllerRef.current?.abort();
  }, []);

  const handleStreamResponse = useCallback(async (userPrompt: string) => {
    // Abort any existing stream first
    abortControllerRef.current?.abort();

    const controller = new AbortController();
    abortControllerRef.current = controller;

    setMessages(prev => [
      ...prev,
      { role: 'user', content: userPrompt }
    ]);

    setIsStreaming(true);
    let assistantMessage = '';

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt: userPrompt }),
        signal: controller.signal, // Pass abort signal
      });

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      // ReadableStream reader for token-by-token handling
      const reader = response.body?.getReader();
      if (!reader) throw new Error('No response body');

      const decoder = new TextDecoder();
      let buffer = ''; // Holds a partial event split across network chunks

      while (true) {
        const { done, value } = await reader.read();

        if (done) break;

        buffer += decoder.decode(value, { stream: true });

        // SSE events end with a blank line; keep the trailing partial event
        const events = buffer.split('\n\n');
        buffer = events.pop() ?? '';

        for (const event of events) {
          // One event may carry several "data: " lines, joined by newlines
          const dataLines = event
            .split('\n')
            .filter(line => line.startsWith('data: '))
            .map(line => line.slice(6)); // Remove "data: "

          if (dataLines.length === 0) continue;
          const token = dataLines.join('\n');

          if (token.startsWith('[ERROR]')) {
            throw new Error(token);
          }

          assistantMessage += token;
          const content = assistantMessage;

          // Replace the last assistant message without mutating state
          setMessages(prev => {
            const lastMessage = prev[prev.length - 1];

            if (lastMessage?.role === 'assistant') {
              return [...prev.slice(0, -1), { ...lastMessage, content }];
            }

            return [...prev, { role: 'assistant', content }];
          });
        }
      }
    } catch (error) {
      if (error instanceof Error && error.name === 'AbortError') {
        // User cancelled: clean exit, no error message
        return;
      }

      setMessages(prev => [
        ...prev,
        {
          role: 'assistant',
          content: `Error: ${error instanceof Error ? error.message : 'Unknown error'}`
        }
      ]);
    } finally {
      // Only reset shared state if a newer stream has not replaced this one
      if (abortControllerRef.current === controller) {
        setIsStreaming(false);
        abortControllerRef.current = null;
      }
    }
  }, []);

  const handleCancel = useCallback(() => {
    abortControllerRef.current?.abort();
  }, []);

  return (
    <div className="flex flex-col gap-4">
      <div className="space-y-3 h-96 overflow-y-auto">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={`p-3 rounded-lg whitespace-pre-wrap ${
              msg.role === 'user'
                ? 'bg-blue-100 text-right'
                : 'bg-gray-100'
            }`}
          >
            {msg.content}
          </div>
        ))}
      </div>

      <div className="flex gap-2">
        <input
          type="text"
          placeholder="Ask something..."
          onKeyDown={(e) => {
            const value = e.currentTarget.value.trim();
            if (e.key === 'Enter' && !isStreaming && value) {
              handleStreamResponse(value);
              e.currentTarget.value = '';
            }
          }}
          disabled={isStreaming}
          className="flex-1 px-3 py-2 border rounded-lg"
        />
        {isStreaming && (
          <button
            onClick={handleCancel}
            className="px-4 py-2 bg-red-500 text-white rounded-lg"
          >
            Stop
          </button>
        )}
      </div>
    </div>
  );
}

Key patterns here:

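  1. AbortController: Every stream gets its own controller. If a new prompt comes in while streaming, we abort the old one first. This prevents message corruption.

  2. Manual ReadableStream reading: Instead of relying on a library, we read chunks and parse the SSE format ourselves. Because the loop awaits each reader.read() before asking for more, data is pulled only as fast as the loop consumes it. Keep in mind that a network chunk can end in the middle of a line, so a robust parser has to buffer the trailing partial data until the next chunk arrives.

  3. One assistant message per response: Rather than appending a new message per token, each token updates the single assistant entry at the end of the list. The update itself should stay immutable (copy the array and the last message instead of mutating objects React already holds), otherwise React may not see the change.

  4. Graceful AbortError handling: If the user clicks "Stop", or the request is aborted when the component unmounts, fetch rejects with an AbortError. That is expected, so we don't show an error message.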
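
The chunk-boundary concern in pattern 2 is easiest to see in isolation. What follows is a minimal sketch, not the article's production code: SseParser is a hypothetical helper name, and it assumes the server terminates lines with "\n" only and sends nothing but "data: " fields. It buffers input until a blank line closes an event, then joins that event's data lines with newlines, which is how SSE represents a multi-line payload.

```typescript
// Hypothetical helper: incremental parser for "data: ..." SSE events.
// Assumes "\n" line endings and ignores every field except "data: ".
class SseParser {
  private buffer = '';

  // Feed one decoded chunk; returns the payload of every event completed so far.
  push(chunk: string): string[] {
    this.buffer += chunk;

    // Events are separated by a blank line. The last element is either ''
    // (the chunk ended exactly on an event boundary) or an unfinished event.
    const events = this.buffer.split('\n\n');
    this.buffer = events.pop() ?? '';

    const payloads: string[] = [];
    for (const event of events) {
      const dataLines = event
        .split('\n')
        .filter((line) => line.startsWith('data: '))
        .map((line) => line.slice('data: '.length));

      // Skip events that carried no data lines at all.
      if (dataLines.length > 0) {
        payloads.push(dataLines.join('\n'));
      }
    }
    return payloads;
  }
}

const parser = new SseParser();
const first = parser.push('data: Hel');        // event still open: nothing emitted
const second = parser.push('lo\n\ndata: wor'); // completes "Hello", starts a new event
const third = parser.push('ld\n\n');           // completes "world"
console.log(first, second, third);
```

Splitting each chunk on "\n" without a buffer would have dropped the "lo" and "ld" fragments above, because neither starts with "data: ".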

Gotcha: The Memory Leak That Bit Me

I initially didn't use AbortController properly. Every stream would complete fully even if the user navigated away. On CitizenApp with long-running analyses, this meant:

  • Multiple concurrent streams running invisibly
  • Memory climbing as response objects accumulated
  • Network requests taking up connection slots

The fix was simple (use AbortController), but it took 48 hours of profiling to realize the problem wasn't in React—it was the browser still pulling data from the network.
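
To make the fix concrete, here is a rough sketch of the ownership rule that matters: at most one request in flight, and anything older gets aborted. StreamSession is a hypothetical name for illustration, not part of React or the Fetch API. It only wraps the standard AbortController, and it assumes you pass the returned signal to fetch() and call cancel() from both the "Stop" button and the component's teardown.

```typescript
// Hypothetical helper: owns at most one in-flight request at a time.
class StreamSession {
  private controller: AbortController | null = null;

  // Abort whatever is running and return a fresh signal for the next fetch().
  begin(): AbortSignal {
    this.controller?.abort();
    this.controller = new AbortController();
    return this.controller.signal;
  }

  // Call when a stream ends; a stale signal from an older stream is ignored
  // so it cannot clear the state of the stream that replaced it.
  finish(signal: AbortSignal): void {
    if (this.controller?.signal === signal) {
      this.controller = null;
    }
  }

  // Call from the "Stop" button and from component teardown.
  cancel(): void {
    this.controller?.abort();
    this.controller = null;
  }

  get active(): boolean {
    return this.controller !== null;
  }
}

const session = new StreamSession();
const firstSignal = session.begin();
const secondSignal = session.begin(); // starting a new stream aborts the first
session.finish(firstSignal);          // stale finish: second stream stays active
const stillActive = session.active;
session.cancel();                     // teardown aborts whatever is left

// logs: true true true false
console.log(firstSignal.aborted, stillActive, secondSignal.aborted, session.active);
```

The stale-signal check in finish() is the easy part to miss: without it, the cleanup of an aborted stream can wipe out the controller of the stream that replaced it.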

One More Thing: Error Recovery

Real production systems have flaky networks. Here's what I added after the first month:


// Add exponential backoff retry
const maxRetries = 3;
let retryCount = 0;

while (retryCount < maxRetries) {
 try {
 const response = await fetch('/api/chat/