DeepSeek API Integration with React and Next.js

Matt Mickiewicz

Published in AI · JavaScript · APIs

March 29, 2026

DeepSeek has emerged as a serious contender in the large language model space, offering open-weight models that deliver strong performance at a fraction of the cost of comparable alternatives. For developers building AI-powered web applications, DeepSeek API integration with React and Next.js provides a production-viable path that uses an OpenAI-compatible endpoint structure, meaning existing tooling and SDK knowledge transfers directly. This tutorial walks through building a complete Next.js application with a React chat interface powered by the DeepSeek API, covering everything from project scaffolding to streaming responses to production deployment.

How to Integrate the DeepSeek API with React and Next.js

1. Scaffold a Next.js App Router project with TypeScript and install the OpenAI SDK.
2. Store your DeepSeek API key in .env.local and validate it at startup.
3. Create a shared DeepSeek client utility that points the OpenAI SDK at the DeepSeek base URL.
4. Build a Next.js Route Handler at /api/chat with streaming, input validation, and error handling.
5. Implement a React chat component that consumes the streaming response via ReadableStream and TextDecoder.
6. Add system prompts, context management, and token usage logging for cost control.
7. Deploy to Vercel with environment variables, rate limiting, and production streaming verification.

The target build is a Next.js App Router application that sends user messages to DeepSeek's chat completions endpoint, streams tokens back in real time, and renders them in a responsive chat UI. Prerequisites include Node.js 18 or later, working knowledge of React and Next.js fundamentals, and a DeepSeek API key obtained from the DeepSeek Platform. DeepSeek's API follows the same request and response conventions as OpenAI's Chat Completions API, which means the official openai Node.js SDK works as a client with minimal configuration changes.

Understanding the DeepSeek API

API Architecture and OpenAI Compatibility

DeepSeek's chat completions endpoint mirrors OpenAI's API structure almost exactly. The base URL is https://api.deepseek.com, and requests go to the /chat/completions path. Two primary models are available: deepseek-chat and deepseek-reasoner. At the time of writing, deepseek-chat maps to DeepSeek-V3 and deepseek-reasoner maps to DeepSeek-R1. Verify current model aliases at the DeepSeek API documentation, as mappings may change with new releases.

The request body accepts the same parameters developers are accustomed to from OpenAI: messages, temperature, max_tokens, stream, and top_p. Where DeepSeek differs is pricing and rate limits, not API shape. Per-token costs run significantly below OpenAI's equivalents (check DeepSeek's pricing page for current rates), making it attractive for high-volume applications where cost control matters.

Authentication and Rate Limits

Authenticate requests with a Bearer token in the Authorization header, identical to OpenAI's scheme. Generate API keys from the DeepSeek Platform dashboard. Your pricing tier determines your rate limits; consult the current documentation for your plan's requests-per-minute and tokens-per-minute caps.

One non-negotiable rule: API calls must happen server-side. Exposing a DeepSeek API key in client-side JavaScript means anyone can inspect network requests, extract the key, and run up charges against the account.

Next.js Route Handlers solve this cleanly by keeping all API communication on the server.
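
To make the request shape and the Bearer scheme concrete, here is a minimal sketch of the raw HTTP request that the SDK assembles on your behalf. The helper name buildChatRequest and the temperature and max_tokens values are illustrative assumptions; the URL, header, and parameter names are the ones described above.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: builds the URL and fetch options for one chat completion.
export function buildChatRequest(apiKey: string, messages: ChatMessage[]) {
  return {
    url: "https://api.deepseek.com/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Same Bearer scheme as OpenAI's API
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "deepseek-chat",
        messages,
        temperature: 0.7, // illustrative value
        max_tokens: 256, // illustrative value
        stream: false,
      }),
    },
  };
}
```

Server-side code could then call fetch(url, init) with the returned values. Because the key is embedded in the Authorization header, this should only ever run on the server.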

Project Setup and Configuration

Initializing the Next.js Project

Scaffold a new Next.js project using the App Router and install the OpenAI SDK:

npx create-next-app@latest deepseek-chat --typescript --app --tailwind --eslint
cd deepseek-chat
npm install openai@^4.28.0

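This creates a Next.js 14+ project with TypeScript, Tailwind CSS for styling, and ESLint. The openai package is the only runtime dependency needed for DeepSeek communication. This tutorial targets the openai SDK v4.x. Run npm list openai to confirm the installed version is 4.28.0 or later. The stream_options parameter used in the final route handler appears to require a more recent 4.x release than 4.28.0, so the newest 4.x version is the safest choice.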

Environment Configuration

Create a .env.local file in the project root:

DEEPSEEK_API_KEY=sk-your-deepseek-api-key-here
DEEPSEEK_BASE_URL=https://api.deepseek.com

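Next.js automatically loads .env.local on the server side. Variables without the NEXT_PUBLIC_ prefix are never bundled into client code, which is exactly the behavior needed for secret management. Before committing, verify that Git ignores .env.local to prevent accidental secret exposure (create-next-app ignores local env files by default, but existing projects may not). Asking Git directly is more reliable than searching .gitignore for the literal filename, because the default rule is a glob pattern rather than the exact name:

git check-ignore -v .env.local

The command prints the matching ignore rule if the file is ignored and prints nothing otherwise.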

The reason for using the OpenAI SDK rather than raw fetch calls is straightforward: the SDK handles authentication headers, request serialization, streaming iteration, error typing, and retry logic. Since DeepSeek's API is OpenAI-compatible, the SDK works without modification beyond pointing it at a different base URL.

Creating the DeepSeek Client Utility

Create a shared client instance at lib/deepseek.ts:

import OpenAI from "openai";
const ALLOWED_BASE_URL = "https://api.deepseek.com";
function getValidatedBaseURL(): string {
  const url = process.env.DEEPSEEK_BASE_URL ?? ALLOWED_BASE_URL;
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch (e) {
    throw new Error(`Invalid DEEPSEEK_BASE_URL: ${url} (${String(e)})`);
  }
  if (parsed.protocol !== "https:") {
    throw new Error(`DEEPSEEK_BASE_URL must use HTTPS, got: ${url}`);
  }
  // Allowlist check: remove or expand for multi-provider setups.
  // Compare parsed origins rather than string prefixes, so a lookalike host
  // such as api.deepseek.com.evil.example cannot pass the check.
  if (parsed.origin !== new URL(ALLOWED_BASE_URL).origin) {
    throw new Error(`DEEPSEEK_BASE_URL not in allowlist: ${url}`);
  }
  return url;
}
if (!process.env.DEEPSEEK_API_KEY) {
  throw new Error(
    "Missing required environment variable: DEEPSEEK_API_KEY"
  );
}
const deepseek = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: getValidatedBaseURL(),
});
export default deepseek;

This module instantiates the OpenAI client with DeepSeek's credentials and base URL. The startup guard ensures the application fails fast with a clear error message if the API key is missing, rather than producing cryptic authentication failures at request time. The base URL is validated against an HTTPS allowlist to prevent misconfiguration from becoming a server-side request forgery vector. Importing this module in any server-side file provides a ready-to-use client. Because this file references process.env variables without the NEXT_PUBLIC_ prefix, it can only run on the server, which enforces the security boundary by design.
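
One detail worth checking in any URL allowlist is that the comparison happens on the parsed origin, not on the raw string. The following sketch shows the difference; the helper name isAllowedOrigin and the lookalike host are illustrative assumptions, not part of the tutorial code.

```typescript
// Hypothetical helper: compares parsed origins instead of string prefixes.
export function isAllowedOrigin(candidate: string, allowed: string): boolean {
  try {
    return new URL(candidate).origin === new URL(allowed).origin;
  } catch {
    return false; // not a parseable absolute URL
  }
}

const allowedBase = "https://api.deepseek.com";
const lookalikeUrl = "https://api.deepseek.com.evil.example/v1";

// A naive prefix test accepts the lookalike host...
export const prefixAccepts = lookalikeUrl.startsWith(allowedBase);
// ...while the origin comparison rejects it.
export const originAccepts = isAllowedOrigin(lookalikeUrl, allowedBase);
```

A path under the real host, such as https://api.deepseek.com/v1, still passes the origin comparison, so the stricter check does not get in the way of legitimate configuration.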

Confirm tsconfig.json contains "paths": { "@/*": ["./*"] } before using the @/ import alias. create-next-app adds this by default; existing projects may require manual configuration.

API Route Handler

Creating a Next.js Route Handler

Create the file app/api/chat/route.ts with a basic non-streaming implementation:

import { NextRequest, NextResponse } from "next/server";
import deepseek from "@/lib/deepseek";
export async function POST(req: NextRequest) {
  const { messages } = await req.json();
  const completion = await deepseek.chat.completions.create({
    model: "deepseek-chat",
    messages,
  });
  return NextResponse.json({
    content: completion.choices[0].message.content,
    usage: completion.usage,
  });
}

The route accepts a POST request containing a messages array, formatted identically to the OpenAI Chat Completions schema (each message has a role and content field). The response includes the assistant's reply and token usage metadata.

Adding Streaming Support

Without streaming, the user waits for the entire completion before seeing any output. For conversational interfaces, streaming tokens as they generate means users see the first token within hundreds of milliseconds instead of waiting seconds for the full response. Update the route handler to support streaming:

import { NextRequest } from "next/server";
import deepseek from "@/lib/deepseek";
export async function POST(req: NextRequest) {
  const { messages } = await req.json();
  const stream = await deepseek.chat.completions.create({
    model: "deepseek-chat",
    messages,
    stream: true,
  });
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;
          if (content) {
            controller.enqueue(encoder.encode(content));
          }
        }
        controller.close();
      } catch (err) {
        // Encode a sentinel error token the client can detect
        const errorPayload = JSON.stringify({
          __error: true,
          message: "Stream interrupted by server error",
        });
        controller.enqueue(
          encoder.encode(`[STREAM_ERROR:${errorPayload}]`)
        );
        controller.close();
        console.error("Stream error:", err);
      }
    },
  });
  return new Response(readable, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache",
    },
  });
}

When stream: true is set, the SDK returns an async iterator. Each chunk contains a delta object with partial content. The TextEncoder converts each string fragment into bytes for the ReadableStream, which pipes tokens to the client as they arrive. The Cache-Control: no-cache header tells intermediaries not to serve a stored copy of the response; it does not by itself guarantee that a proxy will not buffer the stream, so verify streaming behavior in the deployed environment. The try/catch around the iteration loop ensures that if the DeepSeek API throws mid-stream, an error sentinel is written to the stream so the client can detect the failure, even though the HTTP response status is already 200 OK.
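
The encode, stream, and decode round trip can be tried in isolation, without the SDK or a network call. In this sketch the generator fakeDeltas stands in for the SDK's async iterator, and the helper names toByteStream and readAll are assumptions made for illustration.

```typescript
// Wraps an async iterable of strings in a byte stream, as the route handler does.
export function toByteStream(
  parts: AsyncIterable<string>
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const part of parts) {
        controller.enqueue(encoder.encode(part));
      }
      controller.close();
    },
  });
}

// Reads the byte stream back into a string, as a browser client would.
export async function readAll(
  stream: ReadableStream<Uint8Array>
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps incomplete multi-byte sequences buffered
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode(); // flush any buffered bytes
}

// Stand-in for the SDK's stream of content deltas (hypothetical data).
export async function* fakeDeltas(): AsyncGenerator<string> {
  yield "Hel";
  yield "lo, ";
  yield "wörld";
}
```

Calling readAll(toByteStream(fakeDeltas())) resolves to the concatenated text, which mirrors what the chat component assembles from the route's response.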

Input Validation and Error Handling

Production routes need structured error handling. Wrap the logic in a try/catch, validate individual message objects, and handle error status codes safely:

import { NextRequest, NextResponse } from "next/server";
import { APIError } from "openai";
import deepseek from "@/lib/deepseek";
const MAX_MESSAGES = 100;
const MAX_CONTENT_LENGTH = 32_000; // characters
const ALLOWED_ROLES = new Set(["user", "assistant", "system"]);
// Role is a literal union (not string) so the validated array
// type-checks against the SDK's message parameter types.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}
function validateMessages(messages: unknown): ChatMessage[] {
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new RangeError("messages must be a non-empty array");
  }
  if (messages.length > MAX_MESSAGES) {
    throw new RangeError(`messages array exceeds limit of ${MAX_MESSAGES}`);
  }
  for (const msg of messages) {
    if (typeof msg !== "object" || msg === null) {
      throw new TypeError("Each message must be an object");
    }
    const { role, content } = msg as Record<string, unknown>;
    if (typeof role !== "string" || !ALLOWED_ROLES.has(role)) {
      throw new TypeError(`Invalid role: ${String(role)}`);
    }
    if (typeof content !== "string") {
      throw new TypeError("Message content must be a string");
    }
    if (content.length > MAX_CONTENT_LENGTH) {
      throw new RangeError(
        `Message content exceeds ${MAX_CONTENT_LENGTH} characters`
      );
    }
  }
  return messages as ChatMessage[];
}
export async function POST(req: NextRequest) {
  try {
    const body = await req.json();
    const messages = validateMessages(body?.messages);
    const SYSTEM_PROMPT = {
      role: "system" as const,
      content:
        "You are a helpful technical assistant. Be concise and accurate.",
    };
    const MAX_CONTEXT_MESSAGES = 20;
    const contextMessages = [
      SYSTEM_PROMPT,
      ...messages.slice(-MAX_CONTEXT_MESSAGES),
    ];
    const stream = await deepseek.chat.completions.create({
      model: "deepseek-chat",
      messages: contextMessages,
      stream: true,
      max_tokens: 2048,
      stream_options: { include_usage: true },
    });
    const encoder = new TextEncoder();
    const readable = new ReadableStream({
      async start(controller) {
        try {
          for await (const chunk of stream) {
            const content = chunk.choices[0]?.delta?.content;
            if (content) {
              controller.enqueue(encoder.encode(content));
            }
            // Log usage from the final chunk
            if (chunk.usage) {
              console.info("deepseek_usage", {
                prompt_tokens: chunk.usage.prompt_tokens,
                completion_tokens: chunk.usage.completion_tokens,
              });
            }
          }
          controller.close();
        } catch (err) {
          const errorPayload = JSON.stringify({
            __error: true,
            message: "Stream interrupted by server error",
          });
          controller.enqueue(
            encoder.encode(`[STREAM_ERROR:${errorPayload}]`)
          );
          controller.close();
          console.error("Stream error:", err);
        }
      },
    });
    return new Response(readable, {
      headers: {
        "Content-Type": "text/plain; charset=utf-8",
        "Cache-Control": "no-cache",
      },
    });
  } catch (error) {
    if (error instanceof RangeError || error instanceof TypeError) {
      return NextResponse.json({ error: error.message }, { status: 400 });
    }
    if (error instanceof APIError) {
      const status =
        typeof error.status === "number" ? error.status : 500;
      const retryAfter =
        (error.headers as Record<string, string> | undefined)?.[
          "retry-after"
        ] ?? null;
      const message =
        status === 401
          ? "Authentication failed: check your API key"
          : status === 429
          ? "Rate limit exceeded: try again later"
          : "DeepSeek API error";
      return NextResponse.json(
        { error: message, ...(retryAfter ? { retryAfter } : {}) },
        { status }
      );
    }
    console.error("Unhandled route error", error);
    return NextResponse.json(
      { error: "Internal server error" },
      { status: 500 }
    );
  }
}

The validateMessages function checks that each message has a valid role from an allowlist and a content field that is a string within a reasonable length limit. This prevents malformed objects, non-string content, and oversized payloads from reaching the DeepSeek API. The max_tokens: 2048 cap prevents runaway completions from inflating costs. For deepseek-reasoner, consider increasing max_tokens or omitting the cap, as reasoning traces are significantly longer than standard chat completions. The OpenAI SDK throws APIError instances that include HTTP status codes; the handler safely falls back to 500 if error.status is undefined, preventing a runtime crash. This makes it possible to return appropriate 400, 401, 429, or 500 responses. For graceful degradation, the client can check the response status and display user-friendly messages rather than raw error text.
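
On the client, the status code and JSON body returned by the route can be mapped to display text. This is a minimal sketch under stated assumptions: describeChatError is a hypothetical helper, the wording of each message is illustrative, and retryAfter is treated as a number of seconds when it parses as one.

```typescript
interface ChatErrorBody {
  error?: string;
  retryAfter?: string;
}

// Hypothetical helper: turns a failed /api/chat response into display text.
export function describeChatError(
  status: number,
  body: ChatErrorBody
): string {
  if (status === 429) {
    const seconds = Number(body.retryAfter);
    return Number.isFinite(seconds) && seconds > 0
      ? `Too many requests. Try again in ${seconds} seconds.`
      : "Too many requests. Try again shortly.";
  }
  if (status === 400) {
    // Validation messages from the route are safe to show as-is
    return body.error ?? "The request was invalid.";
  }
  if (status === 401) {
    // The end user cannot fix a server-side key problem
    return "The server is misconfigured. Contact the site owner.";
  }
  return "Something went wrong. Please retry.";
}
```

A Retry-After value may also arrive as an HTTP date rather than a number; in that case the sketch falls back to the generic rate limit message instead of guessing.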

The React Chat Interface

Chat Component Architecture

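The chat UI splits into three client components: Chat manages state and API communication, MessageList renders the conversation, and MessageInput captures user input. Each file below carries the "use client" directive. Strictly speaking, only Chat needs it, because it is the entry point into client code and anything it imports is bundled for the client as well; marking all three keeps each component usable on its own.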

Implementing the Chat Container

Create components/Chat.tsx:

"use client";
import { useState, useRef, useEffect } from "react";
import MessageList from "./MessageList";
import MessageInput from "./MessageInput";
interface Message {
  id: string;
  role: "user" | "assistant";
  content: string;
}
export default function Chat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const scrollRef = useRef<HTMLDivElement>(null);
  const abortControllerRef = useRef<AbortController | null>(null);
  useEffect(() => {
    return () => {
      // Abort in-flight request on unmount
      abortControllerRef.current?.abort();
    };
  }, []);
  useEffect(() => {
    scrollRef.current?.scrollIntoView({ behavior: "smooth" });
  }, [messages]);
  async function handleSubmit(input: string) {
    abortControllerRef.current?.abort();
    abortControllerRef.current = new AbortController();
    setError(null);
    const userMessage: Message = {
      id: crypto.randomUUID(),
      role: "user",
      content: input,
    };
    const updatedMessages = [...messages, userMessage];
    setMessages(updatedMessages);
    setIsLoading(true);
    try {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          messages: updatedMessages.map(({ role, content }) => ({
            role,
            content,
          })),
        }),
        signal: abortControllerRef.current.signal,
      });
      if (!response.ok || !response.body) {
        const data = await response.json().catch(() => ({}));
        setError(
          (data as { error?: string }).error ??
            "Request failed. Please retry."
        );
        return;
      }
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let assistantContent = "";
      const assistantId = crypto.randomUUID();
      setMessages((prev) => [
        ...prev,
        { id: assistantId, role: "assistant", content: "" },
      ]);
      while (true) {
        const { done, value } = await reader.read();
        if (done) {
          // Flush remaining bytes in the TextDecoder buffer
          const remaining = decoder.decode();
          if (remaining) {
            assistantContent += remaining;
            setMessages((prev) =>
              prev.map((m) =>
                m.id === assistantId
                  ? { ...m, content: assistantContent }
                  : m
              )
            );
          }
          break;
        }
        assistantContent += decoder.decode(value, { stream: true });
        setMessages((prev) =>
          prev.map((m) =>
            m.id === assistantId
              ? { ...m, content: assistantContent }
              : m
          )
        );
      }
    } catch (err) {
      if (err instanceof DOMException && err.name === "AbortError") {
        return;
      }
      setError("An unexpected error occurred. Please try again.");
      console.error("Chat error:", err);
    } finally {
      setIsLoading(false);
    }
  }
  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto">
        <MessageList messages={messages} />
        <div ref={scrollRef} />
      </div>
      {error && (
        <p className="text-red-600 text-sm py-2" role="alert">
          {error}
        </p>
      )}
      <MessageInput onSubmit={handleSubmit} isLoading={isLoading} />
    </div>
  );
}

The getReader() and TextDecoder combination consumes the streaming response chunk by chunk. Each decoded fragment is appended to the running assistantContent string, and setMessages updates the message matching the assistant's unique ID to trigger a re-render. Using prev.map() with an ID match instead of updating the last array index ensures the correct message is always updated, even if concurrent state batches reorder the array. When the stream finishes, decoder.decode() is called without arguments to flush any remaining bytes buffered by the decoder in streaming mode, preventing truncated multi-byte UTF-8 characters. The scrollRef keeps the viewport pinned to the latest content. The AbortController prevents orphaned requests when a user sends a new message before the previous stream completes, and a cleanup effect aborts any in-flight request when the component unmounts to prevent setState calls on an unmounted component. The try/catch handles AbortError gracefully and surfaces all other errors to the user via an error state, while isLoading is always reset via the finally block.
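
The route handler writes a [STREAM_ERROR:...] sentinel when the upstream call fails mid-stream, so the client needs a way to separate that marker from real content. One possible approach, sketched here with a hypothetical helper named splitStreamError, is to check the accumulated text once the stream ends.

```typescript
const SENTINEL_PREFIX = "[STREAM_ERROR:";

interface SplitResult {
  content: string; // assistant text with the sentinel removed
  error: string | null; // error message, or null if no sentinel was found
}

// Hypothetical helper: detects the trailing error sentinel in streamed text.
export function splitStreamError(text: string): SplitResult {
  const start = text.lastIndexOf(SENTINEL_PREFIX);
  if (start === -1 || !text.endsWith("]")) {
    return { content: text, error: null };
  }
  // Everything between the prefix and the closing bracket should be JSON
  const json = text.slice(start + SENTINEL_PREFIX.length, -1);
  try {
    const payload = JSON.parse(json) as {
      __error?: boolean;
      message?: string;
    };
    if (payload.__error !== true) {
      return { content: text, error: null };
    }
    return {
      content: text.slice(0, start),
      error: payload.message ?? "Stream error",
    };
  } catch {
    // Not a real sentinel (for example, ordinary text ending in "]")
    return { content: text, error: null };
  }
}
```

In the Chat component, this could be applied to assistantContent after the read loop finishes, passing the error to setError and storing only the cleaned content in the message. Model output that merely resembles the marker is left untouched unless it parses as the exact payload shape.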

Message Display and Input Components

Create components/MessageList.tsx:

"use client";
interface Message {
  id: string;
  role: "user" | "assistant";
  content: string;
}
export default function MessageList({ messages }: { messages: Message[] }) {
  return (
    <div className="space-y-4">
      {messages.map((msg) => (
        <div
          key={msg.id}
          className={`p-3 rounded-lg ${
            msg.role === "user"
              ? "bg-blue-100 ml-auto max-w-md text-right"
              : "bg-gray-100 mr-auto max-w-md"
          }`}
        >
          <p className="text-sm font-semibold mb-1">
            {msg.role === "user" ? "You" : "Assistant"}
          </p>
          <p className="whitespace-pre-wrap">{msg.content}</p>
        </div>
      ))}
    </div>
  );
}

Create components/MessageInput.tsx:

"use client";
import { useState } from "react";
interface Props {
  onSubmit: (input: string) => void;
  isLoading: boolean;
}
export default function MessageInput({ onSubmit, isLoading }: Props) {
  const [input, setInput] = useState("");
  function handleSubmit(e: React.FormEvent) {
    e.preventDefault();
    if (!input.trim() || isLoading) return;
    onSubmit(input.trim());
    setInput("");
  }
  return (
    <form onSubmit={handleSubmit} className="flex gap-2 pt-4">
      <input
        type="text"
        value={input}
        onChange={(e) => setInput(e.target.value)}
        placeholder="Type a message..."
        className="flex-1 border rounded-lg px-4 py-2"
        disabled={isLoading}
      />
      <button
        type="submit"
        disabled={isLoading}
        className="bg-blue-500 text-white px-4 py-2 rounded-lg disabled:opacity-50"
      >
        {isLoading ? "Sending..." : "Send"}
      </button>
    </form>
  );
}

For richer rendering of assistant responses that contain markdown, code blocks, or lists, add react-markdown to MessageList as a drop-in replacement for the plain <p> tag. Install it with npm install react-markdown.

Integrating into a Next.js Page

Update app/page.tsx:

import Chat from "@/components/Chat";
export default function Home() {
  return (
    <main className="min-h-screen bg-white">
      <Chat />
    </main>
  );
}

The page component itself remains a server component. The Chat component, marked with "use client", handles all interactive behavior on the client side.

Advanced Patterns and Optimization

System Prompts and Conversation Context

The production route handler above already prepends a system message and manages context length by slicing to the most recent 20 messages. This keeps per-request token costs predictable. For finer control, a tokenizer library such as tiktoken or gpt-tokenizer can count tokens and trim to the model's context limit rather than relying on a fixed message count. Both libraries implement OpenAI's tokenizers, so their counts are only approximations for DeepSeek models; leave a safety margin.
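
Trimming by a token budget instead of a message count might look like the sketch below. The function names are hypothetical, and the estimate of roughly four characters per token is a crude assumption standing in for a real tokenizer, not DeepSeek's actual tokenization.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic (an assumption): about 4 characters per token.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the newest messages whose estimated total fits within maxTokens.
export function trimToBudget(
  messages: ChatMessage[],
  maxTokens: number
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk backwards so the most recent turns are retained first
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

The system prompt would be prepended after trimming, with its own estimated cost subtracted from the budget beforehand.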

Response Caching and Cost Optimization

For deterministic queries where the same input should produce the same output (documentation lookups, fixed transformations), setting temperature: 0 makes responses more consistent and cache-friendly. LLM outputs are not strictly deterministic even at temperature 0; hardware-level floating-point variation causes minor differences between runs. DeepSeek also offers a context caching feature that reduces costs for repeated prompt prefixes. When a sequence of messages shares the same prefix across requests, cached input tokens are billed at a reduced rate. This is particularly useful for applications that include lengthy system prompts or few-shot examples with every request.
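
Separately from DeepSeek's server-side context caching, an application can keep its own cache of complete responses for repeated deterministic queries. The following is a minimal in-memory sketch; createResponseCache and its methods are hypothetical names, and a production deployment would more likely use a shared store, since in-memory state is not shared between server instances.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Hypothetical in-memory cache with a time-to-live, keyed by model and messages.
export function createResponseCache(
  ttlMs: number,
  now: () => number = Date.now
) {
  const entries = new Map<string, { value: string; expiresAt: number }>();
  return {
    // Identical model and message sequences produce identical keys
    key(model: string, messages: ChatMessage[]): string {
      return JSON.stringify([model, messages.map((m) => [m.role, m.content])]);
    },
    get(key: string): string | undefined {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (hit.expiresAt <= now()) {
        entries.delete(key); // expired
        return undefined;
      }
      return hit.value;
    },
    set(key: string, value: string): void {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}
```

A route could check get before calling the API and call set with the full completion text afterwards. Caching only makes sense for requests sent with temperature: 0, and even then a cached reply may differ slightly from what a fresh call would return.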

Production Deployment Strategies

Security Best Practices

Beyond server-side-only API calls, production deployments should rate-limit the /api/chat route to prevent abuse. Libraries like @upstash/ratelimit (with Vercel KV) or Vercel's native rate limiting, both of which have documented App Router compatibility, can throttle requests per IP. A reasonable starting default is 10 requests per minute per IP. Input sanitization should strip or reject excessively long messages, and the max_tokens parameter should always be set to cap completion length and control costs.
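
To illustrate what per-IP throttling does, here is a minimal fixed-window limiter. It is a teaching sketch with hypothetical names, not a substitute for the libraries mentioned above: its counters live in process memory, so they are not shared across serverless instances and reset on every cold start.

```typescript
// Hypothetical fixed-window rate limiter keyed by client identifier.
export function createRateLimiter(
  limit: number,
  windowMs: number,
  now: () => number = Date.now
) {
  const windows = new Map<string, { start: number; count: number }>();
  return function allow(clientKey: string): boolean {
    const t = now();
    const current = windows.get(clientKey);
    // Start a new window on first request or once the old one has elapsed
    if (!current || t - current.start >= windowMs) {
      windows.set(clientKey, { start: t, count: 1 });
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  };
}
```

With the suggested default, the limiter would be created as createRateLimiter(10, 60_000), and the route would return a 429 response whenever allow(ip) is false.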

Deploying to Vercel

Add environment variables (DEEPSEEK_API_KEY, DEEPSEEK_BASE_URL) through the Vercel dashboard under Project Settings > Environment Variables. For the route handler runtime, the default Node.js runtime supports streaming on Vercel, but function timeouts apply (10s on Hobby and 60s on Pro at the time of writing; check Vercel's current limits, as they vary by plan and have changed over time). For long-running streams, add export const maxDuration = 60 to the route file and confirm your plan's limit. The Edge Runtime is an option for lower latency at the edge, but the OpenAI SDK's full feature set is better tested against the Node.js runtime. Streaming compatibility on Vercel works out of the box with the ReadableStream pattern shown above.

Monitoring and Observability

The non-streaming response includes a usage object with prompt_tokens and completion_tokens fields. Logging these values per request enables cost tracking and anomaly detection. The production route handler above passes stream_options: { include_usage: true } in the streaming API call, so the final stream chunk contains a populated usage field with prompt_tokens and completion_tokens, which is logged server-side. Setting up cost alerts in the DeepSeek dashboard prevents unexpected billing spikes.
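
Logged token counts can be converted into an approximate cost per request. In this sketch the function name and the Pricing shape are assumptions, and the rates passed in are placeholders that must be replaced with the values from DeepSeek's pricing page.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Prices in USD per one million tokens (caller supplies current rates).
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Hypothetical helper: estimates the cost of one request from its usage data.
export function estimateCostUSD(usage: Usage, pricing: Pricing): number {
  return (
    (usage.prompt_tokens * pricing.inputPerMillion +
      usage.completion_tokens * pricing.outputPerMillion) /
    1_000_000
  );
}
```

The estimate ignores discounts for cached input tokens, so it should be read as an upper bound rather than the billed amount.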

Implementation Checklist

☐ DeepSeek API key obtained and stored in .env.local
☐ .env.local verified in .gitignore
☐ OpenAI SDK v4.28+ installed and configured with DeepSeek base URL
☐ tsconfig.json path alias @/* confirmed
☐ API key startup guard and base URL SSRF validation in lib/deepseek.ts
☐ Next.js API route created with streaming support, input validation, and error handling
☐ Error handling using instanceof APIError with safe status fallback for 400, 401, 429, and 500 responses
☐ React chat component with real-time stream consumption, error display, and unmount cleanup
☐ Stable unique keys on message list items
☐ System prompt and conversation context management
☐ Request cancellation with AbortController and AbortError handling
☐ Rate limiting on API route
☐ Input sanitization and max token caps
☐ Token usage logging and cost monitoring
☐ Environment variables configured in deployment platform
☐ Streaming verified in production environment

What Comes Next

The most immediate extension is multi-model switching between deepseek-chat and deepseek-reasoner. Add a model selector to the MessageInput component, pass the selected model name through the request body, and validate it server-side against an allowlist of supported models. Beyond that, persistent chat history with a database backend (Postgres via Prisma, or a simpler key-value store like Vercel KV) and per-user authentication open the door to usage tracking and conversation management. The DeepSeek API documentation covers additional capabilities including fill-in-the-middle (FIM) completions for code completion scenarios and function calling for tool-use workflows.
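
The server-side allowlist for the model selector could be sketched as follows. The helper name resolveModel is hypothetical, and falling back to deepseek-chat for unknown values is one possible design choice; rejecting the request with a 400 response would be equally reasonable.

```typescript
const SUPPORTED_MODELS = ["deepseek-chat", "deepseek-reasoner"] as const;
type SupportedModel = (typeof SUPPORTED_MODELS)[number];

// Hypothetical helper: returns the requested model only if it is on the allowlist.
export function resolveModel(
  requested: unknown,
  fallback: SupportedModel = "deepseek-chat"
): SupportedModel {
  return typeof requested === "string" &&
    (SUPPORTED_MODELS as readonly string[]).includes(requested)
    ? (requested as SupportedModel)
    : fallback;
}
```

In the route handler, the result of resolveModel(body?.model) would replace the hard-coded model string, so a client can never steer the request to an arbitrary model name.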

Matt Mickiewicz

Matt is the co-founder of SitePoint, 99designs and Flippa. He lives in Vancouver, Canada.