VOOZH about

URL: https://developers.openai.com/api/docs/guides/streaming-responses

⇱ Streaming API responses | OpenAI API


Search the API docs

Primary navigation

Evaluation

Legacy APIs

By default, when you make a request to the OpenAI API, we generate the model’s entire output before sending it back in a single HTTP response. When generating long outputs, waiting for a response can take time. Streaming responses lets you start printing or processing the beginning of the model’s output while it continues generating the full response.

This guide focuses on HTTP streaming (stream=true) over server-sent events (SSE). For persistent WebSocket transport with incremental inputs via previous_response_id, see the Responses API WebSocket mode.

Enable streaming

To start streaming responses, set stream=True in your request to the Responses endpoint:

The Responses API uses semantic events for streaming. Each event is typed with a predefined schema, so you can listen for events you care about.

For a full list of event types, see the API reference for streaming. Here are a few examples:

Streaming Chat Completions is fairly straightforward. However, we recommend using the Responses API for streaming, as we designed it with streaming in mind. The Responses API uses semantic events for streaming and is type-safe.

Stream a chat completion

To stream completions, set stream=True when calling the Chat Completions or legacy Completions endpoints. This returns an object that streams back the response as data-only server-sent events.

The response is sent back incrementally in chunks with an event stream. You can iterate over the event stream with a for loop, like this:

Read the responses

If you’re using our SDK, every event is a typed instance. You can also identity individual events using the type property of the event.

Some key lifecycle events are emitted only once, while others are emitted multiple times as the response is generated. Common events to listen for when streaming text are:

For a full list of events you can listen for, see the API reference for streaming.

When you stream a chat completion, the responses has a delta field rather than a message field. The delta field can hold a role token, content token, or nothing.

To stream only the text response of your chat completion, your code would like this:

Advanced use cases

For more advanced use cases, like streaming tool calls, check out the following dedicated guides:

Moderation risk

Note that streaming the model’s output in a production application makes it more difficult to moderate the content of the completions, as partial completions may be more difficult to evaluate. This may have implications for approved usage.

If you request moderation scores with a generation request, the scores arrive after the full generated output is available. They aren’t included with partial output deltas.

Loading docs agent...