Reliable citations build trust and help readers verify the accuracy of responses. This guide explains how to prepare citable material and how to instruct the model to format citations, using patterns that are familiar to OpenAI models.

Overview

A citation system has many parts: you decide what can be cited, represent that material clearly, instruct the model how to cite it, and validate the result before it is rendered to the user.

This guide covers five core elements experienced directly by the model:

Citable units: Define what the model is allowed to cite.
Material representation: Present the source material in a clear, structured format.
Citation format: Specify the exact format the model should use for citations.
Prompt instructions: Tell the model when to cite and how to do it correctly.
Citation parsing: Extract the citations from the model’s response for downstream use.

Choose citable units

Before writing prompts, clearly define what the model can cite. Common options include:

Citable unit	Best used for	Downside	Example
Document	You only need to show which document the answer came from.	Not very precise.	Cite the entire employee handbook when you only need to show which document supports the claim.
Block / chunk	You want a good balance between simplicity and precision.	Still not exact down to the line.	Cite the specific contract paragraph or retrieved chunk that contains the clause.
Line range	You need to show the exact supporting text.	More difficult for the model.	Cite lines `L42-L47` when the user needs to verify the precise passage.

A good citable unit should be:

Consistent: the same source should keep the same ID across runs.
Easy to inspect: a person should be able to read it and understand the surrounding context.
The right size: large enough to make sense, but small enough to stay precise.

For most systems, block-level citations are the best default. They are usually easier for the model than line-level citations and more useful to users than document-level citations.
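As a minimal sketch of block-level citable units, the snippet below splits a document on blank lines and numbers the paragraphs in order. The `Block` type, the `split_into_blocks` helper, and the `block<N>` ID scheme are illustrative assumptions, not part of any API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str  # hypothetical ID scheme: "block1", "block2", ...
    text: str


def split_into_blocks(document: str) -> list[Block]:
    """Split on blank lines and number the non-empty paragraphs in order."""
    paragraphs = [p.strip() for p in document.split("\n\n")]
    paragraphs = [p for p in paragraphs if p]
    return [Block(f"block{i}", p) for i, p in enumerate(paragraphs, start=1)]


handbook = "Employees accrue leave monthly.\n\nUnused leave expires in March."
blocks = split_into_blocks(handbook)
```

Because the IDs depend only on paragraph order, the same unchanged document yields the same IDs on every run. If documents are edited often, an ID derived from the content may hold up better than a positional one.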

Represent citable material

The model cannot cite material that has not been presented clearly. Whether material comes from a tool or is injected directly, ensure it has:

Stable Source ID: Consistent identifier like file1 or block1.
Readable Text: Clearly formatted source material.
Metadata (optional): URLs, timestamps, titles, and similar context.

Source IDs vs. locators: A source ID is a stable identifier, such as block1, that your system assigns and the model emits. A locator is the precise span highlighted in the UI, such as lines L8-L13 or Paragraph 21. In general, the model should emit the source ID, while your system resolves or renders the locator. Mixing the two too early tends to increase formatting errors.
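One way to keep the two separate is to let the model emit only the source ID and have the application look up the locator afterwards. The sketch below assumes a hypothetical in-memory `SOURCE_REGISTRY` and `resolve_locator` helper; the file name and line ranges are made up for illustration.

```python
from typing import Optional

# Hypothetical registry built by your system (never by the model): maps each
# source ID to the document it came from and the line range it covers.
SOURCE_REGISTRY = {
    "block1": {"document": "contract.txt", "first_line": 8, "last_line": 13},
    "block2": {"document": "contract.txt", "first_line": 14, "last_line": 21},
}


def resolve_locator(source_id: str) -> Optional[str]:
    """Turn a source ID emitted by the model into a display locator."""
    entry = SOURCE_REGISTRY.get(source_id)
    if entry is None:
        return None  # unknown ID: treat the citation as invalid
    return f"{entry['document']} L{entry['first_line']}-L{entry['last_line']}"
```

Returning `None` for an unknown ID gives the caller a simple hook for dropping or flagging citations that do not correspond to any presented source.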

Define citation format

You need to define the citation format that the model will generate. Use a format that is explicit, consistent, and easy for the model to reproduce reliably.

The table below lists the pieces of the recommended citation format and the recommended marker values. We strongly recommend these markers because they closely match the markers our models are trained on. If you choose different marker values, keep the overall citation format as similar as possible.

Piece	What it does	Recommended
`CITATION_START`	Opens the citation marker.	`\ue200`
Citation family	Identifies the citation type. Use `cite` for all supported sources.	`cite`
`CITATION_DELIMITER`	Separates fields inside the marker.	`\ue202`
Source ID	Identifies the cited unit. `turn#` is the turn number. `item#` is the specific file, block, or URL.	`turn0file1`, `turn0block1`, `turn0url1`
Locator (optional)	Narrows the citation to a precise span.	`L8-L13`
`CITATION_STOP`	Closes the citation marker.	`\ue201`

For tool calls, turnN increments once per tool invocation, not once per individual result. Within a single invocation, sources are distinguished by suffixes such as file0, file1, and so on. In a single-response system, all references will be turn0… only if the model makes exactly one tool call before answering. If it makes multiple tool calls, you may instead see references like turn0fileX, turn1fileX, and so on.
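The numbering rule can be sketched as follows. `assign_source_ids` is a hypothetical helper, and the input is assumed to hold one inner list of results per tool invocation.

```python
def assign_source_ids(tool_calls: list[list[str]], kind: str = "file") -> list[list[str]]:
    """Give every result an ID of the form turn<N><kind><M>.

    N counts tool invocations; M counts results within one invocation.
    """
    return [
        [f"turn{turn}{kind}{item}" for item in range(len(results))]
        for turn, results in enumerate(tool_calls)
    ]


# Two invocations: the first returns two files, the second returns one.
ids = assign_source_ids([["a.txt", "b.txt"], ["c.txt"]])
```

Here `ids` is `[["turn0file0", "turn0file1"], ["turn1file0"]]`: the turn number changes only between invocations, while the item suffix restarts inside each one.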

Template

Example

If your system does not use locators, omit that field:
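A minimal sketch of assembling the marker, assuming the fields appear in the order listed in the table above and that `CITATION_DELIMITER` separates every field, including the optional locator. `format_citation` is a hypothetical helper name.

```python
from typing import Optional

CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"


def format_citation(source_id: str, locator: Optional[str] = None) -> str:
    """Build one citation marker; the locator field is dropped when absent."""
    fields = ["cite", source_id]
    if locator is not None:
        fields.append(locator)
    return CITATION_START + CITATION_DELIMITER.join(fields) + CITATION_STOP


with_locator = format_citation("turn0file1", "L8-L13")
without_locator = format_citation("turn0file1")
```

The marker characters are private-use code points, so they are unlikely to collide with ordinary answer text and are easy to find again when parsing.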

Write effective citation instructions

For the best accuracy, use citation patterns the model already knows. Custom or unfamiliar formats increase cognitive load on the model, leading to citation errors, especially in:

low-reasoning-effort settings, where the model has less budget to recover from formatting mistakes.
high-complexity tasks, where most of the reasoning budget is spent on solving the task itself rather than cleaning up citation syntax.

Below, we recommend a citation format that is close to patterns the model is familiar with. You can use it as-is or adapt it to fit your own system.

If you want to define your own prompt, define:

the exact marker syntax.
where citations go.
when to cite and when not to cite.
how to cite multiple supports.
what formats are forbidden.
what to do when support is missing.
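The six points above might be covered by a prompt along these lines. The wording is illustrative rather than an official prompt, and it assumes that multiple source IDs share one marker, separated by the same delimiter.

```python
CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"

# Example markers embedded in the prompt so the model sees the exact syntax.
SINGLE = f"{CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_STOP}"
# Assumption: several IDs in one marker are joined with the same delimiter.
MULTI = (
    f"{CITATION_START}cite{CITATION_DELIMITER}turn0file1"
    f"{CITATION_DELIMITER}turn0file2{CITATION_STOP}"
)

CITATION_INSTRUCTIONS = f"""\
Citations
- Syntax: cite with {SINGLE}, copying the source ID exactly as given.
- Placement: put the marker immediately after the sentence it supports.
- When to cite: cite every claim taken from the provided sources. Do not cite greetings or your own reasoning.
- Multiple supports: list all IDs in one marker, for example {MULTI}.
- Forbidden: Markdown links, footnotes, bare URLs, and IDs that do not appear in the sources.
- Missing support: if no source supports a claim, say so and attach no citation.
"""
```

Embedding literal example markers, rather than describing them in words, keeps the prompt aligned with whatever marker values the rest of the system uses.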

Parse citations

Once the model emits citations, you need to extract them from the response text so you can resolve source IDs, render links, or remove the raw markers before showing the answer to users.

The helper below is designed to be copied directly into your application. It parses single-source citations, multi-source citations, and optional line-range locators while preserving character offsets in the original text.
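A minimal sketch of such a helper follows. It assumes the recommended marker values, source IDs shaped like `turn0file1`, and multi-source citations that list several IDs in one marker separated by the delimiter. `Citation`, `parse_citations`, `LOCATOR_RE`, and `MARKER_RE` are names chosen for this sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional

CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"

# Assumed source ID shape: turn<N><kind><M>, for example turn0file1.
SOURCE_ID_RE = re.compile(r"^turn\d+[a-z]+\d+$")
# Line locators only: L8 or L8-L13.
LOCATOR_RE = re.compile(r"^L\d+(?:-L\d+)?$")
MARKER_RE = re.compile(
    re.escape(CITATION_START) + "cite" + re.escape(CITATION_DELIMITER)
    + "(.*?)" + re.escape(CITATION_STOP),
    re.DOTALL,
)


@dataclass(frozen=True)
class Citation:
    start: int  # offset of CITATION_START in the original text
    end: int  # offset just past CITATION_STOP
    source_ids: tuple[str, ...]
    locator: Optional[str] = None


def parse_citations(text: str) -> list[Citation]:
    """Return every well-formed citation marker in text, in order."""
    citations = []
    for match in MARKER_RE.finditer(text):
        fields = match.group(1).split(CITATION_DELIMITER)
        locator = None
        # A trailing field that looks like a line range is the locator.
        if len(fields) > 1 and LOCATOR_RE.match(fields[-1]):
            locator = fields.pop()
        # Skip markers whose remaining fields are not all valid source IDs.
        if not all(SOURCE_ID_RE.match(field) for field in fields):
            continue
        citations.append(Citation(match.start(), match.end(), tuple(fields), locator))
    return citations
```

Offsets refer to the unmodified response text, so `text[c.start:c.end]` recovers the raw marker for any parsed citation `c`. Malformed markers are skipped rather than raised as errors, which is one possible policy among several.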

This example supports line locators only and should be adapted if your system uses a different locator format.

If your source IDs use a different shape, update SOURCE_ID_RE to match your system.
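After extraction, the raw markers still have to be replaced or removed before the answer is displayed. One possible rendering step, sketched here with a hypothetical `render_footnotes` helper, swaps each marker for numbered labels and returns the cited source IDs in order of first appearance. It assumes the recommended marker values and line-range locators.

```python
import re

MARKER_RE = re.compile("\ue200cite\ue202(.*?)\ue201", re.DOTALL)


def render_footnotes(text: str) -> tuple[str, list[str]]:
    """Replace raw markers with [n] labels; return the new text and the cited IDs."""
    order: list[str] = []

    def replace(match: re.Match) -> str:
        labels = []
        for field in match.group(1).split("\ue202"):
            if re.fullmatch(r"L\d+(?:-L\d+)?", field):
                continue  # locators are resolved separately by the UI
            if field not in order:
                order.append(field)
            labels.append(f"[{order.index(field) + 1}]")
        return "".join(labels)

    return MARKER_RE.sub(replace, text), order


raw = "A.\ue200cite\ue202turn0file1\ue202L8-L13\ue201 B.\ue200cite\ue202turn0file2\ue202turn0file1\ue201"
rendered, cited = render_footnotes(raw)
```

A source cited more than once keeps its first label, so the returned list can be used directly to build a reference list under the answer.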

Examples

The examples below show two common citation patterns:

Retrieved tool context, where your tool returns citable material and IDs.
Injected context, where you provide citable blocks directly in the prompt.

Format citations for retrieved tool context

Use this pattern when the model retrieves context through a tool and cites that retrieved context in its answer.

Define citable units

Choose citable units based on the precision your use case requires.

The examples below show a few recommended tool output formats. The underlying tool may vary by application, but what matters most is that the output is presented in a clear, stable structure like these examples.
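As one illustration of a clear, stable structure, the sketch below renders retrieved files as line-numbered text under a source ID header, which suits line-range citations. The bracketed header, the `L<n>:` prefix, and the `render_tool_output` helper are assumptions made for this example, not a required format.

```python
def render_tool_output(files: dict[str, str], turn: int = 0) -> str:
    """Render retrieved files as line-numbered text, one source ID per file."""
    sections = []
    for index, (name, content) in enumerate(files.items()):
        source_id = f"turn{turn}file{index}"
        numbered = "\n".join(
            f"L{n}: {line}" for n, line in enumerate(content.splitlines(), start=1)
        )
        sections.append(f"[{source_id}] {name}\n{numbered}")
    return "\n\n".join(sections)


output = render_tool_output(
    {"handbook.txt": "Leave accrues monthly.\nUnused leave expires in March."}
)
```

For block-level citations the line prefixes can be dropped and each block given its own ID instead.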

Write prompt instructions

Example output:

Format citations for injected context

Use this pattern when you retrieve or prepare the context ahead of time and inject it directly into the prompt.

Define citable units

For injected context, a common pattern is to wrap source segments in explicit tags with stable reference IDs.

This makes the citable unit explicit and easy for the model to reference.
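A minimal sketch of that wrapping step, assuming a hypothetical `source` tag with an `id` attribute and positional `block<N>` IDs:

```python
def wrap_blocks(blocks: list[str], tag: str = "source") -> str:
    """Wrap each segment in an explicit tag that carries a stable reference ID."""
    wrapped = []
    for index, text in enumerate(blocks, start=1):
        block_id = f"block{index}"
        wrapped.append(f'<{tag} id="{block_id}">\n{text}\n</{tag}>')
    return "\n".join(wrapped)


context = wrap_blocks(
    ["Employees accrue leave monthly.", "Unused leave expires in March."]
)
```

The resulting string can be placed in the prompt next to the citation instructions. The IDs the model is asked to cite should match the `id` values exactly.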

Write prompt instructions

Example output:

Note: OpenAI-hosted tools such as web search provide automatic inline citations. If you want to use hosted tools instead, see the tools overview, web search guide, and file search guide.

URL: https://developers.openai.com/api/docs/guides/citation-formatting

⇱ Citation Formatting | OpenAI API
