
URL: https://developers.openai.com/api/docs/guides/citation-formatting

Citation Formatting | OpenAI API


Reliable citations build trust and help readers verify the accuracy of responses. This guide provides practical guidance on how to prepare citable material and instruct the model to format citations effectively, using patterns that are familiar to OpenAI models.

Overview

A citation system has many parts: you decide what can be cited, represent that material clearly, instruct the model how to cite it, and validate the result before it renders to the user.

This guide covers five core elements experienced directly by the model:

  1. Citable units: Define what the model is allowed to cite.
  2. Material representation: Present the source material in a clear, structured format.
  3. Citation format: Specify the exact format the model should use for citations.
  4. Prompt instructions: Tell the model when to cite and how to do it correctly.
  5. Citation parsing: Extract the citations from the model’s response for downstream use.

Choose citable units

Before writing prompts, clearly define what the model can cite. Common options include:

Citable unit | Best used for | Downside | Example
Document | You only need to show which document the answer came from. | Not very precise. | Cite the entire employee handbook when you only need to show which document supports the claim.
Block / chunk | You want a good balance between simplicity and precision. | Still not exact down to the line. | Cite the specific contract paragraph or retrieved chunk that contains the clause.
Line range | You need to show the exact supporting text. | More difficult for the model. | Cite lines L42-L47 when the user needs to verify the precise passage.

A good citable unit should be:

  • Consistent: the same source should keep the same ID across runs.
  • Easy to inspect: a person should be able to read it and understand the surrounding context.
  • The right size: large enough to make sense, but small enough to stay precise.

For most systems, block-level citations are the best default. They are usually easier for the model than line-level citations and more useful to users than document-level citations.

Represent citable material

The model cannot cite material that has not been presented clearly. Whether material comes from a tool or is injected directly, ensure it has:

  • Stable Source ID: Consistent identifier like file1 or block1.
  • Readable Text: Clearly formatted source material.
  • Metadata (optional): URLs, timestamps, titles, and similar context.

Source IDs vs. locators: A source ID is a stable identifier, such as block1, that your system assigns and the model emits. A locator is the precise span that your UI highlights, such as lines L8-L13 or Paragraph 21. In general, the model should emit the source ID, while your system resolves or renders the locator. Mixing the two too early tends to increase formatting errors.
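
As a rough illustration of the split between source IDs and locators, the sketch below keeps both in one record but shows the model only the ID, text, and title. The names CitableUnit, render_for_model, and resolve_locator, and the bracketed header layout, are assumptions made for this example and are not part of any OpenAI API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CitableUnit:
    """One citable unit (hypothetical record shape)."""
    source_id: str                      # stable ID the model emits, e.g. "block1"
    text: str                           # readable source text shown to the model
    locator: Optional[str] = None       # UI-side span, e.g. "L8-L13"; not shown to the model
    metadata: Dict[str, str] = field(default_factory=dict)  # optional: url, title, timestamp


def render_for_model(units):
    """Render units as plain text: a bracketed source ID header, then the text."""
    blocks = []
    for unit in units:
        title = unit.metadata.get("title")
        header = f"[{unit.source_id}]" + (f" {title}" if title else "")
        blocks.append(f"{header}\n{unit.text}")
    return "\n\n".join(blocks)


def resolve_locator(source_id, registry):
    """Map a source ID emitted by the model to the locator your system owns."""
    unit = registry.get(source_id)
    return unit.locator if unit else None
```

Keeping the locator out of the rendered text means the model only has to reproduce a short ID, and the application can change how it highlights spans without touching the prompt.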

Define citation format

You need to define the citation format that the model will generate. Use a format that is explicit, consistent, and easy for the model to reproduce reliably.

Below is our recommended citation format and its markers. We recommend these markers because they closely match the markers our models are trained on. If you choose different marker values, keep the overall citation format as similar as possible.

Piece | What it does | Recommended
CITATION_START | Opens the citation marker. | \ue200
Citation family | Identifies the citation type. Use cite for all supported sources. | cite
CITATION_DELIMITER | Separates fields inside the marker. | \ue202
Source ID | Identifies the cited unit. turn# is the turn number. item# is the specific file, block, or URL. | turn0file1, turn0block1, turn0url1
Locator (optional) | Narrows the citation to a precise span. | L8-L13
CITATION_STOP | Closes the citation marker. | \ue201

For tool calls, turnN increments once per tool invocation, not once per individual result. Within a single invocation, sources are distinguished by suffixes such as file0, file1, and so on. In a single-response system, all references will be turn0… only if the model makes exactly one tool call before answering. If it makes multiple tool calls, you may instead see references like turn0fileX, turn1fileX, and so on.
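
The numbering rule can be sketched as follows. This is a minimal illustration that assumes your application assigns the IDs itself; the function name assign_source_ids and the input shape are hypothetical. Item suffixes start at 0 here, following the file0, file1 wording above.

```python
def assign_source_ids(tool_calls, kind="file"):
    """Assign source IDs to tool results.

    tool_calls: a list with one inner list of results per tool invocation.
    The turn number increments once per invocation; the item suffix
    restarts at 0 inside each invocation.
    """
    ids = []
    for turn, results in enumerate(tool_calls):
        ids.append([f"turn{turn}{kind}{item}" for item in range(len(results))])
    return ids


# Two invocations: the first returns two results, the second returns one.
ids = assign_source_ids([["result a", "result b"], ["result c"]])
# ids == [["turn0file0", "turn0file1"], ["turn1file0"]]
```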

Template

Example

If your system does not use locators, omit that field:
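
As a hedged sketch, the pieces listed above can be assembled into a marker like this. The field order (citation family, source ID, optional locator) is assumed from the order in which the pieces are listed, and format_citation is a hypothetical helper name.

```python
CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"


def format_citation(source_id, locator=None, family="cite"):
    """Build one citation marker; the locator field is omitted when not given."""
    fields = [family, source_id]
    if locator:
        fields.append(locator)
    return CITATION_START + CITATION_DELIMITER.join(fields) + CITATION_STOP


with_locator = format_citation("turn0file1", "L8-L13")
# "\ue200cite\ue202turn0file1\ue202L8-L13\ue201"

without_locator = format_citation("turn0file1")
# "\ue200cite\ue202turn0file1\ue201"
```

The marker characters are private-use code points, so they are unlikely to collide with ordinary text in a response.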

Write effective citation instructions

To maintain maximum accuracy, use familiar citation patterns. Custom or unfamiliar formats increase cognitive load on the model, leading to citation errors, especially with:

  • low reasoning effort, where the model has less budget to recover from formatting mistakes.
  • high-complexity tasks, where most of the reasoning budget is spent on solving the task itself rather than cleaning up citation syntax.

Below, we recommend a citation format that is close to patterns the model is familiar with. You can use it as-is or adapt it to fit your own system.

If you want to define your own prompt, define:

  • the exact marker syntax.
  • where citations go.
  • when to cite and when not to cite.
  • how to cite multiple supports.
  • what formats are forbidden.
  • what to do when support is missing.
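
One way to keep a custom prompt complete is to generate it from the checklist above, one instruction per item. The sketch below is illustrative only: the wording is not OpenAI's recommended prompt, build_citation_instructions is a hypothetical helper, and the multi-source form (several source IDs inside one marker) is an assumption.

```python
def build_citation_instructions(start="\ue200", delim="\ue202", stop="\ue201"):
    """Return citation instructions, one line per checklist item (illustrative wording)."""
    single = f"{start}cite{delim}<source_id>{stop}"
    multi = f"{start}cite{delim}<source_id>{delim}<source_id>{stop}"
    lines = [
        # the exact marker syntax
        f"Cite sources using exactly this marker syntax: {single}",
        # where citations go
        "Place each citation immediately after the sentence or clause it supports.",
        # when to cite and when not to cite
        "Cite claims taken from the provided sources; do not cite your own reasoning.",
        # how to cite multiple supports (assumed form)
        f"When several sources support one claim, use a single marker: {multi}",
        # what formats are forbidden
        "Do not use Markdown links, footnotes, bracketed numbers, or bare URLs as citations.",
        # what to do when support is missing
        "If no provided source supports a claim, say so instead of inventing a citation.",
    ]
    return "\n".join(lines)
```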

Parse citations

Once the model emits citations, you need to extract them from the response text so you can resolve source IDs, render links, or remove the raw markers before showing the answer to users.

The helper below is designed to be copied directly into your application. It parses single-source citations, multi-source citations, and optional line-range locators while preserving character offsets in the original text.

This example supports line locators only and should be adapted if your system uses a different locator format.

If your source IDs use a different shape, update SOURCE_ID_RE to match your system.
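
The sketch below is a minimal parser written for this guide's marker format; it is not the original helper. It assumes that each field after the citation family is either a source ID or a line locator, and that a locator belongs to the source ID just before it. Apart from SOURCE_ID_RE, the names (Citation, LOCATOR_RE, MARKER_RE, parse_citations, strip_citations) are hypothetical, and the SOURCE_ID_RE pattern itself is an assumption based on IDs such as turn0file1.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"

# Assumed ID shape: turn<N><kind><M>, e.g. turn0file1, turn1url0.
SOURCE_ID_RE = re.compile(r"^turn\d+[a-z]+\d+$")
# Line locators only: L8 or L8-L13.
LOCATOR_RE = re.compile(r"^L(\d+)(?:-L(\d+))?$")
MARKER_RE = re.compile(
    re.escape(CITATION_START) + "cite" + re.escape(CITATION_DELIMITER)
    + "(.*?)" + re.escape(CITATION_STOP),
    re.DOTALL,
)


@dataclass
class Citation:
    source_id: str
    line_start: Optional[int]
    line_end: Optional[int]
    start: int  # offset of the marker's first character in the original text
    end: int    # offset one past the marker's last character


def parse_citations(text: str) -> List[Citation]:
    """Extract citations, keeping character offsets into the original text."""
    citations: List[Citation] = []
    for match in MARKER_RE.finditer(text):
        current = None
        for raw in match.group(1).split(CITATION_DELIMITER):
            value = raw.strip()
            locator = LOCATOR_RE.match(value)
            if locator and current is not None and current.line_start is None:
                # Attach the locator to the source ID that precedes it.
                current.line_start = int(locator.group(1))
                current.line_end = int(locator.group(2) or locator.group(1))
            elif SOURCE_ID_RE.match(value):
                current = Citation(value, None, None, match.start(), match.end())
                citations.append(current)
            # Any other field is treated as malformed and skipped.
    return citations


def strip_citations(text: str) -> str:
    """Remove raw markers before showing the answer to users."""
    return MARKER_RE.sub("", text)
```

Sources cited in the same marker share the same start and end offsets, which lets a renderer replace the whole marker with one grouped link.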

Examples

The examples below show two common citation patterns:

  • Retrieved tool context, where your tool returns citable material and IDs.
  • Injected context, where you provide citable blocks directly in the prompt.

Format citations for retrieved tool context

Use this pattern when the model retrieves context through a tool and cites that retrieved context in its answer.

Define citable units

Choose citable units based on the precision your use case requires. The examples below show a few recommended tool output formats. The underlying tool may vary by application, but what matters most is that the output is presented in a clear, stable structure like these examples.
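
As one possible shape, the sketch below renders tool results as JSON with a source ID per result, and optionally prefixes each line with an L-number for systems that cite line ranges. The JSON keys, the function names format_tool_output and number_lines, and the "L1: " prefix style are assumptions for illustration and are not a required schema.

```python
import json


def number_lines(text):
    """Prefix each line with L<n> so the model can cite line ranges."""
    return "\n".join(f"L{n}: {line}" for n, line in enumerate(text.splitlines(), start=1))


def format_tool_output(turn, results, with_line_numbers=False):
    """Render one tool invocation's results with stable source IDs.

    results: list of dicts with a required 'text' key and optional 'title' and 'url'.
    """
    items = []
    for item, result in enumerate(results):
        text = number_lines(result["text"]) if with_line_numbers else result["text"]
        items.append({
            "id": f"turn{turn}file{item}",  # the source ID the model should cite
            "title": result.get("title", ""),
            "url": result.get("url", ""),
            "text": text,
        })
    return json.dumps({"results": items}, indent=2, ensure_ascii=False)
```

Document-level and block-level citations can use the plain text form. The line-numbered form is only needed when the answer must point at an exact span.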

Write prompt instructions

Example output:

Format citations for injected context

Use this pattern when you retrieve or prepare the context ahead of time and inject it directly into the prompt.

Define citable units

For injected context, a common pattern is to wrap source segments in explicit tags with stable reference IDs.

This makes the citable unit explicit and easy for the model to reference.
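
A minimal sketch of this wrapping step is shown below. The tag name block, the id attribute, and the function name wrap_blocks are assumptions; any tag works as long as it is used consistently and the reference IDs stay the same across runs.

```python
def wrap_blocks(blocks):
    """Wrap each source segment in an explicit tag that carries its reference ID.

    blocks: dict mapping a stable reference ID (e.g. "block1") to its text.
    """
    parts = []
    for ref_id, text in blocks.items():
        if "</block>" in text:
            # A closing tag inside the text would blur the unit's boundary.
            raise ValueError(f"{ref_id} contains a closing tag")
        parts.append(f'<block id="{ref_id}">\n{text.strip()}\n</block>')
    return "\n\n".join(parts)


context = wrap_blocks({
    "block1": "Refunds are issued within 30 days of purchase.",
    "block2": "Shipping fees are not refundable.",
})
```

The IDs are passed in and not derived from position, so reordering the blocks does not change which ID refers to which text.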

Write prompt instructions

Example output:

Note: OpenAI-hosted tools such as web search provide automatic inline citations. If you want to use hosted tools instead, see the tools overview, web search guide, and file search guide.
