Use OpenAI moderation models to detect harmful content in text and images. You can classify standalone inputs with the moderation endpoint or request moderation scores alongside a generated response. Use the results to enforce your application’s policy, such as filtering content, routing a request for review, or intervening with accounts that submit flagged content.
The omni-moderation-latest model accepts text and image inputs. It doesn’t classify audio. The moderation endpoint is free to use, and image files can be up to 20 MB.
Choose a moderation workflow
| Workflow | Use when |
|---|---|
| Moderate generated content | Your application generates text with the Responses API or Chat Completions API and needs moderation signals. |
| Classify standalone inputs | Your application needs to classify text or images without generating a model response. |
| Understand moderation results | Your application needs to interpret flags, categories, scores, or applied input types. |
| Review supported categories | Your application needs to know which harm categories apply to text, images, or both. |
Moderate generated content
When your application needs generated text and moderation scores together, pass a top-level moderation object in the generation request. The API returns moderation scores for the model input and generated output without a separate moderation request.
The model still generates normally. Review the moderation results before you show the output to a user or take downstream actions.
To get moderation scores from the Responses API, set moderation.model when you create a response.
The Responses API returns an input moderation_result object at response.moderation.input and an output moderation_result object at response.moderation.output.
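The following is a minimal sketch that assumes you handle the request and response as plain dictionaries. It builds a Responses API request body with a top-level moderation object and returns the two moderation_result objects at moderation.input and moderation.output. The helper names and the generation model you pass in are illustrative; only omni-moderation-latest comes from this page.

```python
from typing import Any

# Moderation model named in this guide.
MODERATION_MODEL = "omni-moderation-latest"


def build_response_request(model: str, user_input: str) -> dict[str, Any]:
    """Return a Responses API request body with inline moderation enabled.

    Illustrative helper, not part of any SDK. `model` is whichever
    generation model your application uses.
    """
    return {
        "model": model,
        "input": user_input,
        # Top-level moderation object; moderation.model selects the classifier.
        "moderation": {"model": MODERATION_MODEL},
    }


def read_response_moderation(
    response: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return the (input, output) moderation_result objects from a response."""
    moderation = response["moderation"]
    return moderation["input"], moderation["output"]
```

If your installed OpenAI Python library doesn't expose a typed moderation argument, one option is its generic extra_body request option, for example client.responses.create(model=..., input=..., extra_body={"moderation": {"model": "omni-moderation-latest"}}). Check your library version before relying on either form.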
To get moderation scores from Chat Completions, set moderation.model when you create a chat completion.
Chat Completions returns moderation result containers at completion.moderation.input and completion.moderation.output. For a request with one generated choice, read the input result at completion.moderation.input.results[0] and the output result at completion.moderation.output.results[0]. If you request multiple choices, completion.moderation.output.results[i] corresponds to completion.choices[i].
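Because output results line up with choices by index, you can pair them directly. This sketch assumes the completion is available as a plain dictionary; the helper names are hypothetical, and n is the standard Chat Completions parameter for requesting several choices.

```python
from typing import Any


def build_chat_request(
    model: str, messages: list[dict[str, str]], n: int = 1
) -> dict[str, Any]:
    """Return a Chat Completions request body with inline moderation enabled."""
    return {
        "model": model,
        "messages": messages,
        "n": n,  # number of generated choices
        "moderation": {"model": "omni-moderation-latest"},
    }


def chat_input_moderation(completion: dict[str, Any]) -> dict[str, Any]:
    """Return the moderation result for the request input."""
    return completion["moderation"]["input"]["results"][0]


def pair_choices_with_moderation(
    completion: dict[str, Any],
) -> list[tuple[dict[str, Any], dict[str, Any]]]:
    """Pair choices[i] with moderation.output.results[i]."""
    choices = completion["choices"]
    results = completion["moderation"]["output"]["results"]
    if len(choices) != len(results):
        raise ValueError("expected one output moderation result per choice")
    return list(zip(choices, results))
```

Checking the lengths first means a mismatch fails loudly instead of silently pairing a choice with the wrong result.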
Inline moderation results use the same category fields as a standalone moderation result. Start with flagged for a first-pass decision, then inspect categories and category_scores for logging, routing, audit trails, or human-review queues. A refusal or other safety-aware response can still trigger a flag if it discusses harmful content. Treat moderation scores as signals for your application’s policy, not as an automatic blocking decision.
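One way to apply this advice is to use flagged as a first pass and then add your own per-category score thresholds that send content to review rather than block it. The thresholds below are hypothetical placeholders, and "review" and "allow" are labels your application would map to its own actions.

```python
from typing import Any

# Hypothetical review thresholds; tune these for your own policy.
REVIEW_THRESHOLDS = {"violence": 0.5, "self-harm": 0.3}


def route(
    result: dict[str, Any],
    review_thresholds: dict[str, float] = REVIEW_THRESHOLDS,
) -> tuple[str, list[str]]:
    """Return (action, reasons) for one moderation result.

    Reasons list the categories that the model flagged, followed by any
    category whose score meets a custom review threshold.
    """
    reasons = [name for name, hit in result["categories"].items() if hit]
    scores = result["category_scores"]
    for name, threshold in review_thresholds.items():
        if scores.get(name, 0.0) >= threshold and name not in reasons:
            reasons.append(name)
    if result["flagged"] or reasons:
        return "review", reasons
    return "allow", reasons
```

Keeping the reasons list alongside the action gives you something concrete to log or attach to a human-review queue item.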
Check the moderation result type before you read scores if your application needs to handle moderation failures. If a moderation step can’t complete, the corresponding input or output moderation field can contain an error instead of moderation scores.
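The following sketch shows a defensive read. Because this page doesn't describe the exact shape of a moderation error, it assumes that a completed result carries category_scores and treats any field without them as a failed moderation step. In your own code, prefer the documented result type. Whether to hold output when moderation fails (fail closed) is a policy choice, so it is a parameter here.

```python
from typing import Any, Optional


def scores_or_none(result: dict[str, Any]) -> Optional[dict[str, float]]:
    """Return category scores, or None if the field holds an error instead.

    Assumption: only completed results include a category_scores dict.
    """
    scores = result.get("category_scores")
    return scores if isinstance(scores, dict) else None


def should_hold(result: dict[str, Any], fail_closed: bool = True) -> bool:
    """Return True if the output should wait for review before display."""
    if scores_or_none(result) is None:
        # Moderation didn't complete: hold only if your policy fails closed.
        return fail_closed
    return bool(result.get("flagged", False))
```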
For tool-calling requests, moderation covers tool-call arguments and tool outputs when they appear in conversation content. It doesn’t cover tool names, tool descriptions, tool schemas, or response-format schemas.
If you stream a generated response, moderation scores arrive after the full generated output is available. They aren’t included with partial output deltas.
Classify standalone inputs
Use the moderation endpoint to classify text or image inputs without generating a model response. You can call it through the OpenAI libraries with the omni-moderation-latest model.
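Here is a minimal sketch of a standalone call. It uses the moderations.create method of the OpenAI Python library, with a text part and an optional image_url part in the input list. The client is passed in so the helper works with an openai.OpenAI instance or any object that has the same method. The helper names are illustrative.

```python
from typing import Any, Optional


def build_moderation_input(
    text: Optional[str] = None, image_url: Optional[str] = None
) -> list[dict[str, Any]]:
    """Build a multimodal input list for the moderation endpoint."""
    parts: list[dict[str, Any]] = []
    if text:
        parts.append({"type": "text", "text": text})
    if image_url:
        # URL of the image to classify.
        parts.append({"type": "image_url", "image_url": {"url": image_url}})
    if not parts:
        raise ValueError("provide text, an image URL, or both")
    return parts


def moderate(
    client: Any, text: Optional[str] = None, image_url: Optional[str] = None
) -> Any:
    """Classify inputs with omni-moderation-latest and return the first result."""
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=build_moderation_input(text, image_url),
    )
    return response.results[0]
```

For example, with from openai import OpenAI, you can call result = moderate(OpenAI(), text="...", image_url="https://...") and then read result.flagged and result.category_scores.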
Understand moderation results
For example, when the input is an image of a single frame from a war movie, the model identifies indicators of violence in the image and returns a violence category score greater than 0.8.
The JSON response includes fields that describe which categories are present in the input and the model’s confidence in each category.
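| Output field | Description |
|---|---|
| flagged | Set to true if the model classifies the content as potentially harmful, false otherwise. |
| categories | Contains a dictionary of per-category violation flags. For each category, the value is true if the model flags the corresponding category as violated, false otherwise. |
| category_scores | Contains a dictionary of per-category scores. Each score represents the model’s confidence that the input contains content in the category. The value is between 0 and 1, where higher values denote higher confidence. |
| category_applied_input_types | Contains the input types that each category score applies to. For example, if both the image and text inputs contribute to the violence/graphic score, the value for violence/graphic is ["image", "text"]. |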
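To turn these fields into a compact log line or review note, you can list only the flagged categories together with their scores and applied input types. This sketch assumes a single result as a plain dictionary with the fields described in the table; summarize is a hypothetical helper name.

```python
from typing import Any


def summarize(result: dict[str, Any]) -> list[tuple[str, float, list[str]]]:
    """List flagged categories as (name, score, input types), highest score first."""
    applied = result.get("category_applied_input_types", {})
    rows = [
        (name, result["category_scores"][name], applied.get(name, []))
        for name, hit in result["categories"].items()
        if hit
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```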
We plan to continuously upgrade the moderation endpoint’s underlying model. Therefore, custom policies that rely on category_scores may need recalibration over time.
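One way to make recalibration visible is to store your custom thresholds together with the model value that the moderation response reported when you tuned them, and then warn when that value changes. This is a sketch under an assumption: it only catches upgrades if the response’s model field identifies a specific model version rather than only an alias. The calibration dictionary layout and the function name are hypothetical.

```python
import warnings
from typing import Any


def over_threshold(
    response: dict[str, Any], calibration: dict[str, Any]
) -> list[list[str]]:
    """Apply custom per-category thresholds to each result in a response.

    calibration = {"model": <model value seen when tuning>,
                   "thresholds": {category: score}}
    """
    if response.get("model") != calibration["model"]:
        warnings.warn(
            f"thresholds calibrated on {calibration['model']!r}, "
            f"response came from {response.get('model')!r}; consider recalibrating"
        )
    thresholds = calibration["thresholds"]
    return [
        [
            name
            for name, limit in thresholds.items()
            if result["category_scores"].get(name, 0.0) >= limit
        ]
        for result in response["results"]
    ]
```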
Review supported categories
The table below describes the content categories that the moderation endpoint can detect and the input types that each category supports.
Categories marked as “Text only” do not support image inputs. If you send only images (without accompanying text) to the omni-moderation-latest model, it will return a score of 0 for these unsupported categories. Image files are limited to 20 MB.
| Category | Description | Inputs |
|---|---|---|
| harassment | Content that expresses, incites, or promotes harassing language towards any target. | Text only |
| harassment/threatening | Harassment content that also includes violence or serious harm towards any target. | Text only |
| hate | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment. | Text only |
| hate/threatening | Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. | Text only |
| illicit | Content that gives advice or instruction on how to commit illicit acts. A phrase like “how to shoplift” would fit this category. | Text only |
| illicit/violent | The same types of content flagged by the illicit category, but that also include references to violence or procuring a weapon. | Text only |
| self-harm | Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. | Text and images |
| self-harm/intent | Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders. | Text and images |
| self-harm/instructions | Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts. | Text and images |
| sexual | Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness). | Text and images |
| sexual/minors | Sexual content that includes an individual who is under 18 years old. | Text only |
| violence | Content that depicts death, violence, or physical injury. | Text and images |
| violence/graphic | Content that depicts death, violence, or physical injury in graphic detail. | Text and images |
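Because text-only categories score 0 for image-only input, a 0 in those categories means “not evaluated” rather than “low risk.” The following sketch encodes the table above as a lookup. Given the input types you sent, it lists the categories the model couldn’t score. The names SUPPORTED_INPUTS and not_evaluated are illustrative.

```python
# Supported input types per category, copied from the table above.
_TEXT = frozenset({"text"})
_TEXT_AND_IMAGE = frozenset({"text", "image"})

SUPPORTED_INPUTS = {
    "harassment": _TEXT,
    "harassment/threatening": _TEXT,
    "hate": _TEXT,
    "hate/threatening": _TEXT,
    "illicit": _TEXT,
    "illicit/violent": _TEXT,
    "self-harm": _TEXT_AND_IMAGE,
    "self-harm/intent": _TEXT_AND_IMAGE,
    "self-harm/instructions": _TEXT_AND_IMAGE,
    "sexual": _TEXT_AND_IMAGE,
    "sexual/minors": _TEXT,
    "violence": _TEXT_AND_IMAGE,
    "violence/graphic": _TEXT_AND_IMAGE,
}


def not_evaluated(input_types: set[str]) -> list[str]:
    """Return categories that none of the sent input types can be scored for."""
    return sorted(
        name
        for name, supported in SUPPORTED_INPUTS.items()
        if not supported & input_types
    )
```

For an image-only request, not_evaluated({"image"}) returns the seven text-only categories, which you may want to exclude from “safe” decisions or check separately with text.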
