Voozh

Use OpenAI moderation models to detect harmful content in text and images. You can classify standalone inputs with the moderation endpoint or request moderation scores alongside a generated response. Use the results to enforce your application’s policy, such as filtering content, routing a request for review, or intervening with accounts that submit flagged content.

The omni-moderation-latest model accepts text and image inputs. It doesn’t classify audio. The moderation endpoint is free to use, and image files can be up to 20 MB.

Choose a moderation workflow

Workflow	Use when
Moderate generated content	Your application generates text with the Responses API or Chat Completions API and needs moderation signals.
Classify standalone inputs	Your application needs to classify text or images without generating a model response.
Understand moderation results	Your application needs to interpret flags, categories, scores, or applied input types.
Review supported categories	Your application needs to know which harm categories apply to text, images, or both.

Moderate generated content

When your application needs generated text and moderation scores together, pass a top-level moderation object in the generation request. The API returns moderation scores for the model input and generated output without a separate moderation request.

The model still generates normally. Review the moderation results before you show the output to a user or take downstream actions.

Set moderation.model when you create a response:

The Responses API returns an input moderation_result object at response.moderation.input and an output moderation_result object at response.moderation.output.

Set moderation.model when you create a chat completion:

Chat Completions returns moderation result containers at completion.moderation.input and completion.moderation.output. For a request with one generated choice, read the first input and output result at results[0]. If you request multiple choices, completion.moderation.output.results[i] corresponds to completion.choices[i].

Inline moderation results use the same category fields as a standalone moderation result. Start with flagged for a first-pass decision, then inspect categories and category_scores for logging, routing, audit trails, or human-review queues. A refusal or other safety-aware response can still trigger a flag if it discusses harmful content. Treat moderation scores as signals for your application’s policy, not as an automatic blocking decision.

Check the moderation result type before you read scores if your application needs to handle moderation failures. If a moderation step can’t complete, the corresponding input or output moderation field can contain an error instead of moderation scores.

For tool-calling requests, moderation covers tool-call arguments and tool outputs when they appear in conversation content. It doesn’t cover tool names, tool descriptions, tool schemas, or response-format schemas.

If you stream a generated response, moderation scores arrive after the full generated output is available. They aren’t included with partial output deltas.

Classify standalone inputs

Use the moderation endpoint to classify text or image inputs without generating a model response. The tabs below show how to use the OpenAI libraries and the omni-moderation-latest model:

Moderate text inputs

Moderate images and text

Understand moderation results

Here’s a full example output for an image from a single frame of a war movie. The model identifies indicators of violence in the image, with a violence category score greater than 0.8.

The JSON response includes fields that describe which categories are present in the input and the model’s confidence in each category.

Output category	Description
`flagged`	Set to `true` if the model classifies the content as potentially harmful, `false` otherwise.
`categories`	Contains a dictionary of per-category violation flags. For each category, the value is `true` if the model flags the corresponding category as violated, `false` otherwise.
`category_scores`	Contains a dictionary of per-category scores. Each score represents the model’s confidence that the input contains content in the category. The value is between 0 and 1, where higher values denote higher confidence.
`category_applied_input_types`	Contains the input types that the category score applies to. For example, if the `violence/graphic` category applies to both image and text inputs, the `violence/graphic` property is set to `["image", "text"]`.

We plan to continuously upgrade the moderation endpoint’s underlying model. Therefore, custom policies that rely on category_scores may need recalibration over time.

Review supported categories

The table below describes the content categories that the moderation endpoint can detect and the input types that each category supports.

Categories marked as “Text only” do not support image inputs. If you send only images (without accompanying text) to the omni-moderation-latest model, it will return a score of 0 for these unsupported categories. Image files are limited to 20 MB.

Category	Description	Inputs
`harassment`	Content that expresses, incites, or promotes harassing language towards any target.	Text only
`harassment/threatening`	Harassment content that also includes violence or serious harm towards any target.	Text only
`hate`	Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.	Text only
`hate/threatening`	Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.	Text only
`illicit`	Content that gives advice or instruction on how to commit illicit acts. A phrase like “how to shoplift” would fit this category.	Text only
`illicit/violent`	The same types of content flagged by the `illicit` category, but also includes references to violence or procuring a weapon.	Text only
`self-harm`	Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.	Text and images
`self-harm/intent`	Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.	Text and images
`self-harm/instructions`	Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.	Text and images
`sexual`	Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).	Text and images
`sexual/minors`	Sexual content that includes an individual who is under 18 years old.	Text only
`violence`	Content that depicts death, violence, or physical injury.	Text and images
`violence/graphic`	Content that depicts death, violence, or physical injury in graphic detail.	Text and images

URL: https://developers.openai.com/api/docs/guides/moderation

⇱ Moderation | OpenAI API

Get started

Core concepts

Agents SDK

Tools

Run and scale

Evaluation

Realtime and audio

Specialized models

Going live

Legacy APIs

Resources

Getting Started

Using Codex

Configuration

Administration

Automation

Learn

Releases

Core Concepts

Plan

Build

Deploy

Conversion apps

Guides

Resources

Get started

Guides

File Upload

API

Measurement

Advertiser API

API Reference

Recent

Topics

Topics

Contribute

Categories

Topics

Programs

Events

Choose a moderation workflow

Moderate generated content

Classify standalone inputs

Understand moderation results

Review supported categories