URL: https://developers.openai.com/api/reference/resources/evals/methods/create

Create eval

POST /evals

Create the structure of an evaluation that can be used to test a model's performance. An evaluation is a set of testing criteria plus the configuration for a data source, which dictates the schema of the data used in the evaluation. After creating an evaluation, you can run it on different models and model parameters. We support several types of graders and data sources. For more information, see the Evals guide.
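As a rough illustration of how the two parts described above (testing criteria and a data source config) fit into one request, the following Python sketch assembles a POST /evals request without sending it. The field names mirror the curl example on this page; the helper name `build_create_eval_request` and the placeholder API key are assumptions made for illustration, not part of the API.

```python
import json
import os
import urllib.request


def build_create_eval_request(name, data_source_config, testing_criteria, api_key):
    """Assemble (but do not send) a POST /evals request. Hypothetical helper."""
    body = {
        "name": name,
        "data_source_config": data_source_config,
        "testing_criteria": testing_criteria,
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/evals",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Values copied from the curl example on this page.
label_grader = {
    "type": "label_model",
    "name": "Example label grader",
    "model": "o3-mini",
    "input": [
        {"role": "developer", "content": "Classify the sentiment of the following statement as one of positive, neutral, or negative"},
        {"role": "user", "content": "Statement: {{item.input}}"},
    ],
    "labels": ["positive", "neutral", "negative"],
    "passing_labels": ["positive"],
}

request = build_create_eval_request(
    name="Sentiment",
    data_source_config={"type": "stored_completions", "metadata": {"usecase": "chatbot"}},
    testing_criteria=[label_grader],
    # Placeholder fallback so the sketch runs without a real key (assumption).
    api_key=os.environ.get("OPENAI_API_KEY", "sk-placeholder"),
)

# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.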

Body Parameters
metadata: optional Metadata

A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
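The limits above can be checked on the client before a request is sent. The sketch below is one possible pre-flight check; `validate_metadata` is a hypothetical helper, and it assumes that "length" means the number of characters as counted by Python's `len`, which may differ from how the server counts.

```python
MAX_PAIRS = 16          # maximum number of key-value pairs
MAX_KEY_LENGTH = 64     # maximum characters per key
MAX_VALUE_LENGTH = 512  # maximum characters per value


def validate_metadata(metadata):
    """Return a list of problems; an empty list means the documented limits are met."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"too many pairs: {len(metadata)} > {MAX_PAIRS}")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must be strings")
            continue
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key[:20]!r}... exceeds {MAX_KEY_LENGTH} characters")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value for {key!r} exceeds {MAX_VALUE_LENGTH} characters")
    return problems


print(validate_metadata({"usecase": "chatbot"}))
print(validate_metadata({"k" * 65: "v"}))
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.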

name: optional string

The name of the evaluation.

Returns
id: string

Unique identifier for the evaluation.

created_at: number

The Unix timestamp (in seconds) for when the eval was created.
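Since `created_at` is expressed in seconds since the Unix epoch, it can be turned into a timezone-aware datetime with the standard library. This is a minimal sketch; the function name `created_at_to_datetime` is an assumption, and the sample value is the `created_at` from the example response on this page.

```python
from datetime import datetime, timezone


def created_at_to_datetime(created_at):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(created_at, tz=timezone.utc)


created = created_at_to_datetime(1740110490)
print(created.isoformat())
```

Passing `tz=timezone.utc` explicitly avoids depending on the local timezone of the machine running the code.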

Format: unixtime

metadata: Metadata

A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

name: string

The name of the evaluation.

object: "eval"

The object type.
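The returned fields listed above can be read into a small typed structure. The sketch below is an assumption-laden illustration: `EvalSummary` and `parse_eval` are hypothetical names, only the five fields documented in this section are captured, and any other keys in the response are ignored.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSummary:
    """Subset of the returned eval object covered in this section (hypothetical type)."""
    id: str
    created_at: int
    metadata: dict
    name: str
    object_type: str


def parse_eval(raw_json):
    """Parse a JSON response body into an EvalSummary, checking the object type."""
    data = json.loads(raw_json)
    if data.get("object") != "eval":
        raise ValueError(f"unexpected object type: {data.get('object')!r}")
    return EvalSummary(
        id=data["id"],
        created_at=data["created_at"],
        metadata=data.get("metadata") or {},
        name=data["name"],
        object_type=data["object"],
    )


# Trimmed-down response using values from the example on this page.
sample = """{
  "object": "eval",
  "id": "eval_67b7fa9a81a88190ab4aa417e397ea21",
  "name": "Sentiment",
  "created_at": 1740110490,
  "metadata": {"description": "An eval for sentiment analysis"}
}"""

summary = parse_eval(sample)
print(summary.id, summary.name)
```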

Create eval

curl https://api.openai.com/v1/evals \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Sentiment",
    "data_source_config": {
      "type": "stored_completions",
      "metadata": {
        "usecase": "chatbot"
      }
    },
    "testing_criteria": [
      {
        "type": "label_model",
        "model": "o3-mini",
        "input": [
          {
            "role": "developer",
            "content": "Classify the sentiment of the following statement as one of positive, neutral, or negative"
          },
          {
            "role": "user",
            "content": "Statement: {{item.input}}"
          }
        ],
        "passing_labels": [
          "positive"
        ],
        "labels": [
          "positive",
          "neutral",
          "negative"
        ],
        "name": "Example label grader"
      }
    ]
  }'

Example response

{
  "object": "eval",
  "id": "eval_67b7fa9a81a88190ab4aa417e397ea21",
  "data_source_config": {
    "type": "stored_completions",
    "metadata": {
      "usecase": "chatbot"
    },
    "schema": {
      "type": "object",
      "properties": {
        "item": {
          "type": "object"
        },
        "sample": {
          "type": "object"
        }
      },
      "required": [
        "item",
        "sample"
      ]
    }
  },
  "testing_criteria": [
    {
      "name": "Example label grader",
      "type": "label_model",
      "model": "o3-mini",
      "input": [
        {
          "type": "message",
          "role": "developer",
          "content": {
            "type": "input_text",
            "text": "Classify the sentiment of the following statement as one of positive, neutral, or negative"
          }
        },
        {
          "type": "message",
          "role": "user",
          "content": {
            "type": "input_text",
            "text": "Statement: {{item.input}}"
          }
        }
      ],
      "passing_labels": [
        "positive"
      ],
      "labels": [
        "positive",
        "neutral",
        "negative"
      ]
    }
  ],
  "name": "Sentiment",
  "created_at": 1740110490,
  "metadata": {
    "description": "An eval for sentiment analysis"
  }
}