Voozh

4 min read

Vasyl

Jun 17

AI Evals, Part 4: LLM-as-Judge, Done Right

#ai #evals #llm #dotnet

5 min read

Vasyl

Jun 16

AI Evals, Part 3: Golden Datasets That Don't Lie

#ai #evals #llm #dotnet

5 min read

JaviMaligno

Jun 14

LLM-as-Judge Is Three Decisions

#aiagents #evals #llm #observability

6 min read

Vasyl

Jun 12

AI Evals, Part 2: Error Analysis - The Unglamorous Superpower Behind Good Evals

#ai #evals #llm #dotnet

5 min read

Jangwook Kim

Jun 11

OpenAI Agent Builder and Evals Winddown Migration Checklist

#openai #agentbuilder #evals #agentssdk

11 min read

Vasyl

Jun 10

AI Evals, Explained: How We Actually Know Our AI Is Any Good

#ai #evals #llm #dotnet

6 min read

Dishant Sethi

May 27

How to Evaluate LLM Outputs: Building Evals That Actually Catch Regressions

#evals #ai #llmops #agents

9 min read

David Aronchick

May 5

The Loop Is Only as Good as the Metric

#ai #evals #machinelearning #data

7 min read

aasawari sahasrabuddhe

Apr 23

Why Most AI Teams Are Flying Blind: And What to Do About It

#ai #evals #genai #womenintech

1 comment

13 min read

Frank Brsrk

Apr 22

Wait, you guys run evals?

#ai #evals #llm

1 min read

Scarlett Attensil

May 14

If You Can Survive a Toddler, You Can Ship LLMs in Production

#ai #evals #llm

5 reactions

3 comments

5 min read

Manouk Draisma

for LangWatch

Mar 24

From zero evals to a working multimodal evaluation in 30 minutes using LangWatch Skills

#ai #agents #evals #claudecode

7 min read

Manouk Draisma

Mar 23

Your coding agent already knows how to test your AI agent (we just turned it into a Skill)

#agents #agentskills #evals #simulations

1 reaction

4 min read

Russell Jones

Mar 30

Build an eval harness for 184 AI agent prompts with promptfoo

#promptfoo #evals #aiagents #llm

8 min read

URL: https://dev.to/t/evals

Evals - DEV Community

AI Evals, Part 5: From a Number to a Gate - Evals in CI and Production
