URL: https://dev.to/t/evals
Evals - DEV Community
AI Evals, Part 5: From a Number to a Gate - Evals in CI and Production
Vasyl
Jun 17
#ai #evals #llm #dotnet
4 min read
AI Evals, Part 4: LLM-as-Judge, Done Right
Vasyl
Jun 17
#ai #evals #llm #dotnet
5 min read
AI Evals, Part 3: Golden Datasets That Don't Lie
Vasyl
Jun 16
#ai #evals #llm #dotnet
5 min read
LLM-as-Judge Is Three Decisions
JaviMaligno
Jun 14
#aiagents #evals #llm #observability
6 min read
AI Evals, Part 2: Error Analysis - The Unglamorous Superpower Behind Good Evals
Vasyl
Jun 12
#ai #evals #llm #dotnet
5 min read
OpenAI Agent Builder and Evals Winddown Migration Checklist
Jangwook Kim
Jun 11
#openai #agentbuilder #evals #agentssdk
11 min read
AI Evals, Explained: How We Actually Know Our AI Is Any Good
Vasyl
Jun 10
#ai #evals #llm #dotnet
6 min read
How to Evaluate LLM Outputs: Building Evals That Actually Catch Regressions
Dishant Sethi
May 27
#evals #ai #llmops #agents
9 min read
The Loop Is Only as Good as the Metric
David Aronchick
May 5
#ai #evals #machinelearning #data
7 min read
Why Most AI Teams Are Flying Blind: And What to Do About It
aasawari sahasrabuddhe
Apr 23
#ai #evals #genai #womenintech
1 comment
13 min read
Wait, you guys run evals?
Frank Brsrk
Apr 22
#ai #evals #llm
1 min read
If You Can Survive a Toddler, You Can Ship LLMs in Production
Scarlett Attensil
May 14
#ai #evals #llm
5 reactions
3 comments
5 min read
From zero evals to a working multimodal evaluation in 30 minutes using LangWatch Skills
Manouk Draisma for LangWatch
Mar 24
#ai #agents #evals #claudecode
7 min read
Your coding agent already knows how to test your AI agent (we just turned it into a Skill)
Manouk Draisma
Mar 23
#agents #agentskills #evals #simulations
1 reaction
4 min read
Build an eval harness for 184 AI agent prompts with promptfoo
Russell Jones
Mar 30
#promptfoo #evals #aiagents #llm
8 min read