dotnet add package AgentEval --version 0.12.2-beta
NuGet\Install-Package AgentEval -Version 0.12.2-beta
<PackageReference Include="AgentEval" Version="0.12.2-beta" />
Directory.Packages.props:
<PackageVersion Include="AgentEval" Version="0.12.2-beta" />
Project file:
<PackageReference Include="AgentEval" />
paket add AgentEval --version 0.12.2-beta
#r "nuget: AgentEval, 0.12.2-beta"
#:package AgentEval@0.12.2-beta
Install as a Cake Addin:
#addin nuget:?package=AgentEval&version=0.12.2-beta&prerelease
Install as a Cake Tool:
#tool nuget:?package=AgentEval&version=0.12.2-beta&prerelease
The .NET Evaluation Toolkit for AI Agents
Built first for Microsoft Agent Framework (MAF) and Microsoft.Extensions.AI. What RAGAS and DeepEval do for Python, AgentEval does for .NET.
Fluent assertions with "because" reasons and assertion scopes:
using AgentEval;
using AgentEval.MAF;
using AgentEval.Assertions;
// Create evaluation harness
var harness = new MAFEvaluationHarness(evaluatorClient);

// Run evaluation with tool tracking
var result = await harness.RunEvaluationAsync(agent, new TestCase
{
    Name = "Feature Planning Test",
    Input = "Plan a user authentication feature",
    EvaluationCriteria = ["Should include security considerations"]
});

// Assert tool usage with "because" reasons
result.ToolUsage!
    .Should()
    .HaveCalledTool("SecurityTool", because: "auth features require security review")
        .BeforeTool("FeatureTool")
        .WithoutError()
    .And()
    .HaveNoErrors();

// Assert performance
result.Performance!
    .Should()
    .HaveTotalDurationUnder(TimeSpan.FromSeconds(10))
    .HaveEstimatedCostUnder(0.10m);
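// Red team security scan (a separate snippet from the evaluation example above)
var result = await AttackPipeline.Create()
    .WithAllAttacks()
    .ScanAsync(agent);
result.Should().HaveOverallScoreAbove(85);
// Await the export so the SARIF report is fully written before continuing
await result.ExportAsync("security-report.sarif", ExportFormat.Sarif);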
Capture agent executions for deterministic replay — no LLM calls needed in CI:
// Record
await using var recorder = new TraceRecordingAgent(realAgent, "weather_test");
var response = await recorder.InvokeAsync("What's the weather?");
await TraceSerializer.SaveToFileAsync(recorder.Trace, "trace.json");
// Replay (deterministic, free)
var trace = await TraceSerializer.LoadFromFileAsync("trace.json");
var replayer = new TraceReplayingAgent(trace);
var replayed = await replayer.InvokeAsync("What's the weather?");
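The record and replay idea above can be sketched in plain C#, independent of AgentEval. This is only an illustration of the pattern, not the library's implementation: `ISimpleAgent`, `RecordingAgent`, `ReplayingAgent`, and `FakeWeatherAgent` are hypothetical names invented for this sketch, and the trace is assumed to be a simple prompt-to-response map saved as JSON.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Record: wrap the "real" agent and capture each prompt/response pair.
var recorder = new RecordingAgent(new FakeWeatherAgent());
await recorder.InvokeAsync("What's the weather?");

var path = Path.Combine(Path.GetTempPath(), "trace-sketch.json");
await File.WriteAllTextAsync(path, JsonSerializer.Serialize(recorder.Trace));

// Replay: load the saved trace and answer from it, without calling the real agent.
var json = await File.ReadAllTextAsync(path);
var loaded = JsonSerializer.Deserialize<Dictionary<string, string>>(json)
             ?? new Dictionary<string, string>();
var replayer = new ReplayingAgent(loaded);
Console.WriteLine(await replayer.InvokeAsync("What's the weather?"));

// Hypothetical minimal agent contract (not an AgentEval type).
public interface ISimpleAgent
{
    Task<string> InvokeAsync(string prompt);
}

// Stand-in for an LLM-backed agent; returns a fixed answer.
public sealed class FakeWeatherAgent : ISimpleAgent
{
    public Task<string> InvokeAsync(string prompt) => Task.FromResult("Sunny, 21 C");
}

// Forwards calls to the inner agent and remembers every response by prompt.
public sealed class RecordingAgent : ISimpleAgent
{
    private readonly ISimpleAgent _inner;
    public Dictionary<string, string> Trace { get; } = new();

    public RecordingAgent(ISimpleAgent inner) => _inner = inner;

    public async Task<string> InvokeAsync(string prompt)
    {
        var response = await _inner.InvokeAsync(prompt);
        Trace[prompt] = response;
        return response;
    }
}

// Serves recorded responses only; an unrecorded prompt is an error.
public sealed class ReplayingAgent : ISimpleAgent
{
    private readonly IReadOnlyDictionary<string, string> _trace;

    public ReplayingAgent(IReadOnlyDictionary<string, string> trace) => _trace = trace;

    public Task<string> InvokeAsync(string prompt) =>
        _trace.TryGetValue(prompt, out var response)
            ? Task.FromResult(response)
            : throw new KeyNotFoundException($"No recorded response for prompt: {prompt}");
}
```

Failing loudly on an unrecorded prompt is a deliberate choice in this sketch: a replayed test that silently fell back to a live model would stop being deterministic.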
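// Compare models across repeated runs (a separate snippet;
// comparer, the model factories, and testSuite are assumed to be set up elsewhere)
var result = await comparer.CompareModelsAsync(
    factories: [gpt4oFactory, gpt4oMiniFactory],
    testCases: testSuite,
    options: new ComparisonOptions(RunsPerModel: 5));
Console.WriteLine(result.ToMarkdown());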
dotnet add package AgentEval --prerelease
Single package, modular internals — AgentEval ships as one NuGet package containing 6 focused assemblies:
- AgentEval.Abstractions: Public contracts and interfaces
- AgentEval.Core: Metrics, assertions, comparison, tracing
- AgentEval.DataLoaders: Data loading and export (JSON, YAML, CSV, JSONL)
- AgentEval.MAF: Microsoft Agent Framework integration
- AgentEval.RedTeam: Security testing (multiple attack types and probes)

// Register all services at once (recommended):
services.AddAgentEvalAll();
// Or register selectively:
services.AddAgentEval(); // Core services only
services.AddAgentEvalDataLoaders(); // DataLoaders + Exporters
services.AddAgentEvalRedTeam(); // Red Team security testing
MIT License — See LICENSE for details.
| Product | Compatible target frameworks | Additional computed target frameworks |
|---|---|---|
| .NET | net8.0, net9.0, net10.0 | net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows |
This package is not used by any NuGet packages.
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.12.2-beta | 0 | 6/18/2026 |
| 0.12.1-beta | 0 | 6/18/2026 |
| 0.12.0-beta | 50 | 6/14/2026 |
| 0.10.1-beta | 393 | 5/18/2026 |
| 0.10.0-beta | 90 | 5/17/2026 |
| 0.9.0-beta | 66 | 5/17/2026 |
| 0.8.1-beta | 676 | 4/29/2026 |
| 0.8.0-beta | 70 | 4/28/2026 |
| 0.6.0-beta | 1,632 | 3/5/2026 |
| 0.5.4-beta | 107 | 3/3/2026 |
| 0.5.3-beta | 130 | 3/1/2026 |
| 0.5.2-beta | 102 | 2/28/2026 |
| 0.5.1-beta | 97 | 2/28/2026 |
| 0.4.0-beta | 114 | 2/22/2026 |
| 0.3.0-beta | 148 | 1/25/2026 |
| 0.2.1-beta | 88 | 1/24/2026 |
| 0.2.0-beta | 88 | 1/18/2026 |
| 0.1.1-alpha | 97 | 1/3/2026 |
| 0.1.0-alpha | 90 | 1/3/2026 |
Per-version release notes: https://github.com/AgentEvalHQ/AgentEval/blob/main/CHANGELOG.md