Voozh

Add community evaluation results for AIME_2026, GPQA, HLE, HMMT_FEB_2026, MMLU-PRO, SWE-BENCH_PRO, SWE-BENCH_VERIFIED, TERMINAL-BENCH-2.0

by nielsr HF Staff - opened Apr 22

Apr 22

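This PR adds community-provided evaluation results for the following benchmarks: AIME_2026, GPQA, HLE, HMMT_FEB_2026, MMLU-PRO, SWE-BENCH_PRO, SWE-BENCH_VERIFIED, and TERMINAL-BENCH-2.0.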

These results were extracted from the model card. This is based on the new evaluation results feature.

Note: This is an automated PR. Please review the evaluation results before merging.

Apr 22

YAML Metadata Error: Invalid content in Eval Result file .eval_results/hle.yaml
Check out the documentation for more information.

@nielsr, can you please update your script so that it only emits valid evaluation results?

Love the Hugging Face leaderboards!

Ready to merge

This branch is ready to get merged automatically.
