Add community evaluation results for AIME_2026, GPQA, HLE, HMMT_FEB_2026, MMLU-PRO, SWE-BENCH_PRO, SWE-BENCH_VERIFIED, TERMINAL-BENCH-2.0
#2
by nielsr HF Staff - opened
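This PR adds community-provided evaluation results for the following benchmarks: AIME_2026, GPQA, HLE, HMMT_FEB_2026, MMLU-PRO, SWE-BENCH_PRO, SWE-BENCH_VERIFIED, and TERMINAL-BENCH-2.0.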
These results were extracted from the model card. This is based on the new evaluation results feature.
Note: This is an automated PR. Please review the evaluation results before merging.
YAML Metadata Error: Invalid content in Eval Result file .eval_results/hle.yaml
Check out the documentation for more information.
@nielsr can you please update your script so that it only emits valid evaluation results?
Love the Hugging Face leaderboards!
