
URL: https://huggingface.co/Qwen/Qwen3.5-397B-A17B/discussions/80

Qwen/Qwen3.5-397B-A17B · Add ResearchClawBench evaluation result



#80
Opened by black-yt

Hi Qwen team,

This PR adds the ResearchClawBench overall evaluation result for Qwen3.5-397B-A17B.

ResearchClawBench is an end-to-end scientific research benchmark for evaluating AI agents and LLMs on tasks that require reading task data and related work, writing and executing code, producing figures, and generating publication-style reports. Final reports are scored against expert checklists derived from human-authored target papers.

The run was executed with ResearchHarness, with tool use, code execution, and a file-system workspace enabled. The submitted value is the overall mean score (out of 100) across the completed ResearchClawBench tasks:

  • Model: Qwen3.5-397B-A17B
  • Score: 14.23 / 100
  • Completed tasks: 40/40
  • Run date: 2026-04-16
  • Benchmark task id: overall
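The following is a minimal sketch of how an overall mean score over completed tasks can be computed. It assumes that each completed task yields a single score on a 0–100 scale and that the overall value is their unweighted arithmetic mean. `overall_score` is a hypothetical helper written for illustration and is not part of ResearchHarness or ResearchClawBench. Recording incomplete tasks as `None` is also an assumption made here.

```python
from statistics import fmean
from typing import Optional, Sequence


def overall_score(task_scores: Sequence[Optional[float]]) -> float:
    """Unweighted mean of per-task scores (0-100) over completed tasks only.

    Hypothetical helper: incomplete tasks are assumed to be recorded as None
    and are excluded from the average.
    """
    # Keep only tasks that produced a score (i.e. were completed).
    completed = [s for s in task_scores if s is not None]
    if not completed:
        raise ValueError("no completed tasks to average")
    # Guard against scores outside the assumed 0-100 range.
    for s in completed:
        if not 0.0 <= s <= 100.0:
            raise ValueError(f"score out of range: {s}")
    return fmean(completed)
```

For example, `overall_score([10.0, None, 20.0])` returns `15.0` because the `None` entry is skipped. In this run all 40 of 40 tasks completed, so the mean would be taken over every task score.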

The detailed leaderboard is available here: https://internscience.github.io/ResearchClawBench-Home/

Thank you!
