Multi-Turn Evaluation Benchmarks
A collection of benchmarks for evaluating LMs or VLMs under multi-turn interaction
Viewer • Updated Nov 1, 2025 • 647 • 206 • 1
Viewer • Updated Dec 3, 2024 • 968 • 30 • 4
Viewer • Updated Dec 2, 2025 • 1k • 2.01k • 5
Stark
Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Viewer • Updated Nov 6, 2024 • 93.6k • 34 • 3
Viewer • Updated Nov 6, 2024 • 53.3k • 74 • 2
Viewer • Updated Nov 6, 2024 • 899k • 44 • 1
Viewer • Updated Nov 6, 2024 • 1.72M • 63 • 3
Thanos
Skill-of-Mind-Infused LLM
1B • Updated Nov 8, 2024 • 5
3B • Updated Nov 8, 2024 • 7 • 4
8B • Updated Nov 8, 2024 • 6 • 3
Viewer • Updated Nov 8, 2024 • 100k • 21 • 5
Ultron
Multi-modal conversation model & Multi-modal dialogue summarization model
1B • Updated Nov 6, 2024 • 3
3B • Updated Nov 6, 2024 • 1 • 3
8B • Updated Nov 6, 2024 • 4 • 2
11B • Updated Nov 6, 2024 • 2 • 1