URL: https://openreview.net/forum?id=t4lDjFq5lb

Position: We Need A Unified Definition of Hallucination (It’s The World Model, Stupid!)

Published: 30 Apr 2026, Last Modified: 24 Jun 2026
Venue: ICML 2026 Position Paper Track (regular)
Readers: Everyone
License: CC BY 4.0
Abstract: Despite numerous attempts at mitigation since the inception of language models, hallucinations remain a persistent problem even in today's frontier LLMs. Why is this? We review existing definitions of hallucination and fold them into a single, unified definition that subsumes them. This position paper argues that hallucination can be unified by defining it as inaccurate (internal) world modeling, in a form where it is observable to the user. Examples include stating a fact that contradicts a knowledge base or producing a summary that contradicts its source. By varying the reference world model and conflict policy, our framework recovers prior definitions as special cases. We argue that this unified view is useful because it forces evaluations to clarify their assumed reference "world", distinguishes true hallucinations from planning or reward errors, and provides a common language for comparing benchmarks and discussing mitigation strategies. Building on this definition, we also connect our framework to HalluWorld (Liu et al., 2026), a complementary benchmark that instantiates fully specified reference world models for stress-testing model hallucinations.
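Illustrative sketch (not from the paper): the idea of varying the reference world model and conflict policy can be made concrete with a toy example. Everything here is an assumption made for illustration: the names `World`, `Claim`, `resolve`, and `is_hallucination` are hypothetical, worlds are reduced to key-value facts, the conflict policy is reduced to "earlier sources take precedence", and a claim with no reference value is treated as unverifiable rather than as a hallucination.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy types: a "world" is a set of key -> value facts,
# and a claim is one (key, value) pair observable in the model output.
World = Dict[str, str]
Claim = Tuple[str, str]


def resolve(key: str, sources: List[World]) -> Optional[str]:
    """Toy conflict policy: the first source that defines the key wins."""
    for world in sources:
        if key in world:
            return world[key]
    return None  # no source says anything about this key


def is_hallucination(claim: Claim, sources: List[World]) -> bool:
    """Flag a claim only if it contradicts the resolved reference value.

    Assumption of this sketch: claims that no source covers are
    unverifiable and are not flagged.
    """
    key, value = claim
    reference = resolve(key, sources)
    return reference is not None and reference != value


# Two reference worlds that disagree with each other.
knowledge_base: World = {"ceo": "Alice"}
source_document: World = {"ceo": "Bob"}

claim: Claim = ("ceo", "Bob")

# Closed-book factuality: only the knowledge base is the reference.
factuality_verdict = is_hallucination(claim, [knowledge_base])

# Summarization faithfulness: the source document takes precedence.
faithfulness_verdict = is_hallucination(claim, [source_document, knowledge_base])

# A claim that neither world covers.
uncovered_verdict = is_hallucination(("founded", "1999"), [knowledge_base])
```

In this toy setup the same output is flagged under the knowledge-base reference and accepted under the source-first policy, which is the kind of disagreement between evaluations that an explicit reference "world" is meant to expose.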
Primary Area: Model Understanding, Explainability, Interpretability, and Trust
Keywords: large language models, hallucination detection, hallucination mitigation, world models, natural language processing, machine learning, artificial intelligence
Submission Number: 305