This repository is publicly accessible, but you have to accept the conditions to access its files and content.

The Grounding Claims Dataset is a multi-domain dataset for evaluating whether a natural language claim is grounded in (i.e., supported or entailed by) a document. The dataset is organized into four subsets, each targeting a different type of reasoning:

general (1500 examples): Broad, everyday reasoning
logical (1000 examples): Logical consistency and inference
time_and_dates (100 examples): Temporal reasoning
prices_and_math (100 examples): Numerical and mathematical reasoning

Each entry consists of:

doc: A short context or passage
claim: A natural language statement to verify against the doc
label: A binary label indicating whether the claim is grounded in the document: grounded or ungrounded (1 for grounded, 0 for ungrounded when encoded numerically)
dataset: The source subset name (e.g., "general")

📌 Features

| Feature | Type | Description |
|---|---|---|
| `doc` | string | The document or passage providing the context |
| `claim` | string | A statement to verify against the document |
| `label` | string | `grounded` or `ungrounded` |
| `dataset` | string | The domain/subset the instance belongs to |
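
The four fields above can be modelled as a small typed record. The following is a minimal sketch, not part of the dataset's tooling: the class name `GroundingExample`, the helper `parse_example`, and the sample row are all hypothetical, and the code assumes each row arrives as a plain dict with string values.

```python
from dataclasses import dataclass

# Subset names and label values as listed in the dataset description.
SUBSETS = {"general", "logical", "time_and_dates", "prices_and_math"}
LABELS = {"grounded", "ungrounded"}


@dataclass(frozen=True)
class GroundingExample:
    """One dataset entry (hypothetical helper type, not an official API)."""
    doc: str      # context passage
    claim: str    # statement to verify against the doc
    label: str    # "grounded" or "ungrounded"
    dataset: str  # name of the source subset


def parse_example(row: dict) -> GroundingExample:
    """Validate a raw row and convert it into a GroundingExample."""
    example = GroundingExample(
        doc=str(row["doc"]),
        claim=str(row["claim"]),
        label=str(row["label"]),
        dataset=str(row["dataset"]),
    )
    if example.label not in LABELS:
        raise ValueError(f"unexpected label: {example.label!r}")
    if example.dataset not in SUBSETS:
        raise ValueError(f"unexpected subset: {example.dataset!r}")
    return example


# Made-up row for illustration only; it is not taken from the dataset.
sample = parse_example({
    "doc": "The museum is closed on Mondays.",
    "claim": "The museum is open every day.",
    "label": "ungrounded",
    "dataset": "general",
})
```

Validating at parse time surfaces schema drift (for example, a new subset name) early rather than during evaluation.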

📊 Usage

This dataset can be used to train and evaluate models on factual verification, natural language inference (NLI), and claim grounding tasks across multiple domains.
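
Because the subsets differ greatly in size, it can be useful to report accuracy per subset rather than as a single pooled number. Below is a minimal evaluation sketch under stated assumptions: rows are plain dicts with the fields described above, `accuracy_by_subset` and `substring_baseline` are hypothetical names, and the three rows are made up for illustration. The substring baseline is a deliberately naive stand-in for a real model.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable


def accuracy_by_subset(
    rows: Iterable[dict],
    predict: Callable[[str, str], str],
) -> Dict[str, float]:
    """Return accuracy per subset for a predictor mapping (doc, claim) to a label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        subset = row["dataset"]
        total[subset] += 1
        if predict(row["doc"], row["claim"]) == row["label"]:
            correct[subset] += 1
    return {subset: correct[subset] / total[subset] for subset in total}


def substring_baseline(doc: str, claim: str) -> str:
    """Naive baseline: a claim counts as grounded only if it appears verbatim in the doc."""
    return "grounded" if claim.lower().rstrip(".") in doc.lower() else "ungrounded"


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    {"doc": "The store opens at 9 am on weekdays.",
     "claim": "The store opens at 9 am",
     "label": "grounded", "dataset": "time_and_dates"},
    {"doc": "The store opens at 9 am on weekdays.",
     "claim": "The store opens at 8 am",
     "label": "ungrounded", "dataset": "time_and_dates"},
    {"doc": "A ticket costs $5 and a drink costs $3.",
     "claim": "A ticket and a drink cost $8 in total",
     "label": "grounded", "dataset": "prices_and_math"},
]

scores = accuracy_by_subset(rows, substring_baseline)
```

On these toy rows the baseline handles the verbatim temporal claims but misses the arithmetic one, which is the kind of gap a per-subset breakdown is meant to expose.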

🏷️ Labels

grounded: The claim is supported or entailed by the document.
ungrounded: The claim is not supported by the document, or is contradicted by it.
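
Many training and metric utilities expect integer targets rather than label strings. The sketch below shows one way to normalize labels, assuming the convention 1 for grounded and 0 for ungrounded; the function name `label_to_int` is hypothetical, and accepting both string and integer inputs is a defensive choice rather than a documented property of the files.

```python
# Assumed convention: 1 = grounded, 0 = ungrounded.
LABEL_TO_INT = {"grounded": 1, "ungrounded": 0}


def label_to_int(label) -> int:
    """Map a label given as a string or as 0/1 to an integer target."""
    if isinstance(label, str):
        key = label.strip().lower()
        if key in LABEL_TO_INT:
            return LABEL_TO_INT[key]
        raise ValueError(f"unknown label string: {label!r}")
    if isinstance(label, int) and label in (0, 1):
        return int(label)
    raise ValueError(f"unsupported label value: {label!r}")


# Mixed inputs are normalized to the same integer encoding.
targets = [label_to_int(value) for value in ["grounded", "Ungrounded", 1, 0]]
```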
