MultiRC
Reading Comprehension over Multiple Sentences
Introduction
MultiRC (Multi-Sentence Reading Comprehension) is a dataset of short paragraphs and
multi-sentence questions that can be answered from the content of the paragraph.
We have designed the dataset with three key challenges in mind:
- The number of correct answer-options for each question is not pre-specified. This removes
the over-reliance of current approaches on answer-options and forces them to decide on the
correctness of each candidate answer independently of others. In other words, unlike previous
work, the task here is not to simply identify the best answer-option, but to evaluate the
correctness of each answer-option individually.
- The correct answers are not required to be spans in the text.
- The paragraphs in our dataset have diverse provenance: they are extracted from 7 different
domains, such as news, fiction, and historical text, and hence are expected to be more diverse
in content than those of single-domain datasets.
The goal of this dataset is to encourage the research community to explore approaches that can do
more than sophisticated lexical-level matching.
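The first challenge above can be illustrated with a minimal sketch. The scores, the 0.5 threshold, and the function names `pick_best` and `judge_each` are hypothetical and are not part of the dataset or of any official baseline; the sketch only contrasts selecting a single best answer-option with judging every answer-option on its own.

```python
from typing import List


def pick_best(scores: List[float]) -> List[bool]:
    """Single-answer style: mark only the highest-scoring option as correct."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [i == best for i in range(len(scores))]


def judge_each(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Per-option style: decide each option independently of the others.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return [s >= threshold for s in scores]


# Hypothetical model scores for the four answer-options of one question.
scores = [0.9, 0.8, 0.2, 0.1]

print(pick_best(scores))   # exactly one option is marked correct
print(judge_each(scores))  # any number of options may be marked correct
```

With per-option decisions, a question can end up with zero, one, or several options marked correct, which matches the setting where the number of correct answer-options is not known in advance.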
Example
Each question is associated with several answer-options, one or more of which correctly answer
the question.
Each instance consists of a multi-sentence paragraph, a question, and answer-options. All instances
were constructed such that it is not possible to answer a question correctly without gathering
information from multiple sentences.
Here is an example:
Paragraph:
Sent 1: Most young mammals, including humans, like to play.
Sent 2: Play is one way they learn the skills that they will need as adults.
Sent 3: Think about how kittens play.
Sent 4: They pounce on toys and chase each other.
Sent 5: This helps them learn how to be better predators.
Sent 6: Big cats also play.
Sent 7: The lion cubs pictured below are playing.
Sent 8: At the same time, they are also practicing their hunting skills.
Sent 9: The dogs are playing tug-of-war with a toy.
Sent 10: What do you think they are learning by playing together this way?
Sent 11: Human children learn by playing as well.
Sent 12: For example, playing games and sports can help them learn to follow rules.
Sent 13: They also learn to work together.
Sent 14: The young child pictured below is playing in the sand.
Sent 15: She is learning about the world through play.
Sent 16: What do you think she might be learning?
Question: What do human children learn by playing games and sports?
Answer-options:
- to follow rules
- They learn to follow rules and work together.
- They learn about the world
- Learn to work together
- skills that they will need as adult
- they learn about how to cheat
- how to hunt
- tug-of-war
- only learns to follow rules
- only learns working together
- hunting skills
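As a rough illustration of the instance structure described above (a paragraph, a question, and answer-options), here is a minimal sketch. The `Instance` class, its field names, and the `to_triples` helper are hypothetical and do not reflect the format of the released files. Only three sentences and three answer-options from the example are included for brevity, and gold labels are omitted because the listing above does not mark them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    """One question with its paragraph and candidate answer-options (hypothetical layout)."""
    sentences: List[str]
    question: str
    options: List[str]


def to_triples(inst: Instance) -> List[Tuple[str, str, str]]:
    """Expand an instance into one (paragraph, question, option) triple per option,
    so that each answer-option can be judged independently."""
    paragraph = " ".join(inst.sentences)
    return [(paragraph, inst.question, opt) for opt in inst.options]


example = Instance(
    sentences=[
        "Human children learn by playing as well.",
        "For example, playing games and sports can help them learn to follow rules.",
        "They also learn to work together.",
    ],
    question="What do human children learn by playing games and sports?",
    options=[
        "They learn to follow rules and work together.",
        "how to hunt",
        "tug-of-war",
    ],
)

triples = to_triples(example)
for _, question, option in triples:
    print(question, "->", option)
```

Each triple can then be passed to a binary classifier. Note that supporting the first option requires combining the sentence about following rules with the sentence about working together.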
News
- 05/2019: MultiRC is now part of SuperGLUE! Hence, we will no longer support the leaderboard
on this website.
- 03/2019: Check out recent papers using MultiRC:
- 05/2018: Website is up!
Releases and Downloads
The entire corpus consists of ~10K questions (~6K multiple-sentence questions).
We release about 60% of this data as training/dev data.
The rest of the data is reserved for evaluation. We will periodically add new, unseen
evaluation data. The purpose of this is to prevent unintentional overfitting
over time through repeated evaluations. Here is our current expected release plan:
Update (May 2019): The union of R1 + R2 is now part of the SuperGLUE evaluation.
Paper
If you find this data helpful in your work, please cite this paper:
@inproceedings{MultiRC2018,
author = {Daniel Khashabi and Snigdha Chaturvedi and Michael Roth and Shyam Upadhyay and Dan Roth},
title = {Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences},
booktitle = {Proceedings of North American Chapter of the Association for Computational Linguistics (NAACL)},
year = {2018}
}