A new kind of question-answering dataset that combines commonsense, text-based, and unanswerable questions, balanced across genres and reasoning types.
Reasoning type annotation for 9 types of reasoning: temporal, causality, factoid, coreference, character properties, characters' belief states, subsequent entity states, event durations, and unanswerable.
Genres: CC-licensed fiction, Voice of America news, blogs, and user stories from Quora.
Size: 800 texts with 18 questions each (~14K questions).
Contact Matt Downey regarding the leaderboard, and Anna Rogers regarding QuAIL data.
Rank | All | Temp. | Caus. | Fact. | Char. | Ent. | Belief | Sub. | Dur. | Unans. | Team | Submitted | Model |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 53.4 | 53.3 | 61.2 | 62.1 | 42.9 | 55.4 | 58.8 | 53.3 | 62.9 | 30.8 | matt.downey18 | 31 Oct 20 09:02 EDT | TML BERT Baseline |
2 | 31.1 | 34.2 | 29.2 | 35.4 | 33.3 | 26.2 | 25.8 | 25.0 | 25.8 | 45.0 | matt.downey18 | 30 Oct 20 20:16 EDT | TML PMI Baseline |
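The leaderboard does not state its metric, so the sketch below simply assumes accuracy. For both submissions, the "All" column matches the unweighted mean of the nine per-type columns to one decimal place. That is what you would expect if the test set were balanced across reasoning types. The helper `per_type_accuracy` and its `(reasoning_type, gold, predicted)` input format are hypothetical, for illustration only. They are not the official QuAIL scorer.

```python
from collections import defaultdict

# Per-type scores copied from the leaderboard table, in column order:
# Temp., Caus., Fact., Char., Ent., Belief, Sub., Dur., Unans.
LEADERBOARD = {
    "TML BERT Baseline": (53.4, [53.3, 61.2, 62.1, 42.9, 55.4, 58.8, 53.3, 62.9, 30.8]),
    "TML PMI Baseline": (31.1, [34.2, 29.2, 35.4, 33.3, 26.2, 25.8, 25.0, 25.8, 45.0]),
}


def per_type_accuracy(examples):
    """Hypothetical scorer: examples are (reasoning_type, gold, predicted) tuples.

    Returns {reasoning_type: accuracy in percent}. Assumes accuracy is the metric.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rtype, gold, pred in examples:
        total[rtype] += 1
        correct[rtype] += int(gold == pred)
    return {t: 100.0 * correct[t] / total[t] for t in total}


def macro_average(scores):
    """Unweighted mean over reasoning types."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Compare the reported "All" score with the mean of the per-type scores.
    for model, (overall, scores) in LEADERBOARD.items():
        print(f"{model}: reported {overall}, mean of types {macro_average(scores):.1f}")
```

A macro average weights every reasoning type equally. On a test set with unequal type counts, it can differ from plain accuracy over all questions.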
The leaderboard is live!
After submitting, please allow a few hours for your results to appear on the leaderboard.