Automatic Speech Recognition (ASR) Error Robustness
Helpful Sentences from Reviews
Learning to Rank and Filter – community question answering
AI2 TabMCQ: Multiple Choice Questions aligned with the Aristo Tablestore
The Klarna Product-Page Dataset
MultiCoNER Dataset
Low Context Name Entity Recognition (NER) Datasets with Gazetteer
WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation
Common Screens
REDASA COVID-19 Open Data
Sudachi Language Resources
Japanese Tokenizer Dictionaries
Answer Reformulation
Common Crawl
NLP – fast.ai datasets
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
VoiSeR
OpenAlex dataset
ZEST: ZEroShot learning from Task descriptions
Pre- and post-purchase product questions
Amazon-PQA
CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in OMOP Common Data Model
AI2 Diagram Dataset (AI2D)
Textbook Question Answering (TQA)
Synthea synthetic patient generator data in OMOP Common Data Model
AI2 Tablestore (November 2015 Snapshot)
Humor Detection from Product Question Answering Systems
Aristo Tuple KB
Humor patterns used for querying Alexa traffic
Discrete Reasoning Over the content of Paragraphs (DROP)
The Massively Multilingual Image Dataset (MMID)
Wizard of Tasks
Reasoning Over Paragraph Effects in Situations (ROPES)
Quoref
Provision of Web-Scale Parallel Corpora for Official European Languages (ParaCrawl)
Enriched Topical-Chat Dataset for Knowledge-Grounded Dialogue Systems
National Archives Catalog
Google Books Ngrams
PASS: Perturb-and-Select Summarizer for Product Reviews
Multilingual Name Entity Recognition (NER) Datasets with Gazetteer
Phrase Clustering Dataset (PCD)
The Multilingual Amazon Reviews Corpus
Software Heritage Graph Dataset