Open Source Data Science Datasets

Path: .

Subsets of IMDb data are available to customers for personal and non-commercial use

dataset nlp tabular dvc git

Path: .

The test data for the Large Text Compression Benchmark is the first 10^9 bytes of the English Wikipedia

dataset nlp dvc git

Path: .

SQuAD (Stanford Question Answering Dataset) is a dataset for reading comprehension. It consists of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is a segment of text, or span, from the corresponding Wikipedia reading passage. Alternatively, a question may be unanswerable.

dataset nlp question answering reading comprehension dvc git
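
The span-based answer format described above can be sketched as follows. This is a minimal illustration, assuming the field names used in the SQuAD 2.0 JSON files (`answers`, `text`, `answer_start`, `is_impossible`); the passage, the questions, and the helper `resolve_answer` are invented for the example and are not part of the dataset.

```python
# Hypothetical passage and questions, shaped like one paragraph entry
# in a SQuAD 2.0 style file. The content is made up for illustration.
context = "The Normans gave their name to Normandy, a region in France."

qas = [
    {
        "question": "Which region is named after the Normans?",
        # An answer is stored as its text plus a character offset into the context.
        "answers": [{"text": "Normandy", "answer_start": context.index("Normandy")}],
        "is_impossible": False,
    },
    {
        "question": "In what year was Normandy founded?",
        # Unanswerable questions carry no answer span.
        "answers": [],
        "is_impossible": True,
    },
]


def resolve_answer(context, qa):
    """Return the answer span cut from the context, or None if unanswerable."""
    if qa["is_impossible"] or not qa["answers"]:
        return None
    answer = qa["answers"][0]
    start = answer["answer_start"]
    end = start + len(answer["text"])
    span = context[start:end]
    # Sanity check: the offset must point at the stored answer text.
    if span != answer["text"]:
        raise ValueError("answer_start does not match the answer text")
    return span


resolved = [resolve_answer(context, qa) for qa in qas]
print(resolved)
```

Storing the offset alongside the text lets a reader check that each answer really is a span of the passage, which is the property the description relies on.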

Path: .

This archive contains the LAMBADA dataset (LAnguage Modeling Broadened to Account for Discourse Aspects)

dataset nlp language modelling dvc git

Path: .

A Dataset for Diverse, Explainable Multi-hop Question Answering

dataset nlp dvc git github

Path: README.md

This repository mainly discusses the application of machine learning and optimization approaches to the decision-making process

dataset model nlp classification tensorflow dvc git

Path: raw

RPPP – Reddit Post Popularity Predictor. A project with two goals: 1. Given a Reddit post, predict how popular it is going to be (what its score will be). 2. Showcase a remote working file system with DVC.

dataset model nlp tabular dvc git

Path: .

A repo for the tutorial explaining the benefits of DVC and DAGsHub, using the classification of questions for the Cross Validated statistics Stack Exchange as an example problem

dataset nlp classification dvc git