Path: .
This repository contains the code to import and integrate the book and rating data that we work with. It combines data from several sources into homogeneous tabular outputs; the import scripts are primarily Rust, with Python implementing the analyses.
Updated 4 months ago
Classification of mail text in scanned PDF images
dataset nlp classification object detection image classification dvc git
Updated 5 months ago
DPT is a QA bot designed to help answer questions about DagsHub. It is a fork of the brilliant buster project. Using DagsHub's documentation as reference and sentence-transformers/all-MiniLM-L6-v2 for sentence similarity, we identify documents that contain information relevant to a given query. These documents are then passed to OpenAI's GPT-3.5 Turbo, which uses them, the query, and a prompt to return a hopefully helpful answer to the user.
Updated 1 year ago
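The retrieve-then-prompt flow described for DPT can be sketched roughly as follows. This is a minimal illustration and not the project's code: a bag-of-words cosine similarity stands in for the sentence-transformers/all-MiniLM-L6-v2 embeddings, no call to GPT-3.5 Turbo is made, and the function names (`embed`, `retrieve`, `build_prompt`) and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_docs):
    """Combine the retrieved excerpts and the query into one prompt string."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer the question using only the documentation excerpts below.\n"
        f"Excerpts:\n{context}\n"
        f"Question: {query}\n"
    )


# Made-up placeholder documents, not taken from any real documentation.
docs = [
    "To upload a dataset, add the files and push them to the remote storage.",
    "Experiments can be compared side by side in the experiments table.",
    "A repository can be mirrored from an existing Git remote.",
]
query = "How do I upload a dataset?"
top_docs = retrieve(query, docs, top_k=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

In a real setup the prompt string would be sent to the language model, and `embed` would be replaced by a learned sentence embedding so that paraphrases with no shared words still match.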
A subset of the LAION Aesthetics V2 dataset that contains only images with an aesthetics score of 6.5 or higher.
dataset nlp computer vision text-to-image generation dvc git
Updated 1 year ago
databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
Updated 1 year ago
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.
Updated 1 year ago
The purpose of the project is to make available a standard training and test setup for language modeling experiments.
Updated 1 year ago
SQuAD (Stanford Question Answering Dataset) is a dataset for reading comprehension. It consists of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is a segment of text, or span, from the corresponding Wikipedia reading passage, or the question may be unanswerable.
dataset nlp question answering reading comprehension dvc git
Updated 1 year ago
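The span-based answer format described for SQuAD can be illustrated with a small sketch. The class and field names below (`QAExample`, `answer_start`, `answer_text`) are hypothetical and chosen for illustration; they are not presented as the dataset's actual schema, and the passage and questions are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAExample:
    context: str                 # the reading passage
    question: str
    answer_start: Optional[int]  # character offset into context; None if unanswerable
    answer_text: Optional[str]   # the answer span; None if unanswerable

    def is_answerable(self) -> bool:
        return self.answer_start is not None

    def span(self) -> Optional[str]:
        """Recover the answer by slicing the passage at the stored offset."""
        if not self.is_answerable():
            return None
        end = self.answer_start + len(self.answer_text)
        return self.context[self.answer_start:end]


context = "The Amazon rainforest covers much of the Amazon basin of South America."
answer = "South America"

answerable = QAExample(
    context, "Where is the Amazon basin?", context.index(answer), answer
)
unanswerable = QAExample(
    context, "Who first mapped the Amazon basin?", None, None
)

print(answerable.span())
print(unanswerable.span())
```

Storing an offset alongside the text lets a reader check that the answer really is a literal span of the passage, and a missing offset marks the unanswerable case.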
This archive contains the LAMBADA dataset (LAnguage Modeling Broadened to Account for Discourse Aspects).
Updated 1 year ago
Path: README.md
This repository mainly discusses the application of machine learning and optimization approaches to the decision-making process.