Path: .
Showcasing DagsHub Annotations, Label Studio integration, Discussions, and other related features
This repository contains the code to import and integrate the book and rating data that we work with. It imports and integrates data from several sources into homogeneous tabular outputs; the import scripts are primarily Rust, with Python implementing the analyses.
Updated 4 months ago
Classification of mail text in scanned PDF images.
dataset nlp classification object detection image classification dvc git
Updated 5 months ago
DPT is a QA bot designed to help answer questions about DagsHub. It is a fork of the brilliant buster project. Using DagsHub's documentation as reference and sentence-transformers/all-MiniLM-L6-v2 for sentence similarity, we identify documents that contain information relevant to a given query. These documents are then passed, together with the query, in a prompt to OpenAI's GPT-3.5 Turbo, which uses them to return a hopefully helpful answer to the user.
Updated 1 year ago
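
The retrieve-then-prompt flow described for DPT can be sketched roughly as follows. This is a minimal illustration, not the project's code: it swaps the sentence-transformers/all-MiniLM-L6-v2 embeddings for a toy word-count vector so it runs with the standard library only, it stops at building the prompt rather than calling GPT-3.5 Turbo, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and the `DOCS` strings are all hypothetical placeholders.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts.
    (Assumption: the real project uses a sentence-transformers model here.)"""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved documents and the query into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical placeholder snippets, not actual DagsHub documentation.
DOCS = [
    "DVC tracks large data files alongside Git commits.",
    "Label Studio is used to annotate images and text.",
    "Experiments can be logged and compared with MLflow.",
]

if __name__ == "__main__":
    question = "How do I annotate images?"
    print(build_prompt(question, retrieve(question, DOCS, top_k=1)))
```

In a real setup the resulting prompt would be sent to the language model; the toy embedding only matches on shared words, whereas a sentence-embedding model can also match paraphrases.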
A subset of the LAION Aesthetics V2 dataset that contains only images with an aesthetics score of 6.5 or higher.
dataset nlp computer vision text-to-image generation dvc git
Updated 1 year ago
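
The subset rule in the LAION entry above (keep only images scoring at least 6.5) amounts to a simple threshold filter. The sketch below assumes each record is a dictionary with a hypothetical `aesthetic_score` field; the field name, the `filter_by_score` helper, and the sample records are illustrative and not taken from the dataset itself.

```python
def filter_by_score(records: list[dict], threshold: float = 6.5) -> list[dict]:
    """Keep records whose (hypothetical) 'aesthetic_score' is >= threshold."""
    return [r for r in records if r.get("aesthetic_score", 0.0) >= threshold]


# Made-up sample records for illustration only.
SAMPLE = [
    {"url": "img_a", "aesthetic_score": 6.5},
    {"url": "img_b", "aesthetic_score": 5.9},
    {"url": "img_c", "aesthetic_score": 7.2},
]

if __name__ == "__main__":
    for record in filter_by_score(SAMPLE):
        print(record["url"], record["aesthetic_score"])
```

The comparison is inclusive, so a record scoring exactly 6.5 is kept.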
databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
Updated 1 year ago
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.
Updated 1 year ago
The purpose of the project is to make available a standard training and test setup for language modeling experiments.