hlib 34e81dc3d2
Merge branch 'master' of https://github.com/giganticode/datasets
3 years ago
0e666e309a
add stage for computing the stats for devanbu small corpus
3 years ago
a13964473d
rename extract-25k-vocab-corpus.sh to be able to reuse it to extract other corpora
3 years ago
71e95fd455
improvements to pre-processing stage: also track the resulting vocab; use a separate venv to run codeprep; extract codeprep version with yq
3 years ago
