Commit History
Message Author SHA1 Date
Commit wmt14_en_de_token.dvc   Tolstoyevsky 5 years ago
Merge branch 'dvc' into dvc-train-single-batch   Tolstoyevsky 5 years ago
Fixed reference to resume checkpoint   Tolstoyevsky 5 years ago
Skipped the unzipping stage completely   Tolstoyevsky 5 years ago
Created a training and validation set that should fit in a single batch, to try to overfit it to validate the model is working   Tolstoyevsky 5 years ago
Gave better names to the tokenization stage, and moved the prep command to a script   Tolstoyevsky 5 years ago
Fix path names in resume-checkpoint.dvc   Tolstoyevsky 5 years ago
Removed the unzipped files cache   Tolstoyevsky 5 years ago
Refactored the checkpoint moving stage   Tolstoyevsky 5 years ago
Training iteration   Guy 5 years ago
Added training step stub   Tolstoyevsky 5 years ago
Ran data prep dvc stage   Guy 5 years ago
Change logo size   Dean 5 years ago
Calculated BPE using DVC   Guy 5 years ago
Added missing commoncrawl dependency   Tolstoyevsky 5 years ago
Refactor data preparation a bit   Tolstoyevsky 5 years ago
Universal encoder seems ready   Tolstoyevsky 5 years ago
Small things - hyperparam yaml, script to clean corrupted unicode   Tolstoyevsky 5 years ago
Finished preprocessing   Tolstoyevsky 5 years ago
Setup the stub of the preprocessing dvc stage   Tolstoyevsky 5 years ago
Fixed commoncrawl yet again   Tolstoyevsky 5 years ago
Fixed training.dvc   Tolstoyevsky 5 years ago
Reorganized dvc   Tolstoyevsky 5 years ago
Working on the prepare script, tokenization looks like it's working   Tolstoyevsky 5 years ago
Switched to the correct dataset (nc-12 instead of nc-13)   Tolstoyevsky 5 years ago
Unzipped raw data   Tolstoyevsky 5 years ago
Moved data into data folder   Tolstoyevsky 5 years ago
Init DVC + downloaded and tracking data   Tolstoyevsky 5 years ago
stitch preprocessing pipeline   Ruty Rinott 5 years ago
Add CheckpointManager to keep avg checkpoint weights in memory to reduce disk read when averaging + various checkpoint refactoring   Wei Ho 5 years ago