Christian Orr 36c7db8c44
bugfix in decode function
2 years ago
d17350a31d
add support for character-level language models, a new character-level shakespeare dataset, a new config file that shows how to train a character-level baby GPT on it, and adjust the sample function to figure out if it should decode with characters or GPT2 bpe tokens. The current implementation is a bit hacky and basically assumes just these two possibilities. In the future we may want to support more general encoders or decoders.
2 years ago

readme.md


tiny shakespeare, character-level

Tiny Shakespeare, of the good old char-rnn fame :) Treated at the character level, so each token is a single character.

After running prepare.py:

  • train.bin has 1,003,854 tokens
  • val.bin has 111,540 tokens
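
For orientation, here is a minimal sketch of what character-level tokenization with a train/val split can look like. It is not taken from prepare.py: the helper names (`build_vocab`, `encode`, `decode`, `train_val_split`) and the 0.9 train fraction are assumptions for illustration. The 0.9 value was chosen only because the counts above are consistent with it (1,003,854 of 1,115,394 total tokens).

```python
# Hypothetical sketch of character-level tokenization; not the actual prepare.py.

def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    """Turn a string into a list of integer ids, one per character."""
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    """Turn a list of integer ids back into a string."""
    return "".join(itos[i] for i in ids)

def train_val_split(ids, train_fraction=0.9):
    """Split ids into a leading train part and a trailing val part.

    The 0.9 default is an assumption, not a value stated in this readme.
    """
    n = int(len(ids) * train_fraction)
    return ids[:n], ids[n:]

# A short stand-in for the real dataset text.
sample = "First Citizen:\nBefore we proceed any further, hear me speak.\n"
stoi, itos = build_vocab(sample)
ids = encode(sample, stoi)
train_ids, val_ids = train_val_split(ids)
```

With one token per character, the total token count equals the number of characters in the input text, which is why the two files together hold 1,115,394 tokens.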