train_shakespeare_char.py 1.1 KB

# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such
out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often
# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False
wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'
dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters
# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2
learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small
warmup_iters = 100 # not super necessary potentially
# on macbook also add
# device = 'cpu' # run on cpu only
# compile = False # do not torch compile the model
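
This config only assigns module-level variables; in the standard nanoGPT layout it lives under config/ and is passed to the training script, typically as python train.py config/train_shakespeare_char.py (adding --device=cpu --compile=False on a MacBook, per the commented lines above). The sketch below illustrates the override pattern such a configurator follows: the training script defines defaults first, then the config file is exec'd so its assignments replace them. The load_config helper, the file path, and the default values shown are illustrative assumptions, not part of this file or a guaranteed nanoGPT API.

# Minimal sketch of the config-override pattern, assuming a nanoGPT-style layout.
# The helper name and the default values below are illustrative assumptions.
batch_size = 12          # placeholder defaults a training script might define
block_size = 1024
learning_rate = 6e-4

def load_config(path: str) -> None:
    """Execute a config file so its top-level assignments override the globals above."""
    with open(path) as f:
        exec(f.read(), globals())

load_config('config/train_shakespeare_char.py')  # hypothetical path to this file
print(batch_size, block_size, learning_rate)     # -> 64 256 0.001 after the override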