0.4.0 -> 0.5.0 (Myle Ott, 6 years ago)
Update README.md (Myle Ott, 6 years ago)
add links to pretrained language models (Alexei Baevski, 6 years ago)
add default architecture for gbw fconv lm (alexeib, 6 years ago)
Change --path to be colon-separated instead of comma-separated (Myle Ott, 6 years ago)
Faster generation when using a single model (rather than ensemble) (Myle Ott, 6 years ago)
Fix bidirectional lstm (Myle Ott, 6 years ago)
Fix tests (Myle Ott, 6 years ago)
Updates for latest PyTorch (Myle Ott, 6 years ago)
Add FairseqTask (Myle Ott, 6 years ago)
torch.arange default return type is changed in the latest pytorch version https://github.com/pytorch/pytorch/pull/7016 (Sergey Edunov, 6 years ago)
Fix length penalty when combined with --no-early-stop (Myle Ott, 6 years ago)
initialize normalization constant for fconv_lm (alexeib, 6 years ago)
build optimizer only once, otherwise it leaks cuda memory (Alexei Baevski, 6 years ago)
Update README.md (Myle Ott, 6 years ago)
Add more integration tests (LM, stories, transformer, lstm) (Myle Ott, 6 years ago)
Suppress stdout in test_train (Myle Ott, 6 years ago)
Small fixes (Myle Ott, 6 years ago)
create examples dir and add conv lm + stories readme (Alexei Baevski, 6 years ago)
Merge validate and val_loss functions (simplify train.py) (Myle Ott, 6 years ago)
Use symlinks for redundant checkpoints (Myle Ott, 6 years ago)
Unify various sharding into ShardedIterator (Myle Ott, 6 years ago)
Migrate all binaries to use options.parse_args_and_arch (Myle Ott, 6 years ago)
Nits (Myle Ott, 6 years ago)
fix model loading in eval_lm (alexeib, 6 years ago)
save best val loss in checkpoint (Alexei Baevski, 6 years ago)
minor parameter fixes for stories model (Angela Fan, 6 years ago)
modified writing prompts model parameters to make readme cleaner (Angela Fan, 6 years ago)
added multiscale gated self attention layer with multiple heads, and pretrained fusion models (Angela Fan, 6 years ago)
fix default params (alexeib, 6 years ago)