Scaling Neural Machine Translation (Ott et al., 2018)

This page includes instructions for reproducing results from the paper Scaling Neural Machine Translation (Ott et al., 2018).

Pre-trained models

Description	Dataset	Model	Test set(s)
Transformer (Ott et al., 2018)	WMT14 English-French	download (.tar.bz2)	newstest2014 (shared vocab): download (.tar.bz2)
Transformer (Ott et al., 2018)	WMT16 English-German	download (.tar.bz2)	newstest2014 (shared vocab): download (.tar.bz2)

Training a new model on WMT'16 En-De

Please first download the preprocessed WMT'16 En-De data provided by Google. Then:

Extract the WMT'16 En-De data:

$ TEXT=wmt16_en_de_bpe32k
$ mkdir $TEXT
$ tar -xzvf wmt16_en_de.tar.gz -C $TEXT

Preprocess the dataset with a joined dictionary:

$ python preprocess.py --source-lang en --target-lang de \
  --trainpref $TEXT/train.tok.clean.bpe.32000 \
  --validpref $TEXT/newstest2013.tok.bpe.32000 \
  --testpref $TEXT/newstest2014.tok.bpe.32000 \
  --destdir data-bin/wmt16_en_de_bpe32k \
  --nwordssrc 32768 --nwordstgt 32768 \
  --joined-dictionary
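
To illustrate what a joined dictionary means, here is a minimal Python sketch that builds a single vocabulary from both sides of a parallel corpus. It is not the code that preprocess.py runs: the function name build_joined_vocab and the toy sentences are hypothetical, and special symbols such as padding or end-of-sentence markers are ignored. The point is only that source and target tokens share one index space, which is what lets the training command below share embeddings across encoder and decoder.

```python
from collections import Counter


def build_joined_vocab(src_lines, tgt_lines, max_size):
    """Map tokens from both languages into one shared index space (illustrative only)."""
    counts = Counter()
    for line in list(src_lines) + list(tgt_lines):
        counts.update(line.split())
    # Most frequent tokens first; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {token: index for index, (token, _) in enumerate(ranked[:max_size])}


# Toy BPE-style sentences, made up for this example.
src = ["the hou@@ se", "the cat"]
tgt = ["das hau@@ s", "die cat"]
vocab = build_joined_vocab(src, tgt, max_size=32768)
print(len(vocab), vocab["cat"], vocab["das"])
```

A token that occurs in both languages ("cat" in the toy data) gets a single entry, so both sides of the model look it up under the same index.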

Train a model:

$ python train.py data-bin/wmt16_en_de_bpe32k \
  --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
  --lr 0.0005 --min-lr 1e-09 \
  --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 3584 \
  --fp16

Note that the --fp16 flag requires CUDA 9.1 or greater and a Volta GPU.
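
The learning rate flags in the command above (--lr-scheduler inverse_sqrt, --warmup-init-lr, --warmup-updates, --lr) describe a linear warmup followed by inverse square root decay. The sketch below shows that shape using the values from the command. The function name inverse_sqrt_lr is hypothetical and the code is an approximation of the schedule for illustration, not the trainer's own implementation.

```python
def inverse_sqrt_lr(step, peak_lr=0.0005, warmup_init_lr=1e-07, warmup_updates=4000):
    """Approximate learning rate after `step` optimizer updates."""
    if step < warmup_updates:
        # Linear warmup from warmup_init_lr towards peak_lr.
        lr_step = (peak_lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + step * lr_step
    # Decay proportional to 1/sqrt(step), continuous at step == warmup_updates.
    return peak_lr * (warmup_updates ** 0.5) * (step ** -0.5)


for step in (0, 2000, 4000, 16000):
    print(step, inverse_sqrt_lr(step))
```

Under these assumptions the rate reaches its peak of 0.0005 at update 4000 and has fallen to half of that by update 16000.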

If you want to train the above model with big batches (assuming your machine has 8 GPUs):

add --update-freq 16 to simulate training on 8*16=128 GPUs
increase the learning rate; 0.001 works well for big batches
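
The arithmetic behind the big-batch setting can be written out as a short Python sketch. It assumes the 8-GPU machine mentioned above and the --max-tokens 3584 value from the training command; the variable names are illustrative. The token figure is an upper bound per optimizer update, since individual batches are not necessarily full.

```python
max_tokens = 3584   # --max-tokens: upper bound on tokens per GPU per batch
num_gpus = 8        # assumed machine size, as in the text above
update_freq = 16    # --update-freq: batches accumulated before each optimizer step

# Accumulating 16 batches on each of 8 GPUs mimics one step on 8*16 GPUs.
simulated_gpus = num_gpus * update_freq
max_tokens_per_update = max_tokens * simulated_gpus

print(simulated_gpus)         # 128
print(max_tokens_per_update)  # 458752
```

Because each update then averages gradients over a much larger batch, a higher learning rate such as the suggested 0.001 becomes workable.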

Citation

@inproceedings{ott2018scaling,
  title = {Scaling Neural Machine Translation},
  author = {Ott, Myle and Edunov, Sergey and Grangier, David and Auli, Michael},
  booktitle = {Proceedings of the Third Conference on Machine Translation (WMT)},
  year = 2018,
}
