We provide code to reproduce the results of the paper: https://arxiv.org/abs/1711.04956. The code is based on fairseq-py; please refer to the fairseq-py README for general usage instructions.
This branch requires an older version of PyTorch, v0.3.1; we recommend installing it from source: https://github.com/pytorch/pytorch/tree/v0.3.1
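A rough sketch of such a source install, assuming an Anaconda environment with CUDA already set up (the authoritative dependency list is in the PyTorch README for that tag):

# clone the v0.3.1 tag with its submodules and build from source
git clone --branch v0.3.1 --recursive https://github.com/pytorch/pytorch
cd pytorch
# typical build dependencies for this era; exact packages may vary by system
conda install numpy pyyaml mkl setuptools cmake cffi typing
python setup.py install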
For the IWSLT dataset, first train a baseline model (every command here assumes an 8-GPU setup):
python train.py $DATA -a fconv_iwslt_de_en --lr 0.25 \
--clip-norm 0.1 --dropout 0.3 --max-tokens 1000 \
--label-smoothing 0.1 --force-anneal 200 --save-dir checkpoints_tls
Please refer to the "Data Pre-processing" section of this README to obtain the dataset.
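For reference, IWSLT14 German-English preparation in fairseq-py of that era looked roughly like the following; the script location and preprocess.py flags follow the fairseq-py README and may differ slightly on this branch. $DATA in the commands above then points at the binarized directory:

# download and tokenize IWSLT14 de-en (script ships with fairseq-py)
cd data/
bash prepare-iwslt14.sh
cd ..
# binarize the data for training
TEXT=data/iwslt14.tokenized.de-en
python preprocess.py --source-lang de --target-lang en \
  --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
  --destdir data-bin/iwslt14.tokenized.de-en
DATA=data-bin/iwslt14.tokenized.de-en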
Then, to fine-tune the trained model with sequence-level training, run one of the following sets of commands.
SeqNLL:
mkdir checkpoints_seqnll
cp checkpoints_tls/checkpoint200.pt checkpoints_seqnll/checkpoint_last.pt
python train.py $DATA -a fconv_iwslt_de_en --clip-norm 0.1 \
--momentum 0.9 --lr 0.25 \
--label-smoothing 0.1 --dropout 0.3 --max-tokens 1000 \
--seq-max-len-a 1.5 --seq-max-len-b 5 --seq-criterion SequenceCrossEntropyCriterion \
--seq-combined-loss-alpha 0.4 --force-anneal 220 --seq-beam 16 --save-dir checkpoints_seqnll
Risk:
mkdir checkpoints_risk
cp checkpoints_tls/checkpoint200.pt checkpoints_risk/checkpoint_last.pt
python train.py $DATA -a fconv_iwslt_de_en --clip-norm 0.1 \
--momentum 0.9 --lr 0.25 \
--label-smoothing 0.1 --dropout 0.3 --max-tokens 1000 \
--seq-max-len-a 1.5 --seq-max-len-b 5 --seq-criterion SequenceRiskCriterion \
--seq-combined-loss-alpha 0.5 --force-anneal 220 --seq-beam 16 --save-dir checkpoints_risk
Max-margin:
mkdir checkpoints_maxmargin
cp checkpoints_tls/checkpoint200.pt checkpoints_maxmargin/checkpoint_last.pt
python train.py $DATA -a fconv_iwslt_de_en --clip-norm 0.1 \
--momentum 0.9 --lr 0.25 \
--label-smoothing 0.1 --dropout 0.3 --max-tokens 1000 \
--seq-max-len-a 1.5 --seq-max-len-b 5 --seq-criterion SequenceMaxMarginCriterion \
--seq-combined-loss-alpha 0.4 --force-anneal 220 --seq-beam 16 --save-dir checkpoints_maxmargin
Multi-Margin:
mkdir checkpoints_multimargin
cp checkpoints_tls/checkpoint200.pt checkpoints_multimargin/checkpoint_last.pt
python train.py $DATA -a fconv_iwslt_de_en --clip-norm 0.1 \
--momentum 0.9 --lr 0.25 \
--label-smoothing 0.1 --dropout 0.3 --max-tokens 1000 \
--seq-max-len-a 1.5 --seq-max-len-b 5 --seq-criterion SequenceMultiMarginCriterion \
--seq-combined-loss-alpha 0.4 --force-anneal 220 --seq-beam 16 --save-dir checkpoints_multimargin
Softmax-Margin:
mkdir checkpoints_softmaxmargin
cp checkpoints_tls/checkpoint200.pt checkpoints_softmaxmargin/checkpoint_last.pt
python train.py $DATA -a fconv_iwslt_de_en --clip-norm 0.1 \
--momentum 0.9 --lr 0.25 \
--label-smoothing 0.1 --dropout 0.3 --max-tokens 1000 \
--seq-max-len-a 1.5 --seq-max-len-b 5 --seq-criterion SequenceSoftMaxMarginCriterion \
--seq-combined-loss-alpha 0.4 --force-anneal 220 --seq-beam 16 --save-dir checkpoints_softmaxmargin
Then, you can evaluate the models as usual.
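For example, with the stock fairseq-py generation script (the checkpoint path, beam size, and flags below are illustrative):

python generate.py $DATA --path checkpoints_risk/checkpoint_best.pt \
  --batch-size 128 --beam 5 --remove-bpe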
fairseq-py is BSD-licensed. The license applies to the pre-trained models as well. We also provide an additional patent grant.