sacrebleu_pregen.sh 574 B

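#!/bin/bash
# Detokenize the hypotheses in a generation log and score them with sacreBLEU.

if [ $# -ne 4 ]; then
    echo "usage: $0 TESTSET SRCLANG TGTLANG GEN" >&2
    exit 1
fi

TESTSET=$1
SRCLANG=$2
TGTLANG=$3
GEN=$4

# Fetch the Moses scripts only if they are not already present;
# `git clone` fails when the target directory exists.
if [ ! -d mosesdecoder ]; then
    echo 'Cloning Moses github repository (for tokenization scripts)...'
    git clone https://github.com/moses-smt/mosesdecoder.git
fi

SCRIPTS=mosesdecoder/scripts
DETOKENIZER=$SCRIPTS/tokenizer/detokenizer.perl

# Keep the hypothesis lines (prefixed "H-"), restore numeric ID order,
# take the third tab-separated field (the hypothesis text), detokenize it,
# and rejoin hyphens that were split into " - ".
grep '^H' "$GEN" \
| sed 's/^H-//' \
| sort -n -k 1 \
| cut -f 3 \
| perl "$DETOKENIZER" -l "$TGTLANG" \
| sed "s/ - /-/g" \
> "$GEN.sorted.detok"

# Score the detokenized hypotheses against the named sacreBLEU test set.
sacrebleu --test-set "$TESTSET" --language-pair "${SRCLANG}-${TGTLANG}" < "$GEN.sorted.detok"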
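
The extraction stage of the pipeline above can be tried on its own, without Moses or sacreBLEU installed. The following is a minimal sketch, not part of the original script. It assumes the log has the shape the script's parsing implies, namely hypothesis lines of the form `H-<id><TAB><score><TAB><text>`. The helper name `extract_hypotheses` and all sample IDs, scores, and sentences are hypothetical, and the detokenization and hyphen-rejoining steps are left out.

```shell
#!/bin/bash
# Hypothetical helper: same grep/sed/sort/cut stages as the script,
# minus detokenization and the " - " rejoining step.
extract_hypotheses() {
    grep '^H' "$1" \
    | sed 's/^H-//' \
    | sort -n -k 1 \
    | cut -f 3
}

# Made-up log: S- lines are ignored, H- lines arrive out of order.
sample=$(mktemp)
printf 'S-10\tbis bald\n'                >  "$sample"
printf 'H-10\t-0.31\tsee you soon\n'     >> "$sample"
printf 'S-2\thallo Welt\n'               >> "$sample"
printf 'H-2\t-0.52\thello world\n'       >> "$sample"
printf 'S-0\tguten Morgen\n'             >> "$sample"
printf 'H-0\t-0.25\tgood morning\n'      >> "$sample"

hyps=$(extract_hypotheses "$sample")
printf '%s\n' "$hyps"
# prints:
# good morning
# hello world
# see you soon

rm -f "$sample"
```

The numeric sort matters because sacreBLEU compares hypotheses to references line by line, so the hypotheses have to be back in test-set order before scoring.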