read_binarized.py 1.0 KB

#!/usr/bin/env python3
# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under the license found in the LICENSE file in
# the root directory of this source tree. An additional grant of patent rights
# can be found in the PATENTS file in the same directory.
#

import argparse

from fairseq.data import dictionary
from fairseq.data import IndexedDataset


def get_parser():
    parser = argparse.ArgumentParser(
        description='writes text from binarized file to stdout')
    # fmt: off
    parser.add_argument('--dict', metavar='FP', required=True, help='dictionary containing known words')
    parser.add_argument('--input', metavar='FP', required=True, help='binarized file to read')
    # fmt: on

    return parser


def main(args):
    dict = dictionary.Dictionary.load(args.dict)
    ds = IndexedDataset(args.input, fix_lua_indexing=True)

    for tensor_line in ds:
        print(dict.string(tensor_line))


if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    main(args)
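
The loop in `main` turns each stored sequence of token indices back into a line of text. The sketch below illustrates that idea without depending on fairseq. It is a rough approximation, not the library's implementation: `TOY_VOCAB`, `shift_to_zero_based`, `decode`, and `demo` are hypothetical names introduced here, and the 1-based to 0-based shift is an assumption about what `fix_lua_indexing=True` is meant to compensate for.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary; the real script loads its vocabulary
# from the file passed via --dict.
TOY_VOCAB: List[str] = ["<pad>", "</s>", "<unk>", "hello", "world"]


def shift_to_zero_based(indices: Sequence[int]) -> List[int]:
    """Convert 1-based (Lua-style) indices to 0-based ones.

    Assumption: this mirrors the intent of fix_lua_indexing=True.
    """
    return [i - 1 for i in indices]


def decode(indices: Sequence[int], vocab: Sequence[str], unk: str = "<unk>") -> str:
    """Map each index to its token; out-of-range indices become `unk`."""
    tokens = [vocab[i] if 0 <= i < len(vocab) else unk for i in indices]
    return " ".join(tokens)


def demo() -> str:
    # One "binarized" line, stored with 1-based indices.
    stored_line = [4, 5]
    return decode(shift_to_zero_based(stored_line), TOY_VOCAB)


if __name__ == "__main__":
    print(demo())
```

The script itself would be invoked with both required flags, for example `python read_binarized.py --dict <dictionary file> --input <binarized file>`, where the two paths are placeholders for your own files.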