tasks.py 1.0 KB

from pathlib import Path
import subprocess as sp

from invoke import task

data_dir = Path('data')
tgt_dir = Path('target')
bin_dir = tgt_dir / 'release'


@task
def build(c, debug=False):
    "Compile the Rust support executables"
    global bin_dir
    if debug:
        c.run('cargo build')
        # Later tasks in this run must pick up the debug binaries
        bin_dir = tgt_dir / 'debug'
    else:
        c.run('cargo build --release')


@task(build)
def convert_viaf(c, date='20181104', progress=True):
    "Convert the VIAF MARC21 cluster dump with parse-marc"
    infile = data_dir / f'viaf-{date}-clusters-marc21.xml.gz'
    outfile = data_dir / f'viaf-{date}-clusters.psql.gz'

    # Pipeline: pv infile | gunzip | parse-marc | gzip > outfile
    # With progress disabled, plain cat replaces pv as the reader.
    reader = 'pv' if progress else 'cat'
    with open(outfile, 'wb') as out:
        src = sp.Popen([reader, str(infile)], stdin=sp.DEVNULL, stdout=sp.PIPE)
        gz = sp.Popen(['gunzip'], stdin=src.stdout, stdout=sp.PIPE)
        parse = sp.Popen([str(bin_dir / 'parse-marc')], stdin=gz.stdout, stdout=sp.PIPE)
        gzout = sp.Popen(['gzip'], stdin=parse.stdout, stdout=out)
        # Close the parent's copies of the pipe read ends so an upstream
        # stage receives SIGPIPE if a downstream stage exits early.
        for proc in (src, gz, parse):
            proc.stdout.close()
        # Wait for every stage, not only the final gzip
        codes = [proc.wait() for proc in (src, gz, parse, gzout)]
    if any(codes):
        raise RuntimeError(f'conversion pipeline failed with exit codes {codes}')


if __name__ == '__main__':
    import invoke.program
    program = invoke.program.Program()
    program.run()