gandlf_splitCSV 1.9 KB

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse
import os
import sys

import yaml

from GANDLF.cli import copyrightMessage, split_data_and_save_csvs


def main():
    parser = argparse.ArgumentParser(
        prog="GANDLF_SplitCSV",
        formatter_class=argparse.RawTextHelpFormatter,
        description="Split the data into training, validation, and testing sets and save them as csvs in the output directory.\n\n"
        + copyrightMessage,
    )
    parser.add_argument(
        "-i",
        "--inputCSV",
        metavar="",
        default=None,
        type=str,
        required=True,
        help="Input CSV file which contains the data to be split.",
    )
    parser.add_argument(
        "-c",
        "--config",
        metavar="",
        default=None,
        required=True,
        type=str,
        help="The GaNDLF config (in YAML) with the `nested_training` key specified to the folds needed.",
    )
    parser.add_argument(
        "-o",
        "--outputDir",
        metavar="",
        default=None,
        type=str,
        required=True,
        help="Output directory to save the split data.",
    )

    args = parser.parse_args()

    # check for required parameters - this is needed here to keep the cli clean
    for param_name, param_value in [
        ("inputCSV", args.inputCSV),
        ("outputDir", args.outputDir),
        ("config", args.config),
    ]:
        if param_value is None:
            # sys.exit() takes a single argument, so build the message first
            sys.exit(f"ERROR: Missing required parameter: {param_name}")

    inputCSV = os.path.normpath(args.inputCSV)
    outputDir = os.path.normpath(args.outputDir)

    # initialize default fold counts for the testing and validation levels
    config = {"nested_training": {"testing": 5, "validation": 5}}
    # the default is only replaced if the given config path is an existing file
    if os.path.isfile(args.config):
        with open(args.config, "r") as config_file:
            config = yaml.safe_load(config_file)

    print("Config used for split:", config)
    split_data_and_save_csvs(inputCSV, outputDir, config)
    print("Finished successfully.")


# main function
if __name__ == "__main__":
    main()
```
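The script hands the actual splitting to `split_data_and_save_csvs`, whose implementation is not shown here. As a rough illustration of what a `nested_training` setting with `testing: 5` and `validation: 5` implies, the sketch below performs a generic nested k-fold split over subject IDs: each outer fold is held out once as the testing set, and the remaining subjects are split again into inner folds for validation. The helper names `nested_kfold_splits` and `_assign_folds`, the seeded shuffle, and the round-robin fold assignment are hypothetical choices for illustration, not GaNDLF's API or its exact algorithm.

```python
import random
from typing import Dict, Iterator, List, Sequence


def _assign_folds(items: Sequence[str], n_folds: int) -> List[List[str]]:
    """Distribute items round-robin into n_folds roughly equal folds."""
    folds: List[List[str]] = [[] for _ in range(n_folds)]
    for index, item in enumerate(items):
        folds[index % n_folds].append(item)
    return folds


def nested_kfold_splits(
    subject_ids: Sequence[str], n_testing: int, n_validation: int, seed: int = 0
) -> Iterator[Dict[str, object]]:
    """Yield one training/validation/testing split per (outer, inner) fold pair.

    Hypothetical helper: a generic nested k-fold scheme, not GaNDLF's code.
    """
    if n_testing < 2 or n_validation < 2:
        raise ValueError("nested k-fold needs at least 2 folds at each level")
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle

    outer_folds = _assign_folds(shuffled, n_testing)
    for test_idx, test_set in enumerate(outer_folds):
        # everything outside the current testing fold is available for train/val
        remaining = [s for i, fold in enumerate(outer_folds) if i != test_idx for s in fold]
        inner_folds = _assign_folds(remaining, n_validation)
        for val_idx, val_set in enumerate(inner_folds):
            train_set = [s for i, fold in enumerate(inner_folds) if i != val_idx for s in fold]
            yield {
                "testing_fold": test_idx,
                "validation_fold": val_idx,
                "training": train_set,
                "validation": list(val_set),
                "testing": list(test_set),
            }


if __name__ == "__main__":
    subjects = [f"subject_{i:02d}" for i in range(20)]
    splits = list(nested_kfold_splits(subjects, n_testing=5, n_validation=5))
    print(len(splits))  # 25
```

Splitting at the level of subject IDs rather than CSV rows keeps all rows of one subject inside a single set, so no subject leaks between training, validation, and testing.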
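Besides computing the split, `split_data_and_save_csvs` writes the resulting subsets to `outputDir`; the file layout it uses is not shown in this script. A minimal standard-library sketch of that saving step, assuming a hypothetical layout of one subdirectory per split containing one CSV per subset, and a subject column named `SubjectID` (also an assumption, exposed as a parameter), could look like this. It keeps the input header and copies every row of a selected subject into the matching file.

```python
import csv
import os
from typing import Dict, List, Sequence


def save_split_csvs(
    input_csv: str,
    output_dir: str,
    split_name: str,
    subsets: Dict[str, Sequence[str]],
    subject_column: str = "SubjectID",  # assumed column name
) -> List[str]:
    """Write one CSV per subset (e.g. train/valid/test) for a single split.

    Hypothetical helper for illustration; not GaNDLF's API or file layout.
    Returns the paths of the files written, in the order of `subsets`.
    """
    with open(input_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)
    if fieldnames is None or subject_column not in fieldnames:
        raise ValueError(f"column {subject_column!r} not found in {input_csv}")

    split_dir = os.path.join(output_dir, split_name)
    os.makedirs(split_dir, exist_ok=True)
    written = []
    for subset_name, subject_ids in subsets.items():
        wanted = set(subject_ids)
        path = os.path.join(split_dir, f"{subset_name}.csv")
        with open(path, "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            # keep every row belonging to a selected subject, in input order
            writer.writerows(row for row in rows if row[subject_column] in wanted)
        written.append(path)
    return written
```

Reading the whole input once and filtering per subset keeps the header and column order identical across all output files, which matters if downstream tools expect the same columns as the input CSV.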