split_data.py 1010 B

import os
import argparse

import pandas as pd
from sklearn.model_selection import train_test_split

from get_data import read_params


def split_and_saved_data(config_path):
    """Split the raw dataset into train and test CSV files using the settings in config_path."""
    config = read_params(config_path)

    # Input and output locations, plus split settings, all come from the config file.
    test_data_path = config["split_data"]["test_path"]
    train_data_path = config["split_data"]["train_path"]
    raw_data_path = config["load_data"]["raw_dataset_csv"]
    split_ratio = config["split_data"]["test_size"]
    random_state = config["base"]["random_state"]

    df = pd.read_csv(raw_data_path, sep=",")

    # A fixed random_state makes the split reproducible across runs.
    train, test = train_test_split(
        df,
        test_size=split_ratio,
        random_state=random_state
    )

    train.to_csv(train_data_path, sep=",", index=False, encoding="utf-8")
    test.to_csv(test_data_path, sep=",", index=False, encoding="utf-8")


if __name__ == "__main__":
    args = argparse.ArgumentParser()
    args.add_argument("--config", default="params.yaml")
    parsed_args = args.parse_args()
    split_and_saved_data(config_path=parsed_args.config)
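
The script reads five entries from the parsed config and fails with a bare `KeyError` if any is absent. As a rough sketch, the check below lists which of those entries are missing before the split runs. The helper `missing_config_keys` and the `example_config` values (paths, seed, test size) are hypothetical and only illustrate the shape the script expects; they are not taken from the repository's `params.yaml`.

```python
# Keys that split_data.py reads from the parsed config (taken from the script above).
REQUIRED_KEYS = [
    ("split_data", "test_path"),
    ("split_data", "train_path"),
    ("split_data", "test_size"),
    ("load_data", "raw_dataset_csv"),
    ("base", "random_state"),
]


def missing_config_keys(config):
    """Return "section.key" strings for every required entry absent from config.

    Hypothetical helper, not part of the original repository.
    """
    missing = []
    for section, key in REQUIRED_KEYS:
        section_values = config.get(section)
        # A section that is absent or not a mapping counts as missing its key.
        if not isinstance(section_values, dict) or key not in section_values:
            missing.append(f"{section}.{key}")
    return missing


# Placeholder config: every path and value here is an assumption for illustration.
example_config = {
    "base": {"random_state": 42},
    "load_data": {"raw_dataset_csv": "data/raw/dataset.csv"},
    "split_data": {
        "train_path": "data/processed/train.csv",
        "test_path": "data/processed/test.csv",
        "test_size": 0.2,
    },
}

if __name__ == "__main__":
    print(missing_config_keys(example_config))
```

Calling such a check right after `read_params` would let the script report all missing entries at once instead of stopping at the first one.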