make_dataset.py 776 B

import os
import pprint

import pandas as pd
import yaml
from datasets import load_dataset


def make_dataset(dataset="cnn_dailymail", split="train"):
    """Download one split of a summarisation dataset and save it as a CSV."""
    # Create the output directory if it does not already exist.
    os.makedirs("data/raw", exist_ok=True)
    # "3.0.0" is the dataset configuration name passed to load_dataset.
    data = load_dataset(dataset, "3.0.0", split=split)
    # Keep only the source article and its reference summary.
    df = pd.DataFrame()
    df["article"] = data["article"]
    df["highlights"] = data["highlights"]
    df.to_csv("data/raw/{}.csv".format(split))


if __name__ == "__main__":
    # data_params.yml must define a "data" key holding the dataset name.
    with open("data_params.yml") as f:
        params = yaml.safe_load(f)
    pprint.pprint(params)
    # Export each split to data/raw/<split>.csv.
    make_dataset(dataset=params["data"], split="train")
    make_dataset(dataset=params["data"], split="test")
    make_dataset(dataset=params["data"], split="validation")
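
The script above writes each split to data/raw/<split>.csv with pandas' default settings, so every file has an unnamed index column followed by "article" and "highlights". As a minimal sketch of how a downstream step might read one of these files back without pandas, the helper below uses only the standard library. The name read_split and the field size limit of 10,000,000 characters are assumptions made for illustration and are not part of the original repository.

```python
import csv


def read_split(csv_path):
    """Return (article, highlights) pairs from a CSV written by make_dataset.

    Hypothetical helper: assumes the file has "article" and "highlights"
    columns, plus an unnamed leading index column that is ignored.
    """
    # Articles can be long, so raise the csv module's per-field size limit.
    # The value is an arbitrary assumption, not taken from the source.
    csv.field_size_limit(10_000_000)
    pairs = []
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append((row["article"], row["highlights"]))
    return pairs
```

For example, read_split("data/raw/train.csv") would return one tuple per article once the training split has been exported. Loading everything into a list keeps the sketch short; for the full training split a generator that yields rows one at a time may be a better fit.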