prepare.py 1.1 KB
import os
import pickle
import sys

import pandas as pd

# Usage: python src/prepare.py data/data.csv
input_path = sys.argv[1]  # e.g. data/data.csv
output_path = os.path.join('data', 'prepared', 'data.pkl')


def text_cleaning(source):
    """Load the raw CSV and normalise the 'Star color' labels."""
    df = pd.read_csv(source)
    # text cleaning
    df['Star color'] = df['Star color'].str.lower()  # lower case
    df['Star color'] = df['Star color'].str.strip()  # strip leading/trailing whitespace
    df['Star color'] = df['Star color'].str.replace('-', ' ', regex=False)  # turn '-' into a space
    # map spelling variants onto a single label
    df['Star color'] = df['Star color'].replace({
        'yellowish white': 'white yellow',
        'yellow white': 'white yellow',
        'yellowish': 'orange',
    })
    return df


os.makedirs(os.path.join('data', 'prepared'), exist_ok=True)
with open(input_path, encoding='utf8') as fd_in, open(output_path, 'wb') as fd_out:
    pickle.dump(text_cleaning(fd_in), fd_out)
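To see what the cleaning step does without the real data/data.csv, here is a minimal sketch that runs a copy of text_cleaning() on a few made-up rows. The sample values (and the assumption that a one-column CSV is enough to exercise the function) are hypothetical; only the 'Star color' column name and the replacement mapping come from prepare.py. The result is then round-tripped through pickle in memory, the same format prepare.py writes to data/prepared/data.pkl.

```python
import io
import pickle

import pandas as pd


def text_cleaning(source):
    # Copy of the cleaning steps in prepare.py so this sketch runs on its own.
    df = pd.read_csv(source)
    df['Star color'] = df['Star color'].str.lower()
    df['Star color'] = df['Star color'].str.strip()
    df['Star color'] = df['Star color'].str.replace('-', ' ', regex=False)
    df['Star color'] = df['Star color'].replace({
        'yellowish white': 'white yellow',
        'yellow white': 'white yellow',
        'yellowish': 'orange',
    })
    return df


# Hypothetical sample rows; the real dataset's other columns are not shown in the source.
sample_csv = io.StringIO(
    "Star color\n"
    "Yellowish White\n"
    " yellow-white \n"
    "Yellowish\n"
    "Blue-White\n"
)

cleaned = text_cleaning(sample_csv)
print(cleaned['Star color'].tolist())
# -> ['white yellow', 'white yellow', 'orange', 'blue white']

# Pickle round trip in memory, mirroring what prepare.py writes to disk.
buffer = io.BytesIO()
pickle.dump(cleaned, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
print(restored.equals(cleaned))  # -> True
```

Note that the exact-match dictionary in Series.replace only fires after lower-casing, stripping and hyphen replacement, which is why " yellow-white " ends up as "white yellow" while "blue white" passes through unchanged.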