processing.py 1.2 KB

# Figure out a way to get rid of the sklearn deprecation warning
import numpy as np
import pandas as pd
import time
import json
import pickle
import sklearn as sk


def processing():
    """Data preprocessing and feature engineering for the project
    predicting the source star from radio astronomy data."""
    print("Reading in raw data...")
    s = pd.read_csv("./data/raw/stars.csv", low_memory=False)
    start_time = time.time()
    print("Removing unnecessary columns...")
    s = s[['DriftRate', 'Freq', 'SNR', 'Source']]
    print("Renaming columns to be friendlier...")
    s = s.rename(columns={"DriftRate": "Drift_rate(Hz/sec)", "Freq": "Frequency(MHz)",
                          "SNR": "Signal_to_noise_ratio"})
    print("Saving processed data...")
    s.to_csv("./data/processed/stars_proc.csv")
    end_time = time.time()
    # Save processing time as a metric
    print("Saving processing time...\n")
    with open("./metrics/proc_time.json", 'w') as f:
        json.dump({'Processing_time': end_time - start_time}, f)
    # Print processing time to terminal
    print("Processing time: " + str(end_time - start_time) + " seconds\n")
    print("Data featurization completed.\n")


if __name__ == '__main__':
    processing()
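The script's core logic is a column selection and rename wrapped in wall-clock timing. A minimal sketch, assuming you want to exercise that logic without the CSV files on disk, is to split the pure transformation from the file I/O. The names `transform`, `timed_transform`, `KEEP_COLUMNS`, and `RENAME_MAP` are hypothetical, and the toy rows are made up for illustration; only the column names and the `Processing_time` key come from `processing.py`. The sketch also swaps `time.time()` for `time.perf_counter()`, the standard-library clock intended for measuring elapsed intervals.

```python
import json
import time
from typing import Tuple

import pandas as pd

# Columns kept and renamed exactly as in processing.py.
KEEP_COLUMNS = ["DriftRate", "Freq", "SNR", "Source"]
RENAME_MAP = {
    "DriftRate": "Drift_rate(Hz/sec)",
    "Freq": "Frequency(MHz)",
    "SNR": "Signal_to_noise_ratio",
}


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop unused columns and give the rest friendlier names."""
    return raw[KEEP_COLUMNS].rename(columns=RENAME_MAP)


def timed_transform(raw: pd.DataFrame) -> Tuple[pd.DataFrame, dict]:
    """Hypothetical helper: run transform() and return it with a timing metric."""
    start = time.perf_counter()
    processed = transform(raw)
    elapsed = time.perf_counter() - start
    return processed, {"Processing_time": elapsed}


if __name__ == "__main__":
    # Made-up rows standing in for ./data/raw/stars.csv; "Unused" gets dropped.
    raw = pd.DataFrame({
        "DriftRate": [0.1, -0.3],
        "Freq": [1420.4, 1665.4],
        "SNR": [12.5, 30.1],
        "Source": ["A", "B"],
        "Unused": [1, 2],
    })
    processed, metric = timed_transform(raw)
    print(list(processed.columns))
    # prints ['Drift_rate(Hz/sec)', 'Frequency(MHz)', 'Signal_to_noise_ratio', 'Source']
    print(json.dumps(metric))
```

With this split, `processing()` could shrink to reading `stars.csv`, calling the transform, and writing `stars_proc.csv` and `proc_time.json`. Note that the original timer also covers writing the processed CSV, whereas this sketch times only the in-memory step.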