feature_engineering.py 1.1 KB

import os

import pandas as pd
import plac
from sklearn.decomposition import PCA


@plac.annotations(
    data_path=("Path to source data", "option", "i", str),
    out_path=("Path to save featurized data", "option", "o", str)
)
def main(data_path='data/split/', out_path='data/features/'):
    # Join paths explicitly so the arguments work with or without a trailing slash.
    train = pd.read_csv(os.path.join(data_path, 'train.csv'))
    test = pd.read_csv(os.path.join(data_path, 'test.csv'))

    # Fit PCA on the training features only, so the test split does not
    # influence the learned projection.
    source_features = train.drop(columns=['class'])
    pca = PCA(n_components=2).fit(source_features)

    # Project both splits onto the two principal components.
    train_feature = pd.DataFrame(pca.transform(source_features))
    test_feature = pd.DataFrame(pca.transform(test.drop(columns=['class'])))

    # Re-attach the labels to the projected features.
    train_feature['class'] = train['class']
    test_feature['class'] = test['class']

    # Create the output directory, including any missing parents.
    os.makedirs(out_path, exist_ok=True)
    train_feature.to_csv(os.path.join(out_path, 'train.csv'), index=False)
    test_feature.to_csv(os.path.join(out_path, 'test.csv'), index=False)

    print('Finished Feature Engineering:\nStats:')
    print(f'\tExplained Variance: {pca.explained_variance_}')
    print(f'\tExplained Variance Ratio: {pca.explained_variance_ratio_}')


if __name__ == '__main__':
    plac.call(main)