train_preprocess.py 860 B

import pandas as pd
import numpy as np

# Load the raw daily sales, the test set, and the item metadata.
# Note: test_df and numpy are loaded here but not used further in this script.
train_df = pd.read_csv('../readonly/final_project_data/sales_train.csv.gz')
test_df = pd.read_csv('../readonly/final_project_data/test.csv.gz')
item_df = pd.read_csv('../readonly/final_project_data/items.csv')

# Aggregate daily rows per (month block, shop, item): mean price, total units sold.
trainf_gb = train_df.groupby(["date_block_num", "shop_id", "item_id"])
trainf_agg = trainf_gb.agg({
    'item_price': 'mean',
    'item_cnt_day': 'sum',
})
train_data = trainf_agg.reset_index()

# Attach each item's category via an item_id -> item_category_id lookup.
train_data["cat_id"] = train_data["item_id"].map(item_df.set_index("item_id")["item_category_id"])

# item_cnt_day now holds a per-block total, so rename it accordingly.
train_data = train_data.rename(columns={'item_cnt_day': 'item_cnt_block'})

# Fix the column order and write the training features to disk.
train_data = train_data.loc[:, ["date_block_num", "shop_id", "item_id", "cat_id", "item_price", "item_cnt_block"]]
train_data.to_csv('../readonly/features/training_data.csv', index=False)
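
The aggregation in the script above can be checked on a small in-memory example before running it on the full files. The following is a minimal sketch, not part of the original script: the function name `aggregate_monthly` and the toy rows are hypothetical, and it assumes the input frames have the same column names the script reads (`date_block_num`, `shop_id`, `item_id`, `item_price`, `item_cnt_day`, `item_category_id`).

```python
import pandas as pd


def aggregate_monthly(sales: pd.DataFrame, items: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the script's groupby/agg/map steps."""
    # Mean price and total units per (month block, shop, item).
    monthly = sales.groupby(
        ["date_block_num", "shop_id", "item_id"], as_index=False
    ).agg(
        item_price=("item_price", "mean"),
        item_cnt_block=("item_cnt_day", "sum"),
    )
    # Look up each item's category from the item metadata.
    monthly["cat_id"] = monthly["item_id"].map(
        items.set_index("item_id")["item_category_id"]
    )
    return monthly[
        ["date_block_num", "shop_id", "item_id", "cat_id", "item_price", "item_cnt_block"]
    ]


# Toy data (invented for illustration only).
sales = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1],
    "shop_id": [5, 5, 5, 5],
    "item_id": [10, 10, 11, 10],
    "item_price": [100.0, 120.0, 50.0, 110.0],
    "item_cnt_day": [1.0, 2.0, 1.0, 3.0],
})
items = pd.DataFrame({"item_id": [10, 11], "item_category_id": [7, 8]})

result = aggregate_monthly(sales, items)
print(result)
```

In the toy data, the two daily rows for item 10 in block 0 collapse into a single row with the averaged price and the summed count, which is the behavior the script relies on for the full dataset.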