test_preprocess.py 880 B

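```python
import pandas as pd

# Load the raw test set, the training data and the item catalogue.
test_df = pd.read_csv('../readonly/final_project_data/test.csv.gz')
train_df = pd.read_csv('../readonly/final_project_data/training_data.csv')
item_df = pd.read_csv('../readonly/final_project_data/items.csv')

# Every test row belongs to the month being forecast: date block 34.
test_df["date_block_num"] = 34

# Look up each item's category in the item catalogue.
item_to_cat = item_df.set_index("item_id")["item_category_id"]
test_df["cat_id"] = test_df["item_id"].map(item_to_cat)

# Last observed price for every (shop_id, item_id) pair in the training data.
# "last" follows the row order of training_data.csv, so it is the most recent
# price only if that file is sorted chronologically.
price_df = (
    train_df.groupby(["shop_id", "item_id"])
    .agg({"item_price": "last"})
    .reset_index()
)

# Attach the prices; pairs never seen in training get NaN.
test_df = pd.merge(test_df, price_df, how="left", on=["item_id", "shop_id"])

# Fill missing prices with the mean of the known ones. Assigning the result back
# replaces the chained `test_df.item_price.fillna(..., inplace=True)`, which
# stops updating test_df under pandas Copy-on-Write.
replace_val = test_df["item_price"].mean()
test_df["item_price"] = test_df["item_price"].fillna(replace_val)

test_df.to_csv('../readonly/features/testing_data.csv', index=False)
```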
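The price feature built by the script (last known price per shop/item pair, left-merged onto the test rows, gaps filled with the mean) can be checked on a tiny synthetic example. This is a sketch only: the helper name `add_last_price` and the toy frames are hypothetical, and, like the script's use of `"last"`, it assumes the training rows are already in chronological order.

```python
import pandas as pd


def add_last_price(test: pd.DataFrame, train: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the script: attach the last training price
    for each (shop_id, item_id) pair and mean-fill pairs unseen in training."""
    # "last" takes the final row of each group in file order, so `train`
    # is assumed to be sorted chronologically.
    prices = (
        train.groupby(["shop_id", "item_id"])
        .agg({"item_price": "last"})
        .reset_index()
    )
    # Left merge keeps every test row, in its original order.
    out = pd.merge(test, prices, how="left", on=["item_id", "shop_id"])
    out["item_price"] = out["item_price"].fillna(out["item_price"].mean())
    return out


# Made-up data: pair (0, 10) sold twice, pair (1, 20) once, pair (2, 30) never.
train = pd.DataFrame({
    "shop_id": [0, 0, 1],
    "item_id": [10, 10, 20],
    "item_price": [100.0, 120.0, 50.0],
})
test = pd.DataFrame({"ID": [0, 1, 2], "shop_id": [0, 1, 2], "item_id": [10, 20, 30]})

result = add_last_price(test, train)
print(result["item_price"].tolist())  # [120.0, 50.0, 85.0]
```

As in the script, the fill value is the mean over test rows that received a price, so pairs that occur more often in the test set weigh more in it; a mean over the training prices would be a different choice.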