Register
Login
Resources
Docs Blog Datasets Glossary Case Studies Tutorials & Webinars
Product
Data Engine LLMs Platform Enterprise
Pricing Explore
Connect to our Discord channel

cm_dask.py 850 B

You have to be logged in to leave a comment. Sign In
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
  1. import dask
  2. import dask.dataframe as dd
  3. from dask.distributed import Client, LocalCluster
  4. from coef_matrix import classify
  5. cluster = LocalCluster(n_workers=20)
  6. cli = Client(cluster)
  7. raw = dd.read_csv('./hbpv10days.csv',
  8. usecols=[0, 1, 2, 3, 4, 6],
  9. names=['time', 'station', 'lev1', 'lev2', 'strno', 'current'],
  10. dtype={'lev1':str, 'lev2':str, 'strno':str},
  11. parse_dates = ['time']).dropna()
  12. raw = cli.persist(raw)
  13. inp = raw[(raw['current'] >= 0) & (raw['current'] < 10)]
  14. inp['day'] = inp['time'].map(lambda x: str(x.date()))
  15. inp['cid'] = inp.day + inp.station + inp.lev1 + inp.lev2
  16. inp2 = inp.set_index('cid')
  17. inp3 = inp2.drop(['station', 'lev1', 'lev2', 'day'], axis=1)
  18. df = inp3.groupby(inp3.index).apply(classify,
  19. meta={'coef': 'float64'})
  20. df.to_csv('res-*.csv')
Tip!

Press p or to see the previous file or, n or to see the next file

Comments

Loading...