best_model.py 1.5 KB

from statsmodels.tools import eval_measures
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf


def get_best_model(train, test):
    # Step 1: specify the form of the model
    model_formula = "total_cases ~ 1 + " \
                    "reanalysis_specific_humidity_g_per_kg + " \
                    "reanalysis_dew_point_temp_k + " \
                    "station_min_temp_c + " \
                    "station_avg_temp_c"

    # Candidate alphas: 1e-8, 1e-7, 1e-6, 1e-5, 1e-4
    grid = 10 ** np.arange(-8, -3, dtype=np.float64)

    # Start from "no candidate yet" so the first alpha scored always wins
    best_alpha = None
    best_score = np.inf

    # Step 2: find the best hyperparameter, alpha
    for alpha in grid:
        model = smf.glm(formula=model_formula,
                        data=train,
                        family=sm.families.NegativeBinomial(alpha=alpha))

        results = model.fit()
        # Predicted case counts are truncated to integers before scoring
        predictions = results.predict(test).astype(int)
        # Mean absolute error on the held-out set
        score = eval_measures.meanabs(predictions, test.total_cases)

        if score < best_score:
            best_alpha = alpha
            best_score = score

    print('best alpha = ', best_alpha)
    print('best score = ', best_score)

    # Step 3: refit on entire dataset
    full_dataset = pd.concat([train, test])
    model = smf.glm(formula=model_formula,
                    data=full_dataset,
                    family=sm.families.NegativeBinomial(alpha=best_alpha))

    fitted_model = model.fit()
    return [fitted_model, best_alpha, best_score]
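
The alpha search in get_best_model can be illustrated in isolation. The following is a minimal sketch using only the standard library, not the file's actual model: `mean_abs_error`, `select_best`, `toy_score`, and `actual_cases` are hypothetical names and made-up data, and `toy_score` merely stands in for "fit on train, predict on test". It shows two behaviours the loop relies on: predictions are truncated to integers before scoring, and the strict `<` comparison keeps the first (smallest) alpha when several candidates tie.

```python
import math


def mean_abs_error(predictions, actuals):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


def select_best(grid, score_fn):
    """Return (param, score) with the lowest score; ties keep the earliest param."""
    best_param, best_score = None, math.inf
    for param in grid:
        score = score_fn(param)
        if score < best_score:  # strict comparison: later ties do not replace
            best_param, best_score = param, score
    return best_param, best_score


# Same five candidates as 10 ** np.arange(-8, -3): 1e-8 through 1e-4.
grid = [10.0 ** k for k in range(-8, -3)]

# Made-up held-out targets (assumption, for illustration only).
actual_cases = [4, 5, 4, 3, 6]


def toy_score(alpha):
    # Hypothetical stand-in for a fitted model: the prediction error grows
    # with alpha, then int() truncates, mirroring .astype(int) in the file.
    predictions = [int(a + alpha * 1e5) for a in actual_cases]
    return mean_abs_error(predictions, actual_cases)


best_alpha, best_score = select_best(grid, toy_score)
print("best alpha =", best_alpha)
print("best score =", best_score)
```

Because truncation can make several small alphas score identically, the order of the grid decides which one is reported; iterating from the smallest value, as the file does, favours the smallest tied alpha.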