Methods
=======

Two branches of statistical learning tools are widely used today:

Unsupervised Learning:
    An unsupervised learning method takes inbound unlabeled data and extracts or discovers classifications and labels in the data.

Supervised Learning:
    In a supervised learning context, inbound data are labeled, and an analyst builds features that they believe are likely to hold predictive power. They might use an understanding of the physics or mechanisms of a system to decide on those features.

This study explored the use of two supervised learning methods:

Logistic Regression:
    A binary classification algorithm that assigns a probability that an observation, described by some set of features, belongs to a given class.
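
To make the idea of "assigning a probability" concrete, the following is a minimal sketch of how a logistic regression model turns a set of features into a class probability. The coefficient values, the feature values, and the function names ``logistic_probability`` and ``classify`` are hypothetical, chosen only for illustration; they are not taken from the study.

```python
import math


def logistic_probability(features, weights, bias):
    """Return P(label = 1 | features) under a logistic regression model."""
    # Linear score: intercept plus the weighted sum of the features.
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The logistic (sigmoid) function maps any real score into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary label using a decision threshold."""
    return int(logistic_probability(features, weights, bias) >= threshold)


# Hypothetical coefficients for two made-up features (illustration only).
weights = [0.8, -1.2]
bias = 0.1

p = logistic_probability([2.0, 0.5], weights, bias)
label = classify([2.0, 0.5], weights, bias)
print(f"probability={p:.3f}, label={label}")
```

In practice the weights and intercept are not chosen by hand; they are estimated from labeled training data when the model is fit.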

.. toctree::
    :maxdepth: 1
    :caption: notebooks

    notebooks/1-ttu-logistic-weather

Statistical Learning Process
----------------------------

1. The variables that we want to predict, and the variables used to predict them, are labeled in an existing dataset. The predictive variables are called "features" or "predictors".
2. The data are partitioned into two parts: a "training" set, which is used to develop the model, and a "test" set, which is used to validate the model.
3. A statistical model is "fit" to the training data, yielding a statistical function that can take in new observations and predict their labels.
4. The model is used to predict labels for the "test" dataset, and the true label values are compared against the predicted label values.
5. Performance metrics of the model are calculated against the test data.
6. If the model is robust, it could in principle be deployed (in either a programmatic or a manual environment) to ingest a future data stream where events are not known a priori.
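
Steps 1 through 5 above can be sketched with scikit-learn roughly as follows. This is an illustrative outline, not the study's actual pipeline: the data are synthetic (generated with ``make_classification``), and the split ratio, random seeds, and choice of metrics are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Step 1: a labeled dataset of features X and labels y.
# Synthetic stand-in data; the study's real data are not reproduced here.
X, y = make_classification(
    n_samples=500,
    n_features=4,
    n_informative=3,
    n_redundant=1,
    random_state=0,
)

# Step 2: partition into a training set and a held-out test set.
# The 75/25 split is an assumption for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 3: fit a statistical model to the training data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 4: predict labels for the test set.
y_pred = model.predict(X_test)

# Step 5: compare predictions against the true labels.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
print(cm)
```

Step 6 (deployment) is not shown; in a sketch like this it would amount to calling ``model.predict`` on newly arriving observations whose labels are not yet known.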

The choice of a statistical model

Tooling
-------

The process above was implemented in scikit-learn :cite:`scikit-learn`, a popular machine learning library in the Python ecosystem. Other tools of note include:

`pycaret <https://pycaret.org/>`_
    PyCaret is a machine learning framework for quickly producing un-optimized a

.. toctree::
    :maxdepth: 1
    :caption: notebooks

    notebooks/3-ttu-train_models