In the previous part of the Get Started section, we learned how to track and push files to DagsHub using Git and DVC. This part will cover how to track your Data Science Experiments and save their parameters and metrics. We assume you have a project that you want to add experiment tracking to. We will be showing an example based on the result of the last section, but you can adapt it to your project in a straightforward way.
!!! illustration "Video for this tutorial"
    Prefer to follow along with a video instead of reading? Check out the video for this section below:

    <center>
    <iframe width="400" height="225" src="https://www.youtube.com/embed/HlbtcoA9VX8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
    </center>
??? Example "Start From This Part"
    To start the project from this part, please follow the instructions below:
- Fork the [hello-world](https://dagshub.com/nirbarazida/hello-world){target=_blank} repository.
- Clone the repository and work on the start-track-experiments branch using the following command (change the user name):<br/>
```bash
git clone -b start-track-experiments https://dagshub.com/<DagsHub-user-name>/hello-world.git
```
- Create and activate a virtual environment.
- Install the python dependencies:
```bash
pip3 install -r requirements.txt
pip3 install dvc
```
- Configure DVC locally and set DagsHub storage as the remote.
- Download the files using the following command:
```bash
dvc get --rev processed-data https://dagshub.com/nirbarazida/hello-world-files data/
```
- Track the data directory using DVC and the `data.dvc` file using Git.
- Push the files to the Git and DVC remotes (one way to run these DVC setup, tracking, and push steps is sketched below).
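    Below is a minimal sketch of those DVC steps, assuming DagsHub's HTTP-based DVC storage; the remote URL and access token are placeholders, so take the exact values from your repository page on DagsHub:

    ```bash
    # Set DagsHub storage as the DVC remote (placeholder URL and credentials)
    dvc remote add origin https://dagshub.com/<DagsHub-user-name>/hello-world.dvc
    dvc remote modify origin --local auth basic
    dvc remote modify origin --local user <DagsHub-user-name>
    dvc remote modify origin --local password <DagsHub-token>

    # Track the data directory with DVC, and the generated pointer file with Git
    dvc add data
    git add data.dvc .gitignore
    git commit -m "Track the data directory with DVC"

    # Push to the Git and DVC remotes
    git push
    dvc push -r origin
    ```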
!!! Important
To avoid conflicts, **work on the start-track-experiments branch** for the rest of the tutorial.
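    If you ended up on a different branch after cloning, you can switch to it with:

    ```bash
    git checkout start-track-experiments
    ```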
The DagsHub logger is a plain Python logger for your metrics and parameters. The logger saves the information as human-readable files: CSV for metrics and YAML for parameters. Once you push these files to your DagsHub repository, they will be automatically parsed and visualized in the Experiments Tab. For further information, please see the Experiments Tab documentation and the DagsHub Logger repository.
!!! Note
    Since DagsHub Experiments uses generic formats, you don't have to use the DagsHub logger. Instead, you can write your metrics and parameters into `metrics.csv` and `params.yml` files however you want, and push them to your DagsHub repository, where they will automatically be scanned and added to the Experiments Tab.
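    For example, here is a minimal sketch of writing the two files by hand, without the DagsHub logger. The file names are the ones DagsHub scans for, and the CSV columns match the `metrics.csv` layout shown later in this section; the specific metric and parameter values are just placeholders:

    ```python
    import csv
    import time

    import yaml  # PyYAML

    # Write a single metric row: Name, Value, Timestamp (in milliseconds), Step
    with open('metrics.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Name', 'Value', 'Timestamp', 'Step'])
        writer.writerow(['roc_auc_score', 0.931, int(time.time() * 1000), 1])

    # Write the hyperparameters as YAML
    with open('params.yml', 'w') as f:
        yaml.safe_dump({'model_class': 'RandomForestClassifier',
                        'model': {'n_estimators': 1, 'random_state': 0}}, f)
    ```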
We will start by installing the `dagshub` Python package in the project's virtual environment:
=== "Mac, Linux, Windows"
    ```bash
    pip3 install dagshub
    ```
Next, we will import `dagshub` in the `modeling.py` module and track the Random Forest Classifier's hyperparameters and ROC AUC score. You can copy the code below into your `modeling.py` file:
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import pandas as pd
from const import *
import dagshub

print(M_MOD_INIT, '\n' + M_MOD_LOAD_DATA)

X_train = pd.read_csv(X_TRAIN_PATH)
X_test = pd.read_csv(X_TEST_PATH)
y_train = pd.read_csv(Y_TRAIN_PATH)
y_test = pd.read_csv(Y_TEST_PATH)

print(M_MOD_RFC)

with dagshub.dagshub_logger() as logger:
    rfc = RandomForestClassifier(n_estimators=1, random_state=0)

    # Log the model's parameters
    logger.log_hyperparams(model_class=type(rfc).__name__)
    logger.log_hyperparams({'model': rfc.get_params()})

    # Train the model
    rfc.fit(X_train, y_train.values.ravel())
    y_pred = rfc.predict(X_test)

    # Log the model's performance
    logger.log_metrics({'roc_auc_score': round(roc_auc_score(y_test, y_pred), 3)})
    print(M_MOD_SCORE, round(roc_auc_score(y_test, y_pred), 3))
```
??? checkpoint "Checkpoint"
Check that the current status of your Git tracking matches the following:
=== "Mac, Linux, Windows"
```bash
git status -s
 M src/modeling.py
```
Track and commit the changes with Git
=== "Mac, Linux, Windows"
    ```bash
    git add src/modeling.py
    git commit -m "Add DagsHub Logger to the modeling module"
    ```
As mentioned above, to create a new experiment we need to update at least one of the two files, `metrics.csv` or `params.yml`, track them using Git, and push them to the DagsHub repository. After editing the `modeling.py` module, running its script will generate those two files.
Run the `modeling.py` script
=== "Mac, Linux, Windows"
    ```bash
    python3 src/modeling.py

    [DEBUG] Initialize Modeling
    [DEBUG] Loading data sets for modeling
    [DEBUG] Running Random Forest Classifier
    [INFO] Finished modeling with AUC Score: 0.931

    git status -s
    ?? metrics.csv
    ?? params.yml
    ```
As we can see from the output above, two new files were created containing the current experiment's information.
??? Info "The Files Content"
The `metrics.csv` file has four fields:
- <u>Name</u> - the name of the Metric.
- <u>Value</u> - the value of the Metric.
- <u>Timestamp</u> - the time that the log was written.
- <u>Step</u> - the step number when logging multi-step metrics like loss.
The `params.yml` file holds all the hyperparameters of the Random Forest Classifier
**Example of the files content:**
=== "Mac, Linux"
```bash
cat metrics.csv
Name,Value,Timestamp,Step
"roc_auc_score",0.931,1615794229099,1
cat params.yml
model:
bootstrap: true
ccp_alpha: 0.0
class_weight: null
criterion: gini
max_depth: null
max_features: auto
max_leaf_nodes: null
max_samples: null
min_impurity_decrease: 0.0
min_impurity_split: null
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0.0
n_estimators: 1
n_jobs: null
oob_score: false
random_state: 0
verbose: 0
warm_start: false
model_class: RandomForestClassifier
```
=== "Windows"
```bash
type metrics.csv
Name,Value,Timestamp,Step
"roc_auc_score",0.931,1615794229099,1
type params.yml
model:
bootstrap: true
ccp_alpha: 0.0
class_weight: null
criterion: gini
max_depth: null
max_features: auto
max_leaf_nodes: null
max_samples: null
min_impurity_decrease: 0.0
min_impurity_split: null
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0.0
n_estimators: 1
n_jobs: null
oob_score: false
random_state: 0
verbose: 0
warm_start: false
model_class: RandomForestClassifier
```
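If you prefer to inspect the logged metrics programmatically rather than with `cat`/`type`, a small sketch using pandas (already used by `modeling.py`) could look like this:

```python
import pandas as pd

# Load the metrics file written by the DagsHub logger
metrics = pd.read_csv('metrics.csv')

# Show the most recent value logged for each metric name
print(metrics.sort_values('Timestamp').groupby('Name').tail(1))
```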
Commit and push the files to our DagsHub repository using Git
=== "Mac, Linux, Windows"
    ```bash
    git add metrics.csv params.yml
    git commit -m "New Experiment - Random Forest Classifier with basic processing"
    git push
    ```
Let's check the new status of our repository. The two files were added to the repository, and a new experiment was created. The information about the experiment is displayed under the Experiments Tab.
Congratulations - You created your first Experiment!
This part covers the Experiment Tracking workflow. We highly recommend reading the Experiments Tab documentation to explore the various features it has to offer. In the next part, we will learn how to explore a new hypothesis and switch between project versions.