In the previous part of the Get Started section, we learned how to track and push files to DagsHub using Git and DVC. This part will cover how to track your Data Science Experiments and save their parameters and metrics. We assume you have a project that you want to add experiment tracking to. We will be showing an example based on the result of the last section, but you can adapt it to your project in a straightforward way.
!!! illustration "Video for this tutorial"
    Prefer to follow along with a video instead of reading? Check out the video for this section below:

    <center>
    <iframe width="400" height="225" src="https://www.youtube.com/embed/HlbtcoA9VX8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
    </center>
??? Example "Start From This Part"
    To start the project from this part, please follow the instructions below:
- Fork the [hello-world](https://dagshub.com/nirbarazida/hello-world){target=_blank} repository.
- Clone the repository and work on the start-track-experiments branch using the following command (change the user name):<br/>
```bash
git clone -b start-track-experiments https://dagshub.com/<DagsHub-user-name>/hello-world.git
```
- Create and activate a virtual environment.
- Install the python dependencies:
```bash
pip3 install -r requirements.txt
pip3 install dvc
```
- Configure DVC locally and set DagsHub storage as the remote.
- Download the files using the following command:
```bash
dvc get --rev processed-data https://dagshub.com/nirbarazida/hello-world-files data/
```
- Track the data directory using DVC and the `data.dvc` file using Git.
- Push the files to the Git and DVC remotes (one way to run these DVC setup, tracking, and push steps is sketched below).
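    Below is a minimal sketch of those DVC steps, assuming DagsHub's HTTP-based DVC storage; the remote URL and access token are placeholders, so take the exact values from your repository page on DagsHub:

    ```bash
    # Set DagsHub storage as the DVC remote (placeholder URL and credentials)
    dvc remote add origin https://dagshub.com/<DagsHub-user-name>/hello-world.dvc
    dvc remote modify origin --local auth basic
    dvc remote modify origin --local user <DagsHub-user-name>
    dvc remote modify origin --local password <DagsHub-token>

    # Track the data directory with DVC, and the generated pointer file with Git
    dvc add data
    git add data.dvc .gitignore
    git commit -m "Track the data directory with DVC"

    # Push to the Git and DVC remotes
    git push
    dvc push -r origin
    ```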
!!! Important
To avoid conflicts, **work on the start-track-experiments branch** for the rest of the tutorial.
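    If you ended up on a different branch after cloning, you can switch to it with:

    ```bash
    git checkout start-track-experiments
    ```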
The DagsHub logger is a plain Python logger for your metrics and parameters. The logger saves the information as human-readable files: CSV for metrics and YAML for parameters. Once you push these files to your DagsHub repository, they will be automatically parsed and visualized in the Experiments Tab. For further information, please see the Experiments Tab documentation and the DagsHub Logger repository.
!!! Note
    Since DagsHub Experiments uses generic formats, you don't have to use the DagsHub logger. Instead, you can write your metrics and parameters into `metrics.csv` and `params.yml` files however you want, and push them to your DagsHub repository, where they will automatically be scanned and added to the Experiments Tab.
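    For example, here is a minimal sketch of writing the two files by hand, without the DagsHub logger. The file names are the ones DagsHub scans for, and the CSV columns match the `metrics.csv` layout shown later in this section; the specific metric and parameter values are just placeholders:

    ```python
    import csv
    import time

    import yaml  # PyYAML

    # Write a single metric row: Name, Value, Timestamp (in milliseconds), Step
    with open('metrics.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Name', 'Value', 'Timestamp', 'Step'])
        writer.writerow(['roc_auc_score', 0.931, int(time.time() * 1000), 1])

    # Write the hyperparameters as YAML
    with open('params.yml', 'w') as f:
        yaml.safe_dump({'model_class': 'RandomForestClassifier',
                        'model': {'n_estimators': 1, 'random_state': 0}}, f)
    ```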
We will start by installing the `dagshub` Python package in the project's virtual environment:
=== "Mac, Linux, Windows"
    ```bash
    pip3 install dagshub
    ```
Next, we will import `dagshub` in the `modeling.py` module and track the Random Forest Classifier's hyperparameters and ROC AUC score. You can copy the code below into your `modeling.py` file:
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import pandas as pd
from const import *
import dagshub

print(M_MOD_INIT, '\n' + M_MOD_LOAD_DATA)

X_train = pd.read_csv(X_TRAIN_PATH)
X_test = pd.read_csv(X_TEST_PATH)
y_train = pd.read_csv(Y_TRAIN_PATH)
y_test = pd.read_csv(Y_TEST_PATH)

print(M_MOD_RFC)

with dagshub.dagshub_logger() as logger:
    rfc = RandomForestClassifier(n_estimators=1, random_state=0)

    # Log the model's parameters
    logger.log_hyperparams(model_class=type(rfc).__name__)
    logger.log_hyperparams({'model': rfc.get_params()})

    # Train the model
    rfc.fit(X_train, y_train.values.ravel())
    y_pred = rfc.predict(X_test)

    # Log the model's performance
    logger.log_metrics({'roc_auc_score': round(roc_auc_score(y_test, y_pred), 3)})
    print(M_MOD_SCORE, round(roc_auc_score(y_test, y_pred), 3))
```
??? checkpoint "Checkpoint"
Check that the current status of your Git tracking matches the following:
=== "Mac, Linux, Windows"
```bash
git status -s
 M src/modeling.py
```
Track and commit the changes with Git
=== "Mac, Linux, Windows"
    ```bash
    git add src/modeling.py
    git commit -m "Add DagsHub Logger to the modeling module"
    ```
As mentioned above, to create a new experiment we need to update at least one of the two files, `metrics.csv` or `params.yml`, track them using Git, and push them to the DagsHub repository. After editing the `modeling.py` module, running its script will generate those two files.
Run the `modeling.py` script
=== "Mac, Linux, Windows"
    ```bash
    python3 src/modeling.py

    [DEBUG] Initialize Modeling
    [DEBUG] Loading data sets for modeling
    [DEBUG] Running Random Forest Classifier
    [INFO] Finished modeling with AUC Score: 0.931

    git status -s
    ?? metrics.csv
    ?? params.yml
    ```
As we can see from the output above, two new files were created containing the current experiment's information.
??? Info "The Files Content"
The `metrics.csv` file has four fields:
- <u>Name</u> - the name of the Metric.
- <u>Value</u> - the value of the Metric.
- <u>Timestamp</u> - the time that the log was written.
- <u>Step</u> - the step number when logging multi-step metrics like loss.
The `params.yml` file holds all the hyperparameters of the Random Forest Classifier
**Example of the files content:**
=== "Mac, Linux"
```bash
cat metrics.csv
Name,Value,Timestamp,Step
"roc_auc_score",0.931,1615794229099,1
cat params.yml
model:
bootstrap: true
ccp_alpha: 0.0
class_weight: null
criterion: gini
max_depth: null
max_features: auto
max_leaf_nodes: null
max_samples: null
min_impurity_decrease: 0.0
min_impurity_split: null
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0.0
n_estimators: 1
n_jobs: null
oob_score: false
random_state: 0
verbose: 0
warm_start: false
model_class: RandomForestClassifier
```
=== "Windows"
```bash
type metrics.csv
Name,Value,Timestamp,Step
"roc_auc_score",0.931,1615794229099,1
type params.yml
model:
bootstrap: true
ccp_alpha: 0.0
class_weight: null
criterion: gini
max_depth: null
max_features: auto
max_leaf_nodes: null
max_samples: null
min_impurity_decrease: 0.0
min_impurity_split: null
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0.0
n_estimators: 1
n_jobs: null
oob_score: false
random_state: 0
verbose: 0
warm_start: false
model_class: RandomForestClassifier
```
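If you prefer to inspect the logged metrics programmatically rather than with `cat`/`type`, a small sketch using pandas (already used by `modeling.py`) could look like this:

```python
import pandas as pd

# Load the metrics file written by the DagsHub logger
metrics = pd.read_csv('metrics.csv')

# Show the most recent value logged for each metric name
print(metrics.sort_values('Timestamp').groupby('Name').tail(1))
```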
Commit and push the files to our DagsHub repository using Git
=== "Mac, Linux, Windows"
    ```bash
    git add metrics.csv params.yml
    git commit -m "New Experiment - Random Forest Classifier with basic processing"
    git push
    ```
Let's check the new status of our repository. The two files were added to the repository, and a new experiment was created. The information about the experiment is displayed under the Experiments Tab.
Congratulations - You created your first Experiment!
This part covers the Experiment Tracking workflow. We highly recommend reading the Experiments Tab documentation to explore the various features it has to offer. In the next part, we will learn how to explore a new hypothesis and switch between project versions.