DagsHub Tracking unlocks fully reproducible experiments using Git. Git is one of the cornerstone tools for managing data science projects, letting us track, version, and reproduce code files easily. DagsHub therefore supports Git and extends its capabilities to track experiments as well. By using Git to track an experiment, we also encapsulate the code, data, and model that produced its results. This way, even as the project evolves or grows in complexity, we can easily reproduce experimental results.
Creating a new experiment using Git tracking can be done in two ways:
1. **Format files tracked by Git** – Save the experiment information to open-source format files whose names end with `params.yml` / `params.json` for parameters and `metrics.csv` for metrics. Then, track them using Git and push them to the remote repository. DagsHub will parse files with these specific names on the Git server, and when it finds a new or modified file, a new experiment will be generated with the information the file contains.
2. **DVC pipeline outputs** – When creating a pipeline with DVC, you will define output files as parameters and/or metrics, as sketched below. Then, you will use Git to track these files, together with the updated `dvc.lock` and `dvc.yaml` files, and push them to the remote repository. DagsHub will parse the `dvc.lock` and `dvc.yaml` files on the Git server and look for the relevant parameters and metrics files. When it finds a new or modified file, a new experiment will be generated with the information they contain.
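For illustration, here is a minimal sketch of what such a pipeline stage could look like in `dvc.yaml`; the stage name, script, and parameter names below are hypothetical:

```yaml
stages:
  train:                    # hypothetical stage name
    cmd: python train.py    # hypothetical training script
    deps:
      - train.py
    params:                 # keys read from params.yaml by default
      - learning_rate
      - batch_size
    metrics:
      - metrics.csv:        # declared as a metrics output
          cache: false      # keep the file in Git, not in the DVC cache
```

After running `dvc repro`, tracking the updated `dvc.yaml`, `dvc.lock`, and `metrics.csv` with Git and pushing them is what triggers DagsHub to create the experiment.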
The parameters and metrics files should be structured as follows.

The parameters format is a simple `key: value` pair saved as a `.yaml` or `.json` file. The value can be a string, a number, or a boolean. Here is a simple example:
=== "params.yaml"
yaml batch_size: 32 learning_rate: 0.02 max_nb_epochs: 2
=== "params.json"
json { "batch_size": 32, "learning_rate": 0.02, "max_nb_epochs": 2 }
For metrics, the format is a `.csv` file with the following headers:

- `Name` – can be any string or number.
- `Value` – can be any real number.
- `Timestamp` – the UNIX epoch time in milliseconds.
- `Step` – represents the training step/iteration/batch index. It can be any positive integer and serves as the X-axis in most metric charts.

This format enables you to save multiple metrics in one metric file by varying the `Name` column while writing to the file. DagsHub knows to plot the graph where needed and show you the last value for each metric. Here is a simple example:
=== "metrics.csv"
csv Name,Value,Timestamp,Step loss,2.29,1573118607163,1 epoch,0,1573118607163,1 loss,2.26,1573118607366,11 epoch,0,1573118607366,11 loss,1.44,1573118607572,21 epoch,0,1573118607572,21 loss,0.65,1573118607773,31 avg_val_loss,0.17,1573118812491,3375
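To make the column semantics concrete, here is a minimal standard-library sketch (not the DagsHub logger itself; the helper name and file path are illustrative) that produces rows in this format:

```python
import csv
import time

METRICS_PATH = "metrics.csv"  # illustrative path

# Write the header row once
with open(METRICS_PATH, "w", newline="") as f:
    csv.writer(f).writerow(["Name", "Value", "Timestamp", "Step"])

def log_metric(name, value, step):
    """Append one metric row; Timestamp is UNIX epoch time in milliseconds."""
    with open(METRICS_PATH, "a", newline="") as f:
        csv.writer(f).writerow([name, value, int(time.time() * 1000), step])

# Multiple metrics can share one file by varying the Name column:
log_metric("loss", 2.29, step=1)
log_metric("epoch", 0, step=1)
```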
To help you log experiment information in the format described above, DagsHub created the open-source DagsHub logger. With this logger, you can log parameters and metrics to files from within your Python scripts, or use auto-logging with libraries such as PyTorch Lightning and fast.ai.
!!!info "Installing & Using the logger" Use pip to install the logger:
```bash
$ pip3 install dagshub
```
=== "Manual Logging" ```python from dagshub import dagshub_logger, DAGsHubLogger
# Option 1 - As a context manager:
with dagshub_logger( metrics_path="logs/test_metrics.csv", hparams_path="logs/test_params.yml") as logger:
# Metric logging:
logger.log_metrics(loss=3.14, step_num=1)
# OR:
logger.log_metrics({'loss': 3.14}, step_num=1)
# Hyperparameters logging:
logger.log_hyperparams(optimizer='sgd')
# OR:
logger.log_hyperparams({'optimizer': 'sgd'})
# Option 2 - As a normal Python object:
logger = DAGsHubLogger(metrics_path="logs/test_metrics.csv", hparams_path="logs/test_params.yml")
logger.log_hyperparams(optimizer='sgd')
logger.log_metrics(loss=3.14, step_num=1)
# ...
logger.save()
logger.close()
```
=== "Auto-logging: PyTorch Lightning" ```python
from dagshub.pytorch_lightning import DAGsHubLogger
from pytorch_lightning import Trainer
trainer = Trainer(
logger=DAGsHubLogger(metrics_path="logs/test_metrics.csv", hparams_path="logs/test_params.yml"),
default_save_path='lightning_logs',
)
```
=== "Auto-logging: fast.ai" ```python
from dagshub.fastai import DAGsHubLogger
# To log only during a single training phase
learn.fit(..., cbs=DAGsHubLogger(metrics_path="logs/test_metrics.csv",
hparams_path="logs/test_params.yml"))
```
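For context, a toy end-to-end version of the manual-logging option might look like this; the training loop and loss values are placeholders, not a real model:

```python
from dagshub import dagshub_logger

with dagshub_logger(metrics_path="logs/test_metrics.csv",
                    hparams_path="logs/test_params.yml") as logger:
    logger.log_hyperparams(learning_rate=0.02, batch_size=32)
    for step in range(1, 101):
        loss = 1.0 / step  # placeholder for a real training loss
        logger.log_metrics(loss=loss, step_num=step)
```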
Running the above script will generate two files: `logs/test_metrics.csv` and `logs/test_params.yml`. You will use Git to track those files and push them to the remote repository:
```bash
$ git add logs/test_metrics.csv logs/test_params.yml
$ git commit -m "New experiment - learning rate 1e-4"
$ git push
```
The above action will generate a new experiment.
*Git experiment*

We shared an example of experiment tracking with Git on DagsHub in a Colab environment.
Using DagsHub Tracking to log experiments enables you to reproduce their results easily. However, when you don't need to reproduce a run's results, logging it this way can be a hassle. We therefore recommend using DagsHub Tracking for experiments that produced meaningful results you might want to reproduce in the future.