Git Tracking

Git is a cornerstone tool for managing data science projects: it lets us track, version, and reproduce code files easily. Therefore, DAGsHub supports Git and extends its capabilities to track experiments as well.

By using Git to track an experiment, we also encapsulate the code, data, and model that produced its results. This way, even as the project evolves or grows in complexity, we can easily reproduce experimental results.

How Does It Work?

Creating a new experiment using Git tracking can be done in two ways:

Track Files With Specific Names

You can save the experiment information to open-format files whose names end with params.yml or params.json for parameters and metrics.csv for metrics. Then, track them with Git and push them to the remote repository. DAGsHub scans the files on the Git server for these specific name endings, and whenever it finds a new or modified file, it generates a new experiment with the information the file contains.
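For example, any of the following file names would be detected, since each ends with one of the recognized suffixes (the names themselves are illustrative):

params.yml
train_params.json
metrics.csv
eval_metrics.csv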

Track Files With DVC

When creating a pipeline with DVC, you define output files for parameters and/or metrics. You then use Git to track the updated dvc.yaml and dvc.lock files and push them to the remote repository. DAGsHub parses the dvc.yaml and dvc.lock files on the Git server and looks for the relevant parameter and metrics files. Whenever it finds a new or modified file, it generates a new experiment with the information they contain.
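For illustration, here is a minimal dvc.yaml sketch of a single training stage that declares a parameter file and a metrics file. The stage name, script, and file paths are assumptions for this example, not requirements:

stages:
  train:
    # Hypothetical training script; replace with your own command.
    cmd: python train.py
    deps:
      - train.py
    # Parameter keys read from params.yml, matching the schema below.
    params:
      - params.yml:
          - batch_size
          - learning_rate
    # Metrics file tracked by Git (cache: false), matching the schema below.
    metrics:
      - logs/metrics.csv:
          cache: false

After dvc repro updates dvc.lock, committing and pushing dvc.yaml, dvc.lock, and the declared parameter and metrics files is what lets DAGsHub pick up the experiment.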

What Should The Files Contain?

The parameter and metrics files should hold the data in the following formats.

Parameter Schema

The parameter format is a simple key: value mapping saved as a .yaml or .json file. Each value can be a string, a number, or a boolean. Here is a simple example, first as YAML and then as the equivalent JSON:

params.yml:

batch_size: 32
learning_rate: 0.02
max_nb_epochs: 2

params.json:

{
  "batch_size": 32,
  "learning_rate": 0.02,
  "max_nb_epochs": 2
}

Metrics Schema

For metrics, the format is a .csv file with the following headers:

  • Name can be any string or number.
  • Value can be any real number.
  • Timestamp is the UNIX epoch time in milliseconds.
  • Step represents the training step/iteration/batch index. It can be any positive integer. This will serve as the X-axis in most metric charts.

This format lets you save multiple metrics in a single metrics file by varying the Name column as you write to the file. DAGsHub knows to plot a chart for each metric where needed and to show you the last value of each one. Here is a simple example:

Name,Value,Timestamp,Step
loss,2.29,1573118607163,1
epoch,0,1573118607163,1
loss,2.26,1573118607366,11
epoch,0,1573118607366,11
loss,1.44,1573118607572,21
epoch,0,1573118607572,21
loss,0.65,1573118607773,31
avg_val_loss,0.17,1573118812491,3375
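If you prefer to write the metrics file yourself rather than use the logger described below, a minimal Python sketch might look like this (the file path and metric values are illustrative):

import csv
import os
import time

os.makedirs("logs", exist_ok=True)

# Write metric rows following the Name,Value,Timestamp,Step schema above.
with open("logs/metrics.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Value", "Timestamp", "Step"])
    for step, loss in enumerate([2.29, 2.26, 1.44], start=1):
        # Timestamp is UNIX epoch time in milliseconds.
        writer.writerow(["loss", loss, int(time.time() * 1000), step])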

How To Create A New Experiment?

To help you log experiment information in the format stated above, DAGsHub created the open-source "dagshub logger". With this logger, you can log parameters and metrics to files within your Python scripts, or use auto-logging with libraries such as PyTorch Lightning and fastai.

Installing & Using the Logger

Use pip to install the logger:

$ pip install dagshub
Python:

from dagshub import dagshub_logger, DAGsHubLogger

# Option 1 - As a context manager:
with dagshub_logger(metrics_path="logs/test_metrics.csv", hparams_path="logs/test_params.yml") as logger:
    # Metric logging:
    logger.log_metrics(loss=3.14, step_num=1)
    # OR:
    logger.log_metrics({'loss': 3.14}, step_num=1)

    # Hyperparameter logging:
    logger.log_hyperparams(optimizer='sgd')
    # OR:
    logger.log_hyperparams({'optimizer': 'sgd'})

# Option 2 - As a normal Python object:
logger = DAGsHubLogger(metrics_path="logs/test_metrics.csv", hparams_path="logs/test_params.yml")
logger.log_hyperparams(optimizer='sgd')
logger.log_metrics(loss=3.14, step_num=1)
# ...
logger.save()
logger.close()
PyTorch Lightning:

from dagshub.pytorch_lightning import DAGsHubLogger
from pytorch_lightning import Trainer

trainer = Trainer(
    logger=DAGsHubLogger(metrics_path="logs/test_metrics.csv", hparams_path="logs/test_params.yml"),
    default_save_path='lightning_logs',
)
fastai:

from dagshub.fastai import DAGsHubLogger

# To log only during a single training phase
learn.fit(..., cbs=DAGsHubLogger(metrics_path="logs/test_metrics.csv",
                                 hparams_path="logs/test_params.yml"))

Running the above script will generate two files: test_metrics.csv and test_params.yml. Use Git to track those files and push them to the remote repository.

$ git add logs/test_metrics.csv logs/test_params.yml
$ git commit -m "New experiment - learning rate 1e-4"
$ git push

The above action will generate a new experiment.

[Image: Git experiment]

For more information about auto-logging, see the open-source dagshub logger repository.

How To Use Git Tracking In A Colab Environment?

We shared an example of tracking experiments with Git and pushing them to DAGsHub from a Colab environment.

When To Use It?

Using Git to track experiments enables you to reproduce their results easily. However, in cases where you don't need to reproduce the results, it can be a hassle. Therefore, we recommend using Git to track only those experiments that produced meaningful results you might want to reproduce in the future.