
Open Data Science Formats

Managing project experiments relies heavily on the ability to track the parameters used and the metrics achieved in every experiment.

Ideally, the parameters and metrics defined for a project should be consistent across all experiments, so that we can compare apples to apples.

To make this as accessible and understandable as possible, DAGsHub's experiment tracking capabilities are built on top of open, simple and readable formats.

We didn't invent some proprietary, complicated format, but rely on writing parameters and metrics to .yaml (or .json) and .csv files respectively. No more obscure formats!

Below are explanations of the formats we use, and of a simple logger that does no black magic, but simply writes files.

Parameter Formats

The parameters format is a simple key: value pair saved as a .yaml (or .json) file. The value can be a string, number or boolean value.

The only things necessary for DAGsHub to know it is looking at a parameter file are:

  • Its name should end with the word param or params (e.g. the following names work: param.yaml, params.json, theBestParams.yaml, myparam.json).
  • It should be part of a DVC pipeline, either as a dependency or an output.

Here is a simple example:

```yaml
batch_size: 32
gpus: null
learning_rate: 0.02
max_nb_epochs: 2
```
```json
{
  "batch_size": 32,
  "gpus": null,
  "learning_rate": 0.02,
  "max_nb_epochs": 2
}
```
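A parameter file like the one above can be round-tripped from a training script with nothing but the standard library. The sketch below uses the `.json` variant (the file name `params.json` follows the naming rule above; the keys match the example, and this is plain file I/O, not a DAGsHub API):

```python
import json

# Write the parameters for this experiment run.
params = {"batch_size": 32, "gpus": None, "learning_rate": 0.02, "max_nb_epochs": 2}
with open("params.json", "w") as f:
    json.dump(params, f, indent=2)

# Later (or in another script), read them back.
with open("params.json") as f:
    loaded = json.load(f)

print(loaded["learning_rate"])  # 0.02
```

Note that JSON `null` maps to Python's `None`, matching the `gpus: null` entry in the YAML example.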

Metric Formats

For metrics, the format is a .csv file with the header Name,Value,Timestamp,Step. This format enables you to save multiple metrics in one metric file, by modifying the Name column while writing to the file.

DAGsHub knows to plot a graph where needed, and to show you the last value for each metric as well. The only requirements for DAGsHub to know that a file is a metric file are:

  • Its name should end in metric.csv or metrics.csv (e.g. the following names work: metric.csv, myMetrics.csv, theBestMetric.csv, and so on).
  • It should be marked as a metric output in the DVC pipeline.

    • Note: just marking the output as a metric is enough; no type or xpath definition is required.

The columns are defined as follows:

  • Name can be any string or number.
  • Value can be any real number.
  • Timestamp is the UNIX epoch time in milliseconds.
  • Step represents the training step/iteration/batch index. It can be any positive integer, and serves as the X axis in most metric charts.

Here is a simple example:

```csv
Name,Value,Timestamp,Step
loss,2.29,1573118607163,1
epoch,0,1573118607163,1
loss,2.26,1573118607366,11
epoch,0,1573118607366,11
loss,1.44,1573118607572,21
epoch,0,1573118607572,21
loss,0.65,1573118607773,31
avg_val_loss,0.17,1573118812491,3375
```
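Producing this format by hand takes only a few lines of standard-library Python. A minimal sketch (the helper name `log_metric` and the file name `my_metrics.csv` are hypothetical; the file name follows the metrics.csv naming rule above):

```python
import csv
import os
import time

def log_metric(path, name, value, step):
    """Append one metric row in the Name,Value,Timestamp,Step format."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["Name", "Value", "Timestamp", "Step"])
        # Timestamp is UNIX epoch time in milliseconds.
        writer.writerow([name, value, int(time.time() * 1000), step])

log_metric("my_metrics.csv", "loss", 2.29, 1)
log_metric("my_metrics.csv", "loss", 2.26, 11)
```

Because multiple metrics are distinguished only by the Name column, the same helper can interleave loss, epoch, and validation metrics in a single file, just like the example above.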

Once you have defined proper metrics, their latest values will be shown in the graph view (when expanding a metric node) and in more detail in the experiment table and single experiment view.
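Marking the files in a DVC pipeline might look like the following stage definition (a sketch assuming DVC 1.0+ with dvc.yaml stage files; the stage name and script are hypothetical, and older DVC versions use `dvc run` with `-d`/`-M` flags instead):

```yaml
# dvc.yaml
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
      - params.yaml      # parameter file tracked as a pipeline dependency
    metrics:
      - metrics.csv:     # metric output; no type or xpath definition needed
          cache: false
```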

Screenshot: An expanded metric node

Support For DVC Metric Formats

DAGsHub also supports the regular DVC metric options, if you use those.

DVC metrics will be shown similarly to the csv format metrics, both in the graph view, and the experiment tracking view.

DVC metrics effectively let you save only one metric per file, and don't support value changes over time, which may be critical in many cases.

Logger Helper Library

Writing parameters and metrics to .yaml and .csv files is pretty simple.

Nevertheless, in some cases we'd like a shorthand command to use within our Python scripts, or we use well-known libraries that automatically log parameters and metrics for us. Many tools provide a logger that logs these parameters and metrics in their proprietary format.

DAGsHub relies on a simple, generally readable and open logger for this purpose. It lets you manually log parameters and metrics, and wraps some common data science frameworks for auto-logging in just one line of code!

Pull requests for more features and support for more data science frameworks are very welcome!
We want the logger to be useful to as many people as possible, whether or not they use DAGsHub.

Installing & Using the Open Logger

To install the Open Logger using pip:

```bash
$ pip install dagshub
```

Manual logging usage:

```python
from dagshub import dagshub_logger, DAGsHubLogger

# As a context manager:
with dagshub_logger() as logger:
    # Metrics:
    logger.log_metrics(loss=3.14, step_num=1)
    # OR:
    logger.log_metrics({'val_loss': 6.28}, step_num=2)

    # Hyperparameters:
    logger.log_hyperparams(lr=1e-4)
    # OR:
    logger.log_hyperparams({'optimizer': 'sgd'})


# As a normal Python object:
logger = DAGsHubLogger()
logger.log_hyperparams(num_layers=32)
logger.log_metrics(batches_per_second=100, step_num=42)
# ...
logger.save()
logger.close()
```

Supported frameworks for auto-logging:

Other Parameter / Metric Formats

If the above options don't fit your use case, we'd love to hear about it, so we can improve DAGsHub or contribute support to the open logger.

Keep in touch - contact@DAGsHub.com