Just use the logger, commit the files it outputs to git, and push to DAGsHub.
Managing project experiments relies heavily on the ability to track the parameters used and the metrics achieved in every experiment.
Ideally, the parameters and metrics defined for a project should be consistent across all of its experiments, to make sure we can compare apples to apples.
To make this as accessible and understandable as possible, DAGsHub's experiment tracking capabilities are built on top of open, simple and readable formats.
We didn't invent a proprietary, complicated format. Instead, parameters are written to .yaml (or .json) files and metrics to .csv files. No more obscure formats!
Below are explanations of the formats we use, and a simple logger that doesn't do black magic, but simply writes files.
The parameters format is a simple `key: value` pair saved as a .yaml (or .json) file. The value can be a string, number or boolean value.
For DAGsHub to know it is looking at a parameter file, its name should end in the word `param` or `params` (i.e. the following names work: `param.yaml`, `params.json`, `theBestParams.yaml`, `myparam.json`).
Here is a simple example:
=== "params.yaml"

    ```yaml
    batch_size: 32
    learning_rate: 0.02
    max_nb_epochs: 2
    ```
=== "params.json"

    ```json
    {
      "batch_size": 32,
      "learning_rate": 0.02,
      "max_nb_epochs": 2
    }
    ```
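Since a params file is plain YAML or JSON, you don't need any special tooling to produce one. Here is a minimal sketch using only Python's standard library (`params.json` is one of the accepted file names, and the values mirror the example above):

```python
import json

# Hyperparameters matching the example above
params = {
    "batch_size": 32,
    "learning_rate": 0.02,
    "max_nb_epochs": 2,
}

# The file name must end in "param" or "params" for DAGsHub to pick it up
with open("params.json", "w") as f:
    json.dump(params, f, indent=2)
```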
For metrics, the format is a .csv file with the header `Name,Value,Timestamp,Step`.
This format enables you to save multiple metrics in one metric file, by modifying the `Name` column while writing to the file.
DAGsHub knows to plot the graph where needed, and to show you the last value for each metric as well.
To let DAGsHub know that a file is a metric file, its name should end in `metric.csv` or `metrics.csv` (i.e. the following names work: `metric.csv`, `myMetrics.csv`, `theBestMetric.csv` and so on).
- `Name` can be any string or number.
- `Value` can be any real number.
- `Timestamp` is the UNIX epoch time in milliseconds.
- `Step` represents the training step/iteration/batch index. It can be any positive integer, and will serve as the X axis in most metric charts.

Here is a simple example:
=== "metrics.csv"

    ```csv
    Name,Value,Timestamp,Step
    loss,2.29,1573118607163,1
    epoch,0,1573118607163,1
    loss,2.26,1573118607366,11
    epoch,0,1573118607366,11
    loss,1.44,1573118607572,21
    epoch,0,1573118607572,21
    loss,0.65,1573118607773,31
    avg_val_loss,0.17,1573118812491,3375
    ```
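Producing such a file by hand is straightforward with Python's built-in `csv` module. A minimal sketch (the `log_metric` helper name is just for illustration, not part of any DAGsHub API):

```python
import csv
import time

def log_metric(path, name, value, step):
    """Append one metric row in the Name,Value,Timestamp,Step format."""
    timestamp = int(time.time() * 1000)  # UNIX epoch time in milliseconds
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([name, value, timestamp, step])

# Write the header once, then append metrics as training progresses
with open("metrics.csv", "w", newline="") as f:
    csv.writer(f).writerow(["Name", "Value", "Timestamp", "Step"])

log_metric("metrics.csv", "loss", 2.29, 1)
log_metric("metrics.csv", "epoch", 0, 1)
log_metric("metrics.csv", "loss", 2.26, 11)
```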
Once you have defined proper metrics, their latest values will be shown in the graph view (when expanding a metric node) and in more detail in the experiment table and single experiment view.
DAGsHub also supports the regular DVC metric options, if you use those.
DVC metrics will be shown similarly to the csv format metrics, both in the graph view, and the experiment tracking view.
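The "last value for each metric" view described above is easy to reproduce locally from any metrics file. A minimal sketch, assuming a well-formed file with the standard header (the `latest_values` helper is hypothetical, not part of the DAGsHub client):

```python
import csv

# A small sample metrics file in the Name,Value,Timestamp,Step format
with open("metrics.csv", "w", newline="") as f:
    csv.writer(f).writerows([
        ["Name", "Value", "Timestamp", "Step"],
        ["loss", "2.29", "1573118607163", "1"],
        ["loss", "2.26", "1573118607366", "11"],
        ["loss", "0.65", "1573118607773", "31"],
        ["avg_val_loss", "0.17", "1573118812491", "3375"],
    ])

def latest_values(path):
    """Return the last logged value per metric name (highest Step wins)."""
    latest = {}  # name -> (step, value)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            step = int(row["Step"])
            if row["Name"] not in latest or step >= latest[row["Name"]][0]:
                latest[row["Name"]] = (step, float(row["Value"]))
    return {name: value for name, (step, value) in latest.items()}

print(latest_values("metrics.csv"))  # → {'loss': 0.65, 'avg_val_loss': 0.17}
```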
Writing parameters and metrics to .yaml and .csv files is pretty simple.
Nevertheless, in some cases we'd like a shorthand command to use within our Python scripts, or we use well-known libraries that automatically log parameters and metrics for us. Many tools provide a logger that records these parameters and metrics in their own proprietary format.
DAGsHub instead relies on a simple, generally readable and open logger for this purpose. It lets you manually log parameters and metrics, and wraps some common data science frameworks for auto-logging in just one line of code!
Pull requests for more features and support for more data science frameworks are very welcome!
We want the logger to be useful to as many people as possible, whether or not they use DAGsHub.
!!! info "Installing & Using the logger"
    To install the logger using pip:
    ```bash
    $ pip install dagshub
    ```
### Manual logging usage:
```python
from dagshub import dagshub_logger, DAGsHubLogger
# As a context manager:
with dagshub_logger() as logger:
    # Metrics:
    logger.log_metrics(loss=3.14, step_num=1)
    # OR:
    logger.log_metrics({'val_loss': 6.28}, step_num=2)

    # Hyperparameters:
    logger.log_hyperparams(lr=1e-4)
    # OR:
    logger.log_hyperparams({'optimizer': 'sgd'})

# As a normal Python object:
logger = DAGsHubLogger()
logger.log_hyperparams(num_layers=32)
logger.log_metrics(batches_per_second=100, step_num=42)
# ...
logger.save()
logger.close()
```
### Supported frameworks for auto-logging:
<div style="display: flex; width: 100%; justify-content: space-evenly;">
<div>
<a href="https://github.com/DAGsHub/client/blob/master/dagshub/pytorch_lightning">
<img width="70" src="../assets/lightning_logo.svg" />
</a>
</div>
<div>
<a href="https://github.com/DAGsHub/client/tree/master/dagshub/fastai">
<img width="70" src="../assets/fastai.png" >
</a>
</div>
</div>
- [pytorch-lightning](https://github.com/DAGsHub/client/blob/master/dagshub/pytorch_lightning)
- [FastAI](https://github.com/DAGsHub/client/tree/master/dagshub/fastai)
- More - coming soon!
If the above options don't fit your use case, we'd love to hear about it so we can improve DAGsHub, or you can contribute your addition to the open logger.
Keep in touch - contact@DAGsHub.com