MLflow Tracking

Real-time logging gives you visibility into a data science experiment while it runs. It lets you monitor the progress of the training process and take action if necessary. To let DAGsHub users log their experiments in real time, DAGsHub provides an MLflow Tracking integration. Parameters and metrics are displayed while the process is running, and you can monitor several experiments as they execute.

Feature Overview

MLflow Tracking is an open-source API for live logging of parameters, metrics, and metadata while running machine learning code. To make MLflow Tracking output accessible outside your local machine, you need to host it on a remote tracking server. To keep all components of a data science project in one place, DAGsHub automatically connects an MLflow server to your repository and integrates it with the Experiment Tab.

How Does It Work?

When you create a repository on DAGsHub, an MLflow server will be automatically created and connected to the repository. Your project's MLflow tracking server will be located at:

https://dagshub.com/<DAGsHub-user-name>/<repository-name>.mlflow
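The URL pattern above can be assembled in code when you work with several repositories. The following is a minimal sketch, not part of DAGsHub or MLflow: the function name `dagshub_tracking_uri` and the example user and repository names are made up for illustration.

```python
from urllib.parse import quote


def dagshub_tracking_uri(user_name: str, repo_name: str) -> str:
    """Build the MLflow tracking URI for a DAGsHub repository.

    Hypothetical helper: it only fills in the URL pattern shown above.
    """
    if not user_name or not repo_name:
        raise ValueError("both user_name and repo_name are required")
    # Percent-encode each path segment so unusual characters cannot break the URL.
    user = quote(user_name, safe="")
    repo = quote(repo_name, safe="")
    return f"https://dagshub.com/{user}/{repo}.mlflow"


# Example with made-up names:
print(dagshub_tracking_uri("jane-doe", "mnist-demo"))
# prints: https://dagshub.com/jane-doe/mnist-demo.mlflow
```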

The server endpoint can also be found under the ‘Remote’ button:

[Image: MLflow Experiments]

Note

Only a repository contributor can log experiments.

How to Use It?

Install and Import MLflow

  • Start by installing the MLflow Python package in your virtual environment using pip:

    pip install mlflow
    
  • Then, import MLflow in your Python module using import mlflow.

Set the MLflow server URI

You can set the MLflow server URI by adding the following line to your code:

mlflow.set_tracking_uri("https://dagshub.com/<DAGsHub-user-name>/<repository-name>.mlflow")
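To show where this call sits in a script, here is a minimal sketch that sets the tracking URI and then logs one parameter and one metric per epoch. The function names `demo_losses` and `log_demo_run`, the parameter name `learning_rate`, and the placeholder loss values are assumptions for illustration only. The MLflow calls used are `set_tracking_uri`, `start_run`, `log_param`, and `log_metric`.

```python
def demo_losses(epochs: int) -> list:
    """Return placeholder loss values; a real script would compute these in training."""
    return [1.0 / (epoch + 1) for epoch in range(epochs)]


def log_demo_run(tracking_uri: str, learning_rate: float, epochs: int) -> str:
    """Log one parameter and a per-epoch metric, then return the MLflow run ID."""
    # Imported here so that demo_losses can be used without MLflow installed.
    import mlflow

    # Point MLflow at the remote server before any run is started.
    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run() as run:
        mlflow.log_param("learning_rate", learning_rate)
        for epoch, loss in enumerate(demo_losses(epochs)):
            # The step argument lets the server order the metric values by epoch.
            mlflow.log_metric("loss", loss, step=epoch)
    return run.info.run_id
```

Calling `log_demo_run("https://dagshub.com/<DAGsHub-user-name>/<repository-name>.mlflow", learning_rate=0.01, epochs=3)` with your own URI should create one run, provided your credentials are already configured.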

Set the MLflow server URI using an environment variable

You can also define your MLflow server URI using the MLFLOW_TRACKING_URI environment variable.

We don't recommend this approach, since you might forget to reset the environment variable when switching between different projects. This might result in logging experiments to the wrong repository.

If you still prefer using the environment variable, we recommend setting it only for the current command, as follows:

MLFLOW_TRACKING_URI=https://dagshub.com/<username>/<repo>.mlflow python <file-name>.py
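If you launch training from Python rather than from a shell, the same "only for the current command" idea can be sketched with the standard library. The helper name `run_with_tracking_uri` is hypothetical. It copies the current environment, adds MLFLOW_TRACKING_URI to the copy, and passes that copy to the child process only.

```python
import os
import subprocess
import sys


def run_with_tracking_uri(script_args, tracking_uri):
    """Run a Python command with MLFLOW_TRACKING_URI set only for that child process."""
    # dict(...) makes a copy, so the parent process environment is left untouched.
    child_env = dict(os.environ, MLFLOW_TRACKING_URI=tracking_uri)
    return subprocess.run(
        [sys.executable, *script_args],
        env=child_env,
        check=True,           # raise CalledProcessError if the script fails
        capture_output=True,  # the script's output is returned in result.stdout / result.stderr
        text=True,
    )
```

For example, `run_with_tracking_uri(["<file-name>.py"], "https://dagshub.com/<username>/<repo>.mlflow")` mirrors the shell command above, and the variable is not left behind in the parent process when you switch projects.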

Set-up Credentials

The DAGsHub MLflow server has built-in access controls. Only a repository contributor can log experiments (someone who can git push to the repository).

  • To use basic authentication with MLflow, set the following environment variables:

    • MLFLOW_TRACKING_USERNAME - Your DAGsHub username
    • MLFLOW_TRACKING_PASSWORD - Your DAGsHub password or, preferably, an access token

    export MLFLOW_TRACKING_USERNAME=<username>
    export MLFLOW_TRACKING_PASSWORD=<password/token>
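The two variables can also be set from inside a Python process, for example in a notebook where shell exports are inconvenient. This is a minimal sketch under the assumption that it runs before the first MLflow call that contacts the server. The helper name `set_mlflow_credentials` is hypothetical.

```python
import os


def set_mlflow_credentials(username: str, token: str) -> None:
    """Expose DAGsHub credentials through MLflow's basic-auth environment variables."""
    if not username or not token:
        raise ValueError("both username and token are required")
    # These assignments affect only the current process and its children.
    os.environ["MLFLOW_TRACKING_USERNAME"] = username
    os.environ["MLFLOW_TRACKING_PASSWORD"] = token


# Usage idea: prompt for the token instead of hard-coding it in the script, e.g.
#   import getpass
#   set_mlflow_credentials("<username>", getpass.getpass("Access token: "))
```

Avoid committing a password or token to the repository. Reading it from a prompt or a secrets store keeps it out of version control.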

Congratulations, you are ready to start logging experiments. Now, when you run your code, you will see new runs appear in the experiment tables, with their status and origin:

[Image: MLflow Experiments]

MLflow Tracking Usage

This document does not cover the usage of MLflow Tracking, but a tutorial will be available soon. In the meantime, refer to the official MLflow docs. If you have further questions about this feature or any other DAGsHub feature, please visit our Discord channel.

Known Issues, Limitations & Restrictions

DAGsHub currently doesn't support MLflow artifacts, but we might soon. Please contact us on our Discord channel if this is important to you.