Real-time logging provides valuable visibility while a data science experiment is running. It lets users monitor the progress of the training process and take action if necessary. To let users log their experiments in real time, DAGsHub provides an MLflow Tracking integration. Parameters and metrics are displayed while the process is running, and several experiments can be monitored at once as they execute.
MLflow Tracking is an open-source API for live logging of parameters, metrics, and metadata when running machine learning code. To make MLflow Tracking output accessible outside your local machine, you need to host it on a remote tracking server. To keep all data science project components in one place, DAGsHub automatically connects an MLflow server to your repository and integrates it with the Experiment Tab.
How Does It Work?
When you create a repository on DAGsHub, an MLflow server is automatically created and connected to it. Your project's MLflow tracking server is located at:
https://dagshub.com/<username>/<repo>.mlflow
The server endpoint can also be found under the ‘Remote’ button.
Only a repository contributor can log experiments.
How to Use It?
Install and Import MLflow
Start by installing the MLflow Python package in your virtual environment using pip:
pip install mlflow
Then, import MLflow into your Python module:
import mlflow
Set the MLflow server URI
You can set the MLflow server URI by adding the following line to your code:
mlflow.set_tracking_uri("https://dagshub.com/<username>/<repo>.mlflow")
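When you work with several repositories, it can help to build the URI from its parts instead of pasting it by hand. The following is a minimal sketch, assuming the URI pattern `https://dagshub.com/<username>/<repo>.mlflow` used on this page. The helper names `dagshub_tracking_uri` and `configure_tracking` are hypothetical and not part of DAGsHub or MLflow; `mlflow.set_tracking_uri` is the MLflow call that sets the server URI.

```python
def dagshub_tracking_uri(username: str, repo: str) -> str:
    """Build the tracking URI for a DAGsHub repository (pattern assumed from this page)."""
    return f"https://dagshub.com/{username}/{repo}.mlflow"


def configure_tracking(username: str, repo: str) -> str:
    """Point MLflow at the repository's tracking server and return the URI used."""
    # Imported here so the URI helper above works without MLflow installed.
    import mlflow  # requires `pip install mlflow`

    uri = dagshub_tracking_uri(username, repo)
    mlflow.set_tracking_uri(uri)
    return uri
```

Calling `configure_tracking("<username>", "<repo>")` at the top of a training script keeps the target repository visible in the code, which makes it harder to log to the wrong project by accident.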
Set the MLflow server URI using an environment variable
You can also define your MLflow server URI using the MLFLOW_TRACKING_URI environment variable.
We don't recommend this approach, since you might forget to reset the environment variable when switching between different projects. This might result in logging experiments to the wrong repository.
If you still prefer to use the environment variable, we recommend setting it only for the current command, as follows:
MLFLOW_TRACKING_URI=https://dagshub.com/<username>/<repo>.mlflow python <file-name>.py
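The same per-command scoping can be reproduced from Python by passing the variable only to a child process. This is a sketch under stated assumptions, not part of DAGsHub's tooling: the helper name `run_with_tracking_uri` and its arguments are hypothetical, and it uses only the standard library.

```python
import os
import subprocess
import sys


def run_with_tracking_uri(script_args, tracking_uri):
    """Run a Python command with MLFLOW_TRACKING_URI set for that process only."""
    # Copy the current environment so the parent process is left untouched.
    child_env = {**os.environ, "MLFLOW_TRACKING_URI": tracking_uri}
    # check=True raises CalledProcessError if the child exits with an error.
    return subprocess.run(
        [sys.executable, *script_args],
        env=child_env,
        check=True,
    )
```

For example, `run_with_tracking_uri(["<file-name>.py"], "https://dagshub.com/<username>/<repo>.mlflow")` runs the script against that repository without changing the variable for the calling process or for later commands.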
The DAGsHub MLflow server has built-in access controls. Only a repository contributor (someone who can git push to the repository) can log experiments.
To use basic authentication with MLflow, you need to set the following environment variables:
MLFLOW_TRACKING_USERNAME - your DAGsHub username
MLFLOW_TRACKING_PASSWORD - your DAGsHub password or, preferably, an access token
export MLFLOW_TRACKING_USERNAME=<username>
export MLFLOW_TRACKING_PASSWORD=<password/token>
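If exporting the variables in the shell is inconvenient, they can also be set from Python before any MLflow call is made. This is a minimal sketch: the helper name `set_dagshub_credentials` is hypothetical, while the two variable names are the ones listed above.

```python
import os


def set_dagshub_credentials(username: str, token: str) -> None:
    """Expose DAGsHub credentials to MLflow through its basic-auth environment variables."""
    # These are set for the current process only.
    os.environ["MLFLOW_TRACKING_USERNAME"] = username
    os.environ["MLFLOW_TRACKING_PASSWORD"] = token
```

Avoid hard-coding a real password or token in a file that is committed to the repository. Reading it at runtime, for example with `getpass.getpass()`, is one safer option.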
Congratulations, you are ready to start logging experiments. Now, when you run your code, new runs will appear in the experiment tables, with their status and origin.
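As a rough illustration of what such a run might look like, the sketch below logs one parameter and one metric per epoch. It assumes the tracking URI and credentials are configured as described on this page. The function names and the made-up loss values are hypothetical stand-ins for a real training loop; `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric` are standard MLflow Tracking calls.

```python
def simulated_losses(epochs: int) -> list:
    """Stand-in for a training loop: returns one made-up loss value per epoch."""
    return [1.0 / (epoch + 1) for epoch in range(epochs)]


def log_example_run(tracking_uri: str, epochs: int = 3) -> None:
    """Log one parameter and a per-epoch metric to the given MLflow server."""
    import mlflow  # requires `pip install mlflow`

    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run():
        mlflow.log_param("epochs", epochs)
        for step, loss in enumerate(simulated_losses(epochs)):
            # Logging inside the loop lets the metric be followed while the run is active.
            mlflow.log_metric("loss", loss, step=step)
```

Only parameters and metrics are logged here, which matches the limitation on artifacts noted at the end of this page.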
MLflow Tracking Usage
This document does not cover the usage of MLflow Tracking in depth, but a tutorial will be available soon. In the meantime, refer to the official MLflow docs. If you have any further questions about this feature or any other DAGsHub feature, please visit our Discord channel.
Known Issues, Limitations & Restrictions
DAGsHub's MLflow server does not currently support artifacts, though support may be added soon. If this is important to you, please contact us on our Discord channel.