MLflow Tracking

MLflow{target=_blank} is an open-source tool for managing the machine learning lifecycle. It supports live logging of parameters, metrics, metadata, and artifacts when running a machine learning experiment. For the post-training stage, it provides a model registry with deployment functionality to custom serving tools.

DagsHub provides a remote MLflow server with every repository. You can log experiments with MLflow to it, view its information under the experiment tab, and manage your trained models from the full-fledged MLflow UI built into your DagsHub project.

How does the integration of DagsHub with MLflow work?

When you create a repository on DagsHub, a remote MLflow server is automatically created and configured with the project. The repository's MLflow tracking server will be located at:

https://dagshub.com/<DagsHub-user-name>/<repository-name>.mlflow

The server endpoint can also be found under the ‘Remote’ button:

![MLflow Experiments](assets/mlflow/remote-mlflow-zoom-in.png) MLflow remote

!!! info "Team based access control"
    Only a repository contributor can log experiments and access the DagsHub MLflow UI.

How to set DagsHub as the remote MLflow server?

1. Install and import MLflow

    - Start by installing the MLflow python package{target=_blank} in your virtual environment using pip:

        === "Mac, Linux, Windows"
            ```bash
            pip3 install mlflow
            ```

    - Then, import MLflow into your python module using `import mlflow`, and log information with the MLflow logging functions{target=_blank}.

2. Set DagsHub as the remote URI

You can set the MLflow server URI by adding the following line to your code:

```python
mlflow.set_tracking_uri("https://dagshub.com/<DagsHub-user-name>/<repository-name>.mlflow")
```

??? info "Set the MLflow server URI using an environment variable"

    You can also define your MLflow server URI using the `MLFLOW_TRACKING_URI` environment variable.

    **We don't recommend this approach**, since you might forget to reset the environment variable when
    switching between different projects. This might result in logging experiments to the wrong repository.

    If you still prefer using the environment variable, we recommend setting it only for the current
    command, like the following:

    === "Mac, Linux, Windows"
        ```bash
        MLFLOW_TRACKING_URI=https://dagshub.com/<username>/<repo>.mlflow python3 <file-name>.py
        ```

3. Set up your credentials

The DagsHub MLflow server has built-in access controls. Only a repository contributor can log experiments (someone who can git push to the repository).

  - To use basic authentication with MLflow, you need to set the following environment variables:

    - `MLFLOW_TRACKING_USERNAME` - DagsHub username
    - `MLFLOW_TRACKING_PASSWORD` - DagsHub password, or preferably an access token

    You can set these by typing the following in the terminal:

    === "Mac, Linux, Windows"
        ```bash
        export MLFLOW_TRACKING_USERNAME=<username>
        export MLFLOW_TRACKING_PASSWORD=<password/token>
        ```

    You can also use your token as the username; in this case, the password is not needed:

    === "Mac, Linux, Windows"
        ```bash
        export MLFLOW_TRACKING_USERNAME=<token>
        ```

Congratulations, you are ready to start logging experiments. Now, when you run your code, you will see new runs appear in the experiment tables, with their status and origin:

![MLflow Experiments](../feature_guide/assets/mlflow_experiment_table.png)

How to log models and artifacts to DagsHub?

!!! info
    MLflow experiments created before August 10th, 2022 won't be affected by this change. This means you cannot log artifacts using this technique for your existing Default MLflow experiment. If you already have a repository with MLflow runs, the recommended way to start using proxied artifact storage is by creating a new experiment through the MLflow CLI{target="_blank"}, the Python client{target="_blank"}, or the MLflow UI.

Option 1: Use DagsHub Storage

DagsHub's MLflow integration supports logging artifacts directly through the tracking server. In the past, the MLflow tracking server only managed the location of artifacts and models, while uploading and downloading was done using the client's local credentials and available packages (e.g. `boto3` or `google-cloud-storage`). Support for proxying upload and download requests through the tracking server was added in MLflow 1.24.0.

DagsHub lets you leverage this capability by directly hosting your artifacts by default. For every newly created repository or MLflow experiment, DagsHub will generate a dedicated artifact location similar to mlflow-artifacts:/<UUID>.

Option 2: Use external buckets

DagsHub's tracking server allows you to specify AWS S3 buckets for storing artifacts of newly created MLflow experiments. To configure this, create a new experiment and provide an `s3://` URI as the artifact store. You can do this either by clicking the "Create Experiment (+)" button in the DagsHub MLflow UI and entering the artifact location in the dialog box, or by running the following Python code.

![Create MLflow Experiment dialog](./assets/mlflow/create_experiment_dialog.png)

Set up MLflow

```python
import mlflow

artifact_location = "s3://<s3-bucket-name>/mlruns"
mlflow.create_experiment("Deploy", artifact_location)
```

Once the experiment is created, you must tell your code to select it over the default experiment. You can do this either by setting the environment variable:

```bash
export MLFLOW_EXPERIMENT_NAME=Deploy
```

or by adding this line of Python to your training code:

```python
mlflow.set_experiment(experiment_name="Deploy")
```

Set up AWS

Before logging models or other artifacts to MLflow, you will need to install the `boto3` package to allow MLflow to interact with the AWS S3 API.

=== "Mac, Linux, Windows"
    ```bash
    pip3 install boto3
    ```

You'll also need to ensure that your code has the required permissions to upload files to AWS S3. Obtain an AWS access key pair (an Access Key ID and a Secret Access Key), then either set them as the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, or run the `aws configure` command to save them on the filesystem.

??? info "Securely providing AWS credentials"
    If you don't want to give your code permanent AWS credentials, you could make use of aws-vault{target="_blank"} to provide time-limited temporary tokens instead.

With this set up, you can call `mlflow.log_artifact()`{target="_blank"} or `mlflow.autolog(log_models=True)`{target="_blank"} to instruct MLflow to upload models or other artifacts to the artifact store and record their locations on the tracking server.

How to launch the DagsHub MLflow UI

The DagsHub MLflow tracking server provides access to the MLflow server user interface (MLflow UI). To view the MLflow UI, visit the tracking server URI (`https://dagshub.com/<username>/<repo>.mlflow`) in a browser. If you haven't interacted with the main DagsHub interface in a while, you may have to enter your DagsHub username and password/access token into the authentication popup shown by your browser.

You should have full access to all views and actions provided by the MLflow UI. This includes viewing run details, comparing runs (within the same experiment only; to compare runs across experiments, visit the DagsHub experiment tracking interface), creating and managing experiments, and viewing and updating the model registry.


How to deploy an MLflow model using DagsHub?

DagsHub's MLflow integration includes support for logged artifacts and the MLflow model registry. With this, you can use MLflow to deploy your trained models as batteries-included inference servers to the cloud with ease.

How to register MLflow model in DagsHub Model Registry?

Once you have logged a model as part of an MLflow run, you can save that model to the Model Registry for your repository. Run the following Python code to do so:

```python
import mlflow

run_id = '<run-id-here>'
artifact_name = 'model'
model_name = '<name-of-model-in-model-registry>'
mlflow.register_model(f'runs:/{run_id}/{artifact_name}', model_name)
```

How to deploy an MLflow model from DagsHub Model Registry?

Once the model is registered as a part of the DagsHub Model Registry, you can make use of standard MLflow tooling to deploy the model as a container, on AWS SageMaker, Azure ML, Apache Spark UDF, or any other platform.

Simply follow the instructions provided by MLflow{target="_blank"} to do so.

Process to deploy an MLflow model to Amazon AWS SageMaker

```bash
mlflow sagemaker build-and-push-container
mlflow sagemaker deploy \
    -m "models:/<name-of-model-in-model-registry>/latest" \
    -a <sagemaker-deployment-name> \
    --region-name <aws-region> \
    -e <sagemaker-role-arn> \
    --mode replace
```

Process to build a Docker container image from an MLflow model

```bash
mlflow models build-docker \
    -m "models:/<name-of-model-in-model-registry>/latest" \
    -n <name-of-docker-image> \
    --enable-mlserver
```

To run the inference server locally:

```bash
docker run -p 80:8080 <name-of-docker-image>
```

Process to deploy an MLflow model to Microsoft Azure ML

```bash
mlflow deployments create \
    --name <azureml-deployment-name> \
    -m "models:/<name-of-model-in-model-registry>/latest" \
    -t <azureml-mlflow-tracking-url> \
    --deploy-config-file <(echo '{"computeType":"aci"}')
```

How To Use MLflow In A Colab Environment?

We share two examples of logging experiments to DagsHub's MLflow server from a Colab environment.

Known Issues, Limitations & Restrictions

The MLflow UI provided by DagsHub currently doesn't support displaying artifacts pushed to external storage like S3. Please contact us on our Discord channel{target=_blank} if this is important to you.
