#152 Add `mlflow-export-import` notes to MLFlow docs

Merged
Dean Pleban merged 1 commit into DAGsHub-Official:main from penguinsfly:mlflow-export-import
1 changed file with 126 additions and 9 deletions: `docs/integration_guide/mlflow_tracking.md`
@@ -4,12 +4,12 @@ description: Free remote MLflow server with team-based access control. Log exper
 ---
 # MLflow Tracking

-[MLflow](https://mlflow.org/){target=_blank} is an open-source tool to manage the machine learning lifecycle. It supports 
-live logging of parameters, metrics, metadata, and artifacts when running a machine learning experiment. To manage the 
-post training stage, it provides a model registry with deployment functionality to custom serving tools. 
+[MLflow](https://mlflow.org/){target=_blank} is an open-source tool to manage the machine learning lifecycle. It supports
+live logging of parameters, metrics, metadata, and artifacts when running a machine learning experiment. To manage the
+post training stage, it provides a model registry with deployment functionality to custom serving tools.
-DagsHub provides a free hosted MLflow server with team-based access control for every repository. You can log experiments with MLflow to it, view its information 
-under the [experiment tab](../feature_guide/experiment_tracking.md), and manage your trained models from the full-fledged 
+DagsHub provides a free hosted MLflow server with team-based access control for every repository. You can log experiments with MLflow to it, view its information
+under the [experiment tab](../feature_guide/experiment_tracking.md), and manage your trained models from the full-fledged
 MLflow UI built into your DagsHub project.

 <style>
@@ -61,7 +61,7 @@ The server endpoint can also be found under the ‘Remote’ button:
     - Only a repository contributor can log experiments and access the DagsHub MLflow UI.

-## How to set DagsHub as the remote MLflow server? 
+## How to set DagsHub as the remote MLflow server?

 ### 1. Install and import MLflow

@@ -78,7 +78,7 @@ your virtual environment using pip:
   [MLflow logging functions](https://www.mlflow.org/docs/latest/tracking.html#logging-functions){target=_blank}.

-### 2. Set DagsHub as the remote URI  
+### 2. Set DagsHub as the remote URI

 You can set the MLflow server URI by adding the following line to your code:

@@ -118,7 +118,7 @@ You can set these by typing in the terminal:
     export MLFLOW_TRACKING_USERNAME=<username>
     export MLFLOW_TRACKING_PASSWORD=<password/token>
     ```
-    
+
 You can also use your token as the username; in this case the password is not needed:
 === "Mac, Linux, Windows"
     ```bash
@@ -149,7 +149,7 @@ but uploading and downloading was done using the client's local credentials and
 packages (i.e. `boto3` or `google-cloud-storage`).
 Support for proxying upload and download requests through the tracking server was added in MLflow 1.24.0.

-DagsHub lets you leverage this capability by directly hosting your artifacts by default. 
+DagsHub lets you leverage this capability by directly hosting your artifacts by default.
 For every newly created repository or MLflow experiment,
 DagsHub will generate a dedicated artifact location similar to `mlflow-artifacts:/<UUID>`.

@@ -273,6 +273,123 @@ We shared two examples of experiment logging to DagsHub’s MLflow server in a C
 - [Using MLflow with Tensorflow](https://colab.research.google.com/drive/1TrN7YEgiIzt7EelvshJPx2n4j-Qa6LBf?usp=sharing){target=_blank}
 - [Using MLflow with fast.ai](https://colab.research.google.com/drive/1DhHzI5blVbniFwx98EKXYSi0z_Icm07t?usp=sharing){target=_blank}

+## How to import MLflow local objects to DagsHub MLflow remote?
+
+Generally, you can use [`mlflow-export-import`](https://github.com/mlflow/mlflow-export-import) to export MLflow experiments, runs, and models from one server to another.
+
+The following example demonstrates how to bulk-export all objects created on a local server and then bulk-import them into the DagsHub remote tracking server.
+
+### 1. Install `mlflow-export-import`
+
+- In the same environment where you originally installed `mlflow`, you can install `mlflow-export-import` with:
+
+    ```bash
+    pip install mlflow-export-import
+    ```
+
+- If you need the latest version from the GitHub source, install it instead with:
+
+    ```bash
+    pip install git+https://github.com/mlflow/mlflow-export-import
+    ```
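Before continuing, it can help to confirm that the package is actually importable from your environment. The sketch below uses only the standard library; the distribution name `mlflow-export-import` and module name `mlflow_export_import` match the names used above.

```python
# Sketch: verify that mlflow-export-import is installed and locate its source.
import importlib.util
from importlib import metadata


def package_location(module_name):
    """Return the file path of an importable module, or None if it is absent."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None


def installed_version(dist_name):
    """Return the installed distribution's version, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("mlflow-export-import"))
    print(package_location("mlflow_export_import"))
```

If both calls return `None`, the installation did not land in the environment you are running from.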
+
+### 2. Export all local objects
+
+- In one terminal, start the local `mlflow` server, for example with:
+
+    ```bash
+    mlflow server --host 0.0.0.0 --port 8888
+    ```
+
+- In **another** terminal, in the same virtual environment, export all objects to a folder called `mlflow-export` with:
+
+    ```bash
+    # note: the port must match the one used by the server in the other terminal
+    MLFLOW_TRACKING_URI=http://localhost:8888 \
+    export-all --output-dir mlflow-export
+    ```
+
+- If the export succeeds, you should see a report saying so, for example:
+
+    ```text
+    3 experiments exported
+    37/37 runs successfully exported
+    Duration for experiments export: 10.6 seconds
+    Duration for entire tracking server export: 10.8 seconds
+    ```
+
+- At this point, you can stop the local server in the first terminal.
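If you script the export, you can sanity-check the summary report before moving on to the import. The sketch below is an assumption based on the sample report format shown above, not an official `mlflow-export-import` API.

```python
# Sketch: parse the export-all summary to confirm every run was exported.
# The report layout is an assumption based on the sample output above.
import re


def runs_fully_exported(report):
    """Return True if the 'N/M runs' line shows all runs were exported."""
    match = re.search(r"(\d+)/(\d+) runs", report)
    if not match:
        return False
    exported, total = map(int, match.groups())
    return exported == total


sample = """\
3 experiments exported
37/37 runs successfully exported
Duration for experiments export: 10.6 seconds
"""
print(runs_fully_exported(sample))
```

A partial export (e.g. `35/37 runs`) is worth investigating before importing, since missing runs are easy to overlook on the remote side.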
+
+### 3. Import to DagsHub server
+
+- Find your DagsHub repository's MLflow remote variables, for example via the `Remote` button in your repository, under the `Experiments` tab.
+- Then run the following in the terminal:
+
+    ```bash
+    MLFLOW_TRACKING_URI=https://dagshub.com/<USER>/<REPO>.mlflow \
+    MLFLOW_TRACKING_USERNAME=<USER> \
+    MLFLOW_TRACKING_PASSWORD=<PASSWORD_OR_TOKEN> \
+    import-all --input-dir mlflow-export
+    ```
+
+- If successful, you can launch the local server again and visit `https://dagshub.com/<USER>/<REPO>.mlflow` to check for discrepancies between the local and remote logged data, artifacts, models, runs, and experiments. For example, see if anything is missing.
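The import step above can also be driven from a script. This is a minimal sketch: it composes the same environment variables and shells out to the `import-all` CLI; the `<USER>`/`<REPO>` placeholders are hypothetical and must be replaced with your own values.

```python
# Sketch: run import-all against the DagsHub remote from a script
# instead of an interactive shell.
import os
import subprocess


def dagshub_mlflow_env(user, repo, token):
    """Compose the environment variables expected by the MLflow client."""
    return {
        **os.environ,
        "MLFLOW_TRACKING_URI": f"https://dagshub.com/{user}/{repo}.mlflow",
        "MLFLOW_TRACKING_USERNAME": user,
        "MLFLOW_TRACKING_PASSWORD": token,
    }


def run_import(input_dir, env):
    """Invoke the import-all CLI installed by mlflow-export-import."""
    subprocess.run(["import-all", "--input-dir", input_dir], env=env, check=True)


# Usage (fill in real credentials first):
# run_import("mlflow-export", dagshub_mlflow_env("<USER>", "<REPO>", "<PASSWORD_OR_TOKEN>"))
```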
+
+### Importing issues & workarounds
+
+You may run into some issues with importing in **Step 3**. Below are some potential workarounds to try, which essentially involve editing the package's source code after installation.
+
+!!! warning
+    - These workarounds were used with `mlflow-export-import 1.2.0`; the underlying issues may be patched in future versions.
+    - These workarounds may work for only certain cases and issues.
+    - Back up the `mlflow-export` directory just in case.
+    - After every `import-all` attempt, compare the local server and the DagsHub remote tracking server to ensure they match.
+
+- First, find the source code directory by inspecting the `Location` field in the output of `pip show`, for example:
+
+    ```bash
+    pip show mlflow-export-import
+    ...
+    Location: .venv/lib/python3.10/site-packages/mlflow_export_import
+    ...
+    ```
+
+    ??? info "Alternative to editing in `site-packages`"
+        An alternative to editing the source code in `site-packages` is to install `mlflow-export-import` in editable mode:
+
+        ```bash
+        # clone the repository
+        git clone https://github.com/mlflow/mlflow-export-import
+
+        # install in editable mode
+        pip install -e ./mlflow-export-import/
+        ```
+
+        Then you can edit files in the local `mlflow-export-import/mlflow_export_import` directory instead of your environment's `site-packages` directory.
+
+- If you see `{'error_code': 'BAD_REQUEST'}` in the output of `import-all`, comment out or delete the following lines in the `common/mlflow_utils.py` file, under the `set_experiment` function:
+
+    ```python
+    if ex.error_code != "RESOURCE_ALREADY_EXISTS":
+        raise MlflowExportImportException(ex, f"Cannot create experiment '{exp_name}'")
+    ```
+
+- Re-run **Step 3** with the same `import-all` command again.
+
+- If you still encounter issues, it could be because your local experiments did not log any data inputs. If none of your experiments did, try commenting out or deleting the following line in the `run/import_run.py` source file, under the `import_run` function, inside the `try` block:
+
+    ```python
+    _import_inputs(http_client, src_run_dct, run_id)
+    ```
+
+    !!! warning
+        Please note that this workaround assumes **none** of your experiments or runs log any inputs.
+
+        If some do and some do not, you will need to modify the logic of `_import_inputs` and/or the code surrounding this line to accommodate that.
+
+- Re-run **Step 3** with the same `import-all` command again.
+
+If issues remain, or there are discrepancies between the local server and the remote DagsHub server, please open a ticket on DagsHub and/or the `mlflow-export-import` GitHub repository.
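If you prefer a narrower patch than deleting the `BAD_REQUEST` check outright, the intent of that workaround can be expressed as "treat 'experiment already exists' as benign and re-raise everything else." The sketch below illustrates that logic only; `RestError`, `create_experiment`, and `set_experiment_tolerant` are hypothetical stand-ins, not the actual `mlflow-export-import` internals.

```python
# Illustration of the workaround's intent: swallow only the
# RESOURCE_ALREADY_EXISTS error when creating an experiment.
# RestError is a hypothetical stand-in for the client's REST exception.


class RestError(Exception):
    def __init__(self, error_code):
        super().__init__(error_code)
        self.error_code = error_code


def set_experiment_tolerant(create_experiment, exp_name):
    """Create an experiment, ignoring only the 'already exists' error."""
    try:
        return create_experiment(exp_name)
    except RestError as ex:
        if ex.error_code != "RESOURCE_ALREADY_EXISTS":
            raise
        return None  # experiment already exists; nothing to do
```

Keeping the re-raise for other error codes means genuinely broken imports still fail loudly instead of being silently skipped.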
+
 ## Known Issues, Limitations & Restrictions

 The MLflow UI provided by DagsHub currently doesn't support displaying artifacts pushed to external storage like S3.