Apple MLX-LM

Apple MLX is a machine learning framework optimized for Apple Silicon, and MLX-LM provides LLM fine-tuning and inference utilities (including LoRA) built on top of it.

With DagsHub and MLX-LM, you can fine-tune LLMs locally on your Mac and still get full MLOps support: experiment tracking, dataset versioning, and model registry workflows. This integration is based on the MLflow server that comes with each DagsHub project.

How does MLX-LM work with DagsHub?

You run MLX-LM locally (training/inference on Apple Silicon). DagsHub provides the backend services:

  • Experiments: log params/metrics/artifacts to DagsHub's hosted MLflow server
  • Data Engine: query and download versioned training data by metadata (optional)
  • Model Registry: register and version fine-tuned artifacts (for example, LoRA adapter weights)

In practice, the integration is a small amount of code:

  1. Call dagshub.init(..., mlflow=True) to configure MLflow to log to your DagsHub repo
  2. Use standard MLflow logging (mlflow.log_param, mlflow.log_metric, mlflow.log_artifact / mlflow.pyfunc.log_model)
  3. (Optional) Use Data Engine to fetch the exact dataset slice used for a run, and rely on DagsHub autologging to link the dataset query to the run

Setup

This workflow requires macOS on Apple Silicon.

Install dependencies:

pip install "mlx>=0.18.0" "mlx-lm>=0.19.0" "dagshub>=0.3.0" "mlflow<3"

Initialize DagsHub + MLflow in your training script:

import dagshub

dagshub.init(
    repo_owner="your-username-or-org",
    repo_name="your-repo",
    mlflow=True,
)

Track MLX-LM fine-tuning runs (MLflow)

Start an MLflow run and log your training metadata and metrics as usual:

import mlflow

with mlflow.start_run(run_name="lora-qwen-on-mac"):
    mlflow.log_param("base_model", "mlx-community/Qwen2.5-1.5B-Instruct-4bit")
    mlflow.log_param("fine_tune_type", "lora")

    # During training, call mlflow.log_metric(...) for loss/throughput/etc.
    mlflow.log_metric("train_loss", 1.23, step=10)

View runs in your repo's Experiments tab:

https://dagshub.com/<repo_owner>/<repo_name>/experiments

If you're using MLX-LM's training utilities, you can simplify metric and artifact logging with a custom MLflow training callback; a reference implementation is included in the example repository linked at the end of this page.

Example usage:

from callbacks import MLflowCallback


callback = MLflowCallback(adapter_path="adapters")

# Pass callback to the MLX-LM training loop; it logs train/val metrics and can log adapters at the end.
train(..., training_callback=callback)
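
If the reference implementation isn't at hand, below is a minimal sketch of what such a callback can look like. It assumes MLX-LM's TrainingCallback base class from mlx_lm.tuner.trainer and an already-active MLflow run; the exact keys of the reported info dicts may vary across mlx-lm versions:

import mlflow
from mlx_lm.tuner.trainer import TrainingCallback


class MLflowCallback(TrainingCallback):
    def __init__(self, adapter_path="adapters"):
        self.adapter_path = adapter_path

    def on_train_loss_report(self, train_info):
        # Log each reported training loss against its iteration number.
        mlflow.log_metric("train_loss", train_info["train_loss"],
                          step=train_info["iteration"])

    def on_val_loss_report(self, val_info):
        mlflow.log_metric("val_loss", val_info["val_loss"],
                          step=val_info["iteration"])

    def log_adapters(self):
        # Call after training to upload the adapter directory as run artifacts.
        mlflow.log_artifacts(self.adapter_path, artifact_path="adapters")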

(Optional) Use Data Engine as your dataset source

If your dataset lives in DagsHub Data Engine, you can query by metadata (for example, a split field with values train/valid/test) and download only the files needed for a run:

from dagshub.data_engine import datasources

ds = datasources.get_datasource(repo="your-username-or-org/your-repo", name="my-dataset")

train_points = ds[ds["split"] == "train"].all()
valid_points = ds[ds["split"] == "valid"].all()

for dp in train_points:
    dp.download_file(target=f"data/train/{dp.path}")

Tip: run your Data Engine queries inside an active MLflow run; DagsHub can automatically associate the dataset query with the experiment.
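
For example, wrapping the query from above in a run:

import mlflow

with mlflow.start_run(run_name="lora-qwen-on-mac"):
    # Querying while a run is active lets DagsHub link this exact
    # dataset slice to the experiment.
    train_points = ds[ds["split"] == "train"].all()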

After download_file(...), you'll typically need to convert the downloaded files into an MLX-LM-compatible dataset object (for example, ChatDataset). That means reading the downloaded JSON/JSONL and mapping each record into one of two formats (a conversion sketch follows the list):

  • chat format: { "messages": [{"role": "user", ...}, {"role": "assistant", ...}] }
  • completions format: { "prompt": "...", "completion": "..." }
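
A minimal conversion sketch, assuming the downloaded files are JSONL with one chat-format record per line (the parse_chat_jsonl helper and paths are illustrative, not part of the MLX-LM API):

import json
from pathlib import Path


def parse_chat_jsonl(folder):
    records = []
    for path in sorted(Path(folder).glob("*.jsonl")):
        with open(path) as f:
            for line in f:
                record = json.loads(line)
                # Keep only the "messages" key expected by the chat format.
                records.append({"messages": record["messages"]})
    return records


train_records = parse_chat_jsonl("data/train")

The resulting list of dicts can then be wrapped in an MLX-LM dataset class such as ChatDataset; check your mlx-lm version for the exact constructor signature.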

For a concrete example of the folder parsing + conversion step, see load_folder_dataset in the example repository linked at the end of this page.

Log adapters and use the Model Registry

After fine-tuning with LoRA, log your adapter directory to MLflow so it can be registered in the Model Registry:

import mlflow
import mlflow.pyfunc


class AdapterWrapper(mlflow.pyfunc.PythonModel):
    """Carries the adapter directory as artifacts; inference happens in MLX-LM."""

    def predict(self, context, model_input):
        # Stub that satisfies the pyfunc interface so the model can be
        # logged and registered; load the adapters with MLX-LM for inference.
        raise NotImplementedError("Run inference with MLX-LM, not via pyfunc.")


mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=AdapterWrapper(),
    artifacts={"adapters": "adapters/"},
    input_example={"prompt": "Hello"},
)

Once logged, you can register the resulting MLflow model from the DagsHub UI (Models/Registry tab) and version it like any other model.
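
You can also register it programmatically with the standard MLflow API. A sketch, assuming you captured the run ID during logging (the registry name my-mlx-lm-adapters is just an example):

import mlflow

# run_id captured earlier, e.g.:
#   with mlflow.start_run() as run:
#       run_id = run.info.run_id
result = mlflow.register_model(
    model_uri=f"runs:/{run_id}/model",  # "model" matches artifact_path above
    name="my-mlx-lm-adapters",
)
print(f"Registered version {result.version}")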

Load adapters back for inference or serving

You can download adapters from the Model Registry using the standard MLflow artifacts API (after calling dagshub.init(..., mlflow=True)):

import mlflow

adapters_path = mlflow.artifacts.download_artifacts(
    artifact_uri="models:/my-mlx-lm-adapters/Production"
)

Then pass the downloaded adapter path to MLX-LM when loading the base model for inference.
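
A minimal sketch using mlx_lm's load/generate helpers (the base model must be the same one the adapters were trained on):

from mlx_lm import load, generate

# Load the base model and apply the downloaded LoRA adapters.
model, tokenizer = load(
    "mlx-community/Qwen2.5-1.5B-Instruct-4bit",
    adapter_path=adapters_path,
)

print(generate(model, tokenizer, prompt="Hello", max_tokens=64))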

For a complete working reference (training, Data Engine download, MLflow logging, Model Registry usage, and a local OpenAI-compatible server), see:

dagshub.com/Dean/hello-world-mlx