In this example, we train a PyTorch Lightning model to predict handwritten digits, leveraging early stopping.
The code, adapted from this repository, is almost entirely dedicated to model training, with the addition of a single
mlflow.pytorch.autolog() call to enable automatic logging of params, metrics, and models,
including the best model from early stopping.
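As a rough illustration of what early stopping does, the following pure-Python sketch tracks a monitored value over epochs and stops once it has failed to improve for `patience` consecutive epochs. It is a simplified model of the idea, not the PyTorch Lightning implementation; the function name `early_stopping_epoch` and the loss values are hypothetical, and options such as a minimum-improvement threshold are left out.

```python
def early_stopping_epoch(values, patience=5, mode="min"):
    """Return (stop_epoch, best_epoch) for a sequence of monitored values.

    stop_epoch is None when training runs to the end without stopping early.
    Hypothetical helper for illustration only.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")

    best_value = None
    best_epoch = None
    epochs_without_improvement = 0

    for epoch, value in enumerate(values):
        improved = (
            best_value is None
            or (mode == "min" and value < best_value)
            or (mode == "max" and value > best_value)
        )
        if improved:
            # New best value: remember it and reset the patience counter.
            best_value, best_epoch = value, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch

    return None, best_epoch


# Made-up validation losses: the best value occurs at epoch 2.
val_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2, mode="min")
print(stop_epoch, best_epoch)
```

The epoch returned as `best_epoch` corresponds to the checkpoint one would want to keep, which is the role the "best model" plays in the logged run.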
To run the example via MLflow, navigate to the
mlflow/examples/pytorch/MNIST/example1 directory and run the command
mlflow run .
This will run
mnist_autolog_example1.py with the default set of parameters such as
--max_epochs=5. You can see the default value in the
In order to run the file with custom parameters, run the command
mlflow run . -P max_epochs=X
where X is your desired value for max_epochs.
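When several parameters need overriding, the `-P name=value` pairs can be assembled programmatically. The sketch below builds the argument list for `mlflow run` from a dictionary; the helper name `build_mlflow_run_command` and the parameter values are assumptions made for illustration.

```python
import shlex


def build_mlflow_run_command(params, project_dir="."):
    """Build an argument list for `mlflow run` with one -P flag per parameter.

    Hypothetical helper; it only constructs the command and does not run it.
    """
    command = ["mlflow", "run", project_dir]
    for name, value in params.items():
        command += ["-P", f"{name}={value}"]
    return command


# Example overrides (illustrative values).
overrides = {"max_epochs": 5, "batch_size": 32, "monitor": "val_loss"}
command = build_mlflow_run_command(overrides)

# Show the equivalent shell command line.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to launch the project from a script.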
If you already have the modules the file requires and would like to skip creating a conda environment, add the --no-conda argument:
mlflow run . --no-conda
Once the code has finished executing, you can view the run's metrics, parameters, and details by starting the MLflow tracking UI and navigating to http://localhost:5000.
For more details on MLflow tracking, see the docs.
The parameters can be overridden via the command line:
mlflow run . -P max_epochs=5 -P gpus=1 -P batch_size=32 -P num_workers=2 -P learning_rate=0.01 -P accelerator="ddp" -P patience=5 -P mode="min" -P monitor="val_loss" -P verbose=True
Or to run the training script directly with custom parameters:
python mnist_autolog_example1.py \
    --max_epochs 5 \
    --gpus 1 \
    --accelerator "ddp" \
    --batch_size 64 \
    --num_workers 3 \
    --lr 0.001 \
    --es_patience 5 \
    --es_mode "min" \
    --es_monitor "val_loss" \
    --es_verbose True
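For orientation, a script accepting flags like these could parse them with `argparse` roughly as follows. This is a sketch under stated assumptions, not the actual contents of mnist_autolog_example1.py: the `--gpus` and `--accelerator` flags are omitted for brevity, every default other than `max_epochs=5` is a placeholder borrowed from the command above, and `str_to_bool` is a hypothetical helper needed because `argparse` does not convert the string "True" to a boolean on its own.

```python
import argparse


def str_to_bool(text):
    """Convert strings such as 'True' or 'false' to a bool (hypothetical helper)."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Build a parser mirroring the training flags shown above (assumed layout)."""
    parser = argparse.ArgumentParser(description="MNIST training flags (sketch)")
    parser.add_argument("--max_epochs", type=int, default=5)
    # Defaults below are placeholders, not confirmed script defaults.
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--num_workers", type=int, default=3)
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--es_patience", type=int, default=5)
    parser.add_argument("--es_mode", choices=["min", "max"], default="min")
    parser.add_argument("--es_monitor", default="val_loss")
    parser.add_argument("--es_verbose", type=str_to_bool, default=True)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--max_epochs", "3", "--es_verbose", "False"])
print(args.max_epochs, args.es_verbose, args.lr)
```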
To configure MLflow to log to a custom (non-default) tracking location, set the MLFLOW_TRACKING_URI environment variable, e.g. via export MLFLOW_TRACKING_URI=http://localhost:5000/. For more details, see the docs.
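The same variable can also be supplied from Python when launching a run as a child process. The following sketch prepares an environment mapping with MLFLOW_TRACKING_URI set; the helper name `env_with_tracking_uri` is hypothetical, and the URI is the one from the example above.

```python
import os


def env_with_tracking_uri(uri, base_env=None):
    """Return a copy of the environment with MLFLOW_TRACKING_URI set.

    Hypothetical helper; the current process environment is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MLFLOW_TRACKING_URI"] = uri
    return env


env = env_with_tracking_uri("http://localhost:5000/")
print(env["MLFLOW_TRACKING_URI"])
```

The returned mapping can be passed as `env=env` to `subprocess.run(["mlflow", "run", "."], env=env, check=True)`, so that only the child process sees the custom tracking location.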