title | description |
---|---|
Track Your Machine Learning Experiments with DagsHub | Learn how to leverage MLflow and DagsHub to track your machine learning and LLM experiments. |
DagsHub integrates with MLflow to provide an easy way to track experiment parameters and metrics. Its built-in integration with Git and dataset management means all your experiments become fully reproducible.
In this section, we'll learn how to track machine learning experiments on DagsHub with MLflow.
Machine Learning and Data Science are fundamentally experimental in nature, as we heavily rely on research and empirical analysis. However, as your project grows in complexity, keeping track of various experiments, their configurations, and results can quickly become overwhelming.
Therefore, tracking each experiment's source, parameters, and results is critical. It helps you understand your progress, identify which approaches work better, and decide which models to promote to production.
Each project on DagsHub comes with a full-fledged experiment tracking server that is based on MLflow and fully compatible with its API. To start, you can create a new project or connect an existing GitHub, GitLab, or Bitbucket project.
For the purpose of this guide, let's assume you already have training code. If you don't, below is a simple code snippet that trains a PyTorch Lightning autoencoder on a Data Engine dataset.
!!! info "Code snippet for autoencoder"
    Install PyTorch Lightning and the DagsHub client with `pip install lightning dagshub`.

    Then use the following code, which should run end-to-end:

    ```python
    import os
    from torch import optim, nn, utils, Tensor
    from torchvision import transforms
    import lightning as pl

    # define the LightningModule
    class LitAutoEncoder(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(480 * 640 * 3, 64), nn.ReLU(), nn.Linear(64, 3))
            self.decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3 * 640 * 480))
            self.transform = transforms.Resize((480, 640))

        def training_step(self, batch, batch_idx):
            # training_step defines the train loop.
            # it is independent of forward
            x = batch[0]
            if x.shape[1] != 3:
                x = x.expand(-1, 3, -1, -1)
            x = self.transform(x)
            x = x.view(x.size(0), -1)
            z = self.encoder(x)
            x_hat = self.decoder(z)
            loss = nn.functional.mse_loss(x_hat, x)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            optimizer = optim.Adam(self.parameters(), lr=1e-3)
            return optimizer

    # init the autoencoder
    autoencoder = LitAutoEncoder()

    # setup data
    from dagshub.data_engine import datasources
    ds = datasources.get('Dean/COCO_1K', 'COCO_1K')
    dataset = ds.head().as_ml_dataloader(flavor="torch")

    # train the model (hint: here are some helpful Trainer arguments for rapid idea iteration)
    trainer = pl.Trainer(limit_train_batches=100, max_epochs=5)
    trainer.fit(model=autoencoder, train_dataloaders=dataset)
    ```
If you run the snippet above, or your own training loop, it should run end-to-end without errors. Now let's see how to track these experiments with DagsHub.
To track the experiments in our project with MLflow, we need to add a couple of lines of code. To make the experiment information accessible outside our local machine, we'll use the DagsHub-hosted MLflow server that comes with your repository. This way you'll be able to share your experiment results with your team or the world!
Start by installing MLflow and DagsHub:

    pip install mlflow dagshub
Next, let's set up the connection and authentication required to log our experiment to the DagsHub experiment server. We can do this easily with the DagsHub client. Simply run the following lines of Python (or add them to your script):

    import dagshub
    dagshub.init(repo_name="<repo-name>", repo_owner="<repo-owner>")
??? info "Configure DagsHub from the CLI"
    You can also configure DagsHub's MLflow tracking server from the CLI. Read more about it in the MLflow integration page.
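If you prefer not to call the client at all, MLflow can also be pointed at a remote tracking server through its standard environment variables. The sketch below is illustrative only: the helper name `configure_dagshub_tracking` is hypothetical, and the `https://dagshub.com/<repo-owner>/<repo-name>.mlflow` URL pattern is an assumption about how the hosted server is exposed, so check the value shown on your repository page before relying on it.

```python
import os


def configure_dagshub_tracking(repo_owner, repo_name, username, token, env=None):
    """Set the environment variables MLflow reads to locate and authenticate
    against a remote tracking server.

    The URL pattern is an assumption, not taken from this guide.
    """
    env = os.environ if env is None else env
    tracking_uri = f"https://dagshub.com/{repo_owner}/{repo_name}.mlflow"
    env["MLFLOW_TRACKING_URI"] = tracking_uri
    env["MLFLOW_TRACKING_USERNAME"] = username
    env["MLFLOW_TRACKING_PASSWORD"] = token
    return tracking_uri


# Demonstrate with a plain dict so the real environment is left untouched.
demo_env = {}
uri = configure_dagshub_tracking(
    "<repo-owner>", "<repo-name>", "<username>", "<token>", env=demo_env
)
print(uri)
```

Passing a dict as `env` keeps the example side-effect free; omit it to write into `os.environ` before MLflow is used.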
To log the experiment information with MLflow, we need to add only 3 lines of code to modeling.py:

- `import mlflow`
- `with mlflow.start_run():` - “scope” each run in one block of code
- `mlflow.<framework>.autolog()` - automatic logging for your framework. Many frameworks are supported

After these changes, here is how the snippet from before looks (it uses the generic `mlflow.autolog()` call):

!!! info "Code snippet for autoencoder after MLflow instrumentation"
    ```python hl_lines="5 7 43"
    import os
    from torch import optim, nn, utils, Tensor
    from torchvision import transforms
    import lightning as pl
    import mlflow

    mlflow.autolog()

    # define the LightningModule
    class LitAutoEncoder(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(480 * 640 * 3, 64), nn.ReLU(), nn.Linear(64, 3))
            self.decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3 * 640 * 480))
            self.transform = transforms.Resize((480, 640))

        def training_step(self, batch, batch_idx):
            # training_step defines the train loop.
            # it is independent of forward
            x = batch[0]
            if x.shape[1] != 3:
                x = x.expand(-1, 3, -1, -1)
            x = self.transform(x)
            x = x.view(x.size(0), -1)
            z = self.encoder(x)
            x_hat = self.decoder(z)
            loss = nn.functional.mse_loss(x_hat, x)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            optimizer = optim.Adam(self.parameters(), lr=1e-3)
            return optimizer

    # init the autoencoder
    autoencoder = LitAutoEncoder()

    # setup data
    from dagshub.data_engine import datasources
    ds = datasources.get('Dean/COCO_1K', 'COCO_1K')
    dataset = ds.head().as_ml_dataloader(flavor="torch")

    with mlflow.start_run():
        # train the model (hint: here are some helpful Trainer arguments for rapid idea iteration)
        trainer = pl.Trainer(limit_train_batches=100, max_epochs=5)
        trainer.fit(model=autoencoder, train_dataloaders=dataset)
    ```
!!! Note "MLflow auto-logging"
    MLflow supports autologging for many popular frameworks such as PyTorch, TensorFlow, XGBoost, and more. You can find all the information here.
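Autologging may not capture everything you care about, such as a nested configuration of your own. As a rough sketch of explicit logging (not part of the original snippet), the example below flattens a hypothetical config dictionary and records it with `mlflow.log_params`, then logs a loss value per step with `mlflow.log_metric`. The names `flatten_params`, `log_manual_run`, and the sample config are assumptions made for illustration.

```python
def flatten_params(config, prefix=""):
    """Flatten a nested dict into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_manual_run(config, losses):
    """Log parameters and a per-step loss curve inside a single MLflow run."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    with mlflow.start_run():
        mlflow.log_params(flatten_params(config))
        for step, loss in enumerate(losses):
            mlflow.log_metric("train_loss", loss, step=step)


# Hypothetical configuration mirroring the values used in the snippet above.
config = {"optimizer": {"name": "adam", "lr": 1e-3}, "max_epochs": 5}
flat = flatten_params(config)
print(flat)
```

Calling `log_manual_run(config, [0.9, 0.5, 0.3])` after the tracking server is configured would create one run with three parameters and a three-point `train_loss` curve.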
Now, let's rerun the code. Going back to our project, we'll see an experiment appear in the experiments tab, with parameters, metrics, and even our trained model reported to it.
To go into detail and see charts, click on the experiment name to open the single experiment view.
You can also see your experiment in the MLflow UI, including the models and artifacts logged, by clicking on the "Go to MLflow UI" button.
Now that you have your first experiment run, you can choose what to do next: learn how to deploy a model, build an active learning pipeline, or reproduce experiment results.
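The same runs can also be inspected from code rather than the UI. The sketch below is one possible approach, assuming the tracking server is already configured: `fetch_runs` uses `mlflow.search_runs` with `output_format="list"` to pull runs as plain dictionaries, and `best_run` picks the one with the lowest `train_loss`. Both helper names and the sample records are hypothetical.

```python
def best_run(runs, metric="train_loss"):
    """Return the run record with the lowest value of `metric`, or None."""
    scored = [run for run in runs if metric in run["metrics"]]
    if not scored:
        return None
    return min(scored, key=lambda run: run["metrics"][metric])


def fetch_runs(experiment_name):
    """Fetch runs from the configured tracking server as plain dicts."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    runs = mlflow.search_runs(experiment_names=[experiment_name], output_format="list")
    return [{"run_id": r.info.run_id, "metrics": dict(r.data.metrics)} for r in runs]


# Hypothetical records in the shape fetch_runs() returns.
example_runs = [
    {"run_id": "a", "metrics": {"train_loss": 0.42}},
    {"run_id": "b", "metrics": {"train_loss": 0.31}},
    {"run_id": "c", "metrics": {}},
]
winner = best_run(example_runs)
print(winner["run_id"])
```

Runs that never logged the metric are skipped instead of raising an error, which keeps the comparison usable when some runs failed early.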