
Machine learning pipeline

This repo provides an example of how to incorporate popular machine learning tools such as DVC, MLflow, and Hydra in your machine learning project. I use my project on predicting aggressive tweets as an example.

Find the article on how to use MLflow and Hydra here

Find the article on how to use DVC here

DVC

DVC is a data version control tool. To install DVC, run

pip install dvc

Hydra

With Hydra, you can compose your configuration dynamically. To install Hydra, simply run

pip install hydra-core --upgrade
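To make "compose your configuration dynamically" concrete, here is a minimal standard-library sketch of the idea behind Hydra's `key=value` command-line overrides. It is not Hydra's implementation and does not use Hydra's API. The helper `apply_overrides` and the config keys (`model.name`, `model.C`, `seed`) are hypothetical and are not taken from this repo's configs folder.

```python
import ast
import copy


def apply_overrides(base, overrides):
    """Return a copy of `base` with dotted `key=value` overrides applied.

    Illustrative stand-in only: Hydra does this (and much more) for you.
    """
    cfg = copy.deepcopy(base)  # leave the base config untouched
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            # Walk down the nested dict, creating levels that do not exist yet
            node = node.setdefault(name, {})
        try:
            # Turn "0.5" into 0.5, "42" into 42, and so on
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal stays a plain string
            value = raw_value
        node[leaf] = value
    return cfg


# Hypothetical config keys, chosen only for illustration
base_config = {"model": {"name": "logistic_regression", "C": 1.0}, "seed": 0}
config = apply_overrides(base_config, ["model.C=0.5", "seed=42"])
print(config)
```

With Hydra itself, the same kind of override is passed on the command line after the script name, and the merged config is handed to the decorated function.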

MLflow

MLflow is a platform to manage the ML lifecycle, including experimentation, reproducibility, and deployment. Install MLflow with

pip install mlflow

Project structure

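  • src: directory for source code
  • mlruns: directory for MLflow runs
  • configs: directory for config files
  • outputs: results from Hydra runs. Each time you run a function wrapped in Hydra's decorator, the output is saved here. Because Hydra can change the working directory for each run, MLflow may write its runs somewhere other than the project's mlruns folder. To keep them in the project's mlruns folder, set the tracking URI explicitly: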
import mlflow
import hydra
from hydra import utils

mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
  • src/preprocessing.py: file for preprocessing
  • src/train_pipeline.py: training pipeline
  • src/train.py: file for training and saving model
  • src/predict.py: file for prediction and loading model
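The tracking URI shown in the list above is built by string concatenation, which assumes Unix-style paths. As a hedged alternative, the sketch below builds the same kind of `file://` URI with `pathlib`. The helper name `tracking_uri_for` is hypothetical. In a Hydra run, the project root would come from `hydra.utils.get_original_cwd()` and the result would be passed to `mlflow.set_tracking_uri`.

```python
from pathlib import Path


def tracking_uri_for(project_root):
    """Return a file:// URI pointing at the mlruns folder under project_root."""
    root = Path(project_root)
    if not root.is_absolute():
        # Path.as_uri() only works on absolute paths, so fail with a clear message
        raise ValueError("project_root must be an absolute path")
    return (root / "mlruns").as_uri()


if __name__ == "__main__":
    # Outside Hydra, the current working directory stands in for the project root
    print(tracking_uri_for(Path.cwd()))
```

Using `as_uri()` also takes care of escaping characters such as spaces in the path.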

How to pull the data with DVC

Pull the data from Google Drive

dvc pull 
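If you would rather trigger the pull from a Python script (for example, at the start of a pipeline), the following is a minimal sketch that shells out to the same command. The helper names `dvc_pull_command` and `pull_data` are hypothetical. It assumes the `dvc` executable is on your PATH and that the remote is already configured in this repo. Google Drive remotes typically also need DVC's gdrive extra, installed with `pip install "dvc[gdrive]"`.

```python
import shutil
import subprocess


def dvc_pull_command(remote=None):
    """Build the argument list for `dvc pull`, optionally naming a remote."""
    cmd = ["dvc", "pull"]
    if remote is not None:
        # `--remote` selects a specific configured remote instead of the default
        cmd += ["--remote", remote]
    return cmd


def pull_data(remote=None):
    """Run `dvc pull` and raise if DVC is missing or the pull fails."""
    if shutil.which("dvc") is None:
        raise RuntimeError("dvc is not on PATH; install it with `pip install dvc`")
    subprocess.run(dvc_pull_command(remote), check=True)
```

Call `pull_data()` from the repository root so that DVC can find its configuration.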

How to run the project

To train with these configs and see how the experiments are displayed in MLflow's UI, clone this repo and run

python src/train.py

Once the run is complete, you can start MLflow's server with

mlflow ui

Run this command from the same directory where you ran the training script, then open http://localhost:5000/ in your browser. You should see your experiments listed there.
