
Project: End-to-End ML Pipeline with DVC & MLflow

This project demonstrates how to build a reproducible machine learning pipeline from scratch, using DVC for data version control and MLflow for experiment tracking.

Objective: Train a Random Forest classifier on the Pima Indians Diabetes Dataset with a modular, reproducible ML pipeline made up of three stages:

  • Data Preprocessing
  • Model Training
  • Model Evaluation

Key Highlights

Data Versioning with DVC

With DVC, you can:

  • Track datasets and models alongside the Git-tracked code
  • Structure workflows into stages (preprocess → train → evaluate)
  • Re-run only the affected stages when their dependencies change (dvc repro)
  • Connect to remote data storage (DagsHub/S3) for collaboration

Your pipeline becomes:

  • Modular
  • Reproducible
  • Scalable
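
The re-run behaviour described above rests on content hashes: DVC records a hash of each stage's dependencies in dvc.lock and skips a stage when nothing it depends on has changed. The following is a toy sketch of that idea in plain Python. It is not DVC's implementation, and `file_hash`, `run_stage`, and `toy.lock` are hypothetical names invented for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def run_stage(name, deps, outs, command, lock_path: Path) -> bool:
    """Run `command` only if a dependency changed or an output is missing.

    Returns True when the stage ran and False when it was skipped.
    """
    # Hypothetical lock file: maps stage name -> {dependency path: hash}.
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    current = {str(dep): file_hash(dep) for dep in deps}
    up_to_date = lock.get(name) == current and all(out.exists() for out in outs)
    if up_to_date:
        return False
    command()
    lock[name] = current
    lock_path.write_text(json.dumps(lock, indent=2))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        raw, clean, lock = root / "raw.csv", root / "clean.csv", root / "toy.lock"
        raw.write_text("a,b\n1,2\n")  # made-up placeholder data

        def preprocess() -> None:
            # Stand-in for a real preprocessing step: copy the file unchanged.
            clean.write_text(raw.read_text())

        print(run_stage("preprocess", [raw], [clean], preprocess, lock))  # True: first run
        print(run_stage("preprocess", [raw], [clean], preprocess, lock))  # False: nothing changed
        raw.write_text("a,b\n3,4\n")
        print(run_stage("preprocess", [raw], [clean], preprocess, lock))  # True: input changed
```

In the real project the equivalent bookkeeping is done by DVC itself, so a sketch like this is only useful for building intuition about why unchanged stages are skipped.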

Experiment Tracking with MLflow

MLflow allows:

  • Tracking experiments: log parameters, metrics, models
  • Comparing runs visually
  • Tuning hyperparameters (n_estimators, max_depth, etc.) by comparing logged runs
  • Storing & reusing model artifacts

“What gets measured gets improved.” With MLflow, you measure everything.


Dataset Used

  • Pima Indians Diabetes Dataset
      • Medical data for binary classification
      • Eight numeric features (note that the classes are imbalanced: about 35% of the records are positive)
      • Real-world healthcare relevance

Model Used

  • Random Forest Classifier
      • Robust to noisy features
      • Needs little preprocessing (native missing-value support depends on the scikit-learn version)
      • Performs well on tabular data

Final Output

At the end of this project, you will have:

  • A complete ML pipeline versioned with DVC
  • Multiple model experiments tracked via MLflow
  • Integrated remote storage (DagsHub)
  • Reproducible and scalable pipeline stages

Tech Stack

  • Python
  • DVC
  • MLflow
  • DagsHub
  • Scikit-learn
  • Pandas, NumPy, Matplotlib

Want to Contribute?

Feel free to fork, star, or raise issues! Together, let's build smarter pipelines.


Built with ❤️ by Anand. Follow for more end-to-end ML & MLOps content! View Project on DagsHub
