
Repository contents

File              Commit      Message                                                  Updated
.dvc              aecc5d25b8  Added the entire end to end code with mlflow and dvc     1 month ago
data              aecc5d25b8  Added the entire end to end code with mlflow and dvc     1 month ago
models            aecc5d25b8  Added the entire end to end code with mlflow and dvc     1 month ago
src               aecc5d25b8  Added the entire end to end code with mlflow and dvc     1 month ago
.DS_Store         40e852e54d  Added the project structure                              1 month ago
.dvcignore        06d2059391  Stop tracking data/raw/diabetes.csv in Git, move to DVC  1 month ago
.gitignore        40e852e54d  Added the project structure                              1 month ago
dvc.lock          aecc5d25b8  Added the entire end to end code with mlflow and dvc     1 month ago
dvc.yaml          aecc5d25b8  Added the entire end to end code with mlflow and dvc     1 month ago
params.yaml       762f8e7173  Added the train.py                                       1 month ago
readme.md         0d8ef6e544  Added the readme.md                                      1 month ago
requirements.txt  762f8e7173  Added the train.py                                       1 month ago


🚀 Project: End-to-End ML Pipeline with DVC & MLflow 🔁📊

Welcome! This project demonstrates how to build a modular, reproducible Machine Learning pipeline from scratch, using DVC 🗓️ for data version control and MLflow ⚙️ for experiment tracking.

🎯 Objective: Train a robust Random Forest Classifier 🌲 on the Pima Indians Diabetes Dataset 🧬, with a modular and reproducible ML pipeline including:

🔍 Data Preprocessing
🧠 Model Training
📈 Model Evaluation

🔑 Key Highlights

📦 Data Versioning with DVC

With DVC, you can:

🧬 Track datasets and models alongside the code changes that Git already versions
⚙️ Structure workflows into stages (preprocess ➡️ train ➡️ evaluate)
🔁 Automatically re-run affected stages when changes occur
☁️ Connect to remote data storage (DagsHub/S3) for collaboration
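
To make the idea of re-running only affected stages concrete, here is a toy sketch of the underlying principle: a stage is stale when the content hash of any dependency differs from the hash recorded after its last successful run. This is a simplified illustration under stated assumptions, not DVC's actual implementation. The function names (`file_hash`, `stage_is_stale`, `record_stage`) and the JSON lock format are hypothetical. In the real project, `dvc repro` performs this kind of check using `dvc.yaml` and `dvc.lock`.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return a SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_is_stale(deps: list[Path], lock_file: Path) -> bool:
    """A stage must re-run if any dependency hash differs from the recorded one."""
    if not lock_file.exists():
        return True  # the stage has never been run
    recorded = json.loads(lock_file.read_text())
    current = {str(dep): file_hash(dep) for dep in deps}
    return current != recorded


def record_stage(deps: list[Path], lock_file: Path) -> None:
    """Store the current dependency hashes after a successful run."""
    hashes = {str(dep): file_hash(dep) for dep in deps}
    lock_file.write_text(json.dumps(hashes, indent=2))
```

With these helpers, a runner would call `stage_is_stale` before each stage, skip the stage when it returns `False`, and call `record_stage` after a successful run.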

🛠️ Your pipeline becomes:

✅ Modular
✅ Reproducible
✅ Scalable

📊 Experiment Tracking with MLflow

MLflow allows:

🧪 Tracking experiments: log parameters, metrics, models
🧮 Comparing runs visually
📏 Comparing hyperparameter settings (n_estimators, max_depth, etc.) across runs to guide tuning
📦 Storing & reusing model artifacts
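
The tracking steps above can be sketched as follows. This is a minimal example, not the project's own `train.py`: the helper names (`flatten_params`, `log_run`), the experiment name `diabetes-rf`, and the nested parameter layout are assumptions made for illustration. It uses only `mlflow.set_experiment`, `mlflow.start_run`, `mlflow.log_params`, and `mlflow.log_metrics`.

```python
from typing import Any


def flatten_params(params: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested parameters into dotted keys.

    Example: {"train": {"max_depth": 5}} becomes {"train.max_depth": 5}.
    """
    flat: dict[str, Any] = {}
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_run(
    params: dict[str, Any],
    metrics: dict[str, float],
    experiment: str = "diabetes-rf",  # hypothetical experiment name
) -> None:
    """Record one experiment run (parameters and metrics) with MLflow."""
    # Imported lazily so flatten_params stays usable without MLflow installed.
    import mlflow

    mlflow.set_experiment(experiment)
    with mlflow.start_run():
        mlflow.log_params(flatten_params(params))
        mlflow.log_metrics(metrics)
```

Calling `log_run` once per training run, with the parameters used and the metrics computed during evaluation, produces one row per run in the MLflow UI, which is what makes the visual comparison possible.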

🔍 “What gets measured gets improved.” With MLflow, every parameter and metric you log stays available for later comparison.

📁 Dataset Used

Pima Indians Diabetes Dataset
📊 Medical data for binary classification (diabetes or no diabetes)
✅ Eight numeric clinical features
✅ Real-world healthcare relevance

🤖 Model Used

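🌲 Random Forest Classifier
✅ Robust to noisy features and less prone to overfitting than a single decision tree
✅ Needs little preprocessing (no feature scaling); missing values are accepted natively only in recent scikit-learn releases, so impute them first on older versions
✅ Performs well on tabular data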
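
A defensive way to deal with missing values, regardless of how the installed scikit-learn version treats NaN in a random forest, is to impute inside a pipeline. The sketch below is an illustration rather than the project's training code: it runs on synthetic data (a stand-in with 8 numeric features and a binary target, chosen to resemble the dataset's shape), and the hyperparameter values are arbitrary examples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the real dataset: 8 numeric features, binary target.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=42
)

# Blank out roughly 5% of the values to simulate missing measurements.
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Median imputation lives inside the pipeline, so it is fitted on the
# training split only and then applied unchanged to the test split.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42),
)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy on synthetic data: {accuracy:.3f}")
```

Keeping the imputer and the classifier in one pipeline object also means a single artifact can be saved and reloaded for prediction.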

📈 Final Output

At the end of this project, you will have:

🎯 A complete ML pipeline versioned with DVC
⚙️ Multiple model experiments tracked via MLflow
☁️ Integrated remote storage (DagsHub)
🔁 Reproducible and scalable pipeline stages

🔥 Tech Stack

🐍 Python
📦 DVC
⚙️ MLflow
☁️ DagsHub
📊 Scikit-learn
🧪 Pandas, NumPy, Matplotlib

⭐ Want to Contribute?

Feel free to fork, ⭐ star, or raise issues! Together, let’s build smarter pipelines 🔁💡

✅ Built with ❤️ by Anand – Follow for more end-to-end ML & MLOps content! 🔗 View Project on DagsHub

Ananddd06 / MachineLearningPipeline
