ML-PROJET-WITH-DOCKER

This project is a complete Machine Learning pipeline that covers model training, batch predictions, and an interactive Streamlit web app, with DVC for data and model versioning and Docker for packaging. It also integrates with DagsHub for version control and experiment tracking.


Features

  • Train a Gradient Boosting model on the Wine dataset
  • Store and version data/models with DVC
  • Run batch predictions on uploaded .csv files
  • Use a clean, interactive Streamlit interface
  • Run in a reproducible environment thanks to Docker
  • Keep parameters and paths in a single, centralized config.yaml
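A centralized config.yaml is typically read once and shared by the training script and the app. The following is a minimal sketch, assuming PyYAML is installed and that each top-level key of config.yaml is a section holding a mapping; `load_config`, `DEFAULTS`, and all section and key names are hypothetical, not taken from the repository's actual config.yaml.

```python
"""Hypothetical sketch of loading config.yaml.

Assumptions: PyYAML is installed, every top-level key of config.yaml maps to
a section (a mapping), and the section/key names in DEFAULTS are illustrative.
"""
from pathlib import Path

import yaml  # PyYAML

# Illustrative defaults; the real config.yaml may use different keys.
DEFAULTS = {
    "model": {"n_estimators": 100, "learning_rate": 0.1, "random_state": 42},
    "paths": {
        "model_path": "models/gradiant_boosting_model.pkl",
        "upload_dir": "uploads",
    },
}


def load_config(path="config.yaml"):
    """Read config.yaml and fill any missing keys from DEFAULTS."""
    with Path(path).open(encoding="utf-8") as f:
        # safe_load returns None for an empty file, hence the fallback.
        user_cfg = yaml.safe_load(f) or {}
    # Copy each default section so DEFAULTS itself is never mutated.
    merged = {section: dict(values) for section, values in DEFAULTS.items()}
    # Overlay user values section by section (a shallow, per-section merge).
    for section, values in user_cfg.items():
        merged.setdefault(section, {}).update(values or {})
    return merged
```

With this shape, training and the Streamlit app would read the same values, for example `load_config()["model"]["n_estimators"]`, instead of hard-coding them.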

📁 Project Structure

ML-PROJET-WITH-DOCKER/
│
├── app/
│   ├── training/
│   │   └── train_model.py              # Script to train and save the model
│   ├── model/
│   │   ├── load_model.py               # Load model from file
│   │   └── predictor.py                # Make predictions using the model
│   ├── streamlit_app.py                # Streamlit web application
│   ├── utils.py                        # Utility functions
│   └── dvc_utils.py                    # DVC utility integration
│
├── data/                               # Raw input data (tracked by DVC)
├── models/                             # Trained models (tracked by DVC)
│   └── gradiant_boosting_model.pkl     # Trained Gradient Boosting model
├── uploads/                            # Files uploaded through the Streamlit app
│
├── .dvc/                               # DVC internal files
├── .venv/                              # Python virtual environment (excluded)
├── .env                                # Environment variables
├── Dockerfile                          # Docker image definition
├── config.yaml                         # Centralized parameters and paths
├── requirements.txt                    # Python dependencies
├── dvc.yaml                            # DVC pipeline definition
├── dvc.lock                            # DVC stage locks for reproducibility
└── README.md                        

🐳 Docker Usage

🔧 Build the image

docker build -t ml-docker-app .

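🔧 Run the Streamlit app

docker run -p 8501:8501 ml-docker-app

The -p flag publishes container port 8501 (Streamlit's default) on the host, so the app is reachable at http://localhost:8501.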

📦 DVC Pipeline

We use DVC to track data and model versions and to define reproducible pipelines.

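Train the model with DVC

dvc repro

dvc repro runs the stages defined in dvc.yaml and skips any stage whose dependencies and outputs have not changed since the state recorded in dvc.lock.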

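➕ Add a new stage (example)

The command below registers a train_model stage in dvc.yaml: -n names the stage, each -d declares a dependency, -o declares the output DVC will track, and the final argument is the command to run.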

dvc stage add -n train_model \
  -d app/training/train_model.py \
  -d config.yaml \
  -o models/gradiant_boosting_model.pkl \
  python3 app/training/train_model.py
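The stage above runs app/training/train_model.py and declares models/gradiant_boosting_model.pkl as its output. A minimal sketch of such a script follows, assuming scikit-learn and an 80/20 stratified split; `train_and_save` and its default hyperparameters (scikit-learn's own defaults) are illustrative, not the repository's actual values, which would normally come from config.yaml.

```python
"""Hypothetical sketch of app/training/train_model.py.

Assumptions (not taken from the repository): the hyperparameters, the
80/20 stratified split and the pickle format are illustrative choices.
"""
import pickle
from pathlib import Path

from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_save(model_path="models/gradiant_boosting_model.pkl",
                   n_estimators=100, learning_rate=0.1, random_state=42):
    """Train a Gradient Boosting classifier on the Wine dataset and pickle it.

    Returns the held-out accuracy so the caller can log or check it.
    """
    # load_wine ships with scikit-learn, so no download is needed.
    X, y = load_wine(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )

    model = GradientBoostingClassifier(
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        random_state=random_state,
    )
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    # Create models/ if needed; this is the path the stage declares with -o.
    path = Path(model_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return accuracy


if __name__ == "__main__":
    print(f"Test accuracy: {train_and_save():.3f}")
```

Because the script writes to the same path declared with -o, running dvc repro records the new model's hash in dvc.lock.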

DagsHub Integration

This project can be synced with DagsHub to manage:

✅ Data & model versioning

✅ Experiment tracking with MLflow

✅ Git/DVC repository hosting

https://dagshub.com/Herman-Motcheyo/ML-Project-Docker
