MLOps🚀 - From development to deployment🧪💥

MLOps, short for Machine Learning Operations, is a set of practices and methodologies that aim to streamline the deployment, management, and maintenance of machine learning models in production environments. It applies the principles of DevOps (Development Operations) specifically to machine learning workflows. The MLOps lifecycle encompasses various stages and processes, ensuring the smooth integration of machine learning models into real-world applications.


WHY MLOPS🔮?

Implementing MLOps practices is crucial for several reasons:

  1. Reproducibility: MLOps ensures that the entire machine learning pipeline, from data preprocessing to model deployment, is reproducible. This means that the same results can be obtained consistently, facilitating debugging, testing, and collaboration among team members.

  2. Scalability: MLOps allows for the seamless scaling of machine learning models across different environments and datasets. It enables efficient deployment and management of models, regardless of the volume of data or the complexity of the infrastructure.

  3. Agility: MLOps promotes agility by enabling rapid experimentation, iteration, and deployment of models. It facilitates quick feedback loops, allowing data scientists and engineers to adapt and improve models based on real-world performance and user feedback.

  4. Monitoring and Maintenance: MLOps ensures continuous monitoring of deployed models, tracking their performance and detecting anomalies. It enables proactive maintenance, including retraining models, updating dependencies, and addressing potential issues promptly.

  5. Collaboration: MLOps fosters collaboration among data scientists, engineers, and other stakeholders involved in the machine learning workflow. It establishes standardized practices, tools, and documentation, enabling efficient communication and knowledge sharing.


This project aims to implement the MLOps (Machine Learning Operations) lifecycle from scratch. The stages involved in the lifecycle include:

MLOps STAGES 🪜

Set Up the Project🐣

Set up your project environment and version control system for MLOps.

  1. Create a Python virtual environment to manage dependencies.
  2. Initialize Git and set up your GitHub repository for version control.
  3. Install DVC (Data Version Control) for efficient data versioning and storage.
  4. Install project dependencies using requirements.txt.
  5. Write utility scripts for logging, exception handling, and common helpers (see the sketch below).
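
A minimal sketch of such a logging helper is shown below; the module path and function name are illustrative, not the repo's actual layout:

```python
# A hypothetical shared logging utility (e.g. src/utils/logger.py).
import logging
import sys

def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes timestamped records to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on reuse
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(
            "[%(asctime)s] %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```
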
Exploratory Data Analysis📊

Perform EDA on your data to gain insights and understand statistical properties.

  1. Explore the data to understand its distribution and characteristics.
  2. Plot charts and graphs to visualize data patterns and relationships.
  3. Identify and handle outliers and missing data points (see the sketch below).
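
For example, a quick pass with pandas might look like this (toy data stands in for the real dataset, and the column names are placeholders):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=500),
    "amount": rng.exponential(100, size=500),
})
df.loc[::50, "amount"] = np.nan  # inject some missing values

print(df.describe())       # distribution summary per column
print(df.isna().sum())     # missing values per column

# Flag outliers with the 1.5*IQR rule on a numeric column.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers in 'amount'")
```
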
Data Pipeline🚧

Create a data ingestion pipeline for data preparation and versioning.

  1. Write a data ingestion pipeline to split data into train and test sets (see the sketch after this list).
  2. Store the processed data as artifacts for reproducibility.
  3. Implement data versioning using DVC for maintaining data integrity.
  4. Use the Faker library to generate synthetic data with noise for testing purposes.
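
A minimal sketch of the ingestion step, using Faker for the synthetic rows; file paths and columns are placeholders, not the repo's actual layout:

```python
# Generate noisy synthetic rows (step 4), then split and store artifacts.
import os
import random

import pandas as pd
from faker import Faker
from sklearn.model_selection import train_test_split

fake = Faker()
df = pd.DataFrame({
    "name": [fake.name() for _ in range(200)],
    "age": [random.randint(18, 80) for _ in range(200)],
})

train, test = train_test_split(df, test_size=0.2, random_state=42)
os.makedirs("artifacts", exist_ok=True)
train.to_csv("artifacts/train.csv", index=False)
test.to_csv("artifacts/test.csv", index=False)
# Version the split with DVC (step 3): dvc add artifacts/
```
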
Data Transformation🦾

Perform data transformation tasks to ensure data quality and consistency.

  1. Write a script for data transformation, including imputation and outlier detection.
  2. Handle class imbalances in the dataset.
  3. Implement One-Hot-Encoding and scaling for features (see the sketch below).
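
A sketch of that transformation with scikit-learn follows; the toy columns are placeholders, and class imbalance (step 2) would typically be handled separately, e.g. with class weights or imblearn's SMOTE:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with a missing value; real column names will differ.
X = pd.DataFrame({"age": [25, 40, None, 31],
                  "amount": [10.0, 250.0, 99.0, 42.0],
                  "segment": ["retail", "corp", "retail", "corp"]})

preprocess = ColumnTransformer([
    # Numeric columns: impute missing values, then standardize.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "amount"]),
    # Categorical columns: one-hot encode, tolerating unseen categories.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
])
print(preprocess.fit_transform(X))
```
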
Model Training🏋️

Train and tune multiple classification models and track experiments.

  1. Train and tune various classification models on the data.
  2. Use MLflow for experiment tracking and logging model metrics (see the sketch below).
  3. Log results in the form of JSON to track model performance.
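
A sketch of one tracked run is below; the stand-in dataset, model choice, and metric names are illustrative:

```python
import json

import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Stand-in data; the real pipeline trains on the transformed artifacts.
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="random-forest"):
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    metrics = {"accuracy": accuracy_score(y_test, preds),
               "f1": f1_score(y_test, preds)}
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metrics(metrics)

    # Also persist the results as JSON (step 3).
    with open("metrics.json", "w") as f:
        json.dump(metrics, f, indent=2)
```
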
Validation Pipeline✅

Create a Pydantic pipeline for data preprocessing and validation.

  1. Define a Pydantic data model to enforce data validation and types (see the sketch after this list).
  2. Implement a pipeline for data preprocessing and validation.
  3. Verify the range of values and data types for data integrity.
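
A minimal sketch of such a model; the field names and ranges are placeholders, not the project's actual schema:

```python
from pydantic import BaseModel, Field

class InputRecord(BaseModel):
    age: int = Field(..., ge=18, le=100)   # type and range enforced
    amount: float = Field(..., gt=0)       # must be strictly positive
    segment: str

# Valid payloads parse cleanly; out-of-range values raise a ValidationError.
record = InputRecord(age=34, amount=120.5, segment="retail")
print(record.model_dump())  # use .dict() if the project pins Pydantic v1
```
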
Create a FastAPI App⚡

Build a FastAPI application to make predictions using your trained models.

  1. Develop a FastAPI application to serve predictions (see the sketch below).
  2. Integrate the trained models with the FastAPI endpoint.
  3. Provide API documentation using Swagger UI.
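
A minimal serving sketch is shown below; the endpoint, schema, and hard-coded response are illustrative, since the real app would load and call the trained model:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="MLOps demo API")

class Features(BaseModel):
    age: int
    amount: float

@app.post("/predict")
def predict(features: Features):
    # The real endpoint would load the trained model (e.g. via joblib)
    # and call model.predict here; a fixed answer keeps the sketch small.
    return {"prediction": 0}
```

Running it with uvicorn (e.g. `uvicorn app:app --reload`, assuming the file is named app.py) serves the interactive Swagger UI at /docs, which covers step 3.
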
Test the API⚗️

Conduct thorough testing of your FastAPI application.

  1. Use Pytest to test different components of the API (see the sketch after this list).
  2. Test data types and handle missing input scenarios.
  3. Ensure the API responds correctly to various inputs.
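
For example, assuming the FastAPI app above is importable from app.py (adjust the import to the repo's actual module), the tests might look like:

```python
# test_api.py
from fastapi.testclient import TestClient

from app import app  # hypothetical module name

client = TestClient(app)

def test_predict_returns_prediction():
    resp = client.post("/predict", json={"age": 30, "amount": 50.0})
    assert resp.status_code == 200
    assert "prediction" in resp.json()

def test_missing_field_is_rejected():
    resp = client.post("/predict", json={"age": 30})  # 'amount' missing
    assert resp.status_code == 422  # FastAPI's validation error code
```
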
Containerization and Orchestration🚢

Prepare your application for deployment using containers and orchestration.

  1. Build a Docker image for your FastAPI application.
  2. Push the Docker image to Azure Container Registry (ACR).
  3. Test the application locally using Minikube.
  4. Deploy the Docker image from ACR to Azure Kubernetes Service (AKS) for production.
CI/CD🔁

Set up a Continuous Integration and Continuous Deployment pipeline for your application.

  1. Configure a CI/CD pipeline for automated builds and testing.
  2. Deploy the application on Azure using CI/CD pipelines.

Run It Yourself🏃‍♂️

  1. Clone the repository:
git clone https://github.com/karan842/mlops-best-practices.git
  2. Create a virtual environment:
python -m venv env
  3. Install dependencies:
pip install -r requirements.txt
  4. Run the FastAPI app:
python app.py
  5. Run the data and model pipeline (with MLflow enabled):
mlflow ui
dvc init
dvc repro
  6. Test the application:
pytest

Contribute to it🌱

To contribute to this project:

  • Fork the repository.
  • Clone your fork.
  • Make your changes.
  • Create a pull request.
  • You can also open an issue!

Machine Learning Tool Stack📚

(Infrastructure tooling diagram)

Acknowledgements📃:

  1. Machine Learning in Production (DeepLearning.AI) - Coursera
  2. MLOps communities from Discord, Twitter, and LinkedIn
  3. Kubernetes, MLflow, and Pytest official documentation
  4. Microsoft Learn
  5. ChatGPT and Bard

Connect with Me🤝:

Gmail | LinkedIn | Twitter
