CHEST DISEASE CLASSIFICATION USING CT-SCAN
Overview
This project aims to develop an AI model capable of classifying and diagnosing chest cancer, with a specific focus on adenocarcinoma, the most prevalent form of lung cancer. Leveraging deep learning techniques, particularly Convolutional Neural Networks (CNNs), the model utilizes the pretrained VGG-16 architecture to analyze medical images for cancer detection. The primary objective is to assist healthcare professionals in achieving early and accurate diagnoses, ultimately leading to improved patient outcomes.
Model Architecture
The VGG-16 model is a deep convolutional neural network renowned for its effectiveness in image classification tasks. Pretrained on large-scale image datasets such as ImageNet, VGG-16 possesses a deep network architecture consisting of 16 layers, including convolutional layers with small 3x3 filters and max-pooling layers. By fine-tuning the pretrained VGG-16 model on chest cancer images, we aim to capitalize on its robust feature extraction capabilities for accurate cancer classification.
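A fine-tuning setup along these lines could look like the following sketch, assuming TensorFlow/Keras (the input size, class count, and frozen-layer choice are illustrative assumptions, not the project's exact code):

```python
# Hedged sketch: fine-tune a pretrained VGG-16 base for chest CT classification.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

def build_model(num_classes=4, weights="imagenet"):
    # Load the VGG-16 convolutional base (stacked 3x3 conv + max-pooling blocks).
    base = VGG16(include_top=False, weights=weights, input_shape=(224, 224, 3))
    # Freeze the pretrained layers so only the new classification head trains.
    for layer in base.layers:
        layer.trainable = False
    x = Flatten()(base.output)
    out = Dense(num_classes, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The frozen base keeps ImageNet's generic feature extractors intact while the small dense head adapts to the CT-scan classes.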
Usage
Healthcare professionals can utilize the developed AI model as a supplementary tool in chest cancer diagnosis. By inputting medical images into the pretrained VGG-16 model, clinicians can receive automated predictions regarding the presence of adenocarcinoma, facilitating timely intervention and treatment planning.
Benefits
- Early Detection: The AI model enables early detection of chest cancer, particularly adenocarcinoma, which is crucial for improving patient prognosis.
- Accuracy: By leveraging deep learning techniques and pretrained models like VGG-16, the model achieves high levels of accuracy in cancer classification.
- Efficiency: Automated classification of medical images streamlines the diagnostic process, allowing healthcare professionals to focus on patient care and treatment decisions.
MLOps Implemented
- MLflow: Experiment Tracking
- DVC (Data Version Control): Pipeline Tracking
Deployment
Using Jenkins with AWS EC2 and ECR
Project Implementation
- Create a local Git repository and connect it to GitHub
- Then create the README.md, .gitignore, and LICENSE files
- Then create the template.py file
- Create and activate the virtual environment
- We write the constants in a YAML file instead of hard-coding them as Python constants
- Edit the requirements.txt and setup.py files
- python-box is also used to handle exceptions (e.g., missing configuration keys)
- Here, custom logging is written in the constructor file (__init__.py) of src/cnnclassifier. This way there is no need for a separate logger folder (first approach)
- Alternatively, create a logging folder, add a constructor file inside it, and write the logging code there (second approach)
- Create a common.py file inside utils → code
- ConfigBox: makes configuration entries easily callable as attributes (refer trails.ipynb)
- ensure_annotations: enforces type annotations so functions stay bug-free (refer trails.ipynb)
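To illustrate what ConfigBox buys you, here is a minimal stdlib stand-in (DotDict is a hypothetical class written for this sketch, not the real python-box API):

```python
# Illustrates the convenience ConfigBox provides: attribute access over nested
# dicts, so config.data_ingestion.root_dir works instead of
# config["data_ingestion"]["root_dir"].
class DotDict(dict):
    """Minimal sketch of ConfigBox-style access (not the real library)."""
    def __getattr__(self, key):
        value = self[key]
        # Wrap nested dicts so chained attribute access keeps working.
        return DotDict(value) if isinstance(value, dict) else value

config = DotDict({"data_ingestion": {"root_dir": "artifacts/data_ingestion"}})
print(config.data_ingestion.root_dir)  # artifacts/data_ingestion
```

In the project, read_yaml in utils/common.py returns a ConfigBox for exactly this reason.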
Workflows
- Update the config.yaml
- Update params.yaml
- Update the entity
- Update the configuration manager in src config
- Update the components
- Update the pipeline
- Update the main.py
- Update the dvc.yaml
MLflow → Experiment Tracking
Example: ElasticNet with parameters (alpha, l1_ratio)
- exp 1: alpha=0.7, l1_ratio=0.9 → accuracy 70%
- exp 2: alpha=0.5, l1_ratio=0.5 → accuracy 80%
- exp 3: alpha=0.4, l1_ratio=0.6 → accuracy 50%
Without experiment tracking, Exp 1, Exp 2, and Exp 3 would have to be recorded manually in a CSV file.
DagsHub
- Create a repo in GitHub
- Go to DagsHub → Create a new repository
- Connect to repo → choose GitHub → select and connect the repository
Copy the experiment tracking setup to the README file in GitHub:

```bash
MLFLOW_TRACKING_URI=https://dagshub.com/-------/mlflow_demo.mlflow \
MLFLOW_TRACKING_USERNAME=------ \
MLFLOW_TRACKING_PASSWORD=------3a547cee2bfa163992db880d6b571b70 \
python script.py
```

Or export the variables first:

```bash
export MLFLOW_TRACKING_URI=https://dagshub.com/-----/mlflow_demo.mlflow
export MLFLOW_TRACKING_USERNAME=--------
export MLFLOW_TRACKING_PASSWORD=------63a547cee2bfa163992db880d6b571b70
```
How do we select the best model among many experiments?
- Select all the experiments
- Click Compare
- MLflow shows a parallel coordinates plot
Criteria for selecting the best model:
- R² score: should be high
- Mean Absolute Error (MAE): should be low
- Root Mean Squared Error (RMSE): should be low
Thus, instead of performing hyperparameter tuning on the DL model (a costly and time-consuming task), we can simply compare runs in MLflow via DagsHub and pick the best model.
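The selection logic described above can be sketched in a few lines (run names and metric values are hypothetical, echoing the ElasticNet example):

```python
# Pick the best run by high R², breaking ties with low MAE and RMSE.
runs = {
    "exp_1": {"r2": 0.70, "mae": 0.31, "rmse": 0.45},
    "exp_2": {"r2": 0.80, "mae": 0.22, "rmse": 0.36},
    "exp_3": {"r2": 0.50, "mae": 0.40, "rmse": 0.58},
}

def best_run(runs):
    # Higher R² is better; lower MAE/RMSE is better, hence the negation.
    return max(runs, key=lambda name: (runs[name]["r2"],
                                       -runs[name]["mae"],
                                       -runs[name]["rmse"]))

print(best_run(runs))  # exp_2
```

MLflow's Compare view does this visually with the parallel coordinates plot; this is the same comparison expressed in code.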
Components
Data Ingestion
- Upload the data to Google Drive and download it using gdown
- Now in config → config.yaml: add the data-ingestion settings
- Create a data ingestion notebook in the research folder
- Now update the entity (the return type of the configuration function) → create config_entity.py
- Update the config → configuration.py
- Now the component → create a file: data_ingestion.py
- Now update the pipeline
- Update the endpoint → main.py
- Update the utils → common.py
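The data-ingestion component boils down to two steps, sketched below with stdlib helpers (the project uses gdown against a Google Drive ID; the URL and paths here are placeholders):

```python
# Hedged sketch of data ingestion: download a zip archive and unpack it into
# the artifacts directory.
import os
import urllib.request
import zipfile

def download_file(url: str, dest: str) -> str:
    # Create the parent directory if needed, then fetch the file.
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest

def extract_zip(zip_path: str, unzip_dir: str) -> None:
    # Unpack every member of the archive into the target directory.
    os.makedirs(unzip_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(unzip_dir)
```

In the actual component these two functions correspond to the download_file and extract_zip_file methods typical of a DataIngestion class, driven by paths from config.yaml.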
Create a base model
- Update the config.yaml file: prepare_base_model
- Update params.yaml
- Update the preparebasemodel.ipynb
- Entity → config_entity.py
- Then update the config → configuration manager
- Components → create a file prepare_base_model.py
- Update the pipeline
- Then the endpoint: main.py
Model Trainer
- Create a file modeltrainer.ipynb and update it
- Entity → configentity.py
- Config → configuration.py
- Components → create a file: modeltrainer.py
- Update the pipeline
- Then the endpoint: main.py
Model Evaluation using MLflow
- Config is not required here
- Create an ipynb file model evaluation with MLflow
- Connect the repo to DagsHub
- Then MLflow tracking in bash
In a Jupyter notebook:

```python
import os

os.environ["MLFLOW_TRACKING_URI"] = "https://dagshub.com/------/Chest-Disease-Classification-using-CT-Scan-Image.mlflow"
os.environ["MLFLOW_TRACKING_USERNAME"] = "------"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "------63a547cee2bfa163992db880d6b571b70"
```
- Update the entity
- Config → configuration
- Components → model evaluation
- Pipeline
- End point → main.py
- Dvc.yaml → Update the file
- Now execute the commands
- Execute dvc init (a .dvc folder is created)
- Execute dvc repro
- The dvc.lock file saves all the pipeline metadata
- If you execute dvc repro again with nothing changed, it shows:

```
Stage 'data_ingestion' didn't change, skipping
Stage 'prepare_base_model' didn't change, skipping
Stage 'training' didn't change, skipping
Stage 'evaluation' didn't change, skipping
Data and pipelines are up to date.
```
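For reference, a dvc.yaml covering the four stages reported above might look like this (the stage scripts, deps, and output paths are illustrative assumptions, not the project's exact files):

```yaml
stages:
  data_ingestion:
    cmd: python src/cnnclassifier/pipeline/stage_01_data_ingestion.py
    deps:
      - config/config.yaml
    outs:
      - artifacts/data_ingestion
  prepare_base_model:
    cmd: python src/cnnclassifier/pipeline/stage_02_prepare_base_model.py
    params:
      - IMAGE_SIZE
    outs:
      - artifacts/prepare_base_model
  training:
    cmd: python src/cnnclassifier/pipeline/stage_03_model_trainer.py
    deps:
      - artifacts/data_ingestion
      - artifacts/prepare_base_model
    outs:
      - artifacts/training/model.h5
  evaluation:
    cmd: python src/cnnclassifier/pipeline/stage_04_model_evaluation.py
    deps:
      - artifacts/training/model.h5
    metrics:
      - scores.json:
          cache: false
```

DVC hashes each stage's deps and outs into dvc.lock, which is why unchanged stages are skipped on repro.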
Prediction pipeline
- Add the prediction.py file
- Create a model folder and copy the trained model into it
- Push the changes to GitHub
User App
- Create the index.html
- And app.py using Flask
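A minimal sketch of app.py follows; the route name, port, and placeholder inference are assumptions, since the real app would load the trained model from the model folder and classify the uploaded CT image:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Placeholder inference: the real app.py would decode the uploaded image
    # and run it through the trained VGG-16 model.
    image = request.files.get("image")
    if image is None:
        return jsonify({"error": "no image uploaded"}), 400
    return jsonify({"prediction": "Adenocarcinoma"})  # placeholder label

# app.run(host="0.0.0.0", port=8080)  # uncomment to serve the app
```

index.html would post the selected scan to /predict and render the returned label.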
Deployment
- Create a Dockerfile
- .dockerignore
- Docker-compose.yml
- Create a .jenkins folder → then inside that create a Jenkinsfile
- Scripts → ec2_instance.sh, Jenkins.sh files created
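The Dockerfile for the Flask app could look like this minimal sketch (base image, port, and entrypoint are assumptions):

```dockerfile
FROM python:3.8-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8080
CMD ["python", "app.py"]
```

Copying requirements.txt before the rest of the source lets Docker cache the dependency layer across code-only rebuilds.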
AWS
AWS login
- Create IAM user:
- IAM → Users → Create user → User name: chest-user → Permission policy: AdministratorAccess → Create user
Setup the security credential
Security credential → create access keys → command line interface → permission → create access keys: download as CSV
Now we need to launch a Jenkins Server: here EC2
- EC2 → Launch instance
- Name: jenkin-machine → Ubuntu → Amazon Machine Image → Instance type: → Create key pair → Launch instance
- Take the instance created → connect
Now we need to setup the Jenkins
```bash
#!/bin/bash
sudo apt update
sudo apt install openjdk-8-jdk -y
wget -q -O - https://pkg.jenkins.io/debian-stable/jenkins.io.key | sudo apt-key add -
sudo sh -c 'echo deb https://pkg.jenkins.io/debian-stable binary/ > /etc/apt/sources.list.d/jenkins.list'
sudo apt-get update
sudo apt-get install jenkins -y
sudo systemctl start jenkins
sudo systemctl enable jenkins
sudo systemctl status jenkins

## Installing Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
sudo usermod -aG docker jenkins
newgrp docker
sudo apt install awscli -y
sudo usermod -a -G docker jenkins

## AWS configuration & restart Jenkins
aws configure
sudo systemctl restart jenkins

## Now set up an Elastic IP on AWS

## To get the admin password for Jenkins
sudo cat /var/lib/jenkins/secrets/initialAdminPassword
```
Setting the Elastic IP
- Allocate Elastic IP address → keep it as default → allocate
- Associate this elastic IP address → instance → select the instance → Associate
- Select the instance → security → security groups → edit inbound rules → add rules → 8080 & 0.0.0.0/0
- Copy the public IP and load in a new page → Jenkins will be running → administrator password: paste → continue
- For administrator password: execute sudo cat /var/lib/jenkins/secrets/initialAdminPassword in EC2 terminal
- Install suggested plugins → automatically installing required items → create First admin user
- Username: , password: , full name: , email: → save and finish → obtain the URL for Jenkins → Log into Jenkins Server
Now we have to set the secret variable in Jenkins server:
- Manage Jenkins → credentials → system → global credentials → add credential → secret text → ECR_REPOSITORY:
Create an ECR repo in AWS
- AWS → ECR → create repo → private → name → create
- Copy the URI → paste in the Jenkins server: ECR_REPOSITORY
Next secret variable:
- Secret text → Global → AWS_ACCOUNT_ID: copy the ID from AWS account
- Secret text → Global → AWS_ACCESS_KEY_ID: copy from the downloaded CSV
- Secret text → Global → AWS_SECRET_ACCESS_KEY: copy from the downloaded CSV
- SSH Username with private key → Global → ssh_key → Enter directly → add → copy the PEM file
Dashboard → Manage Jenkins → Plugins → Available plugins → SSH agents → install → install and restart → Again log into Jenkins server → verify whether the credentials are added
Now create a pipeline:
- New item → Pipeline name → Pipeline → okay → Pipeline script from SCM → SCM: Git → paste the repo URL → branch: main → path of Jenkinsfile → save
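The Jenkinsfile referenced in the pipeline configuration might be shaped like this sketch (stage names, the docker/ssh commands, and the IP placeholder are assumptions; the credential IDs match the secret variables created above):

```groovy
pipeline {
    agent any
    environment {
        // IDs match the Jenkins credentials created earlier.
        AWS_ACCOUNT_ID        = credentials('AWS_ACCOUNT_ID')
        ECR_REPOSITORY        = credentials('ECR_REPOSITORY')
        AWS_ACCESS_KEY_ID     = credentials('AWS_ACCESS_KEY_ID')
        AWS_SECRET_ACCESS_KEY = credentials('AWS_SECRET_ACCESS_KEY')
    }
    stages {
        stage('Build image') {
            steps {
                sh 'docker build -t $ECR_REPOSITORY:latest .'
            }
        }
        stage('Push to ECR') {
            steps {
                sh 'aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_REPOSITORY'
                sh 'docker push $ECR_REPOSITORY:latest'
            }
        }
        stage('Deploy on EC2-2') {
            steps {
                sshagent(['ssh_key']) {
                    // <EC2-2-ELASTIC-IP> is the Elastic IP associated below.
                    sh 'ssh -o StrictHostKeyChecking=no ubuntu@<EC2-2-ELASTIC-IP> "docker pull $ECR_REPOSITORY:latest && docker compose up -d"'
                }
            }
        }
    }
}
```

The sshagent step is why the SSH Agent plugin is installed and the PEM key stored as the ssh_key credential.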
Now we have to create another EC2 instance for the application:
- EC2 → Instances → Launch instance → Name: → Ubuntu → t2.large → use the key pair from the EC2-1 instance → 32 GB storage → Launch instance
Instance → Connect
```bash
#!/bin/bash
sudo apt update
sudo apt-get update
sudo apt upgrade -y
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
newgrp docker
sudo apt install awscli -y

## AWS configuration
aws configure

## Now set up an Elastic IP on AWS
```
Run it on the EC2-2 instance terminal
Now create an Elastic IP for the EC2-2 instance:
- Allocate Elastic IP address → allocate → Associate Elastic IP address → Associate for the EC2-2 instance
Now open the Jenkinsfile → update the public IP used with the ssh_key to the EC2-2 instance's Elastic IP
Now create a folder .github → create another folder inside → workflows → create a file: main.yaml → copy the code
Now in GitHub → settings → secrets and variables → actions → create new repository secret:
- URL: Jenkins URL
- USER: Jenkins username
- TOKEN: Jenkins dashboard → profile → configure → API token → add new Token → Generate → copy the Token
- JOB: job created in Jenkins: pipeline
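The .github/workflows/main.yaml wiring these secrets together could be as simple as this sketch (the workflow name and curl-based trigger are assumptions about the copied code):

```yaml
name: Trigger Jenkins Job
on:
  workflow_dispatch:
jobs:
  trigger:
    runs-on: ubuntu-latest
    steps:
      - name: Call the Jenkins build endpoint
        run: |
          curl -f -X POST "${{ secrets.URL }}/job/${{ secrets.JOB }}/build" \
            --user "${{ secrets.USER }}:${{ secrets.TOKEN }}"
```

workflow_dispatch is what makes the "Run workflow" button appear under the Actions tab.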
Now we can push the code to GitHub. The GitHub repo will trigger the Jenkins server. To trigger the pipeline/workflow manually:
- GitHub → Actions → Trigger Jenkins Job → Run workflow → run workflow → the workflow starts and triggers the Jenkins server → the build starts on the Jenkins server → Jenkins builds the image, pushes it to ECR, and runs the application