🧠 Brain Tumor Classification with MRI Scans

This repository contains a machine learning project for classifying brain tumors using MRI images. The model is trained to detect and categorize four types of brain conditions from axial brain scan images.

👤 Author

Daniel Egbo

🧩 Problem Statement

Brain tumors pose a serious health challenge, requiring timely and accurate diagnosis to improve treatment outcomes. Magnetic Resonance Imaging (MRI), especially T1-weighted contrast-enhanced scans, is widely used for brain tumor detection. However, manual interpretation of MRI scans is time-consuming and can be subject to inter-observer variability.

This project aims to automate the classification of brain tumors from MRI images using a machine learning model. The objective is to accurately categorize axial brain scans into one of four classes (glioma, meningioma, pituitary tumor, or no tumor), thereby assisting radiologists in diagnosis and reducing diagnostic delays.

🗂️ Dataset Description

The dataset used in this project is sourced from the Kaggle Brain Tumor MRI Dataset. It consists of T1-weighted contrast-enhanced MRI images captured in the axial plane. The images are grouped into four categories, each representing a distinct medical condition:

  • glioma: a type of tumor that arises from glial cells in the brain.
  • meningioma: typically a slow-growing tumor that forms on the meninges, the membranes covering the brain and spinal cord.
  • pituitary: tumors originating in the pituitary gland, located at the base of the brain.
  • notumor: MRI scans that show no evidence of tumor.

Each category is stored in a separate subdirectory, and the images are in JPEG format. The dataset is balanced and suitable for supervised image classification tasks.
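As a quick sanity check on the layout described above, the sketch below counts the JPEG files in each class subdirectory. The data/Training path is an assumption for illustration only; point it at wherever the fetched dataset lives in your checkout.

```python
from pathlib import Path

# Class folder names as listed in the dataset description.
CLASSES = ("glioma", "meningioma", "pituitary", "notumor")


def count_images(root):
    """Return {class_name: number of .jpg/.jpeg files} for each class folder under root."""
    root = Path(root)
    counts = {}
    for name in CLASSES:
        class_dir = root / name
        if not class_dir.is_dir():
            counts[name] = 0  # missing folder: report zero rather than fail
            continue
        counts[name] = sum(
            1
            for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
        )
    return counts


if __name__ == "__main__":
    # "data/Training" is a hypothetical location, not confirmed by this README.
    for name, n in count_images("data/Training").items():
        print(f"{name}: {n}")
```

Comparing the four counts is a simple way to confirm how balanced your local copy of the dataset is before training.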

✅ Requirements

This project is built using a modern MLOps stack and requires the following tools and libraries:

  • Python 3.10+: Core programming language for data preprocessing, model training, and orchestration
  • torch: Deep learning framework used to build and train the brain tumor classification model
  • torchvision: Utilities for image transformations and loading image datasets
  • scikit-learn: Metrics, evaluation tools, and utilities for model validation
  • MLflow: Experiment tracking, model logging, and registry management
  • prefect: Workflow orchestration to manage the ML pipeline as reproducible tasks and flows
  • Docker: Containerization of the training and inference environments
  • AWS ECR (Elastic Container Registry): Storage for Docker images used in deployment
  • AWS ECS (Elastic Container Service): Deployment of the trained model as a scalable web service

🚀 Getting Started

1. Clone the repository

git clone https://github.com/Danselem/brain_mri.git
cd brain_mri

The project uses a Makefile and Astral uv. See the Astral uv documentation for details of the package and how to install it.

2. Create and activate a virtual environment

To create and activate an environment:

make init

3. ⚙️ Install dependencies

make install

4. Fetch Data

make fetch-data

This fetches the data from Kaggle and stores it in the data directory. Ensure you have a Kaggle account and have set up your API key.
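Before fetching, it can help to confirm that the Kaggle API will find your credentials. The sketch below checks the two places the Kaggle client normally reads them from: the KAGGLE_USERNAME and KAGGLE_KEY environment variables, or a kaggle.json file in ~/.kaggle (or in KAGGLE_CONFIG_DIR if that is set). The function name is illustrative and is not part of this project.

```python
import os
from pathlib import Path


def kaggle_credentials_present(env=None, home=None):
    """Return True if Kaggle API credentials appear to be configured."""
    env = os.environ if env is None else env
    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    # Option 2: a kaggle.json token file in the config directory.
    config_dir = env.get("KAGGLE_CONFIG_DIR")
    base = Path(config_dir) if config_dir else Path(home or Path.home()) / ".kaggle"
    return (base / "kaggle.json").is_file()


if __name__ == "__main__":
    print("Kaggle credentials found:", kaggle_credentials_present())
```

This only checks that credentials exist, not that they are valid; an invalid key will still fail when the data is fetched.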

5. Set up MLflow server

There are two options to set up MLflow:

  1. Use AWS EC2 and S3. Ensure Terraform is installed on your machine and that your AWS credentials are configured with aws configure. Next, cd infra and follow the instructions there for a complete setup of the AWS resources, including EC2, RDS, S3, Kinesis, Lambda, etc.

  2. Use DagsHub. Sign up at DagsHub, obtain an API key, and create a project repo. After that, run the following command to create a .env file:

make env

Next, fill in the .env file with the required values.
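A small check like the one below can catch an incomplete .env file before the pipeline runs. It is only a sketch: the three keys are the standard MLflow tracking environment variables, which is an assumption here, since the keys written by make env are not listed in this README. Adjust REQUIRED_KEYS to match your file.

```python
# Standard MLflow environment variables; the keys this project's .env
# actually expects are an assumption.
REQUIRED_KEYS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [k for k in required if not values.get(k)]


if __name__ == "__main__":
    # Placeholder content for illustration, not real credentials.
    sample = "MLFLOW_TRACKING_URI=https://example.invalid/repo.mlflow\nMLFLOW_TRACKING_USERNAME=\n"
    print("Missing:", missing_keys(sample))
```

To check a real file, pass its contents, for example missing_keys(open(".env").read()).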

6. Start the orchestrator

This project uses Prefect for running the ML pipeline. To start the Prefect server, run the command:

make prefect

This will start a Prefect server running at http://127.0.0.1:4200.
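To confirm the server is reachable before running the pipeline, a minimal check along these lines may help. It assumes Prefect's default local address, http://127.0.0.1:4200, with the API served under /api and a health endpoint at /api/health; adjust the URL if your Makefile target starts the server elsewhere.

```python
import urllib.request


def prefect_is_up(api_url="http://127.0.0.1:4200/api", timeout=2.0):
    """Return True if GET <api_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{api_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False
```

Calling prefect_is_up() returns False while the server is still starting, so it can be polled in a short loop before make pipeline.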

7. Run the ML pipeline

To run the pipeline:

make pipeline

This loads the data, transforms it, and starts hyperparameter tuning. The Prefect modeling pipeline is shown in the image below.

[Image: Prefect modeling pipeline]

The pipeline also logs the ML experiments to DagsHub and registers the best model, as in the example below.

[Image: experiments logged in DagsHub]

All experiments run for this project can be accessed in DagsHub.

8. Fetch and serve the best model

make fetch-best-model

The above command fetches the registered model from the DagsHub MLflow server and saves it in the models directory. The model is then ready to be served.

9. Serve the model locally

To test the local deployment, run:

make serve_local

10. Build the Docker container

make build

11. Start and run the Docker container

make run

12. Push the container to AWS ECR

make ecr

This uses the ecr bash script to create the container image and push it to AWS ECR. A sample is shown below:

[Image: ECR]

13. Deploy the container to AWS ECS

make ecs

This uses the ecs bash script to deploy the container to AWS ECS. A sample is shown below:

[Image: ECS]

🧪 Testing with Pytest

To check your setup or run any unit tests you add:

make test

📊 Evaluation

  • Accuracy and loss plots
  • Confusion matrix

Performance metrics are saved in the MLflow server.
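For readers unfamiliar with the confusion matrix, the dependency-free sketch below shows what it contains for the four classes and how accuracy follows from it. This is an illustration only: the project lists scikit-learn, whose sklearn.metrics module provides equivalent functions, and the labels in the example are made up rather than taken from the project's results.

```python
CLASSES = ("glioma", "meningioma", "pituitary", "notumor")


def confusion_matrix(y_true, y_pred, labels=CLASSES):
    """Build a count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correct predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up labels purely for illustration, not project results.
    y_true = ["glioma", "glioma", "meningioma", "pituitary", "notumor", "notumor"]
    y_pred = ["glioma", "meningioma", "meningioma", "pituitary", "notumor", "glioma"]
    cm = confusion_matrix(y_true, y_pred)
    for label, row in zip(CLASSES, cm):
        print(f"{label:>10}: {row}")
    print("accuracy:", accuracy(cm))
```

Off-diagonal entries show which classes are confused with each other, for example glioma scans predicted as meningioma, which a single accuracy number hides.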

📚 References

  • Masoud Nickparvar, Brain Tumor MRI Dataset, Kaggle Dataset
  • Related works on medical image classification with deep learning

📜 License

This project is for educational and research purposes only. Please refer to the dataset's license on Kaggle for usage terms. This project is licensed under the MIT License.


🙋🏽‍♀️ Contact

Made with 💻 by Daniel Egbo. Feel free to reach out with questions, issues, or suggestions.
