
Molecular Classification of Cancer by Gene Expression Monitoring

Description

I built this classification project to learn more about using genetic information to predict disease. The project is packaged in a Docker container so that the class of a new patient can be predicted without setting up a programming environment on a local machine. The model pipeline and its accompanying metric files are visualized in the Data Pipeline diagram above.

Although the model is trained on a relatively small dataset, the same concept could be applied to large datasets in which many new patients (or customers) have to be assigned to separate classes at once, so that each receives the appropriate treatment (or service).

Dataset background

For those interested, here is some more information about the dataset used for the classification model. Although I don't have a medical background, I have long been fascinated by the human body, in particular by how our genetic profile guides internal processes and how those processes interact with our endocrine system. The dataset comes from a study performed in 1999 that aimed to classify cancers into subtypes in order to target the cancer more precisely and provide better treatment to the patient. The paper, which is a very interesting read, is listed at the bottom of this page. What follows is a summary of its introduction.

The challenge of cancer treatment has been to target specific therapies to pathogenetically distinct tumor types, to maximize efficacy and minimize toxicity. Tumors with similar histopathological appearance can follow significantly different clinical courses and show different responses to therapy. Acute leukemia is one such example. Its subtypes include acute lymphoblastic leukemia (ALL), which arises from lymphoid precursors, and acute myeloid leukemia (AML), which arises from myeloid precursors. These subtypes respond differently to the same chemotherapy. What is new in this study is that cancers are classified not from prior biological insight, but through a more systematic approach that recognizes tumor subtypes based on global gene expression analysis. The paper describes two approaches to this challenge: class discovery and class prediction. Class discovery refers to defining previously unrecognized tumor subtypes. Class prediction refers to the assignment of particular tumor samples to already-defined classes, which could reflect current states or future outcomes.

For this project I focus only on the class prediction challenge. The results of predicting the subtype of a cancer demonstrate the feasibility of cancer classification based solely on gene expression monitoring. For each gene in the dataset (roughly 7,100 genes in total), a quantitative expression level was measured and used as input for the model.

Consequently, the success of this approach suggests a general strategy for discovering and predicting classes for other types of cancer, independent of previous biological knowledge, and thus for providing better-targeted therapy for the patient.
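To make the class prediction idea concrete, here is a minimal sketch in plain Python. It is not the model used in this project: it swaps in a simple nearest-centroid classifier, and the data is synthetic (20 made-up "genes", of which the first 5 differ between the two classes). The function names `fit_centroids`, `predict` and `make_patient` are hypothetical and exist only for this illustration.

```python
import random
from statistics import mean


def fit_centroids(samples, labels):
    """Return {label: per-gene mean expression} computed from the training samples."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        # zip(*rows) iterates over genes (columns) instead of patients (rows).
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign the sample to the class whose centroid is nearest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((x - c) ** 2 for x, c in zip(sample, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


def make_patient(label, rng, n_genes=20, n_informative=5):
    """Synthetic expression profile: only the first few genes depend on the class."""
    shift = 2.0 if label == "ALL" else -2.0
    return [rng.gauss(shift if g < n_informative else 0.0, 1.0) for g in range(n_genes)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train_labels = ["ALL"] * 10 + ["AML"] * 10
train_samples = [make_patient(label, rng) for label in train_labels]

centroids = fit_centroids(train_samples, train_labels)
new_patient = make_patient("AML", rng)
print("Predicted class:", predict(centroids, new_patient))
```

With real data, each sample would hold one measured expression level per gene, and the labels would be the known ALL/AML diagnoses of the training patients.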

Getting Started

Prerequisites

Because this project returns its predictions through a Docker container, I assume that Docker Engine is installed and running.

Installation

To build the image, run the following on the command line:

docker build -t prediction-model -f Dockerfile .

Usage

To view the performance metrics, run the following:

docker run prediction-model cat /reports/metrics/eval.json
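If the metrics are needed inside a script rather than on the terminal, the JSON printed by the command above can be parsed in Python. The following is a minimal sketch that assumes eval.json holds a flat JSON object mapping metric names to numbers. The helper `format_metrics` and the keys in the sample are hypothetical; the real file may use different names.

```python
import json


def format_metrics(raw_json):
    """Parse a flat JSON object of metrics and return one 'name: value' line per entry."""
    metrics = json.loads(raw_json)
    return [f"{name}: {value:.3f}" for name, value in sorted(metrics.items())]


# Hypothetical sample; the actual keys and values in eval.json may differ.
sample = '{"accuracy": 0.9, "f1": 0.875}'
for line in format_metrics(sample):
    print(line)
```

In practice, the `raw_json` string could be the captured standard output of the docker run command shown above, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.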

To make predictions run:

docker run prediction-model python ./predict.py

Alternatively, use docker-compose.yml and run the following to view the predictions:

docker-compose build
docker-compose up

To stop the container from running use:

docker-compose stop

Authors and acknowledgment

Annalie Kruseman

Feel free to contact me with any questions at annaliakruseman@gmail.com.

Dataset downloaded from Kaggle's Gene Expression dataset.

Paper: T. R. Golub et al., "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring" (1999).
