
README.md


Molecular Classification of Cancer by Gene Expression Monitoring

Description

I started this classification analysis because I wanted to learn more about using genetic information to predict disease. The project is packaged in a Docker container so that the class of a new patient can be predicted without setting up a programming environment on a local computer. The model pipeline and its accompanying metric files are visualized in the repository's Data Pipeline view.

Although this model is trained on a relatively small dataset, the same concept could be applied to large datasets in which many new patients (or customers) must be categorized at once so that each receives the appropriate treatment (or service).

Dataset background

For those interested, here is some more information about the dataset used for the classification model. Although I don't have a medical background, I've long been fascinated by the human body, in particular by the way our genetic profile guides internal processes and by how those processes interact with our endocrine system. The dataset used here comes from a study performed in 1999 that aimed to classify cancers into subtypes, so that each cancer can be targeted more precisely and the patient can receive better treatment. The paper, which is a very interesting read, is listed at the bottom of this page. What follows is a summary of its introduction.

The challenge of cancer treatment has been to target specific therapies to pathogenetically distinct tumor types, to maximize efficacy and minimize toxicity. Tumors with similar histopathological appearance can follow significantly different clinical courses and respond differently to therapy. Acute leukemia is one such example. Its subtypes include acute lymphoblastic leukemia (ALL), which arises from lymphoid precursors, and acute myeloid leukemia (AML), which arises from myeloid precursors. These subtypes respond differently to the same chemotherapy. What is new in this study is that cancers are classified not on the basis of prior biological insight but through a more systematic approach that recognizes tumor subtypes from global gene expression analysis. The paper describes two approaches to this challenge: class discovery and class prediction. Class discovery refers to defining previously unrecognized tumor subtypes. Class prediction refers to assigning particular tumor samples to already-defined classes, which could reflect current states or future outcomes.

For this project I focus only on the class prediction challenge. The results of predicting the subtype of a cancer demonstrate the feasibility of cancer classification based solely on gene expression monitoring. For each gene in the dataset (about 7,100 genes in total), a quantitative expression level was measured and used as input for the model.

Consequently, the success of this approach suggests a general strategy for discovering and predicting classes in other types of cancer, independent of previous biological knowledge, and thus for providing better-targeted treatment for the patient.
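To make the idea of class prediction from expression levels concrete, here is a minimal sketch of a weighted-voting predictor in the spirit of the one described in the Golub et al. paper. It is not the model trained in this repository, whose details this README does not state. The function names are hypothetical, the three-gene expression values are invented purely for illustration, and using the sample standard deviation is an assumption. Each gene gets a weight (difference of class means divided by the sum of class standard deviations) and a decision boundary (midpoint of the class means); a new sample's genes then vote for one class or the other.

```python
from statistics import mean, stdev

CLASSES = ("ALL", "AML")


def fit_weighted_voting(samples, labels, classes=CLASSES):
    """Return one (weight, boundary) pair per gene from labeled training samples."""
    first, second = classes
    params = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == first]
        b = [s[g] for s, y in zip(samples, labels) if y == second]
        mu_a, mu_b = mean(a), mean(b)
        spread = stdev(a) + stdev(b)  # assumption: sample standard deviation
        weight = (mu_a - mu_b) / spread if spread else 0.0
        boundary = (mu_a + mu_b) / 2
        params.append((weight, boundary))
    return params


def predict_class(params, sample, classes=CLASSES):
    """Sum the per-gene votes; return the winning class and a strength in [0, 1]."""
    votes_first = votes_second = 0.0
    for (weight, boundary), x in zip(params, sample):
        vote = weight * (x - boundary)
        if vote > 0:
            votes_first += vote   # gene votes for the first class
        else:
            votes_second -= vote  # gene votes for the second class (or abstains at 0)
    total = votes_first + votes_second
    winner = classes[0] if votes_first >= votes_second else classes[1]
    strength = abs(votes_first - votes_second) / total if total else 0.0
    return winner, strength


# Toy training data: 3 invented genes, 3 patients per class (not real measurements).
train = [
    [9.0, 1.0, 5.0], [8.0, 2.0, 5.5], [10.0, 1.5, 4.5],  # ALL
    [2.0, 8.0, 5.0], [3.0, 9.0, 4.0], [1.0, 7.0, 6.0],   # AML
]
train_labels = ["ALL"] * 3 + ["AML"] * 3

params = fit_weighted_voting(train, train_labels)
print(predict_class(params, [8.5, 2.0, 5.0]))
print(predict_class(params, [2.5, 8.5, 5.0]))
```

A strength close to 1 means nearly all weighted votes agree, while a value near 0 signals an uncertain call. On real data one would typically vote with only a subset of the most informative genes rather than all of them.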

Getting Started

Prerequisites

Because this project returns its predictions through a Docker container, I assume that Docker Engine is installed and running.

Installation

To build the image, run the following on the command line:

docker build -t prediction-model -f Dockerfile .

Usage

To view the performance metric, run the following:

docker run prediction-model cat /reports/metrics/eval.json
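If the metric is needed inside a script rather than on the terminal, the same command can be wrapped in Python. The sketch below is an illustration, not part of the project: the helper name `read_metrics` is hypothetical, the `--rm` flag is added only so the temporary container is removed afterwards, and the code assumes that eval.json contains valid JSON.

```python
import json
import subprocess

IMAGE = "prediction-model"                   # image tag used in the build step
METRICS_PATH = "/reports/metrics/eval.json"  # path used in the command above


def read_metrics(run=subprocess.run):
    """Print the metrics file from a throwaway container and parse it as JSON.

    `run` is injectable so the function can be exercised without Docker.
    """
    result = run(
        ["docker", "run", "--rm", IMAGE, "cat", METRICS_PATH],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError if the command fails
    )
    return json.loads(result.stdout)
```

Calling `read_metrics()` on a machine where the image has been built returns the parsed content of the metrics file, which can then be logged or compared across runs.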

To make predictions, run:

docker run prediction-model python ./predict.py

Alternatively, use docker-compose.yml and run the following to view the predictions:

docker-compose build
docker-compose up

To stop the running container, use:

docker-compose stop

Authors and acknowledgment

Annalie Kruseman

Feel free to contact me with any questions at annaliakruseman@gmail.com.

Dataset downloaded from Kaggle's Gene Expression dataset.

Paper written by T. R. Golub et al.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science (1999).
