Things to showcase:
- Multimodal is better than uni-modal
- Fine-tuning a pre-trained model is better than training from scratch
- Better embeddings yield better results
TASK: Predicting survival of cancer patients using multimodal data from TCGA
The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that sequenced and molecularly characterized over 11,000 primary cancer cases. Its goal is to improve our ability to diagnose, treat, and prevent cancer through an understanding of the molecular basis of the disease. TCGA data is organized into six data types: clinical, biospecimen, genomic, epigenomic, transcriptomic, and proteomic. The data is available through the Genomic Data Commons (GDC) Data Portal and the Imaging Data Commons (IDC) Data Portal.
In this project, we use multimodal data from TCGA to predict the survival of cancer patients. We use the following data types:
- Clinical data
- Pathology whole slide images (WSI)
- Radiology images
We use pretrained foundation models to generate embeddings for each modality. To evaluate the quality of the generated embeddings, we train a simple neural network on them for a downstream classification or regression task (a sketch appears below). The best-performing foundation model is then used as the embedding generator for the multimodal transformer model.
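The evaluation probe can be as simple as a small MLP trained on frozen embeddings. Below is a minimal sketch, assuming the embeddings are already computed and loaded as tensors; the dimensions, epoch count, and the binary survival label are illustrative placeholders.

```python
# Minimal embedding probe: a small MLP trained on frozen embeddings.
# All shapes and the random placeholder data below are illustrative.
import torch
import torch.nn as nn

class EmbeddingProbe(nn.Module):
    """Maps a frozen embedding vector to a survival logit."""
    def __init__(self, embed_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Placeholder data: 512-dim embeddings for 1000 patients, binary survival labels.
embeddings = torch.randn(1000, 512)
labels = torch.randint(0, 2, (1000, 1)).float()

probe = EmbeddingProbe(embed_dim=512)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(probe(embeddings), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```

Embeddings from different foundation models can then be compared by the probe's held-out performance, keeping the probe architecture fixed.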
To generate embeddings for the clinical data, we compare the performance of the following foundation models trained on clinical data:
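The candidate models are not listed above. Purely as an illustration of the embedding step, the sketch below mean-pools the token representations of a generic pretrained text encoder from Hugging Face; the model name is a placeholder, not one of the candidates.

```python
# Illustrative clinical-text embedding via mask-aware mean pooling.
# "bert-base-uncased" is a placeholder; swap in a clinical model.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
encoder.eval()

notes = ["Stage II adenocarcinoma, status post resection."]  # example record
inputs = tokenizer(notes, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (batch, seq, dim)

# Mean-pool over non-padding tokens -> one embedding per record.
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (batch, dim)
```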
To generate embeddings for the pathology WSIs and radiology images, we use the REMEDIS model.
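A minimal sketch of the image-embedding step follows, assuming the REMEDIS checkpoint is distributed as a directly callable TensorFlow SavedModel; the local path, input resolution, and preprocessing are assumptions to verify against the released checkpoints. WSIs would first be tiled into fixed-size patches before being fed to the model.

```python
# Sketch of extracting image embeddings from a REMEDIS checkpoint,
# assuming a directly callable TensorFlow SavedModel. The checkpoint
# path and the 224x224 input size are assumptions, not confirmed values.
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("checkpoints/remedis-path")  # hypothetical path

# Placeholder batch: 4 RGB patches, pixel values scaled to [0, 1].
patches = np.random.rand(4, 224, 224, 3).astype(np.float32)
embeddings = model(tf.constant(patches))
print(embeddings.shape)  # one embedding vector per patch
```

Per-patch embeddings from a WSI can then be aggregated (e.g., by mean pooling or an attention layer) into a single slide-level embedding before entering the multimodal transformer.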