Somatic Mutation Data File Format
The binary somatic mutation data file is loaded for use by the pyNBS algorithm with the load_binary_mutation_data function. The file can be provided in one of two formats: list or matrix.
The default format for the binary somatic mutation data file is the list format. This format is a 2-column csv or tsv list in which the 1st column is a sample/patient and the 2nd column is a gene mutated in that sample/patient. There are no headers in this file format. Loading data from the list format is typically faster than loading data from the matrix format. The following text is the list representation of the matrix excerpt shown below.
TCGA-04-1638 A2M
TCGA-23-1029 A1CF
TCGA-23-2647 A2BP1
TCGA-24-1847 A2M
TCGA-42-2589 A1CF
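As an illustration of the list layout, the sketch below reads the five example rows with pandas. It is not the pyNBS loader itself: the helper name `read_mutation_list` and the column labels `sample` and `gene` are assumptions made for this example, and the rows are assumed to be tab-separated.

```python
import io

import pandas as pd

# Inline copy of the five example rows, assumed tab-separated with no header.
LIST_TEXT = (
    "TCGA-04-1638\tA2M\n"
    "TCGA-23-1029\tA1CF\n"
    "TCGA-23-2647\tA2BP1\n"
    "TCGA-24-1847\tA2M\n"
    "TCGA-42-2589\tA1CF\n"
)


def read_mutation_list(source, delimiter="\t"):
    """Read a 2-column, header-less sample/gene list (hypothetical helper)."""
    return pd.read_csv(
        source,
        sep=delimiter,
        header=None,                 # the list format has no header row
        names=["sample", "gene"],    # labels chosen for this sketch only
        dtype=str,
        keep_default_na=False,       # keep symbols such as "NA" as plain text
    )


mutations = read_mutation_list(io.StringIO(LIST_TEXT))

# Group the pairs into one set of mutated genes per sample.
genes_by_sample = mutations.groupby("sample")["gene"].apply(set).to_dict()
```

For a csv file, pass `delimiter=","` and a file path instead of the in-memory buffer.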
The matrix format is a binary csv or tsv matrix in which rows represent samples/patients and columns represent genes. A value of 1 indicates that the gene is mutated in the sample and 0 indicates that it is not. The following table is a small excerpt of a matrix somatic mutation data file:
Sample/Patient | A1CF | A2BP1 | A2M |
---|---|---|---|
TCGA-04-1638 | 0 | 0 | 1 |
TCGA-23-1029 | 1 | 0 | 0 |
TCGA-23-2647 | 0 | 1 | 0 |
TCGA-24-1847 | 0 | 0 | 1 |
TCGA-42-2589 | 1 | 0 | 0 |
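The relationship between the two formats can be sketched with pandas. This is an illustration only, not the conversion pyNBS performs internally: the function name `list_to_binary_matrix` and the column labels `sample` and `gene` are assumptions made for this example.

```python
import pandas as pd

# The five sample/gene pairs from the list-format example.
pairs = [
    ("TCGA-04-1638", "A2M"),
    ("TCGA-23-1029", "A1CF"),
    ("TCGA-23-2647", "A2BP1"),
    ("TCGA-24-1847", "A2M"),
    ("TCGA-42-2589", "A1CF"),
]
mutation_list = pd.DataFrame(pairs, columns=["sample", "gene"])


def list_to_binary_matrix(df):
    """Pivot sample/gene pairs into a samples-by-genes 0/1 matrix."""
    counts = pd.crosstab(df["sample"], df["gene"])
    # A pair listed more than once still counts as a single mutation flag.
    return (counts > 0).astype(int)


binary_matrix = list_to_binary_matrix(mutation_list)

# Going back: keep only the (sample, gene) cells that are set to 1.
stacked = binary_matrix.stack()
recovered_pairs = sorted(stacked[stacked == 1].index.tolist())
```

Here `binary_matrix` has one row per sample and one column per gene, matching the table excerpt, and `recovered_pairs` contains the same five pairs as the list. The list form stores only the mutated pairs, which is one plausible reason it loads faster for sparse mutation data.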
- If the user has a TCGA MAF file downloaded from the Broad Institute's Firehose, the process_TCGA_MAF function can be used to construct a binary somatic mutation file that is usable by the pyNBS package.
- All somatic mutation data used in our examples can be found here.