Somatic Mutation Data File Format

The binary somatic mutation data file is loaded for use by the pyNBS algorithm with the load_binary_mutation_data function. The file can be provided in one of two formats:

List Format

The default format for the binary somatic mutation data file is the list format: a 2-column CSV or TSV file with no header row, where the first column is a sample/patient and the second column is a gene mutated in that sample/patient. Loading data from the list format is typically faster than loading data from the matrix format. The following text is the list representation of the matrix excerpt shown under Matrix Format:

TCGA-04-1638	A2M
TCGA-23-1029	A1CF
TCGA-23-2647	A2BP1
TCGA-24-1847	A2M
TCGA-42-2589	A1CF
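To make the layout concrete, the following is a minimal sketch of how a header-less, 2-column list file could be parsed with the Python standard library. It is not the pyNBS loader: the function name read_mutation_list and its delimiter parameter are hypothetical names chosen for illustration, and the input is the tab-separated example above.

```python
import csv
import io
from collections import defaultdict


def read_mutation_list(handle, delimiter="\t"):
    """Parse header-less (sample, gene) rows into {sample: set_of_genes}.

    Hypothetical helper for illustration only; not part of pyNBS.
    """
    mutations = defaultdict(set)
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        sample, gene = row[0].strip(), row[1].strip()
        mutations[sample].add(gene)
    return dict(mutations)


# The list-format example from this page, as tab-separated text.
LIST_TEXT = (
    "TCGA-04-1638\tA2M\n"
    "TCGA-23-1029\tA1CF\n"
    "TCGA-23-2647\tA2BP1\n"
    "TCGA-24-1847\tA2M\n"
    "TCGA-42-2589\tA1CF\n"
)

mutations = read_mutation_list(io.StringIO(LIST_TEXT))
for sample, genes in sorted(mutations.items()):
    print(sample, sorted(genes))
```

For a comma-separated file, the same sketch would be called with delimiter=",".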

Matrix Format

The matrix format is a binary CSV or TSV matrix in which rows represent samples/patients and columns represent genes. An entry of 1 marks a gene as mutated in that sample/patient, and 0 marks it as not mutated. The following table is a small excerpt of a matrix somatic mutation data file:

	A1CF	A2BP1	A2M
TCGA-04-1638	0	0	1
TCGA-23-1029	1	0	0
TCGA-23-2647	0	1	0
TCGA-24-1847	0	0	1
TCGA-42-2589	1	0	0
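The relationship between the two formats can be sketched as a conversion from the matrix to (sample, gene) pairs. This is an illustration using only the Python standard library, not pyNBS code: the function name matrix_to_pairs is hypothetical, and the sketch assumes the first row holds gene names and the first column holds sample identifiers, as in the excerpt above.

```python
import csv
import io


def matrix_to_pairs(handle, delimiter="\t"):
    """Convert a binary sample-by-gene matrix into (sample, gene) pairs.

    Hypothetical helper for illustration only; not part of pyNBS.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    genes = header[1:]  # the first header cell is the empty corner cell
    pairs = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        sample, values = row[0], row[1:]
        for gene, value in zip(genes, values):
            if int(value) == 1:  # 1 marks the gene as mutated
                pairs.append((sample, gene))
    return pairs


# The matrix excerpt from this page, as tab-separated text.
MATRIX_TEXT = (
    "\tA1CF\tA2BP1\tA2M\n"
    "TCGA-04-1638\t0\t0\t1\n"
    "TCGA-23-1029\t1\t0\t0\n"
    "TCGA-23-2647\t0\t1\t0\n"
    "TCGA-24-1847\t0\t0\t1\n"
    "TCGA-42-2589\t1\t0\t0\n"
)

pairs = matrix_to_pairs(io.StringIO(MATRIX_TEXT))
for sample, gene in pairs:
    print(f"{sample}\t{gene}")
```

Each printed line is one row of the equivalent list-format file, so the five pairs produced here correspond to the five rows of the List Format example.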

Note

If the user has a TCGA MAF file downloaded from the Broad Institute's Firehose, the process_TCGA_MAF function can be used to construct a binary somatic mutation file that is usable by the pyNBS package.
All somatic mutation data used in our examples can be found here.
