All genome coordinates are stored as 0-based half-open intervals (i.e. positions are counted starting with 0, and the second coordinate points at the position right after the last position of an interval).
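As a small illustration of this convention, the sketch below computes interval lengths and overlaps. The helper names `interval_length` and `overlap_length` are hypothetical and not part of this repository. With half-open intervals, the length is simply `end - begin`, and adjacent intervals such as [0, 10) and [10, 20) do not overlap.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (begin, end), 0-based, half-open: [begin, end)


def interval_length(interval: Interval) -> int:
    """Number of positions covered by a half-open interval."""
    begin, end = interval
    return end - begin


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of positions shared by two intervals on the same chromosome."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# [0, 10) covers positions 0..9.
print(interval_length((0, 10)))           # 10
# [0, 10) and [10, 20) are adjacent and share no position.
print(overlap_length((0, 10), (10, 20)))  # 0
# [5, 15) and [10, 20) share positions 10..14.
print(overlap_length((5, 15), (10, 20)))  # 5
```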
- The data are located in folder `<experiment>/data/`
- The data for the individual experiments (reference, query, chromosome lengths) are described in file `<experiment>/experiments.tsv`
- The results are located in folder `<experiment>/results/`
- p-values computed by MCDP are located in folder `<experiment>/results/sf/direct_eigen/`
- p-values computed by sampling from the gold null hypothesis are located in folder `<experiment>/results/sf/perm_nc/10000/`
- p-values computed by SBDP [1] are located in folder `<experiment>/results/sf/dp/<scaling>/`
- running times and peak memory usages are located in folders `<experiment>/results/metrics/<algorithm>/...`
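The layout above can be turned into paths programmatically. The following is a minimal sketch that only assembles the p-value folder names listed above. The function name `pvalue_dirs` and the example values passed for `<experiment>` and `<scaling>` are placeholders chosen for illustration. Since the file formats inside these folders are not described here, the sketch does not attempt to read any files.

```python
from pathlib import Path
from typing import Dict


def pvalue_dirs(experiment: str, scaling: str) -> Dict[str, Path]:
    """Map each p-value method to its result folder for one experiment.

    `experiment` and `scaling` stand in for the <experiment> and <scaling>
    placeholders used in the folder layout; their valid values are not
    assumed here.
    """
    sf = Path(experiment) / "results" / "sf"
    return {
        "MCDP": sf / "direct_eigen",
        "sampling": sf / "perm_nc" / "10000",
        "SBDP": sf / "dp" / scaling,
    }


# Placeholder arguments, used only to show the resulting paths.
for method, folder in pvalue_dirs("some_experiment", "some_scaling").items():
    print(method, folder.as_posix())
```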
The data were prepared and provided by Sarmashghi and Bafna (2019) [1] (published with their consent).
- `orig1` - EC
- `orig2` - CNV
- `orig3` - H3K4me3
- `orig4` - CS
The CNV maps were obtained from Supplementary Tables S9 and S10 of the publication by Zarrei et al. (2015) [2].
The classification of gene names into categories was obtained from Supplementary Table S4 of the same publication [2].
We used gene names (first column) as gene identifiers.
The gene coordinates were obtained from the UCSC Genome Browser RefSeq annotation on human genome hg19 (track `ncbiRefSeq`) and merged using the `ncbiRefSeqLink` table.
We considered two strategies for merging multiple annotations with the same gene name:
- `all` (used in the paper): all annotations
- `lex-smallest`: the one with the lexicographically smallest item ID
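The two strategies can be sketched as follows. This is an illustration under stated assumptions, not the code used to prepare the data. The `Annotation` record, its field names, the function `select_annotations`, and the example IDs are all hypothetical. "Lexicographically smallest" is taken here to mean Python's default string ordering.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Annotation(NamedTuple):
    # Hypothetical record layout; coordinates are 0-based half-open.
    gene_name: str
    item_id: str
    chrom: str
    begin: int
    end: int


def select_annotations(
    annotations: Iterable[Annotation], strategy: str = "all"
) -> Dict[str, List[Annotation]]:
    """Group annotations by gene name and apply one of the two strategies."""
    by_gene: Dict[str, List[Annotation]] = defaultdict(list)
    for ann in annotations:
        by_gene[ann.gene_name].append(ann)

    if strategy == "all":
        # Keep every annotation that shares the gene name.
        return dict(by_gene)
    if strategy == "lex-smallest":
        # Keep only the annotation whose item ID sorts first as a string.
        return {
            gene: [min(group, key=lambda a: a.item_id)]
            for gene, group in by_gene.items()
        }
    raise ValueError(f"unknown strategy: {strategy!r}")


# Made-up records, for illustration only.
records = [
    Annotation("GENE1", "ID_B", "chr1", 0, 10),
    Annotation("GENE1", "ID_A", "chr1", 5, 20),
    Annotation("GENE2", "ID_C", "chr2", 3, 7),
]
print(len(select_annotations(records, "all")["GENE1"]))                # 2
print(select_annotations(records, "lex-smallest")["GENE1"][0].item_id)  # ID_A
```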
Footnotes
1. Sarmashghi S, Bafna V. Computing the Statistical Significance of Overlap between Genome Annotations with ISTAT. Cell Syst. 2019;8(6):523-529.e4. doi:10.1016/j.cels.2019.05.006
2. Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat Rev Genet. 2015;16(3):172-183. doi:10.1038/nrg3871