Reproducibility

General info

All genome coordinates are stored as 0-based half-open intervals (i.e. positions are counted starting with 0, and the second coordinate points at the position right after the last position of an interval).
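The convention has a few practical consequences. The snippet below is a minimal illustration, not code from this repository; the helper names are ours. The length of an interval is simply end - start. Adjacent intervals such as [0, 3) and [3, 5) share no position. Converting to the 1-based closed convention used by some tools only shifts the start.

```python
# Minimal helpers for 0-based half-open intervals [start, end).
# Illustrative only: these function names are hypothetical, not part of the repository.

def interval_length(start: int, end: int) -> int:
    """Number of positions covered by [start, end)."""
    return end - start


def overlap_length(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of positions shared by two half-open intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start)


def to_one_based_closed(start: int, end: int) -> tuple[int, int]:
    """Convert 0-based [start, end) to the 1-based closed interval [start + 1, end]."""
    return start + 1, end


if __name__ == "__main__":
    # [0, 3) covers positions 0, 1 and 2.
    print(interval_length(0, 3))             # 3
    # Adjacent intervals do not overlap.
    print(overlap_length((0, 3), (3, 5)))    # 0
    print(overlap_length((0, 3), (2, 5)))    # 1
    print(to_one_based_closed(0, 3))         # (1, 3)
```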

Folder organisation

The data are located in the folder <experiment>/data/.
The data for the individual experiments (reference, query, chromosome lengths) are described in the file <experiment>/experiments.tsv.
The results are located in the folder <experiment>/results/:
- p-values computed by MCDP are in <experiment>/results/sf/direct_eigen/
- p-values computed by sampling from the gold null hypothesis are in <experiment>/results/sf/perm_nc/10000/
- p-values computed by SBDP¹ are in <experiment>/results/sf/dp/<scaling>/
- running times and peak memory usage are in <experiment>/results/metrics/<algorithm>/...
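The layout above can be captured in a small path helper, for example when scripting over several experiments. This is a hypothetical sketch: only the folder structure comes from this README, while the function name and dictionary keys are illustrative, and the valid values of `scaling` and `algorithm` are whatever subfolder names actually exist in the repository.

```python
from pathlib import Path


def result_dirs(experiment: str, scaling: str, algorithm: str) -> dict[str, Path]:
    """Build the data/result paths described in the README for one experiment.

    Hypothetical helper: `scaling` and `algorithm` must match subfolder names
    present in the repository; they are not validated here.
    """
    root = Path(experiment)
    results = root / "results"
    return {
        "data": root / "data",
        "experiments": root / "experiments.tsv",
        "mcdp": results / "sf" / "direct_eigen",
        "sampling": results / "sf" / "perm_nc" / "10000",
        "sbdp": results / "sf" / "dp" / scaling,
        "metrics": results / "metrics" / algorithm,
    }


if __name__ == "__main__":
    # Placeholder values: replace with folder names that exist in the checkout.
    dirs = result_dirs("01-4-real-datasets", scaling="SCALING", algorithm="ALGORITHM")
    for name, path in dirs.items():
        print(name, path.as_posix(), path.exists())
```

Returning `Path` objects rather than strings keeps the helper portable across operating systems; `path.exists()` makes it easy to spot a mistyped placeholder.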

01 - 4 real datasets from Sarmashghi and Bafna (2019) (`01-4-real-datasets/`)

The data were prepared and provided by Sarmashghi and Bafna (2019)¹ and are published here with their consent.

Labels of experiments:

- orig1: EC
- orig2: CNV
- orig3: H3K4me3
- orig4: CS

02 - Genes vs CNV maps (`02-genes-vs-cnv-maps/`)

CNV maps

The CNV maps were obtained from Supplementary Tables S9 and S10 of the publication by Zarrei et al. (2015)².

Genes

The classification of gene names into categories was obtained from Supplementary Table S4 of the publication by Zarrei et al. (2015)².

We used gene names (first column) as gene identifiers. The gene coordinates were obtained from the UCSC Genome Browser RefSeq annotation of the human genome hg19 (track ncbiRefSeq) and merged using the ncbiRefSeqLink table.
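A rough sketch of how transcript records can be grouped by gene name before one of the merging strategies is applied. It assumes simplified row shapes: an ncbiRefSeq row carrying a transcript ID (`name`), `chrom`, `txStart` and `txEnd`, and an ncbiRefSeqLink row mapping a transcript ID (`id`) to a gene name (`name`). These column names are our assumption and should be checked against the UCSC table schemas; the toy rows in the test are made up.

```python
from collections import defaultdict


def group_by_gene(refseq_rows, link_rows):
    """Return {gene_name: [(transcript_id, chrom, start, end), ...]}.

    Assumed row shapes (verify against the UCSC schemas before use):
      refseq_rows: dicts with "name" (transcript ID), "chrom", "txStart", "txEnd"
      link_rows:   dicts with "id" (transcript ID) and "name" (gene name)
    UCSC txStart/txEnd are already 0-based half-open, so no conversion is done.
    """
    gene_of = {row["id"]: row["name"] for row in link_rows}
    genes = defaultdict(list)
    for row in refseq_rows:
        gene = gene_of.get(row["name"])
        if gene is None:
            continue  # transcripts without a link entry are skipped
        genes[gene].append((row["name"], row["chrom"], row["txStart"], row["txEnd"]))
    return dict(genes)
```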

We used two strategies to merge multiple annotations sharing the same gene name:

- all (used in the paper): use all annotations
- lex-smallest: use only the annotation with the lexicographically smallest item ID
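The two strategies can be sketched as follows for the annotations of a single gene, given as `(item_id, chrom, start, end)` tuples in 0-based half-open coordinates. This is an illustration, not the repository's code. For "all" we assume that overlapping or touching intervals on the same chromosome are merged into their union; the README does not state how overlaps are handled.

```python
def merge_all(annotations):
    """Strategy "all": union of all annotations (assumed: overlaps are merged).

    annotations: list of (item_id, chrom, start, end), 0-based half-open.
    Returns a sorted list of disjoint (chrom, start, end) intervals.
    """
    by_chrom = {}
    for _, chrom, start, end in annotations:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or touching: extend the current interval
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def lex_smallest(annotations):
    """Strategy "lex-smallest": keep the annotation with the smallest item ID.

    Assumes a non-empty list; min() raises ValueError otherwise.
    """
    _, chrom, start, end = min(annotations, key=lambda a: a[0])
    return [(chrom, start, end)]


# Hypothetical dispatch table keyed by the strategy names used above.
STRATEGIES = {"all": merge_all, "lex-smallest": lex_smallest}
```

Merging touching intervals such as [0, 3) and [3, 5) into [0, 5) does not change the set of covered positions under the half-open convention; it only yields fewer intervals.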

03 - Synthetic data for accuracy (`03-synthetic-data-accuracy/`)

04 - Synthetic data for time and memory requirements (`04-synthetic-data-time-mem/`)

References

1. Sarmashghi S, Bafna V. Computing the Statistical Significance of Overlap between Genome Annotations with ISTAT. Cell Syst. 2019;8(6):523-529.e4. doi:10.1016/j.cels.2019.05.006
2. Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat Rev Genet. 2015;16(3):172-183. doi:10.1038/nrg3871