This directory contains various analysis modules in the OpenPedCan project. See the README of an individual analysis module for more information about that module.
The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules. This is in service of documenting interdependent analyses. Note that nearly all modules use the harmonized clinical data file (histologies.tsv) even when it is not explicitly included in the table below.
Module | Input Files | Brief Description | Produces files for data release? | Output Files Consumed by Other Analyses | OT compatibility | Adapted for OPC? | Run Platform | Action Plan |
---|---|---|---|---|---|---|---|---|
chromosomal-instability | histologies.tsv sv-manta.tsv.gz cnv-cnvkit.seg.gz |
Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals | No | breakpoint-data/union_of_breaks_densities.tsv |
No | No | N/A | Will Adapt for OT |
chromothripsis | sv-manta.tsv.gz cnv-consensus.seg.gz independent-specimens.wgs.primary-plus.tsv |
Chromothripsis analysis per #1007 | No | N/A | No | No | N/A | N/A |
cnv-chrom-plot | cnv-consensus-gistic.zip cnv-consensus.seg |
Plots genome wide visualizations relating to copy number results | No | N/A | No | No | N/A | N/A |
cnv-comparison | Earlier version of SEG files | Deprecated; compared earlier version of the CNV methods. | No | N/A | No | No | N/A | N/A |
cnv-frequencies | histologies.tsv consensus_wgs_plus_cnvkit_wxs.tsv.gz independent-specimens.wgswxspanel.primary.eachcohort.tsv independent-specimens.wgswxspanel.relapse.eachcohort.tsv independent-specimens.wgswxspanel.primary.tsv independent-specimens.wgswxspanel.relapse.tsv |
Annotate CNV table with mutation frequencies | No | results/gene-level-cnv-consensus-annotated-mut-freq.jsonl.gz results/gene-level-cnv-consensus-annotated-mut-freq.tsv.gz |
Yes | Yes | GitHub | N/A |
collapse-rnaseq | gene-expression-rsem-tpm.rds gencode.v27.primary_assembly.annotation.gtf.gz |
Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. | Yes | results/gene-expression-rsem-fpkm-collapsed.rds included in data download; too large for tracking via GitHub |
Yes | Yes | CAVATICA | N/A |
comparative-RNASeq-analysis | gene-expression-rsem-tpm.rds histologies.tsv mend-qc-manifest.tsv mend-qc-results.tar.gz |
In progress; will produce expression outlier profiles per #229 | No | N/A | No | No | N/A | N/A |
compare-gistic | cnv-consensus-gistic.zip analyses/run-gistic/results/cnv-consensus-hgat-gistic.zip analyses/run-gistic/results/cnv-consensus-lgat-gistic.zip analyses/run-gistic/results/cnv-consensus-medulloblastoma-gistic.zip |
Comparison of the GISTIC results of the entire cohort with the GISTIC results of three individual histologies, namely LGAT, HGAT, and medulloblastoma #547 | No | N/A | No | No | N/A | N/A |
copy_number_consensus_call | cnv-cnvkit.seg.gz cnv-controlfreec.tsv.gz sv-manta.tsv.gz |
Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made | Yes | results/cnv_consensus.tsv results/cnv-consensus.seg.gz included in data download ref/cnv_excluded_regions.bed ref/cnv_callable.bed |
Yes | Yes | CAVATICA | N/A |
create-subset-files | All files | This module contains the code to create the subset files used in continuous integration | No | All subset files for continuous integration | No | No | N/A | Will set up for OT ticket in |
efo-mondo-mapping | histologies.tsv efo-mondo-map.tsv |
This module contains a file with EFO, MONDO, and NCIT codes for every cancer_group found in histologies.tsv and runs a QC script to check whether any cancer_group is missing | Yes | efo-mondo-mapping.tsv |
No | Yes | N/A | Yes |
filter-mtp-tables | gencode.v38.primary_assembly.annotation.gtf.gz PMTL_v1.1.tsv histologies.tsv gene-level-snv-consensus-annotated-mut-freq.tsv.gz snv-consensus-plus-hotspots.maf.tsv.gz variant-level-snv-consensus-annotated-mut-freq.tsv.gz gene-level-cnv-consensus-annotated-mut-freq.tsv.gz consensus_wgs_plus_cnvkit_wxs.tsv.gz putative-oncogene-fusion-freq.tsv.gz fusion-putative-oncogenic.tsv putative-oncogene-fused-gene-freq.tsv.gz long_n_tpm_mean_sd_quantile_gene_wise_zscore.tsv.gz long_n_tpm_mean_sd_quantile_group_wise_zscore.tsv.gz |
Removes Ensembl (ENSG) gene identifiers that are not in GENCODE v38 and Ensembl package 104 from the OpenPedCan mutation frequency tables, including the SNV, CNV, fusion, and TPM expression tables. | No | All files from module results directory |
No | Yes | N/A | Yes |
focal-cn-file-preparation | cnv-cnvkit.seg.gz cnv-controlfreec.tsv.gz gene-expression-rsem-tpm-collapsed.rds cnv-consensus.seg.gz |
Maps from copy number variant caller segments to gene identifiers; will be updated to take into account changes that affect entire cytobands, chromosome arms #186 | Yes | results/cnvkit_annotated_cn_wxs_autosomes.tsv.gz results/cnvkit_annotated_cn_wxs_x\_and_y.tsv.gz results/consensus_seg_annotated_cn_autosomes.tsv.gz results/consensus_seg_annotated_cn_x\_and_y.tsv.gz results/consensus_wgs_plus_cnvkit_wxs.tsv.gz included in data download results/consensus_wgs_plus_cnvkit_wxs_autosomes.tsv.gz included in data download results/consensus_wgs_plus_cnvkit_wxs_x\_and_y.tsv.gz included in data download |
Yes | Yes | CAVATICA | N/A |
fusion_filtering | fusion-arriba.tsv.gz fusion-starfusion.tsv.gz independent-specimens.rnaseq.primary.tsv independent-specimens.rnaseq.relapse.tsv |
Standardizes, filters, and prioritizes fusion calls | Yes | results/fusion-putative-oncogenic.tsv included in data download results/fusion-recurrent-fusion-bycancergroup.tsv results/fusion-recurrent-fusion-bysample.tsv results/fusion-recurrently-fused-genes-bycancergroup.tsv results/fusion-recurrently-fused-genes-bysample.tsv |
Yes | Yes | GitHub | N/A |
fusion-frequencies | histologies.tsv fusion-putative-oncogenic.tsv fusion-dgd.tsv.gz independent-specimens.rnaseqpanel.primary.tsv independent-specimens.rnaseqpanel.relapse.tsv independent-specimens.rnaseqpanel.primary.eachcohort.tsv independent-specimens.rnaseqpanel.relapse.eachcohort.tsv |
Gather counts and frequencies for fusions per cancer_group and cohort | N/A | results/putative-oncogene-fused-gene-freq.jsonl.gz results/putative-oncogene-fused-gene-freq.tsv.gz results/putative-oncogene-fusion-freq.jsonl.gz results/putative-oncogene-fusion-freq.tsv.gz |
Yes | Yes | GitHub | N/A |
fusion-summary | histologies.tsv fusion-putative-oncogenic.tsv fusion-arriba.tsv.gz fusion-starfusion.tsv.gz |
Generate summary tables from fusion files (#398; #623) | Yes | results/fusion_summary_embryonal_foi.tsv results/fusion_summary_ependymoma_foi.tsv results/fusion_summary_ewings_foi.tsv |
Yes | Yes | GitHub | N/A |
gene_match | GTF file sources: gencode v28 gencode v38 open_ped_can_v7_ensg-hugo-rmtl-mapping.tsv |
This module reads a GTF file and formats its attributes to extract each gene symbol with its Ensembl gene ID. |
Yes | results/ensg-hugo-pmtl-mapping.tsv |
Yes | Yes | GitHub | N/A |
gene-set-enrichment-analysis | gene-expression-rsem-tpm-collapsed.rds histologies.tsv |
Updated gene set enrichment analysis with appropriate RNA-seq expression data | No | results/gsva_scores.tsv combined file for all RNA library types |
Yes | Yes | GitHub | Move to CAVATICA |
hotspots-detection | snv-strelka2.vep.maf.gz snv-mutect2.vep.maf.gz snv-vardict.vep.maf.gz snv-lancet.vep.maf.gz |
Scavenges any cancer hotspot calls from each caller and merges them with the consensus (3/3) calls if they were missed in the snv-caller workflow. | No | snv-hotspots-mutation.maf.tsv.gz |
No | No | CAVATICA | N/A |
immune-deconv | gene-expression-rsem-tpm-collapsed.rds |
Immune/Stroma characterization across PBTA part of #15 | No | results/xcell_output.rds results/quantiseq_output.rds |
No | No | N/A | N/A |
independent-samples | histologies.tsv |
Generates independent specimen lists for WGS/WXS samples | Yes | results/independent-specimens.wgswxspanel.primary.tsv included in data download results/independent-specimens.wgswxspanel.relapse.tsv included in data download results/independent-specimens.wgswxspanel.primary.eachcohort.tsv included in data download results/independent-specimens.wgswxspanel.relapse.eachcohort.tsv included in data download results/independent-specimens.wgswxspanel.primary.prefer.wxs.tsv included in data download results/independent-specimens.wgswxspanel.relapse.prefer.wxs.tsv included in data download results/independent-specimens.wgswxspanel.primary.eachcohort.prefer.wxs.tsv included in data download results/independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wxs.tsv included in data download results/independent-specimens.rnaseq.primary.tsv included in data download results/independent-specimens.rnaseq.relapse.tsv included in data download results/independent-specimens.rnaseq.primary.eachcohort.tsv included in data download results/independent-specimens.rnaseq.relapse.eachcohort.tsv included in data download |
Yes | Yes | GitHub | N/A |
interaction-plots | independent-specimens.wgs.primary-plus.tsv snv-consensus-mutation.maf.tsv.gz |
Creates interaction plots for mutation mutual exclusivity/co-occurrence #13; may be updated to include other data types e.g., fusions | No | N/A | No | No | N/A | N/A |
long-format-table-utils | ensg-hugo-rmtl-mapping.tsv analyses/fusion_filtering/references/genelistreference.txt efo-mondo-map.tsv uberon-map-gtex-group.tsv uberon-map-gtex-subgroup.tsv |
Functions and scripts for handling long-format tables | No | annotator/annotation-data/ensg-gene-full-name-refseq-protein.tsv annotator/annotation-data/oncokb-cancer-gene-list.tsv |
Yes | Yes | GitHub | N/A |
methylation-preprocessing | metadata/TARGET_Normal_MethylationArray_20160812.sdrf.txt metadata/TARGET_NBL_MethylationArray_20160812.sdrf.1.txt TARGET_NBL_MethylationArray_20160812.sdrf.2.txt metadata/TARGET_CCSK_MethylationArray_20160819.sdrf.txt metadata/TARGET_OS_MethylationArray_20161103.sdrf.txt metadata/TARGET_WT_MethylationArray_20160831.sdrf.txt metadata/TARGET_AML_MethylationArray_20160812_450k.sdrf.1.txt metadata/TARGET_AML_MethylationArray_20160812_450k.sdrf.2.txt metadata/TARGET_AML_MethylationArray_20160812_27k.sdrf.1.txt metadata/TARGET_AML_MethylationArray_20160812_27k.sdrf.2.txt metadata/TARGET_AML_MethylationArray_20160812_27k.sdrf.3.txt metadata/manifest_methylation_CBTN_20220410.1.csv metadata/manifest_methylation_CBTN_20220410.2.csv metadata/manifest_methylation_CBTN_20220410.3.csv metadata/manifest_methylation_CBTN_20220410.4.csv |
Preprocess probe hybridization intensity values of selected methylated and unmethylated cytosine (CpG) loci into usable methylation measurements for the Pediatric Open Targets, OpenPedCan-analysis raw DNA methylation array datasets. | No | N/A | Yes | Yes | GitHub | N/A |
molecular-subtyping-ATRT | analyses/gene-set-enrichment-analysis/results/gsva_scores.tsv gene-expression-rsem-tpm-collapsed.rds analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz snv-consensus-mutation-tmb-all.tsv cnv-consensus-gistic.zip |
Deprecated; Summarizing data into tabular format in order to molecularly subtype ATRT samples #244; this analysis did not work | No | N/A | No | No | N/A | N/A |
molecular-subtyping-CRANIO | histologies-base.tsv snv-consensus-plus-hotspots.maf.tsv.gz |
Molecular subtyping of craniopharyngioma samples #810 | No | results/CRANIO_molecular_subtype.tsv |
No | No | N/A | Prepare for scaling |
molecular-subtyping-EPN | histologies-base.tsv gene-expression-rsem-tpm-collapsed.rds analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv analyses/fusion-summary/results/fusion_summary_ependymoma_foi.tsv analyses/gene-set-enrichment-analysis/results/gsva_scores.tsv |
Molecular subtyping of ependymoma tumors | No | results/EPN_all_data_withsubgroup.tsv |
No | No | N/A | Will Adapt for OT |
molecular-subtyping-EWS | histologies-base.tsv analyses/fusion-summary/results/fusion_summary_ewings_foi.tsv |
Reclassification of tumors based on the presence of defining fusions for Ewing Sarcoma per #623 | No | results/EWS_samples.tsv |
No | No | N/A | Will Adapt for OT |
molecular-subtyping-HGG | histologies-base.tsv snv-consensus-plus-hotspots.maf.tsv.gz consensus_wgs_plus_cnvkit_wxs.tsv.gz fusion-putative-oncogenic.tsv cnv-consensus-gistic.zip gene-expression-rsem-tpm-collapsed.rds tp53_altered_status.tsv |
Molecular subtyping of high-grade glioma samples #249 | No | results/HGG_molecular_subtype.tsv |
Yes | Yes | GitHub | N/A |
molecular-subtyping-LGAT | histologies-base.tsv snv-consensus-plus-hotspots.maf.tsv.gz fusion-putative-oncogenic.tsv analyses/fusion_filtering/results/fusion-recurrently-fused-genes-bysample.tsv |
Molecular subtyping of Low-grade astrocytic tumor samples #631 | No | results/lgat_subtyping.tsv |
Yes | Yes | GitHub | N/A |
molecular-subtyping-MB | histologies.tsv gene-expression-rsem-tpm-collapsed.rds |
Molecular classification of Medulloblastoma subtypes part of #116 | No | results/MB_molecular_subtype.tsv |
Yes | Yes | GitHub | N/A |
molecular-subtyping-SHH-tp53 | histologies snv-consensus-plus-hotspots.maf.tsv.gz |
Deprecated; Identify the SHH-classified medulloblastoma samples that have TP53 mutations #247 | No | N/A | No | No | N/A | N/A |
molecular-subtyping-chordoma | analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz gene-expression-rsem-fpkm-collapsed.stranded.rds |
Identifies poorly-differentiated chordoma samples per #250 | No | N/A | No | No | N/A | Will Adapt for OT |
molecular-subtyping-embryonal | histologies-base.tsv analyses/fusion-summary/fusion_summary_embryonal_foi.tsv sv-manta.tsv.gz consensus_wgs_plus_cnvkit_wxs.tsv.gz analyses/focal-cn-file-preparation/cnvkit_annotated_cn_x\_and_y.tsv.gz analyses/focal-cn-file-preparation/controlfreec_annotated_cn_x\_and_y.tsv.gz gene-expression-rsem-tpm-collapsed.rds |
Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors #251 | No | results/embryonal_tumor_molecular_subtypes.tsv |
No | No | N/A | Will Adapt for OT |
molecular-subtyping-integrate | histologies-base.tsv results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv |
Add molecular subtype information to base histology | No | results/histologies.tsv |
Yes | Yes | GitHub | N/A |
molecular-subtyping-neurocytoma | histologies-base.tsv |
Molecular subtyping of Neurocytoma samples #805 | No | results/neurocytoma_subtyping.tsv |
No | No | N/A | Will Adapt for OT |
molecular-subtyping-pathology | analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv analyses/molecular-subtyping-EWS/results/EWS_samples.tsv analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv |
Compile output from other molecular subtyping modules and incorporate pathology feedback #645 | No | results/compiled_molecular_subtyping_with_clinical_feedback.tsv results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv |
Yes | Yes | GitHub | N/A |
mutational-signatures | snv-consensus-plus-hotspots.maf.tsv.gz |
Performs COSMIC and Alexandrov et al. mutational signature analysis using the consensus SNV data | No | N/A | No | No | N/A | N/A |
mutect2-vs-strelka2 | snv-mutect2.vep.maf.gz snv-strelka2.vep.maf.gz |
Deprecated; comparison of only two SNV callers, subsumed by snv-callers |
No | N/A | No | No | N/A | N/A |
oncoprint-landscape | snv-consensus-plus-hotspots.maf.tsv.gz fusion-putative-oncogenic.tsv analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz independent-specimens.\* |
Combines mutation, copy number, and fusion data into an OncoPrint plot #6; will need to be updated as all data types are refined | No | N/A | No | No | N/A | N/A |
pedcbio-cnv-prepare | consensus_wgs_plus_cnvkit_wxs_autosomes.tsv.gz consensus_wgs_plus_cnvkit_wxs_x\_and_y.tsv.gz |
Generate annotated CNV files that are similar to seg files for PedCBio uploads to include all samples with neutral CNV calls | Yes | Upload to PedCBio S3 bucket for ingestion | Yes | GitHub | N/A | |
pedcbio-sample-name | histologies.tsv input/cbtn_cbio_sample.csv input/dgd_cbio_sample.csv input/oligo_nation_cbio_sample.csv input/x01_fy16_nbl_maris_cbio_sample.csv |
For some of the samples, when multiple DNA or RNA specimens are associated with the same sample, there is no column that would distinguish between different aliquots while still tying DNA and RNA together. | Yes | Upload to PedCBio S3 bucket for ingestion | Yes | GitHub | N/A | |
pedot-table-column-display-order-name | analyses/snv-frequencies/results/gene-level-snv-consensus-annotated-mut-freq.tsv analyses/snv-frequencies/results/variant-level-snv-consensus-annotated-mut-freq.tsv.gz analyses/cnv-frequencies/results/gene-level-cnv-consensus-annotated-mut-freq.tsv.gz analyses/fusion-frequencies/results/putative-oncogene-fused-gene-freq.tsv.gz analyses/fusion-frequencies/results/putative-oncogene-fusion-freq.tsv.gz analyses/rna-seq-expression-summary-stats/results/long_n\_tpm_mean_sd_quantile_gene_wise_zscore.tsv.gz analyses/rna-seq-expression-summary-stats/results/long_n\_tpm_mean_sd_quantile_group_wise_zscore.tsv.gz |
Generate and validate an Excel spreadsheet for Pediatric Open Targets (PedOT) website table display orders and names | No | Upload to FNL Box | Yes | Yes | GitHub | N/A |
rna-seq-composition | gene-expression-rsem-tpm.rds histologies.tsv mend-qc-results.tar.gz mend-qc-manifest.tsv star-log-manifest.tsv star-log-final.tar.gz |
Analyzes the fraction of read types that comprise each RNA-Seq sample; flags samples with unusual composition | No | N/A | No | No | N/A | N/A |
rna-seq-protocol-ruvseq | gene-counts-rsem-expected_count-collapsed.rds |
Evaluate the use of empirical negative control genes for batch correction | No | N/A | Yes | Yes | GitHub | N/A |
rna-seq-protocol-dge | gene-counts-rsem-expected_count-collapsed.rds |
In progress #17; check whether the DGE analysis between poly-A and stranded RNA-seq data follows a null p-value distribution; determine stably expressed genes between poly-A and stranded samples. | No | N/A | Yes | Yes | GitHub | N/A |
rna-seq-expression-summary-stats | gene-expression-rsem-tpm-collapsed.rds histologies.tsv |
Calculate TPM summary statistics within each cancer group and cohort. #51. | No | Upload to FNL Box | Yes | Yes | GitHub | N/A |
run-gistic | histologies.tsv cnv-consensus.seg.gz |
Runs GISTIC 2.0 on SEG files | Yes | cnv-consensus-gistic.zip included in data download |
Yes | Yes | GitHub | Move to CAVATICA |
sample-distribution-analysis | histologies.tsv |
Produces plots and tables that illustrate the distribution of different histologies in the PBTA data | No | N/A | No | No | N/A | N/A |
selection-strategy-comparison | gene-expression-rsem-tpm-collapsed.rds |
Deprecated; Comparison of RNA-seq data from different selection strategies | No | N/A | No | No | N/A | N/A |
sex-prediction-from-RNASeq | gene-expression-kallisto.stranded.rds histologies.tsv |
Predicts genetic sex using RNA-seq data #84 | No | N/A | No | No | N/A | N/A |
snv-callers | snv-lancet.vep.maf.gz snv-mutect2.vep.maf.gz snv-strelka2.vep.maf.gz snv-vardict.vep.maf.gz tcga-snv-lancet.vep.maf.gz tcga-snv-mutect2.vep.maf.gz tcga-snv-strelka2.vep.maf.gz |
Generates consensus SNV and indel calls for PBTA and TCGA data; calculates tumor mutation burden using the consensus calls | Yes | results/consensus/snv-consensus-plus-hotspots.maf.tsv included in data download; too large for tracking via GitHub results/consensus/snv-consensus-mutation-tmb-all.tsv results/consensus/snv-consensus-mutation-tmb-coding.tsv included in data download; too large for tracking via GitHub results/consensus/tcga-snv-consensus-mutation.maf.tsv.gz results/consensus/tcga-snv-mutation-tmb.tsv results/consensus/tcga-snv-mutation-tmb-coding.tsv |
No | N/A | N/A | |
snv-frequencies | histologies.tsv snv-consensus-plus-hotspots.maf.tsv.gz snv-dgd.maf.tsv.gz independent-specimens.wgswxspanel.primary.eachcohort.prefer.wxs.tsv independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wxs.tsv independent-specimens.wgswxspanel.primary.prefer.wxs.tsv independent-specimens.wgswxspanel.relapse.prefer.wxs.tsv |
Annotate SNV table with mutation frequencies | No | results/gene-level-snv-consensus-annotated-mut-freq.jsonl.gz results/gene-level-snv-consensus-annotated-mut-freq.tsv.gz variant-level-snv-consensus-annotated-mut-freq.jsonl.gz variant-level-snv-consensus-annotated-mut-freq.tsv.gz |
Yes | Yes | GitHub | N/A |
ssgsea-hallmark | gene-counts-rsem-expected_count-collapsed.rds |
Deprecated; performs GSVA using Hallmark gene sets | No | N/A | No | No | N/A | N/A |
survival-analysis | TBD | In progress; will eventually contain functions for various types of survival analysis #18 | No | N/A | No | No | N/A | N/A |
telomerase-activity-prediction | gene-expression-rsem-tpm-collapsed.rds gene-counts-rsem-expected_count-collapsed.rds |
Quantify telomerase activity across pediatric brain tumors part of #148 | No | results/TelomeraseScores_PTBAPolya_counts results/TelomeraseScores_PTBAPolya_FPKM.txt results/TelomeraseScores_PTBAStranded_counts.txt results/TelomeraseScores_PTBAStranded_FPKM.txt |
No | No | N/A | N/A |
tmb-calculation | gencode.v27.primary_assembly.annotation.bed intersect_strelka_mutect2_vardict_WGS.bed snv-consensus-plus-hotspots.maf.tsv.gz biospecimen_id_to_bed_map.tsv histologies-base.tsv hg38_strelka.bed wgs_canonical_calling_regions.hg38.bed gencode.v27.primary_assembly.annotation.gtf.gz |
The Tumor Mutation Burden calculation is adapted from the snv-callers module of OpenPBTA-analysis and uses the SNV calls from the Mutect2, Strelka2, Lancet, and VarDict callers. | No | N/A | No | No | N/A | N/A |
tmb-compare | snv-consensus-mutation-tmb-coding.tsv |
Compares PBTA tumor mutation burden to adult TCGA data. The D3B TMB calculations TMB_d3b_code and its comparison notebook compare-tmb-calculations.Rmd are deprecated. |
No | N/A | No | No | N/A | N/A |
tp53_nf1_score | snv-consensus-plus-hotspots.maf.tsv gene-expression-rsem-tpm-collapsed.rds consensus_wgs_plus_cnvkit_wxs.tsv.gz |
Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data #165 | No | TP53_NF1_snv_alteration.tsv gene-expression-rsem-tpm-collapsed_classifier_scores.tsv loss_overlap_domains_tp53.tsv poly-A_TP53.png stranded_TP53.png sv_overlap_tp53.tsv tp53_altered_status.tsv |
Yes | Yes | GitHub | N/A |
transcriptomic-dimension-reduction | gene-expression-rsem-tpm.rds gene-expression-kallisto.rds |
Dimension reduction and visualization of RNA-seq data part of #9 | No | N/A | No | No | N/A | N/A |
tcga-capture-kit-investigation | snv-lancet.vep.maf.gz snv-mutect2.vep.maf.gz snv-strelka2.vep.maf.gz tcga-snv-lancet.vep.maf.gz tcga-snv-mutect2.vep.maf.gz tcga-snv-strelka2.vep.maf.gz histologies.tsv tcga-manifest.tsv WGS.hg38.lancet.unpadded.bed WGS.hg38.strelka2.unpadded.bed WGS.hg38.mutect2.vardict.unpadded.bed |
Investigation of the TMB discrepancy between PBTA and TCGA data | No | results/*.bed |
Yes | No | GitHub | N/A |
tumor-gtex-plots | gene-expression-rsem-tpm-collapsed.rds histologies.tsv |
In progress #38; tumor vs normal and tumor only expression plots | No | results/pan_cancer_plots_cancer_group_level.{tsv, jsonl.gz} results/pan_cancer_plots_cohort_cancer_group_level.{tsv, jsonl.gz} results/tumor_normal_gtex_plots_cancer_group_level.{tsv, jsonl.gz} results/tumor_normal_gtex_plots_cohort_cancer_group_level.{tsv, jsonl.gz} results/metadata.tsv plots/\*.png |
Yes | Yes | GitHub | N/A |
tumor-normal-differential-expression | histologies.tsv gene-counts-rsem-expected_count-collapsed.rds independent-specimens.rnaseq.primary.tsv independent-specimens.rnaseq.primary.eachcohort.tsv gene-expression-rsem-tpm-collapsed.rds ensg-hugo-pmtl-mapping.tsv efo-mondo-map.tsv uberon-map-gtex-subgroup.tsv |
This module takes the histologies file and the RNA-Seq expression matrices as input and performs differential expression analysis for all combinations of GTEx subgroup (normal) and cancer histology type (tumor). | No | N/A | Yes | Yes | HPC; CAVATICA users can create an application for personal analysis using the scripts provided in the module | N/A |
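
Because the table exists to document interdependent analyses, it can help to see how a run order could be derived from it. The following is a minimal Python sketch, not part of the project: the `MODULES` dictionary is a hand-copied subset of four rows from the table, and the helper names `dependency_graph` and `run_order` are hypothetical. It assumes that files can be matched by basename alone, since the table refers to the same file both as `results/...` and as `analyses/<module>/results/...`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, hand-copied subset of the table: module -> (inputs, outputs).
# Only basenames are kept so that differently prefixed paths still match.
MODULES = {
    "chromosomal-instability": (
        ["histologies.tsv", "sv-manta.tsv.gz", "cnv-cnvkit.seg.gz"],
        ["union_of_breaks_densities.tsv"],
    ),
    "fusion-summary": (
        ["histologies.tsv", "fusion-putative-oncogenic.tsv",
         "fusion-arriba.tsv.gz", "fusion-starfusion.tsv.gz"],
        ["fusion_summary_embryonal_foi.tsv",
         "fusion_summary_ependymoma_foi.tsv",
         "fusion_summary_ewings_foi.tsv"],
    ),
    "gene-set-enrichment-analysis": (
        ["gene-expression-rsem-tpm-collapsed.rds", "histologies.tsv"],
        ["gsva_scores.tsv"],
    ),
    "molecular-subtyping-EPN": (
        ["histologies-base.tsv", "gene-expression-rsem-tpm-collapsed.rds",
         "union_of_breaks_densities.tsv",
         "fusion_summary_ependymoma_foi.tsv", "gsva_scores.tsv"],
        ["EPN_all_data_withsubgroup.tsv"],
    ),
}


def dependency_graph(modules):
    """Map each module to the set of modules whose outputs it consumes."""
    # Index every output file by the module that produces it.
    producer = {}
    for name, (_, outputs) in modules.items():
        for filename in outputs:
            producer[filename] = name
    # A module depends on the producer of each input that has one.
    graph = {}
    for name, (inputs, _) in modules.items():
        graph[name] = {
            producer[f] for f in inputs
            if f in producer and producer[f] != name
        }
    return graph


def run_order(modules):
    """Return one valid order in which the modules could be run."""
    return list(TopologicalSorter(dependency_graph(modules)).static_order())


if __name__ == "__main__":
    for module in run_order(MODULES):
        print(module)
```

In this subset, `molecular-subtyping-EPN` is ordered last because it consumes outputs of the other three modules. Inputs with no producer in the dictionary (for example the data release files) are treated as already available. `TopologicalSorter` raises `graphlib.CycleError` if the recorded dependencies are circular, which is worth keeping in mind when adding more rows.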