Merge pull request #555 from d3b-center/v15-subset-files
V15 subset files (2/2)
jharenza authored Mar 1, 2024
2 parents 5567109 + c4993a3 commit 3f47859
Showing 5 changed files with 72 additions and 32 deletions.
29 changes: 25 additions & 4 deletions .github/workflows/run_analysis.yml
@@ -9,7 +9,14 @@ jobs:
name: Run Analysis - Consensus CN Manta
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: free disk space
run: |
sudo swapoff -a
sudo rm -f /swapfile
sudo apt clean
docker rmi $(docker image ls -aq)
df -h
- name: Download Data for Consensus CN Manta
uses: docker://pgc-images.sbgenomics.com/d3b-bixu/open-pedcan:latest
with:
@@ -27,7 +34,14 @@ jobs:
needs: consensus_cn_manta
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: free disk space
run: |
sudo swapoff -a
sudo rm -f /swapfile
sudo apt clean
docker rmi $(docker image ls -aq)
df -h
- name: Download Data for Consensus CN
uses: docker://pgc-images.sbgenomics.com/d3b-bixu/open-pedcan:latest
with:
@@ -131,6 +145,7 @@ jobs:

- name: Mutational signatures
entrypoint: mutational-signatures/run_mutational_signatures.sh
openpbta_testing: 1

#- name: Immune Deconvolution
# entrypoint: immune-deconv/run-immune-deconv.sh
@@ -161,8 +176,14 @@ jobs:
# entrypoint: rnaseq-batch-correct/run_ruvseq.sh

steps:
- uses: actions/checkout@v3

- uses: actions/checkout@v4
- name: free disk space
run: |
sudo swapoff -a
sudo rm -f /swapfile
sudo apt clean
docker rmi $(docker image ls -aq)
df -h
- name: Download Data
uses: docker://pgc-images.sbgenomics.com/d3b-bixu/open-pedcan:latest
with:
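The new "free disk space" step (added to all three jobs) disables swap and deletes `/swapfile`, clears the apt cache, removes Docker images already on the runner, and prints `df -h` before the large open-pedcan image is pulled. One caveat: if `docker image ls -aq` returns nothing, `docker rmi` is called with no arguments and exits non-zero, which would likely fail the step under the default `bash -e` run shell. The sketch below is a hypothetical, more defensive variant; the names `run_cmd`, `free_disk_space` and the `DRY_RUN` switch are illustrative assumptions, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: a defensive variant of the "free disk space" workflow step.
# run_cmd, free_disk_space and DRY_RUN are hypothetical, not from the repo.
# With DRY_RUN=1 every command is printed instead of executed.
set -eu

run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

free_disk_space() {
  local ids
  run_cmd sudo swapoff -a
  run_cmd sudo rm -f /swapfile
  run_cmd sudo apt clean
  # `docker rmi` with an empty argument list exits non-zero, so only call it
  # when docker is installed and at least one image ID is listed.
  if command -v docker >/dev/null 2>&1; then
    ids=$(docker image ls -aq 2>/dev/null | sort -u)
    if [ -n "$ids" ]; then
      # Word-splitting the newline-separated ID list is intentional here.
      run_cmd docker rmi $ids
    fi
  fi
  # Report remaining space so the job log shows the effect of the cleanup.
  run_cmd df -h
}
```

Running `DRY_RUN=1 free_disk_space` locally only prints the planned commands; on a runner, a step could source the file and call `free_disk_space` without `DRY_RUN`.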
Binary file modified analyses/create-subset-files/biospecimen_ids_for_subset.RDS
Binary file not shown.
2 changes: 1 addition & 1 deletion analyses/create-subset-files/run_create_subset_files.sh
@@ -8,7 +8,7 @@ set -o pipefail

# Set defaults for release and biospecimen file name
BIOSPECIMEN_FILE=${BIOSPECIMEN_FILE:-biospecimen_ids_for_subset.RDS}
RELEASE=${RELEASE:-v14}
RELEASE=${RELEASE:-v15}
NUM_MATCHED=${NUM_MATCHED:-15}

# This option controls whether or not the two larger MAF files are skipped as
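The only change to `run_create_subset_files.sh` is the default release, which moves from `v14` to `v15`. Because the defaults use `${VAR:-default}` expansion, the fallback applies when the variable is unset or set to an empty string, and any non-empty value supplied by the caller wins. The hypothetical helper `show_subset_defaults` below just echoes how the three defaults resolve; it is an illustration, not code from the repository.

```shell
#!/usr/bin/env bash
# Sketch: how the `${VAR:-default}` lines in run_create_subset_files.sh resolve.
# show_subset_defaults is a hypothetical helper for illustration only.
set -eu

show_subset_defaults() {
  # `:-` substitutes the default when the variable is unset OR empty.
  echo "BIOSPECIMEN_FILE=${BIOSPECIMEN_FILE:-biospecimen_ids_for_subset.RDS}"
  echo "RELEASE=${RELEASE:-v15}"
  echo "NUM_MATCHED=${NUM_MATCHED:-15}"
}
```

By the same rule, prefixing the invocation, e.g. `RELEASE=v14 bash run_create_subset_files.sh`, should still build subsets against v14, assuming the rest of the script reads `RELEASE` as its name suggests.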
68 changes: 43 additions & 25 deletions analyses/mutational-signatures/run_mutational_signatures.sh
@@ -9,41 +9,59 @@ set -o pipefail
# Set the working directory to the directory of this file
cd "$(dirname "${BASH_SOURCE[0]}")"


# In CI we'll run an abbreviated version of the de novo signatures extraction
ABBREVIATED_MUTSIGS=${OPC_QUICK_MUTSIGS:-0}

# Run only consensus testing file in CI, since tumor only snv is large
IS_CI=${OPENPBTA_TESTING:-0}

if [[ "$IS_CI" -eq "1" ]]

then

echo "Run the SBS mutational signatures analysis using existing signatures on consensus SNV"
Rscript -e "rmarkdown::render('01-known_signatures.Rmd', params = list(snv_file = \"snv-consensus-plus-hotspots.maf.tsv.gz\", output_Folder = \"ConsensusSNV\"), clean = TRUE)"
mv 01-known_signatures.nb.html 01-ConsensusSNV_known_signatures.nb.html

echo "Run the mutational signatures analysis using COSMIC DBS signatures (v3.3) on consensus SNV"
Rscript -e "rmarkdown::render('02-cosmic_dbs_signatures.Rmd', params = list(snv_file = \"snv-consensus-plus-hotspots.maf.tsv.gz\", output_Folder = \"ConsensusSNV\"), clean = TRUE)"
mv 02-cosmic_dbs_signatures.nb.html 02-ConsensusSNV_cosmic_dbs_signatures.nb.html

# Run the SBS mutational signatures analysis using existing signatures on consensus SNV
Rscript -e "rmarkdown::render('01-known_signatures.Rmd', params = list(snv_file = \"snv-consensus-plus-hotspots.maf.tsv.gz\", output_Folder = \"ConsensusSNV\"), clean = TRUE)"
mv 01-known_signatures.nb.html 01-ConsensusSNV_known_signatures.nb.html
echo "Run analysis of adult CNS mutational signatures on consensus SNV"
Rscript --vanilla 03-fit_cns_signatures.R \
--snv_file snv-consensus-plus-hotspots.maf.tsv.gz \
--output_Folder ConsensusSNV

# Run the SBS mutational signatures analysis using existing signatures on tumor only SNV
Rscript -e "rmarkdown::render('01-known_signatures.Rmd', params = list(snv_file = \"snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz\", output_Folder = \"TumorOnlySNV\"), clean = TRUE)"
mv 01-known_signatures.nb.html 01-TumorOnly_known_signatures.nb.html
else

# Run the mutational signatures analysis using COSMIC DBS signatures (v3.3)
Rscript -e "rmarkdown::render('02-cosmic_dbs_signatures.Rmd', params = list(snv_file = \"snv-consensus-plus-hotspots.maf.tsv.gz\", output_Folder = \"ConsensusSNV\"), clean = TRUE)"
mv 02-cosmic_dbs_signatures.nb.html 02-ConsensusSNV_cosmic_dbs_signatures.nb.html
# Run the SBS mutational signatures analysis using existing signatures on consensus SNV
Rscript -e "rmarkdown::render('01-known_signatures.Rmd', params = list(snv_file = \"snv-consensus-plus-hotspots.maf.tsv.gz\", output_Folder = \"ConsensusSNV\"), clean = TRUE)"
mv 01-known_signatures.nb.html 01-ConsensusSNV_known_signatures.nb.html

# Run the SBS mutational signatures analysis using existing signatures on tumor only SNV
Rscript -e "rmarkdown::render('01-known_signatures.Rmd', params = list(snv_file = \"snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz\", output_Folder = \"TumorOnlySNV\"), clean = TRUE)"
mv 01-known_signatures.nb.html 01-TumorOnly_known_signatures.nb.html

Rscript -e "rmarkdown::render('02-cosmic_dbs_signatures.Rmd', params = list(snv_file = \"snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz\", output_Folder = \"TumorOnlySNV\"), clean = TRUE)"
mv 02-cosmic_dbs_signatures.nb.html 02-TumorOnly_cosmic_dbs_signatures.nb.html
# Run the mutational signatures analysis using COSMIC DBS signatures (v3.3)
Rscript -e "rmarkdown::render('02-cosmic_dbs_signatures.Rmd', params = list(snv_file = \"snv-consensus-plus-hotspots.maf.tsv.gz\", output_Folder = \"ConsensusSNV\"), clean = TRUE)"
mv 02-cosmic_dbs_signatures.nb.html 02-ConsensusSNV_cosmic_dbs_signatures.nb.html

# Run analysis of adult CNS mutational signatures
Rscript --vanilla 03-fit_cns_signatures.R \
--snv_file snv-consensus-plus-hotspots.maf.tsv.gz \
--output_Folder ConsensusSNV
Rscript -e "rmarkdown::render('02-cosmic_dbs_signatures.Rmd', params = list(snv_file = \"snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz\", output_Folder = \"TumorOnlySNV\"), clean = TRUE)"
mv 02-cosmic_dbs_signatures.nb.html 02-TumorOnly_cosmic_dbs_signatures.nb.html

Rscript --vanilla 03-fit_cns_signatures.R \
--snv_file snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz \
--output_Folder TumorOnlySNV

# Run mutational signature summary of hypermutant tumors
## skip script 04 if it is on GitHub CI
if [ "$CI" = true ]; then
echo "Running in GitHub CI"
else
echo "Not running in GitHub CI"
# Run analysis of adult CNS mutational signatures
Rscript --vanilla 03-fit_cns_signatures.R \
--snv_file snv-consensus-plus-hotspots.maf.tsv.gz \
--output_Folder ConsensusSNV

Rscript --vanilla 03-fit_cns_signatures.R \
--snv_file snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz \
--output_Folder TumorOnlySNV

Rscript -e "rmarkdown::render('04-explore_hypermutators.Rmd', params = list(output_Folder = \"ConsensusSNV\"), clean = TRUE)"

## Tumor only result did not have sample passing the filter. Therefore, 04 script is not running for tumor only
#Rscript -e "rmarkdown::render('04-explore_hypermutators.Rmd', params = list(output_Folder = \"TumorOnlySNV\"), clean = TRUE)"

fi
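The reworked `run_mutational_signatures.sh` branches on `OPENPBTA_TESTING` (the workflow matrix entry now sets `openpbta_testing: 1` for this module, presumably passed through as that variable) so CI only processes the consensus SNV file, since the tumor-only file is large. Each step renders an R Markdown notebook and then renames the default `<stem>.nb.html` output, presumably so the consensus and tumor-only runs do not overwrite each other. The sketch below factors that repeated render-then-rename pattern into two hypothetical helpers, `renamed_notebook` and `render_notebook`; neither exists in the repository, and `DRY_RUN` is an assumed switch that prints the commands instead of calling `Rscript`.

```shell
#!/usr/bin/env bash
# Sketch only: the render-then-rename pattern repeated in
# run_mutational_signatures.sh, factored into hypothetical helpers.
set -eu

# "01-known_signatures" + "ConsensusSNV" -> "01-ConsensusSNV_known_signatures.nb.html"
renamed_notebook() {
  local stem="$1" label="$2"
  echo "${stem%%-*}-${label}_${stem#*-}.nb.html"
}

# Render <stem>.Rmd against one SNV file, then give the notebook a labelled name.
# With DRY_RUN=1 the commands are printed instead of executed.
render_notebook() {
  local stem="$1" snv_file="$2" folder="$3" label="$4"
  local expr target
  expr="rmarkdown::render('${stem}.Rmd', params = list(snv_file = \"${snv_file}\", output_Folder = \"${folder}\"), clean = TRUE)"
  target=$(renamed_notebook "$stem" "$label")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "Rscript -e $expr"
    echo "mv ${stem}.nb.html ${target}"
  else
    Rscript -e "$expr"
    mv "${stem}.nb.html" "$target"
  fi
}
```

For example, the first CI step would become `render_notebook 01-known_signatures snv-consensus-plus-hotspots.maf.tsv.gz ConsensusSNV ConsensusSNV`; the tumor-only calls would pass `TumorOnlySNV` as the folder and `TumorOnly` as the label, which reproduces the existing `01-TumorOnly_known_signatures.nb.html` naming.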
5 changes: 3 additions & 2 deletions doc/data-files-description.md
@@ -40,8 +40,9 @@ This document contains information about all data files associated with this pro
|`fusion-putative-oncogenic.tsv` | Analysis file | [`fusion_filtering`](https://github.com/d3b-center/OpenPedCan-analysis/tree/master/analyses/fusion_filtering) | Filtered and prioritized fusions
|`gene-counts-rsem-expected_count-collapsed.rds` | Analysis file | PBTA+GMKF+TARGET [`collapse-rnaseq`](https://github.com/d3b-center/OpenPedCan-analysis/tree/dev/analyses/collapse-rnaseq) | Gene expression - RSEM expected_count for each samples collapsed to gene symbol (gene-level)
|`gene-expression-rsem-tpm-collapsed.rds` | Analysis file | PBTA+GMKF+TARGET [`collapse-rnaseq`](https://github.com/d3b-center/OpenPedCan-analysis/tree/dev/analyses/collapse-rnaseq) | Gene expression - RSEM TPM for each samples collapsed to gene symbol (gene-level)
|`tcga-gene-expression-rsem-tpm-collapsed.rds` | Modified reference file | TCGA samples lifted from GENCODE v27 to v39 | Gene expression - RSEM TPM for each samples collapsed to gene symbol (gene-level)
|`gtex-gene-expression-rsem-tpm-collapsed.rds` | Modified reference file | GTEX v8 release lifted to GENCODE v39 | Gene expression - RSEM TPM for each samples collapsed to gene symbol (gene-level)
|`tcga_gene-counts-rsem-expected_count-collapsed.rds` | Modified reference file | TCGA samples lifted from GENCODE v27 to v39 | Gene expression - RSEM counts for each samples collapsed to gene symbol (gene-level)
|`tcga_gene-expression-rsem-tpm-collapsed.rds` | Modified reference file | TCGA samples lifted from GENCODE v27 to v39 | Gene expression - RSEM TPM for each samples collapsed to gene symbol (gene-level)
|`gtex_gene-expression-rsem-tpm-collapsed.rds` | Modified reference file | GTEX v8 release lifted to GENCODE v39 | Gene expression - RSEM TPM for each samples collapsed to gene symbol (gene-level)
|`gtex_gene-counts-rsem-expected_count-collapsed.rds` | Modified reference file | GTEX v8 release lifted to GENCODE v39 | Gene expression - RSEM counts for each samples collapsed to gene symbol (gene-level)
|`WGS.hg38.lancet.300bp_padded.bed` | Reference Target/Baits File | [SNV and INDEL calling](https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#snv-and-indel-calling) | WGS.hg38.lancet.unpadded.bed file with each region padded by 300 bp
|`WGS.hg38.lancet.unpadded.bed` | Reference Regions File | [SNV and INDEL calling](https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#snv-and-indel-calling) | hg38 WGS regions created using UTR, exome, and start/stop codon features of the GENCODE 31 reference, augmented with PASS variant calls from Strelka2 and Mutect2