-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathexample_usage.Rmd
293 lines (202 loc) · 12.5 KB
/
example_usage.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
---
title: "My typical usage of `mt_helpers`"
author: "Mayank Tandon"
output:
html_document:
toc: true
toc_float: true
toc_depth: 4
code_folding: show
---
---------------------------------
# Example exploration
---------------------------------
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(maftools)
library(openxlsx)
library(ComplexHeatmap)
library(plotly)
library(dplyr)
```
# Source `mt_helpers`
We're just sourcing the functions straight from Github
```{r source-helpers,warning=F}
## For TCGA-related functions
source("https://raw.githubusercontent.com/mtandon09/mt_helpers/main/helper_functions.tcga.R")
## For oncoplot and other MAF plotting functions
source("https://raw.githubusercontent.com/mtandon09/mt_helpers/main/helper_functions.oncoplot.R")
## For mutational signatures functions
source("https://raw.githubusercontent.com/mtandon09/mt_helpers/main/helper_functions.mutsig.R")
```
------------------------------------------
# Get some MAF data
`get_tcga_data` will fetch MAF files from TCGA data using [`TCGABiolinks`](https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html). It just packages the jobs of fetching a MAF file, it's associated clinical data and returns a `maftools` MAF object.
We'll use data from the [TCGA-ACC project](https://portal.gdc.cancer.gov/projects/TCGA-ACC).
```{r get-maf-data,message=F,warning=F,output=F,results='hold', attr.output='style="max-height: 150;"'}
tcga_maf <- get_tcga_data(tcga_dataset = "ACC", variant_caller = "mutect2")
print(tcga_maf)
```
The MAF file will be saved as `./data/TCGA_ACC/mutect2/TCGA-ACC.mutect2.maf`
------------------------------------------
# Filter MAF data
`filter_maf_chunked` is a function to read and filter a MAF file in chunks. This kinda helps for huge MAFs (say > 500k variant or a couple of Gbs), but ultimately you still gotta hold 'em in memory because this is R `:(`, so downstream work may still be clunky/might not work.
It uses another function, `filter_maf_tbl`, which is meant to work on `data.table` tibbles, so you can repurpose that for use with `maftools` objects (i.e. the `maf@data` or `[email protected]` slots).
One key feature of this function is that it will also remove top 50 [exome FLAG genes](https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-014-0064-y) by default.
```{r filter-maf-data,message=T,warning=T,results='hold',attr.output='style="max-height: 150;"'}
source("../helper_functions.oncoplot.R")
# tcga_maf.filtered <- filter_maf_chunked("data/TCGA_ACC/mutect2/TCGA_ACC.mutect2.maf")
tcga_maf.filtered <- filter_maf_chunked(tcga_maf)
print(tcga_maf.filtered)
```
## Count silent vs. non-silent mutations
`plot_silent_nonsilent` returns a `ggplot2` plot, so we can easily make it interactive with `plotly::ggplotly`
### Silent/Non-Silent
```{r silent-nonsilent,message=F,warning=F}
plotly::ggplotly(plot_silent_nonsilent(tcga_maf.filtered))
```
------------------------------------------
# Mutation Burden
## Plot mutation burden {.tabset}
`make_burden_plot` returns a `ggplot2` plot, so we can easily make it interactive with `plotly::ggplotly`
### Dot plot
This is more useful for larger cohorts
```{r burden-dotplot,message=F,warning=F}
plotly::ggplotly(make_burden_plot(tcga_maf.filtered, plotType = "Dotplot"))
```
### Bar plot
This also adds Variant Classification information, but mostly useful for smaller cohorts
```{r burden-barplot,message=F,warning=F}
plotly::ggplotly(make_burden_plot(tcga_maf.filtered, plotType = "Barplot"))
```
------------------------------------------
# Oncoplot
## Basic oncoplot
`make_oncoplot` is my own implementation of the `oncoPrint` function from `ComplexHeatmap` (heavily relying on ideas from `maftools`'s oncoplot function)
```{r oncoplot-1,message=F,warning=F}
myhm <- make_oncoplot(tcga_maf.filtered)
draw(myhm)
```
## Sample annotations
`tcga_clinical_colors` will make some reasonable colors for commonly reported clinical features in TCGA datasets. This can then be input into `make_oncoplot` to draw sample annotations.
Note that this `clin_data` argument must be a `data.table` object (suitable for use with the `clinical.data` slot of a `MAF` object) and must contain sample IDs in a column named 'Tumor_Sample_Barcode'.
```{r oncoplot-anno,message=F,warning=F, out.width="100%", out.height="500px"}
source("../helper_functions.tcga.R")
## maftools converts all columns to factor when storing the clinical data
## So let's turn age back to numeric
sample_annotation_data <- [email protected]
sample_annotation_data$age_at_diagnosis <- as.numeric(as.character(sample_annotation_data$age_at_diagnosis))
## This function returns nice colors for selected clinical data columns
sample_annotation_colors <- tcga_clinical_colors(sample_annotation_data)
anno_columns <- c(names(sample_annotation_colors),"Tumor_Sample_Barcode")
## Remve clinical data columns that don't have custom colors
sample_annotation_data <- sample_annotation_data[,..anno_columns]
## Plot
source("../helper_functions.oncoplot.R")
# myhm <- make_oncoplot(tcga_maf.filtered,
make_oncoplot(tcga_maf.filtered,
savename = "annotated_oncoplot.pdf",
clin_data = sample_annotation_data,
clin_data_colors = sample_annotation_colors
)
# draw(myhm)
knitr::include_graphics(file.path(getwd(),"annotated_oncoplot.pdf"))
```
## ClinVar annotations
`make_oncoplot2` is a more experimental version of `make_oncoplot`. You can do things like plot selected genes and show pathogenicity annotations from ClinVar (if available in the MAF).
```{r oncoplot-2,message=F,warning=F}
## Select just the genes that have pathogenic or uncertain mentions in the ClinVar significance annotation
mygenes <- unique(tcga_maf.filtered@data$Hugo_Symbol[grepl("pathogenic|uncertain",tcga_maf.filtered@data$CLIN_SIG, ignore.case = T)])
myhm <- make_oncoplot2(tcga_maf.filtered,cohort_freq_thresh = NULL, genes_to_plot = mygenes, use_clinvar_anno = T)
draw(myhm)
```
## Sets of selected genes
The `genes_to_plot` argument will also accept a `data.frame` with columns named "`Reason`" and "`Hugo_Symbol`" to define custom sets of genes to display. If `cohort_freq_thresh` is set to a fraction, it will also add frequently mutated genes.
Note that the genes are plotted as given, so a gene can appear several times if it is in multiple "Reason" sets.
```{r oncoplot-3,message=F,warning=F, out.width="100%", out.height="500px"}
## Adapted from: http://yulab-smu.top/clusterProfiler-book/chapter7.html
library(msigdbr)
library(stringr)
## Get human gene sets from msigdb
m_t2g <- msigdbr(species = "Homo sapiens", category = "H") %>%
dplyr::select(gs_name, gene_symbol, gs_subcat)
## Select just ones matching "JAK" or "AKT"
pathwaysdf <- data.frame(m_t2g)
mypaths <- pathwaysdf[grepl("JAK|AKT",pathwaysdf$gs_name, ignore.case = T),]
## Set up a data frame with columns "Reason" and "Hugo_Symbol"
## The "Reason" value is used to label the plot, so here I'm replacing _ with spaces and adding text wrapping with stringr
genes_df <- data.frame(Reason=stringr::str_wrap(gsub("_"," ",gsub("HALLMARK_","",mypaths$gs_name)), width=10),
Hugo_Symbol=mypaths$gene_symbol,
stringsAsFactors = F)
## Sizing can get tricky with larger plots; it will be adjusted automatically for the number of samples and genes if saving to pdf
make_oncoplot2(tcga_maf.filtered,cohort_freq_thresh = 0.05, genes_to_plot = genes_df, use_clinvar_anno = T, savename="customonco.pdf")
## This is just for adding the plot to this markdown document
knitr::include_graphics(file.path(getwd(),"customonco.pdf"))
```
------------------------------------------
# Mutational Signatures
Here we're only talking about single-base substitution (SBS) signatures, focusing on the ones [categorized by **COSMIC v3.1**](https://cancer.sanger.ac.uk/cosmic/signatures/SBS).
## Download COSMIC data and etiologies
I'm storing the manually curated etiology data in my [Github repo](https://github.com/mtandon09/mt_helpers).
```{r download-cosmic-data,message=F,warning=F}
# Download signatures data from my repo
local_signatures_file="../cosmic/COSMIC_Mutational_Signatures_v3.1.xlsx"
local_etiology_file="../cosmic/COSMIC_signature_etiology.xlsx"
if (!file.exists(local_signatures_file)) {
if (!dir.exists(dirname(local_signatures_file))) { dir.create(dirname(local_signatures_file), recursive = T)}
download.file("https://github.com/mtandon09/mt_helpers/blob/main/cosmic/COSMIC_Mutational_Signatures_v3.1.xlsx?raw=true",destfile = local_signatures_file)
}
if (!file.exists(local_signatures_file)) {
if (!dir.exists(dirname(local_signatures_file))) { dir.create(dirname(local_signatures_file), recursive = T)}
download.file("https://github.com/mtandon09/mt_helpers/blob/main/cosmic/COSMIC_signature_etiology.xlsx?raw=true",destfile = local_etiology_file)
}
```
## Inspect individual signatures
Just plotting the first 5 samples here.
```{r plot-individual-signatures,message=F,warning=F, results='hold', attr.output='style="max-height: 400px;"'}
source("../helper_functions.mutsig.R")
# source("https://raw.githubusercontent.com/mtandon09/mt_helpers/main/helper_functions.mutsig.R")
### Sample signatures
plotN=5
make_signature_plot(make_tnm(tcga_maf.filtered)[,1:5])
```
## Plot similarity to COSMIC signatures
The cosine similarity of each individual signature is computed against each COSMIC signature using the [`cosin_sim_matrix`](https://rdrr.io/bioc/MutationalPatterns/man/cos_sim_matrix.html) function from `MutationalPatterns`.
The row annotations are colored by categories according to the `Proposed Etiology` section of each COSMIC signatures page (e.g. for [SBS1](https://cancer.sanger.ac.uk/cosmic/signatures/SBS/SBS1.tt#aetiology)).
I tried several (relatively naïve) ways of clustering/concept-mapping the text with no luck, so instead I used those results to aid manual curation. My proposed solution, used here, can be found in [this repo as an Excel file](https://github.com/mtandon09/mt_helpers/blob/main/cosmic/COSMIC_signature_etiology.xlsx).
**THIS IS EXTREMELY EXPERIMENTAL!**. If you find discrepancies or bad categorizations in these annotations, please [open an issue here](https://github.com/mtandon09/mt_helpers/issues)!
```{r plot-cosmic-similarity,message=F,warning=F, out.width="600px", results='hide', out.width="100%", out.height="500px"}
### COSMIC signatures
source("../helper_functions.mutsig.R")
## maftools converts all columns to factor when storing the clinical data
## So let's turn age back to numeric
sample_annotation_data <- [email protected]
sample_annotation_data$age_at_diagnosis <- as.numeric(as.character(sample_annotation_data$age_at_diagnosis))
## This function returns nice colors for selected clinical data columns
sample_annotation_colors <- tcga_clinical_colors(sample_annotation_data)
anno_columns <- c(names(sample_annotation_colors),"Tumor_Sample_Barcode")
## Remve clinical data columns that don't have custom colors
sample_annotation_data <- sample_annotation_data[,..anno_columns]
make_mut_signature_heatmap(tcga_maf.filtered,signatures_file = local_signatures_file, etio_data_xlsx = local_etiology_file,
savename="cosmic.pdf",
clin_data = sample_annotation_data, clin_data_colors = sample_annotation_colors)
## This is just for adding the plot to this markdown document
knitr::include_graphics(file.path(getwd(),"cosmic.pdf"))
```
# Co-occurence and overlaps
## Visualize overlap between samples
`make_overlap_plot` can plot a square matrix heatmap of pair-wise overlaps, or show the same information in a ribbon plot (which is not usually helpful, but it looks pretty sometimes).
Note that the `summarize_by` can be set to 'gene' to count overlaps by gene (one or more mutations), or set to anything else to count by variant position and protein change.
This is also coded very naively so it's very slow for large cohorts.
Also both plotting methods (using `pheatmap` or `circlize`) do not lend well to passing objects, so the plots are printed to PDF.
```{r overlap-hm,message=F,warning=F, out.width="100%", out.height="500px"}
source("../helper_functions.oncoplot.R")
make_overlap_plot(tcga_maf.filtered, plotType = c("heatmap","ribbon"), savename = file.path(getwd(),"overlap.pdf"),savewidth = 12, saveheight = 12)
## This is just for adding the plot to this markdown document
knitr::include_graphics(file.path(getwd(),"overlap.pdf"))
```
------------------------------------------
# R Session Info
``` {r print-session-info, attr.output='style="max-height: 150px;"'}
sessionInfo()
```