-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathGSE147507 CoV2.Rmd
275 lines (194 loc) · 10.2 KB
/
GSE147507 CoV2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
---
title: "GSE147507"
author: "Scott Presnell"
date: "3/26/2020"
output: pdf_document
---
## Executive Summary
One of the greatest threats to humanity is the emergence of a pandemic virus. Among those with the greatest potential for such an event include influenza viruses and coronaviruses. In the last century alone, we have observed four major influenza A virus pandemics as well as the emergence of three highly pathogenic coronaviruses including SARS-CoV-2, the causative agent of the ongoing COVID-19 pandemic. As no effective antiviral treatments or vaccines are presently available against SARS-CoV-2, it is important to understand the host response to this virus as this may guide the efforts in development towards novel therapeutics. Here, we offer the first in-depth characterization of the host transcriptional response to SARS-CoV-2 and other respiratory infections through in vitro and ex vivo model systems. Our data demonstrate that each virus elicits both core antiviral components as well as unique transcriptional footprints. Compared to the response to influenza A virus (IAV) and respiratory syncytial virus (RSV), SARS-CoV-2 elicits a muted response that lacks robust induction of a subset of cytokines including the Type I and Type III interferons as well as a numerous chemokines. Taken together, these data suggest that the unique transcriptional signature of this virus may be responsible for the development of COVID-19tion
Independent biological triplicates of primary human lung epithelium (NHBE) and transformed lung alveolar (A549) cells were mock treated or infected with SARS-CoV-2 (USA-WA1/2020) at different MOI (NHBE: 2, A549: 0.2). Additionally independent biological duplicates of A549 cells were Mock treated or infected with RSV (A2 strain, MOI 15) or IAV (A/Puerto Rico/8/1934 (H1N1, MOI 5)). mRNA enriched libraries were prepared from total RNA extractions of mock treated or virus infected cells using the TruSeq RNA Library Prep Kit v2 (A549) or TruSeq Stranded mRNA LP (NHBE) according to the manufacturer’s instructions. cDNA libraries were sequenced using an Illumina NextSeq 500 platform. Raw sequencing reads were aligned to the human genome (hg19) using the RNA-Seq Alignment App on Basespace (Illumina, CA)
Taken directly from [GSE147507](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE147507) SARS-Cov-2 Dataset from tenOever lab at the Icahn School of Medicine at Mt. Siani
```{r setup, echo=FALSE, message=FALSE, warning=FALSE}
library(knitr)
library(tidyverse) #loads ggplot2, tidyr, dplyr, stringr
library(magrittr)
library(readxl)
library(limma)
library(edgeR)
library(ggrepel)
library(ggcorrplot)
library(DGETools) # SRP Volcano
library(RNAseQC) # MJD calc_PCcors
opts_chunk$set(cache=FALSE, autodep = TRUE, fig.width=6, fig.height=4.0, echo=FALSE, warning=FALSE, message=FALSE)
options(digits=4)
project <- "GSE147507"
dataDir <- "data"
resultsDir <- "results"
dataFile <- file.path(dataDir, paste(project, "data.Rdata", sep="_"))
normCountsFile <- file.path(resultsDir, paste(project, "TMM", "NormCounts.csv", sep="_"))
# QC Cuts
cut.alignment <- 0.75
cut.medianCV <- 0.75
cut.filter <- 0.1
# Stat cuts
cut.fc <- 1.5
cut.pval <- 0.05
# set the ggplot theme
theme_set(theme_bw() + theme(text = element_text(face="bold", size=14), plot.title = element_text(face="bold", hjust = 0.5)))
```
```{r loading}
counts <- read.table(file=file.path(dataDir, "GSE147507_RawReadCounts.tsv"), row.names=1)
#counts <- counts[,order(colnames(counts))]
design <- read.csv(file=file.path(dataDir, "design.csv"), row.names = 1)
```
\newpage
## QC Measures
Checking aligned read counts - that's all we have access to with this data.
```{r barcounts}
tcounts <-data.frame(alignedCounts=apply(counts, 2, sum)/1e6, base=factor(design$Base))
#tcounts <-data.frame(alignedCounts=apply(counts, 2, sum)/1e6)
ggplot(tcounts, aes(reorder(rownames(tcounts), alignedCounts), alignedCounts, fill=base)) +
#ggplot(tcounts, aes(reorder(rownames(tcounts), alignedCounts), alignedCounts)) +
labs(fill='Experiment') +
geom_bar(stat="identity") + theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("Library") + ylab("Aligned Counts / 1e6") +
ggtitle("GSE147507 QC: Library Counts")
```
```{r filter}
filteredCounts <- select(counts, !grep("X3", colnames(counts)))
filteredDesign <- subset(design, !grepl("X3", rownames(design)))
# check for coherency of the rows and columns
#all(colnames(filteredCounts) == rownames(filteredDesign))
```
```{r PreliminaryEdgeR}
filteredGenes <- rownames(filteredCounts)
# and sanity check
#table(rownames(filteredCounts) == filteredGenes$ensembl_gene_id)
d<-DGEList(counts=filteredCounts, genes=filteredGenes)
d<-calcNormFactors(d)
keepRows <- rowSums(round(cpm(d$counts)) >= 1) >= cut.filter*ncol(filteredCounts)
#table(keepRows)
curDGE <-d[keepRows,]
curDGE <- calcNormFactors(curDGE)
# if (!file.exists(normCountsFile)) {
# normAnnot <- cbind(curDGE$genes, cpm(curDGE))
# write.csv(normAnnot, file=normCountsFile, quote=F)
# }
```
```{r PCASetup}
normCounts <- cpm(curDGE, log=T)
pcaResult <- prcomp(t(normCounts))
# check for coherent row order.
#table(rownames(pcaResult$x) == filteredDesign$libraryId)
pcaSummary <- summary(pcaResult)
pcaDesign <- cbind(pcaResult$x, filteredDesign)
pcaDesign$label <- paste0("PC", 1:length(pcaSummary$importance[2,]),
" (", round(pcaSummary$importance[2,]*100, 1), "%)")
```
\newpage
## PCA Plots
First all samples, then just the epithelial cell based samples...
```{r PCAPlot}
ggplot(data=pcaDesign, aes(x=PC1, y=PC2)) +
geom_point(aes(color=Base, shape=Treatment), size=3) +
labs(x = pcaDesign$label[1], y = pcaDesign$label[2], color="Base", shape="Treatment") +
theme_bw() +
ggtitle("GSE147507: PCA By Sample Type")
```
```{r compareSARS}
filteredSARSDesign <- filteredDesign %>% subset(Base == "SARS004")
filteredSARSCounts <- filteredCounts[,colnames(filteredCounts) %in% rownames(filteredSARSDesign)]
d<-DGEList(counts=filteredSARSCounts, genes=rownames(filteredSARSCounts))
d<-calcNormFactors(d)
keepRows <- rowSums(round(cpm(d$counts)) >= 1) >= cut.filter*ncol(filteredSARSCounts)
#table(keepRows)
curSARSDGE <-d[keepRows,]
curSARSDGE <- calcNormFactors(curSARSDGE)
```
```{r PCASetupSARS}
normCounts <- cpm(curSARSDGE, log=T)
pcaResult <- prcomp(t(normCounts))
# check for coherent row order.
#table(rownames(pcaResult$x) == filteredDesign$libraryId)
pcaSummary <- summary(pcaResult)
pcaDesign <- cbind(pcaResult$x, filteredSARSDesign)
pcaDesign$label <- paste0("PC", 1:length(pcaSummary$importance[2,]),
" (", round(pcaSummary$importance[2,]*100, 1), "%)")
```
```{r PCAPlotSARS}
ggplot(data=pcaDesign, aes(x=PC1, y=PC2)) +
geom_point(aes(color=Treatment), size=3) +
labs(x = pcaDesign$label[1], y = pcaDesign$label[2], color="Treatment") +
theme_bw() +
ggtitle("GSE147507: PCA of Epitheilal cells By Treatment")
```
```{r SARSDGE}
testDGE <- curSARSDGE
testDesign <- filteredSARSDesign
testDesign$Treatment <- factor(testDesign$Treatment, levels=c("mock", "CoV2"))
#table(testDesign$shortName, testDesign$treatment)
intMM <- model.matrix(~Treatment, data=testDesign)
colnames(intMM) <- c("(Intercept)", "postTreatment")
SARSVoom <- voomWithQualityWeights(testDGE, design=intMM)
fit <- lmFit(SARSVoom, intMM)
fit2 <- eBayes(fit)
```
\newpage
## Differential Expression
Primary human bronchial epithelial cells (NHBE) infected with SARS-CoV-2 vs mock infection (three samples per group)
```{r PostvsPreSARS004.Volcano, fig.width=10.0, fig.height=10.0, results='asis'}
Volcano(fit2, "postTreatment", "GSE147507 Lung Epithelial SARS-CoV-2 infected vs mock", labelSource = "genes", write=T)
```
```{r compareLung}
filteredLungACDesign <- filteredDesign %>% subset(Base == "CoV002")
filteredLungACCounts <- filteredCounts[,colnames(filteredCounts) %in% rownames(filteredLungACDesign)]
d<-DGEList(counts=filteredLungACCounts, genes=rownames(filteredLungACCounts))
LungACDGE<-calcNormFactors(d)
keepRows <- rowSums(round(cpm(LungACDGE$counts)) >= 1) >= cut.filter*ncol(filteredLungACCounts)
#table(keepRows)
curLungACDGE <-LungACDGE[keepRows,]
curLungACDGE <- calcNormFactors(curLungACDGE)
```
```{r LungACDGE}
testDGE <- curLungACDGE
testDesign <- filteredLungACDesign
testDesign$Treatment <- factor(testDesign$Treatment, levels=c("mock", "CoV2"))
#table(testDesign$shortName, testDesign$treatment)
intMM <- model.matrix(~Treatment, data=testDesign)
colnames(intMM) <- c("(Intercept)", "postInfectionLungAC")
LungACVoom <- voomWithQualityWeights(testDGE, design=intMM)
fit <- lmFit(LungACVoom, intMM)
fit2 <- eBayes(fit)
```
\newpage
Lung adenocarcinoma cells (A549) infected with SARS-CoV-2 vs mock infection (three samples per group)
```{r PostvsPreLungAC004.Volcano, fig.width=10.0, fig.height=10.0, results='asis'}
Volcano(fit2, "postInfectionLungAC", "GSE147507 Lung Adeno SARS-CoV-2 infected vs mock", labelSource = "genes", write=T)
```
```{r compareLungRSV}
filteredLungRSVDesign <- filteredDesign %>% subset(Base == "svRNA184")
filteredLungRSVCounts <- filteredCounts[,colnames(filteredCounts) %in% rownames(filteredLungRSVDesign)]
d<-DGEList(counts=filteredLungRSVCounts, genes=rownames(filteredLungRSVCounts))
LungRSVDGE<-calcNormFactors(d)
keepRows <- rowSums(round(cpm(LungRSVDGE$counts)) >= 1) >= cut.filter*ncol(filteredLungRSVCounts)
#table(keepRows)
curLungRSVDGE <-LungRSVDGE[keepRows,]
curLungRSVDGE <- calcNormFactors(curLungRSVDGE)
```
```{r LungRSVDGE}
testDGE <- curLungRSVDGE
testDesign <- filteredLungRSVDesign
testDesign$Treatment <- factor(testDesign$Treatment, levels=c("mock", "RSV"))
#table(testDesign$shortName, testDesign$treatment)
intMM <- model.matrix(~Treatment, data=testDesign)
colnames(intMM) <- c("(Intercept)", "postInfectionRSV")
LungRSVVoom <- voomWithQualityWeights(testDGE, design=intMM)
fit <- lmFit(LungRSVVoom, intMM)
fit2 <- eBayes(fit)
```
\newpage
Lung adenocarcinoma cells (A549) infected with RSV vs mock infection (two samples per group)
```{r PostvsPreLungRSV004.Volcano, fig.width=10.0, fig.height=10.0, results='asis'}
Volcano(fit2, "postInfectionRSV", "GSE147507 Lung AC cells RSV infected vs mock", labelSource = "genes", write=T)
```