-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathggplot2.Rmd
340 lines (252 loc) · 8.21 KB
/
ggplot2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
# ggplot2 package
```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width=5, fig.height=4,
echo=TRUE, warning=FALSE, message=FALSE)
```
* Graphing package inspired by the **G**rammar of **G**raphics work of Leland Wilkinson.
* A tool that enables to concisely describe the components of a graphic.
* Why ggplot2 ?
+ Flexible
+ Customizable
+ Pretty !
+ Well documented
* We will see:
* Scatter plots
* Box plots
* Dot plots
* Bar plots
* Histograms
* How to save plots
* Volcano plots
## Getting started
* All ggplots start with a **base layer** created with the **ggplot()** function:
```{r, eval=F, echo=TRUE}
ggplot(data=dataframe, mapping=aes(x=column1, y=column2))
```
*The base layer is setting the grounds but NOT plotting anything*
* Add a layer (with the **+** sign) that describes what kind of plot you want.
## Scatter plot
```{r, eval=F}
# Example of a scatter plot: add the geom_point() layer
ggplot(data=dataframe, mapping=aes(x=column1, y=column2)) + geom_point()
```
* Example of a simple scatter plot:
```{r}
# Create a data frame
df1 <- data.frame(sample1=rnorm(200), sample2=rnorm(200))
# Plot !
ggplot(data= df1 , mapping=aes(x=sample1, y=sample2)) +
geom_point()
```
* Add **layers** to that object to customize the plot:
* **ggtitle** to add a title
* **geom_vline** to add a vertical line
* etc.
```{r}
ggplot(data= df1 , mapping=aes(x=sample1, y=sample2)) +
geom_point() +
ggtitle(label="my first ggplot") +
geom_vline(xintercept=0)
```
Bookmark that [ggplot2 reference](https://ggplot2.tidyverse.org/reference/) and that good [cheatsheet](https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf) for some of the ggplot2 options.
* You can save the plot in an object **at any time** and add layers to that object:
```{r}
# Save in an object
p <- ggplot(data= df1 , mapping=aes(x=sample1, y=sample2)) +
geom_point()
# Add layers to that object
p + ggtitle(label="my first ggplot")
```
* What is inside the **aes** (aesthetics)function ?
* Anything that varies according to your data !
* Columns with values to be plotted.
* Columns with which you want to, for example, color the points.
Color all points in red (not depending on the data):
```{r}
ggplot(data=df1 , mapping=aes(x=sample1, y=sample2)) +
geom_point(color="red")
```
Color the points according to another column in the data frame:
```{r}
# Build a data frame from df1: add a column with "yes" and "no"
df2 <- data.frame(df1, grouping=rep(c("yes", "no"), c(80, 120)))
# Plot and add the color parameter in the aes():
pscat <- ggplot(data=df2, mapping=aes(x=sample1, y=sample2, color=grouping)) +
geom_point()
```
## Box plots
* Simple boxplot showing the data distribution of sample 1:
```{r}
ggplot(data=df2, mapping=aes(x="", y=sample1)) + geom_boxplot()
```
* Split the data into 2 boxes:
```{r}
ggplot(data=df2, mapping=aes(x=grouping, y=sample1)) + geom_boxplot()
```
* What if you want to plot both sample1 and sample2 ?<br>
*You need to convert into a **long** format*
Plotting both sample1 and sample2:
```{r, eval=FALSE}
# install package reshape2
install.packages("reshape2")
```
```{r}
# load package
library("reshape2")
# convert to long format
df_long <- melt(data=df2)
# all numeric values are organized into only one column: value
# plot:
ggplot(data=df_long, mapping=aes(x=variable, y=value)) +
geom_boxplot()
```
* What if now you also want to see the distribution of "yes" and "no" in both sample1 and sample2 ?<br>
*Integrate a parameter to the **aes()***
```{r}
# Either color (color of the box border)
ggplot(data=df_long, mapping=aes(x=variable, y=value, color=grouping)) +
geom_boxplot()
```
```{r}
# Or fill (color inside the box)
ggplot(data=df_long, mapping=aes(x=variable, y=value, fill=grouping)) +
geom_boxplot()
```
Do you want to change the default colors?<br>
* Integrate either layer:
* **scale_color_manual()**
* **scale_fill_manual**
```{r}
pbox <- ggplot(data=df_long, mapping=aes(x=variable, y=value, fill=grouping)) +
geom_boxplot() +
scale_fill_manual(values=c("slateblue2", "chocolate"))
```
## Dot plots
Example of the expression of a gene in 6 samples: 2 experimental groups in triplicates.
```{r}
# create a named vector with the expression of a gene
mygene <- c(8.1, 8.2, 8.6, 8.7, 9.4, 8.5)
# the names of each element of the vector are sample names
names(mygene) <- c("KO1", "KO2", "KO3", "WT1", "WT2", "WT3")
# transform to long format
mygenelong <- melt(data=mygene)
# add new columns containing sample names and experimental groups
mygenelong$sample_name <- rownames(mygenelong)
mygenelong$group <- gsub("[1-3]{1}", "", mygenelong$sample_name)
# dot plot
# add labels with "label" in the aes() and layer geom_text()
# nudge_x adjust the labels horizontally
pdot <- ggplot(data=mygenelong, mapping=aes(x=group, y=value, col=group, label=sample_name)) +
geom_point() +
geom_text(nudge_x=0.2)
```
* Add more layers:
* **xlab()** to change the x axis label
* **ylab()** to change the y axis label
* **theme** to manage the legend
```{r}
pdot + xlab(label="Experimental group") +
ylab(label="Normalized expression (log2)") +
ggtitle(label="Expression of gene 1") +
theme_bw() +
theme(legend.position="none")
```
## Bar plots
```{r}
# A simple bar plot
ggplot(data=df2, mapping=aes(x=grouping)) + geom_bar()
```
* Customize:
* **scale_x_discrete** is used to handle x-axis title and labels
* **coord_flip** swaps the x and y axis
```{r}
# Save the plot in the object "p"
pbar <- ggplot(data=df2, mapping=aes(x=grouping, fill=grouping)) +
geom_bar()
# Change x axis label with scale_x_discrete and change order of the bars:
p2 <- pbar + scale_x_discrete(name="counts of yes / no", limits=c("yes", "no"))
# Swapping x and y axis with coord_flip():
p3 <- p2 + coord_flip()
# Change fill
p4 <- p3 + scale_fill_manual(values=c("yellow", "cyan"))
# Show intermediary and final plots
pbar
p2
p3
p4
```
## Histograms
Simple histogram on one sample (using the df2 data frame):
```{r}
ggplot(data=df1, mapping=aes(x=sample1)) + geom_histogram()
```
Histogram on more samples (using df_long):
```{r}
ggplot(data=df_long, mapping=aes(x=value)) + geom_histogram()
```
Split the data per sample ("variable" column that represents here the samples):
```{r}
ggplot(data=df_long, mapping=aes(x=value, fill=variable)) + geom_histogram()
```
By default, the histograms are **stacked**: change to position **dodge** (side by side):
```{r}
phist <- ggplot(data=df_long, mapping=aes(x=value, fill=variable)) +
geom_histogram(position='dodge')
```
## About themes
You can change the default global **theme** (background color, grid lines etc. all non-data display):
```{r}
# go back to a previous plot
p <- ggplot(data=df_long, mapping=aes(x=value)) + geom_histogram()
# Try different themes
p + theme_bw()
p + theme_minimal()
p + theme_void()
p + theme_grey()
p + theme_dark()
p + theme_light()
```
## Saving plots in files
* The same as for regular plots applies:
```{r, eval=F}
png("myggplot.png")
p
dev.off()
```
* You can also use the ggplot2 **ggsave** function:
```{r, eval=F}
# By default, save the last plot that was produced
ggsave(filename="lastplot.png")
# You can pick which plot you want to save:
ggsave(filename="myplot.png", plot=p)
# Many different formats are available:
# "eps", "ps", "tex", "pdf", "jpeg", "tiff", "png", "bmp", "svg", "wmf"
ggsave(filename="myplot.ps", plot=p, device="ps")
# Change the height and width (and their unit):
ggsave(filename="myplot.pdf",
width = 20,
height = 20,
units = "cm")
```
* You can also organize several plots on one page
* One way is to use the **gridExtra** package:
* ncol, nrow: arrange plots in such number of columns and rows
```{r, eval=F}
install.packages("gridExtra")
```
```{r}
# load package
library(gridExtra)
# 2 rows and 2 columns
grid.arrange(pscat, pbox, pbar, phist, nrow=2, ncol=2)
```
```{r, fig.width=10, fig.height=4,}
# 1 row and 4 columns
grid.arrange(pscat, pbox, pbar, phist, nrow=1, ncol=4)
```
**WARNING !!**: ggsave and grid.arrange are not directly compatible. To save a file organized by grid.arrange, use the regular functions (pdf, png etc.)
```{r, eval=FALSE}
jpeg("grid_arrange_plots.jpg")
grid.arrange(pscat, pbox, pbar, phist, nrow=1, ncol=4)
dev.off()
```