Packages
@@ -477,7 +477,7 @@
Loading the data
We can read in a file from a path on our computer or on the web and use this as the value. Note that we need to put quotes (“”) around file paths.
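As a minimal sketch of this step, the example below writes a tiny CSV to a temporary file and reads it back with read_csv() from readr. The file contents, the path and the object name toy_sampleinfo are hypothetical stand-ins, not the files used in this tutorial. A web address in quotes can be passed to read_csv() in the same way.

```r
library(readr)

# Hypothetical stand-in for a sample information file (not the tutorial's data).
toy_path <- tempfile(fileext = ".csv")
writeLines(c("sample_id,immunophenotype",
             "S1,basal",
             "S2,luminal"),
           toy_path)

# The path is a character string. A path typed directly into the call needs
# quotes, e.g. read_csv("data/myfile.csv") (hypothetical file name).
toy_sampleinfo <- read_csv(toy_path)

# Column names of the object we just created.
colnames(toy_sampleinfo)
```

Here the path is stored in a variable first, so the quotes appear where the string is created rather than inside the call to read_csv().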
Assignment operator shortcut
-In RStudio, typing Alt + - (push Alt at the same time as the - key) will write <-
in a single keystroke in a PC, while typing > Option + - (push Option at the same time as the - key) does the same in a Mac.
+In RStudio, typing Alt + - (push Alt at the same time as the - key) will write <- in a single keystroke in Windows, while typing Option + - (push Option at the same time as the - key) does the same on a Mac.
@@ -556,8 +556,9 @@
Getting to know the data
colnames(sampleinfo)
-
-
[1] "X1" "characteristics" "immunophenotype" "developmental stage"
+
+
[1] "X1" "characteristics" "immunophenotype"
+[4] "developmental stage"
@@ -567,9 +568,10 @@
Getting to know the data
sampleinfo$X1
-
-
[1] "GSM1480291" "GSM1480292" "GSM1480293" "GSM1480294" "GSM1480295" "GSM1480296" "GSM1480297"
- [8] "GSM1480298" "GSM1480299" "GSM1480300" "GSM1480301" "GSM1480302"
+
+
[1] "GSM1480291" "GSM1480292" "GSM1480293" "GSM1480294" "GSM1480295"
+ [6] "GSM1480296" "GSM1480297" "GSM1480298" "GSM1480299" "GSM1480300"
+[11] "GSM1480301" "GSM1480302"
@@ -613,35 +615,42 @@
Getting to know the data
summary(counts)
-
-
X1 gene_symbol GSM1480291 GSM1480292 GSM1480293
- Length:23735 Length:23735 Min. : 0.000 Min. : 0.000 Min. : 0.00
- Class :character Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
- Mode :character Mode :character Median : 1.745 Median : 1.891 Median : 0.92
- Mean : 42.132 Mean : 42.132 Mean : 42.13
- 3rd Qu.: 29.840 3rd Qu.: 29.604 3rd Qu.: 21.91
- Max. :12525.066 Max. :12416.211 Max. :49191.15
- GSM1480294 GSM1480295 GSM1480296 GSM1480297
- Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000
- 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
- Median : 0.89 Median : 0.58 Median : 0.54 Median : 2.158
- Mean : 42.13 Mean : 42.13 Mean : 42.13 Mean : 42.132
- 3rd Qu.: 19.92 3rd Qu.: 12.27 3rd Qu.: 12.28 3rd Qu.: 27.414
- Max. :55692.09 Max. :111850.87 Max. :108726.08 Max. :10489.311
- GSM1480298 GSM1480299 GSM1480300 GSM1480301
- Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
- 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
- Median : 2.254 Median : 1.854 Median : 1.816 Median : 1.629
- Mean : 42.132 Mean : 42.132 Mean : 42.132 Mean : 42.132
- 3rd Qu.: 26.450 3rd Qu.: 24.860 3rd Qu.: 23.443 3rd Qu.: 23.443
- Max. :10662.486 Max. :15194.048 Max. :17434.935 Max. :19152.728
- GSM1480302
- Min. : 0.000
- 1st Qu.: 0.000
- Median : 1.749
- Mean : 42.132
- 3rd Qu.: 24.818
- Max. :15997.193
+
+
X1 gene_symbol GSM1480291
+ Length:23735 Length:23735 Min. : 0.000
+ Class :character Class :character 1st Qu.: 0.000
+ Mode :character Mode :character Median : 1.745
+ Mean : 42.132
+ 3rd Qu.: 29.840
+ Max. :12525.066
+ GSM1480292 GSM1480293 GSM1480294
+ Min. : 0.000 Min. : 0.00 Min. : 0.00
+ 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00
+ Median : 1.891 Median : 0.92 Median : 0.89
+ Mean : 42.132 Mean : 42.13 Mean : 42.13
+ 3rd Qu.: 29.604 3rd Qu.: 21.91 3rd Qu.: 19.92
+ Max. :12416.211 Max. :49191.15 Max. :55692.09
+ GSM1480295 GSM1480296 GSM1480297
+ Min. : 0.00 Min. : 0.00 Min. : 0.000
+ 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
+ Median : 0.58 Median : 0.54 Median : 2.158
+ Mean : 42.13 Mean : 42.13 Mean : 42.132
+ 3rd Qu.: 12.27 3rd Qu.: 12.28 3rd Qu.: 27.414
+ Max. :111850.87 Max. :108726.08 Max. :10489.311
+ GSM1480298 GSM1480299 GSM1480300
+ Min. : 0.000 Min. : 0.000 Min. : 0.000
+ 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
+ Median : 2.254 Median : 1.854 Median : 1.816
+ Mean : 42.132 Mean : 42.132 Mean : 42.132
+ 3rd Qu.: 26.450 3rd Qu.: 24.860 3rd Qu.: 23.443
+ Max. :10662.486 Max. :15194.048 Max. :17434.935
+ GSM1480301 GSM1480302
+ Min. : 0.000 Min. : 0.000
+ 1st Qu.: 0.000 1st Qu.: 0.000
+ Median : 1.629 Median : 1.749
+ Mean : 42.132 Mean : 42.132
+ 3rd Qu.: 23.443 3rd Qu.: 24.818
+ Max. :19152.728 Max. :15997.193
@@ -748,8 +757,8 @@
Plotting with ggplot2
ggplot(data = allinfo, mapping = aes(x = Sample, y = Count)) +
geom_boxplot()
-
-
+
+
@@ -761,8 +770,8 @@
Plotting with ggplot2
ggplot(data = allinfo, mapping = aes(x = Sample, y = log2(Count))) +
geom_boxplot()
-
-
+
+
@@ -773,15 +782,15 @@
Plotting with ggplot2
ggplot(data = allinfo, mapping = aes(x = Sample, y = log2(Count + 1))) +
geom_boxplot()
-
-
+
+
The box plots show that the distributions of the samples are not identical, but they are not very different.
Box plots are useful summaries, but they hide the shape of the distribution. For example, if the distribution is bimodal, we would not see it in a box plot. An alternative to the box plot is the violin plot, where the shape (of the density of points) is drawn. See here for an example of how differences in distribution may be hidden in box plots but revealed with violin plots. We could also make jitter plots. A jitter plot is similar to a scatter plot, but it adds a small amount of random variation to the location of each point so that the points don’t overlap. There are too many points in this case for jitter plots to be useful, but we make one here to demonstrate, as jitter plots, with or without a box plot, are commonly used in ggplot2. We will also make use of jitter plots later.
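To illustrate the point about hidden shape, here is a small sketch using simulated values rather than the tutorial's counts. The data frame toy and its columns Group and Value are assumptions made up for this example: one group is unimodal and the other is bimodal, with both centred on roughly the same value.

```r
library(ggplot2)

set.seed(1)  # make the simulated values reproducible

# Simulated values (assumption, not the tutorial's counts): one unimodal group
# and one bimodal group centred on the same value.
toy <- data.frame(
  Group = rep(c("unimodal", "bimodal"), each = 200),
  Value = c(rnorm(200, mean = 5, sd = 2),
            rnorm(100, mean = 3, sd = 0.5), rnorm(100, mean = 7, sd = 0.5))
)

# Box plot: each group is summarised by its median and quartiles.
p_box <- ggplot(data = toy, mapping = aes(x = Group, y = Value)) +
  geom_boxplot()

# Violin plot: the outline follows the estimated density, so two modes can show up.
p_violin <- ggplot(data = toy, mapping = aes(x = Group, y = Value)) +
  geom_violin()

# Jitter plot: points are shifted sideways at random so they overlap less.
p_jitter <- ggplot(data = toy, mapping = aes(x = Group, y = Value)) +
  geom_jitter(width = 0.2, height = 0)

p_box
p_violin
p_jitter
```

Setting height = 0 in geom_jitter() keeps the y values exact and only spreads the points horizontally, which is usually what we want when the x axis is categorical.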
-
-
Exercise
+
+
Exercises
You can easily make different types of plots with ggplot by using different geoms. Using the same data (same x and y values), try editing the code above to make the plots listed in 1, 2 and 3.
- Make a violin plot (geom_violin)
@@ -796,8 +805,8 @@ Exercise
ggplot(data = allinfo, mapping = aes(x = Sample, y = log2(Count + 1), colour = Sample)) +
geom_boxplot()
-
-
+
+
@@ -808,16 +817,16 @@ Exercise
ggplot(data = allinfo, mapping = aes(x = Sample, y = log2(Count + 1), fill = Sample)) +
geom_boxplot()
-
-
+
+
That looks better. fill = is used to fill in areas in ggplot2 plots, whereas colour = is used to colour lines and points.
A really nice feature about ggplot is that we can easily colour by another variable by simply changing the column we give to fill =.
-
-
Exercise
+
+
Exercises
Modify the plot above. Colour by other variables (columns) in the metadata file:
- characteristics
@@ -870,7 +879,7 @@ Make subplots for each gene
Note on specifying genes
This example demonstrates how we could specify any genes in the data to plot. The genes used here were the 8 genes with the highest counts summed across all samples. The command to get the gene symbols for these 8 genes is shown below.
-allinfo %>%
+mygenes <- allinfo %>%
group_by(gene_symbol) %>%
summarise(Total_count = sum(Count)) %>%
arrange(desc(Total_count)) %>%
@@ -893,8 +902,8 @@
Note on specifying genes
geom_boxplot() +
facet_wrap(~ gene_symbol)
-
-
+
+
@@ -906,8 +915,8 @@ Note on specifying genes
geom_point() +
facet_wrap(~ gene_symbol)
-
-
+
+
@@ -919,8 +928,8 @@ Note on specifying genes
geom_jitter() +
facet_wrap(~ gene_symbol)
-
-
+
+
@@ -932,8 +941,8 @@ Note on specifying genes
geom_jitter() +
facet_wrap(~ gene_symbol)
-
-
+
+
@@ -959,8 +968,8 @@ Specifying colours
facet_wrap(~ gene_symbol) +
scale_colour_manual(values = mycolours)
-
-
+
+
@@ -973,13 +982,13 @@ Specifying colours
facet_wrap(~ gene_symbol) +
scale_colour_brewer(palette = "Dark2")
-
-
+
+
-
-
Exercise
+
+
Exercises
Make a colourblind-friendly plot. Hint: there are colourblind-friendly palettes here.
@@ -994,8 +1003,8 @@ Axis labels and Title
facet_wrap(~ gene_symbol) +
labs(x = "Cell type and stage", y = "Count", title = "Mammary gland RNA-seq data")
-
-
+
+
@@ -1012,8 +1021,8 @@ Themes
labs(x = "Cell type and stage", y = "Count", title = "Mammary gland RNA-seq data") +
theme(axis.text.x = element_text(angle = 90))
-
-
+
+
@@ -1029,8 +1038,8 @@ Themes
theme_bw() +
theme(axis.text.x = element_text(angle = 90))
-
-
+
+
@@ -1044,8 +1053,8 @@ Themes
theme_minimal() +
theme(axis.text.x = element_text(angle = 90))
-
-
+
+
@@ -1063,8 +1072,8 @@ Themes
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
-
-
+
+
@@ -1089,16 +1098,16 @@ Order of groups
-Take a look at the data.
+Take a look at the data. As the table is quite wide, we can use select() to select just the columns we want to view.
-
-mygenes_counts
+
+mygenes_counts %>% select(X1, Group, Group_f)
-
+
@@ -1149,8 +1158,8 @@ Order of groups
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
-
-
+
+
@@ -1159,7 +1168,7 @@ Order of groups
Saving plots
-
We can save plots interactively by clicking Export in the Plots window. Or we can output plots to pdf using pdf() followed by dev.off(). We put our plot code after the call to pdf() and before closing the plot device with dev.off().
+
We can save plots interactively by clicking Export in the Plots window and saving as e.g. “myplot.pdf”. Or we can output plots to pdf using pdf() followed by dev.off(). We put our plot code after the call to pdf() and before closing the plot device with dev.off().
Let’s save our last plot.
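The pdf() and dev.off() pattern can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's own code: toy and toy_plot are made-up stand-ins for the real plot, and the output goes to a temporary file rather than a named file in the working directory.

```r
library(ggplot2)

# Toy data standing in for the tutorial's plot (assumption).
toy <- data.frame(Sample = c("A", "A", "B", "B"), Count = c(1, 3, 2, 5))
toy_plot <- ggplot(data = toy, mapping = aes(x = Sample, y = Count)) +
  geom_boxplot()

# Hypothetical output file; a temporary path is used so nothing is overwritten.
out_file <- tempfile(fileext = ".pdf")

pdf(out_file)    # open the PDF graphics device
print(toy_plot)  # draw the plot onto the open device
dev.off()        # close the device so the file is written

file.exists(out_file)
```

Wrapping the plot in print() is only strictly needed inside scripts, loops or functions, where ggplot objects are not printed automatically, but it does no harm at the console.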
@@ -1177,9 +1186,8 @@
Saving plots
-
-
-
Exercises
+
+
Exercises
- Download the raw counts for this dataset
@@ -1191,6 +1199,7 @@
Exercises
Download the normalised counts for the GSE63310 dataset from GREIN. Make boxplots colouring the samples using different columns in the metadata file.
+
Key Points
@@ -1209,7 +1218,7 @@ Further Reading
-
+