---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
require(knitr)
knitr::opts_knit$set(global.par = TRUE)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
dpi = 150,
fig.align = "center",
fig.width=4,
fig.height=4,
out.width=400
)
```
```{r echo=FALSE}
par(mar=c(5,5,1,1))
```
# ggfree: ggplot2-style plots with just base R graphics
[ggplot2](https://ggplot2.tidyverse.org/) is a popular R graphics package that is becoming synonymous with data visualization in R.
The community of developers working within the `ggplot2` framework has implemented some [rather nice extensions](https://www.ggplot2-exts.org/gallery/) as well.
However, almost any visualization produced with `ggplot2` can also be generated with R's base graphics package.
Long-time R users who are accustomed to building plots with base graphics may find the syntax of `ggplot2` counter-intuitive and awkward.
The overall purpose of `ggfree` is to make it easier to generate plots in the style of [ggplot2](https://ggplot2.tidyverse.org/) and its extensions, without ever actually using any ggplot2 code.
![](man/figures/collage.png)
## Installation
* `ggfree` requires the package [`ape`](https://cran.r-project.org/web/packages/ape/index.html), which you can install by running the command `install.packages('ape')` within R.
* The simplest way to install `ggfree` is to download this package and then install it from the command line. You can use the GitHub web interface to download the latest version of this package as a ZIP archive by clicking on the green *Code* button and then selecting the *Download ZIP* option in the drop-down menu that appears. If you have the [`git`](https://git-scm.com/) version control program installed on your computer, you can instead navigate to the desired location in your filesystem and run the command: `git clone https://github.com/ArtPoon/ggfree`.
In either case, navigate to the `ggfree` directory in your Terminal app and run the command:
```console
art@Wernstrom ggfree % R CMD INSTALL .
* installing to library ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library’
```
* You can also use the [`devtools`](https://cran.r-project.org/web/packages/devtools/index.html) package to install `ggfree` within R. If you already have `devtools` installed in your R environment, then you can simply run:
```R
# install.packages('devtools') # if you haven't already installed devtools
require(devtools)
devtools::install_github("ArtPoon/ggfree")
```
However, `devtools` is a large R package with many dependencies, so I don't recommend this method if you haven't already installed it.
## Examples
### Slopegraphs
In general, a slopegraph is a method for visually comparing a paired set of observations.
To illustrate the use of slopegraphs, I've adapted a carbon dioxide emissions data set similar to the one used by @clauswilke to demonstrate slopegraphs in *Fundamentals of Data Visualization*.
This data set is packaged with `ggfree`:
```{r}
require(ggfree)
co2.emissions
```
There are two styles of slopegraphs that are implemented in `ggfree`.
For the first style (where the argument `type` is left at its default value `'b'`), a vertical axis provides a reference for the numerical values:
```{r fig.width=5, fig.height=5, out.width=500}
slopegraph(co2.emissions, colorize=T)
```
Setting `colorize` to `TRUE` causes the line segments to be coloured to emphasize positive and negative slopes.
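For readers curious about what `colorize` amounts to, here is a minimal base-R sketch of a two-column slopegraph whose segments are coloured by the sign of their slope. It is our own illustration, not the `slopegraph` implementation in `ggfree`, and the data frame `emis` holds hypothetical placeholder values rather than the `co2.emissions` data.

```r
# Hypothetical paired observations (two time points per row); not real data
emis <- data.frame(t1 = c(4.0, 9.5, 1.2), t2 = c(5.1, 7.8, 1.2),
                   row.names = c("A", "B", "C"))

# Colour each segment by the sign of its slope:
# increases in red, decreases in blue, no change in grey
slope <- emis$t2 - emis$t1
seg.col <- ifelse(slope > 0, "firebrick",
                  ifelse(slope < 0, "steelblue", "grey50"))

# Empty plot spanning both time points and the full range of values
plot(NA, xlim = c(0.7, 2.3), ylim = range(emis), xaxt = "n",
     xlab = "", ylab = "Value", bty = "n")
axis(side = 1, at = 1:2, labels = c("Time 1", "Time 2"))

# One line segment per paired observation, plus its end points
segments(x0 = 1, y0 = emis$t1, x1 = 2, y1 = emis$t2, col = seg.col, lwd = 2)
points(rep(1:2, each = nrow(emis)), c(emis$t1, emis$t2), pch = 19,
       col = rep(seg.col, 2))
# Label each row at its right-hand end point
text(x = 2, y = emis$t2, labels = row.names(emis), pos = 4, cex = 0.8)
```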
For the second style (setting `type='t'`), we substitute the raw numerical values for data points, which makes the vertical axis unnecessary:
```{r fig.width=5, fig.height=5, out.width=500}
par(family='Palatino') # use a fancier font
# the actual code here
slopegraph(co2.emissions, type='t', cex.text=0.6, names.arg=c(2000, 2010))
# make a nice title
title(expression(text=paste('CO'[2], ' emissions (metric tons) per capita')),
cex=0.7)
```
### Ringplots
A ring- or donut-plot is simply a piechart with a hole in it.
Pie charts have been criticized as potentially misleading because it is difficult to compare the areas of two sectors by eye.
However, they are intuitive and compact visual devices, and multiple plots can be drawn in varying sizes to encode additional information, such as sample size.
Ringplots have the additional advantages that information can be embedded in the middle of the plot as text, and that multiple ringplots can be nested within each other to display hierarchical frequency data (these are sometimes known as "sunburst" plots).
To generate a ring-plot in `ggfree`, you need to pass a vector of numeric values and specify the inner and outer radii:
```{r}
# prepare colour palettes
require(RColorBrewer)
pal1 <- brewer.pal(5, 'Blues')
pal2 <- brewer.pal(5, 'Reds')
# calling ringplot without x, y args makes new plot
ringplot(VADeaths[,1], r0=0.4, r1=0.65, col=pal1)
# called with x, y args adds ring to existing plot;
# setting use.names to TRUE adds labels
ringplot(VADeaths[,2], x=0, y=0, r0=0.65, r1=0.9, col=pal2,
use.names=T, offset=0.05, srt=90)
# write a label in the middle
text(x=0, y=0, adj=0.5, label='Death rates\nin Virginia\n(1940)', cex=0.8)
```
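The geometry behind a ring is simple: each value is converted into a share of the full circle, and each share is drawn as an annular sector bounded by an inner radius (`r0`) and an outer radius (`r1`). The base-R sketch below illustrates that idea under our own assumptions; the helper `annular.sector` is hypothetical and is not part of `ggfree`.

```r
# Hypothetical helper (not part of ggfree): polygon vertices for an annular
# sector between angles theta0 and theta1 (radians) and radii r0 < r1
annular.sector <- function(theta0, theta1, r0, r1, x = 0, y = 0, n = 50) {
  theta <- seq(theta0, theta1, length.out = n)
  # trace the outer arc forwards, then the inner arc backwards
  list(x = x + c(r1 * cos(theta), r0 * cos(rev(theta))),
       y = y + c(r1 * sin(theta), r0 * sin(rev(theta))))
}

values <- VADeaths[, 1]
# sector boundaries: each value gets a share of 2*pi proportional to its size
bounds <- 2 * pi * c(0, cumsum(values)) / sum(values)

plot(NA, xlim = c(-1, 1), ylim = c(-1, 1), asp = 1, axes = FALSE,
     xlab = "", ylab = "")
cols <- gray.colors(length(values))
for (i in seq_along(values)) {
  s <- annular.sector(bounds[i], bounds[i + 1], r0 = 0.4, r1 = 0.65)
  polygon(s$x, s$y, col = cols[i], border = "white")
}
```

Nesting a second ring, as in the example above, amounts to repeating the loop with a larger pair of radii.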
### Polar area charts
A polar area chart is similar to a ringplot, except that the sectors corresponding to different levels of a factor all span the same angle about the centre.
The frequencies of the respective levels are instead visualized by scaling the *area* of each annular sector (donut slice).
To illustrate, we're going to reproduce the classic plot by [Florence Nightingale](https://en.wikipedia.org/wiki/Pie_chart#Polar_area_diagram):
```{r fig.width=5, fig.height=5, out.width=500}
pal <- brewer.pal(3, 'Pastel2')
# load the Florence Nightingale data set (requires the HistData package)
require(HistData)
ng <- subset(Nightingale, Year==1855, c('Wounds.rate', 'Other.rate', 'Disease.rate'))
row.names(ng) <- Nightingale$Month[Nightingale$Year==1855]
par(mar=rep(0,4))
# the actual plotting function
polarplot(as.matrix(ng), x=0.2, y=0.3, theta=1.1*pi, col=pal,
use.names=T)
# add some nice labels
title('Causes of mortality in British army, Crimean War (1855)',
font.main=1, family='Palatino', line=-3)
legend(x=-0.8, y=0.6, legend=c('Wounds', 'Other', 'Disease'), bty='n',
fill=pal, cex=0.9)
```
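Because every sector spans the same angle, encoding frequency as area means the radius must grow with the square root of the value: a sector of angle `theta` and radius `r` has area `theta * r^2 / 2`. The sketch below is our own base-R illustration of that relationship, not the `polarplot` code; the monthly counts are hypothetical placeholders rather than Nightingale's data.

```r
# Hypothetical monthly counts (placeholder values, not Nightingale's data)
counts <- c(12, 30, 45, 80, 60, 25, 10, 8, 15, 20, 35, 50)
theta <- 2 * pi / length(counts)   # every sector spans the same angle

# Sector area is theta * r^2 / 2, so for area proportional to count the
# radius must be proportional to sqrt(count); the largest sector gets r = 1
radius <- sqrt(counts / max(counts))
area <- theta * radius^2 / 2

plot(NA, xlim = c(-1, 1), ylim = c(-1, 1), asp = 1, axes = FALSE,
     xlab = "", ylab = "")
for (i in seq_along(counts)) {
  angles <- seq((i - 1) * theta, i * theta, length.out = 30)
  # wedge from the centre out to the computed radius
  polygon(c(0, radius[i] * cos(angles)), c(0, radius[i] * sin(angles)),
          col = "lightsteelblue", border = "white")
}
```

Scaling the radius linearly with the count instead would make large values look disproportionately dominant, since area would then grow with the square of the count.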
### Ridgeplots
Ridgeplots are basically stacked [kernel densities](https://en.wikipedia.org/wiki/Kernel_density_estimation).
Displacing each density curve a small amount along the vertical axis can make it easier to distinguish one curve from another.
The end result can also resemble a topographical map, which is likely the etymology of the name for this type of plot.
In this example, we're going to make use of the `add.alpha` function in `ggfree:common` that adds transparency to colour specifications in R:
```{r}
par(mar=c(5,5,1,1))
pal <- add.alpha(brewer.pal(3, 'Set1'), 0.5)
ridgeplot(split(iris$Sepal.Length, iris$Species), step=0.4, col='white',
fill=pal, lwd=2, xlab='Sepal length', cex.lab=1.2)
```
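The stacking itself is easy to reproduce: estimate a kernel density for each group with `density()`, then add a constant vertical offset per group before drawing. Here is a minimal base-R sketch of that idea, not the `ridgeplot` implementation. It assumes `adjustcolor()` from grDevices as a stand-in for `add.alpha`, and the offset `step` of 0.4 is our own choice.

```r
groups <- split(iris$Sepal.Length, iris$Species)
step <- 0.4   # vertical displacement between successive curves (our choice)
# one kernel density estimate per group
dens <- lapply(groups, density)

# translucent fills via grDevices::adjustcolor (stand-in for add.alpha)
fills <- adjustcolor(c("red", "blue", "darkgreen"), alpha.f = 0.5)
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))) + step * (length(dens) - 1))

plot(NA, xlim = xlim, ylim = ylim, xlab = "Sepal length", ylab = "", yaxt = "n")
# draw the highest (rearmost) curve first so lower curves are painted over it
for (i in rev(seq_along(dens))) {
  d <- dens[[i]]
  base <- step * (i - 1)
  polygon(c(d$x, rev(d$x)), c(d$y + base, rep(base, length(d$x))),
          col = fills[i], border = "white", lwd = 2)
}
```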
### Stacked area plots
[Stacked area plots](https://en.wikipedia.org/wiki/Area_chart) are similar to stacked barplots (obtained by calling `barplot` with a matrix), but they draw polygons that span the horizontal range of the plot instead of separate rectangles.
This example uses a base R data set comprising the daily closing prices of four major European stock indices:
```{r}
stackplot(EuStockMarkets, xlab='Days (1991-1998)',
ylab='Daily Closing Price', bty='n')
```
A useful aesthetic device is to separate the baseline from the horizontal axis such that the areas flow both below and above a central axis.
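Under the hood, a stacked area plot is a running (cumulative) sum across series: each band is the polygon between one cumulative total and the next. Shifting the baseline down by half of the grand total at each time point centres the stack on a horizontal axis, which produces the flowing look described above (often called a streamgraph). The following base-R sketch of both ideas is our own illustration and is independent of `stackplot`.

```r
# plain numeric matrix: rows = trading days, columns = the four indices
x <- matrix(as.numeric(EuStockMarkets), ncol = ncol(EuStockMarkets),
            dimnames = list(NULL, colnames(EuStockMarkets)))
n <- nrow(x)

# upper boundary of each band: cumulative sum across columns, row by row
upper <- t(apply(x, 1, cumsum))
# centred baseline: shift every row down by half of its total
baseline <- -upper[, ncol(x)] / 2
upper <- upper + baseline
# lower boundary of band j is the upper boundary of band j - 1
lower <- cbind(baseline, upper[, -ncol(x)])

cols <- hcl.colors(ncol(x))
plot(NA, xlim = c(1, n), ylim = range(lower, upper),
     xlab = "Days (1991-1998)", ylab = "Daily Closing Price", bty = "n")
for (j in seq_len(ncol(x))) {
  polygon(c(1:n, n:1), c(upper[, j], rev(lower[, j])),
          col = cols[j], border = NA)
}
```

Dropping the `baseline` shift (that is, using a baseline of zero) recovers the conventional stacked area plot.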
## Trees
```{r echo=FALSE}
par(family="Helvetica")
```
For a detailed description of drawing trees with `ggfree`, see the vignette I've written in the package directory:
[vignettes/ggfree-trees.md](https://github.com/ArtPoon/ggfree/blob/master/vignettes/ggfree-trees.md).
Here is a demonstration of the basic tree drawing functionality using the same random tree employed by the `ggtree` package:
```{r fig.width=8, fig.height=8, out.width="80%"}
require(ape)  # rtree() comes from the ape package
set.seed(1999); phy <- rtree(50)
par(mfrow=c(2,2))
# default rectangular layout with "time" axis
plot(tree.layout(phy)); axis(side=1)
# slanted layout with unscaled branches
plot(tree.layout(phy, type='s', unscaled=T))
# radial layout with node labels
plot(tree.layout(phy, type='o'), label='b', cex.lab=0.6)
# equal-angle (unrooted) layout without labels
plot(tree.layout(phy, type='u'), label='n')
```
The function `tree.layout` returns an object that holds the `x` and `y` coordinates for nodes and edges of the tree, depending on which layout algorithm the user has requested.
This exposes the layout data for subsequent annotation of the plot, which is a rather different approach from that taken by the plotting functions in the `ape` package (although those yield the same basic plots shown above).
So what's the point? Now that we have the layout data, we have the freedom to add any customization we can think of to the tree visualization.
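To make the idea of a layout object concrete, here is a minimal sketch of how rectangular-layout coordinates can be computed from an `ape`-style edge matrix: tips get consecutive `y` positions, each internal node sits at the mean `y` of its direct children, and `x` is the cumulative branch length from the root. This is our own illustration of the general technique on a tiny hand-built tree, not the algorithm inside `tree.layout`; the function `rect.layout` is hypothetical.

```r
# Tiny hand-built rooted tree in ape's "phylo" edge convention: tips are
# numbered 1..n, the root is n + 1, and each row of `edge` is a
# (parent, child) pair with a matching branch length
n.tips <- 3
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
edge.length <- c(1, 2, 1, 3)

# Hypothetical helper, not part of ggfree
rect.layout <- function(edge, edge.length, n.tips) {
  n.nodes <- max(edge)
  x <- y <- rep(NA_real_, n.nodes)
  x[n.tips + 1] <- 0   # root at the origin
  # x of a child = x of its parent + branch length; assumes parents are
  # listed before their children (true of ape's "cladewise" edge order)
  for (i in seq_len(nrow(edge))) {
    x[edge[i, 2]] <- x[edge[i, 1]] + edge.length[i]
  }
  # tips get consecutive y positions in the order they appear in `edge`
  tips <- edge[edge[, 2] <= n.tips, 2]
  y[tips] <- seq_along(tips)
  # internal node y = mean y of its direct children (computed recursively)
  node.y <- function(node) {
    if (node <= n.tips) return(y[node])
    mean(vapply(edge[edge[, 1] == node, 2], node.y, numeric(1)))
  }
  for (node in (n.tips + 1):n.nodes) y[node] <- node.y(node)
  data.frame(node = seq_len(n.nodes), x = x, y = y)
}

L <- rect.layout(edge, edge.length, n.tips)
p <- edge[, 1]; ch <- edge[, 2]
plot(L$x, L$y, type = "n", xlab = "Branch length", ylab = "", yaxt = "n")
segments(L$x[p], L$y[ch], L$x[ch], L$y[ch])   # horizontal branches
segments(L$x[p], L$y[p], L$x[p], L$y[ch])     # vertical connectors
text(L$x[1:n.tips], L$y[1:n.tips], labels = paste0("t", 1:n.tips), pos = 4)
```

Once the coordinates are held in a data frame like `L`, any further annotation (coloured branches, tip points, heatmaps) is just another base graphics call on the same `x` and `y` values.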
```{r echo=FALSE}
par(mfrow=c(1,1), cex=1)
```
### Flu example
Here is the source code to reproduce one of the example figures from the `ggtree` [application note](https://doi.org/10.1111/2041-210X.12628):
```{r fig.width=10, fig.height=10, out.width="75%"}
# some pre-processing to add dN/dS data to tree, see vignette
L <- tree.layout(flu, 'r')
plot(L, cex=0.6, type='n', mar=c(3,1,0,20), label='n')
# map dN/dS values to colours
pal <- colorRampPalette(c('#0072B2', '#D55E00'))(20)
breaks <- c(seq(0, 1.5, length.out=19), 1000)
col <- pal[as.integer(cut(L$edges$dnds, breaks=breaks))]
# draw labels
host <- ifelse(grepl('Swine', L$nodes$label), '#E41A1C', '#377EB8')
text(L, align=TRUE, cex=0.75, col=host)
# draw tree
lines(L, col=col, lwd=3)
axis(side=1, at=seq(0, 20, 5), labels=seq(1990, 2010, 5), line=-2)
# draw points on tips
points(L, pch=20, col=host, cex=ifelse(L$nodes$n.tips==0, 1.5, 0))
# map colors from edges to nodes
index <- L$edges$child[L$edges$isTip]
draw.guidelines(L, col=host[index])
# load genotype data (example from ggtree)
path <- system.file("extdata/Genotype.txt", package="ggfree")
geno <- read.table(path, header=T, sep='\t', na.strings='')
geno <- geno[match(flu$tip.label, row.names(geno)), ]
# draw boxes
require(RColorBrewer)
col <- brewer.pal(3, 'Set2')
image(L, geno[index, ], xlim=c(30, 37), col=col, cex.axis=0.75, line=-2)
```
Note that most of the functions used here (namely `plot`, `text`, `lines`, `points` and `image`) are S3 generic functions from base R.
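Because these are S3 generics, a call such as `lines(L)` is dispatched on the class of `L` to a method named `lines.<class>`. The sketch below illustrates that base R mechanism with a made-up class `"toylayout"` and hypothetical methods; it does not show how `ggfree` defines its own classes or methods.

```r
# A hypothetical layout object: node coordinates plus an edge list
toy <- structure(
  list(nodes = data.frame(x = c(0, 1, 1), y = c(1.5, 1, 2)),
       edges = data.frame(parent = c(1, 1), child = c(2, 3))),
  class = "toylayout"
)

# S3 method for the base generic lines(): one segment per edge
lines.toylayout <- function(x, ...) {
  p <- x$edges$parent
  ch <- x$edges$child
  segments(x$nodes$x[p], x$nodes$y[p], x$nodes$x[ch], x$nodes$y[ch], ...)
  invisible(x)
}

# S3 method for points(): extra arguments such as col/pch pass through `...`
# (the inner call gets a numeric vector, so it dispatches to points.default)
points.toylayout <- function(x, ...) {
  points(x$nodes$x, x$nodes$y, ...)
  invisible(x)
}

plot(NA, xlim = c(-0.2, 1.2), ylim = c(0.5, 2.5), xlab = "", ylab = "")
lines(toy, col = "grey40", lwd = 2)   # dispatches to lines.toylayout
points(toy, pch = 19, col = "red")    # dispatches to points.toylayout
```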
### Birds example
Here is code for decorating a phylogeny of bird families with the numbers of species:
```{r fig.width=10, fig.height=10, out.width="80%"}
data(bird.families)
# taxonomic info from BirdLife International
path <- system.file("extdata/birdlife.csv", package='ggfree')
birds <- read.csv(path, row.names=1)
# some entries are missing
missing <- data.frame(
Family=c("Dendrocygnidae", "Bucorvidae", "Rhinopomastidae", "Dacelonidae", "Cerylidae", "Centropidae", "Coccyzidae", "Crotophagidae", "Neomorphidae", "Batrachostomidae", "Eurostopodidae", "Chionididae", "Eopsaltriidae"),
Count=c(8, 2, 3, 70, 9, 10, 13, 4, 6, 5, 3, 2, 44)
)
birds <- rbind(birds, missing)
# map information to tree
index <- match(bird.families$tip.label, birds$Family)
require(RColorBrewer)
pal <- brewer.pal(9, 'Blues')[2:9]
bins <- as.integer(cut(log(birds$Count[index]), breaks=8))
# draw the tree, offsetting the labels for our image
L <- tree.layout(bird.families, type='o')
plot(L, cex.lab=0.7, offset=2, mar=rep(5,4), col='chocolate')
image(L, z=as.matrix(bins), xlim=c(28.5,30), col=pal)
```
## Other works
* @aroneklund's [beeswarm](https://github.com/aroneklund/beeswarm) provides a nice implementation of dot plots where overlapping points are displaced away from the vertical axis so they can be seen individually.