\newpage
# Plotting tree with data {#chapter7}
```{r include=F}
library(knitr)
opts_chunk$set(message=FALSE, warning=FALSE, eval=TRUE, echo=TRUE, cache=TRUE)
```
Integrating user data to annotate a phylogenetic tree can be done at different
levels. The `r Biocpkg("treeio")` package [@wang_treeio_2020] implements `full_join()` methods to combine tree data with a phylogenetic tree object.
The `r CRANpkg("tidytree")` package supports linking tree data to phylogeny
using tidyverse verbs (see also [Chapter 2](#chapter2)).
The `r Biocpkg("ggtree")` package [@yu_two_2018] supports mapping external data to phylogeny for
visualization and annotation on the fly. Although the ability to link external data overlaps among these packages, they have different application scopes. For example, in addition to the `treedata` object, `r Biocpkg("ggtree")` also supports several other tree objects (see [Chapter 9](#chapter9)), including `phylo4d`, `phyloseq`, and `obkData`, which were designed to contain domain-specific data. These objects were not designed to support linking external data (it cannot be done at the tree object level). We can visualize trees from these objects using `r Biocpkg("ggtree")` and link external data at the visualization level [@yu_two_2018].

The `r Biocpkg("ggtree")` package provides two general methods for mapping and visualizing associated external data on phylogenies. [Method 1](#attach-operator) allows external data to be mapped on the tree structure and used as visual characteristics in the tree and data visualization. [Method 2](#facet_plot) plots the data with the tree side-by-side using different geometric functions after reordering the data based on the tree structure. These two methods integrate data with phylogeny for further exploration and comparison in the evolutionary biology context. The `r Biocpkg("ggtreeExtra")` package provides a better implementation of Method 2 proposed in `r Biocpkg("ggtree")` (see also [Chapter 10](#chapter10)) and works with both rectangular and circular layouts [@ggtreeExtra_2021].
## Mapping Data to the Tree Structure {#attach-operator}
In `r Biocpkg("ggtree")`, we implemented an operator, `%<+%`, to attach annotation data to a `ggtree` graphic object. Any data frame that contains a "node" column, or whose first column contains taxa labels, can be integrated using the `%<+%` operator. Multiple datasets can be attached progressively. Once the data are attached, all the information stored in the data serves as numerical/categorical node attributes and can be used directly to visualize the tree, either by scaling the attributes to different colors or line sizes, or by labeling the tree with the original values of the attributes or parsing them as [math expressions](#faq-formatting-label), [emoji](#phylomoji), or [silhouette images](#ggimage). The following example uses the `%<+%` operator to integrate taxon (`df_tip_data`) and internal node (`df_inode_data`) information and map the data to different colors or shapes of symbolic points and labels (Figure \@ref(fig:attacher)). The tip data contain `imageURL`, which links to online figures of the species that can be parsed and used as tip labels in `r Biocpkg("ggtree")` (see [Chapter 8](#chapter8))\index{data integration}.
(ref:attacherscap) Example of attaching multiple datasets.
(ref:attachercap) **Example of attaching multiple datasets**. External datasets including tip data (e.g., trophic habit and body weight) and node data (e.g., clade posterior and vernacular name) were attached to the `ggtree` graphic via the `%<+%` operator, and the data were used to annotate the tree.
```{r attacher, fig.width=9.5, fig.height=6.2, warning=FALSE, message=FALSE, fig.cap="(ref:attachercap)", fig.scap="(ref:attacherscap)", out.width='100%'}
library(ggimage)
library(ggtree)
library(TDbook)
# load `tree_boots`, `df_tip_data`, and `df_inode_data` from 'TDbook'
p <- ggtree(tree_boots) %<+% df_tip_data + xlim(-.1, 4)
p2 <- p + geom_tiplab(offset = .6, hjust = .5) +
    geom_tippoint(aes(shape = trophic_habit, color = trophic_habit,
                      size = mass_in_kg)) +
    theme(legend.position = "right") +
    scale_size_continuous(range = c(3, 10))

p2 %<+% df_inode_data +
    geom_label(aes(label = vernacularName.y, fill = posterior)) +
    scale_fill_gradientn(colors = RColorBrewer::brewer.pal(3, "YlGnBu"))
```
Although the data integrated by the `%<+%` operator in `r Biocpkg("ggtree")` are intended for tree visualization, the data attached to the `ggtree` graphic object can be converted to a `treedata` object that contains the tree and the attached data (see [section 7.5](#ggtree_object)).
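
As a smaller, self-contained illustration of the `%<+%` operator, the following sketch attaches a made-up annotation table to a random tree. The tree, the `tip_info` data frame, and its `group` and `weight` columns are hypothetical and serve only to show that the first column is matched against the tip labels and that the attached columns become available to subsequent layers.

```r
library(ggplot2)
library(ggtree)

set.seed(42)
tr <- ape::rtree(8)  # random tree; tip labels are generated automatically

# hypothetical annotation table; the first column holds the taxa labels
tip_info <- data.frame(
    label  = tr$tip.label,
    group  = rep(c("g1", "g2"), each = 4),
    weight = runif(8, min = 1, max = 10)
)

# attach the table; its columns are merged into the plot data
p_attached <- ggtree(tr) %<+% tip_info

# the attached columns can now be used in aesthetic mappings
p_attached +
    geom_tippoint(aes(color = group, size = weight)) +
    geom_tiplab(offset = .1)
```

Inspecting `p_attached$data` is a quick way to check that the labels matched: rows for internal nodes carry `NA` in the attached columns, since the table only describes tips.
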
## Aligning Graph to the Tree Based on the Tree Structure {#facet_plot}
To associate a phylogenetic tree with different types of plots produced from the user's data, `r Biocpkg("ggtree")` provides the `geom_facet()` layer and the `facet_plot()` function, which accept an input `data.frame` and a `geom` layer to draw the input data. The data will be displayed in an additional panel of the plot. The `geom_facet()` layer (or `facet_plot()`) is a general solution for linking a graphic layer to a tree. The function internally re-orders the input data based on the tree structure and visualizes the data in the specified panel using the given geometric layer. Users are free to add several panels to plot different types of data, as demonstrated in Figure \@ref(fig:phyloseq), and to use different geometric layers to plot the same dataset (Figure \@ref(fig:jv2017)) or different datasets on the same panel.

The `geom_facet()` layer is designed to work with most of the `geom` layers defined in `r CRANpkg("ggplot2")` and other `r CRANpkg("ggplot2")`-based packages. A list of the geometric layers that work seamlessly with `geom_facet()` and `facet_plot()` can be found in Table \@ref(tab:facet-geom). As the `r CRANpkg("ggplot2")` community keeps expanding and more `geom` layers are implemented in either `r CRANpkg("ggplot2")` or other extensions, `geom_facet()` and `facet_plot()` will gain more power to present data in the future. Note that different `geom` layers can be combined to present data on the same panel, and such combinations make it possible to present more complex data with phylogeny (see also Figures \@ref(fig:jv2017) and \@ref(fig:gggenes)). Users can progressively add multiple panels to present and compare different datasets in the evolutionary context (Figure \@ref(fig:plottree)). Detailed descriptions can be found in the [supplemental file](https://github.com/GuangchuangYu/plotting_tree_with_data/) of [@yu_two_2018]\index{geom\textunderscore facet}.
(ref:plottreescap) Example of plotting SNP and trait data.
(ref:plottreecap) **Example of plotting SNP and trait data**. The 'location' information was attached to the tree and used to color the tip symbols (Tree panel) and the other datasets. SNP and Trait data were visualized as a dot chart (SNP panel) and a bar chart (Trait panel).
```{r plottree, fig.width=12, fig.height=7, message=F, fig.cap="(ref:plottreecap)", fig.scap="(ref:plottreescap)", out.width='100%'}
library(ggtree)
library(TDbook)
## load `tree_nwk`, `df_info`, `df_alleles`, and `df_bar_data` from 'TDbook'
tree <- tree_nwk
snps <- df_alleles
snps_strainCols <- snps[1,]
snps <- snps[-1,] # drop strain names
colnames(snps) <- snps_strainCols

gapChar <- "?"
snp <- t(snps)
lsnp <- apply(snp, 1, function(x) {
    x != snp[1,] & x != gapChar & snp[1,] != gapChar
})
lsnp <- as.data.frame(lsnp)
lsnp$pos <- as.numeric(rownames(lsnp))
lsnp <- tidyr::gather(lsnp, name, value, -pos)
snp_data <- lsnp[lsnp$value, c("name", "pos")]
## visualize the tree
p <- ggtree(tree)
## attach the sampling information data set
## and add symbols colored by location
p <- p %<+% df_info + geom_tippoint(aes(color=location))
## visualize SNP and Trait data using dot and bar charts,
## and align them based on tree structure
p + geom_facet(panel = "SNP", data = snp_data, geom = geom_point,
               mapping = aes(x = pos, color = location), shape = '|') +
    geom_facet(panel = "Trait", data = df_bar_data, geom = geom_col,
               aes(x = dummy_bar_value, color = location,
                   fill = location), orientation = 'y', width = .6) +
    theme_tree2(legend.position = c(.05, .85))
```
Companion functions to adjust [panel widths](#facet_widths) and [rename panels](#facet_labeller) are described in [section 12.1](#facet-utils). Removing the panel name is also possible, and an example is presented in Figure \@ref(fig:gggenes). We can also use `r CRANpkg("aplot")` or `r CRANpkg("patchwork")` to create composite plots as described in [section 7.5](#composite_plot).

The `geom_facet()` layer (or `facet_plot()`) internally uses `ggplot2::facet_grid()` and only works with the Cartesian coordinate system. To align a graph to the tree in the polar system (e.g., for circular or fan layouts), we developed another Bioconductor package, `r Biocpkg("ggtreeExtra")`. The `r Biocpkg("ggtreeExtra")` package provides the `geom_fruit()` layer, which works similarly to `geom_facet()` (details are described in [Chapter 10](#chapter10)). The `geom_fruit()` layer is a better implementation of Method 2 proposed in [@yu_two_2018].
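
To make the panel mechanism concrete, here is a minimal sketch of `geom_facet()` on a random tree. The tree and the `d_trait` data frame (with a hypothetical `value` column) are invented for illustration. As with `%<+%`, the first column of the input data is assumed to hold the tip labels, so that the rows can be re-ordered to follow the tree.

```r
library(ggplot2)
library(ggtree)

set.seed(1)
tr <- ape::rtree(10)

# hypothetical trait table; rows are deliberately shuffled
d_trait <- data.frame(
    id    = sample(tr$tip.label),
    value = abs(rnorm(10))
)

p_tree <- ggtree(tr) + geom_tiplab()

# draw the trait as horizontal bars in a second panel aligned to the tips
p_panel <- p_tree +
    geom_facet(panel = "Trait", data = d_trait, geom = geom_col,
               mapping = aes(x = value), orientation = "y", width = .6) +
    theme_tree2()
p_panel
```

Because the rows of `d_trait` are shuffled before plotting, the sketch also shows that the input does not need to be pre-sorted; the alignment comes from matching labels, not from row order.
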
## Visualize a Tree with an Associated Matrix {#gheatmap}
The `gheatmap()` function is designed to visualize a phylogenetic tree with a heatmap of an associated matrix (either numerical or categorical). The `geom_facet()` layer is a general solution for plotting data with the tree, including heatmaps. The `gheatmap()` function is specifically designed for plotting a heatmap with a tree and provides a shortcut for handling column labels and color palettes. Another difference is that `geom_facet()` only supports rectangular and slanted tree layouts, while `gheatmap()` supports rectangular, slanted, and circular (Figure \@ref(fig:mgheatmap)) layouts\index{heatmap}.
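
A minimal, self-contained sketch of the data layout used with `gheatmap()` is given below, assuming a random tree and a made-up categorical table (`trait_mat`, with hypothetical columns `trait1` and `trait2`). The row names of the table are set to the tip labels, which is how the rows are matched to the tree in the examples of this section.

```r
library(ggplot2)
library(ggtree)

set.seed(123)
tr <- ape::rtree(6)

# hypothetical categorical traits, one row per tip
trait_mat <- data.frame(
    trait1 = sample(c("a", "b"), 6, replace = TRUE),
    trait2 = sample(c("a", "b"), 6, replace = TRUE)
)
rownames(trait_mat) <- tr$tip.label

p_tree <- ggtree(tr) + geom_tiplab(size = 3)

# place the heatmap to the right of the tip labels
p_heat <- gheatmap(p_tree, trait_mat, offset = .3, width = .4,
                   colnames_angle = 90, hjust = 1) +
    scale_fill_manual(values = c(a = "steelblue", b = "firebrick"),
                      name = "state")
p_heat
```
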
In the following example, we visualized a tree of H3 influenza viruses with their associated genotypes (Figure \@ref(fig:gheatmap)A).
```{r fig.width=8, fig.height=6, fig.align="center", warning=FALSE, message=FALSE, eval=F}
library(treeio)  # provides read.beast()
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)

genotype_file <- system.file("examples/Genotype.txt", package="ggtree")
genotype <- read.table(genotype_file, sep="\t", stringsAsFactors=FALSE)
# strip the trailing dot that read.table() leaves in some column names
colnames(genotype) <- sub("\\.$", "", colnames(genotype))
p <- ggtree(beast_tree, mrsd="2013-01-01") +
    geom_treescale(x=2008, y=1, offset=2) +
    geom_tiplab(size=2)
gheatmap(p, genotype, offset=5, width=0.5, font.size=3,
         colnames_angle=-45, hjust=0) +
    scale_fill_manual(breaks=c("HuH3N2", "pdm", "trig"),
        values=c("steelblue", "firebrick", "darkgreen"), name="genotype")
```
The `width` parameter controls the width of the heatmap relative to the tree, and the `offset` parameter controls the distance between the tree and the heatmap, for example, to allocate space for tip labels.

For a timescaled tree, as in this example, it is more common to display an *x*-axis using `theme_tree2()`. With this solution, however, the heatmap is just another layer and will change the *x*-axis. To overcome this issue, we implemented `scale_x_ggtree()` to set the *x*-axis more reasonably (Figure \@ref(fig:gheatmap)B).
```{r fig.width=8, fig.height=6, fig.align="center", warning=FALSE, eval=F}
p <- ggtree(beast_tree, mrsd="2013-01-01") +
    geom_tiplab(size=2, align=TRUE, linesize=.5) +
    theme_tree2()
gheatmap(p, genotype, offset=8, width=0.6,
         colnames=FALSE, legend_title="genotype") +
    scale_x_ggtree() +
    scale_y_continuous(expand=c(0, 0.3))
```
(ref:gheatmapscap) Example of plotting matrix with `gheatmap()`.
(ref:gheatmapcap) **Example of plotting matrix with `gheatmap()`**. An H3 influenza tree with a genotype table visualized as a heatmap (A). Tips were aligned, with a tailored *x*-axis for divergence times (tree) and genomic segments (heatmap) (B).
```{r gheatmap, fig.width=8, fig.height=12, warning=FALSE, message=FALSE, echo=F,fig.cap="(ref:gheatmapcap)", fig.scap="(ref:gheatmapscap)", out.width='90%'}
library(treeio)  # provides read.beast()
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)

genotype_file <- system.file("examples/Genotype.txt", package="ggtree")
genotype <- read.table(genotype_file, sep="\t", stringsAsFactors=FALSE)
colnames(genotype) <- sub("\\.$", "", colnames(genotype))
p1 <- ggtree(beast_tree, mrsd="2013-01-01") +
    geom_treescale(x=2008, y=1, offset=2) +
    geom_tiplab(size=2)
g1 <- gheatmap(p1, genotype, offset=5, width=0.5, font.size=3,
               colnames_angle=-45, hjust=0) +
    scale_fill_manual(breaks=c("HuH3N2", "pdm", "trig"),
        values=c("steelblue", "firebrick", "darkgreen"), name="genotype") +
    vexpand(.03, -1)

p2 <- ggtree(beast_tree, mrsd="2013-01-01") +
    geom_tiplab(size=2, align=TRUE, linesize=.5) +
    theme_tree2()
g2 <- gheatmap(p2, genotype, offset=8, width=0.6,
               colnames=FALSE, legend_title="genotype") +
    scale_x_ggtree() +
    scale_y_continuous(expand=c(0, 0.3))

plot_list(g1, g2, ncol=1, tag_levels='A')
```
### Visualize a tree with multiple associated matrices {#gheatmap-ggnewscale}
Of course, we can use multiple `gheatmap()` function calls to align several associated matrices with the tree. However, `r CRANpkg("ggplot2")` doesn't allow us to use multiple `fill` scales^[See also discussion in <https://github.com/GuangchuangYu/ggtree/issues/78> and <https://groups.google.com/d/msg/bioc-ggtree/VQqbF79NAWU/IjIvpQOBGwAJ>].
To solve this issue, we can use the `r CRANpkg("ggnewscale")` package to create new `fill` scales. Here is an example of using `r CRANpkg("ggnewscale")` with `gheatmap()`.
(ref:mgheatmapscap) Example of plotting multiple matrices with `gheatmap()`.
(ref:mgheatmapcap) **Example of plotting multiple matrices with `gheatmap()`**. A data frame (with 'first' and 'second' columns) was visualized as a discrete heatmap and another data frame (with 'A', 'B' and 'C' columns) was visualized as a continuous heatmap, each with a corresponding discrete or continuous color palette (Figure \@ref(fig:mgheatmap)).
```{r mgheatmap, fig.width=10, fig.height=8, fig.cap="(ref:mgheatmapcap)", fig.scap="(ref:mgheatmapscap)", out.width='100%'}
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
circ <- ggtree(tree, layout = "circular")
df <- data.frame(first=c("a", "b", "a", "c", "d", "d", "a",
                         "b", "e", "e", "f", "c", "f"),
                 second= c("z", "z", "z", "z", "y", "y",
                           "y", "y", "x", "x", "x", "a", "a"))
rownames(df) <- tree$tip.label

df2 <- as.data.frame(matrix(rnorm(39), ncol=3))
rownames(df2) <- tree$tip.label
colnames(df2) <- LETTERS[1:3]

p1 <- gheatmap(circ, df, offset=.8, width=.2,
               colnames_angle=95, colnames_offset_y = .25) +
    scale_fill_viridis_d(option="D", name="discrete\nvalue")

library(ggnewscale)
p2 <- p1 + new_scale_fill()
gheatmap(p2, df2, offset=15, width=.3,
         colnames_angle=90, colnames_offset_y = .25) +
    scale_fill_viridis_c(option="A", name="continuous\nvalue")
```
## Visualize a Tree with Multiple Sequence Alignments {#msaplot}
The `msaplot()` function accepts a tree (the output of `ggtree()`) and a fasta file, and visualizes the tree together with the sequence alignment. We can specify the `width` of the alignment (relative to the tree) and adjust its relative position with `offset`, in the same way as for the `gheatmap()` function (Figure \@ref(fig:msaplot)A)\index{MSA}.
```{r eval=F}
library(TDbook)
# load `tree_seq_nwk` and `AA_sequence` from 'TDbook'
p <- ggtree(tree_seq_nwk) + geom_tiplab(size=3)
msaplot(p, AA_sequence, offset=3, width=2)
```
A specific slice of the alignment can also be displayed by specifying the `window` parameter (Figure \@ref(fig:msaplot)B).
```{r fig.width=7, fig.height=7, fig.align='center', warning=FALSE, eval=F}
p <- ggtree(tree_seq_nwk, layout='circular') +
    geom_tiplab(offset=4, align=TRUE) + xlim(NA, 12)
msaplot(p, AA_sequence, window=c(120, 200))
```
(ref:msaplotscap) Example of plotting multiple sequence alignment with a tree.
(ref:msaplotcap) **Example of plotting multiple sequence alignments with a tree**. Whole MSA sequences were visualized with a tree in rectangular layout (A). Circular layout with a slice of alignment window (B).
```{r msaplot, fig.width=14, fig.height=7, warning=FALSE, echo=F, fig.cap="(ref:msaplotcap)", fig.scap="(ref:msaplotscap)", out.width='100%'}
library(ggtree)
library(TDbook)
p <- ggtree(tree_seq_nwk) + geom_tiplab(size=3) +
    theme(legend.position="none")
g1 <- msaplot(p, AA_sequence, offset=3, width=2)

p2 <- ggtree(tree_seq_nwk, layout='circular') +
    geom_tiplab(offset=4, align=TRUE) + xlim(NA, 12) +
    theme(legend.position="none")
g2 <- msaplot(p2, AA_sequence, window=c(120, 200))

plot_list(g1, g2, ncol=2, tag_levels='A')
```
To better support visualizing multiple sequence alignments with a tree and other associated data, we developed the `r Biocpkg("ggmsa")` package with the ability to label the sequences and color the sequences with different color schemes [@yu_cp_2020]. The `ggmsa()` output is compatible with `geom_facet()` and `ggtreeExtra::geom_fruit()` and can be used to visualize a tree, multiple sequence alignments, and different types of associated data to explore their underlying linkages/associations.
## Composite Plots {#composite_plot}
In addition to aligning graphs to a tree using `geom_facet()` or `ggtreeExtra::geom_fruit()` and special cases using the `gheatmap()` and `msaplot()` functions, users can use `r CRANpkg("cowplot")`, `r CRANpkg("patchwork")`, `r CRANpkg("gtable")`^[<https://github.com/YuLab-SMU/ggtree/issues/313>] or other packages to create composite plots. However, extra effort is needed to make sure all the plots are aligned properly. The [`ggtree::get_taxa_name()`](#tiporder) function is quite useful for users to re-order their data based on the tree structure. To remove this obstacle, we created an R package, `r CRANpkg("aplot")`, that can re-order the internal data of a `ggplot` object and create composite plots that align properly with a tree\index{composite plots}.
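
The manual re-ordering step mentioned above can be sketched as follows, assuming a random tree and a hypothetical data frame `d_shuffled` whose rows are in arbitrary order. `get_taxa_name()` returns the tip labels sorted by their position in the plot, and base R's `match()` is used here to sort the rows accordingly.

```r
library(ggtree)

set.seed(2024)
tr <- ape::rtree(6)
p_tree <- ggtree(tr)

# tip labels sorted by their position in the plotted tree
taxa_order <- get_taxa_name(p_tree)

# hypothetical data with rows in arbitrary order
d_shuffled <- data.frame(label = sample(tr$tip.label), value = 1:6)

# re-order the rows to follow the tree
d_sorted <- d_shuffled[match(taxa_order, d_shuffled$label), ]
d_sorted
```

The sorted labels can then be used as factor levels so that a separate `ggplot` draws its categories in the same order as the tree, which is the bookkeeping that `r CRANpkg("aplot")` automates.
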
In the following example, we have a tree with two associated datasets.
```{r}
library(ggplot2)
library(ggtree)
set.seed(2019-10-31)
tr <- rtree(10)
d1 <- data.frame(
    # only some labels match
    label = c(tr$tip.label[sample(5, 5)], "A"),
    value = sample(1:6, 6))

d2 <- data.frame(
    label = rep(tr$tip.label, 5),
    category = rep(LETTERS[1:5], each=10),
    value = rnorm(50, 0, 3))

g <- ggtree(tr) + geom_tiplab(align=TRUE) + hexpand(.01)

p1 <- ggplot(d1, aes(label, value)) + geom_col(aes(fill=label)) +
    geom_text(aes(label=label, y= value+.1)) +
    coord_flip() + theme_tree2() + theme(legend.position='none')

p2 <- ggplot(d2, aes(x=category, y=label)) +
    geom_tile(aes(fill=value)) + scale_fill_viridis_c() +
    theme_minimal() + xlab(NULL) + ylab(NULL)
```
<!--
We can extract coordinates from `ggtree` object using `r CRANpkg("dplyr")` verbs implemented in `r CRANpkg("tidytree")`, and merge the y-coordinates to our datasets.
``` r
library(dplyr)
library(tidytree)
d <- filter(g, isTip) %>% select(c(label, y))
dd1 <- left_join(d1, d, by='label')
dd2 <- left_join(d2, d, by='label')
```
Now we can visualize our datasets using the y-coordinates that match corresponding labels in the `ggtree` object.
-->
<!--
If we combine these two plots with `ggtree` using either `r CRANpkg("cowplot")` or `r Githubpkg("thomasp85", "patchwork")`, they do not align properly.
To address this issue, `r Biocpkg("ggtree")` provides a `ylim2()` function to reconcile y limits^[the implementation was inspired by <https://thackl.github.io/ggtree-composite-plots>].
``` r message=TRUE
p1 <- p1 + ylim2(g)
p2 <- p2 + ylim2(g)
```
Now we can plot our tree and the data side by side, and perfectly aligned.
## library(patchwork)
## g + p1 + p2 + plot_annotation(tag_levels="A")
library(cowplot)
plot_grid(g, p1, p2, ncol=3, align='h',
labels=LETTERS[1:3], rel_widths = c(1, .5, .8))
The `ylim2()` is not designed specific for `r Biocpkg("ggtree")`, it works fine with `ggplot` object. Another function `xlim2()` that works for x limits is also provided. See [session 10.5](#axis_align) for more details.
-->
If we combine them using `r CRANpkg("cowplot")`, the subplots are not aligned properly (Figure \@ref(fig:composite)A).
```r
cowplot::plot_grid(g, p2, p1, ncol=3)
```
The `r CRANpkg("aplot")` package does all the dirty work for us, and all the subplots are aligned properly, as demonstrated in Figure \@ref(fig:composite)B.
```r
library(aplot)
p2 %>% insert_left(g) %>% insert_right(p1, width=.5)
```
(ref:compositescap) Example of aligning tree with data side by side to create composite plot.
(ref:compositecap) **Example of aligning tree with data side-by-side to create composite plot**. `r CRANpkg("cowplot")` just places the subplots together (A), while `r CRANpkg("aplot")` does extra work to make sure that tree-associated subplots are properly ordered according to the tree structure (B). Note: The 'A' category in the bar plot, which does not match any tip of the tree, was removed.
```{r composite, echo = FALSE, fig.width=12, fig.height=10, fig.cap="(ref:compositecap)", fig.scap="(ref:compositescap)", out.width='100%'}
pg1 <- cowplot::plot_grid(g, p2, p1, ncol=3)
library(aplot)
pg2 <- p2 %>% insert_left(g) %>% insert_right(p1, width=.5)
pg2 <- ggplotify::as.ggplot(pg2)
cowplot::plot_grid(pg1, pg2, ncol=1, labels=c("A", "B"))
```
<!--
## The `ggtree` object {#ggtree_object}
to be finished.
## Update tree view with a new tree
In previous example, we have a _`p`_ object that stored the tree viewing of 13 tips and internal nodes highlighted with specific colored big dots. If users want to apply this pattern (we can imaging a more complex one) to a new tree, you don't need to build the tree step by step. `ggtree` provides an operator, _`%<%`_, for applying the visualization pattern to a new tree.
For example, the pattern in the _`p`_ object will be applied to a new tree with 50 tips as shown below:
``` r fig.width=3, fig.height=3, fig.align="center"}
p %<% rtree(50)
```
-->
## Summary {#summary7}
Although many software packages support visualizing phylogenetic trees, support for plotting a tree with data is often missing or limited. Some packages define `S4` classes to store a phylogenetic tree with domain-specific data; for example, `r CRANpkg("OutbreakTools")` [@jombart_outbreaktools_2014] defines `obkData` for storing a tree with epidemiology data, and `r Biocpkg("phyloseq")` [@mcmurdie_phyloseq_2013] defines `phyloseq` for storing a tree with microbiome data. These packages are capable of presenting some of the data stored in the object on the tree. However, not all the associated data are supported. For example, species abundance stored in the `phyloseq` object cannot be visualized using the `r Biocpkg("phyloseq")` package. These packages do not provide any utilities to integrate external data for tree visualization, and none of them supports visualizing external data and aligning the plot to a tree based on the tree structure.

The `r Biocpkg("ggtree")` package provides two general solutions for integrating data. Method 1, the `%<+%` operator, can integrate external and internal node data and map the data as visual characteristics to visualize the tree and other datasets used in `geom_facet()` or `ggtreeExtra::geom_fruit()`. Method 2, the `geom_facet()` layer or `ggtreeExtra::geom_fruit()`, has no restriction on input data as long as there is a `geom` function available to plot the data (*e.g.*, species abundance displayed by `geom_density_ridges()` as demonstrated in Figure \@ref(fig:phyloseq)). Users are free to combine different panels and combine different `geom` layers in the same panel (Figure \@ref(fig:jv2017)).
\newpage
The `r Biocpkg("ggtree")` package has many unique features that cannot be found in other implementations [@yu_two_2018]:
1. Node/edge data integrated into the tree can be mapped to visual characteristics of the tree or other datasets (Figure \@ref(fig:attacher)).
2. Capable of parsing expressions (math symbols or text formatting), emoji, and image files ([Chapter 8](#chapter8)).
3. No pre-definition of input data types or how the data should be plotted in `geom_facet()` (Table \@ref(tab:facet-geom)).
4. Combining different `geom` functions to visualize associated data is supported (Figure \@ref(fig:jv2017)).
5. Visualizing different datasets on the same panel is supported.
6. Data integrated by `%<+%` can be used in `geom_facet()` layer.
7. Able to add further annotations to specific layers.
8. Modular design by separating tree visualization, data integration (Method 1), and graph alignment (Method 2).
Modular design is a unique feature that makes `r Biocpkg("ggtree")` stand out from other packages. The tree can be visualized with data stored in the tree object or external data linked by the `%<+%` operator, and fully annotated with multiple layers of annotations (Figures \@ref(fig:attacher) and \@ref(fig:jv2017)), before passing it to the `geom_facet()` layer. The `geom_facet()` layer can be called progressively to add multiple panels or multiple layers on the same panels (Figure \@ref(fig:jv2017)). This creates the possibility of plotting a fully annotated tree with complex data panels that contain multiple graphic layers.
The `r Biocpkg("ggtree")` package fits the `R` ecosystem and extends the abilities to integrate and present data with trees to existing phylogenetic packages. As demonstrated in Figure \@ref(fig:phyloseq), we can plot species abundance distributions with the `phyloseq` object. This cannot be easily done without `r Biocpkg("ggtree")`. With `r Biocpkg("ggtree")`, we are able to attach additional data to tree objects using the `%<+%` operator and align graphs to a tree using the `geom_facet()` layer. Integrating `r Biocpkg("ggtree")` into existing workflows will extend the abilities and broaden the applications to present phylogeny-associated data, especially for comparative studies.