forked from YuLab-SMU/treedata-book
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path04_ggtree_visualization.Rmd
620 lines (393 loc) · 35.8 KB
/
04_ggtree_visualization.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
# (PART\*) Part II: Tree data visualization and annotation {-}
# Phylogenetic Tree Visualization {#chapter4}
```{r include=F}
library("ape")
library("grid")
library("ggplot2")
library("cowplot")
library("treeio")
library("ggtree")
```
## Introduction
There are many software packages and webtools that are designed for displaying phylogenetic trees, such as *TreeView* [@page_visualizing_2002], *FigTree*\index{FigTree}^[http://tree.bio.ed.ac.uk/software/figtree/], *TreeDyn* [@chevenet_treedyn:_2006], *Dendroscope* [@huson_dendroscope_2012], *EvolView*\index{EvolView} [@he_evolview_2016] and *iTOL*\index{iTOL} [@letunic_interactive_2007], *etc.*. Only several of them, such as *FigTree*, *TreeDyn* and *iTOL*, allow users to annotate the trees with coloring branches, highlighted clades and tree features. However, their pre-defined annotating functions are usually limited to some specific phylogenetic data. As phylogenetic trees are becoming more widely used in multidisciplinary studies, there is an increasing need to incorporate various types of the phylogenetic covariates and other associated data from different sources into the trees for visualizations and further analyses. For instance, influenza virus has a wide host range, diverse and dynamic genotypes and characteristic transmission behaviors that are mostly associated with the virus evolution and essentially among themselves. Therefore, in addition to standalone applications that focus on each of the specific analysis and data type, researchers studying molecular evolution need a robust and programmable platform that allows the high levels of integration and visualization of many of these different aspects of data (raw or from other primary analyses) over the phylogenetic trees to identify their associations and patterns.
To fill this gap, we developed `r Biocpkg("ggtree")`\index{ggtree}, a package for the R programming language [@rstats] released under the Bioconductor\index{Bioconductor} project [@gentleman_bioconductor_2004]. The `r Biocpkg("ggtree")` is built to work with phylogenetic data object (see [chapter 1](#chapter1) and [chapter 9](#chapter9)), and display tree graphics with `r CRANpkg("ggplot2")` package [@wickham_ggplot2_2016] that was based on the grammar of graphics [@wilkinson_grammar_2005].
The R language is increasingly being used in phylogenetics. However, a comprehensive package, designed for viewing and annotating phylogenetic trees, particularly with complex data integration, is not yet available. Most of the R packages in phylogenetics focus on specific statistical analyses rather than viewing and annotating the trees with more generalized phylogeny-associated data. Some packages, including `r CRANpkg("ape")`\index{ape} [@paradis_ape_2004] and `r CRANpkg("phytools")` [@revell_phytools_2012], which are capable of displaying and annotating trees, are developed using the base graphics system of R. In particular, `r CRANpkg("ape")` is one of the fundamental package for phylogenetic analysis and data processing. However, the base graphics system is relatively difficult to extend and limits the complexity of tree figure to be displayed. `r CRANpkg("OutbreakTools")` [@jombart_outbreaktools_2014] and `r Biocpkg("phyloseq")` [@mcmurdie_phyloseq_2013] extended `r CRANpkg("ggplot2")` to plot phylogenetic trees. The `r CRANpkg("ggplot2")` system of graphics allows rapid customization and exploration of design solutions. However these packages were designed for epidemiology and microbiome data respectively and did not aim to provide a general solution for tree visualization\index{visualization} and annotation\index{annotation}. The `r Biocpkg("ggtree")` package also inherits versatile properties of `r CRANpkg("ggplot2")`, and more importantly allows constructing complex tree figures by freely combining multiple layers of annotations using the tree associated data imported from different sources (see detailed in [Chapter 1](#chapter1)).
## Visualizing Phylogenetic Tree with *ggtree*
The `r Biocpkg("ggtree")` package is designed for annotating phylogenetic trees with their associated data of different types and from various sources. These data could come from users or analysis programs, and might include evolutionary rates, ancestral sequences\index{ancestral sequences}, *etc.* that are associated with the taxa from real samples, or with the internal nodes representing hypothetic ancestor strain/species, or with the tree branches indicating evolutionary time courses. For instance, the data could be the geographic positions of the sampled avian influenza viruses (informed by the survey locations) and the ancestral nodes (by phylogeographic inference) in the viral gene tree [@lam_phylodynamics_2012].
The `r Biocpkg("ggtree")` supports `r CRANpkg("ggplot2")`'s graphical language, which allows high level of customization, is intuitive and flexible. It is notable that `r CRANpkg("ggplot2")` itself does not provide low-level geometric objects or other support for tree-like structures, and hence `r Biocpkg("ggtree")` is a useful extension on that regard. Even though the other two phylogenetics-related R packages, `r CRANpkg("OutbreakTools")` and `r Biocpkg("phyloseq")`, are developed based on `r CRANpkg("ggplot2")`, the most valuable part of `r CRANpkg("ggplot2")`\index{ggplot2} syntax - adding layers of annotations - is not supported in these packages. For example, if we have plotted a tree without taxa labels, `r CRANpkg("OutbreakTools")` and `r Biocpkg("phyloseq")` provide no easy way for general `R` users, who have little knowledge about the infrastructures of these packages, to add a layer of taxa labels. The `r Biocpkg("ggtree")` extends `r CRANpkg("ggplot2")` to support tree objects by implementing a geometric layer, `geom_tree`, to support visualizing tree structure. In `r Biocpkg("ggtree")`, viewing a phylogenetic tree is relatively easy, via the command `ggplot(tree_object) + geom_tree() + theme_tree()` or `ggtree(tree_object)` for short. Layers of annotations can be added one by one via the `+` operator. To facilitate tree visualization, `r Biocpkg("ggtree")` provides several geometric layers, including `geom_treescale` for adding legend of tree branch scale (genetic distance, divergence time, *etc.*), `geom_range` for displaying uncertainty of branch lengths (confidence interval or range, *etc.*), `geom_tiplab` for adding taxa label, `geom_tippoint` and `geom_nodepoint` for adding symbols of tips and internal nodes, `geom_hilight` for highlighting a clade with rectangle, and `geom_cladelabel` for annotating a selected clade with a bar and text label, *etc.*. A full list of geometric layers provided by *ggtree* are summarized in Table \@ref(tab:geoms).
To view a phylogenetic tree, we first need to parse the tree file into `R`.
[Treeio](#chapter1) package parses diverse annotation data from different software outputs into `S4` phylogenetic data objects. The `r Biocpkg("ggtree")` mainly utilizes these `S4` objects to display and annotate the tree. There are also other R packages that defined `S3`/`S4` classes to store phylogenetic trees with domain specific associated data, including `phylo4` and `phylo4d` defined in `r CRANpkg("phylobase")` package, `obkdata` defined in `r CRANpkg("OutbreakTools")` package, and `phyloseq` defined in `r Biocpkg("phyloseq")` package (see also [Chapter 9](#chapter9)). All these tree objects are also supported in `r Biocpkg("ggtree")` and their specific annotation data can be used to annotate the tree in `r Biocpkg("ggtree")`. Such compatibility of `r Biocpkg("ggtree")` facilitates the integration of data and analysis results. In addition, `r Biocpkg("ggtree")` also supports other tree-like structure, including [dendrogram](#dendrogram).
### Basic Tree Visualization
The `r Biocpkg('ggtree')` package extends `r CRANpkg('ggplot2')` [@wickham_ggplot2_2009] package to support viewing phylogenetic tree.
It implements `geom_tree` layer for displaying phylogenetic tree, as shown below in Figure \@ref(fig:basicviz)A.
```{r fig.width=3, fig.height=3, fig.align="center", eval=F}
library("treeio")
library("ggtree")
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
ggplot(tree, aes(x, y)) + geom_tree() + theme_tree()
```
The function, `ggtree`, was implemented as a short cut to visualize a tree, and it works exactly the same as shown above.
`r Biocpkg('ggtree')` takes all the advantages of `r CRANpkg('ggplot2')`. For example, we can change the color, size and type of the lines as we do with `r CRANpkg('ggplot2')` (Figure \@ref(fig:basicviz)B).
```{r fig.width=3, fig.height=3, fig.align="center", eval=F}
ggtree(tree, color="firebrick", size=1, linetype="dotted")
```
By default, the tree is viewed in ladderize form, user can set the parameter `ladderize = FALSE` to disable it (Figure \@ref(fig:basicviz)C).
```{r fig.width=3, fig.height=3, fig.align="center", eval=F}
ggtree(tree, ladderize=FALSE)
```
The `branch.length` is used to scale the edge, user can set the parameter `branch.length = "none"` to only view the tree topology (cladogram, Figure \@ref(fig:basicviz)D) or other numerical variable to scale the tree (*e.g.* _d~N~/d~S~_, see also in [Chapter 5](#chapter5)).
```{r fig.width=3, fig.height=3, fig.align="center", eval=F}
ggtree(tree, branch.length="none")
```
(ref:basicvizscap) Basic tree visualization.
(ref:basicvizcap) **Basic tree visualization.**
```{r basicviz, fig.width=12, fig.height=3, echo=F, fig.cap="(ref:basicvizcap)", fig.scap="(ref:basicvizscap)"}
library("treeio")
library("ggtree")
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
cowplot::plot_grid(
ggplot(tree, aes(x, y)) + geom_tree() + theme_tree(),
ggtree(tree, color="firebrick", size=1, linetype="dotted"),
ggtree(tree, ladderize=FALSE),
ggtree(tree, branch.length="none"),
ncol=4, labels=LETTERS[1:4])
```
### Layouts of phylogenetic tree {#tree-layouts}
Viewing phylogenetic with ggtree is quite simple, just pass the tree object to *ggtree* function. We have developed several types of layouts for tree presentation (Figure \@ref(fig:layout)), including rectangular (by default), slanted, circular, fan, unrooted\index{unrooted} (equal angle and daylight methods), time-scaled and 2-dimensional layouts.
Here are examples of visualizing a tree with different layouts:
```{r eval=F}
library(ggtree)
set.seed(2017-02-16)
tree <- rtree(50)
ggtree(tree)
ggtree(tree, layout="slanted")
ggtree(tree, layout="circular")
ggtree(tree, layout="fan", open.angle=120)
ggtree(tree, layout="equal_angle")
ggtree(tree, layout="daylight")
ggtree(tree, branch.length='none')
ggtree(tree, branch.length='none', layout='circular')
ggtree(tree, layout="daylight", branch.length = 'none')
```
(ref:layoutscap) Tree layouts.
(ref:layoutcap) **Tree layouts.** Phylogram: rectangular layout (A), slanted layout (B), circular layout (C) and fan layout (D). Unrooted: equal-angle method (E) and daylight method (F). Cladogram: rectangular layout (G), circular layout (H) and unrooted layout (I). Slanted and fan layouts for cladogram are also supported.
```{r layout, echo=F, fig.width=8, fig.height=8, fig.cap="(ref:layoutcap)", fig.scap="(ref:layoutscap)", out.extra='', message=FALSE}
library(ggtree)
library(cowplot)
set.seed(2017-02-16)
tree <- rtree(50)
playout <- plot_grid(
ggtree(tree), # + ggtitle("rectangular layout")+theme(plot.title = element_text(hjust = 0.5)),
ggtree(tree, layout="slanted"), # + ggtitle("slanted layout")+theme(plot.title = element_text(hjust = 0.5)),
ggtree(tree, layout="circular"), # + ggtitle("circular layout")+theme(plot.title = element_text(hjust = 0.5), plot.margin = unit(c(0,0,0,0), "lines")),
ggtree(tree, layout="fan", open.angle=120), # + ggtitle("fan layout")+theme(plot.title = element_text(hjust = 0.5), plot.margin = unit(c(0,0,0,0), "lines")),
ggtree(tree, layout="equal_angle"),
ggtree(tree, layout="daylight"),
ggtree(tree, branch.length='none'),
ggtree(tree, branch.length='none', layout='circular'),
ggtree(tree, layout="daylight", branch.length = 'none'),
ncol=3, labels = LETTERS[1:9])
# save(playout, file="data/playout.rda")
playout
```
There are also other possible layouts that can be drawn by modifying
scales/coordination (Figure \@ref(fig:layout2)).
<!-- for examples, [reverse label of time
scale](https://github.com/GuangchuangYu/ggtree/issues/87), [repropotion
circular/fan tree](
https://groups.google.com/d/msg/bioc-ggtree/UoGQekWHIvw/ZswUUZKSGwAJ), *etc.*. -->
```{r eval=FALSE}
ggtree(tree) + scale_x_reverse()
ggtree(tree) + coord_flip()
ggtree(tree) + layout_dendrogram()
print(ggtree(tree), newpage=TRUE, vp=grid::viewport(angle=-30, width=.9, height=.9))
ggtree(tree, layout='slanted') + coord_flip()
ggtree(tree, layout='slanted', branch.length='none') + layout_dendrogram()
ggtree(tree, layout='circular') + xlim(-10, NA)
ggtree(tree) + scale_x_reverse() + coord_polar(theta='y')
ggtree(tree) + scale_x_reverse(limits=c(10, 0)) + coord_polar(theta='y')
```
```{r fig.keep='none', echo=FALSE, warning=FALSE}
tree_angle <- grid::grid.grabExpr(print(ggtree(tree), newpage=TRUE, vp = grid::viewport(angle=-30, width=.9, height=.9)))
```
(ref:layout2scap) Derived Tree layouts.
(ref:layout2cap) **Derived Tree layouts.** right-to-left rectangular layout (A), bottom-up rectangular layout (B), top-down rectangular layout (Dendrogram) (C), rotated rectangular layout (D), bottom-up slanted layout (E), top-down slanted layout (Cladogram) (F), circular layout (G), circular inward layout (H and I).
```{r layout2, fig.cap="(ref:layout2cap)", fig.scap="(ref:layout2scap)", fig.width=8, fig.height = 8, echo=FALSE, warning=FALSE}
plot_grid(
ggtree(tree) + scale_x_reverse(),
ggtree(tree) + coord_flip(),
ggtree(tree) + layout_dendrogram(),
tree_angle,
ggtree(tree, layout='slanted') + coord_flip(),
ggtree(tree, layout='slanted', branch.length='none') + layout_dendrogram(),
ggtree(tree, layout='circular') + xlim(-10, NA),
ggtree(tree) + scale_x_reverse() + coord_polar(theta='y'),
ggtree(tree) + scale_x_reverse(limits=c(15, 0)) + coord_polar(theta='y'),
ncol=3, labels=LETTERS[1:9])
```
**Phylogram.** Layouts of *rectangular*, *slanted*, *circular* and *fan* are supported to visualize phylogram\index{phylogram} (by default, with branch length scaled) as demonstrated in Figure \@ref(fig:layout)A, B, C and D.
**Unrooted layout.** Unrooted (also called 'radial') layout is supported by equal-angle and daylight algorithms, user can specify unrooted layout algorithm by passing "equal_angle" or "daylight" to `layout` parameter to visualize the tree. Equal-angle method was proposed by Christopher Meacham in *PLOTREE*, which was incorporated in *PHYLIP* [@retief_phylogenetic_2000]. This method starts from the root of the tree and allocates arcs of angle to each subtrees proportional to the number of tips in it. It iterates from root to tips and subdivides the angle allocated to a subtree into angles for its dependent subtrees. This method is fast and was implemented in many software packages. As shown in Figure \@ref(fig:layout)E, equal angle method has a drawback that tips are tend to be clustered together and leaving many spaces unused. The daylight method starts from an initial tree built by equal angle and iteratively improves it by successively going to each interior node and swinging subtrees so that the arcs of "daylight" are equal (Figure \@ref(fig:layout)F). This method was firstly implemented in *PAUP\** [@wilgenbusch_inferring_2003].
**Cladogram.** To visualize cladogram\index{cladogram} that without branch length scaling and only display the tree structure, `branch.length` is set to "none" and it works for all types of layouts (Figure \@ref(fig:layout)G, H and I).
**Time-scaled layout.** For time-scaled tree\index{time-scaled tree}, the most recent sampling date must be specified via the `mrsd` parameter and *ggtree* will scaled the tree by sampling (tip) and divergence (internal node) time, and a time scale axis will be displayed under the tree by default.
(ref:timescaledscap) Time-scaled layout.
(ref:timescaledcap) **Time-scaled layout.** The *x-axis* is the timescale (in units of year). The divergence time was inferred by *BEAST*\index{BEAST} using molecular clock model.
```{r timescaled, fig.cap="(ref:timescaledcap)", fig.scap="(ref:timescaledscap)", out.extra='', fig.height=4.5}
beast_file <- system.file("examples/MCC_FluA_H3.tree",
package="ggtree")
beast_tree <- read.beast(beast_file)
ggtree(beast_tree, mrsd="2013-01-01") + theme_tree2()
```
**Two-dimensional tree layout.** A two-dimensional tree\index{two-dimensional tree} is a projection of the phylogenetic tree in a space defined by the associated phenotype (numerical or categorical trait, on the *y*-axis) and tree branch scale (*e.g.*, evolutionary distance, divergent time, on the *x*-axis). The phenotype can be a measure of certain biological characteristics of the taxa and hypothetical ancestors in the tree. This is a new layout we proposed in *ggtree*, which is useful to track the virus phenotypes or other behaviors (*y*-axis) changing with the virus evolution (*x*-axis). In fact, the analysis of phenotypes or genotypes over evolutionary time have been widely used for study influenza virus evolution [@neher_prediction_2016], though such analysis diagrams are not tree-like, *i.e.*, no connection between data points, unlike our two-dimensional tree layout that connect data points with the corresponding tree branches. Therefore, this new layout we provided will make such data analysis easier and more scalable for large sequence data sets of influenza viruses.
In this example, we used the previous time-scaled tree of H3 human and swine influenza viruses (Figure \@ref(fig:timescaled); data published in [@liang_expansion_2014]) and scaled the *y*-axis based on the predicted *N*-linked glycosylation sites (NLG) for each of the taxon and ancestral sequences of hemagglutinin proteins. The NLG sites were predicted using *NetNGlyc* 1.0 Server^[<http://www.cbs.dtu.dk/services/NetNGlyc/>]. To scaled the *y*-axis, the parameter `yscale` in the `ggtree()` function is set to a numerical or categorical variable. If `yscale` is a categorical variable as in this example, users should specify how the categories are to be mapped to numerical values via the `yscale_mapping` variables.
(ref:2dscap) Two-dimensional tree layout.
(ref:2dcap) **Two-dimensional tree layout.** The trunk and other branches highlighted in red (for swine) and blue (for human). The *x*-axis is scaled to the branch length (in units of year) of the time-scaled tree. The *y*-axis is scaled to the node attribute variable, in this case the number of predicted *N*-linked glycosylation site (NLG) on the hemagglutinin protein. Colored circles indicate the different types of tree nodes. Note that nodes assigned the same *x*- (temporal) and *y*- (NLG) coordinates are superimposed in this representation and appear as one node, which is shaded based on the colors of all the nodes at that point.
```{r 2d, fig.cap="(ref:2dcap)", fig.width=8.5, fig.height=5, fig.scap="(ref:2dscap)", out.extra=''}
NAG_file <- system.file("examples/NAG_inHA1.txt", package="ggtree")
NAG.df <- read.table(NAG_file, sep="\t", header=FALSE,
stringsAsFactors = FALSE)
NAG <- NAG.df[,2]
names(NAG) <- NAG.df[,1]
## separate the tree by host species
tip <- get.tree(beast_tree)$tip.label
beast_tree <- groupOTU(beast_tree, tip[grep("Swine", tip)],
group_name = "host")
p <- ggtree(beast_tree, aes(color=host), mrsd="2013-01-01",
yscale = "label", yscale_mapping = NAG) +
theme_classic() + theme(legend.position='none') +
scale_color_manual(values=c("blue", "red"),
labels=c("human", "swine")) +
ylab("Number of predicted N-linked glycoslyation sites")
## (optional) add more annotations to help interpretation
p + geom_nodepoint(color="grey", size=3, alpha=.8) +
geom_rootpoint(color="black", size=3) +
geom_tippoint(size=3, alpha=.5) +
annotate("point", 1992, 5.6, size=3, color="black") +
annotate("point", 1992, 5.4, size=3, color="grey") +
annotate("point", 1991.6, 5.2, size=3, color="blue") +
annotate("point", 1992, 5.2, size=3, color="red") +
annotate("text", 1992.3, 5.6, hjust=0, size=4, label="Root node") +
annotate("text", 1992.3, 5.4, hjust=0, size=4,
label="Internal nodes") +
annotate("text", 1992.3, 5.2, hjust=0, size=4,
label="Tip nodes (blue: human; red: swine)")
```
As shown in Figure \@ref(fig:2d), two-dimensional tree good at visualizing the change of phenotype over the evolution in the phylogenetic tree. In this example, it is shown that H3 gene of human influenza A virus maintained high level of *N*-linked glycosylation sites (n=8 to 9) over last two decades and dropped significantly to 5 or 6 in a separate viral lineage transmitted to swine populations and established there. It was indeed hypothesized that the human influenza virus with high level of glycosylation on the viral hemagglutinin protein provides better shielding effect to protect the antigenic sites from exposure to the herd immunity, and thus has selective advantage in human populations that maintain high level of herd immunity against the circulating human influenza virus strains. For the viral lineage that newly jumped across the species barrier and transmitted to swine population, the shielding effect of the high-level surface glycan oppositely impose selective disadvantage because the receptor-binding domain may also be shielded which greatly affect the viral fitness of the lineage that newly adapted to a new host species. Another example of two-dimensional tree can be found on Figure \@ref(fig:phenogram).
## Displaying Tree Components
### Displaying tree scale (evolution distance) {#geom-trescale}
To show tree scale, user can use `geom_treescale()` layer (Figure \@ref(fig:treescale)A).
```{r fig.width=4, fig.height=4, fig.align="center", eval=F}
ggtree(tree) + geom_treescale()
```
`geom_treescale()` supports the following parameters:
+ *x* and *y* for tree scale position
+ *width* for the length of the tree scale
+ *fontsize* for the size of the text
+ *linesize* for the size of the line
+ *offset* for relative position of the line and the text
+ *color* for color of the tree scale
```{r eval=F}
ggtree(tree) + geom_treescale(x=0, y=45, width=1, color='red')
ggtree(tree) + geom_treescale(fontsize=6, linesize=2, offset=1)
```
We can also use `theme_tree2()` to display the tree scale by adding *x axis*.
```{r fig.width=3, fig.height=3, fig.align="center", eval=F}
ggtree(tree) + theme_tree2()
```
(ref:treescalescap) Display tree scale.
(ref:treescalecap) **Display tree scale.** `geom_treescale` automatically add a scale bar for evolutionary distance (A). Users can modify color, width and position of the scale (B) as well as size of the scale bar and text and their relative position (C). Another possible solution is to enable x-axis which is useful for time-scale tree (D).
```{r treescale, fig.width=8, fig.height=6, echo=F, fig.cap="(ref:treescalecap)", fig.scap="(ref:treescalescap)"}
cowplot::plot_grid(
ggtree(tree) + geom_treescale(),
ggtree(tree)+geom_treescale(x=0, y=45, width=1, color='red'),
ggtree(tree)+geom_treescale(fontsize=6, linesize=2, offset=1),
ggtree(tree) + theme_tree2(),
ncol=2, labels=LETTERS[1:4])
```
Tree scale is not restricted to evolution distance, `r Biocpkg('treeio')` can re-scale the tree with other numerical variable (details described in [session 2.3](#rescale-treeio)).
### Displaying nodes/tips {#geom-nodepoint}
Showing all the internal nodes and tips in the tree can be done by adding a layer of points using `geom_nodepoint`, `geom_tippoint` or `geom_point`.
```{r fig.width=3, fig.height=3, fig.align="center", eval=F}
ggtree(tree) + geom_point(aes(shape=isTip, color=isTip), size=3)
```
```{r fig.width=3, fig.height=3, fig.align="center", eval=F}
p <- ggtree(tree) + geom_nodepoint(color="#b5e521", alpha=1/4, size=10)
p + geom_tippoint(color="#FDAC4F", shape=8, size=3)
```
(ref:nodeTipscap) Display external and internal nodes.
(ref:nodeTipcap) **Display external and internal nodes.** `geom_point` automatically add symbolic points of all nodes (A). `geom_nodepoint` adds symbolic points for internal nodes and `geom_tippoint` adds symbolic points for external nodes (B).
```{r nodeTip, fig.width=8, fig.height=3, echo=F, fig.cap="(ref:nodeTipcap)", fig.scap="(ref:nodeTipscap)"}
p1 <- ggtree(tree) + geom_point(aes(shape=isTip, color=isTip), size=3)
p <- ggtree(tree) + geom_nodepoint(color="#b5e521", alpha=1/4, size=10)
p2 <- p + geom_tippoint(color="#FDAC4F", shape=8, size=3)
plot_grid(p1, p2, ncol=2, labels = LETTERS[1:2])
```
### Displaying labels
Users can use `geom_text` or `geom_label` to display the node (if available) and tip labels simultaneously or `geom_tiplab` to only display tip labels (Figure \@ref(fig:tiplab)A).
```{r fig.width=3, fig.height=3, warning=FALSE, fig.align="center", eval=FALSE}
p + geom_tiplab(size=3, color="purple")
```
`geom_tiplab` not only supports using *text* or *label* geom to display labels,
it also supports *image* geom to label tip with image files (see [Chapter 7](#chapter7)). A corresponding
geom, `geom_nodelab` is also provided for displaying node labels.
For *circular* and *unrooted* layout, `r Biocpkg('ggtree')` supports rotating node labels according to the angles of the branches (Figure \@ref(fig:tiplab)B).
```{r fig.width=6, fig.height=6, warning=FALSE, fig.align="center", eval=FALSE}
ggtree(tree, layout="circular") + geom_tiplab(aes(angle=angle), color='blue')
```
(ref:tiplabscap) Display tip labels.
(ref:tiplabcap) **Display tip labels.** `geom_tiplab` supports displaying tip labels (A). For circular, fan or unrooted tree layout, the labels can be rotated to fit the angle of the branches (B).
```{r tiplab, fig.width=9, fig.height=6, echo=F, fig.cap="(ref:nodeTipcap)", fig.scap="(ref:nodeTipscap)"}
p1 <- p + geom_tiplab(size=3, color="purple")
p2 <- ggtree(tree, layout="circular") + geom_tiplab(aes(angle=angle), color='blue')
plot_grid(p1, p2, ncol=2, labels = LETTERS[1:2], rel_widths=c(1,2))
```
By default, the positions to display text are based on the node positions, we can change them to based on the middle of the branch/edge (by setting `aes(x = branch)`), which is very useful when annotating transition from parent node to child node.
### Displaying root edge
`r Biocpkg("ggtree")` doesn't plot root edge by default. Users can use `geom_rootedge()` to automatically display root edge (Figure \@ref(fig:rootedge)A). If there is no root edge information, `geom_rootedge()` will display nothing by default (Figure \@ref(fig:rootedge)B). Users can set the root edge to the tree (Figure \@ref(fig:rootedge)C) or specify `rootedge` in `geom_rootedge()` (Figure \@ref(fig:rootedge)D).
```{r eval=F}
## with root edge = 1
tree1 <- read.tree(text='((A:1,B:2):3,C:2):1;')
ggtree(tree1) + geom_tiplab() + geom_rootedge()
## without root edge
tree2 <- read.tree(text='((A:1,B:2):3,C:2);')
ggtree(tree2) + geom_tiplab() + geom_rootedge()
## setting root edge
tree2$root.edge <- 2
ggtree(tree2) + geom_tiplab() + geom_rootedge()
## specify length of root edge for just plotting
## this will ignore tree$root.edge
ggtree(tree2) + geom_tiplab() + geom_rootedge(rootedge = 3)
```
(ref:rootedgescap) Display root edge.
(ref:rootedgecap) **Display root edge.** `geom_rootedge` supports displaying root edge if the root edge was presented (A). It shows nothing if there is no root edge (B). In this case, users can manually setting root edge for the tree (C) or just specify the length of root for plotting (D).
```{r rootedge, fig.width=6, fig.height=4, echo=F, fig.cap="(ref:rootedgecap)", fig.scap="(ref:rootedgescap)"}
## with root edge = 1
tree1 <- read.tree(text='((A:1,B:2):3,C:2):1;')
p1 = ggtree(tree1) + geom_tiplab() + geom_rootedge()
## without root edge
tree2 <- read.tree(text='((A:1,B:2):3,C:2);')
p2 = ggtree(tree2) + geom_tiplab() + geom_rootedge()
## setting root edge
tree2$root.edge <- 2
p3 = ggtree(tree2) + geom_tiplab() + geom_rootedge()
## specify length of root edge for just plotting
## this will ignore tree$root.edge
p4 = ggtree(tree2) + geom_tiplab() + geom_rootedge(rootedge = 3)
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels = LETTERS[1:4])
```
### Color tree
In `ggtree`, coloring phylogenetic tree is easy, by using `aes(color=VAR)` to map the color of tree based on a specific variable (numeric and category are both supported).
(ref:colortreescap) Color tree by continuous or discrete feature.
(ref:colortreecap) **Color tree by continuous or discrete feature.** Edges are colored by values that associated with child node.
```{r colortree, fig.width=5, fig.height=5, fig.cap="(ref:colortreecap)", fig.scap="(ref:colortreescap)"}
ggtree(beast_tree, aes(color=rate)) +
scale_color_continuous(low='darkgreen', high='red') +
theme(legend.position="right")
```
User can use any feature (if available), including clade posterior and _d~N~/d~S~_ _etc._, to scale the color of the tree. If the feature is continuous numerical value, `r Biocpkg("ggtree")` provides a `continuous` parameter to support plotting continuous state transition in edges. Here, we use the example provided in <http://www.phytools.org/eqg2015/asr.html> to demonstrate this functionality.
(ref:continuousColorscap) Continuous state transition in edges.
(ref:continuousColorcap) **Continuous state transition in edges.** Edges are colored by the values from ancestral trait to offspring.
```{r continuousColor, fig.width=6, fig.height=6, fig.cap="(ref:continuousColorcap)", fig.scap="(ref:continuousColorscap)"}
anole.tree<-read.tree("http://www.phytools.org/eqg2015/data/anole.tre")
svl <- read.csv("http://www.phytools.org/eqg2015/data/svl.csv",
row.names=1)
svl <- as.matrix(svl)[,1]
fit <- phytools::fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE)
td <- data.frame(node = nodeid(anole.tree, names(svl)),
trait = svl)
nd <- data.frame(node = names(fit$ace), trait = fit$ace)
d <- rbind(td, nd)
d$node <- as.numeric(d$node)
tree <- full_join(anole.tree, d, by = 'node')
ggtree(tree, aes(color=trait), layout = 'circular',
ladderize = FALSE, continuous = TRUE, size=2) +
scale_color_gradientn(colours=c("red", 'orange', 'green', 'cyan', 'blue')) +
geom_tiplab(hjust = -.1) + xlim(0, 1.2) + theme(legend.position = c(.05, .85))
```
Besides, we can use two-dimensional tree (as demonstrated in Figure \@ref(fig:2d)) to visualize phenotype on the vertical dimension to create the phenogram. We can use `r CRANpkg("ggrepel")` package to repel tip labels to avoid overlapping as demonstrated in Figure \@ref(fig:repelTip).
(ref:phenogramscap) Phenogram.
(ref:phenogramcap) **Phenogram.** Projecting the tree into a space defined by time (or genetic distance) on the horizontal axis and phenotype on the vertical dimension.
```{r phenogram, fig.width=6, fig.height=8, fig.cap="(ref:phenogramcap)", fig.scap="(ref:phenogramscap)"}
ggtree(tree, aes(color=trait), continuous = TRUE, yscale = "trait") +
scale_color_viridis_c() + theme_minimal()
```
### Rescale tree
Most of the phylogenetic trees are scaled by evolutionary distance (substitution/site). In `ggtree`, users can re-scale a phylogenetic tree by any numerical variable inferred by evolutionary analysis (*e.g.* _d~N~/d~S~_).
This example displays a time tree (Figure \@ref(fig:rescaleTree)A) and the branches were rescaled by substitution rate inferred by BEAST (Figure \@ref(fig:rescaleTree)B).
```{r rescaleBeast, fig.width=10, fig.height=5}
library("treeio")
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)
beast_tree
p1 <- ggtree(beast_tree, mrsd='2013-01-01') + theme_tree2() +
labs(caption="Divergence time")
p2 <- ggtree(beast_tree, branch.length='rate') + theme_tree2() +
labs(caption="Substitution rate")
```
The following example draw a tree inferred by CodeML (Figure \@ref(fig:rescaleTree)C), and the branches can be rescaled by using _d~N~/d~S~_ values (Figure \@ref(fig:rescaleTree)D).
```{r rescaleCodeml, fig.width=10, fig.height=5}
mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="treeio")
mlc_tree <- read.codeml_mlc(mlcfile)
p3 <- ggtree(mlc_tree) + theme_tree2() +
labs(caption="nucleotide substitutions per codon")
p4 <- ggtree(mlc_tree, branch.length='dN_vs_dS') + theme_tree2() +
labs(caption="dN/dS tree")
```
(ref:rescaleTreescap) Rescale tree branches.
(ref:rescaleTreecap) **Rescale tree branches.** A time scaled tree inferred by BEAST (A) and its branches were rescaled by substitution rate (B). A tree inferred by CodeML (C) and the branches were rescaled by _d~N~/d~S~_ values (D).
```{r rescaleTree, fig.width=8, fig.height=6, echo=FALSE,fig.cap="(ref:rescaleTreecap)", fig.scap="(ref:rescaleTreescap)"}
plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])
```
In addition to specify `branch.length` in tree visualization, users can change branch length stored in tree object by using `rescale_tree` function, and the following command will display a tree that is identical to (Figure \@ref(fig:rescaleTree)B).
```{r rescaleBeast2, eval=F}
beast_tree2 <- rescale_tree(beast_tree, branch.length='rate')
ggtree(beast_tree2) + theme_tree2()
```
### Modify compenents of a theme
`theme_tree()` defined a totally blank canvas, while _`theme_tree2()`_ adds
phylogenetic distance (via x-axis). These two themes all accept a parameter of
`bgcolor` that defined the background color. Users can use any [theme components](http://ggplot2.tidyverse.org/reference/theme.html) to the `theme_tree()` or `theme_tree2()` functions to modify them.
```{r eval=F}
set.seed(2019)
x <- rtree(30)
ggtree(x, color="red") + theme_tree("steelblue")
ggtree(x, color="white") + theme_tree("black")
```
(ref:themescap) ggtree theme.
(ref:themecap) **ggtree theme.**
```{r theme, fig.width=8, fig.height=3, fig.align="center", echo=FALSE,fig.cap="(ref:themecap)", fig.scap="(ref:themescap)"}
library(ggimage)
set.seed(2019)
x <- rtree(30)
cowplot::plot_grid(
ggtree(x, color="red") + theme_tree("steelblue"),
ggtree(x, color="purple") + theme_tree("black"),
ncol=2)
```
Users can also use image file as tree background, see example in [Appendix B](#ggimage-bgimage).
## Visualize a list of trees
`ggtree` supports `multiPhylo` object and a list of trees can be viewed simultaneously.
(ref:multiPhyloscap) Visuazlie multiPhylo object.
(ref:multiPhylocap) **Visuazlie multiPhylo object.**
```{r mutiPhylo, fig.width=12, fig.height=4,fig.cap="(ref:multiPhylocap)", fig.scap="(ref:multiPhyloscap)"}
trees <- lapply(c(10, 20, 40), rtree)
class(trees) <- "multiPhylo"
ggtree(trees) + facet_wrap(~.id, scale="free") + geom_tiplab()
```
One hundred bootstrap trees can also be view simultaneously.
(ref:bp100scap) Visualize one hundred bootstrap trees.
(ref:bp100cap) **Visualize one hundred bootstrap trees.**
```{r bp100, fig.width=20, fig.height=20,fig.cap="(ref:bp100cap)", fig.scap="(ref:bp100scap)"}
btrees <- read.tree(system.file("extdata/RAxML", "RAxML_bootstrap.H3", package="treeio"))
ggtree(btrees) + facet_wrap(~.id, ncol=10)
```
Another way to view the bootstrap trees is to merge them together to form a density tree using `ggdensitree` function.
(ref:densiTreescap) DensiTree.
(ref:densiTreecap) **DensiTree.**
```{r densiTree, fig.width=12, fig.height=7.5, fig.cap="(ref:densiTreecap)", fig.scap="(ref:densiTreescap)"}
ggdensitree(btrees, alpha=.3, colour='steelblue') +
geom_tiplab(size=3) + xlim(0, 45)
```
## Summary
Visualizing phylogenetic tree using `r Biocpkg("ggtree")` is easy by using a single command `ggtree(tree)`. The `r Biocpkg("ggtree")` package provides several geometric layers to display tree components such as tip labels, symbolic points for both external and internal nodes, root edge, *etc*. Associated data can be used to rescale branch lengths, color the tree and display on the tree. All these can be done by the ggplot2 grammar of graphic syntax. `r Biocpkg("ggtree")` also provides several layers that are specifically designed for tree annotation (see [Chapter 5](#chapter5)).