The ggtreeExtra package implements a layer function, geom_fruit, a general-purpose function for aligning graphic layers to a phylogenetic tree (Supplemental Figs. S1 and S2, and Table S1). It internally reorders the associated data according to the structure of the phylogenetic tree. It then visualizes the data with a specified geometric layer function, using user-provided aesthetic mappings and fixed (non-variable) parameter settings. The resulting graphic layer is displayed side by side with the tree, precisely aligned to it: on the right-hand side for a rectangular layout, or as an external ring for a circular layout (Supplemental Fig. S3). Different data layers can be added to a tree progressively. For example, geom_fruit can add a heatmap and a bar plot to the outer rings of an annotated phylogenetic tree to compare microbial abundance across different human body sites (Supplemental Fig. S7). These two layers were aligned automatically to the circular phylogenetic tree and displayed on separate external rings. The number of external rings is not strictly limited, and users are free to visualize multiple associated data sets using different geometric layers on different external rings. Each data set is visualized on an independent ring layer, and multiple ring layers are stacked around the circular phylogenetic tree. This makes the ggtreeExtra package particularly useful for layering different data sets to create highly informative tree graphics. For example, multiple heatmap and bar chart layers were compactly displayed on a circular tree to represent gene status, metabolic capacity and genome size of 963 bacterial and archaeal species (Supplemental Fig. S8).
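The alignment step that geom_fruit performs internally can be illustrated conceptually. The Python sketch below is not ggtreeExtra's implementation (the package is written in R). It assumes a hypothetical tree given as nested tuples and a data table given as a dict keyed by tip label. The general idea is to derive the tip drawing order by a depth-first traversal, then reorder the data rows so that each row takes the vertical position of its tip. Rows whose label is not a tip are dropped.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree representation (an assumption for illustration):
# a leaf is a string label, an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def tip_order(tree: Tree) -> List[str]:
    """Return tip labels in drawing order (depth-first, left to right)."""
    if isinstance(tree, str):
        return [tree]
    tips: List[str] = []
    for child in tree:
        tips.extend(tip_order(child))
    return tips


def align_to_tree(tree: Tree, data: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Reorder associated data so it follows the tree's tip order.

    Each returned row is (y, label, value), where y is the tip's vertical
    position (1 = first tip drawn). Tips without data are skipped, and data
    entries whose label is not a tip are dropped.
    """
    rows = []
    for y, label in enumerate(tip_order(tree), start=1):
        if label in data:
            rows.append((y, label, data[label]))
    return rows


if __name__ == "__main__":
    tree = ((("A", "B"), "C"), ("D", "E"))
    # Data arrive in arbitrary order and include a label ("Z") absent from the tree.
    abundance = {"E": 0.5, "A": 1.2, "Z": 9.9, "C": 0.3, "B": 2.0}
    for row in align_to_tree(tree, abundance):
        print(row)
```

Because the y positions come from the tree rather than from the data, a heatmap or bar layer built from these rows lines up with the tips regardless of the order in which the data were supplied.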
Unlike other tools, ggtreeExtra was developed based on the grammar of graphics14 and allows users to map variables of associated data to visual attributes of the outer-ring graphic layers at a high level of abstraction (Supplemental Figs. S3, S4 and S7). The geometric layers defined in ggplot215 and its extensions can be used in the geom_fruit function. For example, geom_phylopic, implemented in the ggimage package, can be used to overlay silhouette images on the external layers to compare morphological characteristics with other attributes (e.g., taxonomic order, dietary preferences and environmental variables) (Supplemental Fig. S5B). Because the ggplot2 community has developed a large number of geometric layers, ggtreeExtra supports more data types and visualization methods than other tools (Supplemental Tables S1 and S2, and Fig. S2). For instance, taxon-specific infographics can be added as insets in ggtreeExtra using the geom_plot layer provided by the ggpmisc package (Supplemental Fig. S5A). The ggtreeExtra package makes no assumptions about user data. Given a suitable geometric layer, ggtreeExtra can be used to incorporate and visualize any kind of information with a tree. This feature ensures the versatility of ggtreeExtra and makes it easy to represent heterogeneous data from different disciplines.
One distinctive advantage of the circular layout is that it enables chord diagrams that reveal complex relationships. Coupled with the inward circular tree layout supported by ggtree11, ggtreeExtra can display flows or connections between taxa, such as syntenic links among genes and genomes, and reticulate evolutionary relationships including horizontal gene transfer, hybridization and interspecific recombination. This makes ggtreeExtra an ideal tool for exploring relationships or interactions between taxa in a compact way. It is particularly well suited to microbiome research, where microbial correlation or interaction networks can be presented together with the phylogenetic tree and other associated data. To demonstrate this feature, we used ggtreeExtra and ggtree11 to integrate and visualize several data sets from the Arabidopsis leaf microbiome16 on the phylogenetic tree, including directional interactions among different bacterial strains, the number of target genes, strain abundance, taxonomy information and the biosynthetic potential of the isolates. The phylogenetic tree was visualized using the inward circular layout, and the interaction data were visualized as a chord diagram connecting the corresponding isolates at the tree leaves. Other information was displayed as stacked bar charts, heatmaps and symbolic points on the tree (Fig. 1). By incorporating all of this information with ggtreeExtra, some evolutionary patterns that are not otherwise apparent become more obvious. We found that inhibitor interactions are more common among strains from Firmicutes and Gammaproteobacteria, whereas strains from Alphaproteobacteria and Betaproteobacteria tend toward sensitivity interactions. These strains might have more biosynthetic gene clusters (BGCs) for ribosomally synthesized and post-translationally modified peptides (RiPPs).
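The geometry behind linking tips in a circular layout can be sketched as follows. This is a simplified illustration, not ggtree's or ggtreeExtra's code. It assumes n tips spaced evenly around a full circle of radius r in drawing order; in an inward layout this is roughly the inner circle, leaving the centre free for links. Each directed interaction is drawn as a quadratic Bézier curve from the source tip to the target tip, with its control point at the centre, which is one common way to draw chords. The tip labels and the interaction list are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def tip_angles(tips: List[str]) -> Dict[str, float]:
    """Assign each tip an angle in radians, evenly spaced in drawing order."""
    n = len(tips)
    return {label: 2 * math.pi * i / n for i, label in enumerate(tips)}


def polar_to_xy(radius: float, angle: float) -> Point:
    """Convert polar coordinates to Cartesian (x, y)."""
    return (radius * math.cos(angle), radius * math.sin(angle))


def chord(tips: List[str], source: str, target: str,
          radius: float = 1.0, steps: int = 20) -> List[Point]:
    """Sample a quadratic Bezier curve from the source tip to the target tip.

    The control point is the circle's centre, so the curve bends inward.
    Returns steps + 1 points, from the source tip to the target tip.
    """
    angles = tip_angles(tips)
    p0 = polar_to_xy(radius, angles[source])
    p2 = polar_to_xy(radius, angles[target])
    p1 = (0.0, 0.0)  # control point at the centre (an assumption of this sketch)
    points = []
    for k in range(steps + 1):
        t = k / steps
        # B(t) = (1 - t)^2 * P0 + 2 * (1 - t) * t * P1 + t^2 * P2
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
        points.append((x, y))
    return points


if __name__ == "__main__":
    tips = ["A", "B", "C", "D"]
    # Hypothetical directed interactions: (source, target)
    interactions = [("A", "C"), ("B", "D")]
    for src, dst in interactions:
        path = chord(tips, src, dst)
        print(src, "->", dst, "start", path[0], "end", path[-1])
```

The key design point is that the link endpoints are derived from the tips' positions on the tree, so a chord always starts and ends at the leaves it connects, whatever order the interaction data are listed in.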
To our knowledge, no other software tool can easily produce a figure like this. The visualization helped us explore the data and generate new insights, as these findings were not reported in the original paper17.
ggtreeExtra is a sub-package of the ggtree package suite and takes advantage of the other ggtree sub-packages. Phylogenetic data imported by the treeio18 package can be used in ggtreeExtra. This allows evolutionary inferences from commonly used software (e.g., clade support, molecular dating and selection pressure) to be linked to other associated data (e.g., observational and experimental data) for integrative and comparative studies (Supplemental Fig. S6). Tree data can be processed using the tidytree package, and a fully annotated phylogenetic tree visualized by ggtree can be further annotated with data layers in ggtreeExtra, especially in the circular layout (Supplemental Figs. S5-S8 and Fig. 1). The ggtreeExtra package extends the capabilities of ggtree and fully supports the grammar of graphics implemented in ggplot215 (Supplemental Fig. S1). It supports aesthetic mapping (Supplemental Figs. S3-S6) and a layered grammar of graphics (Supplemental Figs. S7-S9). Users can apply scale functions to specify how data are mapped to visual values and theme functions to adjust the graphic appearance (Supplemental Figs. S3-S9). Moreover, it benefits fully from the ggplot2 community. Geometric layers defined in ggplot2 and other extension packages can be used in ggtreeExtra to visualize tree data (Supplemental Table S2 and Figs. S2-S5). We originally proposed and implemented this framework design in ggtree10, and ggtreeExtra fully embraces the design concept. This design is the key strength of ggtree and ggtreeExtra and lays the foundation for displaying annotated data layers on trees. It allows ggtreeExtra to support more visualization methods and to make no assumptions about input data types (Supplemental Tables S1 and S2). As the ggplot2 community keeps expanding, more methods will become available for creating tree data layers in ggtreeExtra.
Furthermore, combining these methods gives ggtreeExtra more flexibility than other tools for integrating diverse data sets in novel exploratory data analyses (Fig. 1 and Supplemental Fig. S2). It therefore has greater potential than other tools to reveal systematic patterns and insights in the data. The versatility of this package ensures its applicability in different research areas, such as population genetics, molecular epidemiology and microbiome research.