The complex role of the gut microbiota in shaping human health and disease has been intensely investigated in recent years, largely due to the availability of culture-independent, high-throughput molecular sequencing technologies. Every human host is estimated to coexist with an average of 500-1000 different bacterial species [1–3], and the microbiome has been associated with host lifestyle and diet [4, 5] as well as with many diseases, such as obesity, type 2 diabetes [6], and cancer [7]. New sequencing technology brings not only more data and capacity for microbiome research, but also new challenges for data analytics and interpretation. Improved tools and methods for microbiome data analytics can enhance our ability to understand the roles of microbes in diverse environments, particularly how they interact with each other and with their human hosts.
Current microbiome analysis typically consists of two components: upstream community profiling (e.g., what is the abundance of each microbe in each sample?) and downstream high-level analysis (e.g., alpha/beta diversity analysis, differential abundance analysis) [8]. In recent years, advances in data analytics, visualization, and machine learning have been incorporated into many software tools and web servers for microbiome data analysis that cover these two components [9–13]. However, new techniques and sequencing technologies have steepened the learning curve for researchers applying new methods for microbiome data analysis and interpretation [14]. Furthermore, existing tools are mostly dedicated to one aspect of analysis and/or are restricted to one type of microbiome data. For example, while there are many tools and workflows for analyzing 16S rRNA data, there are no existing tools or pipelines tailored to comprehensively address the analytical needs of RNA-based metatranscriptomics.
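To make the downstream component concrete, the following minimal sketch computes one common alpha diversity measure (the Shannon index) and one common beta diversity measure (Bray-Curtis dissimilarity) from per-sample taxon counts. The function names and the example counts are illustrative assumptions and are not taken from any of the tools discussed here.

```python
import math


def shannon_index(counts):
    """Shannon alpha diversity (natural log) for one sample's taxon counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Taxa with zero counts contribute nothing to the index.
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must list counts for the same taxa")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        return 0.0
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    # Made-up counts for four taxa in two samples (illustration only).
    sample_1 = [120, 30, 0, 50]
    sample_2 = [10, 80, 40, 70]
    print("Shannon (sample 1):", shannon_index(sample_1))
    print("Shannon (sample 2):", shannon_index(sample_2))
    print("Bray-Curtis:", bray_curtis(sample_1, sample_2))
```

Alpha diversity summarizes a single sample, whereas beta diversity compares pairs of samples, which is why the second function takes two count vectors over the same set of taxa.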
Table 1 summarizes the functions of popular existing tools with respect to the analysis needs of microbiome data. For marker gene-based data such as 16S rRNA, QIIME 2 [15] and mothur [16] provide a user interface and a plethora of analytic and visualization tools, but do not support metagenomic or metatranscriptomic data. Vegan [17] provides a wide variety of functions for metagenomic data visualization, but lacks a user interface as well as tools for host and microbial read alignment, differential expression, etc. BioBakery [18] provides a comprehensive suite of tools for most metagenomic analysis needs for microbial communities, but relies on a small set of markers to identify species and does not address host or microbial expression. Microbiome helper [19] is a collection of scripts in multiple languages that facilitates interaction and interoperability between microbiome and metagenomic tools, but does not provide interactive visualizations or a graphical user interface. The microbiome package in R [20] provides command-line workflows for a wide variety of metagenomic data analysis tasks. Phyloseq [21] has a Shiny interface with tools for annotation, visualization, and diversity analysis, but does not provide abundance analysis and is no longer actively maintained by its developers. Metavizr [22] provides an interface and a suite of functions for specific metagenomic visualizations. None of these tools is comprehensive or specifically addresses the needs of multiple data types, such as 16S rRNA, metagenomic, and metatranscriptomic data. Consequently, no existing toolkit contains a complete workflow for microbiome data analysis and interpretation (with or without a graphical user interface).
Here we present animalcules, an interactive analysis and visualization toolkit for microbiome data. animalcules supports the importing of microbiome profiles in multiple formats, such as a species count table, an operational taxonomic unit (OTU) or amplicon sequence variant (ASV) count table, or the Biological Observation Matrix (BIOM) format [23]. These formats can be generated from common microbiome data sources and analytical tools, including 16S rRNA, metagenomic, and metatranscriptomic data. Once data are uploaded, animalcules provides a data summary and filtering function with which users can view and filter their dataset by sample metadata, microbial prevalence, or relative abundance. Filtering the data in this way can significantly reduce the time spent on preprocessing and downstream analysis tasks. For data visualizations, such as relative abundance bar charts and 3D dimension reduction plots (PCA/PCoA/t-SNE/UMAP), animalcules supports interactive operations that let users inspect the sample/microbe information for each data point and adjust the figure format as needed, which helps in recognizing elements or data patterns when the sample size or number of microbes is large. Aside from common diversity analysis, differential abundance analysis, and dimension reduction, animalcules supports biomarker identification by training a logistic regression or random forest model with cross-validated evaluation of biomarker performance. animalcules provides a graphical user interface (GUI) through R/Shiny, which can be used even by users without prior programming knowledge, while experienced programmers can choose the command-line based R package or a combination of both.
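As an illustration of the kind of filtering described above, the sketch below keeps only taxa that pass a prevalence threshold and a mean relative abundance threshold. It is a generic, simplified example and not the animalcules implementation or API; the function names, the threshold defaults, and the toy count table are all assumptions made for illustration.

```python
def relative_abundance(counts):
    """Convert {taxon: [count per sample]} into per-sample proportions."""
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Total reads per sample, summed over all taxa.
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        taxon: [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for taxon, row in counts.items()
    }


def filter_taxa(counts, min_prevalence=0.5, min_mean_rel_abundance=0.01):
    """Keep taxa that are both prevalent and abundant enough.

    Prevalence is the fraction of samples in which a taxon has a nonzero
    count. Both thresholds are hypothetical defaults chosen for this example.
    """
    rel = relative_abundance(counts)
    kept = {}
    for taxon, row in counts.items():
        prevalence = sum(1 for c in row if c > 0) / len(row)
        mean_rel = sum(rel[taxon]) / len(row)
        if prevalence >= min_prevalence and mean_rel >= min_mean_rel_abundance:
            kept[taxon] = row
    return kept


if __name__ == "__main__":
    # Made-up count table: three taxa measured in four samples.
    toy_counts = {
        "taxon_A": [50, 40, 60, 30],
        "taxon_B": [0, 0, 1, 0],
        "taxon_C": [50, 60, 39, 70],
    }
    print(sorted(filter_taxa(toy_counts)))
```

In this toy table, taxon_B is observed in only one of four samples, so it falls below the prevalence threshold and is removed before any downstream analysis.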
Table 1. Comparison of animalcules and other popular microbiome analysis tools.
| Function | Biobakery | Vegan | mothur | microbiome | Metavizr | Microbiome helper | Qiime2 | Phyloseq | animalcules |
|---|---|---|---|---|---|---|---|---|---|
| Filtering and Data Summary | ✔ | | ✔ | ✔ | | ✔ | ✔ | ✔ | ✔ |
| Interactive Visualization | | | | | ✔ | | ✔ | | ✔ |
| Dimension Reduction | | | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Differential Abundance Analysis | ✔ | | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Diversity Analysis | | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Support for 16S rRNA Data | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Support for total RNA-seq Data | ✔ | ✔ | | | | | | | ✔ |
| Biomarker Identification | | | | | | | ✔ | | ✔ |
| Interface and Command-Line | | | | | | ✔ | ✔ | | ✔ |
| Language/Platform | Python | R | R/Web | R | R | R/Python/Perl | Python | R | R |