The iDOM package is written in the R scientific computing language and relies on the R packages “vegan”, “iCAMP”, “SpiecEasi”, and “ftmsRanalysis”. The currently implemented functions are listed in Table 1. The functions in iDOM are designed to process complex matrices, including molecular composition data and molecular trait data of DOM assemblages. Additionally, these functions could integrate other types of data that affect DOM, such as environmental variables and microbial data (Fig. 1).
Table 1
The functions of R package “iDOM”
Type | Function | Description |
Molecular properties and class assignment | molTrait () | Computes various molecular traits, such as molecular weight, stoichiometry, chemical structure, energy content, and oxidation state |
molTrans () | Estimates putative biochemical transformations for each molecule |
molGroup () | Partitions DOM molecules into four fractions based on two orthogonal trait dimensions of molecular reactivity and activity: labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive |
Diversity and dissimilarity of DOM | commTD () | Calculates selected types of taxonomic diversity and evenness measures |
commFD () | Calculates selected types of functional diversity measures, such as Rao’s quadratic entropy |
commDendro () | Generates three relational metabolite dendrograms based on molecular traits and putative biochemical transformations |
commDD () | Calculates selected types of dendrogram-based diversity measures (Based on metabolite dendrograms) |
commDis () | Calculates the differences in DOM compositions between samples |
Community assembly of DOM | commProc() | Assesses how deterministic and stochastic processes influence DOM assemblages |
The effect of DOM dark matter | iDME () | Assesses the effect of dark matter on DOM assemblages (Based on intraspecific interactions) |
Microbial mechanisms | H2 () | Calculates the network-level specialization in DOM-microbe bipartite networks (Based on interspecific interactions) |
Visualization | plotVK () | Generates van Krevelen diagrams using the O/C and H/C ratios of elemental formulas |
plotRA () | Plots the relative abundance of different molecular groups |
These functions of iDOM could be grouped into four aims. The first aim is to use molecular compositional data and trait data to describe molecular traits, classify groups of molecules based on their traits, and calculate the relative abundance of these groups. The second aim is to integrate environmental variables to describe the distribution of diversity and dissimilarity and explain the community assembly of DOM molecules along environmental gradients or across spatial scales. The third aim is to incorporate unknown molecular data to assess the effect of DOM dark matter on whole DOM assemblages. The fourth aim is to include the microbial data to evaluate the DOM-microbe associations and further the microbial mechanisms influencing DOM molecules production and degradation.
Datasets
The iDOM package provides example datasets of DOM under experimental warming. These datasets are derived from a laboratory microcosm experiment using sterilized Taihu Lake sediments as the organic carbon source, with distinct microbial communities inoculated from China’s lake sediments in subtropical and temperate climate zones, respectively (Hu et al. 2024). The microcosms were incubated in the dark for one month at six different temperature levels (5, 10, 15, 20, 25, and 30°C), with each temperature treatment replicated three times, resulting in a total of 36 samples across two climate zones.
Five example datasets are used: mol.data, mol.trait, envi, mol.dark.matter, and micro.data. The datasets mol.data and mol.trait include the intensities of 5,474 assigned molecular formulae (referred to as “molecules” hereafter) from a total of 11,253 peaks and their corresponding molecular traits across 36 samples. The dataset envi contains experimental variables, such as incubation temperature, providing meta-information to enrich the analysis under varied environmental conditions. The dataset mol.dark.matter comprises the intensities of 5,779 uncharacterized molecules, offering additional insight into the molecular composition of DOM. The dataset micro.data contains the relative abundance of 463 bacterial genera across the samples and can be used to investigate the interactions between DOM and microbes.
Examples of FTICR-MS datasets
Calculation of molecular traits and class assignment of molecules
The R package iDOM provides the molTrait function to calculate molecular properties, the molTrans function to estimate putative biochemical transformations, and the molGroup function to classify groups of DOM molecules based on these traits. The function molTrait can calculate chemical characteristics of molecules related to molecular weight, stoichiometry, chemical structure, and oxidation state (Table S1). These traits are mass, the number of carbon atoms (C), Kendrick Defect (kdefectCH2), O/C ratio, H/C ratio, N/C ratio, P/C ratio, S/C ratio, the modified aromaticity index (AImod), double bond equivalent (DBE), DBE minus oxygen (DBEO), DBE minus AI (DBEAI), standard Gibbs Free Energy (GFE), nominal oxidation state of carbon (NOSC), and carbon use efficiency (Ymet) (Hughey et al. 2001, Koch and Dittmar 2006, LaRowe and Van Cappellen 2011, Koch and Dittmar 2016, Song et al. 2020). Furthermore, the function molTrans estimates putative biochemical transformations for each molecule identified by aligning mass differences to a database of known transformations (Danczak et al. 2020).
The molGroup function can classify DOM assemblages into different groups based on various molecular traits and graphical methods, such as molecular properties, putative biochemical transformations, and Van Krevelen diagrams (Kim et al. 2003, Danczak et al. 2020). For instance, the assigned molecules can be classified into the CHO, CHON, CHOS, CHOP, CHONS, CHONP, CHOSP, and CHONSP formula groups based on the composition of molecular elements. Each molecule aligned on the Van Krevelen diagrams can be correlated to specific natural biomolecules (Kim et al. 2003). Molecules in different regions of the diagram can be categorized into distinct classes, such as lipids, proteins, amino sugars, carbohydrates, unsaturated hydrocarbons, condensed aromatics, lignin, and tannins (Sleighter and Hatcher 2007, Hockaday et al. 2009). Recently, a new method is developed to divide molecules into four fractions based on molecular H/C ratio and putative biochemical transformations, which indicate molecular reactivity and activity, respectively (Hu et al. 2022b). The four fractions are labile-active (H/C ≥ 1.5, transformations > 10), recalcitrant-active (H/C < 1.5, transformations > 10), recalcitrant-inactive (H/C < 1.5, transformations ≤ 1), and labile-inactive (H/C ≥ 1.5, transformations ≤ 1).
The function molGroup revealed that CHO and CHON groups consistently exhibited higher relative abundances compared to other formula groups across the temperature gradient from 5℃ to 30℃. Additionally, condensed aromatics and lignin consistently showed dominance throughout the temperature range (Fig. 2a). The function molGroup further classified 4,150 out of 5,474 molecules into four fractions based on molecular reactivity and activity: labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive. The molecular activity could provide new insights in addition to molecular reactivity, which is supported by the overall overlap between active and inactive molecules with an H/C ratio above or below 1.5 in the Van Krevelen diagram (Fig. 2b). Compared to active molecules, the inactive molecules showed lower relative abundance in both labile and recalcitrant fractions (Fig. S2).
Diversity and dissimilarity of DOM
The R package iDOM provides the commTD, commFD, and commDD functions to calculate within-assemblage diversity, and the chemoDis function to assess between-assemblage compositional differences of DOM molecules. To apply diversity metrics, originally designed for ecological species, to molecular data, individual compounds are treated as species, with the relative intensities of their peaks representing species abundance. The function commTD calculates taxonomic diversity using the most common indices of α-diversity and evenness, including molecular richness, which was based on the number of molecular formulas; the abundance-based diversity metrics such as Shannon, Gini-Simpson (or Simpson’s index), or the Chao 1 indices (Kellerman et al. 2014, Li et al. 2018), which were based on the molecular richness and the molecular relative intensity. The function commFD calculates functional diversity based on molecular traits using Rao’s quadratic entropy, which measures the average abundance-weighted trait-based difference between any two molecules in a community. Greater differences in traits between any two individuals in a community result in higher quadratic entropy (Mentges et al. 2017, Tanentzap et al. 2019).
To further understand the relationships among molecules, the function commDendro generates relational molecular dendrograms, analogous to phylogeny trees, based on molecular properties and putative biochemical transformations. These dendrograms include the molecular characteristics dendrogram (MCD), transformation-based dendrograms (TD), and transformation-weighted characteristics dendrogram (TWCD), representing shared and divergent molecular traits among molecules. After generating molecular dendrograms, the function commDD calculates dendrogram-based diversity measurements, including dendrogram diversity (DD), mean pairwise distance (MPD), and mean nearest taxon distance (MNTD). The DD quantifies the total dendrogram branch length occupied by a given molecular assemblage, analogous to Faith’s Phylogenetic Diversity (Faith 1992). Higher DD values indicate molecular assemblages that span a broader range of molecular properties (MCD), a more extensive biochemical transformation network (TD), or both (TWCD). MPD determines the average dendrogram distance between molecules, while MNTD determines the average dendrogram distance between nearest neighbors (Danczak et al. 2020).
As a complement to alpha-diversity, the function commDis compares the differences in DOM compositions between samples by generating a dissimilarity matrix. The dissimilarity metrics include incidence-based Jaccard, abundance-based Bray-Curtis, and dendrogram-based UniFrac dissimilarity. For each dissimilarity matrix, non-metric multidimensional scaling (NMDS) and principal coordinate analysis (PCoA) could be subsequently employed to visually depict the relationships among samples based on the first two major axes of variation.
For the illustrated dataset, the functions commTD, commFD and commDD revealed that DOM molecular diversity showed different correlations with experimental temperature in temperate and subtropical regions. The molecular richness significantly decreased with rising temperature in the temperate region (P < 0.05), while there was no significant change in the subtropical region (P > 0.05). Meanwhile, RaoQ and MNTD significantly decreased with rising temperature in the subtropical region (P < 0.05), whereas they had nonsignificant change in the temperate region (P > 0.05). Further, the function commDis revealed that the molecular composition showed similarity in the subtropical and temperate regions, as indicated by Permutational Multivariate Analysis of Variance (PERMANOVA, P > 0.05) (Fig. 3d).
Community assembly of DOM assemblages
The assembly processes underlying molecular assemblages could be quantified based on dendrogram-based β-diversity null modeling (Danczak et al. 2020, Hu et al. 2022b). The function commProc quantifies the relative influences of deterministic and stochastic processes governing the assembly of DOM assemblages. The function calculates the dendrogram-based β-nearest taxon index (βNTI) to quantify tip-level clustering or overdispersion of a molecular dendrogram, analogous to its application in phylogenetic trees in ecological communities (Stegen et al. 2012, Wang et al. 2013). The βNTI is calculated by comparing the observed β-mean nearest taxon distance (βMNTD) between pairs of local DOM assemblages to a null expectation generated by randomizing observed dendrogram associations (Danczak et al. 2020). When the comparison between two DOM assemblages significantly deviates from the null expectation (|βNTI| > 2), deterministic processes are likely responsible for the observed pattern. Deterministic processes could lead to a pattern of divergent molecular composition across local assemblages via “variable selection” (βNTI > 2) or convergent molecular composition via “homogeneous selection” (βNTI < -2) (Stegen et al. 2015). Conversely, if the pairwise comparison instead mirrors the null expectation (|βNTI| < 2), stochastic processes are likely responsible for the observed differences.
Based on molecular incidence data and the relevant trait-based dendrograms, we applied the function commProc to quantify the relative influences of deterministic and stochastic processes governing the assembly of DOM assemblages. Most of the |βNTI| values for MCD, TD, and TWCD larger than 2, indicating that deterministic processes predominantly governed the molecular assembly (Fig. 4a). To further understand the assembly mechanisms of different DOM fractions compared to whole DOM assemblages, we applied the molGroup function to partition the DOM composition into labile-active, recalcitrant-active, recalcitrant-inactive, and labile-inactive fractions based on molecular trait dimensions of reactivity and activity (Fig. 4b). In both subtropical and temperate regions, deterministic processes caused by variable selection dominated the assembly of labile or recalcitrant molecules in the active fractions, while stochastic processes are more important for the assembly of molecules within the inactive fractions. Homogeneous selection showed little importance across the fractions in both regions (Fig. 4c).
Effect of DOM dark matter
DOM peaks can be assigned to identifiable molecular formulae using FT-ICR MS, yet a large proportion of DOM remains uncharacterized, often referred to as chemical “dark matter” (Hu et al. 2023). The role of molecular dark matter and its relationship with assigned (i.e., known) molecules, represent a major challenge for a complete understanding of biogeochemical cycles. The function iDME quantifies the effect of dark matter on DOM assemblages by constructing co-occurrence networks based on the presence and absence of dark matter (Hu et al. 2023). In each network, the nodes represent individual molecules, and the edges identify the interactions among molecules. Specifically, two types of networks are constructed: 'KK' networks, which include only known molecules, and 'DK' networks, which encompass both dark matter and known molecules at a 1:1 ratio or at the observed ratio in a DOM assemblage (Fig. 5a). These two networks have an identical number of nodes that are randomly subsampled from the whole DOM molecule pool and are further bootstrapped 100 times.
The function iDME calculates the indicator of dark matter effects (iDME) by quantifying the percentage change in the mean value of a given network metric, such as degree centrality, between “KK” and “DK” networks. Degree is defined as the number of edges connecting a focal node to other nodes (Proulx et al. 2005), and molecules with a higher degree have more interactions within an assemblage. Thus, positive and negative iDME values indicate that dark matter enhances and reduces network interactions within DOM assemblages, respectively, while an iDME of zero suggests a neutral effect. The iDME could be further divided into intra-iDME and inter-iDME to clarify whether the effects of dark matter result from changes in interactions between dark-dark nodes or between dark-known nodes.
In the example dataset, the function iDME showed that DOM dark matter substantially decreased the network connectivity in both temperate and subtropical regions along the temperature gradients (Figs. 5b, c). All iDME values for the network metric of degree were negative and significantly different from zero, with values ranging from − 24.3% to -17.9% for temperate DOM assemblages and from − 22.7% to -20.7% for subtropical DOM assemblages. The iDME values of temperate regions exhibited a significant increase along the temperature gradient (P < 0.05), while those of subtropical regions showed nonsignificant trend (P > 0.05). This result suggests that the negative effect of dark matter on temperate DOM assemblages decreased as the temperature increased. Furthermore, the partitioning of iDME showed that the effects of dark matter were mainly due to changes in links between dark-known nodes, followed by changes in links between dark-dark nodes for both temperate and subtropical regions.
Microbial mechanisms influencing DOM production and degradation
The fate of DOM is intimately linked to the metabolism of complex microbial communities, as microbes regulate the production and degradation of specific molecules, thus playing a crucial role in sustaining biogeochemical cycles (Hu et al. 2022a). The function H2 helps quantify the degree of specialization between DOM molecules and microbial taxa by constructing DOM-microbe bipartite networks based on resource-consumer theory. In the DOM-microbe networks, individual DOM molecules are connected exclusively to microbial taxa that use those specific molecules, while the direct interactions within molecules or taxa are not explicitly considered. According to resource-consumer relationships, negative network interactions likely indicate the degradation of larger molecules into smaller structures, while positive network interactions may relate to the production of new molecules, either through degradation or biosynthetic processes (Hu et al. 2022a).
The function H2 calculates the specialization index H2’ to quantify the degree of specialization between DOM and microbes, and standardizes H2’ using null modeling, such as the shuffle.web algorithm, to directly compare the network indices across different samples (Hu et al. 2022a). An elevated H2’ indicates a high degree of specialization between DOM and microbes (Bluthgen et al. 2006), with extreme cases where a single bacterial taxon might consume or produce just one specific DOM molecule. Conversely, lower H2’ values suggest a more generalized bipartite network where different DOM molecules can be used by a wide range of bacterial taxa (Hu et al. 2022a).
We applied the H2 function to the example dataset to examine how DOM-microbe associations vary under experimental temperatures (Fig. 6). In total, there were 1,108 and 1,938 interactions for the negative and positive networks (|SparCC ρ | > 0.5), respectively (Figs. 6a-b). The standardized H2’ values were negative and significantly lower than expected by chance (P < 0.05), indicating that the interactions between DOM and bacteria were non-random (Figs, 6c-d). Experimental warming showed divergent effects on the H2’ of negative or positive networks between the two regions. Specifically, for the positive networks, experimental warming significantly decreased H2’ for both temperate and subtropical regions (P < 0.05). For the negative networks, experimental warming significantly increased H2’ for the temperate region (P < 0.05), while there was no significant correlation at the subtropical region. Experimental warming in the temperate region could thus contribute to the greater recalcitrance of DOM by increasing production (i.e., less specialized positive networks) and reducing decomposition of molecules (i.e., more specialized negative networks) (Hu et al. 2022a).
Availability
The iDOM open-source software package is implemented in R and available for download via Github (https://github.com/jianjunwang/iDOM).