Assessment of instrumental error and correction of py-MBMS data
Because of the unprecedented size of the analysis set, quality control (QC) assessment was needed during analysis, particularly since the condition of the instrument changed between replicate analyses. Spectra from six types of standards, monitored individually, showed no particular time-dependent trend, suggesting that differences were likely due to changes in instrument noise attributed to fluctuations in ion energy and the condition of the pyrolysis vapor path (Supplementary Table 1; Supplementary Figures 1-3). The estimated uncorrected lignin content was also reproducible between measurement replicates (Pearson correlation=0.87; R2=0.75; Supplementary Table 2). The standard deviation of lignin content between replicates ranged from 0.0 to 1.7 wt% lignin, in all cases less than 10% of the mean determined value. However, because minor spectral drift over the replicates of the population was consistent among standards based on PCA and variance analysis (Supplementary Fig. 4), ions were subsequently corrected for “tray effect.” Most ions with high variance attributed to tray effects were not used in calculations of lignin composition or S/G ratio (the exceptions being m/z 167 and 181) and were otherwise attributed to “noise,” as outlined in the Discussion section.
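The replicate reproducibility check described above can be illustrated with a minimal sketch. The lignin values below are made up for illustration, and the function names (pearson_r, replicate_spread) are hypothetical; the sketch only shows how a Pearson correlation, its square, and the between-replicate standard deviation relative to the mean could be computed for paired replicate estimates.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def replicate_spread(rep1, rep2):
    """Per-sample (SD, SD / mean) for two measurement replicates."""
    out = []
    for a, b in zip(rep1, rep2):
        sd = statistics.stdev([a, b])
        out.append((sd, sd / statistics.fmean([a, b])))
    return out

# Made-up lignin estimates (wt%) for five samples, two replicates each.
rep1 = [24.1, 25.6, 26.3, 22.8, 27.0]
rep2 = [24.9, 25.1, 26.8, 23.5, 26.2]

r = pearson_r(rep1, rep2)
r_squared = r ** 2            # for a simple linear fit, R2 equals r squared
spread = replicate_spread(rep1, rep2)
worst_relative_sd = max(rel for _, rel in spread)
```

Here worst_relative_sd plays the role of the "less than 10% of the mean" criterion: a sample would be flagged if its relative standard deviation exceeded 0.10.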
Effect of microspatial environmental variation on py-MBMS spectra
We used a Thin Plate Spline (TPS) procedure to model spatial variation in py-MBMS ions in the field trial. Because of the randomized block design, genotypic effects should be randomly distributed throughout the field, so fitted values from this analysis represent environmental variation, while residuals represent genotypic effects plus error. The fitted values of the TPS models for the 421 ions displayed two distinctive patterns: simple (Fig. 1a) vs. complex surfaces (Fig. 1b). The Surface Complexity (SC) parameter discriminated between these patterns, with values close to 0 for simple surfaces and values above 1 for complex surfaces. Among the 421 ions, 198 ions with null SC were free of microspatial influence (and were corrected only for “tray effects”), while the rest (n=223 ions) were impacted to varying degrees (Fig. 1c). The correlation between the Total Ion Chromatogram (TIC)-normalized ion intensities and the TPS residuals serves as another indication of the degree to which an ion is affected by microspatial variation (Fig. 1d). Fifteen of the 17 ions used to quantify lignin in the spectra, and all ions deriving from cell wall sugars and free phenolics, had SC values in excess of 1, indicating that these cell wall components were affected by microenvironmental variation.
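The split of an ion's intensity into a fitted spatial surface (environment) and residuals (genotype plus error) can be sketched as follows. This is a generic smoothing thin plate spline written with the standard library only, not the authors' implementation: the penalty weight lam, its scaling, the function names, and the 5 x 5 field grid are all assumptions made for illustration, and the SC parameter is not computed here.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def tps_kernel(r):
    """Thin plate spline radial basis, r^2 log r, with value 0 at r = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def tps_fit(points, values, lam=1.0):
    """Fit a smoothing TPS; return (fitted surface, residuals) at the points."""
    n = len(points)
    size = n + 3                      # n kernel weights + 3 plane coefficients
    A = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            A[i][j] = tps_kernel(math.hypot(xi - xj, yi - yj))
        A[i][i] += lam                # smoothing penalty on the diagonal
        for k, p in enumerate((1.0, xi, yi)):
            A[i][n + k] = p           # polynomial (plane) block
            A[n + k][i] = p           # side conditions on the weights
    coef = solve(A, list(values) + [0.0, 0.0, 0.0])
    w, a = coef[:n], coef[n:]
    fitted = []
    for xi, yi in points:
        f = a[0] + a[1] * xi + a[2] * yi
        f += sum(wj * tps_kernel(math.hypot(xi - xj, yi - yj))
                 for wj, (xj, yj) in zip(w, points))
        fitted.append(f)
    residuals = [v - f for v, f in zip(values, fitted)]
    return fitted, residuals

# Made-up 5 x 5 field grid: a smooth gradient plus small tree-level deviations.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
gradient = [2.0 + 0.5 * x - 0.3 * y for x, y in points]
values = [g + 0.1 * ((7 * i) % 5 - 2) for i, g in enumerate(gradient)]
fitted, residuals = tps_fit(points, values, lam=1.0)
```

In this sketch a purely planar field is reproduced by the fitted surface with residuals near zero, while larger values of lam push the surface toward a plane and leave more of the local variation in the residuals.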
PCA of the ions based on their predicted TPS surface (Fig. 2), used here as a proxy for fine-scale environmental effects, yielded a PC-1 explaining 95% of the variation and a PC-2 explaining 1%. The loadings for the first principal component were generally negatively correlated with lignin-derived ions, with the exception of m/z 168 (primarily deriving from 4-methylsyringol) and m/z 194. When the TPS-predicted surfaces of the ions were clustered into seven groups (Supplementary Fig. 5), the largest group (containing m/z 97) was related to phenolics and lignin-derived species, and the second largest (containing m/z 82) consisted mostly of sugar-derived ions and lignin dimers, again showing that these cell wall components vary spatially with the microenvironment. The remaining clusters were small and were mostly composed of irrelevant or otherwise unannotated peaks. Peaks termed irrelevant here may include noise, fragments associated with more abundant species (i.e., loss of a proton), or ions that may have many or unknown sources.
Inter-ramet variation captured in py-MBMS spectra
After TIC normalization and controlling for instrument and environmental variation, the peaks derived from cell wall components had high loadings in PCA (Fig. 3), were among the most abundant, and had high variance relative to the mean intensities measured across the population (Fig. 4a-b; Supplementary Fig. 6 shows PCA scores color coded by field location before and after instrumental and environmental corrections). Approximately 120 ions were annotated based on comparisons with standards, libraries and literature; unannotated ions either represent an unknown component or have many sources, so their presence or source is not discussed (see Supplementary Table 3). The variance was highest for ions m/z 60 (C), 73 (C), 114 (C), 124 (G), 137 (G), 154 (S), 167 (S), 180 (L), 182 (S), 194 (S), and 210 (S), where C denotes carbohydrate sugars, L lignin, P phenolics, G G-lignin, and S S-lignin. These ions were also generally abundant in the spectra. However, other abundant ions such as m/z 57 (C) and 85 (C) did not exhibit variances as high relative to their mean intensities. Conversely, some ions were not particularly abundant but had high variance, such as m/z 66 (P, L), 94 (P, L), 121 (P, L), and 138 (P, L).
PCA of the spatially corrected ions also revealed negative correlations between lignin-derived ions (e.g., m/z 124, 137, 154, 210) and carbohydrate-derived ions (e.g., m/z 73, 85, 114, 126) (Fig. 3). PC-1 accounted for 41% of the spectral variation, with carbohydrate-derived ions generally negatively correlated with lignin-derived ions. PC-2 accounted for 24% of the spectral variation, with carbohydrate and syringyl (derived from sinapyl monomers, S) ions negatively correlated with guaiacyl (derived from coniferyl monomers, G) ions (Fig. 3). Additionally, m/z 66, 94, 121 and 138 were negatively correlated with other lignin-derived species, likely indicating that these ions were primarily derived from phenolics (such as phenols occurring as secondary metabolites, as in the case of salicylates, or other lignin-like but not true-lignin phenolics such as ferulate, coumarate, etc.) as opposed to the fragmentation of lignin-derived pyrolysates (although a positive contribution from lignin-derived analytes cannot be ruled out).
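How opposite-signed PCA loadings express a negative correlation between ion groups can be illustrated with a small sketch. It extracts only the first principal component by power iteration on the covariance matrix, using the standard library; the intensity table is made up, the column assignment (two lignin-like and two carbohydrate-like ions) is an assumption, and the function name is hypothetical.

```python
import math

def first_principal_component(rows, iters=1000):
    """Return (loadings, explained fraction) of PC-1 via power iteration.

    rows: list of samples, each a list of ion intensities (same length).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Start from the first axis rather than a vector of ones, which could be
    # nearly orthogonal to a component with balanced positive and negative loadings.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    explained = eigenvalue / sum(cov[a][a] for a in range(p))
    return v, explained

# Made-up intensities: columns 0-1 stand in for lignin-derived ions,
# columns 2-3 for carbohydrate-derived ions.
rows = [
    [5.0, 3.0, 2.0, 4.1],
    [4.0, 2.6, 3.1, 5.0],
    [6.1, 3.4, 1.2, 3.0],
    [3.0, 2.1, 4.0, 6.2],
    [5.5, 3.2, 1.8, 3.6],
    [3.6, 2.3, 3.5, 5.4],
]
loadings, explained = first_principal_component(rows)
```

In this toy table the two ion groups receive loadings of opposite sign on PC-1, which is the pattern a loadings plot would show when one group increases as the other decreases. The overall sign of a principal component is arbitrary, so only the relative signs carry meaning.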
Heritability of py-MBMS spectral features
Gains in broad-sense heritability of the ions due to tray correction were marginal in most cases, though heritability of a few ions did improve noticeably with the correction (Supplementary Fig. 7). Values of broad-sense heritability for the TPS-corrected ion intensities ranged from 0 to 0.79, with the most heritable and noteworthy annotated ions summarized in Table 1. Permutation tests yielded significance thresholds for heritability ranging from 0.028 to 0.037 for the combined tray-corrected and TPS-corrected datasets. Although the ions with higher heritabilities were usually associated with complex surfaces for the TPS-fitted values, some ions with high heritability had simple TPS surfaces and SC values near 0 (e.g., m/z 55, 95, 167, 179, 181, 193, 195, 272, 312, and 302; Supplementary Table 4).
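As an illustration of how a broad-sense heritability and a permutation-based significance threshold can be obtained for one ion, the sketch below uses a balanced one-way ANOVA estimator, H2 = Vg / (Vg + Ve), with genotypes as groups and ramets as replicates. This is an assumed, simplified model and not necessarily the one used in this study; the data, the number of permutations, the 95th percentile cutoff, and the function names are all hypothetical.

```python
import random
import statistics

def broad_sense_h2(groups):
    """H2 from a balanced one-way ANOVA; groups = ramet values per genotype."""
    g = len(groups)
    r = len(groups[0])
    flat = [v for grp in groups for v in grp]
    grand = statistics.fmean(flat)
    means = [statistics.fmean(grp) for grp in groups]
    msb = r * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum((v - m) ** 2 for grp, m in zip(groups, means) for v in grp)
    msw /= g * (r - 1)
    vg = max((msb - msw) / r, 0.0)    # negative estimates are truncated at 0
    total = vg + msw
    return vg / total if total > 0 else 0.0

def permutation_threshold(groups, n_perm=200, quantile=0.95, seed=0):
    """Null H2 quantile obtained by shuffling values across genotypes."""
    rng = random.Random(seed)
    r = len(groups[0])
    flat = [v for grp in groups for v in grp]
    null = []
    for _ in range(n_perm):
        rng.shuffle(flat)
        shuffled = [flat[i:i + r] for i in range(0, len(flat), r)]
        null.append(broad_sense_h2(shuffled))
    null.sort()
    return null[min(int(quantile * n_perm), n_perm - 1)]

# Made-up corrected intensities of one ion: 4 genotypes x 3 ramets.
groups = [
    [10.0, 10.2, 9.9],
    [12.1, 11.8, 12.0],
    [14.0, 14.3, 13.9],
    [16.2, 15.9, 16.0],
]
h2 = broad_sense_h2(groups)
threshold = permutation_threshold(groups)
```

An ion would be considered significantly heritable under this sketch when h2 exceeds threshold. With a population of thousands of ramets the null distribution is much tighter than in this toy example, which is consistent with small thresholds.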
The most heritable ions (Table 1) also exhibited high variance in the population. However, several ions, including m/z 126 (C6 carbohydrates), 150 (G and H-lignin), 164 (G), 168 (S), 109 (P, L), 286 (G dimer) and 98 (C6 carbohydrates), were among the most heritable but exhibited relatively low variance. Maternal influence was almost always stronger for the most heritable ions, particularly lignin- and phenolic-derived species. However, paternal effects were either more dominant than or similarly influential to maternal effects for ions derived from carbohydrate sugars, such as m/z 73, 97, and 114 (see Supplementary Table 4 for a full comparison of paternal and maternal variance associated with each ion and Supplementary Figure 8 for the % variance explained by mother vs. father, annotated by ion origin in biomass).
Hierarchical clustering (HC) using Spearman’s rank correlation distance with the complete linkage criterion was used to analyze the clustering of ions in the combined tray-TPS corrected spectra and to elucidate spectral associations based mostly on genetic information. Eight groups were identified in the spectra based on K-means clustering (Supplementary Fig. 9), summarized in Table 2 (full spectral groups outlined in Supplementary Table 5). Groups separated by biocomponent source similarly to those in the TPS-only corrected ions (Supplementary Fig. 10), indicating that the majority of ions impacted by microspatial environment, and not ions highly impacted by instrumental variation, were also impacted by genetic variation in the population. Ions in the complete tray-TPS corrected spectra generally clustered according to biopolymer source, although unannotated and noise ions appeared in all clusters to some degree. Interestingly, the most heritable ions (m/z 66, 94, 121, 138), which are produced from phenolics (possibly including salicylate-like metabolites known to occur in Populus [35-39]), were clustered together in cluster EK0 along with some lignin-derived species, including lignin dimer moieties (m/z 272, 286). The rest of the most heritable ions clustered according to their biocomponent source in clusters EK4 (G-lignin), EK5 (carbohydrate sugars), and EK6 (S-lignin) (Supplementary Table 5).
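The hierarchical clustering step named above (Spearman rank correlation distance, complete linkage) can be sketched with the standard library. The distance is taken here as 1 minus Spearman's rho, which is one common convention and an assumption about the definition used; the ion labels and intensity profiles are made up, the function names are hypothetical, and the K-means step is not shown.

```python
import math

def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rk = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rk[order[k]] = avg
        i = j + 1
    return rk

def spearman_distance(x, y):
    """1 - Spearman rho (profiles must not be constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return 1.0 - sxy / math.sqrt(sxx * syy)

def complete_linkage(profiles, k):
    """Agglomerate ion profiles into k clusters using complete linkage."""
    labels = list(profiles)
    d = {(a, b): spearman_distance(profiles[a], profiles[b])
         for a in labels for b in labels if a != b}
    clusters = [[lab] for lab in labels]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                link = max(d[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or link < best[0]:
                    best = (link, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up ion profiles across five samples.
profiles = {
    "m/z 124": [1.0, 2.0, 3.0, 4.0, 5.0],
    "m/z 137": [2.0, 3.0, 5.0, 6.0, 9.0],
    "m/z 73": [5.0, 4.0, 3.0, 2.0, 1.0],
    "m/z 114": [9.0, 7.0, 6.0, 3.0, 2.0],
}
clusters = complete_linkage(profiles, k=2)
```

Because Spearman's correlation depends only on ranks, ions whose intensities rise and fall together across samples are grouped even when their absolute intensities differ widely, which suits spectra with a large dynamic range.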
Familial patterns of ramets
Clustering of the samples based on the genotypic predicted values for py-MBMS spectra revealed some of the underlying family structure present in this population. PCA shows some differentiation of maternal half-sib families (Fig. 5a). The half-sib family from female/maternal ID 1950 (see Supplementary Table 6 for additional identifier information for each parent), in the lower right quadrant of the PCA scores plot, had lower S/G and lower lignin composition than the half-sib family from female 4593 in the upper left quadrant of the plot (also see Supplementary Fig. 11). Clustering by Ward’s method using Euclidean distance revealed 7 clusters (Supplementary Fig. 12), where samples were previously classified into these groups based on K-means clustering meant to elucidate at least 7 different families (Supplementary Fig. 13). These groups largely corresponded to the maternal half-sib families (clusters colored in Fig. 5) as opposed to paternal half-sib families. Interestingly, one group of siblings (from female/maternal ID 1950) also produced the highest abundance of ion m/z 94 (PCA color coded in Supplementary Fig. 14), which can come from lignin but is otherwise attributed to the presence of phenolics such as salicylates (and in this case is not positively correlated with other lignin ion abundances such as m/z 210, as described previously; see, for example, Supplementary Fig. 15). The clustering of the samples in conjunction with PCA of the spectra shows that compositional relationships of several families can be elucidated, particularly based on the abundance of lignin and phenolic species. In some cases, several families produced similar or overlapping spectral features and hence appeared to have similar biomass composition, so differences were not noted.
Additionally, clustering methods differentiating PCA projections or scores based on MBMS spectra may identify spectral groups that consist of many family types and members and do not necessarily separate families based on spectra. Either way, the use of clustering methods in combination with PCA of the py-MBMS spectra enables the visualization and validation of the compositional relationships within or across families that can be captured in the spectra.
Analysis and heritability of cell wall traits from corrected py-MBMS spectra
The average lignin content of the entire population (n = 2721), after correcting for microspatial variation of the genotypes, was 25.5% (after taking replicate averages for each sample into account), ranging from 20.9 to 27.9% (Table 3; Fig. 6a). The S/G ratio ranged from 1.56 to 2.77, with an average of 2.10 (Table 3; Fig. 6b). These lignin metrics are typical for variants of P. trichocarpa, as previously reported by Muchero et al. [1]. The broad-sense heritability of lignin composition based on TPS-corrected values was 0.56 and the heritability of S/G was 0.81.