To better understand the specifics of the elemental distributions within the leaf samples, descriptive statistics were calculated on the measured analyte concentrations. These values are given in Table 1.
Table 1
Descriptive statistics for each analyte. Values are given in µg/g (ppm) unless otherwise specified.
| Min | Median | Mean | Max | SD | RSD (%) |
N | 12800.00 | 47200.00 | 44971.75 | 75400.00 | 12889.69 | 28.66 |
Ca | 6163.34 | 41804.06 | 44425.95 | 117215.36 | 15897.07 | 35.78 |
K | 9767.47 | 26697.73 | 26732.15 | 48175.00 | 6316.85 | 23.63 |
Mg | 2014.16 | 7703.00 | 8028.99 | 23306.93 | 2716.79 | 33.84 |
P | 2372.64 | 5815.02 | 5911.25 | 13201.90 | 1485.39 | 25.13 |
S | 0.00 | 3390.00 | 3421.55 | 23830.00 | 1143.53 | 33.42 |
Mn | 15.11 | 147.80 | 177.52 | 830.50 | 124.15 | 69.93 |
Fe | 33.99 | 106.58 | 142.33 | 1737.38 | 136.84 | 96.14 |
B | 19.41 | 79.45 | 94.21 | 405.12 | 58.85 | 62.47 |
Zn | 10.34 | 41.88 | 47.90 | 239.16 | 25.07 | 52.34 |
Cu | 0.92 | 4.12 | 4.57 | 32.73 | 2.32 | 50.67 |
Mo | 0.00 | 0.60 | 1.84 | 26.66 | 2.84 | 154.69 |
Ni | 0.00 | 0.06 | 0.08 | 1.37 | 0.08 | 104.93 |
Measured concentrations range from non-detectable to a maximum of 11.7% (calcium). Of the elements measured, nitrogen, calcium, and potassium were the only three with averages greater than 1%, indicating these three elements are in greatest abundance within the plant. In contrast, manganese, iron, boron, zinc, copper, molybdenum, and nickel display average concentrations less than 0.02%, indicating these elements are consistently found in only trace amounts. The remaining elements (magnesium, phosphorus, and sulfur) fall between these macronutrient and micronutrient groups.
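The summary columns and the unit conversion used above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `describe` and `ppm_to_percent` are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and the three-value input is toy data. The cross-checks at the end use only figures reported in Table 1.

```python
import statistics

def describe(values):
    """Summary statistics in the layout of Table 1 (sample SD is assumed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": mean,
        "max": max(values),
        "sd": sd,
        "rsd_pct": 100 * sd / mean,  # relative standard deviation in percent
    }

def ppm_to_percent(ppm):
    """Convert µg/g (ppm) to percent by mass: 1% equals 10,000 µg/g."""
    return ppm / 10_000

# Hypothetical concentrations, only to show the call.
toy = describe([10.0, 20.0, 30.0])

# Cross-checks against values reported in Table 1.
nitrogen_rsd = round(100 * 12889.69 / 44971.75, 2)     # SD / mean for N
calcium_max_pct = round(ppm_to_percent(117215.36), 1)  # largest measured value
manganese_mean_pct = ppm_to_percent(177.52)            # largest trace-element mean
print(nitrogen_rsd, calcium_max_pct, manganese_mean_pct)
```

The nitrogen RSD and the calcium maximum computed this way match the 28.66% and 11.7% quoted in the table and text, and the manganese mean falls below the 0.02% cutoff used for the trace elements.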
The Shapiro-Wilk test provided evidence to reject the normality assumption for each of the analytes measured. The descriptive statistics in Table 1 suggest most of these analytes are skewed right, as concentration values are bounded below at zero. Due to this normality violation, nonparametric tests were used in place of traditional parametric analysis (27).
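One quick way to see the right skew in Table 1 is to compare each mean with its median, since a mean above the median is a common informal sign of a right tail. The sketch below is only that heuristic and is not a substitute for the Shapiro-Wilk test; the function name `mean_exceeds_median` is hypothetical, and the values are copied from Table 1.

```python
# (median, mean) pairs in µg/g, copied from Table 1.
# Assumption: the calcium median is read as 41804.06.
TABLE_1 = {
    "N": (47200.00, 44971.75),
    "Ca": (41804.06, 44425.95),
    "K": (26697.73, 26732.15),
    "Mg": (7703.00, 8028.99),
    "P": (5815.02, 5911.25),
    "S": (3390.00, 3421.55),
    "Mn": (147.80, 177.52),
    "Fe": (106.58, 142.33),
    "B": (79.45, 94.21),
    "Zn": (41.88, 47.90),
    "Cu": (4.12, 4.57),
    "Mo": (0.60, 1.84),
    "Ni": (0.06, 0.08),
}

def mean_exceeds_median(table):
    """Return the analytes whose mean is larger than their median."""
    return [name for name, (median, mean) in table.items() if mean > median]

right_skew_candidates = mean_exceeds_median(TABLE_1)
print(len(right_skew_candidates), "of", len(TABLE_1), "analytes")
```

By this check, every analyte except nitrogen has a mean above its median, which is in line with the statement that most of the distributions are skewed right.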
The Kruskal-Wallis test confirmed the presence of significant differences in the medians between groups (18, 19). Dunn’s post hoc test then yielded non-significant p-values (values exceeding the significance threshold of 0.05) for two pairwise comparisons: nitrogen with calcium, and manganese with iron (20). These results suggest the distributions within each pair are similar. Of these, nitrogen and calcium are the two most abundant elements in the samples tested.
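For readers unfamiliar with the Kruskal-Wallis test, the H statistic can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the analysis pipeline used here: the function names and the three toy groups are hypothetical, tied values receive average ranks but no tie correction is applied to H, and the closed-form p-value shown holds only for three groups (a chi-square approximation with two degrees of freedom).

```python
import math
from itertools import chain

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (n (n + 1)) * sum(R_g^2 / n_g) - 3 (n + 1), without tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical, fully separated groups.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(groups)
p_three_groups = math.exp(-h / 2)  # chi-square survival function for df = 2 only
print(round(h, 2), round(p_three_groups, 4))
```

A significant H only says that at least one group differs; a post hoc procedure such as Dunn’s test is still needed to identify which pairs differ.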
3.1 Correlations Among Trace Elements
Nonparametric Spearman’s correlations were computed between pairwise groups of elements. These correlation coefficients are expressed using a correlation matrix, displayed in Table 2. The four coefficients with the greatest magnitudes are bolded.
Table 2
Correlation analysis for the element concentrations of cannabis leaves. Coefficients with the four greatest magnitudes are bolded.
| N | P | K | B | Zn | Ni | Mo | Mn | Mg | Fe | Cu | Ca | S |
N | 1 | | | | | | | | | | | | |
P | 0.29 | 1 | | | | | | | | | | | |
K | 0.45 | 0.25 | 1 | | | | | | | | | | |
B | -0.24 | 0.10 | 0.03 | 1 | | | | | | | | | |
Zn | 0.21 | 0.38 | 0.12 | 0.07 | 1 | | | | | | | | |
Ni | 0.24 | -0.06 | 0.23 | -0.11 | 0.12 | 1 | | | | | | | |
Mo | 0.29 | 0.33 | 0.10 | 0.22 | 0.12 | 0.01 | 1 | | | | | | |
Mn | -0.14 | 0.11 | -0.10 | 0.21 | 0.51 | 0.02 | 0.01 | 1 | | | | | |
Mg | -0.06 | 0.22 | 0.03 | 0.66 | 0.15 | -0.02 | 0.26 | 0.21 | 1 | | | | |
Fe | 0.24 | 0.23 | -0.05 | -0.01 | 0.27 | 0.12 | 0.40 | 0.22 | 0.09 | 1 | | | |
Cu | 0.59 | 0.24 | 0.38 | 0.02 | 0.28 | 0.34 | 0.40 | 0.09 | 0.02 | 0.28 | 1 | | |
Ca | -0.13 | 0.09 | -0.04 | 0.44 | 0.38 | 0.08 | -0.06 | 0.47 | 0.63 | 0.05 | -0.09 | 1 | |
S | 0.61 | 0.36 | 0.36 | 0.08 | 0.13 | 0.18 | 0.52 | -0.12 | 0.09 | 0.25 | 0.52 | -0.12 | 1 |
Magnesium and boron are the most strongly correlated of all the pairwise combinations (r = 0.66). Magnesium and calcium (r = 0.63), nitrogen and sulfur (r = 0.61), and nitrogen and copper (r = 0.59) possess similarly moderate, positive correlations.
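Spearman’s coefficient is the Pearson correlation of the ranks, which reduces to a simple formula when no values are tied. The sketch below uses that no-ties shortcut; the function names and the five magnesium and boron values are hypothetical and are not taken from the dataset.

```python
def ranks(values):
    """Rank from 1 (smallest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid only without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical concentrations in µg/g for five samples.
magnesium = [7000.0, 7600.0, 8100.0, 9000.0, 6500.0]
boron = [70.0, 95.0, 85.0, 120.0, 60.0]
rho = spearman_rho(magnesium, boron)
print(rho)
```

Because only the rank order enters the calculation, the coefficient is unaffected by the right skew noted earlier, which is the reason a rank-based measure suits these data.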
3.2 Principal Components Analysis
Principal components analysis (PCA) was used to reveal three distinct clusters of elemental variances within the dataset. Conducting PCA on the dataset resulted in thirteen principal components. The proportion of the variance explained by each of the components, including the cumulative proportion, is expressed by the scree plot in Fig. 1.
The first five principal components account for 69% of the total variation, yielding 24%, 18.7%, 10%, 8.2%, and 8.2% of the variance, respectively. There is a noticeable drop after the second principal component, after which each subsequent component yields diminishing returns. Therefore, for the purpose of this analysis, only the first three principal components will be considered, as these account for over 50% of the variation (52.7% combined). To better understand the specific relationships present within these principal components, biplots can be constructed using the loadings and scores of each analyte (22, 24). These plots are expressed in Fig. 2.
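The explained-variance proportions and loadings described above can be obtained from a singular value decomposition of the centered data. The following is a minimal sketch, assuming the analytes are standardized before PCA (the text does not state this explicitly) and using random log-normal numbers as a stand-in for the real measurements; the function name `pca_explained_variance` is hypothetical.

```python
import numpy as np

def pca_explained_variance(X):
    """Return (variance ratios, loadings) for column-standardized data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # assumed scaling step
    _, singular_values, vt = np.linalg.svd(Z, full_matrices=False)
    variances = singular_values ** 2 / (len(Z) - 1)
    ratio = variances / variances.sum()
    loadings = vt.T  # rows are analytes, columns are principal components
    return ratio, loadings

# Synthetic stand-in: 200 samples by 13 right-skewed analytes.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 13))

ratio, loadings = pca_explained_variance(X)
cumulative = np.cumsum(ratio)  # the curve a scree plot would display
print(ratio.shape, loadings.shape)
```

Thirteen input variables yield thirteen components, and the first two columns of `loadings` are the vectors a PC1 against PC2 biplot would draw.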
In the plot of PC1 against PC2 in Fig. 2, the loading vectors for boron, calcium, manganese, magnesium, and zinc all form small angles with respect to one another while also being relatively parallel to the PC1 axis. These elements account for the most variation in both the first principal component and the dataset. They also present the highest correlations in Table 2, providing further evidence for this relationship. Likewise, the loading vectors for nitrogen, potassium, copper, and sulfur also possess small angles in relation to one another and are relatively parallel to the PC2 axis. Molybdenum, iron, and potassium fall in between these two groupings, appearing at an angle to both axes.
The second plot, of PC2 against PC3, expresses the same relationship among nitrogen, potassium, copper, and sulfur, both with respect to one another and to the second principal component. The other analytes, however, do not appear to possess strong correlations among themselves or with the third principal component. This indicates much of the strong correlation among these variables exists in the first two principal components.
Finally, the third plot, of PC1 against PC3, shows much of the variation existing along the first principal component. This is to be expected, as the first principal component accounts for significantly greater variation than the third. Nickel, which exerts almost no influence on the first principal component, has a loading vector nearly parallel to the third principal component in a positive direction. This stands in contrast with the plot of PC1 against PC2, where nickel was parallel to the second principal component, but in a negative direction. This indicates variation in nickel is largely accounted for by the second and third principal components rather than the first.
This analysis confirms the presence of three specific groups of elements that tend to vary together, and in contrast to other elements. The specifics of these groupings, however, cannot be established by PCA. Other clustering algorithms are necessary to further elucidate these relationships.
3.3 K-means Corroborates the PCA Clusters
The K-means clustering algorithm was used to identify three distinct clusters in the dataset. Three clusters were chosen as the optimal number using the silhouette method, modified to include scaled inertia (28). This agrees with the general clustering observed in the first two components of the PCA model.
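Choosing the number of clusters by silhouette can be sketched as below. This is a simplified illustration and not the procedure of reference 28: it uses the plain mean silhouette without the scaled-inertia modification, a basic Lloyd-style K-means with a deterministic farthest-first seeding chosen here for reproducibility, and three synthetic, well-separated blobs in place of the real data. All function names are hypothetical.

```python
import numpy as np

def farthest_first_centers(X, k):
    """Deterministic seeding: start at X[0], then add the farthest point each time."""
    centers = [X[0]]
    for _ in range(k - 1):
        nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[nearest.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def mean_silhouette(X, labels):
    """Average of (b - a) / max(a, b) over all points."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            continue  # singleton clusters score 0 by convention
        a = dists[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(dists[i, labels == c].mean() for c in clusters if c != own)
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Synthetic stand-in: three well-separated two-dimensional blobs.
rng = np.random.default_rng(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in blob_centers])

silhouettes = {k: mean_silhouette(X, kmeans(X, k)) for k in range(2, 6)}
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```

On real, overlapping data the silhouette curve is usually flatter than in this toy case, which is one motivation for combining it with an inertia term as the cited method does.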
Following K-means analysis, the clustering can be visualized by plotting against the first two principal components, as is expressed in Fig. 3.
Overlaying this clustering over the biplot of the first two principal components can help interpret these clusters. The first grouping, denoted in red, lies in the same direction as the loading vectors for phosphorus, sulfur, copper, potassium, nickel, and nitrogen. This indicates the first cluster will tend to possess higher values for these elements. In contrast, the second grouping, denoted in green, lies in the opposite direction of these vectors, indicating lower values for these elements. This trend is further confirmed by examining the distributions of these scaled analytes in each of these clusters, which is expressed in the first plot of Fig. 4. The distribution of the red grouping tends to fall above the other two groups for each analyte present, while the distribution of the green grouping tends to fall below the other two.
Likewise, boron, calcium, manganese, magnesium, zinc, molybdenum, and iron are all directed toward the third cluster, denoted in gray. This indicates the third cluster will tend to possess higher values for these elements. The second plot in Fig. 4 corroborates this conclusion.
3.4 Strain Assignment to K-Means Clusters
Using the strain identifiers for each cultivator, frequencies of cluster assignment were determined for each distinct strain. Strains with fewer than ten samples were excluded from the analysis so that underrepresented strains would not bias the classifications. This resulted in 742 samples divided among twenty-three distinct strains. The percentage of each strain’s samples assigned to each of the three clusters identified by the clustering algorithm is presented in Table 3. Dominant clusters are bolded.
Table 3
Strain-specific groupings into K-means categories. Category colors denote the percent inclusion of each strain into the categories described in Figs. 4 and 5. Dominant clusters are bolded.
Strain Code | Sample Count | Red (%) | Green (%) | Gray (%) |
C1-S4 | 10 | 100.00 | 0.00 | 0.00 |
C1-S2 | 15 | 93.33 | 0.00 | 6.67 |
C1-S10 | 63 | 66.67 | 31.75 | 1.59 |
C1-S8 | 45 | 66.67 | 17.78 | 15.56 |
C1-S12 | 50 | 58.00 | 36.00 | 6.00 |
C1-S1 | 28 | 57.14 | 39.29 | 3.57 |
C1-S3 | 106 | 44.34 | 34.91 | 20.75 |
C2-S4 | 23 | 8.70 | 86.96 | 4.35 |
C2-S10 | 25 | 8.00 | 84.00 | 8.00 |
C2-S8 | 31 | 16.13 | 83.87 | 0.00 |
C2-S23 | 15 | 20.00 | 80.00 | 0.00 |
C2-S7 | 19 | 21.05 | 78.95 | 0.00 |
C2-S2 | 36 | 16.67 | 63.89 | 19.44 |
C2-S9 | 37 | 35.14 | 62.16 | 2.70 |
C1-S11 | 20 | 30.00 | 55.00 | 15.00 |
C2-S3 | 32 | 34.38 | 53.12 | 12.50 |
C1-S7 | 77 | 31.17 | 45.45 | 23.38 |
C3-S3 | 11 | 0.00 | 27.27 | 72.73 |
C3-S1 | 14 | 7.14 | 28.57 | 64.29 |
C5-S1 | 27 | 3.70 | 33.33 | 62.96 |
C5-S3 | 18 | 5.56 | 38.89 | 55.56 |
C3-S2 | 13 | 0.00 | 46.15 | 53.85 |
C5-S2 | 27 | 7.41 | 40.74 | 51.85 |
It is evident that several strains could be grouped almost entirely into a single cluster. As shown in Table 3, strains C1-S4 and C1-S2 are almost entirely contained within the first (red) cluster, meaning they tend to have higher concentrations of copper, potassium, nitrogen, nickel, phosphorus, and sulfur in comparison to the other two groups (Fig. 4). Likewise, strains C2-S4, C2-S10, and C2-S8 fall predominantly within the second (green) cluster, indicating they would tend to have lower concentrations of those same elements. Strains assigned mainly to the third (gray) cluster would therefore tend to possess higher values for boron, calcium, iron, magnesium, manganese, molybdenum, and zinc.
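The tabulation behind Table 3 can be sketched as a count of cluster labels per strain, followed by the ten-sample cutoff. This is an illustrative sketch with hypothetical function names. The 14 and 1 sample counts for C1-S2 are inferred from its reported percentages rather than taken from reported counts, and the strain `X-S0` is invented to show the cutoff.

```python
from collections import Counter, defaultdict

MIN_SAMPLES = 10  # strains with fewer samples are excluded, as stated in the text

def cluster_percentages(assignments, min_samples=MIN_SAMPLES):
    """Map each retained strain to {cluster: percent of its samples}."""
    counts = defaultdict(Counter)
    for strain, cluster in assignments:
        counts[strain][cluster] += 1
    table = {}
    for strain, per_cluster in counts.items():
        total = sum(per_cluster.values())
        if total < min_samples:
            continue  # underrepresented strain
        table[strain] = {
            cluster: round(100 * n / total, 2) for cluster, n in per_cluster.items()
        }
    return table

def dominant_cluster(percentages):
    """Cluster holding the largest share of a strain's samples."""
    return max(percentages, key=percentages.get)

# (strain, cluster) pairs; counts for C1-S2 are inferred, X-S0 is invented.
assignments = (
    [("C1-S2", "red")] * 14
    + [("C1-S2", "gray")] * 1
    + [("X-S0", "green")] * 3
)

table = cluster_percentages(assignments)
print(table, dominant_cluster(table["C1-S2"]))
```

With these inputs the C1-S2 row of Table 3 is reproduced, and the three-sample strain is dropped before any percentages are computed.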