Study area
The study was conducted in the municipality of Villavicencio, located in the Depositional Piedmont of the Eastern Cordillera in the department of Meta, southeastern Colombia (Latitude (4° 3’2.67”, 4°14’5.31”, 4° 7’47.95”, 4° 2’35.13” N) Longitude (73°45’40.61, 73°37’29.84, 73°14’20.59, 73°15.39’44.76”)). The total study area covers 673 Km2, with altitudes between 232 and 1349 m, representing 51% of the total area of the municipality. According to the Köppen classification system, the regional climate is Am - tropical monsoon; and according to the Holdridge classification system the life zone of the study area is BmhT - very humid tropical forest. This formation corresponds to the transition zone between the mountainous landscape of the Eastern Cordillera and the Depositional Piedmont that occupies the central region of the department (IGAC, 2004).
Geology, geomorphology and soils
With the differential uplift of the Eastern Cordillera during the Quaternary, the Paleozoic formations formed by the alternation of schists with argillites, sandstones, gneiss, and phyllites currently outcrop in most of the Eastern Cordillera and give rise to the mountain soils. At the end of the Pleistocene, the erosive cycle of the Eastern Cordillera and sedimentation in the plain were reactivated due to the last glaciation, since the Cretaceous material of San Juanito, El Calvario and Páramo de Sumapaz (sandstones, argillites and shales) was finally eliminated almost in its entirety and deposited in the partially subsided foothills. These materials of clays and sandy conglomerates were deposited in the foothills, following the course of the main rivers such as the Guayuriba, Guatiquía, et al., as part of their alluvial plains. At the end of the glaciations in the Holocene, the level of regional erosion decreased and created the most recent geoforms that are currently identified in the plains landscape. The rivers also decreased leaving their level below the earlier river level, thus forming a system of terraces. The sedimentation pattern at the foot of the Eastern Cordillera is typically fluvial and comprises a vast plain extending from the Cordillera to the Orinoco River (IGAC, 2004).
The region is dominated by inceptisols, oxisols and, to a much lesser extent, entisols, all of which have developed mainly on a depositional piedmont landscape; about 54% of the total study area corresponds to this type of landscape followed by 26% in an alluvial plain and 13% in a mountain landscape.
Climate
The study area encompasses two major bioclimatic units: the Eastern Cordillera area, a very humid medium climate (m-MH) with precipitation above 4000 mm per year and temperatures that decrease as altitude increases; and the Depositional Piedmont with a warm humid climate (c-MH) where precipitation decreases and temperatures gradually increase as one approaches the alluvial plain from west to east (IGAC, 2004). In general, the Depositional Piedmont climate has an average temperature of 27°C and a bimodal precipitation regime that is largely determined by the Intertropical Convergence Zone and the influence of the mountainous relief of the Eastern Cordillera. The first rainy season occurs between April and June, and a second rainy season peaks in October. The average annual rainfall is 4260 mm and the dry season includes the months of December to February with less than 200 mm in total (Varón and Vargas, 2019).
Soil sampling and laboratory analysis
A regular sampling grid with 12.5m×12.5 m for every 2 km was established at the study site. This resulted in 152 samples collected at different locations. At each location one sample was compose for three different soil sample for each soil layer 0–10 cm, 10–20 cm and 20–30 cm and a one composite sample for the depth 0–30 cm. The soil samples were air-dried, ground and sieved (2 mm mesh). The total carbon content was determined using dry combustion (Dumas method) in a CN Elemental Analyzer (LECO Inc., St. Joseph, MI, USA). Analysis of soil particle size distribution was performed using the hydrometer method, in which sodium hydroxide (4 g L-1) and sodium hexametaphosphate (10 g L-1) were employed as dispersing agents. Elevation data are from the ALOS/PALSAR digital elevation model (DEM) developed by Alaska Satellite Facility (ASF) of spatial resolution 12.5 m × 12.5 m (https://nasadaacs.eos.nasa.gov/).
Acquisition and pre-processing of NIR spectra
To obtain spectral measurements, the samples were oven dried for 24 hours at 45°C and they were ground and sieved (2 mm mesh). Subsequently, a modular FT-NIR NIRFlex N-500 spectrometer (https://www.buchi.com/se-en/node/822) was used to measure the NIR reflectance spectrum for each soil sample, 152 in total. The data were obtained with a resolution of 0.2 nm between 1000 and 2500 nm. Before scanning the soil spectra, the soil samples were placed in petri dishes (9 cm in diameter by 1.5 cm in height) and distributed homogeneously to reduce the influence of the roughness of the material. The instrument was calibrated using a standard Spectralon white plate, and this procedure was performed at first and repeated every 20 minutes during the readings. For each soil sample, the spectrometer automatically calculated an average of 100 scans. In addition, three independent measurements were taken for each soil sample, mixing the soil sample in each measurement to ensure a better representation of the examined surface. The final spectrum was finally calculated as an average of the three spectral scans. NIR reflectance spectra were converted to apparent absorbance (A) using the expression A = log(1/reflectance); the logarithm being natural in all cases. Multiplicative interference and light scattering effects were reduced by normalizing spectra using the standard variable normal algorithm (SNV) (Barnes and Lister, 1989).
Statistical analysis
Once the reflectance data matrix was available for the soil samples of the three landscapes being considered (first factor: mountain, piedmont and plain) and the three soil depths and a composite sample of the three depths (second factor: 0–10 cm, 10–20 cm, 20–30 cm and 0–30 cm), a matrix of dimension 72⋅(2 + 1501) was generated, using two columns for factors and 1501 reflectances, where two columns correspond to the structure of the classification factors of the observational study; this generated 12 combinations of levels. In order to carry out this analysis, two options are proposed: the first one, the evaluation of the whole spectrum using the method combining the AOV variance analysis and the principal component analysis, not for a run as it would be done separately with each technique but generating a series of sub-matrices associated to the two-factor model with interaction given by the decomposition of the response matrix, where it represents the landscapes (l = 1,2,3), d represents the different depths (d = 1,2,3,4) and the number of samples taken (s = 1,2,...,6). The matrix y represents the 1501 soil reflectances for the combined l x d x s = 72 rows of data. The decomposition for a single vector of particular reflectances is written as:
$$\:{y}_{lds}=\stackrel{-}{y}+\left({\stackrel{-}{y}}_{l.}-\stackrel{-}{y}\right)+\left({\stackrel{-}{y}}_{d.}-\stackrel{-}{y}\right)+\left({{\stackrel{-}{y}}_{ld}-\stackrel{-}{y}}_{l.}-{\stackrel{-}{y}}_{d.}+\stackrel{-}{y}\right)+\left({y}_{lds}-{\stackrel{-}{y}}_{ld}\right)$$
1
where\(\:\:\stackrel{-}{\varvec{y}}\) represents the vector of overall reflectance averages, \(\:{\stackrel{-}{y}}_{l.}\) represents the vector of means for the first factor with its levels, \(\:{\stackrel{-}{y}}_{d.}\) represents the vector of means of the second factor and its levels, and \(\:{\stackrel{-}{y}}_{ld}\) represents the vector of means for the combination of levels of both factors. The first operation performed by the algorithm is to standardize the responses, which is achieved with \(\:{\varvec{D}}_{1}={\varvec{y}}_{lds}-\stackrel{-}{\varvec{y}}\) (Harrington et al, 2005 notation), thus using the whole spectrum the first working submatrix of the algorithm is achieved. For the first factor analysis associated with the landscape, the effect matrix\(\:\:\stackrel{-}{{\varvec{D}}_{\varvec{A}}}=\left({\stackrel{-}{\varvec{y}}}_{l.}-\stackrel{-}{\varvec{y}}\right)\) is calculated for all the reflectances considered in the model, so that using Eq. 1 for a single reflectance would mean that\(\:\:{D}_{2}={y}_{lds}-\stackrel{-}{y}-\left({\stackrel{-}{y}}_{l.}-\stackrel{-}{y}\right)={D}_{1}-\stackrel{-}{{D}_{A}}\) (remember that the bold letters refer to all the reflectances, so that the normal text only illustrates a particular reflectance). Similarly, for the second factor associated with depth, we obtain (in a sequential work, i.e., the second factor\(\:\:\stackrel{-}{{\varvec{D}}_{\varvec{B}}}=\left({\stackrel{-}{\varvec{y}}}_{d.}-\stackrel{-}{\varvec{y}}\right)\) after considering the first one) first the effects matrix of the second factor and then we obtain \(\:{D}_{3}={y}_{lds}-\stackrel{-}{y}-\left({\stackrel{-}{y}}_{l.}-\stackrel{-}{y}\right)-\left({\stackrel{-}{y}}_{d.}-\stackrel{-}{y}\right)={D}_{1}-\stackrel{-}{{D}_{A}}-\stackrel{-}{{D}_{B}}={D}_{2}-\stackrel{-}{{D}_{B}}\). With the calculation of D3 for a reflectance (or D3 for the whole spectrum or some subset selected by related theory or by some other applied method). Eq. 1 operated sequentially has on the right-hand side of the equality only the interaction effect and that associated with the error, which in the case of a model with no interaction and a single reflectance observation for each level of the factors (s=1) involved would be written as:
$$\:{y}_{ld}=\stackrel{-}{y}+\left({\stackrel{-}{y}}_{l.}-\stackrel{-}{y}\right)+\left({\stackrel{-}{y}}_{d.}-\stackrel{-}{y}\right)+\left({y}_{ld}-{\stackrel{-}{y}}_{l.}-{\stackrel{-}{y}}_{d.}+\stackrel{-}{y}\right)$$
2
Thus, if we have the case of Eq. 2, \(\:{\varvec{D}}_{3}\), it would already represent the error matrix, however, in the present case, it is still necessary to calculate\(\:\stackrel{-}{{\:\varvec{D}}_{\varvec{A}\varvec{B}}}\) the one associated with the interaction effect, so that the error matrix would be obtained from\(\:\:{\varvec{D}}_{4}={\varvec{D}}_{E}={\varvec{y}}_{lds}-{\stackrel{-}{\varvec{y}}}_{ld}\).
The evaluation of the first factor with the technique that integrates ANOVA and PCA requires the matrix\(\:{\:\varvec{M}}_{A}=\stackrel{-}{{\varvec{D}}_{\varvec{A}}}+{\varvec{D}}_{E}\) to be obtained and it is here where PCA enters on this matrix using the decomposition by singular values of\(\:{\:\varvec{M}}_{A}={\varvec{U}}_{A}{\varvec{S}}_{A}{\varvec{V}}_{A}^{T}\), where\(\:\:{\varvec{V}}_{\varvec{A}}\) compresses the loadings of the variables, \(\:{\varvec{S}}_{A}\) is a diagonal matrix with the singular values and \(\:{\varvec{U}}_{A}\) compresses the normalized scores. With the matrix product of \(\:{\varvec{U}}_{A}{\varvec{S}}_{A}\), we obtain the values that are finally plotted and that in the technique could reflect the differences attributable to the effect of the first factor. This same procedure is followed for each effect involved in the study to generate the characteristic outputs of the technique, which are used to study the interaction effect, as well as the main effects.
Once the interaction of factors and the main effects of the first database had been studied, we proceeded to work with a second database generated at a single depth (thanks to the results of the first analysis) for all the landscapes. First, a plot was created between the ratio of soil bulk density (BD) and soil organic carbon with each of the reflectances measured at each wavelength. For each iteration of three variables (fixing bulk density and SOC), we obtained the correlation matrix, extracted the first eigenvalue and percentualized with respect to the sum of all other eigenvalues. With the data of the first eigenvalue it was generated a plot for all the lengths, which was used to apply six classification algorithms and thus select three bands from the whole spectrum evaluated intersecting all the ranges obtained from each cluster created; in our case this was three groups (suggested by the algorithm). We proposed a weighted average reflectance (\(\:\stackrel{-}{{\rho\:}_{w}}\)) for any band with the expression:
$$\:\stackrel{-}{{\rho\:}_{w}}=\frac{\sum\:_{{j=\lambda\:}_{\text{m}}}^{{j=\lambda\:}_{\text{M}}}\sum\:_{i=1}^{n}{\rho\:}_{ij}{\lambda\:}_{j}}{\sum\:_{j={\lambda\:}_{\text{m}}}^{{j=\lambda\:}_{\text{M}}}{\lambda\:}_{j}}$$
3
where \(\:{\lambda\:}_{j}\) represents the j-th wavelength (j = 1,...,mk), where mk is an indicator of the number of wavelengths within each band (which can have different sizes), represents the reflectance observed at the i-th observation and j-th wavelength, and \(\:{\lambda\:}_{\text{m}}\:\)and \(\:{\lambda\:}_{\text{M}}\) represent respectively the minimum and maximum wavelength value of a band. With these calculations, we obtained the three weighted mean reflectance vectors, which allowed forming the final data matrix to estimate soil organic carbon using a spatial regression model, specifically the spatial model of the error, which according to Elhorst's (2014) taxonomy is written as:
$$\:\:\:\:\:\:\varvec{Y}=\alpha\:{\varvec{\iota\:}}_{n}+\varvec{X}\varvec{\beta\:}+\varvec{u}\:;\varvec{u}=\lambda\:\varvec{W}\varvec{u}+\varvec{\epsilon\:}$$
4
where \(\:\varvec{Y}\) represents the vector of observed SOC value, \(\:\varvec{\alpha\:}{\varvec{\iota\:}}_{\varvec{n}}\) represents the intercept (vectorially), \(\:\varvec{X}\) the matrix of explanatory variables, namely the weighted mean reflectances of each of the selected bands and the bulk density, \(\:\varvec{\beta\:}\) represents the vector of parameters in the explanatory variables, \(\:\varvec{u}\) represents the residuals of the model (with spatial dependence), which is explained by the expression \(\:\varvec{u}=\lambda\:\varvec{W}\varvec{u}+\varvec{\epsilon\:}\), where \(\:\lambda\:\) represents the spatial autocorrelation coefficient, \(\:\varvec{W}\) the matrix of spatial weights, created with the inverses (element to element) in the matrix of Euclidean distances (d) following the pattern d-1/2 and with a structure of neighborhoods given by the nearest neighbor (with W-type standardization). Finally, \(\:\varvec{\epsilon\:}\) represents the vector of normal, independent and identically distributed residuals.
Using the modeling results and metrics for model fit (statistical significance of the spatial autocorrelation coefficient), along with the fulfillment of assumptions (normality and independence of the type \(\:\varvec{\epsilon\:}\) residuals, evaluated with the Shapiro-Wilk test and the Moran index), we estimated soil organic carbon values by aggregating data in hexagons with a 3 km diameter in QGIS (Qgis.org, 2024), adjusting the anchor point from the software's default setting. This generated a table of attributes that we append in the supplementary material in the GitHub address (Supplementary Materials). As we built the spatial model based on regular polygons, some landscapes fell into the same polygon for some of the 72 polygons. The estimated value of each of these was assigned equally to each landscape according to its location in the aggregation. With these values, we constructed a scatter plot between the SOC estimated by spatial regression and the mean bulk density of each polygon per landscape. We used the ordinary least squares method to adjust the (non-spatial) regression line to study the bulk density domain and the values where the SOC oscillates in each landscape to characterize the bulk density and the estimated SOC per landscape. We calculated all PCA analyses using the R::ChemoSpec library (Hanson,2024), spatial regression with R::spatialreg (Bivand and Piras, 2015) and Moran index calculations with R::ape (Paradis and Schliep,2019).