Study area
In Nepal, hills and high mountains cover about 86% of the total land area, and the remaining 14% is flatland lying below 300 m altitude. Altitude varies from 60 m above sea level in the Terai, the lowland stretching from east to west, to 8,848 m at Mount Everest, the highest peak in the world (25). Altitude is considered a major factor behind the wide range of climatic conditions in Nepal (25). Average temperature decreases by 6 °C for every 1,000 m gain in altitude ((26) cited in (25)). Wide altitudinal variation and diverse climatic conditions have produced four main physiographic zones, i.e. Terai (lowlands), mid-hills, high mountains and high Himal ((27) cited in (25)). This wide range of climatic conditions, which is mainly due to altitudinal variation, influences the composition of flora and fauna in Nepal (25).
Stainton ((28) cited in (25)) classified 35 forest types in Nepal. These forests are broadly categorized into 10 major groups based on altitudinal range (25). Forests found at different altitudinal ranges have been reported to store different amounts of soil organic carbon and above ground tree biomass (29).
Data collection
The study used National Forest Inventory (NFI) data collected between 2010 and 2014. The NFI adopted a two-phase systematic sampling design with cluster plots. It established 450 clusters and 1,553 Permanent Sample Plots (PSPs) systematically in the forest area, representing the different physiographic zones of the country. Data were collected from the Terai region (flat land) to the high Himal region (high-altitude land). Concentric circular sample plots (CCSPs) were used to record biophysical data. Tree attributes (diameter at breast height, total tree height, crown length, crown cover, species, quality class, etc.), stand-level variables (forest type, disturbances on forest stands, management regime, slope, aspect, location, etc.) and soil samples were collected. The detailed data collection methodology is described in States of Nepal’s Forests (29) (hereafter, the Forest Resource Assessment (FRA) report of Nepal). The data represent a wide range of topography, with altitude ranging from 88 m to ~4,000 m and slope from 0 to 100%. Likewise, the data cover sparse to very dense forests.
Above ground tree biomass (AGTB) and soil organic carbon (SOC) analysis
Above ground tree biomass was calculated by summing stem biomass, branch biomass and foliage biomass. Stem biomass was calculated as the product of stem volume and air-dried wood density (Equation 1). Branch biomass and foliage biomass were then calculated using branch-to-stem and foliage-to-stem ratios, which depend on the species and on the stem size at diameter at breast height (DBH) (30).
Stem biomass = Volume × Density    (Equation 1)
Where,
Volume = stem volume in cubic meters (m³)
Density = air-dried wood density
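The arithmetic behind Equation 1 and the ratio-based components can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, density is assumed to be expressed in kg per cubic meter, and the density and ratio values are placeholders rather than values from the NFI (the actual ratios depend on species and DBH class (30)).

```python
def stem_biomass(volume_m3, density):
    """Equation 1: stem biomass = stem volume x air-dried wood density."""
    return volume_m3 * density


def above_ground_tree_biomass(volume_m3, density, branch_ratio, foliage_ratio):
    """Sum stem, branch and foliage biomass for a single tree.

    branch_ratio and foliage_ratio are the branch-to-stem and
    foliage-to-stem biomass ratios (assumed inputs, looked up elsewhere).
    """
    stem = stem_biomass(volume_m3, density)
    branch = stem * branch_ratio
    foliage = stem * foliage_ratio
    return stem + branch + foliage


# Hypothetical tree: 0.5 m3 stem volume, density 800 kg/m3 (assumed unit),
# placeholder ratios of 0.25 (branch) and 0.05 (foliage).
agtb = above_ground_tree_biomass(0.5, 800.0, branch_ratio=0.25, foliage_ratio=0.05)
print(agtb)
```

Plot-level AGTB would then be obtained by summing the per-tree values and scaling to a per-hectare basis according to the plot design.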
Similarly, for SOC analysis, the Walkley-Black wet combustion method ((31) cited in (32)) was applied to soil samples collected to a depth of 30 cm at the DFRS soil laboratory, Nepal, while dry combustion with a LECO CHN Analyzer was used at the Metla Soil Laboratory, Finland, for quality assurance. The methods for estimating AGTB and SOC are explained in detail in the FRA report of Terai forest (32).
Data split
Data analysis focused on assessing SOC based on topographic variables (altitude, slope and aspect) and forest variables (AGTB, basal area (BA) per hectare and crown cover). Before the data were split, boxplots were used to check for outliers. Altogether, 1,026 SOC plots were used for the analysis. The data were split into two sets, i.e. training data (80%, i.e. 822 plots) for developing the models and test data (20%, i.e. 204 plots) for validation, using the "createDataPartition" function in the "caret" package (33). This function splits the data randomly into two subsets of the specified proportions. All analyses were carried out in R (34).
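The idea of an 80/20 random split can be sketched in plain Python as below. This is not the "createDataPartition" routine itself: for a numeric outcome, that function samples within percentile groups of the outcome, so its subset sizes can differ slightly from a simple shuffle. The function name, the seed and the placeholder plot IDs are assumptions made for illustration.

```python
import random


def split_train_test(plot_ids, train_fraction=0.8, seed=42):
    """Randomly split plot IDs into training and test lists."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    shuffled = list(plot_ids)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


plot_ids = list(range(1, 1027))  # 1,026 placeholder plot IDs
train_ids, test_ids = split_train_test(plot_ids)
print(len(train_ids), len(test_ids))
```

Every plot ends up in exactly one of the two subsets, which is the property that keeps the validation data independent of the data used for model fitting.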
Modelling
Before modelling, correlation analysis was performed to determine the linear relationship of SOC with six independent variables (altitude, basal area, AGTB, slope, aspect and crown cover) using the "cor" function in the "stats" package in R (34). Afterwards, SOC was modelled as a function of these six independent variables using the "lm" function in the "stats" package (34). To identify the best predictor variables, stepwise regression was applied using the "step" function in the "stats" package (34). Altogether, six different models were developed (M1, M2, M3, M4, M5 and M6). The Akaike Information Criterion (AIC) and the coefficient of determination (R²) were used as selection criteria for the best model: a lower AIC value and a higher R² value indicate a better fit. Moreover, the "vif" (variance inflation factor) function in the "car" package (35) was used to check for multicollinearity among the predictor variables in the models. In this way, models based on different sets of variables were tested to assess SOC across the entire region of Nepal.
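The selection criteria named above can be illustrated with a small Python sketch. It does not reproduce the stepwise procedure: it fits single-predictor least-squares models only, computes the Pearson correlation, R² and an AIC of the form n·ln(RSS/n) + 2k (which differs from the full-likelihood AIC by a constant that cancels when models fitted to the same data are compared), and uses the two-predictor special case of the VIF, 1/(1 − r²). All data values and function names are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, r_squared, aic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    tss = sum((b - my) ** 2 for b in y)
    r_squared = 1.0 - rss / tss
    k = 2  # estimated coefficients: intercept and slope
    aic = n * math.log(rss / n) + 2 * k
    return b0, b1, r_squared, aic


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Invented illustrative values, not NFI data.
altitude = [100, 500, 1000, 1500, 2000, 2500, 3000, 3500]
slope = [5, 40, 10, 60, 20, 80, 30, 70]
soc = [30, 35, 44, 48, 57, 60, 71, 74]

r_alt = pearson_r(altitude, soc)
_, _, r2_alt, aic_alt = fit_simple_ols(altitude, soc)
_, _, r2_slope, aic_slope = fit_simple_ols(slope, soc)
vif = vif_two_predictors(altitude, slope)
print(r_alt, r2_alt, aic_alt, r2_slope, aic_slope, vif)
```

With these invented values the altitude model has the higher R² and the lower AIC, so it would be preferred over the slope model under both criteria.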
Data transformation
Prior to validation, it is necessary to check whether the fitted models violate the assumptions of linear regression (i.e. homoscedasticity and normality). For this purpose, the "bptest" function in the "lmtest" package (36) was used to test for heteroscedasticity and the "shapiro.test" function in the "stats" package (34) to test for normality. To address the rejection of the null hypotheses of homoscedasticity and normality, the response variable (i.e. SOC) was transformed using the "BoxCoxTrans" function in the "e1071" package (37) to normalize its distribution, and new models were developed (TM1, TM2, TM3, TM4, TM5 and TM6).
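The Box-Cox transformation and its inverse can be written out as a short Python sketch. The standard one-parameter form is used here; the value of lambda is an assumption chosen for illustration (in practice it is estimated from the data), and the SOC values are placeholders.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Back-transform a Box-Cox value to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


soc_values = [12.5, 40.0, 87.3]  # placeholder SOC values, not NFI data
lam = 0.5  # assumed lambda, for illustration only
transformed = [box_cox(v, lam) for v in soc_values]
recovered = [inverse_box_cox(z, lam) for z in transformed]
print(transformed, recovered)
```

The inverse function is what allows predictions made on the transformed scale to be compared with observed SOC on the original scale.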
Model validation
The models were validated using the test data set. The predicted values of the response variable were back-transformed and compared with the observed values in the test data. The mean absolute percentage error (MAPE) was calculated to assess the accuracy of the models using the "MAPE" function in the "MLmetrics" package (33). A lower MAPE value indicates higher model accuracy.
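For reference, the MAPE calculation can be sketched in Python as follows. The function returns the error as a fraction (multiply by 100 for a percentage); the observed and predicted values are invented, and the predictions are assumed to have already been back-transformed to the original SOC scale.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed values must be non-zero, since each error is divided by them.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


observed = [50.0, 80.0, 100.0]   # invented observed SOC values
predicted = [45.0, 88.0, 100.0]  # invented back-transformed predictions
error = mape(observed, predicted)
print(error)  # about 0.067, i.e. roughly 6.7%
```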