Feature selection and modeling forest tree canopies using supervised and unsupervised neural network self-organizing maps (case study: District 2, Kacha, Rasht, Iran)

doi:10.21203/rs.3.rs-4662954/v1


Research Article



This work is licensed under a CC BY 4.0 License


Canopy is a component of gross primary production, and canopy dimensions reflect tree health. Canopies in the forests of northern Iran, in particular the Hyrcanian forests, need to be studied because of these forests' unique biodiversity, endangered condition, and role in climate moderation. Sampling followed a systematic random design on a 150 × 200 m grid with circular sample plots of 0.1 ha each, giving a sampling intensity of 3.3%. Within each plot, topographical attributes (elevation, slope, and aspect) were recorded, and essential data were gathered for all trees with a diameter at breast height (DBH) greater than 7.5 cm. The current study aims to use the supervised self-organizing map (SSOM) neural network to estimate forest tree canopies in District 2, Kacha, using variables selected with self-organizing maps (SOM). The SOM neural network results reveal the significant role of elevation, slope, aspect, and DBH in the map structure. After the SOM neural network had selected the major features affecting tree canopies, these four variables were introduced to the SSOM neural network to estimate the canopies of Fagus orientalis Lipsky, Carpinus betulus L., Diospyros lotus L., Alnus subcordata C.A.M., and Parrotia persica (DC) C.A.M. The results show that the SOM neural network focuses on key factors, which increases modeling efficiency by removing unnecessary data and improves prediction accuracy by restricting the model to the selected variables. Furthermore, the SSOM neural network performed strongly in tree canopy estimation with SOM-selected features, particularly for Fagus orientalis, which highlights the network's ability to use selected features for accurate and reliable estimation.

Best matching unit

Competitive learning

Canopy

Hyrcanian forests

Neighborhood function

Canopy is a component of gross primary production, and canopy dimensions reflect tree health (Zörner et al., 2018). Dense, large canopies are linked to high growth potential, whereas sparse, small canopies reflect unfavorable habitat conditions (competition, moisture, and disease) (Chen et al., 2022; Liu et al., 2024; Song et al., 2024). Canopies in the forests of northern Iran, in particular the Hyrcanian forests, need to be studied because of these forests' unique biodiversity, endangered condition, and role in climate moderation (Ghomi Avili et al., 2020). These forests are vital habitats for various species, some of which are endemic and prone to extinction (Taefi Feijani and Azadnejad, 2020; Amini et al., 2022). The study of canopies enables scientists to evaluate forest health, monitor species diversity, and develop strategies for protection and sustainable management (Sefidi et al., 2011). Additionally, canopies play a critical role in carbon sequestration and in providing ecosystem services, such as moisture regulation and soil protection, that support regional and global environmental health (Nakamura et al., 2017). Since traditional methods for canopy measurement, such as visual interpretation of satellite images or ground measurements, can be time-consuming, labor-intensive, and prone to errors, most forest managers and researchers seek to estimate elements such as canopies from several more easily measured variables (Boncina and Cavlovic, 2009; Korhonen et al., 2017; Vafaei et al., 2022). Artificial neural networks (ANNs) are computational models inspired by the human brain's structure and function (Imada, 2014). They can process large volumes of data to recognize trends and make complex predictions, and their use in natural resources and forestry sciences continues to grow with progress in computational power, data accessibility, and the development of more complex algorithms (Kanwisher et al., 2023).

Liu et al. (2022) used a Neural Network-Guided Interpolation (NNGI) method to map forest canopy height by integrating GEDI, ICESat-2 ATLAS, and Sentinel-2 data. They assessed its performance with several validation datasets and concluded that the suggested method estimated forest canopy height in China with high mapping accuracy. Akin et al. (2024) used Sentinel-2 satellite data with a multilayer perceptron (MLP) neural network to estimate urban percent tree cover (PTC) in Bursa, Turkey, by combining vegetation indices such as NDVI and LAI with socioeconomic, topographic, and biophysical indices. They reported that the MLP neural network could provide a very accurate PTC prediction map. Zhu et al. (2024) used random forest regression to estimate forest canopy height from remote sensing data, merging GEDI and ICESat-2 satellite data with Landsat 9 images and SRTM terrain data. The results revealed that spaceborne LiDAR data combined with Landsat 9 data yielded a more accurate estimate of canopy height than Landsat 9 data alone. Kohonen was the first to propose the self-organizing map (SOM), which is one of the most common ANNs (Kohonen, 1990). The SOM neural network maps data onto a 2D grid of neurons, which makes it easy to recognize patterns, relationships, and clusters in a dataset (Vesanto and Alhoniemi, 2000). This feature is very useful for revealing non-linear relationships in data, which may not be possible with traditional analysis methods (Kohonen, 1990; Liu and Liu, 2011; Yu et al., 2011; Fan et al., 2012; Javad et al., 2023). The SOM neural network is inherently an unsupervised learning method, used for tasks such as clustering and data visualization; the supervised SOM (SSOM) neural network is an advanced form that combines SOM principles with supervised learning. This combined approach aims to use the advantages of both the unsupervised and supervised learning paradigms to improve performance in tasks such as classification, prediction, and pattern recognition (Keller et al., 2018; Riese et al., 2020). The current study aims to use the SSOM neural network to estimate forest tree canopies in District 2, Kacha, using SOM-selected variables. Despite the method's advantages, no previous work has estimated forest tree canopies using the SSOM neural network; hence, this research is a pioneering attempt in this field.

Study area. District 2, Kacha, with a total area of 2,399 ha, is located in the Saravan forest, northern Iran (Fig. 1). The study area lies between longitudes 49°32'24''E and 49°35'29''E. Its elevation ranges from 120 to 785 m a.s.l. The average annual precipitation is about 1255.7 mm and the mean annual temperature is 16 °C; according to De Martonne's climate classification method, the climate of the study area is very humid. The study area is covered by broad-leaved tree species: Fagus orientalis Lipsky, Carpinus betulus L., Diospyros lotus L., Alnus subcordata C.A.M., and Parrotia persica (DC) C.A.M. Parcels were delineated based on natural topography, the existing road network, and the parcel boundaries on the map. In the three-digit parcel number, the second digit indicates the series number, and the other digits denote the parcel within District 2, Kacha.

2.1. Sampling of forest tree canopies

Preliminary sampling was conducted randomly with 30 circular sample plots (n), each covering 0.1 ha, to determine the number of sample plots required. Eq. (1) was then used to calculate this number (Zobeiry, 2005; Table 1):

$N = \frac{t^{2} \times (\%S_{x})^{2}}{(\%E)^{2}}$ (1)

Table 1

Preliminary sampling with 30 random circular samples
$\bar{x}$	SD	SD%	E	E%	t	N
10.57	3.687	34.891	0.264	2.5	2	779

$\bar{x}$ – mean of the studied characteristic; SD – standard deviation; SD% – standard deviation percentage; E – standard error; E% – percentage of statistical error; t – Student's t value, obtained from the Student's t table based on the number of sample plots and the probability level; n – number of preliminary circular sample plots; N – number of sample plots required
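As a quick check of Eq. (1), the short sketch below recomputes the required number of plots from the values in Table 1. The function name `required_plots` is hypothetical and introduced here only for illustration.

```python
def required_plots(t: float, sd_percent: float, e_percent: float) -> float:
    """Eq. (1): N = t^2 * (SD%)^2 / (E%)^2."""
    return (t ** 2) * (sd_percent ** 2) / (e_percent ** 2)


# Values taken from Table 1 (t = 2, SD% = 34.891, E% = 2.5).
n_required = required_plots(t=2, sd_percent=34.891, e_percent=2.5)
print(round(n_required))  # 779, matching N in Table 1
```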

Sampling was carried out using a systematic random method on a 150 × 200 m grid. With circular sample plots of 0.1 ha each (one plot per 3 ha grid cell), the sampling intensity was 3.3%. The exact coordinates of the plots were transferred to a GPS device for precise location in the field. Within each plot, topographical attributes (elevation, slope, and aspect) were recorded, and essential data were gathered for all trees with a diameter at breast height (DBH) greater than 7.5 cm. Equations (2)–(4) were then used to calculate the forest tree canopies:

CD= (C1×C2)×3.14/4 (2)

C1 and C2 represent the large and small canopy diameters (m), respectively.

The average canopy of a tree (m²) and the canopy percentage were calculated using Equations (3) and (4).

$$\overline{CC}=\frac{\sum_{i=1}^{n} CC_{i}}{n} \quad (3)$$

CC% = $\frac{N_{ha} \times \overline{CC}}{100}$ (4)

In these equations, $\overline{CC}$ is the average canopy surface of a tree (m²), $CC_i$ is the canopy surface of tree i (m²), n is the total number of measured trees, CC% is the canopy percentage, $N_{ha} \times \overline{CC}$ is the canopy surface per hectare (m²), and $N_{ha}$ is the number of trees per hectare. Dividing by 100 converts square meters per hectare (10,000 m²) into a percentage.
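The canopy calculations in Eqs. (2)–(4) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and `math.pi` is used where Eq. (2) writes 3.14. The check at the end uses the Fagus orientalis values reported in Table 2.

```python
import math


def canopy_area(c1: float, c2: float) -> float:
    """Eq. (2): canopy area (m^2) from the large and small diameters (m)."""
    return c1 * c2 * math.pi / 4


def mean_canopy_area(areas: list[float]) -> float:
    """Eq. (3): average canopy surface of a tree (m^2)."""
    return sum(areas) / len(areas)


def canopy_percent(trees_per_ha: float, mean_area: float) -> float:
    """Eq. (4): canopy percentage; m^2 per 10,000 m^2 times 100 is a division by 100."""
    return trees_per_ha * mean_area / 100


# Fagus orientalis in Table 2: N/ha = 106.21, mean canopy = 22.40 m^2.
fagus_cc = canopy_percent(106.21, 22.40)
print(round(fagus_cc, 2))  # 23.79, matching CC (%) in Table 2
```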

The combined SSOM and SOM for feature selection and modeling represents a complex approach to data analysis and pattern recognition. This method uses a combination of the advantages of both learning paradigms to increase the accuracy, efficiency, and interpretation of developed models.

2.2. Feature selection with the SOM neural network

Feature selection is a process in machine learning in which the most relevant subset of features (variables or dimensions) is selected to facilitate learning, improve precision, and make the model easier to interpret. This is particularly crucial in datasets with a large number of features, some of which may be irrelevant or redundant and can lead to over-fitting or increased computational complexity (Kaur et al., 2021).

Feature selection using a complete SOM neural net is discussed in this section:

2.2.1. Initializing the SOM.

Random initialization is applied to the weight vector (codebook or reference vector) (Riese et al. 2020). SOM compares the input pattern of the data matrix x(t) with each SOM neuron weight vector in an unsupervised competitive learning method (Riese et al. 2020).

2.2.2. Finding the best matching unit.

The winner neuron, called the best matching unit (BMU), is the neuron with the smallest distance ($d_j$) from the input pattern; this selection rule is known as winner-takes-all. Several distance metrics can be used, of which the Euclidean distance is the most common (Eq. 8) (Bigdeli et al., 2022).

$d_{j} = \left[\sum_{i=1}^{m} \left(x_{i}(t) - w_{ij}\right)^{2}\right]^{0.5}$, i = 1,…,m, j = 1,…,n (8)

Where $w_{ij}$ indicates the weight value between input variable i and neuron j in the Kohonen layer, m represents the number of input variables, and n is the number of neurons in the Kohonen layer.

2.2.3. Neighborhood function calculation.

The weight vectors of the BMU and of its topological neighbors are updated so that they better reproduce the input pattern. The Gaussian function is the most commonly used neighborhood function (Eq. 9) (Kahraman, 2012).

$$N_{j^{*}j}(t) = \exp\left(-\frac{\left\| r_{j^{*}} - r_{j} \right\|^{2}}{2\delta^{2}(t)}\right) \quad (9)$$

Where $N_{j^{*}j}(t)$ refers to the Gaussian neighborhood function of winner neuron $j^{*}$ in iteration t, $\delta(t)$ represents the radius of the neighborhood in iteration t, and $\left\| r_{j^{*}} - r_{j} \right\|$ is the distance on the map between the winner neuron and the neighbor neuron.

2.2.4. Update weights

As a final step, the weight vectors of the winner neuron and its selected neighbors are updated based on their distance from the winner neuron. Eq. (10) can be used to calculate the updated weights (Demirci et al., 2013).

$w_{ij}(t+1) = w_{ij}(t) + \eta(t) \times N_{j^{*}j}(t) \times \left(x_{i}(t) - w_{ij}(t)\right)$ (10)

Where $\eta(t)$ refers to the learning rate in iteration t, $N_{j^{*}j}(t)$ is the neighborhood function of the winner neuron $j^{*}$ in iteration t, and $w_{ij}(t)$ is the weight between neuron i in the input layer and neuron j in the output layer at iteration t (Fig. 2).
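The steps in Sections 2.2.2–2.2.4 can be sketched as a single training step in plain Python. This is an illustrative sketch only, not the MATLAB SOM Toolbox implementation used in the study: the function names, the rectangular 3 × 4 grid, the input pattern, and the values of `eta` and `delta` are all assumptions chosen for the example.

```python
import math
import random


def euclidean(a, b):
    """Eq. (8): Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def som_step(weights, grid, x, eta, delta):
    """One SOM update for input pattern x; returns (new_weights, bmu_index)."""
    # Best matching unit: the neuron with the smallest distance to x.
    distances = [euclidean(x, w) for w in weights]
    bmu = min(range(len(weights)), key=distances.__getitem__)
    new_weights = []
    for w, position in zip(weights, grid):
        # Eq. (9): Gaussian neighborhood based on distance on the map grid.
        grid_dist_sq = sum((a - b) ** 2 for a, b in zip(position, grid[bmu]))
        h = math.exp(-grid_dist_sq / (2.0 * delta ** 2))
        # Eq. (10): move each weight toward x, scaled by eta and h.
        new_weights.append([wi + eta * h * (xi - wi) for xi, wi in zip(x, w)])
    return new_weights, bmu


random.seed(0)
rows, cols = 3, 4  # hypothetical map size
grid = [(r, c) for r in range(rows) for c in range(cols)]
weights = [[random.random() for _ in range(4)] for _ in grid]  # random initialization
x = [0.2, 0.5, 0.1, 0.9]  # one normalized pattern (elevation, slope, aspect, DBH)
new_weights, bmu = som_step(weights, grid, x, eta=0.5, delta=1.0)
```

For the BMU itself the neighborhood value is 1, so with `eta=0.5` its distance to the input is exactly halved; every other neuron moves by a smaller fraction that decays with its grid distance from the BMU.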

In a SOM neural network, the weights for each feature are shown in a separate plane, called the component plane. A 2D grid is created for each feature, in which the value of each neuron corresponds to the weight of that neuron for the given feature. Studying the variability and patterns in these component planes shows how strongly each feature discriminates between regions of the map. Based on these analyses, the features chosen are those that have lower quantization errors, show clear patterns in the component plane, and have higher variance (Sharma et al., 2024).

2.3. Modeling with SSOM neural network

After feature selection, the SSOM neural network was trained using only the selected features, and its performance was assessed to check whether feature selection improved model generalizability. The SSOM and SOM neural networks share several steps, including random initialization, selection of the best matching unit (BMU), and calculation of the neighborhood function and learning rate. The two networks differ only in the weight-update stage. In the SOM neural network, the weights have the same dimension as the input data, and competition occurs over the input data. In the SSOM neural network, the weights have the same dimension as the target variable and are updated to reduce the prediction error (Riese et al., 2020). When SOM and SSOM are combined, the SOM selects the BMU and the SSOM links the BMU to an estimate (Keller et al., 2018). Figure 3 illustrates the framework for training SOM and SSOM neural networks. This algorithm was implemented using the MATLAB SOM Toolbox provided by Helsinki University of Technology (http://www.cis.hut.fi/projects/somtoolbox).
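The division of labor described above (the SOM selects the BMU, the SSOM links the BMU to an estimate) can be sketched as below. This is a simplified illustration patterned after that description, not the toolbox code used in the study. The function names, the linear decay of the learning rate and radius, the toy data, and all parameter values are assumptions.

```python
import math
import random


def bmu_index(codebook, x):
    """Neuron whose input weights are closest to x (squared distance gives the same winner)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, codebook[j])),
    )


def train_ssom(X, y, rows, cols, epochs=50, eta0=0.5, delta0=1.5, seed=0):
    """Train input weights (SOM part) and target weights (supervised part) together."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    w_in = [[rng.random() for _ in X[0]] for _ in grid]  # same dimension as the inputs
    w_out = [rng.random() for _ in grid]                 # same dimension as the target
    steps = epochs * len(X)
    for t in range(steps):
        frac = 1.0 - t / steps             # assumed linear decay schedule
        eta = eta0 * frac
        delta = max(delta0 * frac, 0.1)
        i = rng.randrange(len(X))
        b = bmu_index(w_in, X[i])          # competition over the input data
        for j, (r, c) in enumerate(grid):
            d2 = (r - grid[b][0]) ** 2 + (c - grid[b][1]) ** 2
            h = math.exp(-d2 / (2.0 * delta ** 2))
            w_in[j] = [w + eta * h * (xi - w) for xi, w in zip(X[i], w_in[j])]
            w_out[j] += eta * h * (y[i] - w_out[j])  # pull the estimate toward the target
    return w_in, w_out


def predict(w_in, w_out, x):
    """Estimate = target weight stored at the BMU of x."""
    return w_out[bmu_index(w_in, x)]


# Toy normalized data: four inputs per sample, one canopy target per sample.
X = [[0.0, 0.1, 0.2, 0.1], [0.1, 0.0, 0.1, 0.2], [0.9, 1.0, 0.8, 0.9], [1.0, 0.9, 0.9, 0.8]]
y = [0.20, 0.25, 0.80, 0.85]
w_in, w_out = train_ssom(X, y, rows=3, cols=4)
estimate = predict(w_in, w_out, X[0])
```

Because every update is a weighted average of the old weight and a target in [0, 1], the stored estimates stay within the normalized range; they would need to be mapped back to canopy percentages by inverting the normalization.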

2.4. Modeling the forest tree canopies using artificial neural networks

In this study, a dataset was prepared from 35 parcels (samples) with four input variables. The goal was to use the SOM-selected variables (elevation above sea level, slope, aspect, and DBH) as inputs to model the canopies of Fagus orientalis, Carpinus betulus, and Diospyros lotus as output variables. After data normalization, 70% of the data (24 parcels) were assigned to the training set and 30% (11 parcels) to the test set (Maier and Dandy, 2010).

2.5. Data normalization

The data must be normalized to the range 0 to 1 according to Eq. (11) to increase the speed and accuracy of neural network processing (Peshawa et al., 2014).

$$Z = \frac{X_{i} - X_{min}}{X_{max} - X_{min}} \quad (11)$$

Where Z is the normalized value, $X_i$ represents the original value, and $X_{max}$ and $X_{min}$ are the maximum and minimum values of each variable, respectively.
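A minimal sketch of Eq. (11) is shown below. The function name and the sample elevations are illustrative only (the elevations are values mentioned in the text, not the full dataset), and the guard for a constant variable is an added assumption, since Eq. (11) is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Eq. (11): rescale values to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant variable")
    return [(v - lo) / (hi - lo) for v in values]


elevation_m = [120, 150, 300, 785]  # illustrative elevations (m a.s.l.)
normalized = min_max_normalize(elevation_m)
```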

After data preprocessing, the artificial neural network structure was tuned through parameters such as the number of neurons in the Kohonen layer. Performance at the test stage was evaluated using the coefficient of determination (R²), adjusted R² (R²adj), root mean squared error (RMSE), percent root mean squared error (RMSE%), mean absolute error (MAE), bias, and percentage bias (bias%).
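The evaluation criteria can be computed as sketched below. The text does not give the formulas, so the definitions here are common conventions adopted as assumptions: RMSE% and bias% are expressed relative to the mean of the observed values, bias is the mean of predicted minus observed, and adjusted R² uses the number of predictors p. The function name and the toy values are hypothetical.

```python
import math


def evaluate(observed, predicted, n_predictors):
    """Return the evaluation criteria as a dictionary (assumed definitions)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(ss_res / n)
    bias = sum(residuals) / n
    return {
        "R2": r2,
        "R2_adj": r2_adj,
        "RMSE": rmse,
        "RMSE%": 100 * rmse / mean_obs,
        "MAE": sum(abs(r) for r in residuals) / n,
        "bias": bias,
        "bias%": 100 * bias / mean_obs,
    }


# Toy canopy percentages, not data from the study.
observed = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0]
predicted = [21.0, 24.0, 31.0, 34.0, 41.0, 44.0]
metrics = evaluate(observed, predicted, n_predictors=4)
```

With these definitions the adjusted R² can never exceed R², because the correction factor (n − 1)/(n − p − 1) is at least 1.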

3.1. Descriptive statistics of the forest tree canopies

Analysis of the forest tree canopies showed that the total canopy percentage in the study area was 62.14%, with Fagus orientalis Lipsky, Carpinus betulus L., Alnus subcordata C.A.M., Diospyros lotus L., and Parrotia persica accounting for 23.79%, 12.91%, 10.63%, 7.24%, and 6.25%, respectively. Descriptive statistics of the forest tree canopies are shown in Table 2.

Table 2

Descriptive statistics of the forest tree canopies.
Characteristic	Fagus orientalis	Carpinus betulus	Diospyros lotus	Parrotia persica	Alnus subcordata	Other species	Total
Sum	185371	100572	56462	48716	82847	10136	484109
Frequency	8274	3884	3379	2999	2818	411	21767
$\overline{CC}$ (m²)	22.40	25.89	16.70	16.24	29.39	164.15	22.24
N/ha	106.21	49.85	43.37	38.49	36.17	5.24	279.42
CC (%)	23.79	12.91	7.24	6.25	10.63	1.269	62.14

The mean, standard deviation, standard error, percentage of statistical error, and coefficient of variation were calculated for the forest tree canopies in the sample. The results showed an average canopy percentage of 62.14% with a statistical error of 1.93%. Table 3 presents the other statistics.

Table 3

The results of the forest tree canopies inventory statistics
Characteristic	Mean	Standard deviation	Standard error	Percentage of statistical error
Canopy (%)	62.14	16.77	0.6	1.93

Figures 4 and 5 show the parcels of the region in terms of canopy percentage, topographic conditions, and diameter at breast height (DBH). Parcel 224, with a 60% slope, a northern aspect, and an elevation of 785 m above sea level, has the largest canopy percentages of Fagus orientalis and Carpinus betulus. The lowest canopy percentage of Fagus orientalis is found in parcels 225 and 234, with a 35% slope, a southwestern aspect, and an elevation of 300 m above sea level. Parcel 207, with a 25% slope, a southern aspect, and an elevation of 150 m above sea level, has the lowest canopy percentage of Carpinus betulus. Moreover, the largest canopy percentages of Alnus subcordata, Diospyros lotus, and Parrotia persica are seen in parcel 207, whereas the same species show their lowest canopy percentages in parcel 224.

3.2. Feature selection with the SOM neural network

In the variable recognition and clustering process, the candidate variables are grouped according to their correlation with the tree canopy (the dependent variable). This involves reviewing the map of each initial independent variable and comparing it with the map of the dependent variable, which gives a better understanding of how these variables relate to and affect tree canopies and allows them to be grouped accordingly. In this research, the results of the SOM neural network are shown as a unified distance matrix (U-matrix) and component planes. The U-matrix visualizes the distances between neurons (nodes) in the SOM neural network: the distance (usually Euclidean) between each neuron and its neighbors on the map is measured and then averaged. The U-matrix therefore mirrors the layout of the map, with each element associated with one neuron and holding the mean distance from that neuron to its neighbors. Darker colors or lower values in the U-matrix denote clusters of similar data points, while brighter colors or higher values mark the boundaries between different clusters. Component planes are separate images of the neuron weights for each input feature (component). In other words, a separate 2D plane is created for each variable or dimension of the input data, showing the neuron weight for that feature across the map, that is, the distribution of the feature on the map. The color bar next to each plane gives the range of the normalized variable, and similar color patterns in two planes indicate a direct relationship between the corresponding variables. Examining the component planes clarifies which features play the major role in separating data points into clusters (Fig. 6).
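The averaging that produces the U-matrix can be sketched as follows. This is a simplified version on a rectangular grid with four neighbors per interior neuron; the map topology and the exact U-matrix layout used by the SOM Toolbox may differ, and the function name and the toy codebook are assumptions.

```python
import math


def u_matrix(codebook, rows, cols):
    """Mean Euclidean distance from each neuron's weight vector to its grid neighbors."""
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                (r + dr, c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            dists = [math.dist(codebook[r][c], codebook[nr][nc]) for nr, nc in neighbors]
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat


# Toy 2 x 3 codebook: the first two columns are similar, the third column differs.
codebook = [
    [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]],
    [[0.0, 0.1], [0.1, 0.1], [1.0, 0.9]],
]
umat = u_matrix(codebook, rows=2, cols=3)
```

In this toy map the neurons in the first column get low values (they sit inside a cluster), while the neurons next to the jump between the second and third columns get high values, which is how cluster boundaries show up in the U-matrix.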

The component plane for each feature was drawn using a color scale for the neuron values. These planes were used to examine the interactions and correlations between variables: regions with matching colors in two planes indicate that the features are correlated, and the strength of the relationship can be judged from how similar or different the colors are. As shown in Fig. 6, the SOM neural network results reveal the significant role of elevation, slope, aspect, and DBH in the formation of the map structure. These variables change considerably across their component planes, which suggests that they are probably important for distinguishing between the canopies of different trees.

3.3. Modeling with SSOM neural network

After the SOM neural network had selected the major features affecting tree canopies, the elevation, slope, aspect, and DBH variables were introduced to the SSOM neural network to estimate the canopies of Fagus orientalis, Carpinus betulus, Diospyros lotus, Parrotia persica, and Alnus subcordata. The network's performance was assessed using the evaluation criteria. According to the results of tree canopy estimation at the validation stage, the SSOM neural network estimated the Fagus orientalis canopy with the greatest accuracy and the smallest errors (R² = 0.9799, R²_adj = 0.9780, RMSE% = 0.251, Bias% = 0.083) (Table 4).

Table 4

The performance of SSOM neural networks in estimating the forest tree canopies (%) with four input variables
Species	Structure	Stage	R²	R²adj	MAE	RMSE	RMSE%	Bias	Bias%
Fagus orientalis	3×4	training	0.9709	0.9866	0.0177	0.471	0.082	0.0177	0.05
Fagus orientalis	3×4	test	0.9768	0.9980	0.216	0.929	0.251	0.216	0.0813
Carpinus betulus	3×8	training	0.9613	0.9816	0.134	0.552	0.096	0.134	0.045
Carpinus betulus	3×8	test	0.9382	0.9669	0.648	1.185	0.320	0.648	0.244
Diospyros lotus	5×2	training	0.9704	0.9853	0.0843	0.481	0.083	0.0843	0.028
Diospyros lotus	5×2	test	0.8393	0.8159	0.594	1.315	2.635	0.594	0.223
Alnus subcordata	2×9	training	0.9831	0.9922	0.044	0.353	0.061	0.043	0.014
Alnus subcordata	2×9	test	0.8697	0.8473	0.324	1.138	2.507	0.324	0.122
Parrotia persica	4×4	training	0.9501	0.9768	0.182	0.833	0.145	0.182	0.061
Parrotia persica	4×4	test	0.84	0.89590	0.567	1.132	3.358	0.567	0.213

Figure 7 presents the observed and estimated canopy values of Fagus orientalis, Carpinus betulus, Diospyros lotus, Parrotia persica, and Alnus subcordata obtained with the SSOM neural network. The estimated values differ only slightly from the observed values. The network is particularly accurate for Fagus orientalis, for which the observed and estimated values are very close.

Forest inventory statistics provide information on the quantitative and qualitative condition of forest structural features, which underpins forest planning and policy (Fatehi et al., 2017). Traditional canopy estimation methods involve field surveys and measurements, which can be labor-intensive and time-consuming (Jing et al., 2023). In this research, therefore, a two-stage approach was used to analyze the relationships between variables and estimate the dependent variable. This approach consists of selecting the most relevant variables with the SOM neural network and then estimating the dependent variable (canopy) with the SSOM neural network.

The patterns observed in our results reveal that changes in elevation, slope, and aspect are linked to changes in DBH. Similarly, Jin et al. (2008), Daghestani et al. (2017), Goodarzi et al. (2012), and Zhang et al. (2016) concluded that different environmental factors, including elevation, slope, and aspect, could play a critical role in the development and formation of plant communities and vegetation. These factors indirectly affect soil factors and cause major changes in the quantitative and qualitative features of trees, such as DBH and canopy percentage. Wanga et al. (2015) found a linear relationship between topography and plant distribution and identified slope direction as one of the major factors affecting vegetation distribution. Hosseini et al. (2008) presented evidence that the tree canopy percentage first increased and then decreased with increasing elevation. They observed the highest canopy in the middle elevation classes because trees had been harvested in the low elevation classes. According to Farhadur Rahman et al. (2022), topographic variables (elevation, slope, and aspect) affect canopy height through their indirect effects on solar radiation, disturbance, wind direction, and soil erosion.

Based on the results of previous studies, the relationship between the independent variables and tree canopies appears to be nonlinear. ANNs are among the most fundamental methods for capturing such complex relationships. Since forest ecosystems are highly complex with interrelated variables, neural networks can handle complex nonlinear relationships and learn from large datasets better than traditional statistical methods (Kucuk and Sevinc, 2023; Zhu et al., 2024). In recent years, neural networks have been increasingly used to estimate the canopy because of their ability to process complex data and extract meaningful information. Their ability to generalize from training data to unobserved samples has made them a useful tool for forest management (Bayat et al., 2020; Ghasemi et al., 2022; Tian et al., 2022). The literature indicates that ANNs are of interest as a modeling method for several reasons, including pattern recognition ability, good representation of input-output relationships, low sensitivity to errors in input data, fully parallel processing, the need for less input data, the ability to discover and predict relationships between variables, the ability to simulate despite incomplete input data, and generalization and learning ability (Ercanlı et al., 2018; Ferraz Filho et al., 2018; Reis et al., 2018; Bayat et al., 2020). Among the main advantages of the SOM neural network is its ability to recognize patterns and nonlinear relationships in data by grouping similar data points in an unsupervised manner, which is particularly useful for exploratory data analysis. Its topology-preserving nature ensures that the spatial relationships of the input data are maintained in the output map, giving a meaningful representation of the data structure (Kohonen, 1990; Kohonen, 2001). However, the SOM neural network is sensitive to the selection of parameters such as map size, learning rate, and neighborhood function. These parameters can strongly affect the learning process and final output (Das et al., 2016). The SOM neural network is extensively used for representing and analyzing large, multi-dimensional datasets (Kurasova and Molyt 2011). It has found applications in ecological sciences (Chon, 2011; Kim and Kwak, 2022), in clustering complex relationships among the variables of ecological and environmental datasets (Chon, 2011; Hong et al., 2020), in clustering forest land areas (Fujino and Yoshida, 2006), and in processing time-series data and image compression (Ley et al., 2011; Gao et al., 2014; Liu et al., 2018; Fratarcangeli et al., 2019; Lakshminarayanan, 2020; Liu et al., 2021).

In this study, the performance of the SSOM neural network was evaluated using topographic variables and DBH to estimate tree canopies. At the test stage, the network achieved R² values between 0.84 and 0.98 (Table 4), with higher accuracy for Fagus orientalis than for the other tree species (Carpinus betulus, Diospyros lotus, Alnus subcordata, and Parrotia persica). This result suggests that the selected features are interrelated and that the SSOM can capture the relationships between the input variables and the canopy. In previous studies, Eskandari (2020) concluded that the support vector machine (SVM) algorithm presented the highest accuracy for the regional canopy. Dabija et al. (2021) reported that the SVM algorithm performed better than the random forest in mapping tree canopies. Nazariani et al. (2022) presented evidence that the MLP neural network and the radial basis function showed more accuracy in tree canopy estimation than the k-nearest neighbors (KNN), SVM, and random forest algorithms. Liu et al. (2022) used multiple regression, ANN, KNN, and random forest based on remote sensing data to estimate forest tree canopy height. They reported that the ANN presented the best performance in aboveground biomass estimation, with a root mean square error of 19.9%.

Our results demonstrated the effectiveness of the SOM neural network in feature selection. By reducing data dimensions and recognizing patterns, it can determine the features affecting tree canopies. The SOM neural network focuses on key factors, which increases modeling efficiency by removing unnecessary data and improves prediction accuracy by restricting the model to the selected variables. The present results indicated the strong performance of the SSOM neural network in tree canopy estimation with SOM-selected features, particularly for Fagus orientalis, and further highlighted the network's ability to use selected features for accurate and reliable estimation.

No funding was received to assist with the preparation of this manuscript.

The authors declare that they have no conflict of interest.

Author Contribution

A.B. and C. wrote the main manuscript text.

Akın, A., Çilek, A., and Middel, A. 2024. Modelling tree canopy cover and evaluating the driving factors based on remotely sensed data and machine learning. Urban Forestry and Urban Greening, 86, 128035. https://doi.org/10.1016/j.ufug.2023.128035
Amini, S., Shataee Jouibary, S., Moayeri, M., Rahmani, R. 2022. Canopy gap delineation using UAV data in a Hyrcanian forest (Case study: Shastklateh Forest). Iranian Journal of Forest, 14(2): 135-154. doi: 10.22034/ijf.2022.301540.1801
Bayat, M., Bettinger, P., Heidari, S., Henareh Khalyani, A., Jourgholami, M., and Hamidi, S.K. 2020. Estimation of tree heights in an uneven-aged, mixed forest in northern Iran using artificial intelligence and empirical models. Forests, 11(324), 1-19.
Bigdeli, A., Maghsoudi, A., and Ghezelbash, R. 2022. Application of self-organizing map (SOM) and K-means clustering algorithms for portraying geochemical anomaly patterns in Moalleman district, NE Iran. Journal of Geochemical Exploration, 233, 106923. https://doi.org/10.1016/j.gexplo.2021.106923
Boncina, A., and Cavlovic, J. 2009. Perspectives of forest management planning: Slovenian and Croatian experience. Croatian Journal of Forest Engineering, 30(1), 77-87.
Chen, Y., Ma, L., Yu, D., Feng, K., Wang, X., and Song, J. 2022. Improving Leaf Area Index Retrieval Using Multi-Sensor Images and Stacking Learning in Subtropical Forests of China. Remote Sensing, 14, 148.
Chon, T. 2011. Self-organizing maps applied to ecological sciences, Ecological Informatics, 6(1), 50-61.
Dabija, A., Kluczek, M., Zagajewski, B., Raczko, E., Kycko, M., Al-Sulttani, A.H., Tardà, A., Pineda, L., and Corbera, J. 2021. Comparison of Support Vector Machines and Random Forests for Corine Land Cover Mapping. Remote Sensing, 13(4), 777.
Daghestani, M., Zanganeh, M., and Taheri, M. 2017. Investigation on quantitative characteristic and soil properties of Juniperus excelsa M.Bieb stands in Tarom Zanjan. Journal of Forest Research and Development, 3(2), 175-190 (In Persian).
Das, G., Chattopadhyay, M., and Gupta, S. 2016. A comparison of self-organising maps and principal components analysis. International Journal of Market Research, 58, 07.
Ercanlı, I., Günlü, A., Şenyurt, M., and Keleş, S. 2018. Artificial neural network models predicting the leaf area index: a case study in pure even-aged Crimean pine forests from Turkey. Forest Ecosystems, 5 (29), 1-12.
Eskandari, S., 2020. Application of sentinel-2a data and pixel-based algorithms for land cover mapping in ilam-iran. Environmental Engineering and Management Journal, 19 (4), 655-666.
Fan, C.Y., Fan, P.S., Chan, T.Y., and Chang, S.H. 2012. Using hybrid data mining and machine learning clustering analysis to predict the turnover rate for technology professionals. Expert Systems with Applications, 39(10), 8844–8851.
Fatehi, P., Damm, A., Leiterer, R., Pir Bavaghar, M., Schaepman, M.E., and Kneubühler, M. 2017. Tree density and forest productivity in a heterogeneous Alpine Environment: insights from airborne laser scanning and imaging spectroscopy. Forests, 8(212), 1-21.
Ferraz Filho, A.C., Mola-Yudego, B., Ribeiro, A., Scolforo, J.R.S., Loos, R.A., and Scolforo, H.F. 2018. Height-diameter models for eucalyptus sp. plantations in Brazil. Cerne, 24, 9–17.
Fratarcangeli, C., Fanelli, G., Franceschini, S., De Sanctis, M., and Travaglini, A. 2019. Beyond the urban-rural gradient: Self-organizing map detects the nine landscape types of the city of Rome Urban For. Urban Green., 38, 354-370
Gao, Y., Feng, Z., Wang, Y., Liu, J., Li, S., and Zhu Y. 2014. Clustering urban multifunctional landscapes using the self-organizing feature map neural network model J. Urban Plann. Dev., 140 05014001
Ghasemi, M., Latifi, H., and Pourhashemi, M. A. 2022. Novel Method for Detecting and Delineating Coppice Trees in UAV Images to Monitor Tree Decline. Remote Sensing, 14(23), 5910. https://doi.org/10.3390/rs14235910
Ghomi-Avili, A., Akbarinia, M., Hosseini, S,, Talebian, M,, Knapp, H.D. 2020. Fuzzy and Boolean operation based modelling for evaluation of ecological capability in the Hyrcanian Forests. Journal of Forest Science, 66(4):170-184. doi: 10.17221/130/2019-JFS.
Goodarzi, Gh.R., Sagheb-Talebi, Kh., and Ahmadloo, F. 2012. The study of effective factors on Almond (Amygdalus scoparia Spach.) distribation in Markazi province. Iranian Journal of Forest, 4(3), 209-220.
Hong, S., Chon, T. S., Joo, G.J. 2020. Spatial distribution patterns of Eurasian otter (Lutra lutra) in association with environmental factors unravelled by machine learning and diffusion kernel method. J. Environ. Inform., 31, 130-141.
Imada, A. 2014. A Literature Review: Forest Management with Neural Network and Artificial Intelligence. Communications in Computer and Information Science 440:9-21.
Jin, X., Zhang, Y., Schaepman, M., Clevers, J., and Su, Z. 2008. Impact of elevation and aspect on the spatial distribution of vegetation in the Qilian mountain area with remote sensing data. The International Archives of the Photogrammetry. Remote Sensing and Spatial Information Sciences, 37, 1385-1390.
Jing L, Wei X, Song Q, Wang F. 2023. Research on Estimating Rice Canopy Height and LAI Based on LiDAR Data. Sensors (Basel). Oct 9;23(19):8334. doi: 10.3390/s23198334. PMID: 37837163; PMCID: PMC10575206.
Kahraman C. 2012. In: Kahraman CE, editor. Computational Intelligence Systems in Industrial Engineering. 1st ed. Vol. 6. Atlantis Press; pp. 295-315
Kanwisher, N., Khosla, M., and Dobs, K. 2023. Using artificial neural networks to ask ‘why’ questions of minds and brains. Trends in Neurosciences, 46(3), 240-254. https://doi.org/10.1016/j.tins.2022.12.008
Kaur, Amandeep and Guleria, Kalpna and Trivedi, Naresh. 2021. Feature Selection in Machine Learning: Methods and Comparison. 789-795. 10.1109/ICACITE51222.2021.9404623.
Keller, S., Maier, P.M., Riese, F.M., Norra, S., Holbach, A., Börsig, N., Wilhelms, A., Moldaenke, C., Zaake, A., and Hinz, S. 2018. Hyperspectral Data and Machine Learning for Estimating CDOM, Chlorophyll a, Diatoms, Green Algae, and Turbidity. International Journal of Environmental Research and Public Health, 15(9), 1-15.
Kim, H.G., Kwak, I. 2022. Evaluating the necessity of geographical locality for patterning biological integrity and its responses to multiple stressors in river systems. Ecological Indicator., 142 (2022), Article 109285.
Kohonen, T. 1990. The self-organizing map. Proceedings of the Institute of Electrical and Electronics Engineers, 78(9), 1464–1480.
Kohonen, T. 2001. Self-Organizing Maps, Berlin, Germany: Springer-Verlag.
Korhonen, L.; Hadi; Packalen, P.; Rautiainen, M. 2017. Comparison of Sentinel-2 and Landsat 8 in the estimation of boreal forest canopy cover and leaf area index. Remote Sensing of Environment, 195, 259–274
Kucuk, O., and Sevinc, V. 2023. Fire behavior prediction with artificial intelligence in thinned black pine (Pinus nigra Arnold) stand. Forest Ecology and Management, 529, 120707. https://doi.org/10.1016/j.foreco.2022.120707
Kurasova, O., and Molyte, A. 2011. Integration of the self-organizing map and neural gas with multi-dimensional scaling, Information Technology and Control, 40(1), pp. 12–20, 2011.
Lakshminarayanan, S. 2020. Application of self-organizing maps on time series data for identifying interpretable driving manoeuvres. Eur. Transp. Res. Rev., 12, 1-11
Liu, Y.C., and Liu, M. 2011. Research of fast SOM clustering for text information. Expert Systems with Application, 38(8), 9325–9333.
Liu, J., Shen, Z., Chen, L. 2018. Assessing how spatial variations of land use pattern affect water quality across a typical urbanized watershed in Beijing China. Landscape Urban Plann., 176 51-63
Liu, T., Sun, Y., Wang, C., Zhang, Y., Qiu, Z., Gong, W., and Duan, X. 2021. Unmanned aerial vehicle and artificial intelligence revolutionizing efficient and precision sustainable forest management. J. Clean. Prod., 311, 127546,
Liu, X., Su, Y., Hu, T., Yang, Q., Liu, B., Deng, Y., Tang, H., Tang, Z., Fang, J., and Guo, Q. 2022. Neural network guided interpolation for mapping canopy height of China's forests by integrating GEDI and ICESat-2 data. Remote Sensing of Environment, 269, 112844. https://doi.org/10.1016/j.rse.2021.112844
Liu, X., Feng, Y., Hu, T., Luo, Y., Zhao, X., Wu, J., Maeda, E. E., Ju, W., Liu, L., Guo, Q., and Su, Y. 2024. Enhancing ecosystem productivity and stability with increasing canopy structural complexity in global forests. Science Advances, 10(20). https://doi.org/10.1126/sciadv.adl1947
Maier, H. R., and Dandy, G. C. 2010. Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modelling and Software, 15, 101-124.
Nakamura A., Kitching R. L., Cao M., Creedy T. J., Fayle T. M., Freiberg M., Hewitt C. N., Itioka T., Koh L. P., Ma K., Malhi Y., Mitchell A., Novotny V., Ozanne C. M. P., Song L., Wang H., Ashton L. A., 2017. Forests and their canopies: Achievements and horizons in canopy science. Trends in Ecology and Evolution. 32, 438–451.
Nazariani, N., Fallah, A., Hamidi, S. K. and Varamesh, S. 2022. Estimation of quantitative characteristics of Zagros forests using data mining nonparametric algorithms (case study: Olad Ghobad Watershed, Koohdasht, Lorestan). Journal of Forest Research and Development, 8, 3.249-262.
Peshawa J. Muhammad Ali, Rezhna H. Faraj. 2014, Data Normalization and Standardization: A Technical Report. Machine Learning Technical Reports. 1(1): 1-6. https://docs.google.com/document/d/1x0A1nUz1WWtMCZb5oVzF0SVMY7a_58KQulqQVT8LaVA/edit#
Rahman, F., M.; Onoda, Y.; Kitajima, K. 2022.Forest Canopy Height Variation in Relation to Topography and Forest Types in Central Japan with LiDAR. For. Ecol. Manag. 503, 119792
Reis, L.P., Souza, A.L., Reis, P.C.M., Mazzei, L., Soares, C.P.B., Torres, C.M.M.E., Silva, L.F., Ruschel, A.R., and Rêgo, L.J.S. 2018. Estimation of mortality and survival of individual trees after harvesting wood using artiﬁcial neural networks in the amazon rain forest. Journal of Ecological Engineering, 112, 140–147.
Riese, F.M., Keller, S., and Hinz, S. 2020. Supervised and Semi-Supervised SelfOrganizing Maps for Regression and Classiﬁcation Focusing on Hyperspectral Data. Remote Sensing, 12(1), 1-23.
Sefidi, K., Marvi Mohajer, M.R. Mosandel, R and Copenheaver. C.A. 2011. Canopy gaps and regeneration in old-growth Oriental Fagus Orientalis (Fagus orientalis Lipsky) stands, northern Iran. Forest Ecology and Management, 262: 1094 – 1099.
Sharma, Krishna and Shivam, and Sharma, Ravi and Sharma, Nonita and Mishra, Mukesh. 2024. A Feature Selection Technique Using Self-Organizing Maps for Software Defect Prediction. 10.1007/978-981-99-4518-4_10.
Song, H.; Zhou, H.; Wang, H.; Ma, Y.; Zhang, Q.; Li, S. 2024. Retrieval of Tree Height Percentiles over Rugged Mountain Areas via Target Response Waveform of Satellite Lidar. Remote Sens. 16, 425.
Taefi Feijani, M., and Azadnejad, S. 2020. Introducing the improved Forest Canopy density (FCD) model for frequent assessment of Hyrcanian forest. Journal of Geospatial Information Technology. 8(2):40-58.
Tian, H., Zhu, J., He, X., Chen, X., Jian, Z., Li, G., Ou, Q., Li, Q., Huang, Q., Liu, G., Xiao, W., 2022. Using machine learning algorithms to estimate stand volume growth of Larix and Quercus forests based on national-scale Forest Inventory data in China. Forest Ecosystems, 9(100037), 1-11
Yu, D.J., Shen, H.B., and Yang, J.Y. 2011. SOMRuler: A Novel Interpretable Transmembrane Helices Predictor. IEEE Transactions on Nanobiosci, 10(2), 121–129.
Vafaei S., Maleknia R., Naghavi HA., Fathizadeh O. 2022. Estimation of Forest Canopy Using Remote Sensing and Geostatistics (Case Study: Marivan Baghan Forests). JOURNAL OF ENVIRONMENTAL SCIENCE AND TECHNOLOGY[Internet]. 2022;24(1 (116) ):71-82. Available from: https://sid.ir/paper/1065022/en
Vesanto, J.and Alhoniemi, E. 2000. Clustering of the self-organizing map. IEEE Trans. Neural Netw., 11 (2000), pp. 586-600, 10.1109/72.846731
Wanga B., Zhang G., Duan J. 2015. Relationship between topography and the distribution of understory vegetation in a Pinus massoniana forest in Southern China. International Soil and Water Conservation Research, 3: 291-304.
Zhang, Z.H., Hu, G., and Ni, J. 2016. Effects of topographical and edaphic factors on the distribution of plant communities in two subtropical karst forests, Southwestern China. Journal of Mountain Science, 10, 95–104.
Zhu, W., Li, Y., Luan, K., Qiu, Z., He, N., Zhu, X., and Zou, Z. 2024. Forest Canopy Height Retrieval and Analysis Using Random Forest Model with Multi-Source Remote Sensing Integration. Sustainability, 16(5), 1735. https://doi.org/10.3390/su16051735
Zörner, J.; Dymond, J.R.; Shepherd, J.D.; Wiser, S.K.; Jolly, B. 2018. LiDAR-Based Regional Inventory of Tall Trees—Wellington, New Zealand. Forests, 9, 702.

No competing interests reported.

Editorial decision: Revision requested (29 Jul, 2024)
Reviews received at journal (29 Jul, 2024)
Reviews received at journal (25 Jul, 2024)
Reviews received at journal (24 Jul, 2024)
Reviewers agreed at journal (23 Jul, 2024)
Reviewers agreed at journal (22 Jul, 2024)
Reviewers agreed at journal (20 Jul, 2024)
Reviews received at journal (19 Jul, 2024)
Reviewers agreed at journal (19 Jul, 2024)
Reviewers invited by journal (08 Jul, 2024)
Editor assigned by journal (05 Jul, 2024)
Submission checks completed at journal (04 Jul, 2024)
First submitted to journal (30 Jun, 2024)