Shade Canopy Density Variables (SCDV) in the scientific literature on cocoa agroforestry
The extent to which N, G and %C are used in the scientific literature on cocoa agroforestry was explored using a natural language processing algorithm to “read-and-search” within the Smithsonian Institute's specialized bibliographic database (https://www.zotero.org/groups/2785774/cocoa_library/library). The following set of “keywords” was used: density, trees per hectare, trees/ha, trees ha-1, individuals/ha, individuals ha-1, stems ha-1, stems/ha, plants per hectare, plants/ha, plants ha-1, basal area, m2 ha-1, m2/ha, canopy cover, shade percent, percentage of shade, tree cover, shade cover, percent cover, percentage of cover. Sentences containing these keywords were tagged and checked manually to discard off-topic matches (e.g. the keyword density may describe soil bulk density rather than the density of shade trees in the canopy). Then, because several keywords refer to the same SCDV, we renamed synonyms according to the following rules: 1) density = trees per hectare = trees/ha = trees ha-1 = individuals/ha = individuals ha-1 = stems ha-1 = stems/ha = plants per hectare = plants/ha = plants ha-1, 2) canopy cover = shade percent = percentage of shade = tree cover = shade cover = percent cover = percentage of cover, and 3) basal area on its own (no synonyms). We then created a clean database for statistical analysis.
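The tagging and synonym-renaming steps can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the naive regular-expression sentence splitter and the longest-match rule are assumptions made for illustration, and the unit-only keywords (m2 ha-1, m2/ha) are left out because the renaming rules do not assign them to an SCDV.

```python
import re

# Search keywords grouped by the SCDV they are renamed to (rules 1-3 in the text).
DENSITY = ["density", "trees per hectare", "trees/ha", "trees ha-1",
           "individuals/ha", "individuals ha-1", "stems ha-1", "stems/ha",
           "plants per hectare", "plants/ha", "plants ha-1"]
COVER = ["canopy cover", "shade percent", "percentage of shade", "tree cover",
         "shade cover", "percent cover", "percentage of cover"]
BASAL = ["basal area"]

SYNONYMS = {kw: "density" for kw in DENSITY}
SYNONYMS.update({kw: "canopy cover" for kw in COVER})
SYNONYMS.update({kw: "basal area" for kw in BASAL})

# Longest keywords first, so multi-word terms are tried before shorter ones.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(SYNONYMS, key=len, reverse=True)),
    flags=re.IGNORECASE,
)


def split_sentences(text):
    """Naive sentence splitter (assumption; a proper tokenizer could replace it)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_sentences(text):
    """Return (sentence, [canonical SCDV names]) for every sentence with a keyword."""
    tagged = []
    for sentence in split_sentences(text):
        hits = [SYNONYMS[m.group(0).lower()] for m in _PATTERN.finditer(sentence)]
        if hits:
            tagged.append((sentence, hits))
    return tagged


sample = ("Shade tree density was 120 trees/ha. Soil pH was 5.5. "
          "Basal area reached 10 m2 ha-1 and canopy cover was 40%.")
for sentence, scdv in tag_sentences(sample):
    print(scdv, "<-", sentence)
```

A sentence about soil bulk density would also be tagged by this sketch, which is why the manual check described above remains necessary.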
The frequency and relevance of the use of each keyword in the database (corpus) were evaluated with four indices used in natural language processing (Robertson 2004), modified to suit the needs of this study:
- Term Frequency [TF(t)] in a document is the number of times a keyword (t) is mentioned in the document [F(t)], expressed as a fraction of the total number of words in the document. However, because our interest is in comparing keywords, in this study TF(t) is expressed as a fraction of the total number of times all search terms appear in the corpus [Q = F(density) + F(canopy cover) + F(basal area)]. With this definition, TF(t) = F(t)/Q.
- Document Frequency [DF(t)] is the number of documents in which a keyword (t) is present [N(t)], expressed as a fraction of the total number of documents in the corpus (N). With this definition, DF(t) = N(t)/N.
- Inverse Document Frequency [IDF(t)] is the ratio between the number of documents in the corpus (N) and the number of documents in which keyword (t) is present [N(t)], i.e. the reciprocal of DF(t): IDF(t) = N/N(t). IDF(t) is usually expressed in logarithmic form to reduce the scale of the ratio when analyzing very large databases. In this study the corpus is small, and consequently we expressed IDF(t) in its natural scale.
- Term Frequency-Inverse Document Frequency [TF-IDF(t)] integrates the number of times a search term appears and the number of documents in which it appears. This index is calculated as TF-IDF(t) = TF(t) * IDF(t).
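The four indices, as modified for this study, can be computed as in the following minimal Python sketch. The function name, the input layout (one list of canonical keyword mentions per document) and the toy corpus are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def keyword_indices(docs):
    """Compute TF, DF, IDF and TF-IDF per keyword.

    docs: one list of canonical keyword mentions per document in the corpus
    (an empty list for a document in which no keyword was found).
    """
    n_docs = len(docs)        # N: number of documents in the corpus
    freq = Counter()          # F(t): mentions of t across the corpus
    doc_count = Counter()     # N(t): number of documents mentioning t
    for mentions in docs:
        freq.update(mentions)
        doc_count.update(set(mentions))
    q = sum(freq.values())    # Q: total mentions of all search terms

    indices = {}
    for t in freq:
        tf = freq[t] / q                 # TF(t) = F(t)/Q
        df = doc_count[t] / n_docs       # DF(t) = N(t)/N
        idf = n_docs / doc_count[t]      # IDF(t) = N/N(t), natural scale
        indices[t] = {"TF": tf, "DF": df, "IDF": idf, "TF-IDF": tf * idf}
    return indices


# Toy corpus of four documents (hypothetical data).
corpus = [
    ["density", "density", "canopy cover"],
    ["density", "basal area"],
    ["canopy cover"],
    [],
]
for keyword, values in keyword_indices(corpus).items():
    print(keyword, values)
```

In the toy corpus, density is mentioned three times out of six keyword mentions and appears in two of the four documents, so TF = 0.5, DF = 0.5, IDF = 2 and TF-IDF = 1.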
The algorithm was coded in Python (version 3.8). The 'SciPDF Parser' library (https://github.com/titipata/scipdf_parser) was used to parse PDF files. This library relies on GROBID, a machine-learning library for extracting, parsing, and restructuring raw documents such as PDFs into structured XML/TEI-encoded documents, with a particular focus on technical and scientific publications. Pandas, an open-source data analysis and manipulation tool (https://pandas.pydata.org/), was used for data handling. Text processing was implemented with the NLTK (Natural Language Toolkit) library (https://www.nltk.org/), which provides tools for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries.
Cross-compensation effects between N, G, R and %C
To explore the cross-compensation effects between N, G and %C, and hence their limitations as SCDV for describing shading conditions in the understory, we calculated shade tree density and percent canopy cover for simple, mono-specific, even-sized, mono-layered shade canopies of Cordia alliodora for various stem diameters (d = 10, 20, 30, 40 and 50 cm), various ratios (R = k/d; R = 0.20, 0.25, 0.30 and 0.35) between crown diameter (k = 5 and 10, in meters) and tree stem diameter (d, in cm), and various G stocking targets (5, 10, 15, 20, 25 and 30 m2 ha-1). Low values of R represent small-crowned trees and high values represent big-crowned trees; the value of G represents the “stocking” level of the shade tree stand.
The relationships between N, G, R, and %C were explored using equations 1 and 2:
$$N = \frac{40000\,G}{\pi\,d^{2}} \qquad \text{(Equation 1)}$$
$$\%C = 100\,G\,R^{2} \qquad \text{(Equation 2)}$$
In this analysis crown opacity was set at 0.5.
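As a worked illustration of Equations 1 and 2, the Python sketch below evaluates both over the stem diameters and ratios listed above for one stocking level. The function names are hypothetical. The optional opacity factor is an assumption rather than part of Equation 2 as written: multiplying by the crown opacity of 0.5 reproduces the %C of 31% reported for G = 10 m2 ha-1 and R = 0.25 in the canopy typologies described later in this section.

```python
import math


def tree_density(G, d_cm):
    """Equation 1: trees per hectare for basal area G (m2 ha-1) and stem diameter d (cm)."""
    return 40000.0 * G / (math.pi * d_cm ** 2)


def percent_cover(G, R, opacity=1.0):
    """Equation 2 (100*G*R^2); the opacity multiplier is an assumption, not in Equation 2."""
    return 100.0 * G * R ** 2 * opacity


G = 10.0  # one of the stocking targets, m2 ha-1
for d in (10, 20, 30, 40, 50):
    for R in (0.20, 0.25, 0.30, 0.35):
        print(f"d={d:2d} cm  R={R:.2f}  N={tree_density(G, d):7.1f} trees/ha  "
              f"%C={percent_cover(G, R):5.1f}  %C*p={percent_cover(G, R, 0.5):5.1f}")
```

For a fixed G and R, percent cover is the same at every stem diameter while N changes 25-fold between d = 10 and d = 50 cm, which is the kind of cross-compensation explored here.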
Shading under canopy typologies with same N, G, R, p and %C
The software ShadeMotion (Somarriba et al. 2022) was used to quantify the effects of tree height, leaf fall patterns and spatial planting configurations of trees of different stem diameter on the level and spatiotemporal pattern of shading of 24 shade canopies with the same N, G and %C. The software calculates the solar azimuth and elevation angles for one full year (365 days), every hour from 9 am until 3 pm each day (i.e. a total of 365*7 = 2555 simulation instants per year). At every simulation instant, the software uses the two solar angles, the coordinate position (x,y), and the dimensions and crown characteristics of every tree in the plot to “project” the shadow of every tree at ground level. Trees neither grow nor change in number during the simulation year. Light passage through the crown is modeled by the product of crown opacity (p) and the percentage of foliage retained by the trees in each month (f). The product p*f is used to estimate the fraction of shaded area in the shadow cast by every tree on any day of each month. Each shadow is described by a set of mathematical inequalities, and a contour-tracing algorithm (Moore’s neighborhood) is used to determine both the contour of the shadow and the set of coordinate pairs (grid cells) inside the shadow. A random number generator assigns the condition of shade (1) or no shade (0) to each grid cell inside the shadow according to p*f. The number of simulation instants at which each grid cell is under shade is recorded. More than one tree may cast shade on a grid cell at a simulation instant (shade overlap). At the end of the simulation period (one year in this case) a file is generated containing all plot coordinate pairs (grid cells) and their cumulative number of simulation instants with shade. To avoid edge effects, a central sampling area of 57 m x 56 m (i.e. 3192 grid cells, 1 m2 each) was used for statistical analysis.
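The random allocation of shade within a shadow and the accumulation of shaded instants per grid cell can be sketched in Python as follows. This is not ShadeMotion code. The function names are hypothetical, the shadows are supplied as fixed sets of grid cells rather than projected from solar angles, f is expressed as a fraction and held constant over the year, and counting a cell once per instant when shadows overlap is an assumption.

```python
import random

HOURS = range(9, 16)             # 9 am to 3 pm inclusive: 7 instants per day
N_INSTANTS = 365 * len(HOURS)    # 2555 simulation instants per year


def simulate_instant(counts, shadows, p, f, rng):
    """Update per-cell shade counts for one simulation instant.

    shadows: one set of (x, y) grid cells per tree (hypothetical input; the
    real software derives these from solar angles and tree dimensions).
    Each cell inside a shadow is shaded with probability p*f. A cell shaded
    by several overlapping shadows is counted once (assumption).
    """
    shaded_now = set()
    for cells in shadows:
        for cell in cells:
            if rng.random() < p * f:
                shaded_now.add(cell)
    for cell in shaded_now:
        counts[cell] = counts.get(cell, 0) + 1


def simulate_year(shadows, p, f, seed=0):
    """Accumulate shaded instants per cell over one year with static shadows."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(N_INSTANTS):
        simulate_instant(counts, shadows, p, f, rng)
    return counts


# Two overlapping toy shadows; p = 0.5 as in the study, full foliage (f = 1).
toy_shadows = [{(0, 0), (0, 1)}, {(0, 1), (1, 1)}]
print(simulate_year(toy_shadows, p=0.5, f=1.0))
```

In this sketch the overlapped cell (0, 1) accumulates more shaded instants than the other two cells, because either tree can shade it at any instant.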
In this article, shading is defined as the cumulative number of shade-hours per grid cell, per year.
The construction of the 24 shade canopies followed the protocol described below (see also Fig. 1 and S1_SupplementaryMaterial):
- Twenty-four simple, even-sized, mono-layered shade canopies of Cordia alliodora were constructed by combining different tree stem diameters (two levels: d = 20 and 40 cm, equivalent to crown diameters k = 5 m and 10 m, respectively), leaf fall (two levels: yes/no), tree height (two levels: normal/taller) and planting spatial patterns (three levels: square, alley, random). All 24 canopy typologies had the same basal area (G = 10 m2 ha-1), the same crown diameter to stem diameter ratio (R = k/d = 0.25), the same percent cover (%C = 31%), and the same crown opacity (p = 0.5). Trees were planted in a hypothetical 1.44 ha plot (120 m x 120 m, 14400 coordinate pairs or grid cells, 1 m2 each), located at the equator (latitude = 0) on flat, horizontal terrain.
- Normal total tree heights for d = 20 cm (16 m) and d = 40 cm (27 m) were estimated using an allometric equation for C. alliodora (Somarriba and Beer 1987); the height of taller trees was set at 1.5 times normal height (only trunk height was changed; crown height and crown diameter were kept constant). Crown heights were set at 6 and 8 m for d = 20 and d = 40 cm, respectively.
- C. alliodora trees lose their leaves during the dry season in Central America according to the following pattern: trees have full foliage between June and January but lose foliage between February and May at the following monthly rates: 25% loss in February, 50% loss in March, 80% loss in April, 50% loss in May and 25% loss in June.
- Density (318 and 79 trees ha-1 for d = 20 and d = 40 cm, respectively) and square spacing (5.6 x 5.6 m and 11.25 x 11.25 m for d = 20 and d = 40 cm, respectively) were estimated for each stem diameter using G, R, and p. The width of the alley was set at twice the spacing of the square planting. To maintain the same N and G as in square and random planting, the spacing between trees within the line in alleys was set at half the spacing in square planting.
- The effect of the North-South (N-S) and East-West (E-W) orientation of the alleys on shading was evaluated only for trees with d = 40 cm, normal height and considering monthly leaf fall using a one-way ANOVA; cumulative frequency distributions of shade levels per grid cell were compared using a Kolmogorov-Smirnov test (NPAR1WAY procedure in SAS version 9.4).
- Shading was compared between canopy typologies using a balanced factorial analysis of variance (ANOVA) with heterogeneous variances. In this analysis: 1) homogeneity of variance was first assessed with a scatter diagram of residuals versus predicted values and with a q-q plot; 2) since variances were found to be non-homogeneous [p < 0.0001; the heterogeneous-variance model performed better, as indicated by its lower Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)], a cluster analysis (using variances as the clustering variable and Euclidean distance as the measure of dissimilarity) was used to identify groups of grid cells with similar variances; 3) a linear mixed model (main effects plus their double, triple and quadruple interactions) was fitted using groups with similar variance as blocks (random effects); 4) estimated marginal means (also known as least-squares means) for factor combinations were computed with the R package emmeans (version 1.8.3, https://cran.r-project.org/web/packages/emmeans/emmeans.pdf); and 5) pairwise comparisons between means were represented with letters following the algorithm proposed by Piepho (2004).
The model fitted has the form:
S = μ + d + t + l + h + d*t + d*l + d*h + t*l + t*h + l*h + d*t*l + d*t*h + d*l*h + t*l*h + d*t*l*h + ξ
where S = shade, μ = overall mean, d = tree stem diameter, t = spatial planting arrangement, l = leaf fall, h = tree height, and ξ = error.
Published research supports the selection of the ranges used in the construction of the typologies (see S1_SupplementaryMaterial).
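The factorial structure of the 24 canopy typologies and the planting geometry in the protocol above can be checked with a short Python sketch. The names are hypothetical, and deriving square spacing as the square root of the area available per tree (10000/N) is an assumption about how spacing was obtained. The sketch uses Equation 1 for density.

```python
import math
from itertools import product

G = 10.0                                    # basal area, m2 ha-1
DIAMETERS = (20, 40)                        # stem diameter d, cm
LEAF_FALL = ("yes", "no")
HEIGHT = ("normal", "taller")
PATTERN = ("square", "alley", "random")

# Full crossing of the four factors: 2 x 2 x 2 x 3 typologies.
canopies = list(product(DIAMETERS, LEAF_FALL, HEIGHT, PATTERN))


def density(d_cm, g=G):
    """Equation 1: trees per hectare."""
    return 40000.0 * g / (math.pi * d_cm ** 2)


def square_spacing(n_per_ha):
    """Side (m) of the square available to each tree (assumed derivation)."""
    return math.sqrt(10000.0 / n_per_ha)


def alley_layout(spacing):
    """Alley width is twice, and within-line spacing half, the square spacing."""
    return 2.0 * spacing, spacing / 2.0


print(len(canopies), "canopy typologies")
for d in DIAMETERS:
    n = density(d)
    s = square_spacing(n)
    width, within = alley_layout(s)
    print(f"d={d} cm: N={n:.1f} trees/ha, square={s:.2f} m, "
          f"alley={width:.2f} m x {within:.2f} m")
```

Under these assumptions the sketch gives about 318 trees ha-1 and 5.6 m for d = 20 cm. For d = 40 cm it gives about 79.6 trees ha-1; the reported spacing of 11.25 m is what the same formula returns for a density of 79 trees ha-1. Because the alley layout doubles one dimension and halves the other, the area per tree, and therefore N and G, is unchanged.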