Study sites
We selected Siberia as the study area because a single tree species larch is distributed over a large area, and the land use rarely changes for a long time. This area is suitable for the SEIB-DGVM that does not consider artificial changes in the land use. Siberia is also suitable for the first DA experiment at a regional scale, because the vegetation structure in Siberia is simple. We consider only two PFTs: deciduous needle-leaf tree and undergrowth.
The non-overlapping circles in Fig. 1 illustrate the study sites, which represent the averaged vegetation state within circles. We first calculated the domain-averaged ratios of each PFT in each circle using the global land cover dataset (GLC2000: European Commission, Joint Research Centre 2003): deciduous needle-leaf forest (“tree cover, needle-leaved, deciduous”), evergreen needle-leaf forest (“tree cover, needle-leaved evergreen”), and broad-leaf forest (“tree cover, broad-leaved, deciduous, closed” + “tree cover, broad-leaved, deciduous, open” + “tree cover, broad-leaved, evergreen”). Next, we selected study sites covered mostly (≥ 50% cover) by deciduous needle leaf forest. Sites with > 10% cover of evergreen needle-leaf forest or broad-leaf forest cover were excluded. No tree was simulated within light blue circles, or some climate forcing data values were missing from the area within blue circles in Fig. 1. These sites were excluded from experiments. We selected total 750 sites (black circles in Fig. 1).
The SEIB-DGVM
This study used a particle filter-based DA system with the SEIB-DGVM (Arakida et al. 2017). Refer to Arakida et al. (2017) for detailed descriptions. Here, we provide only a fundamental overview of Arakida et al. (2017) and additional configuration changes. The simulation started with bare ground. Forced by climate conditions, the model simulated the establishment, growth, and decay of individual trees within a virtual forest (Sato et al. 2007). Photosynthetically active radiation at the undergrowth was attenuated by tree; therefore, the amplitude of tree LAI affected undergrowth LAI. Carbon flux, water flux, heat flux, and vegetation structures (e.g. tree LAI, biomasses of individual organs [leaf, root, trunk], and soil carbon) were also simulated through vegetation succession. For most processes, the model time step was a day. For mortality, establishment, and some of the adjustments of crown states, the model time step was a year. We used model version 2.71 (Sato and Ise, 2012) with modifications described by Arakida et al. (2017). In addition, we corrected coding bugs and modified some parameters for the experiment presented here (Table 1).
Among 14 prescribed PFTs, deciduous needle-leaf trees (“tree”) and C3 grass (“undergrowth”) were selected for experiments. The major tree species in this area is larch (Ponomarev et al. 2016), and the forest undergrowth includes cowberry, grass, and shrubs (Ohta et al. 2014). These vegetations were included in the category C3 grass (in this study, simply “undergrowth”), following Sato et al. (2010).
Climate forcing data
Daily climate forcing data were generated using the monthly Climate Research Unit observation-based dataset (CRU-TS3.23 0.5° monthly climate time series: University of East Anglia Climatic Research Unit et al. 2015) and the daily data at the spatial resolution of T62 with a Gaussian grid (about 1.9°) from the National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis (Kalnay et al. 1996). Interannual differences were not considered in this study; a yearlong climate forcing data averaged between 2003 and 2014 was used to simulate the averaged vegetation structures and functions during this time period (Table 1).
First, we interpolated bilinearly each NCEP/NCAR and CRU data points into the center of each study site. We supplementary used the monthly CRU data so that the monthly mean air temperature and cloudiness were identical to those of the CRU. Likewise, precipitation and specific humidity were rescaled so that the monthly totals matched those of the CRU. In contrast, we used the NCEP/NCAR soil temperature and wind velocity data without scaling. Finally, the climate observations from 2003 through 2014 were averaged for each study site to provide daily climate forcing data.
Particle filter-based data assimilation (DA) and observational data
Arakida et al. (2017) demonstrated that a well-known particle filter procedure, sequential importance resampling (SIR: Gordon et al. 1993), performed well as a DA method for the SEIB-DGVM. Particle filtering is a Bayesian process that remains robust when an ecological process such as phenology exhibits non-linearity; e.g., the state changes suddenly at the beginning and end of the leaf-bearing season (Arakida et al. 2017; Ise et al. 2018). In addition, particle filters can handle phase space variability in an individual-based model, such as occasional establishment or death of a tree. As proposed by Kitagawa (1998), the probability densities of the state variables and the model parameters were denoted by parallel simulations (particles), and the distributions were updated sequentially when the observational data were assimilated. The experiments in this study comprised three steps: (i) initial perturbation, (ii) spin-up, and (iii) DA. We used the particle, initial perturbation, and re-sampling perturbation sizes described by Arakida et al. (2017). Although the radius of the study sites was 10 km in Arakida et al. (2017), it was increased to 30 km (Table 1) to reduce the number of sites and therefore, the computational cost.
First, we created 8,000 random combinations of parameters from the initial perturbation ranges provided by Arakida et al. (2017) for the maximum photosynthetic rate (Pmax; µmol CO2 m− 2 s− 1) and the dormancy start date (Dor; day of year [DOY]) for tree and undergrowth. The initial perturbation ranges were as follows: Pmax for tree = [0, 60], Pmax for undergrowth = [0, 15], Dor for tree = [200, 300], Dor for undergrowth = [200, 300]. Here, [a, b] denotes the uniform distribution for an interval between a and b. Next, we performed 8,000 parallel simulations with these parameter sets over 100 years, in which the averaged forcing climate data for the period of 2003–2014 were used repeatedly for spin up. During the spin-up period, the perturbed parameters led to different leaf season, the amplitude of LAI, and forest states for each particle.
A yearlong time series of satellite-observed LAI is prepared for each site; we selected the MODIS LAI product of MCD15A3 with a 4-day interval (Knyazikhin et al. 1999) using the quality-control procedure described by Arakida et al. (2017). The observation error standard deviations were assigned to each pixel of MCD15A3 LAI at the original resolution of 1 km. The 4-day-interval LAI data and its error standard deviations for each study site were calculated using data for the same DOY in the period of 2003–2014 as the median for each 30-km radius (Table 1). We aggregated data within the 30-km radius. The aggregation helped complement the lack of data due to the strict quality control; however, aggregated observations with insufficient data tended to produce erroneous data. To exclude such erroneous data, we used only observations calculated from one-eighth or more of the quality-controlled set in each circle at each time step (Table 1). In addition, LAI data lower than 0.5 were not assimilated, following the procedure of Arakida et al. (2017). Observations with small error standard deviations made the DA system unstable; therefore, standard deviations lower than 0.5 were fixed at 0.5 (Table 1).
Finally, DA was performed for four years repeatedly using the yearlong time series of the satellite-observed LAI prepared in the third step. When an observation was assimilated, the likelihood was calculated for each particle, and the posterior distribution of the particles was updated with resampling using likelihood as the weighting factor. We used the resampling perturbation size as described by Arakida et al. (2017) to avoid particle degeneracy.
Assessment of the DA results
To assess the impact of DA, an experiment without DA (“NODA” here after) was also performed with the parameter sets which were used for the initial perturbations. The experiments with DA (“TEST” here after) at the fourth year of DA was compared with NODA at the same period (i.e., the 104th year of the simulation). TEST for model parameters was compared with the medians and ranges of the initial perturbations. NODA and TEST was compared for total LAI (tree + undergrowth), tree LAI, GPP, and above ground biomass. These results were also compared with the existing studies. Other variables were only compared with the observed LAI to explore the extent to which DA affected the unobserved state variables.
Data used for cross-comparisons
In-situ observation was not widely available in Siberia. Hence, we compared the results with those of existing studies to investigate the characteristics of the DA system. The estimated LAI was compared with assimilated observation data (MODIS LAI: Knyazikhin et al. 1999) to confirm that the DA system worked properly. Other estimated variables were compared with those of existing studies: gross primary production (GPP) of FLUXCOM (Tramontana et al. 2016; Jung et al. 2017), tree LAI (Delbart et al. 2005; Kobayashi et al. 2010), and aboveground biomass (Liu et al. 2015). Supporting information includes more details about the existing studies.
To compare the results of this study with those of the earlier works, we consider differences of the spatiotemporal resolution. As for the LAI, we used the annual maximum LAI for cross-comparisons. The spatial resolution of 30 km was identical for the observed LAI used in the DA experiment and the estimated LAI from the DA experiment, but the temporal resolution of the observation was different (i.e. 4 days). Both observed and estimated LAI reached the maximum value and did not greatly change in mid-summer. If the observation was successfully assimilated, the amplitude of LAI for the DA should have been close to that of the observed amplitude.
As for GPP, the spatial resolution for GPP of FLUXCOM was 0.5°, and the temporal resolution was a year. We first averaged the FLUXCOM data from 2003 to 2013 and multiplied it by 365 to calculate the annual total. Next, FLUXCOM data were interpolated bilinearly to the center of each study site. The interpolated GPP of FLUXCOM were compared to the annual total GPP estimated in this study.
The aboveground biomass of Liu et al. (2015), 0.25°annual mean data, was averaged from 2003 to 2012. Next, it was spatially interpolated by the same procedure used for interpolating FLUXCOM parameters, and cross-compared with the annual mean aboveground biomass estimated in this study.
The spatial resolution for tree LAI of Kobayashi et al. (2010) was 1/112 of a degree, and the temporal resolution was 10 days. Kobayashi et al. (2010) had a higher spatial resolution than the one in this study. We first calculated the average LAI from 2003 to 2013 for each grid/DOY combination, and subsequently calculated the annual maximum for each grid. Finally, we calculated the median of the maximum at each study site (i.e., within a 30-km radius) for comparison with the annual maximum tree LAI that we estimated.
The results for the lower latitudes south of 60°N were not stable: this will be discussed as “limitations of this study” in the discussion section. We therefore used results for latitudes higher than 60°N to construct a scatter plot of the relationships between estimates in this study and those of previous reports. We used Pearson's correlation coefficient (Cor) to examine relationships in the data collected for the same area.