1. Modelling Protocol
We follow the protocol from the Inter-Sectoral Impact Model Intercomparison Project phase 3a (ISIMIP3a), using reanalysis and detrended climate data to attribute impacts to historical climate change32. We use seven fire models in this study, and each model performs a transient historical climate run with GSWP3-W5E5 reanalysis climate (“Factual”)32. ISIMIP3a also provides a detrended version of the GSWP3-W5E5 dataset, in which the long-term climate trend has been removed29; this was used to drive the models in the counterclim experiment, together with atmospheric CO2 and CH4 held constant at 1901 levels (“Counterfactual”)29. We drive the models with 0.5° resolution daily climate (near-surface relative humidity, near-surface specific humidity, daily maximum temperature, daily mean temperature, daily minimum temperature, longwave downwelling radiation, near-surface wind magnitude, shortwave downwelling radiation, surface air pressure and total precipitation)44–46. Lightning was prescribed as a monthly climatology from LIS/OTD data47, and atmospheric CO2 concentration was prescribed annually48. Land use is derived by interpolating the LUH2 and Monfreda datasets49, while population density, used by some models for ignitions and/or fire suppression, was provided annually50.
Models were spun up using constant CO2 (284.73 ppm) and fixed land use (1850), before moving to the transition period between spin-up and the start of the simulations (1850–1900), during which constant CO2, a transient climate and varying direct human forcing (DHF) were applied. Each model applies its own test for equilibrium, as described in its documentation papers (see below).
2. Fire Models
In this study we use simulations from the fire models taking part in the ISIMIP3a fire sector, which is a continuation of the Fire Model Intercomparison Project (FireMIP)51,52, plus VISIT, which contributed simulations to ISIMIP3a's biome sector. The ensemble currently comprises simulations from seven models, as follows:
CLASSIC
The Canadian Land Surface Scheme Including Biogeochemical Cycles (CLASSIC) simulates fluxes of energy, momentum, water, carbon, nitrogen, and fire53. Land use varies in time: the fractional coverage of natural vegetation in each grid cell changes as the agricultural fraction changes, as derived from the Land-Use Harmonisation (LUH) project54. The make-up of PFTs within the natural vegetation fraction is derived from the European Space Agency (ESA) Climate Change Initiative (CCI) land cover map55. CLASSIC models both natural fires (caused by lightning) and anthropogenic fires (caused by human ignition), using fire ignition and extinguishing probabilities to calculate fire occurrence and area burned for each grid cell, assuming an elliptical fire shape. Ignition probabilities are based on the above-ground biomass available as a fuel source, the combustibility of that fuel (based on soil moisture in the litter and vegetation root zone), and the presence of an ignition source (based on lightning strikes and population density). Extinguishing probability is based on population density.
JULES-INFERNO
The Earth System configuration of the Joint UK Land Environment Simulator (JULES-ES)56 is coupled to the INFERNO fire model57. JULES-ES includes dynamic vegetation from TRIFFID (Top-down Representation of Interactive Foliage and Flora Including Dynamics)58 and incorporates vegetation changes from fire and land use change59. Fire and background mortality vary by Plant Functional Type (PFT), of which there are nine natural PFTs (tropical and temperate broadleaf evergreen trees, broadleaf deciduous trees, needleleaf evergreen and deciduous trees, evergreen and deciduous shrubs, and C3 and C4 grasses) and four crop and pasture PFTs (C3 and C4 for each). Within each grid box, the fractional coverage of each PFT is determined by a competition hierarchy defined by TRIFFID59, and is limited to the C3 and C4 crop/pasture PFTs in the prescribed agricultural regions.
LPJ-GUESS-SPITFIRE
The LPJ-GUESS dynamic global vegetation model60,61 was coupled to the SPITFIRE global fire model62 to form LPJ-GUESS-SPITFIRE51,63. The version of LPJ-GUESS used here has been updated to include nitrogen cycling and nitrogen limitation of photosynthesis51,61. Vegetation in LPJ-GUESS is represented by ten tree PFTs (for natural land cover), two grass PFTs (for natural and pasture land cover) and five crop PFTs (for cropland). On natural areas, woody vegetation composition and structure result from competition for light and water between cohorts of identical individuals of each tree PFT. Establishment, mortality and disturbance (including fires simulated by SPITFIRE) occur stochastically, so replicate patches (here 20) are averaged to produce a grid cell mean. The SPITFIRE implementation simulates fires on pasture and natural land but no cropland burning. The version used here also increases the maximum fire duration to 12 hours, complemented by a population-density-dependent reduction in fire duration64. To account for landscape fragmentation by cropland and its resulting effect on fires in adjacent pastures and natural lands, fire size is scaled by the factor \(e^{-4 \times \text{gridcell crop fraction}}\), as illustrated in the sketch below.
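A minimal illustration of this fragmentation adjustment (function and variable names are hypothetical; the actual implementation sits inside the LPJ-GUESS-SPITFIRE code):

```python
import numpy as np

def fragmentation_scaled_fire_size(fire_size, gridcell_crop_fraction):
    """Scale simulated fire size by exp(-4 * gridcell crop fraction) to
    represent landscape fragmentation by cropland (illustrative sketch)."""
    return fire_size * np.exp(-4.0 * gridcell_crop_fraction)

# A grid cell with 25% cropland reduces fire size to ~37% of its unscaled value.
print(fragmentation_scaled_fire_size(100.0, 0.25))  # ~36.8
```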
LPJ-GUESS-SIMFIRE-BLAZE
The underlying dynamic vegetation model is the same version as described for LPJ-GUESS-SPITFIRE above. SIMFIRE-BLAZE is a two-stage fire model: SIMFIRE (SIMple FIRE model)65 computes burned area based on human population density, maximum annual fAPAR, the maximum annual Nesterov index, and a set of fire biomes. Here, a climatology is used to derive an annual distribution of daily fire probability. The combustion model BLAZE (BLAZe induced biosphere-atmosphere flux Estimator)51 forms the second stage. In the case of a successful stochastic ignition, it computes the fire-line intensity from available fuels and fire weather, the mortality of PFTs depending on fire-line intensity and allometry, and, finally, the C and N turnover from the biomass pools (live and litter) to the atmosphere and between the live and litter pools.
ORCHIDEE-MICT-SPITFIRE
ORCHIDEE is a process-based ecosystem model developed for simulating carbon, water and energy fluxes in ecosystems, from site level to global scale66–68. ORCHIDEE-MICT is a version of ORCHIDEE with improved interactions between soil carbon, soil temperature and hydrology, and a fire module69. For the ISIMIP3a simulations, a version based on ORCHIDEE-MICT with a further detailed representation of grassland management70,71 and northern peatland72,73 was used. The fire module of ORCHIDEE-MICT is derived from the SPITFIRE model and is described in detail in74,75. The SPITFIRE fire module explicitly simulates open vegetation fires, including fuel load impacts on fire propagation and fire impacts on grassland and woodland dynamics. Fire and background mortality vary by Plant Functional Type (PFT). For the ISIMIP3a simulations, ten natural PFTs (tropical broadleaf evergreen forest, tropical broadleaf raingreen forest, temperate needleleaf evergreen forest, temperate broadleaf evergreen forest, temperate broadleaf summergreen forest, boreal needleleaf evergreen forest, boreal broadleaf summergreen forest, boreal needleleaf deciduous forest, and C3 and C4 natural grasses) and two grazing land PFTs (C3 and C4 grazing land) can be burned, while the cropland PFTs (C3 and C4 cropland) and the northern peatland PFT (C3) were excluded from fire impacts.
SSiB4
SSiB4/TRIFFID-Fire consists of three components: a land surface model (the Simplified Simple Biosphere Model; SSiB), a dynamic vegetation model (the Top-down Representation of Interactive Foliage and Flora Including Dynamics model; TRIFFID), and a fire model of intermediate complexity, CLM-Li. The modelled plant functional types (PFTs) in SSiB4/TRIFFID include broadleaf evergreen trees, needleleaf evergreen trees, broadleaf deciduous trees, C3 grasses, C4 grasses, shrubs, and tundra shrubs. The vegetation competition in TRIFFID is based on the Lotka–Volterra equations and has been updated to represent the coexistence of grasses and shrubs. Fire impacts on the terrestrial carbon cycle and vegetation fraction are updated every 10 days. SSiB4/TRIFFID-Fire has been shown to reproduce burned area and fire emissions globally at interannual to seasonal scales76.
VISIT
The Vegetation Integrative Simulator for Trace gases (VISIT) is coupled to a semi-empirical fire model77, mainly to estimate emissions from biomass burning (e.g., CO2, CO, CH4, and black carbon). Fuel accumulation and its moisture content are estimated by the carbon cycle and hydrological schemes of VISIT. The fire component is briefly described in78,79, where the original fire scheme is modified to simulate the seasonal change in burned area by calculating fire season length at a monthly time step. Burned area is calculated once a year (without ignition), and then weighted by (monthly fire season length) / (annual fire season length) in each month. A prescribed biome map (e.g., tropical rainforest and savanna) is used in the model; fire therefore does not affect vegetation composition dynamically, but it does influence biogeochemical processes such as the net carbon budget. This is likely to affect seasonality and the simulation of fuel, and may explain the poorer benchmarking results compared to the other models in this study. The VISIT results are reported only to the ISIMIP biome sector, but have been included in this study for comparison with the other models reporting burned area.
3. Observations
We use two global burned area datasets (GFED540 and FireCCI5.180) as reference data for the analysis. Both datasets are MODIS-based but differ fundamentally in how they were developed. FireCCI maps fires at 250 m resolution using spectral information from MODIS in combination with thermal anomalies81. However, it has been reported that global burned area products heavily underestimate the total magnitude of burned area82. GFED5 is based on the MODIS global burned area product MCD64A1, but its omission and commission errors were corrected by dynamic adjustment factors estimated from Landsat or Sentinel-2 reference burned area maps40. As a result, global annual burned area in GFED5 is about double that in FireCCI5.1 (Fig. S7). We use burned area data from 2003 onwards, the first full year for which data from both the Terra and Aqua satellites are available83.
4. Relative Anomaly and Probability Ratio
As shown in Figure S7, the spread in absolute burned area is large amongst the observations (GFED5 has ~1.5 to 2 times the burned area of FireCCI5.1), the models (350 to 750 Mha year−1), and the regions (0 to 127 Mha year−1). Attributing changes in absolute burned area therefore carries considerable uncertainty. We overcome this by focusing on relative changes in burned area, using the normalised relative anomaly (RA) rather than absolute burned area in our analysis. Using normalised relative anomalies also facilitates inter-regional comparison.
For the Probability Density Functions (PDFs) in Fig. 1, we calculate RA relative to the counterfactual mean burned area (\(\overline{BA_{\mathrm{c.fact}}}\)). Subtracting the mean removes systematic biases, and dividing by the mean expresses the anomaly as a fraction of the baseline, normalising the interannual variability. By comparing both the factual and counterfactual experiments to the counterfactual mean, we quantify the fractional increase in burned area driven by climate change relative to a baseline without climate change.
$$RA_{BA} = \frac{BA - \overline{BA_{\mathrm{c.fact}}}}{\overline{BA_{\mathrm{c.fact}}}}$$
Equation 1: Burned Area Relative Anomaly.
Where BA is the time series of monthly (global or regional) total burned area for the factual (orange in Fig. 1) or counterfactual (blue in Fig. 1) simulations.
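A minimal sketch of this calculation, assuming NumPy arrays of monthly regional burned area totals (function and array names are illustrative, not the study's code):

```python
import numpy as np

def relative_anomaly(ba, ba_counterfactual):
    """Relative anomaly (Eq. 1): anomaly of a burned area series relative
    to the counterfactual mean burned area."""
    baseline = np.mean(ba_counterfactual)
    return (ba - baseline) / baseline

# Toy example with three months of regional totals (Mha):
ba_fact = np.array([12.0, 15.0, 9.0])   # factual simulation
ba_cfact = np.array([10.0, 11.0, 9.0])  # counterfactual simulation
print(relative_anomaly(ba_fact, ba_cfact))   # [ 0.2  0.5 -0.1]
print(relative_anomaly(ba_cfact, ba_cfact))  # [ 0.   0.1 -0.1]
```

Applied to the factual and counterfactual series with the counterfactual mean as baseline, the same function gives the orange and blue anomalies in Fig. 1.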
For the evaluation in Extended Data Fig. 1, we compare RA for the factual simulations and the observations, with anomalies calculated from their respective means (\(\overline{BA_{\mathrm{fact}}}\) and \(\overline{BA_{\mathrm{obs}}}\)).
$$RA_{BA} = \frac{BA - \overline{BA}}{\overline{BA}}$$
Equation 2: Burned Area Relative Anomaly for the validation.
When constructing the time series in Fig. 3, we are interested in the difference in burned area between the factual and counterfactual experiments through time. Burned area is highly variable from year to year84,85, and the counterfactual simulations have a negative trend in most regions. Therefore, we calculate this difference over a 21-year running mean centred on year Y, and treat the counterfactual as the baseline burned area for normalisation. Subtracting a mean reduces the effect of interannual variability, and choosing a centred 21-year window mean, as opposed to the mean of the entire time series, removes most of the long-term trend:
$$RA_{BA} = \frac{\overline{BA_{\mathrm{fact},\,21\,\mathrm{yr}}} - \overline{BA_{\mathrm{c.fact},\,21\,\mathrm{yr}}}}{\overline{BA_{\mathrm{c.fact},\,21\,\mathrm{yr}}}}$$
Equation 3: Moving Window Relative Anomaly.
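A sketch of the moving-window calculation, assuming annual burned area totals and a 21-year centred window (names illustrative):

```python
import numpy as np

def moving_window_ra(ba_fact_annual, ba_cfact_annual, window=21):
    """Moving-window relative anomaly (Eq. 3): difference between the
    factual and counterfactual centred-window means, normalised by the
    counterfactual window mean. Years without a full window return NaN."""
    half = window // 2
    ra = np.full(len(ba_fact_annual), np.nan)
    for y in range(half, len(ba_fact_annual) - half):
        fact_mean = np.mean(ba_fact_annual[y - half:y + half + 1])
        cfact_mean = np.mean(ba_cfact_annual[y - half:y + half + 1])
        ra[y] = (fact_mean - cfact_mean) / cfact_mean
    return ra
```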
The Probability Ratio (PR) is the ratio of the number of months, over the same time period, in which the factual and counterfactual simulations exceed the counterfactual monthly mean burned area.
$$PR = \frac{\sum \left(BA_{\mathrm{fact}} > \overline{BA_{\mathrm{c.fact}}}\right)}{\sum \left(BA_{\mathrm{c.fact}} > \overline{BA_{\mathrm{c.fact}}}\right)}$$
Equation 4: Probability Ratio.
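A sketch of the PR calculation under the same assumptions as the relative anomaly sketch above (monthly regional totals as NumPy arrays):

```python
import numpy as np

def probability_ratio(ba_fact, ba_cfact):
    """Probability Ratio (Eq. 4): number of factual months above the
    counterfactual monthly mean, divided by the number of counterfactual
    months above that same mean."""
    threshold = np.mean(ba_cfact)
    return np.sum(ba_fact > threshold) / np.sum(ba_cfact > threshold)
```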
5. Evaluation Methods
We evaluated the factual burned area simulations of the seven fire models against observations to assess the ensemble's ability to represent changes in burned area in each region. We constrain the model and observation data to 2003–2019, the common period of the simulations and observations.
For general evaluation, we used the Normalised Mean Error (NME) metric from previous FireMIP benchmarking51,52,86 for spatial comparisons, to quantify model performance in simulating spatial patterns of temporally averaged burned area (Table S4). NME1 calculates the area-weighted mean absolute difference between observations and simulations, normalised by the mean variation in the observations. We also report NME step 3 (NME3) from the FireMIP benchmarking framework. FireMIP recommends NME1 and NME3 as appropriate for comparing non-normally distributed variables such as burned area. NME3 removes the mean bias and absolute variance before comparison and therefore assesses the model's ability to simulate regions of relatively high and low burned area, which is appropriate here as we are interested in relative anomalies.
We also add a ‘temporal NME’ metric (NMEt), where we take the sum over the spatial domain (per region) and calculate the NME on the resulting time series. This allows us to quantify the models' performance in representing the observed regional variations over time, which we take as an indication of a model's ability to represent the influence of changing climate conditions on burned area. We assess each of the IPCC regions separately to match the reported results and to inform model weighting (see below).
The lower the NME score, the closer the match between observation and simulation. For the evaluation, we also use a randomly-resampled null model, which compares the observations with a dataset generated by resampling the observations without replacement59. As the resulting score depends on the sampling order, we repeat this procedure 1000 times to generate a "randomly-resampled" distribution. A model performs better (worse) than this null model if its score is less (greater) than the mean minus (plus) one standard deviation of the distribution86. We use the scores in conjunction with burned area maps to give context to the performance scores (Figure S8).
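A minimal sketch of these scores, following the NME definitions used in FireMIP benchmarking (this is not the benchmarking code itself; `weights` stands in for grid-cell areas and all names are illustrative):

```python
import numpy as np

def nme(obs, sim, weights=None):
    """NME step 1: area-weighted mean absolute difference between simulation
    and observation, normalised by the mean deviation of the observation
    from its (weighted) mean."""
    weights = np.ones_like(obs) if weights is None else weights
    obs_mean = np.sum(weights * obs) / np.sum(weights)
    return (np.sum(weights * np.abs(sim - obs)) /
            np.sum(weights * np.abs(obs - obs_mean)))

def nme3(obs, sim, weights=None):
    """NME step 3: remove the mean and divide by the mean absolute deviation
    of each dataset before comparison, so that only the pattern of relatively
    high and low burned area is scored."""
    weights = np.ones_like(obs) if weights is None else weights

    def standardise(x):
        mean = np.sum(weights * x) / np.sum(weights)
        mad = np.sum(weights * np.abs(x - mean)) / np.sum(weights)
        return (x - mean) / mad

    return nme(standardise(obs), standardise(sim), weights)

def null_model_scores(obs, weights=None, n=1000, seed=0):
    """Randomly-resampled null model: NME of the observations against a
    without-replacement resample (permutation) of themselves, n times."""
    rng = np.random.default_rng(seed)
    return np.array([nme(obs, rng.permutation(obs), weights) for _ in range(n)])
```

NMEt follows the same `nme` formula, applied to the regional time series rather than to the spatial field.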
Following87, we also assess each model's global and regional total burned area distribution against the two observational datasets using probability distribution and quantile-quantile analyses. This allows us to assess which parts of the burned area temporal distribution each model can reproduce, i.e., whether models capture low, average, or extreme burned area.
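A sketch of the quantile-quantile comparison (illustrative only):

```python
import numpy as np

def qq_points(obs, sim, probs=np.linspace(0.01, 0.99, 99)):
    """Quantile-quantile comparison of observed and simulated burned area:
    points below the 1:1 line indicate that the model under-simulates that
    part of the distribution."""
    return np.quantile(obs, probs), np.quantile(sim, probs)
```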
6. Ensemble Weighting
Analysis of the previous generation of this fire model ensemble34, together with the analysis shown in this paper, highlights significant differences in model performance among regions. Therefore, we apply a region-specific weighting for each of the IPCC regions, based on model performance in that region. The weighting is based on the distance of the modelled to the observed temporal RA of burned area88,89. We apply weights according to two aspects of temporal performance, assessed against each observational dataset (FireCCI5.1 and GFED5): (i) do models capture the correct overall temporal variability in burned area? (ii) do models capture year-by-year variations in burned area? While land use and socio-economic factors can dominate long-term trends in burned area3,5, year-by-year variations are largely driven by climate and fuel dynamics90. Therefore, we take inter-annual performance as an indicator of a model's ability to simulate climate impacts on burned area.
For both criteria, we perform an area-weighted sum over the pixels in each IPCC region to obtain the regional total time series for the observations and the factual and counterfactual simulations. We then calculate the relative anomaly of the factual against the counterfactual mean, as per Eq. (1). For the first criterion, we compute NMEt on the annual total burned area against each observation in turn. For the second, we compute it on the ranked monthly burned areas, again comparing against both observations.
This results in four NME scores (two criteria × two observations) for each model in each IPCC region (Tables S5–S6). We then sum these four NME scores to obtain a combined distance score (\(D_i\)). We apply \(D_i\) in Eq. 5, originally from88, but without the denominator of the original formula, which accounts for model (in)dependence and is relevant for large ensembles with, e.g., multiple simulations of the same model. Lastly, we normalise the weights within each region.
$$W_i = e^{-\frac{D_i}{\sigma_D}}$$
Equation 5: Model weight calculation. The weight for model i (\(W_i\)) is calculated from the sum of its distances to the observations (\(D_i\)) and the shape parameter \(\sigma_D\). Here, the sum of the four NMEs (Tables S5–S6) is used as the combined distance, and \(\sigma_D\) was set to 0.5, adapted from88.
There is no objectively best method for determining a value for \(\sigma_D\). However, a large \(\sigma_D\) corresponds to model democracy, whereas a small \(\sigma_D\) concentrates all the weight on the few best-performing models. We therefore use \(\sigma_D = 0.5\) to calculate one weight per model per region, as this represents a balance between democracy and performance89.
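A sketch of the weighting step, assuming the four regional NME scores per model are already computed (the numbers below are made up for illustration):

```python
import numpy as np

def model_weights(nme_scores, sigma_d=0.5):
    """Eq. 5 without the model-independence term: D_i is the sum of the four
    NME scores (two criteria x two observation datasets) for model i in a
    region; weights exp(-D_i / sigma_D) are then normalised to sum to one."""
    d = np.sum(nme_scores, axis=1)   # combined distance per model
    w = np.exp(-d / sigma_d)
    return w / np.sum(w)

# Hypothetical scores for three models (rows) x four NME scores (columns):
scores = np.array([[0.8, 0.9, 1.0, 0.7],
                   [1.5, 1.6, 1.4, 1.7],
                   [0.6, 0.7, 0.8, 0.5]])
print(model_weights(scores))  # the best-performing (third) model gets most weight
```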
We use these weights (Table S7) to bootstrap (with replacement) the monthly regional burned area RA time series for 2003–2019, where the 100,000 bootstrapped samples are allocated among the models according to their weights (number of bootstraps per model = 100,000 × weight). The weighted factual and counterfactual ensembles are then used to calculate the change in PDFs and the Probability Ratio. Bootstrapping the models using these weights means that we sample the best-performing models more often than poorly performing models, adding confidence to our results.
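One plausible implementation of this weighted bootstrap is sketched below (names illustrative; the exact resampling scheme in the study may differ):

```python
import numpy as np

def weighted_ensemble_bootstrap(model_ra, weights, n_total=100_000, seed=0):
    """Weighted bootstrap of the ensemble: each model contributes roughly
    n_total * weight monthly RA values, drawn with replacement from its
    2003-2019 series; the pooled sample gives the weighted ensemble PDF.

    model_ra : list of 1-D arrays of monthly RA values, one per model
    weights  : normalised model weights for the region (summing to one)
    """
    rng = np.random.default_rng(seed)
    pooled = [rng.choice(ra, size=int(round(n_total * w)), replace=True)
              for ra, w in zip(model_ra, weights)]
    return np.concatenate(pooled)
```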
7. Evaluation Results
Evaluation shows that no individual model captures the observations perfectly, but at least some models in the ensemble capture the main spatial patterns and relative temporal anomalies in most IPCC regions. Most regional trends are appropriately modelled by multiple models; the regions with the poorest performance are the Arabian Peninsula (ARP), West and Central Europe (WCE), South-West South-America (SWS), South Australia (SAU), and New Zealand (NZ).
7.1 General Benchmarking
Our metric scores indicate that the models reproduce the overall pattern of burned area and perform consistently against each other globally for all products (Table S4), with the exception of VISIT, whose biases relative to the burned area observations are greater than the observed spatial variance. Considering the spatial distribution, VISIT simulates its highest burned area in Australia and minimal burning in Africa and South America (Fig. S8). SSiB4 more accurately captures the position and magnitude of Africa's Sahelian (Sahara (SAH), northern Western-Africa (WAF), Central-Africa (CAF) and North Eastern-Africa (NEAF)) and Miomboian (northern West Southern-Africa (WSAF) and East Southern-Africa (EASF)) fire regions (Fig. S4, NME3 of 1.06–1.14). Meanwhile, CLASSIC reproduces the magnitude of these fire regions but over too extensive an area, simulating too much burning in parts of Africa with little fire in the observations. JULES reproduces the Sahelian fires, but they are broader than in the observations and shifted slightly northward, away from WAF, CAF and NEAF and into SAH. The two LPJ-GUESS models simulate too little Sahelian burning. ORCHIDEE captures the Sahel, but its Miomboian burning extends too far into the WSAF and EASF regions. Importantly for this study, however, all models estimate the relative high- and low-fire regions fairly well, regardless of absolute magnitude, as indicated by lower NME3 scores (Tables S5-S6). JULES, CLASSIC and the two LPJ models simulate more burning in the Cerrado and Caatinga (North-East South-America (NES) and South-East South-America (SES)) than in the observations, while SSiB4, ORCHIDEE and VISIT do not. JULES, LPJ and (to a lesser extent) CLASSIC capture fires around the Amazon arc of deforestation (South-American Monsoon (SAM) and the southern part of Northern South-America (NSA)). CLASSIC has too much burning in the South American Atlantic Forests (coastal NES). LPJ-GUESS-SIMFIRE-BLAZE captures the Northern Australian (NAU) burned area, which is underestimated but present in JULES and LPJ-GUESS-SPITFIRE. SSiB4 and CLASSIC burn little in NAU and have too much burning in south-eastern Australia (East Australia (EAU) and South Australia (SAU)), extending too far into the inland Mallee and Acacia shrublands (towards Central Australia (CAU)). CLASSIC, LPJ-GUESS-SIMFIRE-BLAZE and SSiB4 simulate the observed burning in the Kazakh (East Europe (EEU), West Siberia (WSB) and northern West Central Asia (WCA)) and Mongolian-Manchurian (northern East Asia (EAS) and into Russian-Far-East (RFE)) steppes in Central and Eastern Asia, though not at the observed magnitude. All models tend to burn more than observed in North America, though the exact locations are model dependent.
7.2 Temporal Variability
We evaluate the temporal NME to assess the capability of our models to represent the observed temporal patterns in burned area. An NME of 1 indicates that our models are, on average, a factor of 1 away from the observed burned area. For example, if the observed burned area relative anomaly is 0.7, then a modelled relative anomaly of 1.4 would result in an NME of 1 (\(abs(0.7-1.4)/abs(0.7)\)), a modelled relative anomaly of 0 would also result in an NME of 1, and a relative anomaly of 2.1 (or -0.7) would result in an NME of 2. Here, we find that SSiB4 performs best globally, and ORCHIDEE-MICT-SPITFIRE has the highest NME (Extended Data Table 1). However, the global trend is dominated by the African savannahs, where ORCHIDEE-MICT-SPITFIRE performs relatively poorly. The full list of regional NME scores is provided in Supplementary Tables S5 and S6. We find that the models tend to have low ranked-monthly NMEs in the Boreal regions, indicating that they capture the seasonality of fires in those regions relatively well. The four selected regions of interest (CAU, SES, WSB and WNA) are each modelled well (NME < 1) by multiple models, and this holds for the majority of the 43 IPCC regions. ARP, SAU, NZ, WCA and WCE are the regions with the poorest general performance. However, some of these regions undergo very little burning, and their high NME scores are partially the result of our choice to focus on the relative anomaly.
7.3 Burned Area Distribution
Globally, the models tend to have narrower and higher distributions than the observations (Fig. S9), especially JULES, although the LPJ models and SSiB4 match the observations more closely. There is high regional variation in the distribution of burned area (Fig. S8), and the global picture is dominated by Africa's Sahelian and Miomboian high-burned-area fire regions. Regionally, VISIT is an outlier in most regions (Fig. S9). Excluding VISIT (Fig. S10), there are many regions where the models simulate the observed distribution well, including our four selected regions (Extended Data Fig. 1, top row).
Figure S11 shows the quantiles of the distributions of simulated and observed burned area RA. Globally, the models simulate the observed distribution well, especially for the lower and middle ranges. At the very high end of the distribution, all fire models simulate too little burned area compared to the observations (below the 1:1 line). Regionally, this is even more evident: across most regions, the models simulate much lower burned area than observed at the top end of the distribution. This reflects the fact that fire models are designed to capture mean burned area globally, and therefore do not capture the stochastic nature of burned area and extremes outside of the normal distribution52. As a result, we focus our attribution analysis on the change in mean burned area rather than the change in large fires, which the models cannot simulate well.
In our selected regions, the burned area at the lower and middle end of the distribution is well simulated compared to observations (Extended Data Fig. 1, middle row). In SES, all the models do well across the whole distribution. In the other regions, especially CAU, models underestimate the observed burned area, with VISIT tending to underestimate burned area RA the most. CLASSIC performs well in WSB, and LPJ-GUESS-SIMFIRE-BLAZE performs well in WNA. ORCHIDEE-MICT-SPITFIRE performs very well in West North-America (WNA) compared to FireCCI (solid lines) but over-predicts burned area compared to GFED5.
We also assess the variability of the models against the observations with a power spectrum analysis (Fig. S12). Globally, the observations lie within the model range, although at its upper end. Regionally, there is generally good agreement between the models and observations. In the four selected regions, the observations mostly lie in the middle to upper end of the model spread (Extended Data Fig. 1, bottom row).