Spatial Distribution and Seasonality of SCF
The motivation behind this analysis lies in the difference in SCF across satellites and reanalyses. In Fig. 1a, we show the average spatial distribution of SCF in the late snowmelt season across four data sources (MODIS from satellite, ERA5-Land, MERRA-2, and MATCHA as reanalyses, see Methods). SCF is very high in ERA-Land which can be attributed to excessive snowfall in the ECMWF snow model, while extremely low SCF in MERRA-2 can be attributed to the high snow depth specified in its land model to consider 100% SCF, leading to lower SCF 49. This disparity in SCF across different datasets alludes to diverse model representations of processes driving SCF which we try to leverage in this work. SCF in MATCHA closely resembles that of MODIS, which is a possible result of CLM-SNICAR coupling within MATCHA effectively constraining SCF.
Figure 1b shows the monthly time series distribution of spatially averaged SCF across the datasets for high and low snow cover (HSC and LSC) regions (see Methods). A seasonal cycle can be observed across the four datasets, with high variability in both regions. As in Fig. 1a, ERA5-Land shows a consistently high median and interquartile range (IQR) for both regions across all months, except in the late snowmelt period (especially July), when the median is similar across the datasets. The largest differences with respect to MODIS occur in the snow accumulation months (DJF), especially in the HSC regions. The median SCF from MERRA-2 is closest to MODIS SCF across all months, especially in LSC regions but the former has a low IQR compared to the other datasets.
A cursory glance at the half-decadal trends of maximum SCF over the six glacier regions (not shown here) shows that the maximum decrease in SCF occurs in the glaciers of the Himalayas (HIM) and Kunlun Shan (KLN) which lie in the vicinity of pollution (anthropogenic LAPs emitted from the Indo-Gangetic Plain and dust emitted from the Taklamakan Desert) 50,51.
Spatial Distribution and Seasonality of MET and AER Predictors
As in Fig. 1, we now turn our attention to the spatio-temporal distribution of selected predictors across aerosols and meteorology in Fig. 2. The selection The meteorological variables are based on prognostic variables that have been used in snow models with black carbon and dust mixing ratios at the surface being some of the most relevant aerosol quantities for snow-darkening 52–54.
In Fig. 2a, we explore the average spatial distribution of the chosen aerosol and meteorology variables during the snowmelt period across the three reanalyses. Surface BC mixing ratios are higher in CAMS-EAC4 than in MERRA-2 and MATCHA despite emission input in MATCHA utilizing Copernicus Atmosphere Monitoring Service (CAMS) emissions (CAMS-GLOB-ANT v4.2) 55. High values of surface BC are primarily concentrated in the vicinity of the glacier regions pointing to pollution sources from nearby Asian countries. For surface dust mixing ratios, MERRA-2 shows higher values compared to CAMS-EAC4 and MATCHA. High values are mostly concentrated in northern HMA due to its proximity to desert regions. Skin temperature is consistent across datasets, with low values in ERA5 and MATCHA over the Tibetan Plateau. The wind speed at 10 m is higher in MERRA-2 and MATCHA compared to ERA5, especially near southern HMA. Specific humidity at 2 m shows a similar distribution across datasets with higher values in MERRA-2 and MATCHA concentrated at the geographical extremities of HMA.
In Fig. 2b, we show the monthly time series distribution of these predictors across datasets for HSC and LSC regions separately. Both MET and AER variables show a pronounced seasonal cycle in both regions. Surface BC is lowest in the late snowmelt season (May-July) with a high median from CAMS-EAC4 compared to the other datasets. Surface dust is high during May-July, with a higher median and IQR from MERRA-2. SKT is consistent across all three datasets for both regions, with a strong seasonal cycle that anti-correlates to that of SCF in Fig. 1. Wind speed at 10 m however shows a weak correlation with SCF, the effect of which can be buffered by dominant variables like temperature and moisture that drive SCF 25. Restating our Darwinian approach to the interaction problem, we are considering the importance of the associations between our predictors and SCF rather than their individual relationships, which can often be misleading due to buffering by the dominant drivers such as temperature or moisture.
Quantifying Importance of AMI to SCF
A brief overview of the importance approach is provided below and is extensively covered in the Methods section. We consider 22 different predictors spanning aerosols (AER), meteorology (MET), and elevation (ELEV) from the three reanalyses and regress them on SCF using 1) a statistical multi-linear regression and 2) a machine learning (ML) algorithm called eXtreme Gradient Boosting (XGBoost) (see Supplementary Methods). Interactions between AER, MET, and ELEV are defined as second-order terms in each algorithm. We focus specifically on the interacting terms between aerosols and meteorology, defined as aerosol-meteorology interactions (AMI) for our work (see Eq. 1 in Methods). Our analysis is based on the importance estimates from the regression algorithms, which denote the sensitivity of the 22 predictors and their higher-order terms to the target variable (SCF). This sensitivity is quantified by two metrics, relative importance (RI) from the multi-linear regression and Shapely contribution (SHAPc) from the ML model. We also use two model constructs on this regression framework to distinguish between the importance of AMI from an observational (Obs-Model) and reanalysis (Mod-Mod) point-of-view. The key point to note is that the target variable (SCF) in the regression is used from two sources, 1) satellite data from MODIS, and 2) each reanalysis model.
In Fig. 3, we show the RI and SHAPc distributions of AMI and MMI only in the Obs-Model (3b) and Model-Model construct (3c) for LSC regions in the late snowmelt season. RI and SHAPc importances for MMI are higher across all datasets with an average contribution of 50–70% (both RI and SHAPc) to SCF variability. AMI shows a consistent magnitude across all datasets in the Obs-Model construct with an average RI of 10–20% and SHAPc of 20–40%, indicating a significant contribution to SCF variability. In the Model-Model construct, RI and SHAPc for AMI are lower by an average of 10% than the Obs-Model construct. The variability of AMI across both constructs and datasets is higher than the difference in the average AMI importance (for both RI and SHAPc). A non-parametric Mann-Whitney test of the AMI distributions shows a significant difference (95% level) for both constructs across the three datasets. AMI is thus significant for both constructs, and the decrease in AMI in the Model-Model construct suggests second or higher-order interaction terms reflecting missing or unresolved processes that drive SCF. The large variability in the spatiotemporal distribution of SCF (Fig. 1) in the late snowmelt season, combined with the difference in AMI importance across both constructs suggests the disparity of AMI-related processes that drive SCF in each reanalysis dataset. SHAPc values for AMI are higher in both constructs compared to RI, which can be due to the ability of XGBoost to capture the non-linear interactions to a fuller extent, compared to the MLR model, where the interactions are restrictive in its definition (only non-square product terms).
Process Drivers of AMI
We further decompose the AMI importance in both the constructs (Fig. 3a and 3d) by meteorology (with five sub-groups of variables) and chemistry (with four sub-groups of variables). Among AER variables, carbonaceous aerosols contribute significantly to AMI (average 18%), surpassing dust (average 11%) in both the constructs and metrics. Within MET variables, circulation-related variables contribute the highest (average 13%) followed by cloud cover (average 10%).
The prevalence of carbonaceous aerosols can be attributed to increased surface BC and total aerosol optical depth (AOD) in the vicinity of the LSC regions during the pre-monsoon season (April-May). This includes wheat crop residue burning in the northern part of the Indian subcontinent leading to potential interactions with large-scale atmospheric circulation in the subsequent months (late snowmelt season) 56–59. Multi-model intercomparison of global aerosols has also reported that carbonaceous aerosols contribute an average of 70% to aerosol-induced absorption, a key process in AMI 60. A decrease in importance can be seen across the subgroups and datasets in AMI from the Obs-Model to the Model-Model construct for both metrics, which can be attributed to the absence of sufficient processes driving SCF in the model representation of the three datasets.
The use of the two metrics RI and SHAPc highlights two aspects of these interactions. SHAPc captures the bulk non-linear effect of AMI and is more consistent across datasets while RI seems to capture more of the specific or local second-order representations of AMI especially those seen in the Model-Model construct.
Emergent Connections within AMI
In Fig. 4, we show the importance of individual interactions within AMI using network diagrams in both model constructs 61. The five larger nodes (circles) represent the AER predictors while the small nodes represent the MET predictors. The connections (edges) between them are weighted (edge widths) by the pairwise importance metric (RI and SHAPc) and represent the interactions between AER and MET. In the Obs-Model construct (Fig. 4a), we observe a higher number of strong interactions (moderate - very high) whereas the Model-Model construct exhibits fewer but more specific strong interactions. A notable strong interaction consistently observed across the reanalyses in the Obs-Model construct involves total AOD and surface dust with meteorology variables related to circulation (viz. geopotential height at 300 and 500 hPa, mean sea level pressure), temperature (skin temperature), and radiation (surface sensible heat flux) corresponding to ACrI, ATI, and ARI (see Methods). Interactions are sparser in the Model-Model construct (Fig. 4b), indicating specific AMI processes captured by RI and SHAPc within each reanalysis model. Interactions with temperature (ATI) are predominant in SHAPc across the three datasets, while RI reveals interactions spread across cloud cover (ACCI), radiation (ARI), and precipitation variables (AMoI). SHAPc captures the overall (bulk) sensitivity of AMI to SCF, whereas RI focuses on the sensitivity of specific (local) second-order interactions within AMI.
For ERA5/CAMS-EAC4, RI shows strong interactions of total AOD and carbonaceous aerosols with low cloud cover (ACCI) and surface sensible heat flux (ARI). In contrast, for MERRA-2 and MATCHA, we observe interactions of carbonaceous and dust aerosols with precipitation (AMoI). SHAPc on the other hand, highlights a strong interaction of near-surface temperature with aerosol variables (ATI) in the Model-Model construct consistently across the three reanalyses.
The network diagrams reflect the complexity of each reanalysis model. The progression of importance, particularly SHAPc from ERA5/CAMS-EAC4 to MATCHA in both constructs signifies the degree of coupling incorporated in the three datasets, considering the absence of coupling between aerosols and meteorology in ERA5, and its presence in MERRA-2 and MATCHA. The ability to visualize the strength of each interaction within AMI through the network diagrams points toward relevant processes that drive SCF in the study period, which is otherwise difficult to interpret from the seasonality of the selected predictors in Fig. 2. Variables such as surface dust and wind speed at 10 m (U10 and V10) show significant differences in the late snowmelt season (from Fig. 2) indicating an apparent weak correlation with SCF which we discussed can be obscured by the buffering of dominant patterns by other predictors. An example of this can be seen in Fig. 4, where the low to moderate interactions related to wind variables at 10 m (U10 and V10) in the Obs-Model construct are not apparent, suggesting the misattribution of AMI processes related to near-surface wind in the Model-Model construct.
Observable AMI
Considering the significance of AMI in regulating SCF in the late snowmelt season for LSC regions in HMA, it would be useful to interpret what these interactions represent through relationships between the predictors and SCF. Although having multiple predictors from satellite observations would be ideal for exploring AMI to its fullest extent, we consider four such variables from satellite observations, MAIAC AOD, MODIS LST, and IMERG PRECIP, and visualize their relationship with MODIS SCF. Exploring the relationship between predictors in the Obs-Obs construct will provide a basis of ground truth for relative comparison with findings using the same predictors in the other two constructs.
In Fig. 5, we show the relationship between MODIS SCF and the predictors MODIS LST and IMERG PRECIP weighted by MAIAC AOD for LSC regions during May-July. We observe an exponential decay of MODIS SCF with MODIS LST above 0oC, dominated by high values of MAIAC AOD, especially at higher LST and lower SCF. Such behavior points to the radiative effects of absorbing aerosols causing warmer temperatures (high LST) and accelerated snowmelt (low SCF) 62. To assess the sensitivity of SCF with LST at different AOD values, we use a mutual information-based metric to quantify non-linear sensitivity (shown by the bars in Fig. 5) 63. The strongest sensitivity between MODIS LST and SCF occurs for low to moderate values of AOD (values within 0.07–0.17). Compared to satellite observations, MATCHA shows a similar relationship between SCF and LST compared to the other two datasets (for both Obs-Model and Model-Model constructs). Given that MATCHA is the only framework among the three datasets with coupling between snow, radiation, and LAPs, this similarity confirms the ability of parameterizations in MATCHA to represent AMI better than datasets from MERRA-2 or ERA5/CAMS-EAC4. In the Model-Model construct, MERRA-2 shows the strongest relationship between SCF and LST at moderate to very high values of AOD (0.17–5.7) suggesting an overestimated aerosol loading in LSC regions during the late snowmelt season that might contribute to the lower SCF values seen in Fig. 1 (for MERRA-2). ERA5/CAMS-EAC4 on the other hand, has stronger SCF-LST sensitivities for values below high AOD (< 0.3), which can allude to a lack of coupling within ERA5 based on aerosol radiative feedbacks in the model that translates to an absence of strong SCF-LST dependencies to the AOD distribution in CAMS-EAC4.
We also see an exponential decay between MODIS SCF and IMERG PRECIP in the Obs-Obs construct, with strong SCF-PRECIP dependency at low to moderate values of AOD (0.7–0.17). This is a potential representation of wet scavenging of absorbing aerosols by precipitation that minimizes the radiative effect of absorbing aerosols in accelerating snowmelt (thus higher SCF in the low-moderate AOD range, compared to when AOD is very high, where SCF is minimal) 64. As seen for the SCF-LST relationship, MATCHA exhibits the most similarity in the SCF-PRECIP relationship compared to the other two datasets, with strong sensitivity of SCF to PRECIP in the low-moderate AOD range, indicating the degree of coupling in MATCHA compared to MERRA-2 and ERA5/CAMS-EAC4. MERRA-2 reflects the underestimation of SCF as seen in Fig. 1 and is dominated mostly by moderate to very high values of AOD (0.08–6.75) compared to the other datasets, suggesting overestimated AOD in LSC regions during May-July. High SCF in ERA5/CAMS is dominated by low to moderate AOD (0.04–0.08) which can indicate lower-than-usual aerosol loading in the study region during the late snowmelt period.
We also visualized the relationship between SCF and total cloud cover fraction from MODIS through both constructs, as shown in Fig. 5. We did not find a clear separation across the different segments of the AOD distribution, as in LST and PRECIP. This indicates the complexity of ACCI (and ACI) which cannot be decomposed through a direct relationship between SCF and cloud cover 25. The definition of cloud cover (fraction) across the three reanalyses and MODIS is also different (see Supplementary Table 2) which makes it difficult to perform an equivalent comparison across the datasets.