4.1 Evaluation of precipitation totals
Figure 3a shows the average number of rainy days, defined as days with rainfall ≥ 0.5 mm day−1. Figure 3b shows the annual averages of observed (rain gauge network) and estimated rainfall, with the GSMaP-Reanalysis product covering 2000 to 2014 and the GSMaP-Operational product covering 2014 to 2019. The number of rainy days observed at the rain gauges was frequently smaller than that estimated by GSMaP throughout the period. However, except for 2007, the annual averages of GSMaP-estimated and observed values showed similar trends. Moreover, the observed data showed greater variability than the GSMaP-estimated data, which was expected because rain gauges provide point measurements, whereas GSMaP integrates rainfall over a pixel area (~100 km²). For the annual rainfall averages (Figure 3b), GSMaP accumulated values were lower than the observed ones over the entire period evaluated, and the observed data again showed greater variability than the GSMaP estimates.
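The rainy-day count behind Figure 3a can be sketched as follows. This is a minimal illustration, assuming daily series in mm day−1 with missing days coded as `None`; the function name and the one-week sample values are hypothetical and are not taken from the study data.

```python
RAIN_DAY_THRESHOLD_MM = 0.5  # rainy-day threshold quoted for Figure 3a (mm/day)


def count_rainy_days(daily_mm, threshold=RAIN_DAY_THRESHOLD_MM):
    """Count days with rainfall >= threshold, skipping missing values (None)."""
    return sum(1 for value in daily_mm if value is not None and value >= threshold)


# Hypothetical one-week series for a gauge and its co-located GSMaP pixel (mm/day).
gauge_week = [0.0, 12.4, 0.0, 0.0, 3.1, 0.0, 0.2]
gsmap_week = [0.6, 8.9, 0.7, 0.0, 2.5, 0.5, 0.3]

gauge_rainy = count_rainy_days(gauge_week)  # days at or above 0.5 mm at the gauge
gsmap_rainy = count_rainy_days(gsmap_week)  # days at or above 0.5 mm in the pixel
```

In this made-up sample the pixel registers more rainy days than the gauge while accumulating a smaller weekly total, which is the qualitative pattern described for Figures 3a and 3b.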
The number of rainy days estimated by the GPM can be influenced by its algorithms and measurements, which were developed to achieve greater accuracy in instantaneous rainfall estimates, particularly for light rainfall (<0.5 mm h−1) (Hou et al., 2014). When evaluating GSMaP over a Cerrado area in Brazil, Salles et al. (2019) pointed out that the difference in spatial scale between area (GSMaP grid) and point (rain gauge) measurements influences conclusions, depending on rain gauge location and distribution. Therefore, rainfall events identified by satellites may not be detected by rain gauges, since it may rain elsewhere within the GSMaP grid cell.
Regarding annual rainfall values, large-scale rainfall is commonly underestimated by GSMaP. Deng et al. (2018) analyzed the performance of GSMaP over the Hanjiang River basin in China and found that it tends to overestimate small-scale rainfall and underestimate large-scale rainfall. Once the data are aggregated into annual totals, the annual rainfall averages for a river basin can therefore be underestimated relative to observations.
Figure 4 shows the monthly rainfall averages between 2000 and 2019 estimated by the GSMaP products and observed in the local rain gauge network. These results show the capacity of GSMaP to reproduce the seasonal variation between dry and rainy periods in the region. The dry season runs from May to September and the rainy season from October to April. As with the annual totals, monthly rainfall is underestimated by GSMaP relative to the observations in the region, mainly between November and April.
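The seasonal split used throughout this section (dry: May to September; rainy: October to April) can be written down explicitly. The sketch below is only illustrative: the helper names and the monthly climatology values are hypothetical and do not reproduce Figure 4.

```python
DRY_MONTHS = {5, 6, 7, 8, 9}               # May to September
RAINY_MONTHS = {10, 11, 12, 1, 2, 3, 4}    # October to April


def season_of(month):
    """Return 'dry' or 'rainy' for a calendar month number (1-12)."""
    if month in DRY_MONTHS:
        return "dry"
    if month in RAINY_MONTHS:
        return "rainy"
    raise ValueError(f"invalid month: {month}")


def seasonal_totals(monthly_mm):
    """Sum a {month: rainfall in mm} mapping into dry- and rainy-season totals."""
    totals = {"dry": 0.0, "rainy": 0.0}
    for month, value in monthly_mm.items():
        totals[season_of(month)] += value
    return totals


# Hypothetical mean monthly rainfall (mm), for illustration only.
climatology = {1: 250.0, 2: 200.0, 3: 220.0, 4: 100.0, 5: 30.0, 6: 5.0,
               7: 5.0, 8: 10.0, 9: 40.0, 10: 120.0, 11: 210.0, 12: 260.0}
totals = seasonal_totals(climatology)
```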
4.2 Quantitative statistics
A comparison of statistical indices (CC, MAE, RMSE, and PBias) between GSMaP and rain gauge data revealed differences in satellite data performance between the dry and rainy seasons (Figure 5). During the dry season, CC values (Figure 5a) had greater variability. Although some CC values can be considered high, this was not the general pattern, since 50% of the values were below 0.4 for the dry seasons from 2000 to 2019. In the rainy season, the variability of correlations between GSMaP-estimated and observed data was smaller; yet, 75% of the CC values for this period were below 0.5. This difference was expected: in dry periods, rain is substantially less frequent, more irregular, and smaller in amount (often close to zero) than in rainy periods, so the variability between rainfall observation (rain gauge) and detection (satellite) is greater. Within the rainy season in the Cerrado environment, rainfall is marked by high variability in the dry-rainy and rainy-dry transition months (Nimer, 1989). Furthermore, the point observations from the rain gauge network had greater variability than the GSMaP-estimated data. Such variability increases errors when comparing ground and satellite data (Darand & Siavashi, 2021). The Upper Tocantins River basin lies in Brazil's Midwest region, and its rainfall regime is governed by atmospheric mechanisms that foster a regional and homogeneous climatic pattern. However, the local terrain, with altitudes between 325 and 1606 m (Figure 1), can create some heterogeneity in rainfall distribution over the basin area. In almost the entire Midwest region, more than 70% of the total annual rainfall occurs from November to March. Conversely, the months from June to September are extremely dry, with an average of 4 to 5 days of rainfall per month (Nimer, 1989).
Another factor affecting the correlation between ground-observed and GSMaP-estimated rainfall is the low density of rain gauge coverage, despite the uninterrupted gauge records between 2000 and 2019. In this regard, Darand & Siavashi (2021) applied GSMaP over Iran to compare regions with different rain gauge densities. These authors found correlations between 0.1 and 0.7 for regions with low rain gauge densities, and between 0.4 and 0.9 (more often above 0.6) for regions with higher densities. Moreover, Hrachowitz & Weiler (2011) found that sparser rain gauge networks tend to show different deviations, depending on the local rainfall regime. The larger and more uniform the rainfall events within a basin, the smaller the deviations observed among rain gauges. On the other hand, small summer storms or localized rain events are registered unevenly by rain gauge networks: some events may not be observed at all, or may be detected by only a single gauge.
Given the variability in the rainfall formation process, studies using different algorithms and sensors have shown that topographic variations and convective rains have a significant influence on satellite rainfall estimation. This can therefore result in unexpected errors between embedded pixel values and gauge point values (Ma et al. 2016; Wang et al. 2019). Our rainfall analysis between 2000 and 2019 may also have been influenced by local atmospheric irregularities, which can cause rainfall to have distinct totals each year and far-from-normal values (Nimer, 1989).
The MAE and RMSE values (Figures 5b, 5c) show a large difference in data performance between the dry and rainy seasons. In the dry season, rainfall events and volumes, both observed (rain gauge) and estimated (GSMaP), were far lower than in the rainy season. This can lead to misinterpretation of GSMaP performance in the dry season, since low absolute errors (MAE, RMSE) may simply reflect low rainfall amounts. Darand & Siavashi (2021) found RMSE values from 5 to 15 mm day−1 between March and December of 2018 in Iran. These authors highlighted the use of the GSMaP-Gauge product, which is calibrated with rain gauge data, rather than the GSMaP-NRT (Near Real-Time) and GSMaP-MVK (research product) products, to obtain more accurate estimates.
PBias (Figure 5d) had greater variability and amplitude for comparisons in the dry season. In the rainy season, PBias values were mostly negative, which denotes rainfall underestimation by GSMaP. Ma et al. (2016) evaluated the TRMM 3B42V7 and GPM IMERG rainfall data for the Tibet region and observed BIAS rates between -20 and 80% and between -30 and 70%, respectively, during the rainy seasons between 2014 and 2017. In the Upper Tocantins River basin, between 2000 and 2019, average PBias values for GSMaP data ranged between -80% and 103%, and between -68% and 80% for the dry and rainy periods, respectively.
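For reference, the four continuous indices discussed in this subsection can be computed as in the sketch below. It assumes the usual textbook definitions (Pearson correlation for CC, and PBias as 100 times the summed error divided by the summed observations, so that negative values denote underestimation, as in the text); the exact formulas used in the study are given in its methods and may differ in detail. The function name and sample values are hypothetical.

```python
import math


def continuous_metrics(observed, estimated):
    """Return CC, MAE, RMSE and PBias (%) for paired rainfall series.

    Assumes non-constant series and a non-zero observed total; otherwise
    CC or PBias is undefined and a ZeroDivisionError is raised.
    """
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    errors = [e - o for o, e in zip(observed, estimated)]
    return {
        "CC": cov / math.sqrt(var_obs * var_est),
        "MAE": sum(abs(d) for d in errors) / n,
        "RMSE": math.sqrt(sum(d * d for d in errors) / n),
        "PBias": 100.0 * sum(errors) / sum(observed),
    }


# Hypothetical gauge and satellite values (mm/day): the estimate is half the observation.
metrics = continuous_metrics([10.0, 20.0, 30.0, 40.0], [5.0, 10.0, 15.0, 20.0])
```

The sample illustrates why CC alone is insufficient: the two series are perfectly correlated, yet PBias is -50%.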
4.3 Qualitative analysis
Figure 6 shows the set of categorical metrics used to qualitatively evaluate the performance of GSMaP products. In the dry season, POD values (Figure 6a) varied widely, with 50% below 0.8 and a minimum of 0.3. In the rainy season, POD values were concentrated above 0.7, showing a greater probability of success in detecting rain between October and April. This higher detection probability in the rainy season was accompanied by lower false alarm rates (FAR) (Figure 6b). Conversely, the lower POD in the dry period (May to September) was accompanied by higher FAR values. Notably, FAR is negatively oriented: the lower the value, the fewer the false alarms. Therefore, during the rainy season, the sensor could properly detect the rainfall events observed in the rain gauge network (Wilks, 2006). Lastly, FOH values (Figure 6c) differed significantly on average between the two periods; the frequency of accurate estimates was higher in the rainy season, corroborating the FAR results.
The HSS index (Figure 6d) compares the predictive skill of a system with that of random forecasts, that is, forecasts statistically independent of the reference observations (rain gauges). A perfect prediction has an HSS of 1, while a random forecast scores 0 (Tuan et al. 2019). As with the other indices, HSS variability was greater in the dry period, but its average was slightly higher than in the rainy season. In short, only one observation in the rainy season was deemed random based on HSS.
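The categorical indices above derive from a 2x2 contingency table of rain/no-rain days. The sketch below assumes the standard forecast-verification definitions (POD, FAR, FOH = 1 - FAR, and the Heidke Skill Score) and a 0.5 mm day−1 rain/no-rain threshold; both the threshold choice for detection and the function names are assumptions for illustration, and the sample series is hypothetical.

```python
def contingency_counts(observed, estimated, threshold=0.5):
    """Count hits, misses, false alarms and correct negatives for paired daily series."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, est in zip(observed, estimated):
        obs_rain = obs >= threshold
        est_rain = est >= threshold
        if obs_rain and est_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def categorical_metrics(hits, misses, false_alarms, correct_negatives):
    """Return POD, FAR, FOH and HSS; assumes all denominators are non-zero."""
    hss_num = 2.0 * (hits * correct_negatives - false_alarms * misses)
    hss_den = ((hits + misses) * (misses + correct_negatives)
               + (hits + false_alarms) * (false_alarms + correct_negatives))
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "FOH": hits / (hits + false_alarms),
        "HSS": hss_num / hss_den,
    }


# Hypothetical eight-day gauge and satellite series (mm/day).
gauge = [0.0, 5.0, 0.0, 2.0, 0.0, 0.0, 8.0, 0.0]
gsmap = [0.0, 3.0, 1.0, 0.0, 0.0, 0.0, 6.0, 0.0]
counts = contingency_counts(gauge, gsmap)
scores = categorical_metrics(*counts)
```

Because FOH and FAR share the same denominator and sum to 1 under these definitions, a higher FOH necessarily goes with a lower FAR.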
The high variability of categorical indices in both seasons can be attributed to the seasonal effect of rainfall in the region under study. Salles et al. (2019) evaluated IMERG-v5 and GSMaP-v7 rainfall estimates for the Distrito Federal in Brazil, which is near the Upper Tocantins (Figure 1), and observed that both products had the best performances during the rainy season and the worst in the dry one. Moreover, median performances were obtained in intermediate periods between the dry and rainy seasons.
4.4 Performance analysis of monthly and daily GSMaP estimates
Figure 7 displays the accuracy of the GSMaP Reanalysis and GSMaP Operational products on daily and monthly scales. Compared with rain gauge data, rainfall tended to be underestimated on both temporal scales and in both periods analyzed. PBias values were -53.3% and -52% on the daily scale for the rainy and dry seasons, respectively, and -39.7% and -28.3% on the monthly scale. The fit was poorer on the daily scale than on the monthly scale: CC for daily data was 0.42 in the rainy season and 0.35 in the dry season. On a monthly scale, CC values were similar between both products (0.86 for the rainy season and 0.85 for the dry season). For other regions of the globe, comparative studies with other sensors and GPM mission data have also often found a better fit between satellite and rain gauge data on a monthly scale. Wang et al. (2019) assessed GPM IMERG and TRMM 3B42V7 products and observed CC of 0.85 and 0.92 on a monthly scale, and 0.50 and 0.41 on a daily scale, respectively.
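One plausible reason for the better monthly fit is that day-to-day timing mismatches partly cancel when daily values are summed. The sketch below shows a simple daily-to-monthly aggregation; the function name, dates, and values are hypothetical and serve only to illustrate the effect.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(dates, daily_mm):
    """Sum daily rainfall (mm) into {(year, month): total} in chronological order."""
    totals = defaultdict(float)
    for day, value in zip(dates, daily_mm):
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))


# Hypothetical days spanning a month boundary.
days = [date(2001, 1, 30), date(2001, 1, 31), date(2001, 2, 1), date(2001, 2, 2)]
gauge_daily = [0.0, 12.0, 6.0, 0.0]   # rain recorded on 31 Jan and 1 Feb
gsmap_daily = [10.0, 0.0, 0.0, 5.0]   # same events detected one day off

gauge_monthly = monthly_totals(days, gauge_daily)
gsmap_monthly = monthly_totals(days, gsmap_daily)
```

In this example the daily series never agree on which day it rained, yet the monthly totals follow each other closely (12 versus 10 mm and 6 versus 5 mm), with the satellite still lower in both months.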
In terms of bias, previous studies have reported a negative bias (underestimation) for GSMaP products in Bolivia and in the Distrito Federal, Brazil, of -22.4% and -10%, respectively (Satgé et al. 2017; Salles et al. 2019a). To circumvent this problem, a bias correction procedure has been recommended before using a GSMaP product. Boluwade et al. (2017) showed one example of a bias correction method for satellite rainfall estimates.
Underestimation or overestimation of rainfall by satellites is troublesome in hydrological applications. For instance, errors in streamflow or surface runoff estimates can affect flood predictions or aquifer recharge estimates (Tian et al. 2010). GSMaP products, like other satellite rainfall data, require prior comparison with data from rain gauges distributed over the land surface to identify site-specific errors in the satellite estimates (Salles et al. 2019b). Sungmin et al. (2017) evaluated the GPM products IMERG Early, Late, and Final over Austria and pointed to the need for reference data to assess bias, not only in rainfall volume but also at the temporal scale of acquisition. Such an approach is recommended to achieve the required accuracy in satellite-estimated rainfall data. Another issue with GSMaP products is the difference in sampling rates between the infrared (IR) and passive microwave (PMW) input sources used by the system. Chen et al. (2019) found a significantly different pattern in sampling frequency for the GPM-GSMaP input sources. In short, IR data are used more often than PMW sensor data.
Figure 8 shows the probability distributions by volume (PDFv) of daily rainfall for the rain gauge and GSMaP data. PDFv is a widely used metric for satellite rainfall products and measures the rainfall accumulated within each rainfall intensity class (Kirstetter et al. 2012; Tang et al. 2016; Xu et al. 2017). Both products tested here tended to overestimate rainfall in the drizzle (0.1~1 mm day−1) and light (1~10 mm day−1) categories. According to Wang et al. (2019), satellite overestimation of low rainfall volumes may be related to evaporation, since small raindrops may evaporate before reaching the ground. For rainfall volumes above 10 mm day−1, the opposite behavior was observed, with rainfall being underestimated by both GSMaP products. Similar behavior was observed by Wang et al. (2019), who used GPM IMERG and TRMM 3B42V7 data and noted an overestimating trend for drizzle (0~1 mm day−1). For heavy (25~50 mm day−1) and torrential (> 50 mm day−1) volumes, both GSMaP products showed a low probability of detecting events compared with the rain gauges in the Upper Tocantins River basin. This might be related to the distribution of isolated rains within the total pixel area.
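A volume-based distribution of this kind can be sketched as the share of total rainfall contributed by each intensity class. The class limits for drizzle, light, heavy, and torrential follow the text; the "moderate" class (10 to 25 mm day−1) is an assumption added here to fill the gap between light and heavy, and the function name and sample values are hypothetical.

```python
# (name, lower bound inclusive, upper bound exclusive), in mm/day.
# The "moderate" class is an assumption; the other limits are quoted in the text.
CLASS_EDGES = [
    ("drizzle", 0.1, 1.0),
    ("light", 1.0, 10.0),
    ("moderate", 10.0, 25.0),
    ("heavy", 25.0, 50.0),
    ("torrential", 50.0, float("inf")),
]


def pdf_by_volume(daily_mm):
    """Return the fraction of total rainfall volume falling in each intensity class."""
    volumes = {name: 0.0 for name, _, _ in CLASS_EDGES}
    for value in daily_mm:
        for name, lower, upper in CLASS_EDGES:
            if lower <= value < upper:
                volumes[name] += value
                break  # values below 0.1 mm/day match no class and are ignored
    total = sum(volumes.values())
    if total == 0.0:
        return volumes
    return {name: volume / total for name, volume in volumes.items()}


# Hypothetical daily rainfall sample (mm/day).
sample = [0.0, 0.5, 0.5, 4.0, 5.0, 20.0, 30.0, 40.0]
pdfv = pdf_by_volume(sample)
```

Computing the same fractions separately for the gauge and GSMaP series and comparing them class by class gives the kind of contrast described for Figure 8.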
4.5 Bias correction for GSMaP rainfall data in the Upper Tocantins River basin
The previous analysis demonstrated that the GSMaP products underestimated rainfall in the Upper Tocantins River basin from 2000 to 2019. Therefore, a bias correction procedure was required. We applied a multiplicative factor to correct the bias in the GSMaP data and improve the statistical indices comparing GSMaP and rain gauge data (Tables 4 and 5). Bias correction improved KGE, NSE, CC, RMSE, and MAE of GSMaP data in the dry and rainy seasons. Based on PBias values, bias correction practically eliminated the underestimation present before its application (Tables 4 and 5). On the daily scale (Table 4), the correlation (CC) between GSMaP and rain gauge data increased in the dry period, along with substantial increases in KGE and NSE. Both RMSE and MAE were also reduced in the dry period. In the rainy season, on the other hand, the PBias correction and the KGE and NSE improvements stood out, while RMSE and MAE did not change substantially on the daily scale. In Turkey, Saber & Yilmaz (2018) applied a multiplicative correction factor to GSMaP data and obtained R² from 0.81 to 0.98, RMSE from 6.97 to 1.21, NSE from 0.57 to 0.99, and PBias from -56.44 to 0.20. These authors assessed these statistical indices first on an hourly scale and then on a daily scale, for different rainfall volume thresholds (from 1 to 10 mm day−1). Condom et al. (2011) also used a multiplicative correction model on TRMM data in Peru and obtained greater RMSE reductions than with additive bias correction models. On the monthly scale (Table 5), bias correction improved almost all indices. Therefore, the method was efficient for GSMaP data in the Upper Tocantins River basin during both the dry and rainy seasons from 2000 to 2019.
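A multiplicative correction of this kind can be sketched as follows. The text does not state how the factor was derived, so the sketch assumes one factor per group (for example, per calendar month), computed as the ratio of observed to estimated totals; the grouping, function names, and sample values are all hypothetical. A single global factor would rescale the series without changing CC, which is one reason the sketch uses per-group factors.

```python
from collections import defaultdict


def multiplicative_factors(groups, observed, estimated):
    """Return {group: sum(observed) / sum(estimated)} for each group label."""
    obs_sum = defaultdict(float)
    est_sum = defaultdict(float)
    for group, obs, est in zip(groups, observed, estimated):
        obs_sum[group] += obs
        est_sum[group] += est
    # Groups with a zero estimated total keep a neutral factor of 1.0.
    return {g: (obs_sum[g] / est_sum[g] if est_sum[g] > 0 else 1.0) for g in obs_sum}


def apply_correction(groups, estimated, factors):
    """Scale each estimate by the factor of its group."""
    return [est * factors[group] for group, est in zip(groups, estimated)]


# Hypothetical monthly totals (mm) for two calendar months over two years.
months = [1, 1, 2, 2]
gauge = [100.0, 140.0, 30.0, 30.0]
gsmap = [60.0, 60.0, 20.0, 20.0]

factors = multiplicative_factors(months, gauge, gsmap)
corrected = apply_correction(months, gsmap, factors)
```

By construction, the corrected totals match the observed totals within each group on the calibration sample, so PBias computed on that sample is zero; this is consistent with the near-zero PBias values after correction.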
Table 4
Statistical analysis comparing original and corrected daily GSMaP data for the dry (May – September) and rainy (October – April) seasons, 2000-2019.
Statistical measure | Before correction, dry | Before correction, wet | After correction, dry | After correction, wet |
KGE | 0.15 | 0.11 | 0.74 | 0.34 |
NSE | 0.09 | 0.03 | 0.55 | 0.14 |
CC | 0.36 | 0.34 | 0.76 | 0.36 |
RMSE (mm day−1) | 17.70 | 13.49 | 4.80 | 14.54 |
MAE (mm day−1) | 9.90 | 7.32 | 1.01 | 8.22 |
PBias (%) | -36.8 | -38.6 | 0.9 | 0.1 |
Table 5
Statistical analysis comparing original and corrected monthly GSMaP data for the dry (May – September) and rainy (October – April) seasons, 2000-2019.
Statistical measure | Before correction, dry | Before correction, wet | After correction, dry | After correction, wet |
KGE | 0.43 | 0.33 | 0.82 | 0.82 |
NSE | 0.42 | 0.20 | 0.72 | 0.72 |
CC | 0.76 | 0.77 | 0.85 | 0.85 |
RMSE (mm month−1) | 87.77 | 117.65 | 61.13 | 69.85 |
MAE (mm month−1) | 58.94 | 83.42 | 42.64 | 48.99 |
PBias (%) | -28.1 | -39.5 | 0.0 | 0.1 |
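For readers less familiar with the two efficiency scores in Tables 4 and 5, the sketch below computes NSE and KGE under common definitions: NSE as one minus the ratio of squared errors to the variance of the observations, and KGE in the form of Gupta et al. (2009), combining correlation, variability ratio, and mean ratio. Which KGE variant was used in the study is not stated in this section, so this choice is an assumption; the function names and sample values are hypothetical.

```python
import math


def nse(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; values <= 0 are no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def kge(observed, estimated):
    """Kling-Gupta efficiency (2009 form, assumed here): 1 is a perfect fit."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    sd_est = math.sqrt(sum((e - mean_est) ** 2 for e in estimated) / n)
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated)) / n
    r = cov / (sd_obs * sd_est)   # linear correlation
    alpha = sd_est / sd_obs       # variability ratio
    beta = mean_est / mean_obs    # mean (bias) ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical observed, raw and bias-corrected (raw x 2) values.
observed = [10.0, 20.0, 30.0, 40.0]
raw = [4.0, 12.0, 14.0, 20.0]
corrected = [2.0 * value for value in raw]

nse_raw, nse_corrected = nse(observed, raw), nse(observed, corrected)
kge_raw, kge_corrected = kge(observed, raw), kge(observed, corrected)
```

In this sample the scaling leaves the correlation unchanged but brings the mean and variability ratios close to 1, so both NSE and KGE rise sharply after correction.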
Figure 9 shows scatterplots of monthly rainfall data obtained from the GSMaP products, before and after bias correction, for the dry and rainy periods between 2000 and 2019. Compared with direct use (without bias correction), the bias-corrected GSMaP data showed a general improvement in correlation with rain gauge data.