Correlation analysis
Correlation analysis was first carried out to identify the most prominent climatic features to pass to the model. Figure 1 shows the bio-climatic variables initially considered for the correlation analysis.
Only correlation values below 0.6 were permitted between the selected variables.
The heat map in Fig. 2 shows the correlation between each pair of variables. Variable selection is important, and somewhat complex, because multi-collinearity must be avoided. There is no universally pre-defined correlation value for selecting variables, so we chose variables based on past studies and on their importance for vector sustainability [11]. We also ran the SDM with different feature sets to maximize the AUC before finalizing the variables.
From Fig. 2, the least correlated variables are Bio1, Bio2, Bio14, Bio19, and Köppen zones. Here, Bio1 and Bio2 represent the mean annual temperature (℃) and the mean diurnal temperature range (℃), respectively, and Bio14 and Bio19 represent the precipitation of the year’s driest month (kg m−2) and the precipitation of the coldest quarter of the year (kg m−2), respectively.
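The threshold-based screening described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the greedy ordering, the use of the absolute Pearson coefficient, the function names, and the variable names `var_a`, `var_b`, `var_c` are all assumptions, and the numbers are synthetic placeholders rather than study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def select_uncorrelated(columns, priority, max_abs_r=0.6):
    """Greedily keep variables whose |r| with every kept variable is below max_abs_r.

    columns:  dict mapping variable name -> values sampled at the same locations
    priority: names ordered from most to least preferred (hypothetical ordering,
              e.g. informed by past studies)
    """
    kept = []
    for name in priority:
        if all(abs(pearson_r(columns[name], columns[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept

# Synthetic placeholder values, not data from the study.
columns = {
    "var_a": [20, 22, 24, 26, 28, 30],
    "var_b": [30, 32, 35, 36, 38, 40],  # nearly a shifted copy of var_a
    "var_c": [3, 0, 4, 1, 0, 2],        # weakly related to var_a
}
selected = select_uncorrelated(columns, priority=["var_a", "var_b", "var_c"])
print(selected)
```

Here `var_b` is dropped because it is almost perfectly correlated with `var_a`, while `var_c` is retained. A greedy pass like this depends on the priority order, which is one reason expert judgement about variable importance still matters.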
Statistics of Bio-climatic Variables
The bio-climatic variables vary significantly over the domain. To understand their microscale spatial variability (that is, how the highs and lows of their magnitudes change between locations), we plotted them in a quantile pattern (Fig. 3).
Because the ranges of the variables are narrow in some places and some extreme outliers are present in the domain, the plots are restricted to the 10th to 80th percentile range.
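One way to restrict a plot to a percentile range is to clip the values (or the color-scale limits) to the chosen percentiles. The sketch below assumes that this is what is meant here; the helper names and the sample values are hypothetical, and the percentile uses linear interpolation between sorted values.

```python
def percentile(values, q):
    """Percentile q (0..100) using linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def clip_to_percentiles(values, low_q=10, high_q=80):
    """Clip values to the [low_q, high_q] percentile range and return the limits."""
    lo = percentile(values, low_q)
    hi = percentile(values, high_q)
    return [min(max(v, lo), hi) for v in values], (lo, hi)

# Synthetic placeholder values with one extreme outlier (100).
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
clipped, limits = clip_to_percentiles(values)
print(limits)
print(clipped)
```

With limits computed this way, a single extreme value no longer stretches the color scale, so the variation among the remaining locations stays visible.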
From Fig. 3a, Bio-1 (mean annual temperature, ℃) ranges from 23 to 27 ℃, with high visible variability across India. In contrast, Bio-2 (mean diurnal temperature range, ℃; Fig. 3b) ranges from 0 to 10 ℃, with variability seen on the east coast of India, whereas the west coast and northwest have high magnitudes.
From Fig. 3c, Bio-14 (precipitation of the year’s driest month, kg m−2) ranges from 0 to 4.5 kg m−2. There is high visible variability across the west coast and northwest India, and the east coast has high magnitudes. In contrast, Bio-19 (precipitation of the coldest quarter of the year, kg m−2; Fig. 3d) ranges from 10 to 100 kg m−2, with variability seen on the east coast and across the Indo-Gangetic Plain (IGP); the southern and northern tips of India have high magnitudes. These variables are used in the final model, and the data correspond to the baseline period (1981–2010). Because the plots above only describe the variability of the variables, which have a narrow range throughout the domain, the following plots show the mean of each variable over the different states (Fig. 4).
From Fig. 4a and b, the mean annual temperature is about 25 ℃ in most states; only Jammu and Kashmir, Sikkim, and Uttarakhand have lower values. The mean diurnal temperature range is low in the same states, and Andaman, Daman and Diu, and Puducherry also exhibit lower values. From Fig. 4c and d, the driest month’s precipitation is very low in states such as Goa, Gujarat, Daman and Diu, and Rajasthan, and the precipitation of the coldest quarter is very high in Kerala, Goa, and Puducherry. These variations signify the importance of the selected features for model performance.
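State-level means of this kind amount to averaging grid-cell values grouped by state. The sketch below is a minimal illustration under that assumption; the `(state, value)` input layout, the function name, and the labels `state_A` and `state_B` are hypothetical, and the values are synthetic.

```python
from collections import defaultdict

def mean_by_state(cells):
    """Average grid-cell values per state.

    cells: iterable of (state_label, value) pairs, one per grid cell.
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [running sum, cell count]
    for state, value in cells:
        totals[state][0] += value
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}

# Synthetic placeholder cells, not data from the study.
cells = [
    ("state_A", 24.0), ("state_A", 26.0),
    ("state_B", 10.0), ("state_B", 14.0), ("state_B", 12.0),
]
means = mean_by_state(cells)
print(means)
```

An unweighted mean treats every grid cell equally; on a latitude-longitude grid, area weighting may be preferable for states that span a wide range of latitudes.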
Accuracy estimation
Accuracy was estimated by measuring the AUC (area under the ROC curve). For this purpose, the data were divided into training and testing sets, and the AUC was computed for each.
AUC, or area under the curve, is the probability that the model ranks a randomly selected positive example higher than a randomly selected negative example.
The AUC value ranges from 0 to 1. A model with entirely incorrect predictions has an AUC of 0.0, whereas a model with completely accurate predictions has an AUC of 1.0.
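The ranking interpretation of AUC can be checked directly by comparing every positive example with every negative example. The sketch below is illustrative only: the function name is hypothetical, ties are counted as half a win (a common convention), and the labels and scores are synthetic.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for presence (positive), 0 for absence (negative)
    scores: model outputs; only their ordering matters
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

# Synthetic placeholder labels and scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_pairwise(labels, scores)

# Any order-preserving rescaling of the scores leaves the AUC unchanged.
rescaled = [100 * s - 3 for s in scores]
auc_rescaled = auc_pairwise(labels, rescaled)
print(auc, auc_rescaled)
```

Three of the four positive-negative pairs are ordered correctly here, so the AUC is 0.75 both before and after rescaling. The double loop is quadratic in the number of examples; for large datasets a rank-based computation is the usual choice.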
AUC is advantageous for two main reasons:
Scale-Invariance: AUC is not influenced by the scale of the predictions; it assesses how well predictions are ordered rather than their absolute values.
Classification-Threshold-Invariance: AUC gauges the quality of the model’s predictions regardless of the chosen classification threshold, making it a robust metric for model evaluation.
Table 1 shows the training and testing AUC of the aegypti and albopictus models.
Table 1
AUC of Training and Testing of aegypti and albopictus models
| | aegypti | albopictus |
|---|---|---|
| Training | 0.8081 | 0.8252 |
| Testing | 0.7658 | 0.8056 |
Vector Distribution
Figure 5 shows the baseline aegypti vector distribution together with the three aforementioned climatic scenarios. In Fig. 5a, box B1 marks the northeastern area, which has the highest probability of occurrence in the historical data, followed by B2 at the southern tip of India. B3, the central part of India, has the least risk, along with Jammu and Kashmir. Under the SSP126 scenario (Fig. 5b), suitability increases by 2040 in several parts of India, most visibly in B4. Under the SSP370 scenario, a further increase is visible in some parts of India, such as B5 and B6. Under SSP585, the fully fossil-fuelled emission scenario with high global temperatures, environmental suitability for the vector becomes high; this is visible in areas such as B7, B8, and B9.
In contrast to the aegypti vector, the albopictus vector distribution (Fig. 6) is mild everywhere except in the northeastern region, and it shows a substantial increase along India’s coastlines. However, due to very cold temperatures, both aegypti and albopictus are absent in the Jammu and Kashmir region.
From Fig. 7a and b, a few states exhibit very high dengue risk for both aegypti and albopictus. States such as Andaman, Lakshadweep, Puducherry, and Daman and Diu have high baseline (SSP 2.6) risk. The colors blue (SSP 126), orange (SSP 370), and red (SSP 585) represent the corresponding climatic scenarios. Manipur, Nagaland, and Lakshadweep show increasing risk as the climatic scenario changes from SSP 126 to SSP 370 and SSP 585. The risk increases sharply from SSP 126 to SSP 370, but only minimally from SSP 370 to SSP 585.
Most of these high-risk regions, such as Andaman, Daman and Diu, and Lakshadweep, are union territories with vast greenery and optimal temperatures, which results in high dengue risk. Among the Indian mainland states, West Bengal, Uttar Pradesh, Punjab, Haryana, Delhi, Gujarat, Kerala, Karnataka, and Tamil Nadu are the high-burden, most dengue-vulnerable states according to [20]. In our models, all of these states also have a dengue occurrence probability > 0.5 in either the aegypti or the albopictus SDM.
From Fig. 8a and b, in the case of aegypti, the model risk is high when Bio-1 and Bio-19 are high (the risk changes from low to high as the values change from low to high), and the inverse is true for Bio-2 and Bio-14. In the case of albopictus, the model risk is high when Bio-1 and Bio-19 are high (change from high to low when values change from low to high), and the inverse is true for Bio-2 and Bio-14.