Spatio-temporal synchronization between population mobility and multi-source CO2 emissions
Population mobility and CO2 emissions were highly synchronized. Daily national multi-source CO2 emissions, which encompass seven sources (power, industry, residential consumption, ground transportation, domestic aviation, international aviation, and international shipping), correlated with total population mobility at Pearson’s r=0.89 over the study period from January 1 to February 29, 2020 (Fig. 1a). Notably, human mobility changed quickly and drastically during these two months because of the COVID-19 lockdowns. Despite these perturbations, the temporal synchronization between CO2 emissions and population mobility remained stable. Daily intercity population movement increased from an average of 53.53 million to a peak of 68.74 million during the Lunar New Year travel season (starting on January 10), then dropped by 71.30% due to travel restrictions. This decline was followed by a gradual recovery as restrictions were lifted. Correspondingly, multi-source CO2 emissions, which averaged 356.52 million kgC/d from January 1 to January 17, dropped sharply to their lowest level of 209.64 million kgC/d on February 13, a 41.20% decrease. Emissions then recovered with intermittent fluctuations. The same pattern of variation appeared in the CO2 emissions associated with domestic aviation and ground transportation (Fig. 1b, c).
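The two quantities reported in this paragraph, the relative decrease and Pearson's r, can be reproduced with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and it uses only the two emission levels quoted in the text rather than the underlying daily series.

```python
import math


def percent_drop(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Emission levels quoted in the text (million kgC/d):
# the January 1-17 average and the February 13 minimum.
print(f"{percent_drop(356.52, 209.64):.2f}%")  # 41.20%
```

Applied to the daily mobility and emission series (not reproduced here), `pearson_r` would give the temporal correlation; applied to per-city totals, the spatial one.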
Furthermore, multi-source CO2 emissions in China are spatially heterogeneous and unevenly distributed among cities (Fig. 1d). Among regions at the same latitude, cities on the east coast exhibit substantially higher CO2 emissions than their western counterparts. Among regions at the same longitude, CO2 emissions are notably more concentrated in the northern cities. A parallel trend is observed in the distribution of population mobility across cities. These observations demonstrate spatial consistency between multi-source CO2 emissions and population mobility, with Pearson’s r=0.49.
These findings indicate that changes in intercity population movements and CO2 emissions are consistent in both time and space. Furthermore, the discrepancies in CO2 emissions between cities correspond to differences in the connectivity of intercity mobility. Consequently, the large-scale population mobility data used in this study are considered a reliable predictor of CO2 emissions.
Correlations between mobility network features and CO2 emissions
We examined the connection between human mobility and CO2 emissions by analyzing network topology. To this end, we constructed temporal directed mobility networks in both unweighted and weighted forms (see Methods). From each network snapshot, we extracted node features to assess the activity of individual cities and global features to characterize the network as a whole, giving a comprehensive view of population movement over various time periods. The network features analyzed in this study are described in detail in Supplementary Table 1, where they are categorized into unweighted and weighted node and global features.
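As a minimal sketch of how one daily snapshot could be turned into node and global features, the code below builds a directed network from an origin-destination flow table and derives the number of edges together with unweighted (degree) and weighted (strength) node features. The input format, the city labels, and the choice to drop zero flows and self-loops are assumptions made for illustration; the actual construction is described in Methods.

```python
from collections import defaultdict


def snapshot_features(flows):
    """Compute simple features for one daily mobility snapshot.

    flows: dict mapping (origin, destination) -> number of travellers.
    Zero flows and self-loops are dropped (an assumption of this sketch).
    """
    edges = {(o, d): w for (o, d), w in flows.items() if w > 0 and o != d}

    in_degree, out_degree = defaultdict(int), defaultdict(int)
    in_strength, out_strength = defaultdict(float), defaultdict(float)
    for (o, d), w in edges.items():
        out_degree[o] += 1    # unweighted: counts links only
        in_degree[d] += 1
        out_strength[o] += w  # weighted: sums traveller volume
        in_strength[d] += w

    nodes = sorted({city for edge in edges for city in edge})
    node_features = {
        city: {
            "in_degree": in_degree[city],
            "out_degree": out_degree[city],
            "in_strength": in_strength[city],
            "out_strength": out_strength[city],
        }
        for city in nodes
    }
    return {"number_of_edges": len(edges), "node_features": node_features}


# Hypothetical three-city snapshot.
toy_flows = {("A", "B"): 120, ("B", "A"): 80, ("A", "C"): 40, ("C", "A"): 0}
features = snapshot_features(toy_flows)
print(features["number_of_edges"])     # 3
print(features["node_features"]["A"])
```

Centrality measures such as closeness, eigenvector, and betweenness centrality would be computed on the same edge set, typically with a graph library.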
Our analysis revealed positive correlations between node features and CO2 emissions, indicating that cities with higher transportation accessibility and more extensive connections to other cities tend to have higher CO2 emissions (Fig. 1e). Furthermore, all global features correlated strongly with CO2 emissions, with the majority of |r| values surpassing 0.80. Notably, the number of edges (NE) had a strong positive correlation with CO2 emissions, reaching r=0.86. In contrast, the Strength Assortativity coefficient (SA) had a pronounced negative correlation. Further details of the correlation analysis can be found in Extended Data Table 1, Supplementary Figs. 1 and 2, and Extended Data Fig. 2.
More surprisingly, node features derived from unweighted mobility networks correlated more strongly with CO2 emissions than their weighted counterparts (Fig. 1e). For example, the correlations for Inward Closeness Centrality (ICC), Eigenvector Centrality (EC), and Betweenness Centrality (BC) were 0.50, 0.46, and 0.38, respectively. In contrast, the correlations for Weighted Inward Closeness Centrality (WICC), Weighted Eigenvector Centrality (WEC), and Weighted Betweenness Centrality (WBC) were 0.31, 0.07, and 0.27, respectively. This suggests that network topology, even without detailed edge flow information, plays a significant role in explaining spatio-temporal variation in CO2 emissions.
Predicting CO2 emissions with mobility network features
In this section, we combined the network features selected in the preceding step with the corresponding geographic and demographic information to predict CO2 emissions. The integration yields a complete dataset with a sample size of 60 (days) × 366 (cities) = 21,960. We then employed eight distinct machine learning models and two data split strategies to predict CO2 emissions. Split Strategy 1 randomly partitioned all 21,960 samples, whereas split Strategy 2 partitioned by day, ensuring that all samples from the same day were allocated together to either the training set or the test set (Extended Data Fig. 3).
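The difference between the two split strategies can be sketched as follows. Each sample is represented only by its (day, city) index. The test fraction of 0.2, the seeds, and the function names are assumptions for illustration, since the split ratio is not stated in this section.

```python
import random


def random_split(samples, test_fraction=0.2, seed=0):
    """Strategy 1: shuffle all samples and cut once."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_by_day(samples, test_fraction=0.2, seed=0):
    """Strategy 2: hold out whole days, so a day never spans both sets."""
    rng = random.Random(seed)
    days = sorted({day for day, _city in samples})
    rng.shuffle(days)
    n_test_days = round(len(days) * test_fraction)
    test_days = set(days[:n_test_days])
    train = [s for s in samples if s[0] not in test_days]
    test = [s for s in samples if s[0] in test_days]
    return train, test


# 60 days x 366 cities, indexed by (day, city).
samples = [(day, city) for day in range(60) for city in range(366)]
print(len(samples))  # 21960

train, test = split_by_day(samples)
train_days = {day for day, _ in train}
test_days = {day for day, _ in test}
print(train_days.isdisjoint(test_days))  # True
```

Under Strategy 1 the same day usually contributes samples to both sets, so the model sees other cities' values for every test day. Strategy 2 removes that overlap and is therefore the stricter test of temporal generalization.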
Among the models tested, LGBM performed best, achieving an R2 value near 1.0 in predicting multi-source CO2 emissions under both data split strategies. The performance of the other models varied slightly with the data split strategy, with Strategy 1 generally yielding better results (Extended Data Table 2).
Fig. 2 shows the agreement between predicted and observed CO2 emissions in mainland China during the turbulent period of the Chinese Spring Festival travel rush and strict lockdown measures, indicating that LGBM accurately models the relationship between human mobility and CO2 emissions. Despite the substantial fluctuations in both human mobility and CO2 emissions throughout the study period, the predictions remained satisfactory (R2≥0.98). Under split Strategy 1, LGBM outperformed all other models, with R2=1.00, 0.99, and 0.98 for multi-source, domestic aviation, and ground transportation CO2 emissions, respectively. Under split Strategy 2, LGBM remained the best-performing model, with R2=1.00, 0.97, and 0.94 for the three types of CO2 emissions. XGB ranked second in both scenarios, with a slightly lower R2 (by 0% to 1%) and higher RMSE (by 7% to 17%) and MAE (by 2% to 18%). The clear superiority of LGBM and XGB indicates that gradient boosting frameworks for ensemble learning can predict spatio-temporal carbon emissions with high precision. The approach therefore has great potential for extrapolation to other settings, either to estimate emissions for places where observed emission data are missing or to forecast future emission trends from human activity data.
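For reference, the three evaluation metrics used here follow their standard definitions, which can be written as a short sketch. The function name and the toy values are illustrative only and are not taken from the study data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Toy example: one of four predictions is off by 1.
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
# {'r2': 0.8, 'rmse': 0.5, 'mae': 0.25}
```

R2 is dimensionless, whereas RMSE and MAE carry the units of the emissions themselves, which is why the comparison between models above reports relative changes in RMSE and MAE.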
Feature importance and robustness analysis
The importance of geographic and demographic information in predicting CO2 emissions was confirmed by SHapley Additive exPlanations (SHAP) values 29 (Methods). This analysis identified latitude (LAT), longitude (LON), and population (POP) as the three most important features (Fig. 3a, b, and Supplementary Fig. 3), which is consistent with previous studies 30,31. Notably, the substantial difference in winter temperature across latitudes in China leads to divergent heating demands. The northern region, characterized by its heavy industrial base, requires more heating than the southern region, where light industry predominates. This climatic and industrial disparity significantly influences regional CO2 emissions.
Connectivity and population flow, as represented by network features such as WDAC, WICC, OCC and IS, were also found to be crucial for predicting CO2 emissions. To further investigate the contribution of network information to the prediction, we performed an ablation experiment. A null model was built using LGBM trained solely on latitude, longitude and population. Training and prediction were repeated 200 times with independent random splits to mitigate potential biases from data splitting (Supplementary Figs. 4, 5). When network features were incorporated, prediction accuracy improved markedly over the null model. For multi-source CO2 emissions, the R2 value increased from 0.93 to 0.98. Additionally, an improvement of over 60% was observed in the other evaluation metrics, such as RMSE and MAE (Fig. 3c, Extended Data Table 3).
To further evaluate the prediction performance, we conducted an analysis in which the study period was gradually extended and the time span of the training data increased accordingly. Specifically, at each timestep, the model was trained using all historical data: to predict the outcome on the nth day, the training set consisted of the data accumulated over the preceding n-1 days. The results demonstrate that increasing the quantity of historical data led to significant improvements in all three prediction performance metrics (Fig. 3d-e). Notably, a distinct turning point was observed on the seventh day, indicating that even with just one week of historical data, the method could generate predictions with satisfactory accuracy. This resilience was evident despite substantial fluctuations in population mobility. Moreover, we conducted sensitivity analyses on the segmentation and scaling of the training and test sets, which reaffirmed the robustness of our model across diverse conditions (Extended Data Fig. 4).
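The expanding-window scheme described above, in which day n is predicted from days 1 to n-1, can be sketched as a generator of train/test day indices. The function name and the `min_train_days` parameter are hypothetical and used only for illustration; model fitting is left out.

```python
def expanding_window_splits(days, min_train_days=1):
    """Yield (train_days, test_day) pairs.

    Each test day is predicted from all days that precede it, so the
    training window grows by one day at every step.
    """
    ordered = sorted(days)
    for i in range(min_train_days, len(ordered)):
        yield ordered[:i], ordered[i]


splits = list(expanding_window_splits(range(1, 61)))
print(len(splits))  # 59: days 2..60 each serve once as the test day
print(splits[0])    # ([1], 2)
print(splits[6])    # ([1, 2, 3, 4, 5, 6, 7], 8)
```

Evaluating the metrics at each step and plotting them against the number of training days gives the kind of learning curve in which a turning point, such as the one reported on the seventh day, becomes visible.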
Generalization and extrapolation of the method to Italy, the U.S. and Mexico
To validate the generalizability and practicality of the proposed method, we used open-source mobility data from Italy 32, the U.S. 33, and Mexico 34 to predict their respective CO2 emissions (Fig. 4a-c). Despite the distinct socio-economic landscapes of these countries, our method achieved consistently high forecasting accuracy across all three. In the case of Italy, the outflows of each province were normalized on a daily basis, so weighted network features had to be excluded from the predictive model. Despite this constraint, we achieved an R2 value of 0.91. Similarly, our method predicted CO2 emissions for the U.S. and Mexico with R2 values of 0.96 and 0.99, respectively (Fig. 4d-f and Supplementary Fig. 6). These results demonstrate the generalizability and efficacy of the computational framework across diverse countries, highlighting its considerable potential for monitoring and forecasting carbon emissions in low- and middle-income countries (LMICs), where reliable emission measurements are often lacking.
Delineating CO2 emissions in city clusters through human mobility
Given the notable carbon footprint of urban areas, it is imperative to establish specialized carbon mitigation units, tailored to city clusters, for effective monitoring and reduction of carbon emissions. The close association between CO2 emissions and human mobility found above (Figs. 1 and 2) suggests human mobility as a promising objective means of delineating city clusters. Therefore, we used the Louvain algorithm 35 (Methods) to detect communities in mobility networks during the period unaffected by Chunyun and COVID-19 lockdowns (Supplementary Table 2). The study period, which coincided with the run-up to the annual Chunyun mass migration and the COVID-19 epidemic, was divided into four distinct periods: normal times (January 1-9), Chunyun (January 10-23), stringent travel restrictions (January 24 - February 10), and recovery times (February 11-29).
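The four-period division can be expressed as a small lookup, which is convenient for grouping daily emissions or network snapshots by period. The date boundaries are those given in the text; the short period labels and the function name are illustrative.

```python
from collections import Counter
from datetime import date, timedelta

# (label, first day, last day), boundaries as stated in the text.
PERIODS = [
    ("normal", date(2020, 1, 1), date(2020, 1, 9)),
    ("chunyun", date(2020, 1, 10), date(2020, 1, 23)),
    ("restrictions", date(2020, 1, 24), date(2020, 2, 10)),
    ("recovery", date(2020, 2, 11), date(2020, 2, 29)),
]


def period_of(day):
    """Return the label of the period that contains `day`."""
    for label, start, end in PERIODS:
        if start <= day <= end:
            return label
    raise ValueError(f"{day} is outside the study period")


# Count how many of the 60 study days fall into each period.
study_days = [date(2020, 1, 1) + timedelta(days=i) for i in range(60)]
print(Counter(period_of(d) for d in study_days))
# Counter({'recovery': 19, 'restrictions': 18, 'chunyun': 14, 'normal': 9})
```

The four periods cover the 60 study days without gaps or overlaps (9 + 14 + 18 + 19 = 60).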
During normal times, significant geographical disparities in CO2 emissions across city clusters were observed. Notably, the Beijing-Tianjin-Hebei (62.75 million kgC/d), the Northwest (Ordos) (60.24 million kgC/d), and the Yangtze River Delta (49.71 million kgC/d) clusters were identified as the leading emitters, primarily due to their dense populations and industrial development activities (Fig. 5a-c). In addition, our findings indicate that polycentric city clusters are associated with higher CO2 emission efficiency than city clusters with a concentric structure centered on a single urban core. For example, the Northwest (Changji) city cluster recorded emissions of 15.32 million kgC/d, significantly lower than the 60.24 million kgC/d observed in the Northwest (Ordos) cluster.
A detailed examination of CO2 emissions across the four time periods reveals how their distribution shifted. During the Chunyun period, characterized by high variability in mobility patterns, emissions were more dispersed throughout China (Fig. 5d). Stringent travel restrictions imposed during the COVID-19 pandemic substantially reduced CO2 emissions in all city clusters (Fig. 5e). In the subsequent recovery period, strengthened local interactions within the mobility network resulted in lower CO2 emissions compared to normal levels (Fig. 5f). This examination of CO2 emissions within city clusters has significant implications for designing customized CO2 emission policies. Such an approach can contribute to effective governance of CO2 emissions at an aggregated level.