Large disagreements in estimates of urban land across scales and their implications

doi:10.21203/rs.3.rs-3958909/v1

This work is licensed under a CC BY 4.0 License

Version 1

Improvements in high-resolution satellite remote sensing and advances in computing have sped up the development of global datasets that delineate urban land, which are crucial for understanding climate risks in our increasingly urbanizing world. Here, we analyze urban land cover patterns across spatiotemporal scales from several such current-generation products. While all the datasets show a rapidly urbanizing world, with global urban land nearly tripling between 1985 and 2015, they disagree substantially in their estimates of urban land area, owing to differences in scale, urban definitions, and methodologies. We discuss the implications of these discrepancies for several use cases, including monitoring urban climate risks and modeling urbanization-induced impacts on weather and climate from regional to global scales. Our results demonstrate the importance of choosing fit-for-purpose datasets for examining specific aspects of historical, present, and future urbanization, with implications for sustainable development, resource allocation, and the quantification of climate impacts.

Earth and environmental sciences/Climate sciences/Climate change/Projection and prediction

Earth and environmental sciences/Climate sciences/Climate change/Climate-change impacts

Urbanization, the global shift from rural to urban societies, leads to the replacement of natural land with roads, buildings, pavement, parks, etc., along with added anthropogenic activities, which together impact local energy, water, and carbon budgets^1,2. Currently, over half of the global population lives in urban areas, a share that is expected to increase to around 68% by 2050³. These urbanization estimates are based on population thresholds, and there is no standard threshold across countries⁴. Moreover, because population densification patterns differ across regions of the world, these population-based definitions do not necessarily correspond to the physical extent of urbanized land, which is what primarily modulates local to regional climates^2,5.

The proliferation of global satellite imagery and remote sensing techniques has enabled estimates of urbanization from a physical perspective, using spatially continuous observations of the spectral reflectance and emissions from the Earth’s surface⁶. Both physical and population-based estimates of urban land have a wide range of applications, from quantifying risks to urban populations^7–9, to providing boundary constraints for isolating urban climate impacts^8,10,11, to serving as surface inputs for weather and climate models across scales^12–15. In the last decade in particular, there have been multiple estimates of urban land, or of some proxy for urbanization, across space and time¹⁶. These developments have paralleled the rise of cloud computing capabilities and of satellite missions that measure at finer spatial scales. There are currently at least four ~ 10 m resolution global land use/land cover products that include urban classes, as well as several urban-specific datasets that span multiple decades^16,17.

Due to differences in data sources, methods, and even definitions, there have traditionally been large discrepancies in estimates of urban land from datasets across scales¹⁸. Previous studies that explored these discrepancies focused on earlier-generation datasets that were generally coarser (~ 1 km), in line with the resolutions of the commonly deployed Earth-observing satellites of the time, and that did not include enough observations to provide a time series of urban expansion^18,19. More recent comparisons of higher-resolution datasets are limited because they are either restricted to regional extents²⁰ or focus on comparing product accuracies rather than area estimates or product typologies²¹. Here we provide a comprehensive comparison of almost all the medium- to fine-resolution (100 to 10 m) global datasets currently available (see Table S1), showing that the definition of urban remains a critical issue across these datasets, particularly in the newer products. We also examine the consequences of the choice of dataset for a few common use cases relevant to urban climate change and its human impacts.

Country level urban land and its variability

Present-day urban land (for the year 2019 or 2020; see Methods) varies widely across countries (Fig. 1a) based on eight global datasets, with Vatican City and Singapore showing the highest values (eight-product mean urban percentages of 79.87% and 53.7%, respectively). Ignoring uninhabited territories, the low end includes several overseas island territories, such as the Cocos, Midway, and Pitcairn Islands, where these satellite-derived products detect negligible urban land. Overall, China is the country with the most urban land (264403 sq. km, covering 2.82% of its total area based on the eight-product mean), followed by the United States (183735 sq. km; 1.94%), India (85760 sq. km; 2.77%), and Russia (59311 sq. km; 0.35%) (Fig. 1b). The present-day global urban land percentage ranges from 0.52% in the World Settlement Footprint (WSF) 2019 dataset²² to 2.07%, about four times higher, in the Esri Land Cover product (for the year 2020; 1.93% for 2019)²³. The eight-product mean global urban percentage is ~ 0.95%.

Since countries have different baseline levels of urbanization (Fig. 1a), we use the coefficient of variation, or Normalized Root-Mean-Square Deviation (the standard deviation across products divided by their mean, expressed as a percentage), to standardize the degree of disagreement between these eight data products (Fig. 1c). Larger disagreements are seen for Greenland, countries in East Africa (Ethiopia, Kenya, Uganda, and Tanzania), Russia, countries in South Asia (Afghanistan, Pakistan, India, and Myanmar), Paraguay, etc. Better agreement between datasets is seen for Brazil, Argentina, Japan, most countries in Western Europe, parts of Central and South Africa, Canada, etc.
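As an illustration of this metric, the sketch below computes the coefficient of variation across products for a single region. It is a minimal sketch rather than the authors' code: the per-product percentages in the usage example are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which was used.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation across products divided by their mean, in percent.

    Assumption: the population standard deviation (statistics.pstdev) is used;
    swap in statistics.stdev for the sample standard deviation.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.pstdev(values) / mean


# Hypothetical urban percentages for one country from eight products
urban_pct = [1.2, 0.8, 2.5, 1.9, 0.6, 1.1, 1.4, 0.9]
print(f"CV = {coefficient_of_variation(urban_pct):.1f}%")
```

Because the spread is divided by the mean, countries with very different baseline urban percentages can be compared on the same relative scale.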

Regional- to city-scale differences across datasets

We also compare present-day estimates of urban land for four distinct regions of the world: the Great Lakes and Mid Atlantic regions in North America, and the Indo-Gangetic and Yangtze River Basins in Asia (Fig. 2a). While all four regions are heavily urbanized, the first two have had stable urbanization levels over the last few decades, whereas the latter two have seen significant urban growth. For present-day urban percentage, higher variability is seen for the Indo-Gangetic and Yangtze River Basins (coefficients of variation of 88.1% and 83.3%, respectively) than for the Great Lakes and Mid Atlantic regions (56.5% and 62.8%, respectively). The differences between these products are evident at all spatial scales, from global to local. For example, large local-scale differences are evident for the highly urbanized Shanghai Metropolitan Area in China (Fig. 2b) and over Delhi, India (Fig. S1). Since it is not possible to illustrate these differences for all regions of the world here, we developed a web app for this purpose (https://ee-tc25.projects.earthengine.app/view/urbancomparison).

The disagreements among the data products reflect differences in methods, inputs, and native resolutions^24,25. However, a key difficulty in making any apples-to-apples comparison between these datasets is that, while all the products represent some aspect of physical urbanization, they define ‘urban’ differently. These specific definitions are already baked into the training data (for supervised learning methods), accuracy estimates, and pre- and post-processing steps. For example, among the four ~ 10 m resolution present-day estimates of urban land – WSF 2019²², Esri Land Cover²⁶, European Space Agency (ESA) WorldCover²⁷, and Dynamic World²⁸ – the Dynamic World dataset calls this class ‘Built Area’ and includes urban vegetation and green space in that definition, while ESA WorldCover calls the class ‘Built-up’ and explicitly excludes urban vegetation from its class definition (Table S1). Interestingly, while the Esri Land Cover dataset does not mention the inclusion of urban trees, it generally shows much higher urban land percentages across scales than ESA WorldCover (Figs. 1b, 2a, 3), even though both are based on Sentinel-2 data. Some of the differences between these three datasets are related to methodology: the Dynamic World and Esri Land Cover datasets use convolutional neural networks, whose convolution kernels bring contextual information from neighboring pixels into the classification, while ESA WorldCover uses a random forest approach in which each pixel is classified independently¹⁷. Another difference is the choice of minimum mapping unit, which is 50 × 50 m for Dynamic World²⁸; as a result, areas labelled ‘Built Area’ necessarily contain a mosaic of built and natural surfaces. Finally, the WSF 2019 dataset²² maps human settlements and excludes roads.
For readability, and in line with how some of these products have been used in the scientific literature as a proxy for physical urbanization^29–31, we refer to all of them as ‘urban’ in the present study.

Urban growth over time

The explosion of medium-resolution global urban products, and of global land cover datasets in general, has largely been made possible by the free release of the Landsat archive in 2008³². Consequently, there are several long-term estimates of urban land at Landsat resolution (30 m) starting from the 1980s. In contrast, the first-generation global urban land cover products were generally limited to the 250 to 500 m resolution of the Moderate Resolution Imaging Spectroradiometer (MODIS) and started in 2001¹⁹. Some of these multi-year urban land cover products do not extend to 2019/2020 and thus were not included in the earlier comparison of present-day urbanization. In total, we examine twelve global data products, including the complete time series (when multiple years are available) of the eight products considered earlier. The four additional datasets are the Global Artificial Impervious Area (GAIA)³³, World Settlement Evolution (WSE)²², the Copernicus Global Land Service (CGLS) product³⁴, and the Global Urban Footprint (GUF)³⁵. All long-term urban datasets show large global urban growth during their respective time spans (Fig. 3a). For GISA (Global Impervious Surface Area)³⁶, GAIA, and WSE, the three datasets with the longest time series, the global urban percentage increased by 297.4%, 123.4%, and 111.2%, respectively (three-product mean of 177.3%), over the common 1985-2015 period. This pattern of rapid urban expansion is consistently echoed across all continents (Fig. 3; Fig. S2a), with Africa, Asia, and South America showing notable increases (three-product means of 226.2%, 425.3%, and 186.5%, respectively), although from a lower baseline than North America and Europe.
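The arithmetic behind the three-product mean, and its link to the "nearly tripling" statement in the abstract, can be written out as follows, where U_{i,t} (notation introduced here for illustration) denotes the global urban percentage from product i in year t:

```latex
\begin{aligned}
\Delta_i &= 100\,\frac{U_{i,2015} - U_{i,1985}}{U_{i,1985}}, \qquad i \in \{\mathrm{GISA}, \mathrm{GAIA}, \mathrm{WSE}\},\\
\bar{\Delta} &= \frac{1}{3}\sum_i \Delta_i = \frac{297.4 + 123.4 + 111.2}{3} \approx 177.3\%,\\
\frac{1}{3}\sum_i \frac{U_{i,2015}}{U_{i,1985}} &= 1 + \frac{\bar{\Delta}}{100} \approx 2.77.
\end{aligned}
```

Because the mean growth factor is a linear function of the mean percentage change, a mean increase of 177.3% corresponds to global urban land becoming about 2.77 times larger on average across the three products, i.e., nearly tripling.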

The impact of urban definition and methodology is also reflected in the variability of the change in urban land percentage over time across datasets. For instance, the percentage of urban land in the WSF 2019 dataset is much lower than in WSF 2015³⁷ for all continents. There are two reasons for this: 1) WSF 2019 uses Sentinel-2 instead of Landsat 8, the former being much finer (10 m versus 30 m), which introduces a scale effect²⁵; and 2) WSF 2019 uses ancillary data to mask out roads so as to focus only on pixels where people live (Table S1). Another evident difference in the time series arises when comparing the MODIS data³⁸ with the others. The global percentage of urban land increases by only 5.5% between 2001 and 2015 according to MODIS Land Cover, whereas the GISA/GAIA/WSE mean change for the same period is around 40.2%, almost an order of magnitude higher. The low estimate of urban expansion in MODIS is a consequence of its definition of urban as a minimum of 30% impervious surface at the 500 m scale³⁹. Conceptually, this means that a MODIS pixel starts being classified as urban at a lower impervious percentage than in other datasets, which generally assign the dominant land cover (which can exceed 30% of the area) as the class of a pixel or use higher impervious thresholds (50% for GAIA³³, for instance), and that a pixel remains urban over a much larger range of values (from 30–100%). As a result, densification within a pixel that is already classified as urban (e.g., impervious cover rising from 40% to 90%) does not register as urban growth in MODIS. Other key specifics of these differences between the global data products are provided in Table S1.
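The effect of a low, coarse-scale threshold on apparent urban growth can be illustrated with a stylized sketch. It assumes, for simplicity, that a coarse pixel is classified as entirely urban whenever its impervious fraction is at least 30%, which abstracts away the actual supervised classification behind MODIS Land Cover; the impervious fractions and pixel area are hypothetical.

```python
def threshold_urban_area(fractions, pixel_area, threshold=0.3):
    """Area mapped as urban when a whole pixel counts once its impervious fraction >= threshold."""
    return sum(pixel_area for f in fractions if f >= threshold)


def impervious_area(fractions, pixel_area):
    """Impervious area summed within pixels, which a fine-resolution map approximates."""
    return sum(f * pixel_area for f in fractions)


def pct_change(before, after):
    return 100.0 * (after - before) / before


# Hypothetical impervious fractions of three 500 m pixels (0.25 km^2 each)
pixel_area_km2 = 0.25
fractions_2001 = [0.35, 0.60, 0.10]
fractions_2015 = [0.90, 0.95, 0.25]  # strong infill, but the third pixel stays below 30%

coarse = pct_change(threshold_urban_area(fractions_2001, pixel_area_km2),
                    threshold_urban_area(fractions_2015, pixel_area_km2))
fine = pct_change(impervious_area(fractions_2001, pixel_area_km2),
                  impervious_area(fractions_2015, pixel_area_km2))
print(f"threshold-based growth: {coarse:.0f}%, impervious-area growth: {fine:.0f}%")
```

In this toy case the threshold-based map reports no growth at all, while the impervious area doubles, because all of the change happens either inside pixels that were already urban or inside a pixel that never crosses the threshold.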

Implications for observational and modeling applications

Global estimates of urban land have become critical for both science and applications. However, most use cases of these datasets do not simultaneously consider multiple estimates due to a combination of legacy, convenience, and potential redundancies. Here we examine how the choice of dataset may lead to biases for some common use cases. These use cases are divided into: 1) direct incorporation of these products to generate derived datasets, 2) combining global urban datasets with estimates of hazards to quantify environmental risks, and 3) using these products as surface constraints in process-based models.

First, global urban land cover datasets are used as inputs for other derived products. For instance, the MODIS land surface temperature (LST) products most commonly used in urban climate studies (MOD11 and MYD11⁴⁰) use a classification-based emissivity method⁴¹, in which each pixel's emissivity is taken from a look-up table according to the pixel's class in the MODIS Land Cover product. This is necessary because LST and surface emissivity cannot be analytically separated using thermal observations alone⁴². Similarly, the MODIS evapotranspiration products mask out any pixels that are classified as urban in the MODIS Land Cover data, since the empirical model used is not calibrated for urban surfaces⁴³. There are also some inter-dependencies between different global urban land cover datasets. The CGLS product uses WSF 2015 as training data for its ‘Urban / Built up’ class (Table S1)³⁴, while ‘Urban Areas’ in the ESA CCI product are identified based on the GUF dataset³⁵ as well as the Global Human Settlement Layer (GHSL)⁴⁴ datasets (Fig. S3). Other composite urban datasets, such as the global annual urban dynamics (GAUD) dataset³⁰, have also been generated by combining various existing estimates (GUF, GAIA, GHSL, etc.).

Second, various urban land cover datasets are used as inputs for examining urban climate impacts and city-level environmental risks, and the choice of dataset influences the magnitude of these estimates. For the surface urban heat island (SUHI) intensity, which measures the impact of urbanization on local surface warming¹¹ (Fig. 4a; see Methods), larger absolute values of the coefficient of variation are seen for urban clusters in the Middle East, parts of India, southern and eastern Africa, and the southwestern United States. Although most datasets capture the well-established impacts of background climate on the SUHI and its seasonality⁴⁵, the choice of dataset can have larger impacts in arid regions during summer and in polar climates during winter, i.e., when the actual SUHI signal is small, with inconsistencies even in the sign of the SUHI (Fig. S5). Long-term changes in urban land are often combined with ancillary datasets to examine land use transitions³¹ and exposure to environmental hazards over time^29,11,8,9. The estimated rates of change depend on the choice of dataset (Fig. 3), yet most studies use a single product. For instance, ESA CCI Land Cover, GHSL, MODIS Land Cover, and WSE have each been used individually in these types of studies^29,11,8,9. Andreadis et al. (2022)²⁹ and Rentschler et al. (2023)⁸ both examined increased urbanization in flood-prone areas using two different urban land cover datasets (GAUD and WSE, respectively) and therefore found different magnitudes of change. Here we replicate a comparison of urban growth in floodplains⁴⁶ between 1985 and 2015 based on GAIA, WSE, and GISA (Fig. 4b), finding particularly large differences for Asia and Oceania. Sometimes the choice of dataset can lead to artifacts due to a mismatch between two products. For instance, Mentaschi et al. (2022)⁹ combined the GHSL 2018 dataset with the MYD11 LST product to estimate intra-urban SUHI extremes.
However, as noted earlier, this LST product is constrained by MODIS Land Cover through the classification-based emissivity method⁴¹. We would therefore expect LST artifacts for the subset of pixels that GHSL classifies as urban but for which the LST product assigns the emissivity of a rural surface. Similar artifacts would be expected for other combinations of MODIS LST with non-MODIS urban land cover estimates^47,48, or when MODIS LST is used to validate simulations from models that use different urban emissivity constraints^42,45,49.
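One way to gauge the potential for such artifacts is to count, within a region, the pixels that a non-MODIS product labels urban but MODIS Land Cover does not, since those pixels would receive a non-urban emissivity under the classification-based method. The sketch below assumes both masks have already been resampled onto a common grid (in practice a non-trivial step) and uses hypothetical masks; it is an illustration, not a procedure from the cited studies.

```python
def emissivity_mismatch_fraction(other_urban, modis_urban):
    """Fraction of pixels flagged urban by another product (e.g., GHSL) that are
    non-urban in MODIS Land Cover, i.e., likely assigned a rural emissivity.

    Both inputs are assumed to be co-registered boolean masks of equal length.
    """
    if len(other_urban) != len(modis_urban):
        raise ValueError("masks must be co-registered and of equal length")
    n_other = sum(1 for o in other_urban if o)
    if n_other == 0:
        return 0.0
    mismatched = sum(1 for o, m in zip(other_urban, modis_urban) if o and not m)
    return mismatched / n_other


# Hypothetical co-registered masks for a small city
ghsl_urban = [True, True, True, False, True, True]
modis_urban = [True, False, False, False, True, True]
print(f"{emissivity_mismatch_fraction(ghsl_urban, modis_urban):.0%} of GHSL urban pixels")
```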

Third, urban land cover products are incorporated into process-based models, including weather and climate models, as surface input datasets^2,14,23,50. Since different land cover types in land models use distinct prescribed radiative, thermodynamic, and aerodynamic properties, the land cover data used strongly modulate crucial variables such as the components of the surface energy budget^5,45,49, and thus the lower boundary conditions for the atmosphere in coupled model simulations. One of the most common mesoscale models used for urban climate research, the Weather Research and Forecasting (WRF) model^12,51,52, uses MODIS urban land cover as its default surface dataset. Newer versions of the urban components of this model can also use the local climate zone (LCZ) classification system⁵³, and a recent global 100 m LCZ dataset is planned to become the default urban representation in future releases of WRF¹⁵. Earth system models (ESMs) rarely resolve urban areas, but one of the few ESMs with an urban representation, the Community Earth System Model (CESM)^54–56, uses a circa 2001 estimate of urbanization¹⁴. This urban dataset is also used in other ESMs that have branched off from CESM^57,58 and has been incorporated into regional models⁵⁹. Large differences among these three products (MODIS Land Cover, Demuzere et al. (2022)¹⁵, and Jackson et al. (2010)¹⁴), as well as relative to other present-day estimates of urban land, are evident (Figs. 5a, S4). Note that, unlike MODIS Land Cover, the other two are not pure estimates of physical urbanization. Jackson et al. (2010)¹⁴ actually uses population-based thresholds of urban density, while several of the LCZ classes represent different mixes of built and natural surfaces. For example, LCZ9, the sparsely built class, is characterized by a high abundance of natural land cover, behaves thermally like natural land cover, and is often excluded as a built-up class⁶⁰.
As such, there are potential mismatches here that should be kept in mind. For example, since CESM does not use the low-density urban class in its urban model⁶¹, it implicitly assumes that the medium-density, high-density, and tall building district classes appropriately represent physical urbanization at the grid scale; this leads to massive overestimation for regions of the world with high population density and low physical urbanization, such as Asia and Africa (Fig. 5b). Similarly, since urban pixels in MODIS Land Cover can be as low as 30% impervious, while urban models have not traditionally accounted for urban vegetation⁵⁰, a model using this dataset may overestimate urban thermodynamic, radiative, and aerodynamic impacts on climate^2,49 in urban-to-rural transition zones. As newer urban land cover datasets are incorporated into these models, it is important to ensure consistency in definitions. Using Dynamic World, which includes some vegetation in the urban class, is not appropriate if the model treats the entire urban grid as an impervious surface, as in CESM, its offshoots^57,58, and most versions of WRF. Similarly, since WSF 2019 removes urban roads, incorporating this dataset into a process-based model would capture only a fraction of the physical impact of urbanization on weather and climate.

In light of the development of multiple new, finer-resolution global urban datasets in the last decade, our goal here was to examine whether these state-of-the-art products provide better constraints on our understanding of urbanization across scales, particularly for their applications in modeling and observational studies. We find large disagreements between the global urban data products across spatiotemporal scales. In fact, the largest divergence between datasets is seen for the most recent years, at global, continental, and country scales (Figs. 3, S6), when the new 10 m resolution products are included. At this resolution, it is possible to partially resolve urban vegetation, settlements, and roads, so the different urban definitions produce larger variations, with the Esri Land Cover and WSF 2019 datasets showing the highest and lowest urban land percentages, respectively (Fig. 1b). This variability in urban estimates across scales underscores the challenge of achieving a standardized measure of urban land, even with globally available satellite observations. Furthermore, we discuss the implications of these differences for several use cases (Figs. 4, 5), which show that users need to be aware of the specifics of each dataset and any potential dependencies to ensure application-appropriate analyses.

The differences in present-day and historical estimates of urban land also influence future urban projections. Several products have recently been developed to represent future urbanization scenarios^62–65 that can be used directly or incorporated into weather and climate models^12,51,66. The large differences between these future estimates arise from methodology (different growth models), input data (the choice of historical urbanization estimate for model calibration), and the assumed scenario of urbanization (Fig. 6). For instance, Gao & O'Neill (2020)⁶³ consider distinct urbanization patterns across 375 sub-regions, while Chen et al. (2020)⁶² use only 32 regions. Although both datasets are trained using GHSL, the Chen et al. (2020)⁶² data are further calibrated against the ESA CCI estimate for 2015. The Li et al. (2021)⁶⁴ and He et al. (2023)⁶⁵ datasets are trained using annual nighttime light observations and the GAIA data, respectively. Structural differences between models are commonplace in the Earth sciences, which has encouraged the use of ensemble estimates to lend robustness to projections and to quantify uncertainty. Surface datasets, such as those for urban land, are an additional free parameter that can be largely decoupled from implementations of model physics. Estimates of uncertainty arising from differences in land use projections are much rarer⁶⁷ and do not currently resolve urbanization at finer scales. Based on the differences seen here, we recommend using multiple datasets, when possible and when their definitions of ‘urban’ align, to provide more robust estimates of uncertainty for urban-resolving climate projections and to better quantify risks for rapidly urbanizing populations as we prepare for a warmer and more urban world.

Datasets

We consider multiple global urban land cover datasets developed over the last couple of decades, both to examine differences between them across spatiotemporal scales and to discuss the impacts of these differences on a few use cases. Our focus is primarily on datasets with resolutions of 100 m or finer, the majority of which are derived from Landsat or Sentinel-2 satellite observations. We also consider the Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover and European Space Agency Climate Change Initiative (ESA CCI) Land Cover datasets, which are at ~ 500 m and ~ 300 m, respectively. We include the former because it is one of the few physical estimates of global urban land that has been continuously updated since Potere et al. (2009)¹⁹, and the latter because it is one of the few land cover products based on the Medium Resolution Imaging Spectrometer (MERIS) and has been used for multiple applications²³. We do not consider any regional land cover datasets or any land cover datasets released after 2021, which is why we do not include the latest version of the Global Human Settlement Layer (GHSL). However, we provide results for GHSL 2018 in the supplementary information (Fig. S3) to discuss potential mismatches for urban applications (see Results section). For reference, the global urban percentage in GHSL 2018 (0.51%) is lower than the corresponding MODIS estimate (0.59%). Among datasets with different time spans, we use those with data for 2019 and/or 2020 as present-day estimates. This choice maximizes the number of datasets that can be compared, since several global 10 m land cover datasets were released in 2020 and some datasets end in 2019.
Combining the two years should have minimal impact on differences between products, since a single year would not lead to major urban changes, and because 2020 was also a year of multiple COVID-19 lockdowns that significantly halted infrastructure development projects. The earliest year considered for multi-year datasets is 1985. It should be noted that new global land cover datasets are being developed at a rapid rate. We did not consider some datasets because they have not yet been publicly released⁶⁸, and others because they are essentially combinations of other datasets over similar time periods³⁰. Overall, we aimed for our selection of datasets to represent the primary modes of variability in resolution, methodology, urban definitions, and time spans. Table S1 provides an overview of all these datasets, including the urban definition used and other notes relevant to this study.

In addition to the satellite-derived estimates of urban land, we consider two global datasets that are used in regional and global urban modeling. The first is the recent 100 m global urban local climate zone (LCZ) map by Demuzere et al. (2022)¹⁵, which will be the default urban representation for future releases of the Weather Research and Forecasting (WRF) model, the most commonly used mesoscale model for urban climate studies^12,51,52. The second is the 1 km estimate of urban densities from Jackson et al. (2010)¹⁴, used in global models such as the Community Earth System Model (CESM)⁶¹, the Energy Exascale Earth System Model (E3SM)⁵⁷, and the Euro-Mediterranean Center on Climate Change coupled climate model (CMCC-CM2)⁵⁸, as well as in regional models such as RegCM (Regional Climate Model)⁵⁹. The former is valid for the year 2018 and the latter for 2001. While the Demuzere et al. (2022)¹⁵ dataset maps 17 LCZs, we only consider the 10 LCZs that are directly relevant to the built environment (Figs. 5a, S2b, S4).

Finally, we consider four recent projections of future urbanization under various Shared Socioeconomic Pathways (SSPs), which are the socioeconomic counterparts to future emission scenarios⁶⁹. The resolutions of these datasets range from 1/8th degree (with fractional urban land) in Gao & O'Neill (2020)⁶³ to 1 km in Chen et al. (2020)⁶², Li et al. (2021)⁶⁴, and He et al. (2023)⁶⁵. These resolutions would be considered coarse in the current remote sensing literature but fine in the climate modeling domain. While He et al. (2023)⁶⁵ includes more scenarios than the other datasets, only the five common ones (SSP1, SSP2, SSP3, SSP4, and SSP5) are used for the comparison (Fig. 6). SSP1 represents the ‘sustainability’ scenario, SSP2 corresponds to the ‘middle-of-the-road’ scenario, SSP3 is the ‘regional rivalry’ scenario, SSP4 is the ‘inequality’ scenario, and SSP5 denotes the ‘high-emission’ scenario⁶⁹.

Regions of interest

We consider four sets of regions of interest in this study and calculate total urban land for each. First, we consider all countries as recognized by the World Bank (Fig. 1); disputed territories, which cover a negligible portion of the global land surface, are not considered. Second, we consider four regions of interest, namely the Great Lakes region, the Mid Atlantic region, the Indo-Gangetic Basin, and the Yangtze River Basin (Fig. 2a), to illustrate the variability between datasets at the regional scale. Third, we consider the Köppen-Geiger climate zones⁷⁰ to examine the variability of the surface urban heat island intensity across different background climates (see below). Finally, we divide the global land surface into 0.9° latitude × 1.25° longitude grid cells to estimate grid-level urban percentage, a common resolution for running CESM⁵⁵.
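The grid-level aggregation can be sketched as a binning of fine-resolution pixels into 0.9° × 1.25° cells. This is a minimal illustration, not the Earth Engine workflow used in the study: it assumes pixel centres and per-pixel areas are already known, approximates each cell's area by the summed area of its pixels (the text uses the geometric area of each cell), and the pixel records are hypothetical.

```python
from collections import defaultdict

DLAT, DLON = 0.9, 1.25        # grid spacing in degrees (latitude, longitude)
N_ROWS = round(180 / DLAT)    # 200 latitude bands
N_COLS = round(360 / DLON)    # 288 longitude bands


def cell_index(lat, lon):
    """Map a pixel centre (degrees) to (row, col); rows start at 90°S, columns at 180°W."""
    row = min(int((lat + 90.0) / DLAT), N_ROWS - 1)
    col = min(int((lon + 180.0) / DLON), N_COLS - 1)
    return row, col


def grid_urban_percent(pixels):
    """pixels: iterable of (lat, lon, pixel_area_km2, is_urban) records."""
    urban = defaultdict(float)
    total = defaultdict(float)
    for lat, lon, area, is_urban in pixels:
        key = cell_index(lat, lon)
        total[key] += area
        if is_urban:
            urban[key] += area
    return {key: 100.0 * urban[key] / total[key] for key in total}


# Hypothetical pixel records (three fall in one cell, one in another)
records = [(0.1, 0.1, 1.0, True), (0.2, 0.3, 1.0, False),
           (0.5, 1.0, 2.0, True), (10.0, 10.0, 1.0, False)]
print(grid_urban_percent(records))
```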

Surface Urban Heat Island estimates

We illustrate the role of the choice of urban land cover dataset using a well-known urban climate signal, the surface urban heat island (SUHI) effect. We calculate the SUHI for over 10,000 urban clusters using the Simplified Urban Extent algorithm, which has previously been used in the urban climate literature to examine the SUHI across scales^42,48. The algorithm separately calculates the urban and rural land surface temperature (LST) for each cluster, with their difference being the SUHI intensity. The urban LST is the average LST of all urban pixels (for each of the 8 present-day estimates of urban land) within a cluster, while the rural LST is the average LST of the non-urban pixels. The LST data used here are from the Landsat Collection 2 science product⁷¹ for 2018 to 2022, which covers the period of the 2019/2020 global urban land cover datasets used. Water pixels, identified using the Global Surface Water product⁷², are masked out for both urban and rural cases before generating the urban and rural LST. Due to the 16-day revisit cycle of Landsat 8, multiple years of data are needed to obtain sufficient clear-sky observations. Separate analyses are done for summer (June, July, and August for clusters whose centroids are in the northern hemisphere, and December, January, and February for clusters in the southern hemisphere) and winter (vice versa), after quality-controlling all Landsat images using pixel-level quality control flags to minimize contamination from clouds and cloud shadows (Figs. 4a, S5).
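The per-cluster calculation can be sketched as below. This is not the Simplified Urban Extent algorithm itself (which also delineates the clusters and runs on Earth Engine); it only shows the urban-minus-rural difference after seasonal selection, water masking, and cloud screening. The QA_PIXEL bit positions and the surface temperature scale and offset are assumptions based on our understanding of the Landsat Collection 2 Level-2 product and should be checked against the USGS documentation; the pixel records are hypothetical.

```python
from statistics import mean

# Assumed Landsat Collection 2 Level-2 conventions (verify against USGS documentation):
CLOUD_BIT, SHADOW_BIT = 3, 4              # QA_PIXEL bit positions for cloud and cloud shadow
ST_SCALE, ST_OFFSET = 0.00341802, 149.0   # surface temperature DN -> kelvin


def is_clear(qa_pixel):
    """True if neither the cloud nor the cloud-shadow bit is set."""
    return ((qa_pixel >> CLOUD_BIT) & 1) == 0 and ((qa_pixel >> SHADOW_BIT) & 1) == 0


def season_months(centroid_lat, season):
    """Summer is JJA in the northern hemisphere and DJF in the southern; winter is the reverse."""
    jja, djf = {6, 7, 8}, {12, 1, 2}
    northern = centroid_lat >= 0
    if season == "summer":
        return jja if northern else djf
    if season == "winter":
        return djf if northern else jja
    raise ValueError("season must be 'summer' or 'winter'")


def suhi_intensity(pixels, months):
    """Urban minus rural mean LST (K) for one cluster.

    pixels: iterable of (month, st_dn, qa_pixel, is_urban, is_water) records.
    Returns None if either the urban or the rural sample is empty.
    """
    urban, rural = [], []
    for month, st_dn, qa, is_urban, is_water in pixels:
        if month not in months or is_water or not is_clear(qa):
            continue
        lst_k = st_dn * ST_SCALE + ST_OFFSET
        (urban if is_urban else rural).append(lst_k)
    if not urban or not rural:
        return None
    return mean(urban) - mean(rural)


summer = season_months(31.2, "summer")  # hypothetical northern-hemisphere cluster
obs = [
    (7, 44000, 0, True, False),    # clear urban pixel
    (7, 43000, 0, False, False),   # clear rural pixel
    (7, 50000, 8, True, False),    # cloud bit set -> dropped
    (7, 40000, 0, False, True),    # water -> dropped
    (1, 30000, 0, False, False),   # January -> outside this cluster's summer
]
print(f"summer SUHI: {suhi_intensity(obs, summer):.2f} K")
```

Swapping in a different urban mask only changes the `is_urban` flags, which is exactly where the choice of land cover dataset enters the SUHI estimate.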

Urban growth in flood plains

We examine the impact of the choice among different long-term urbanization estimates on urban risk analysis, following Andreadis et al. (2022)²⁹ and Rentschler et al. (2023)⁸. For this, we consider the GAIA (Global Artificial Impervious Area)³³, GISA (Global Impervious Surface Area)³⁶, and WSE (World Settlement Evolution)²² datasets, which have the longest time series. For the first and last years of the common period (1985 and 2015, respectively), we calculate the total urban land, globally and by continent, that overlaps with the Global high-resolution floodplains dataset⁴⁶. The percentage change between 1985 and 2015 is then calculated for the world and each continent by dataset (Fig. 4b). The largest urban growth is seen for GAIA and the smallest for GISA or WSE (depending on the continent).
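Once each urban map and the floodplain map share a grid, the overlay and percentage change described above reduce to a simple calculation. A minimal sketch, assuming co-registered boolean masks and a constant pixel area (both simplifications); the dataset names and masks are hypothetical placeholders for GAIA, GISA, and WSE.

```python
def urban_area_in_floodplain(urban_mask, floodplain_mask, pixel_area_km2):
    """Total area of pixels that are both urban and inside the floodplain."""
    return sum(pixel_area_km2 for u, f in zip(urban_mask, floodplain_mask) if u and f)


def floodplain_growth(urban_1985, urban_2015, floodplain_mask, pixel_area_km2):
    """Percentage change in floodplain urban area between 1985 and 2015."""
    a1985 = urban_area_in_floodplain(urban_1985, floodplain_mask, pixel_area_km2)
    a2015 = urban_area_in_floodplain(urban_2015, floodplain_mask, pixel_area_km2)
    if a1985 == 0:
        raise ValueError("no floodplain urban area in 1985; percentage change undefined")
    return 100.0 * (a2015 - a1985) / a1985


# Hypothetical 30 m pixels (0.0009 km^2) on a common grid; one (1985, 2015) mask pair per dataset
floodplain = [True, True, False, False, True]
datasets = {
    "dataset_A": ([True, False, False, True, False], [True, True, False, True, True]),
    "dataset_B": ([True, False, False, True, True], [True, True, False, True, True]),
}
for name, (u1985, u2015) in datasets.items():
    print(name, f"{floodplain_growth(u1985, u2015, floodplain, 0.0009):.0f}%")
```

Both hypothetical datasets agree on the 2015 map but differ by a single pixel in 1985, which is enough to change the reported growth from 200% to 50%; differences in the early years of a time series are amplified in percentage-change metrics.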

Grid-wise comparison between satellite-derived and model-prescribed urban datasets

For each 0.9° latitude × 1.25° longitude grid cell on the Earth’s surface, we calculate the urban percentage for 2001 from GISA and from the sum of the medium density, high density, and tall building district classes of the Jackson et al. (2010)¹⁴ dataset. GISA is used since it shows the highest accuracy among the urban datasets for the present day (Table S2; also discussed later), and the year 2001 is chosen since it is the approximate reference year of the Jackson et al. (2010)¹⁴ estimate. The low density class from Jackson et al. (2010)¹⁴ is not included since it is not considered in CESM simulations⁶¹. Only grid cells with positive values in both datasets are considered. Jackson et al. (2010)¹⁴ detects urban land in far fewer grid cells (6858) than GISA (8587). Separate correlations are shown for all common global grid cells and by continent (Fig. 5b). The main accuracy metrics used are the mean bias error (MBE) and mean percentage error (MPE). As an example, for Asia, the mean urban percentage in Jackson et al. (2010)¹⁴ is 1.18 percentage points higher than the value detected by GISA, based on all common grid cells with any urban land. In relative terms (a percentage of a percentage), Jackson et al. (2010)¹⁴ shows more than double the urban percentage of GISA for Asia (MPE = 136.2%).
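The two metrics can be computed as below. The MBE definition is standard; the MPE formula is an assumption, since the text does not spell it out: here it is the MBE divided by the mean reference value, whereas another common variant averages the per-cell relative errors, which weights cells with little urban land more heavily. The grid-cell values are hypothetical.

```python
from statistics import mean


def positive_pairs(model_pct, reference_pct):
    """Keep grid cells where both datasets report some urban land."""
    return [(m, r) for m, r in zip(model_pct, reference_pct) if m > 0 and r > 0]


def mean_bias_error(pairs):
    """Mean of (model - reference), in percentage points."""
    return mean(m - r for m, r in pairs)


def mean_percentage_error(pairs):
    """MBE relative to the mean reference value, in percent (one possible definition)."""
    return 100.0 * mean_bias_error(pairs) / mean(r for _, r in pairs)


# Hypothetical grid-cell urban percentages (model-prescribed vs. satellite-derived)
jackson_like = [2.0, 4.0, 0.0, 3.0]
gisa_like = [1.0, 2.0, 5.0, 0.0]
pairs = positive_pairs(jackson_like, gisa_like)
print(f"MBE = {mean_bias_error(pairs):.2f} pp, MPE = {mean_percentage_error(pairs):.1f}%")
```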

Data processing

All datasets are processed on the Google Earth Engine cloud computing platform⁷³. The total area of each region of interest (the denominator when estimating urban percentage) is the geometric area of the corresponding vector (country or region). To summarize the total urban area of each region of interest, we sum the area of ‘urban’ pixels within each vector, using the native resolution of the corresponding dataset as the scale of aggregation. Country-level regions of interest are combined with a separate set of boundaries for the European and Asian parts of Russia to summarize results by continent. For the SUHI estimation, a scale of 100 m, the native resolution of the Landsat 8 thermal band, is used in all cases. Among the global urban land cover datasets considered, Dynamic World is unique in that a classification is produced for every Sentinel-2 scene. Here, we consider a pixel urban only if the mode of all overlapping scenes for 2020 is urban. Comparisons of the median, mode, and mean of these images show relatively small differences¹⁷.
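The per-scene mode rule for Dynamic World and the urban-percentage calculation can be sketched on small in-memory arrays as follows. This only mirrors the logic; the study itself runs on Earth Engine. Assumptions: the 'built' class is index 6 of the Dynamic World label band (check against the catalog), missing observations are marked with a hypothetical fill value of -1, and ties are broken toward the lower class index, which is an arbitrary choice here. As in the text, the denominator is the region's geometric area rather than the summed pixel area.

```python
import numpy as np

BUILT = 6      # assumed index of Dynamic World's 'built' class
NO_DATA = -1   # hypothetical fill value for scenes with no valid observation


def mode_is_urban(labels, urban_class=BUILT):
    """labels: (n_scenes, rows, cols) integer class labels for one year.

    Returns True where the most frequent valid label is the urban class.
    Ties go to the lower class index (np.argmax), an arbitrary choice here.
    """
    labels = np.asarray(labels)
    n_classes = max(int(labels.max()), urban_class) + 1
    # Per-pixel count of each class; NO_DATA never matches a class index.
    counts = np.stack([(labels == c).sum(axis=0) for c in range(n_classes)])
    observed = counts.sum(axis=0) > 0
    return (np.argmax(counts, axis=0) == urban_class) & observed


def urban_percentage(urban_mask, pixel_area_km2, region_area_km2):
    """Summed area of urban pixels divided by the region's geometric area."""
    urban_area = float(np.sum(np.where(urban_mask, pixel_area_km2, 0.0)))
    return 100.0 * urban_area / region_area_km2


# Three invented scenes over a 1 x 3 strip of pixels.
scenes = np.array([[[6, 1, NO_DATA]],
                   [[6, 6, NO_DATA]],
                   [[1, 1, NO_DATA]]])
mask = mode_is_urban(scenes)
print(mask.tolist())                                       # [[True, False, False]]
print(urban_percentage(mask, np.full((1, 3), 0.25), 1.0))  # 25.0
```

The second pixel is labeled built in one of three scenes, so under the mode rule it is not urban; a "built in any scene" rule would have counted it, which is one way compositing choices can shift urban area estimates.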

Validation

A comprehensive accuracy assessment of these datasets was not a primary goal of this study, for two main reasons. First, there have already been multiple accuracy assessments of global land cover estimates across scales^16,17,19,20,33. Second, given the differences in urban definitions among these datasets, standard accuracy estimates may not be particularly informative. The broader remote sensing community has also debated the term ‘ground truth’, a discussion that is relevant here⁷⁴. However, as a sanity check, we provide basic accuracy estimates for the eight datasets used to represent present-day (2019/2020) urban land, using the validation dataset created by the Dynamic World team²⁸. The training data were developed by 70 annotators, who manually labeled land use and land cover types in high-resolution Sentinel-2 images for random dates in 2019, following the classification typology of the Dynamic World dataset. We chose this dataset because it is the largest available validation dataset relevant at the 10 m scale. Our accuracy estimates for the world and each continent are summarized in Table S2 and show the percentage of urban pixels in the reference that each dataset correctly identifies as urban (the producer's accuracy, or recall, for the urban class). Based on this assessment, the GISA and WSF 2019 products perform best, followed by Esri Land Cover and Dynamic World, while MODIS Land Cover performs worst overall. Beyond this sanity check, however, these accuracy estimates should be interpreted with caution. For example, the urban definition in the Dynamic World dataset includes a mixture of residential buildings, streets, lawns, trees, and isolated residential structures or buildings surrounded by vegetation (Table S1). It is therefore a mosaicked land cover definition, partly because the dataset uses a minimum mapping unit of 50 × 50 m. Other products, such as World Cover with a 10 × 10 m or GISA with a 30 × 30 m minimum mapping unit, may not be directly comparable (Table S1). Another typological difference is that WSF 2019 does not include roads at all. Because of such mismatches between the land cover typologies of the global datasets and the typology used to create the ground truth data, it is difficult to draw unbiased conclusions about the relative accuracies of these datasets.
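The sanity-check metric described above, the share of reference urban pixels that a dataset also maps as urban, can be sketched as follows. The sketch assumes each product's classes have already been converted to a binary urban/non-urban mask using that product's own urban definition; that hypothetical preprocessing step is exactly where the typology mismatches discussed above enter. The function name and the five example pixels are invented.

```python
import numpy as np


def urban_detection_rate(reference_urban, predicted_urban):
    """Percentage of reference urban pixels that the dataset also maps as urban
    (producer's accuracy, i.e. recall, for the urban class)."""
    reference_urban = np.asarray(reference_urban, dtype=bool)
    predicted_urban = np.asarray(predicted_urban, dtype=bool)
    n_ref = int(reference_urban.sum())
    if n_ref == 0:
        raise ValueError("reference contains no urban pixels")
    hits = int(np.sum(reference_urban & predicted_urban))
    return 100.0 * hits / n_ref


# Invented labels at five validation pixels, already converted to urban/non-urban
# using each product's own urban definition.
reference = [True, True, True, True, False]
product = [True, True, True, False, True]
print(urban_detection_rate(reference, product))  # 75.0
```

Note that this metric does not penalize the false urban detection at the last pixel, so a product that over-maps urban land can still score well; this is one more reason the resulting rankings should be read cautiously.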

Acknowledgments

Pacific Northwest National Laboratory is operated for the U.S. Department of Energy (DOE) by Battelle Memorial Institute under contract DE-AC05-76RL01830. This study is supported by a DOE Early Career award to T.C. as well as the Coastal Observations, Mechanisms, and Predictions Across Systems and Scales-Great Lakes Modeling (COMPASS-GLM) and Integrated Coastal Modeling (ICoM) projects. The latter two are multi-institutional projects supported by DOE’s Office of Science’s Office of Biological and Environmental Research. L.Z. acknowledges the support by the U.S. National Science Foundation (CAREER Award Grant No. 2145362) and the Institute for Sustainability, Energy, and Environment at the University of Illinois Urbana-Champaign. We thank Samapriya Roy and the Earth Engine Community Data Catalog for providing seamless access to many of these data products and Mattia Marconcini for discussions on the world settlement footprint suite of products.

Author contributions

T.C. designed the study, processed the satellite observations, analyzed the data, and wrote the manuscript. Z.S.V., M.D., W.Z., J.G., L.Z., and Y.Q. provided comments and suggestions on the research design and writing.

Data availability

All data will be made available by the authors upon publication of the manuscript.

Code availability

The codes used for data analysis will be made available by the authors upon publication of the manuscript.

Elmqvist, T. et al. Urbanization in and for the Anthropocene. Npj Urban Sustain. 1, 6 (2021).
Qian, Y. et al. Urbanization Impact on Regional Climate and Extreme Weather: Current Understanding, Uncertainties, and Future Research Directions. Adv. Atmospheric Sci. (2022) doi:10.1007/s00376-021-1371-9.
UNDESA, P. World urbanization prospects: the 2018 revision. Retrieved August 26, 2018 (2018).
Ritchie, H. & Roser, M. Urbanization. Our World in Data (2018).
Oke, T. R., Mills, G., Christen, A. & Voogt, J. A. Urban Climates. (Cambridge University Press, 2017).
Zhu, Z. et al. Understanding an urbanizing planet: Strategic directions for remote sensing. Remote Sens. Environ. 228, 164–182 (2019).
Tuholske, C. et al. Global urban population exposure to extreme heat. Proc. Natl. Acad. Sci. 118, (2021).
Rentschler, J. et al. Global evidence of rapid urban growth in flood zones since 1985. Nature 622, 87–92 (2023).
Mentaschi, L. et al. Global long-term mapping of surface temperature shows intensified intra-city urban heat island extremes. Glob. Environ. Change 72, 102441 (2022).
Iungman, T. et al. Cooling cities through urban green infrastructure: a health impact assessment of European cities. The Lancet 401, 577–589 (2023).
Liu, Z. et al. Surface warming in global cities is substantially more rapid than in rural background areas. Commun. Earth Environ. 3, 1–9 (2022).
Gao, J. & Bukovsky, M. S. Urban land patterns can moderate population exposures to climate extremes over the 21st century. Nat. Commun. 14, 6536 (2023).
Ching, J. et al. WUDAPT: An urban weather, climate, and environmental modeling infrastructure for the anthropocene. Bull. Am. Meteorol. Soc. 99, 1907–1924 (2018).
Jackson, T. L., Feddema, J. J., Oleson, K. W., Bonan, G. B. & Bauer, J. T. Parameterization of Urban Characteristics for Global Climate Modeling. Ann. Assoc. Am. Geogr. 100, 848–865 (2010).
Demuzere, M. et al. A global map of Local Climate Zones to support earth system modelling and urban scale environmental science. Earth Syst. Sci. Data Discuss. 2022, 1–57 (2022).
Ren, H. et al. Mapping High-Resolution Global Impervious Surface Area: Status and Trends. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. (2022).
Venter, Z. S., Barton, D. N., Chakraborty, T., Simensen, T. & Singh, G. Global 10 m Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and Esri Land Cover. Remote Sens. 14, 4101 (2022).
Potere, D. & Schneider, A. A critical look at representations of urban areas in global maps. GeoJournal 69, 55–80 (2007).
Potere, D., Schneider, A., Angel, S. & Civco, D. L. Mapping urban areas on a global scale: which of the eight maps now available is more accurate? Int. J. Remote Sens. 30, 6531–6558 (2009).
Zheng, K., He, G., Yin, R., Wang, G. & Long, T. A Comparison of Seven Medium Resolution Impervious Surface Products on the Qinghai–Tibet Plateau, China from a User’s Perspective. Remote Sens. 15, 2366 (2023).
Huang, X. et al. Toward accurate mapping of 30-m time-series global impervious surface area (GISA). Int. J. Appl. Earth Obs. Geoinformation 109, 102787 (2022).
Marconcini, M., Metz-Marconcini, A., Esch, T. & Gorelick, N. Understanding current trends in global urbanisation-the world settlement footprint suite. GI_Forum 9, 33–38 (2021).
Bontemps, S. et al. Consistent global land cover maps for climate modelling communities: current achievements of the ESA’s land cover CCI. in Proceedings of the ESA living planet symposium, Edinburgh 9–13 (2013).
Liu, Z., He, C., Zhou, Y. & Wu, J. How much of the world’s land has been urbanized, really? A hierarchical framework for avoiding confusion. Landsc. Ecol. 29, 763–771 (2014).
Woodcock, C. E. & Strahler, A. H. The factor of scale in remote sensing. Remote Sens. Environ. 21, 311–332 (1987).
Karra, K. et al. Global land use/land cover with Sentinel 2 and deep learning. in 2021 IEEE international geoscience and remote sensing symposium IGARSS 4704–4707 (IEEE, 2021).
Zanaga, D. et al. ESA WorldCover 10 m 2020 V100, Zenodo. (2021).
Brown, C. F. et al. Dynamic World, Near real-time global 10 m land use land cover mapping. Sci. Data 9, 251 (2022).
Andreadis, K. M. et al. Urbanizing the floodplain: global changes of imperviousness in flood-prone areas. Environ. Res. Lett. 17, 104024 (2022).
Liu, X. et al. High-spatiotemporal-resolution mapping of global urban change from 1985 to 2015. Nat. Sustain. 3, 564–570 (2020).
van Vliet, J. Direct and indirect loss of natural area from urban expansion. Nat. Sustain. 2, 755–763 (2019).
Wulder, M. A. et al. Fifty years of Landsat science and impacts. Remote Sens. Environ. 280, 113195 (2022).
Gong, P. et al. Annual maps of global artificial impervious area (GAIA) between 1985 and 2018. Remote Sens. Environ. 236, 111510 (2020).
Buchhorn, M. et al. Copernicus global land cover layers—collection 2. Remote Sens. 12, 1044 (2020).
Esch, T. et al. Breaking new ground in mapping human settlements from space – The Global Urban Footprint. ISPRS J. Photogramm. Remote Sens. 134, 30–42 (2017).
Huang, X. et al. 30 m global impervious surface area dynamics and urban expansion pattern observed by Landsat satellites: From 1972 to 2019. Sci. China Earth Sci. 64, 1922–1933 (2021).
Marconcini, M. et al. Outlining where humans live, the World Settlement Footprint 2015. Sci. Data 7, 242 (2020).
Sulla-Menashe, D. & Friedl, M. A. User guide to collection 6 MODIS land cover (MCD12Q1 and MCD12C1) product. USGS, Reston, VA, USA 1, 18 (2018).
Huang, X., Huang, J., Wen, D. & Li, J. An updated MODIS global urban extent product (MGUP) from 2001 to 2018 based on an automated mapping approach. Int. J. Appl. Earth Obs. Geoinformation 95, 102255 (2021).
Wan, Z. MODIS land surface temperature products users’ guide. Inst. Comput. Earth Syst. Sci. Univ. Calif. Santa Barbara, CA, USA 805, (2006).
Snyder, W. C., Wan, Z., Zhang, Y. & Feng, Y.-Z. Classification-based emissivity for land surface temperature measurement from space. Int. J. Remote Sens. 19, 2753–2774 (1998).
Chakraborty, T. C., Lee, X., Ermida, S. & Zhan, W. On the land emissivity assumption and Landsat-derived surface urban heat islands: A global analysis. Remote Sens. Environ. 265, 112682 (2021).
Mu, Q., Zhao, M. & Running, S. W. MODIS global terrestrial evapotranspiration (ET) product (NASA MOD16A2/A3). Algorithm Theor. Basis Doc. Collect. 5, 600 (2013).
European Commission. Joint Research Centre. GHSL Data Package 2023. (Publications Office, LU, 2023).
Zhao, L., Lee, X., Smith, R. B. & Oleson, K. Strong contributions of local background climate to urban heat islands. Nature 511, 216–219 (2014).
Nardi, F., Annis, A., Di Baldassarre, G., Vivoni, E. R. & Grimaldi, S. GFPLAIN250m, a global high-resolution dataset of Earth’s floodplains. Sci. Data 6, 1–6 (2019).
Venter, Z. S., Chakraborty, T. & Lee, X. Crowdsourced air temperatures contrast satellite measures of the urban heat island and its mechanisms. Sci. Adv. 7, eabb9569 (2021).
Hsu, A., Sheriff, G., Chakraborty, T. & Manya, D. Disproportionate exposure to urban heat island intensity across major US cities. Nat. Commun. 12, 2721 (2021).
Brousse, O. et al. The local climate impact of an African city during clear‐sky conditions—Implications of the recent urbanization in Kampala (Uganda). Int. J. Climatol. 40, 4586–4608 (2020).
Masson, V. et al. City-descriptive input data for urban climate models: Model requirements, data sources and challenges. Urban Clim. 31, 100536 (2020).
Krayenhoff, E. S., Moustaoui, M., Broadbent, A. M., Gupta, V. & Georgescu, M. Diurnal interaction between urban expansion, climate change and adaptation in US cities. Nat. Clim. Change 8, 1097–1103 (2018).
Krayenhoff, E. S. et al. Cooling hot cities: a systematic and critical review of the numerical modelling literature. Environ. Res. Lett. 16, 053007 (2021).
Stewart, I. D. & Oke, T. R. Local climate zones for urban temperature studies. Bull. Am. Meteorol. Soc. 93, 1879–1900 (2012).
Zhang, K. et al. Increased heat risk in wet climate induced by urban humid heat. Nature 1–5 (2023).
Zhao, L. et al. Global multi-model projections of local urban climates. Nat. Clim. Change 11, 152–157 (2021).
Li, D. et al. Urban heat island: Aerodynamics or imperviousness? Sci. Adv. 5, eaau4299 (2019).
Caldwell, P. M. et al. The DOE E3SM Coupled Model Version 1: Description and Results at High Resolution. J. Adv. Model. Earth Syst. 11, 4095–4146 (2019).
Cherchi, A. et al. Global mean climate and main patterns of variability in the CMCC‐CM2 coupled model. J. Adv. Model. Earth Syst. 11, 185–209 (2019).
Elguindi, N. et al. Regional climate model RegCM: reference manual version 4.5. Abdus Salam ICTP Trieste 33, (2014).
Demuzere, M. et al. Combining expert and crowd-sourced training data to map urban form and functions for the continental US. Sci. Data 7, 264 (2020).
Oleson, K. W. & Feddema, J. Parameterization and surface data improvements and new capabilities for the Community Land Model Urban (CLMU). J. Adv. Model. Earth Syst. 12, e2018MS001586 (2020).
Chen, G. et al. Global projections of future urban land expansion under shared socioeconomic pathways. Nat. Commun. 11, 537 (2020).
Gao, J. & O’Neill, B. C. Mapping global urban land for the 21st century with data-driven simulations and Shared Socioeconomic Pathways. Nat. Commun. 11, 1–12 (2020).
Li, X. et al. Global urban growth between 1870 and 2100 from integrated high resolution mapped data and urban dynamic modeling. Commun. Earth Environ. 2, 1–10 (2021).
He, W. et al. Global urban fractional changes at a 1 km resolution throughout 2100 under eight scenarios of Shared Socioeconomic Pathways (SSPs) and Representative Concentration Pathways (RCPs). Earth Syst. Sci. Data 15, 3623–3639 (2023).
Marcotullio, P. J., Keßler, C. & Fekete, B. M. Global urban exposure projections to extreme heatwaves. Front. Built Environ. 8, 947496 (2022).
Lawrence, D. M. et al. The Land Use Model Intercomparison Project (LUMIP) contribution to CMIP6: rationale and experimental design. Geosci. Model Dev. 9, 2973–2998 (2016).
Friedl, M. A. et al. Medium Spatial Resolution Mapping of Global Land Cover and Land Cover Change Across Multiple Decades From Landsat. Front. Remote Sens. 3, 894571 (2022).
O’Neill, B. C. et al. The roads ahead: Narratives for shared socioeconomic pathways describing world futures in the 21st century. Glob. Environ. Change 42, 169–180 (2017).
Rubel, F. & Kottek, M. Observed and projected climate shifts 1901-2100 depicted by world maps of the Köppen-Geiger climate classification. Meteorol. Z. 19, 135 (2010).
Earth Resources Observation And Science (EROS) Center. Collection-2 Landsat 8-9 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) Level-2 Science Products. U.S. Geological Survey https://doi.org/10.5066/P9OGBGM6 (2013).
Pekel, J.-F., Cottam, A., Gorelick, N. & Belward, A. S. High-resolution mapping of global surface water and its long-term changes. Nature 540, 418–422 (2016).
Gorelick, N. et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).
Woodhouse, I. H. On ‘ground’ truth and why we should abandon the term. J. Appl. Remote Sens. 15, 041501 (2021).

The authors declare no competing interests.

Supplementarydraftv3.docx
