Current trend identification and labeling
Using 5-year population estimates at the place level from the Census Bureau’s American Community Survey (ACS) for 2010 to 2020, we applied the Mann-Kendall (MK) test to the population series of each city. The MK test is a nonparametric test that identifies whether a time series has a monotonic increasing (or decreasing) trend. The null hypothesis is that the data possess no discernible trend. The test statistic (S) is used to label the trends: a large positive S indicates an upward trend, a large negative S indicates a downward trend, and a small absolute value of S indicates no trend [35]. Once a trend was identified by the MK test, we defined the extent of population loss or gain and labeled cities based on the percent change. Because studies have defined the range of population change differently, no universal standard exists [36]–[38]. To be consistent with the existing literature and to make our findings translatable, we labeled the cities based on mean population change and total population change over the 11-year time span for which consistent data were available at the time of this study.
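As an illustration of how S behaves, the following minimal sketch computes the MK statistic for a single series. The function name `mann_kendall` is hypothetical, and the normal-approximation Z score (without a tie correction) is the standard textbook form, included here as an assumption rather than as the exact procedure used in this study.

```python
from itertools import combinations
import math


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test, ignoring tie corrections."""
    n = len(series)
    # S is the sum of the signs of all later-minus-earlier pairwise differences.
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(series, 2))
    # Variance of S under the null hypothesis of no trend (no ties assumed).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Eleven made-up annual values that rise every year.
s, z = mann_kendall([100, 101, 103, 106, 110, 115, 121, 128, 136, 145, 155])
print(s, round(z, 2))
```

For a strictly increasing series of 11 values, every one of the 55 pairs contributes +1, so S reaches its maximum of 55.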
Here, we defined the average loss as mean population change and the total loss as total population change. They are calculated as follows:
$$\text{Total population change rate}= \frac{{P}_{n}-{P}_{0}}{{P}_{0}}$$
1
$$\text{Mean population change rate}= \frac{({P}_{n}-{P}_{n-1})/{P}_{n-1}+\dots +({P}_{1}-{P}_{0})/{P}_{0}}{n-1}$$
2
where \({P}_{n}\) refers to the population in a city at year \(n\). Here, \(n \in \{0, 1, 2, \dots, 10\}\) for 11 years of data.
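The two rates in Eqs. 1 and 2 can be sketched as follows. The function names are hypothetical, and the mean is taken over the number of year-over-year changes in the series (ten changes for eleven annual values), which is an assumption of this sketch about the divisor.

```python
def total_change_rate(pop):
    """Eq. 1: change from the first to the last year, relative to the first year."""
    return (pop[-1] - pop[0]) / pop[0]


def mean_change_rate(pop):
    """Eq. 2: average of the year-over-year relative changes."""
    changes = [(curr - prev) / prev for prev, curr in zip(pop, pop[1:])]
    # Assumption: divide by the number of year-over-year changes.
    return sum(changes) / len(changes)


# Made-up series growing 10% per year for two years.
pop = [100, 110, 121]
print(total_change_rate(pop), mean_change_rate(pop))
```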
These labeling boundaries were derived from the existing literature on shrinking cities, since population decline has been identified as a primary indicator of urban shrinkage in both developed and developing regions [37]–[39]. In this study, we use population decline, depopulation, and shrinkage synonymously. Some researchers classify cities as shrinking based on population; for example, Wiechmann et al. [37] defined cities with a continuous population decline for more than two years as shrinking cities. Others, such as Oswalt et al. [38], labeled cities with over a 10% loss in population or more than a 1% average loss as shrinking cities [36]. Based on this information, we opted to label cities with a total population decrease of 10% or lower as either severely or moderately depopulating if the mean population change rate is also below 2.5%. We labeled cities with a total population change rate of −0.5 as depopulating irrespective of their mean population change rate. Cities were labeled as fluctuating if their mean population change rate is positive while their total population change rate is negative, or vice versa, since these trends indicate that a city is alternating between population loss and gain.
A detailed table on the labeling of cities based on mean population change and total population change is presented in Table SI.1.
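The fluctuating rule is the only label that depends purely on the signs of the two rates, so it can be sketched without the thresholds in Table SI.1 (which are not reproduced here). The function name `is_fluctuating` is hypothetical.

```python
def is_fluctuating(mean_rate, total_rate):
    """True when the mean and total population change rates have opposite signs."""
    return (mean_rate > 0 > total_rate) or (mean_rate < 0 < total_rate)


# A made-up city that falls from 100 to 50 and recovers to 90:
# total rate = -0.10, mean rate = (-0.5 + 0.8) / 2 = 0.15.
print(is_fluctuating(0.15, -0.10))
```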
Data processing for future trend forecasting
Hauer projected county-level population up to the year 2100 using historical census data from 1990 to 2015 [16]. To forecast population trends, we first redistributed the county population to each city for all SSPs. Cities can extend across multiple counties; for example, Figure 8 shows that New York City spans five boroughs (county equivalents). We used population ratios to distribute the county-level population projections to cities.
Here are the calculation steps to distribute county-level population to cities:
1. Find the intersected area of a city within a county. For cities located in one county, it is the area of the city; for cities partially located in multiple counties, it is the intersected area in the respective county.
$${A}_{{i}_{mC}}= {A}_{m}\cap {A}_{C}$$
3
where \({A}_{{i}_{mC}}\) is the intersected area of city \(m\) that lies in county \(C\). \({A}_{m}\) refers to the total area of the city and \({A}_{C}\) refers to county area.
2. Calculate the area factor as a ratio of the total city area.
$${f}_{{A}_{mC}}= \frac{{A}_{{i}_{mC}}}{{A}_{{l}_{m}}}$$
4
where \({f}_{{A}_{mC}}\) is the area factor and \({A}_{{l}_{m}}\) is the land area of city \(m\).
3. Calculate density.
$${d}_{m}=\frac{{P}_{{m}_{2020}}}{{A}_{m}}$$
5
4. Find the population in the intersected area for 2020.
$${P}_{{{A}_{i}}_{mC}}={f}_{{A}_{mC}}\bullet {d}_{m}\bullet {A}_{m}$$
6
where \({P}_{{{A}_{i}}_{mC}}\) is the population of the intersected area, \({P}_{{m}_{2020}}\) is the Census population of the city in 2020.
5. Calculate the population factor for the intersected area as a ratio of total county population in 2020.
$${f}_{{m}_{i}}= \frac{{P}_{{{A}_{i}}_{mC}}}{{P}_{{C}_{2020}}}$$
7
where \({P}_{{C}_{2020} }\) is county population and \({f}_{{m}_{i}}\) is the population factor for the intersected area/city.
6. Calculate projected population for each intersected area.
$${P}_{{i}_{mys}}={f}_{{m}_{i}}\bullet {P}_{{C}_{ys}}$$
8
where \({P}_{{i}_{mys}}\) is the projected population for the intersected area of the city and \({P}_{{C}_{ys}}\) is the county-level projected population from Hauer’s data for county \(C\) at year \(y\) for SSP scenario \(s\).
7. Aggregate the intersected area population projection to find the projected population for each city.
$${P}_{mys}={\sum }_{i}^{n}{P}_{{i}_{mys}}$$
9
where \({P}_{mys}\) is the projected population for city \(m\) at year \(y\) for SSP scenario \(s\).
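Steps 2 to 7 can be sketched for a single city, year, and scenario as follows. This is a minimal illustration under stated assumptions: the intersected areas of step 1 are taken as already computed (for example, from a GIS overlay), the function name and the dictionary keys are hypothetical, and all numbers are made up.

```python
def project_city_population(city_pop_2020, city_area, city_land_area, pieces):
    """Distribute county projections to one city following Eqs. 4 to 9.

    pieces holds one dict per county the city intersects, with keys
    'intersect_area', 'county_pop_2020', and 'county_projection'.
    """
    density = city_pop_2020 / city_area                          # Eq. 5
    projected = 0.0
    for piece in pieces:
        area_factor = piece["intersect_area"] / city_land_area    # Eq. 4
        pop_in_piece = area_factor * density * city_area          # Eq. 6
        pop_factor = pop_in_piece / piece["county_pop_2020"]      # Eq. 7
        projected += pop_factor * piece["county_projection"]      # Eq. 8, summed as in Eq. 9
    return projected


# A made-up city of 1,000 people split 60/40 by area across two counties.
pieces = [
    {"intersect_area": 6.0, "county_pop_2020": 3000, "county_projection": 3300},
    {"intersect_area": 4.0, "county_pop_2020": 2000, "county_projection": 1800},
]
print(project_city_population(1000, 10.0, 10.0, pieces))
```

In this example each piece holds 20% of its county's 2020 population, so the city receives 20% of each county projection (660 + 360).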
For the NCAR data, we used area-weighted zonal statistics over the overlap between the 1-km gridded projected population and city administrative boundaries to derive city-level population projections.
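A minimal sketch of an area-weighted zonal sum is given below. It assumes that population is spread uniformly within each grid cell, so a cell contributes its population multiplied by the fraction of its area that falls inside the city boundary. The function name is hypothetical, and the overlap fractions are taken as precomputed inputs.

```python
def area_weighted_population(cells):
    """Sum grid-cell populations weighted by the share of each cell inside the city.

    cells is an iterable of (cell_population, fraction_inside_city) pairs.
    """
    return sum(pop * fraction for pop, fraction in cells)


# Three made-up 1-km cells: fully, half, and a quarter inside the boundary.
print(area_weighted_population([(200, 1.0), (100, 0.5), (80, 0.25)]))
```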
Once city-level projections were available from both datasets, we compared them based on their error in predicting the 2020 city population and generated a weighted population projection for all five scenarios by combining both datasets. Since we used the same weights for all five scenarios, the equations carry no scenario subscript. The following section shows the weighting process.
Weighting the datasets
For cities that are underprojected by one dataset and overprojected by the other:
$$\varDelta {p}_{{i}_{k}}=\left|{P}_{{{20}_{i}}_{census}}-{P}_{{{20}_{i}}_{k}}\right|$$
10
$${w}_{{i}_{k}}= \frac{1}{\varDelta {p}_{{i}_{k}}}$$
11
$${W}_{{i}_{k}}= \frac{{w}_{{i}_{k}}}{\sum _{k}^{n}{w}_{{i}_{k}} }$$
12
For cities that are either underprojected or overprojected by both datasets:
$${r}_{{p}_{{i}_{k}}}= \left|\frac{{P}_{{20}_{{i}_{k}}}}{{P}_{{{20}_{i}}_{census}}}\right|$$
13
$${w}_{i{}_{k}}= \frac{1}{ {r}_{{p}_{{i}_{k}}} }$$
14
$${W}_{{i}_{k}}= \frac{\sum _{k}^{n}{w}_{{i}_{k}} }{n}$$
15
where \(k \in \{\text{Hauer's data}, \text{NCAR data}\}\) and \(i \in C\), with \(C\) the set of cities in the U.S.
$${P}_{{w}_{i}}= \sum _{k}^{n}{W}_{{i}_{k}}\bullet {P}_{{20}_{{i}_{k}}}$$
16
where \(\varDelta {p}_{{i}_{k}}\) is the difference between the actual and forecasted population, \({w}_{{i}_{k}}\) are the weights, \({W}_{{i}_{k}}\) are the weights after normalization, and \({P}_{{{20}_{i}}_{census}}\), \({P}_{{{20}_{i}}_{k}}\), and \({P}_{{w}_{i}}\) refer to the census population in 2020, the forecasted population from dataset \(k\), and the weighted population for city \(i\), respectively.
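The inverse-error weighting of Eqs. 10 to 12, combined through Eq. 16, can be sketched for one city as follows. The sketch covers only the case where one dataset underprojects and the other overprojects; the same-side case (Eqs. 13 to 15) is not implemented here. The function name and dictionary keys are hypothetical, and the sketch assumes that neither projection matches the census count exactly.

```python
def weighted_projection(census_2020, projections):
    """Combine 2020 projections with inverse-absolute-error weights (Eqs. 10-12, 16).

    projections maps a dataset name to its projected 2020 population.
    """
    errors = {k: p - census_2020 for k, p in projections.items()}
    if not (min(errors.values()) < 0 < max(errors.values())):
        raise ValueError("both datasets err on the same side; see Eqs. 13-15")
    raw = {k: 1 / abs(e) for k, e in errors.items()}           # Eqs. 10 and 11
    total = sum(raw.values())
    weights = {k: w / total for k, w in raw.items()}           # Eq. 12
    return sum(weights[k] * projections[k] for k in projections)  # Eq. 16


# Made-up city: census 1,000; one dataset projects 900, the other 1,050.
print(weighted_projection(1000, {"hauer": 900, "ncar": 1050}))
```

Because the dataset with the smaller error receives the larger weight, the combined 2020 value in this example lands on the census count.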
Estimating the impact of international immigration
Immigrants constitute a large share of the U.S. population. According to the U.S. Census Bureau, 13% of the U.S. population in 2021 was foreign born [40]. To understand the impact of international immigration, we examined changes in racial composition to find cities that may have been gaining more immigrants than other cities within the same metropolitan statistical area (MSA). The Migration Policy Institute provides a database containing the percentage of the population in each MSA that are immigrants for the 2017–2021 period [41]. Since the immigration data refer to people staying in the MSA from 2017 to 2021, we inspected ACS 5-year estimates of race and ethnicity from 2010 to 2020 for all cities. For each racial group, we used the percent change from the 2010–2016 average to the 2017–2020 average to identify the cities that gained more immigrants compared to the previous period of analysis.
Although a study on the settlement of immigrants in the U.S. previously found that immigrants settle in the central areas of MSAs and are likely to be geographically concentrated based on country of origin and language spoken [42], more recent studies find that high-income immigrants are more likely to settle in suburban cities with access to good public schools [43], [44]. For Asian immigrants, the share living in suburban cities exceeded the share living in city centers [45]. Another study on the settlement patterns of Chinese immigrants in New York shows that Chinese Americans relocated from Manhattan to outer boroughs such as Queens and Brooklyn due to lower housing costs [46]. For Mexican immigrants, the settlement areas shifted from MSAs to nonmetropolitan, small cities where manufacturing or service economies were developing [47]. Considering that 27% of the immigrants in the U.S. are Asian immigrants and 44% are of Hispanic or Latino origin, we focused on these two main groups to identify cities that are likely to gain population from international immigration [40].
To find the relation between the percentage of immigrants in an MSA and the percentage change in different racial groups for cities inside that MSA, we plotted the percentage change in all racial groups along with the percentage of immigrants in their respective MSA (Figure SI 5). The plot shows that when the percentage of immigrants in an MSA is high, its member cities show a positive change in Hispanic and Asian population from 2016 to 2020. Therefore, we assume that these cities are more likely to gain population from international immigration. However, if the Hispanic or Asian population constitutes a small share of the city’s total population, the increase will not be high enough to override the loss from depopulation. Therefore, we incorporated a threshold of 10% to identify cities that are more likely to gain population from immigration. This 10% threshold is derived from Oswalt et al. [38], who defined a 10% loss of total population as population decline. Therefore, if the increase in immigrant population exceeds this 10% threshold, the added population from immigration will nullify the loss. Here are the steps of calculation:
$$\varDelta {p}_{r}={p}_{{r}_{{m}_{1}}}-{p}_{{r}_{{m}_{2}}}$$
17
$${p}_{i}= \begin{cases}+ & \text{if } \varDelta {p}_{r}\bullet {P}_{I}>10\\ \pm & \text{if } \varDelta {p}_{r}\bullet {P}_{I}\le 10\end{cases}$$
18
where \(\varDelta {p}_{r}\) refers to the percent change in city population of racial group \(r\), and \({p}_{{r}_{{m}_{1}}}\) and \({p}_{{r}_{{m}_{2}}}\) refer to the average population of racial group \(r\), as a percent of total city population, at time intervals 1 and 2, respectively. \({p}_{i}\) refers to the change in immigrant population for a city and \({P}_{I}\) is the percentage of immigrants inside an MSA. \({p}_{{r}_{{m}_{1}}}\) and \({p}_{{r}_{{m}_{2}}}\) are derived from ACS 5-year estimates of race and ethnicity, and \({P}_{I}\) is taken from the Migration Policy Institute. In Eq. 18, ‘+’ refers to a gain in population that can override the loss in population for that city, whereas ‘±’ indicates that there can be a gain or loss in population, but the value is not high enough to override the loss in depopulating cities.
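Eqs. 17 and 18 can be sketched as a single labeling function. The function and argument names are hypothetical, the operand order follows Eq. 17 as printed (interval 1 minus interval 2), and all inputs are assumed to be expressed in percent; the example values are made up.

```python
def immigration_label(share_m1, share_m2, msa_immigrant_pct, threshold=10):
    """Return '+' when the racial-group change scaled by MSA immigrant share exceeds the threshold."""
    delta = share_m1 - share_m2                       # Eq. 17, operand order as printed
    return "+" if delta * msa_immigrant_pct > threshold else "±"   # Eq. 18


# Made-up values: a 0.5-point difference in an MSA with 30% immigrants.
print(immigration_label(12.0, 11.5, 30.0))
```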
Classifying cities in an urban-rural continuum to define the degree of urbanization
We classified cities based on the degree of urbanization using four variables:
Specifically, we classified cities as urban if the conditions defined by [48] are met. To differentiate between suburban and periurban, we introduced a criterion based on mean commute time. Cities that belong to an urbanized area are classified as suburban if the mean commute time for the city is less than the mean commute time of the MSA to which the city belongs. Cities with commute times greater than mean MSA commute time were classified as periurban. The remaining cities were classified as rural.
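The decision sequence described above can be sketched as follows. The urban conditions of [48] are not reproduced here and enter only as a precomputed Boolean, and the function and argument names are hypothetical. The text does not say how a city whose commute time equals the MSA mean is treated; this sketch assigns it to periurban as an assumption.

```python
def classify_city(meets_urban_conditions, in_urbanized_area,
                  city_mean_commute, msa_mean_commute):
    """Assign a city to urban, suburban, periurban, or rural."""
    if meets_urban_conditions:
        return "urban"
    if in_urbanized_area:
        # Shorter-than-MSA commutes mark suburban cities; ties fall to periurban (assumption).
        if city_mean_commute < msa_mean_commute:
            return "suburban"
        return "periurban"
    return "rural"


# Made-up city inside an urbanized area with a 24-minute mean commute (MSA mean: 28).
print(classify_city(False, True, 24.0, 28.0))
```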
Joining core-based statistical area (CBSA) and Urbanized areas (UA)
To find the allocation factor of each city in the core-based statistical areas (CBSAs) and urbanized areas (UAs), we used the Geocorr 2022: Geographic Correspondence Engine application by the Missouri Census Data Center (MCDC), weighted by population [49]. CBSAs comprise the Metropolitan and Micropolitan areas in the U.S. To find housing density in the cities, Geocorr 2022 was applied to convert census tracts to cities, weighted by the number of housing units. Next, housing unit data from the ACS 5-year estimates for 2020 were used to calculate the housing-unit-weighted housing density for the cities.
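One plausible reading of the housing-unit-weighted density is a weighted mean of tract densities, where each tract is weighted by the housing units it allocates to the city. The sketch below follows that reading as an assumption; the function name and the tuple layout are hypothetical, and the allocation factors stand in for the tract-to-city factors of a correspondence file.

```python
def weighted_housing_density(tracts):
    """Housing-unit-weighted mean of tract housing densities for one city.

    tracts is an iterable of (housing_units, allocation_factor, tract_density).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for units, allocation_factor, density in tracts:
        weight = units * allocation_factor   # housing units the tract allocates to the city
        weighted_sum += weight * density
        total_weight += weight
    return weighted_sum / total_weight


# Two made-up tracts: one fully inside the city, one half inside.
print(weighted_housing_density([(1000, 1.0, 500.0), (2000, 0.5, 200.0)]))
```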
Table 1
Density variation for cities in the U.S. for different population trends
Population density per km²

| Current trend | mean | std | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- |
| Severely depopulating | 95 | 218 | 9 | 32 | 98 | 2,534 |
| Moderately depopulating | 267 | 445 | 69 | 157 | 304 | 11,020 |
| Slowly depopulating | 489 | 695 | 118 | 304 | 589 | 29,930 |
| Fluctuating | 337 | 438 | 83 | 206 | 416 | 4,945 |
| Slowly increasing | 801 | 1,022 | 246 | 491 | 1,000 | 22,572 |
| Moderately increasing | 542 | 712 | 159 | 350 | 662 | 16,455 |
| Highly increasing | 357 | 556 | 61 | 180 | 435 | 7,338 |