3.2 Data
We gathered data from multiple sources. Bike-sharing data were obtained from the City of Chicago (Chicago Data Portal, 2020b). The Chicago Department of Transportation launched its station-based bike-sharing system (Divvy) in June 2013. After several expansions, the Divvy system comprised 654 stations in 367 census tracts by 2019. We focus on Divvy trips made during weekday peak hours (7:00 a.m. to 9:59 a.m.), as defined by the Chicago Metropolitan Agency for Planning (CMAP), over the 12 months of 2019, before COVID-19. Divvy trip data are provided at the station level, and we aggregated them by origin census tract.
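The peak-hour filter and tract-level aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: the record fields (`start_time`, `from_station_id`), the `station_to_tract` lookup, and all sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical station-to-tract lookup (in practice this would come from a GIS overlay).
station_to_tract = {101: "tract_A", 102: "tract_A", 205: "tract_B"}

# Hypothetical trip records; the field names are assumptions, not the actual Divvy schema.
trips = [
    {"start_time": "2019-03-04 07:15:00", "from_station_id": 101},  # Monday, peak
    {"start_time": "2019-03-04 09:59:30", "from_station_id": 102},  # Monday, peak
    {"start_time": "2019-03-04 10:00:00", "from_station_id": 101},  # Monday, off-peak
    {"start_time": "2019-03-09 08:00:00", "from_station_id": 205},  # Saturday
    {"start_time": "2019-03-05 08:30:00", "from_station_id": 205},  # Tuesday, peak
]


def is_weekday_peak(ts: datetime) -> bool:
    """True for Monday-Friday trip starts between 7:00 a.m. and 9:59 a.m."""
    return ts.weekday() < 5 and 7 <= ts.hour <= 9


def origin_trips_by_tract(trips, station_to_tract):
    """Count weekday peak-hour trips by the census tract of the origin station."""
    counts = Counter()
    for trip in trips:
        ts = datetime.strptime(trip["start_time"], "%Y-%m-%d %H:%M:%S")
        if is_weekday_peak(ts):
            counts[station_to_tract[trip["from_station_id"]]] += 1
    return counts


tract_counts = origin_trips_by_tract(trips, station_to_tract)
```

Because `Counter` returns zero for missing keys, tracts whose stations originate no peak-hour trips naturally receive a count of zero.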
Figure 2 shows the number of Divvy stations by census tract. Of the 801 census tracts, 367 have Divvy stations. Tracts in and near downtown Chicago tend to have more Divvy stations. Accordingly, Divvy trips concentrate near downtown and are scarce in remote areas. Figure 2 also shows that Divvy trips departed from 590 stations in 313 of the census tracts with Divvy stations, located in the central and northern parts of the city, whereas the remaining 54 (= 367 − 313) census tracts with Divvy stations, mostly on the South Side of Chicago, did not originate any trips. In the following analysis, trip data from the 367 tracts are used to examine the association between neighborhood characteristics and bike-sharing usage.
We collected census-tract-level socioeconomic, employment, transportation service, and built environment data from five sources. (1) The socioeconomic data come from the 2013–2018 5-year average American Community Survey (ACS) (U.S. Bureau of the Census, 2018). (2) Employment data come from the 2017 LEHD Origin-Destination Employment Statistics (LODES) (U.S. Bureau of the Census, 2020). (3) Transportation service data, i.e., the number of bus stops and train stations, are calculated from data on the City of Chicago data portal. (4) The built environment data come from the 2013 Smart Location Database (U.S. Environmental Protection Agency, 2013). (5) We calculated the straight-line distance between downtown (the Loop) and each census tract in GIS (Chicago Data Portal, 2020a).
Table 1 gives descriptive statistics for all tracts in the city of Chicago (801 tracts), tracts with Divvy stations (367 tracts), and tracts without Divvy stations (434 tracts). Among tracts with Divvy stations, the number of stations ranges from 1 to 17. The last column shows that, compared with tracts without Divvy stations, tracts with Divvy stations tend to have a larger population, more zero-vehicle households, a higher share of residents with a bachelor's degree or higher, and a lower median household income. Total employment, employment density, rail density, bike lane density, and road density are also higher in the 367 census tracts with Divvy stations.
Table 1
| Variable | City of Chicago (N = 801): Min. | City of Chicago (N = 801): Max. | City of Chicago (N = 801): Mean | w/ Divvy stations (N = 367): Mean | w/o Divvy stations (N = 434): Mean | Diff. between w/ and w/o Divvy stations |
|---|---|---|---|---|---|---|
| Bike-sharing data | | | | | | |
| Origin trips (peak hours) for 2019 | 0 | 46,951 | 1,884 | 1,884 | | |
| Stations | 1 | 17 | 2 | 2 | | |
| Docks | 6 | 467 | 30 | 30 | | |
| Socioeconomic characteristics | | | | | | |
| Population | 0 | 19,889 | 3,422 | 3,520 | 3,424 | 96* |
| # of zero-vehicle households | 0 | 3,473 | 354 | 460 | 264 | 196*** |
| Median household income ($) | 9,787 | 178,750 | 57,298 | 52,152 | 52,618 | -466*** |
| % ppl 25 years and over with a bachelor's degree or higher | 0.5 | 0.94 | 0.36 | 0.45 | 0.27 | 0.18*** |
| Employment | | | | | | |
| Total employment | 2 | 331,288 | 1,734 | 2,952 | 703 | 2,249*** |
| Employment density (per km²) | 6 | 279,308 | 2,908 | 4,631 | 1,448 | 3,183*** |
| Transportation services and infrastructure | | | | | | |
| Rail density (stations/km²) | 0 | 22.7 | 0.62 | 0.99 | 0.31 | 0.68*** |
| Bus density (stations/km²) | 0 | 723 | 33.8 | 35.8 | 32.2 | 3.0 |
| Bike lane density (km/km²) | 0 | 11.9 | 1.7 | 2.3 | 1.2 | 1.1*** |
| Built environment | | | | | | |
| Road density (km/km²) | 4.5 | 64.1 | 24.0 | 24.9 | 22.2 | 2.7*** |
| Distance to downtown (km) | 0 | 36.1 | 14.6 | 12.6 | 16.2 | -1.6*** |

(*** significance level < 0.01 based on two-sample t-tests; ** 0.01 < significance level < 0.05; * 0.05 < significance level < 0.1)
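The difference-in-means column relies on two-sample t-tests. The sketch below shows how such a test statistic can be computed, assuming the unequal-variance (Welch) form; the source does not state which variant was used, and the sample values are hypothetical. Only the statistic and the approximate degrees of freedom are computed here, since the p-value requires a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (difference in means, Welch t statistic, approximate degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb                          # squared standard error of the difference
    diff = mean(sample_a) - mean(sample_b)
    t_stat = diff / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return diff, t_stat, dof


# Hypothetical bike lane densities (km/km²) for a few tracts in each group.
with_divvy = [2.1, 2.5, 1.9, 2.8, 2.2]
without_divvy = [1.0, 1.4, 1.1, 1.3, 1.2]

diff, t_stat, dof = welch_t(with_divvy, without_divvy)
```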
3.3 Methodology
We present our methodology in a two-step conceptual diagram (Fig. 3). This section explains the two major steps and their sub-steps in detail, and how each step leads to the final goal: predicting bike-sharing usage under an equitable supply of bike-sharing services.
Both major steps account for spatial autocorrelation. We construct distance-based spatial weights matrices using GeoDa (Anselin et al., 2006). Note that there are two matrices: one for the 367 census tracts with Divvy bike-sharing stations, used with the actual bike-sharing data, and the other for all 801 census tracts in Chicago.
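To make the idea of a distance-based spatial matrix concrete, the following sketch builds a row-standardized distance-band matrix from tract centroids and uses it to compute a spatial lag. It is an illustration only: the distance-band form, the threshold, the row standardization, and the centroid coordinates are all assumptions, since the text does not specify how the matrix was parameterized in GeoDa.

```python
from math import hypot


def distance_band_weights(centroids, threshold):
    """Row-standardized weights: tracts within `threshold` of each other are neighbors."""
    n = len(centroids)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centroids):
        neighbors = [
            j for j, (xj, yj) in enumerate(centroids)
            if j != i and hypot(xi - xj, yi - yj) <= threshold
        ]
        for j in neighbors:
            W[i][j] = 1.0 / len(neighbors)  # each row with neighbors sums to 1
    return W


def spatial_lag(W, values):
    """Weighted average of neighboring values for each tract (the product W·y)."""
    return [sum(w * v for w, v in zip(row, values)) for row in W]


# Hypothetical centroids (km) and station counts for four tracts.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
stations = [4, 2, 6, 1]

W = distance_band_weights(centroids, threshold=1.5)
lagged_stations = spatial_lag(W, stations)
```

In this sketch the fourth tract has no neighbor within the threshold, so its row of weights is all zeros and its spatial lag is 0.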
The first step is to predict bike-sharing station locations across the 801 census tracts in the whole city of Chicago, based on neighborhood characteristics. We investigate two spatial models: spatial lag and spatial error. The spatial lag model assumes that the number of Divvy stations in one tract, i.e., the dependent variable, is influenced by the number of Divvy stations in its neighboring tracts, whereas the spatial error model assumes that the error terms of different census tracts are correlated when predicting the number of Divvy stations. We use the open-source software GeoDa (Anselin et al., 2006) to conduct spatial modeling and then predict the number of Divvy stations for the whole study area, including the census tracts that currently have no Divvy stations.
The spatial lag model is expressed as:
$$\mathrm{Station}_i = f\left(SE_i,\ EMP_i,\ T_i,\ BE_i,\ W_{i-1,\,\mathrm{station}}\right) \tag{1}$$
The spatial error model is expressed as:
$$\mathrm{Station}_i = f\left(SE_i,\ EMP_i,\ T_i,\ BE_i,\ W_{i-1,\,\varepsilon}\right) \tag{2}$$
Where
$\mathrm{Station}_i$ is the number of Divvy stations in census tract $i$ ($i \in$ 801 tracts).
$SE_i$ denotes the socioeconomic characteristics of census tract $i$. Socioeconomic variables include the total population, household income, the number of zero-vehicle households, and educational attainment.
$EMP_i$ represents the total employment and employment by income level of census tract $i$.
$T_i$ is transportation services and infrastructure, including bike lanes, the number of transit stations, and rail density in census tract $i$.
$BE_i$ denotes built environment characteristics of census tract $i$, such as intersection density and distance to downtown.
$W_i$ is the spatial matrix of the 801 census tracts.
$W_{i-1,\,\mathrm{station}}$ and $W_{i-1,\,\varepsilon}$ are the lag term and error term of the number of Divvy stations, respectively.
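For reference, if $f$ is taken to be linear (an assumption; the text leaves the functional form general), Eqs. (1) and (2) correspond to the standard matrix forms of the spatial lag and spatial error models. The symbols below are introduced only for illustration and do not appear in the original: $y$ stacks the station counts of the 801 tracts, $X$ collects the SE, EMP, T, and BE variables, $W$ is the spatial weights matrix, and $\rho$ and $\lambda$ are the spatial autoregressive parameters.

```latex
\begin{align}
\text{Spatial lag:}   \quad & y = \rho W y + X\beta + \varepsilon \\
\text{Spatial error:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{align}
```

In this notation, $W y$ plays the role of the lag term in Eq. (1), and $W u$ plays the role of the spatially correlated error term in Eq. (2).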
Results of this step will show the over- and under-supply of bike-sharing stations based on the neighborhood characteristics. The predicted number of bike-sharing stations is used in Step 2 to predict potential bike-sharing usage.
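One simple way to read over- and under-supply from the Step 1 results is to compare the actual station count in each tract with the model's prediction. The sketch below assumes that comparison; the tolerance band, the labels, and the tract values are hypothetical and are not taken from the study.

```python
def classify_supply(actual, predicted, tolerance=0.5):
    """Label each tract by the gap between actual and predicted station counts."""
    labels = {}
    for tract_id, expected in predicted.items():
        gap = actual.get(tract_id, 0) - expected  # tracts without stations count as 0
        if gap > tolerance:
            labels[tract_id] = "over-supplied"
        elif gap < -tolerance:
            labels[tract_id] = "under-supplied"
        else:
            labels[tract_id] = "balanced"
    return labels


# Hypothetical actual station counts and Step 1 predictions for three tracts.
actual = {"tract_A": 5, "tract_C": 2}
predicted = {"tract_A": 3.2, "tract_B": 1.8, "tract_C": 2.1}

supply_labels = classify_supply(actual, predicted)
```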
To predict potential bike-sharing usage, we first specify models based on actual Divvy trip data in Step 2.1. This step includes two sub-steps (Steps 2.1.1 and 2.1.2), and the resulting models are used subsequently to predict potential bike-sharing usage.
The reason for the two sub-steps is that the Divvy system covers 367 census tracts, of which 313 have actual bike-sharing trips and 54 do not. We tried a log-linear model to predict the number of trips based on the 367 observations, but the model yielded many negative usage estimates because of its negative intercept. Therefore, we adopt two sub-steps to address the problem of negative estimates.
The first sub-step is a binomial logit model to estimate the association between neighborhood characteristics and the generation of Divvy trips (having Divvy trips or not) in the 367 census tracts.
$$\text{Non-ZeroUsage}_k = f\left(SE_k,\ EMP_k,\ T_k,\ BE_k\right) \tag{3}$$
Where
$\text{Non-ZeroUsage}_k$ indicates whether census tract $k$ generates Divvy trips ($k \in$ 367 tracts with Divvy stations). ($\text{Non-ZeroUsage}_k = 1$ when tract $k$ has actual Divvy trips; otherwise, $\text{Non-ZeroUsage}_k = 0$.)
$SE_k$ denotes the socioeconomic characteristics of census tract $k$.
$EMP_k$ represents the total employment and employment by income level of census tract $k$.
$T_k$ is transportation services and infrastructure in census tract $k$.
$BE_k$ denotes built environment characteristics of census tract $k$.
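Written out, a binomial logit of the form in Eq. (3) models the probability that tract $k$ generates trips. The expression below assumes a linear index in the four variable groups; the coefficient symbols ($\beta_0$, $\beta_{SE}$, and so on) are illustrative and are not taken from the original.

```latex
\begin{equation*}
P\left(\text{Non-ZeroUsage}_k = 1\right)
  = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \beta_{SE}^{\top} SE_k + \beta_{EMP}^{\top} EMP_k
      + \beta_{T}^{\top} T_k + \beta_{BE}^{\top} BE_k\right)\right]}
\end{equation*}
```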
The specified binomial model is then used in the “predict trip generation” step (Step 2.2.1) to estimate whether the proposed bike-sharing service supply (the result of Step 1) can generate bike-sharing trips. Only tracts predicted to have non-zero bike-sharing trips are entered into the final transformed spatial model to predict bike-sharing usage (Step 2.2.2).
The second sub-step in Step 2.1 uses the 313 census tracts with actual Divvy trips to specify a transformed spatial model that predicts non-zero Divvy usage from neighborhood characteristics. The transformed spatial model accounts for spatial autocorrelation while still being able to predict the dependent variable, i.e., bike-sharing usage, for census tracts that have no observed data for that variable.
The transformed model is developed based on the conventional spatial lag model (Lan et al., 2019). Bike-sharing usage of census tract $j$ ($\mathrm{Usage}_j$) can be predicted in the following way:
$$\mathrm{Usage}_j = f\left(SE_j,\ EMP_j,\ T_j,\ BE_j,\ W_{j-1,\,\mathrm{usage}}\right) \tag{4}$$
$$\mathrm{Usage}_{j-1} = f\left(SE_{j-1},\ EMP_{j-1},\ T_{j-1},\ BE_{j-1},\ W_{j-2,\,\mathrm{usage}}\right) \tag{5}$$
Therefore,
$$\mathrm{Usage}_j = f\left(SE_j,\ EMP_j,\ T_j,\ BE_j,\ SE_{j-1},\ EMP_{j-1},\ T_{j-1},\ BE_{j-1},\ W_{j-2,\,\mathrm{usage}}\right) \tag{6}$$
We treat $W_{j-2,\,\mathrm{usage}}$ in Eq. (6) as a random error, as we expect an insignificant association between $\mathrm{Usage}_j$ and $\mathrm{Usage}_{j-2}$. Eq. (7) is the final transformed spatial lag model to predict bike-sharing usage. The equation enables us to estimate the coefficients of each variable and test the robustness of the model before doing the usage prediction.
$$\mathrm{Usage}_j = f\left(SE_j,\ EMP_j,\ T_j,\ BE_j,\ SE_{j-1},\ EMP_{j-1},\ T_{j-1},\ BE_{j-1},\ \mathrm{Station}_j,\ \mathrm{Station}_{j-1}\right) \tag{7}$$
Where
$SE_j$, $EMP_j$, $T_j$, and $BE_j$ represent neighborhood characteristics in census tract $j$.
$SE_{j-1}$, $EMP_{j-1}$, $T_{j-1}$, and $BE_{j-1}$ represent the spatial lags of respective neighborhood characteristics in census tract $j-1$.
$\mathrm{Station}_j$ is the number of actual Divvy stations in census tract $j$.
$\mathrm{Station}_{j-1}$ is the spatial lag of actual Divvy stations in census tract $j-1$.
Note that Eq. (7) can be difficult to estimate because it includes many spatial lags of the independent variables. Therefore, neighborhood characteristics need to be carefully selected (see the Appendix for details).
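The substitution behind Eqs. (4) to (7) can be written out in matrix form if $f$ is assumed to be linear. This is a sketch under that assumption, with symbols introduced only for illustration: $y$ stacks usage in the 313 tracts, $X$ collects the neighborhood characteristics (including the number of stations), $W$ is the spatial weights matrix, and $\rho$ is the spatial autoregressive parameter.

```latex
\begin{align*}
y   &= \rho W y + X\beta + \varepsilon
      && \text{(linear analogue of Eq. 4)} \\
W y &= \rho W^{2} y + W X\beta + W\varepsilon
      && \text{(the same model applied to the neighbors, Eq. 5)} \\
y   &= X\beta + \rho\, W X\beta + \rho^{2} W^{2} y + \left(\varepsilon + \rho W \varepsilon\right)
      && \text{(substituting, Eq. 6)} \\
y   &\approx X\beta + W X\theta + e
      && \text{(absorbing } \rho^{2} W^{2} y \text{ into the error, Eq. 7)}
\end{align*}
```

The final line contains only the covariates and their spatial lags, so usage can be predicted for tracts where the dependent variable is not observed. Treating $\theta$ as a freely estimated coefficient vector rather than imposing $\theta = \rho\beta$ is an assumption of this sketch.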
Once the models based on actual Divvy data are specified, we conduct Step 2.2, which first predicts whether the proposed number of bike-sharing stations can generate bike-sharing trips and then predicts how many trips can be generated. In this way, potential bike-sharing usage under objectively planned bike-sharing stations can be estimated.
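The two-part prediction in Step 2.2 can be sketched as follows: a logit screen decides whether a tract is expected to generate trips, and only tracts that pass are given a usage prediction. Everything specific here is an assumption for illustration: the 0.5 probability cutoff, the linear usage equation, the variable names, and all coefficient and feature values are hypothetical rather than estimates from the study.

```python
from math import exp


def linear_index(coefs, features):
    """Intercept plus the weighted sum of the features that the model uses."""
    return coefs["intercept"] + sum(
        coefs[name] * value for name, value in features.items() if name in coefs
    )


def predict_usage(tracts, logit_coefs, usage_coefs, cutoff=0.5):
    """Step 2.2.1: screen tracts with the logit model. Step 2.2.2: predict trips."""
    predictions = {}
    for tract_id, features in tracts.items():
        prob_nonzero = 1.0 / (1.0 + exp(-linear_index(logit_coefs, features)))
        if prob_nonzero >= cutoff:
            predictions[tract_id] = linear_index(usage_coefs, features)
        else:
            predictions[tract_id] = 0.0  # predicted to generate no trips
    return predictions


# Hypothetical tract features; "stations" stands for the Step 1 predicted station
# count and "lag_stations" for its spatial lag.
tracts = {
    "tract_A": {"stations": 3.0, "employment": 2.0, "lag_stations": 2.5},
    "tract_B": {"stations": 0.2, "employment": 0.1, "lag_stations": 0.3},
}
logit_coefs = {"intercept": -2.0, "stations": 1.0, "employment": 0.5}
usage_coefs = {"intercept": 100.0, "stations": 400.0, "employment": 150.0, "lag_stations": 80.0}

predicted_usage = predict_usage(tracts, logit_coefs, usage_coefs)
```

Screening first means the usage model is applied only to tracts that resemble those it was estimated on, which is the stated reason for splitting the prediction into two sub-steps.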