Planning for Bike-sharing System: Predicting Potential Usage with Spatial Regression Models

doi:10.21203/rs.3.rs-2010850/v1

The rapid growth of bike-sharing usage has spurred a large amount of empirical research. However, much research focuses on existing bike-sharing services, without considering the gaps between revealed and potential demand, while some potential demand cannot be met without a supply of bike-sharing facilities. To address this gap, this research develops a two-step approach: the first step proposes an equitable supply of bike-sharing stations based on neighborhood characteristics, and the second step predicts potential bike-sharing usage with the proposed supply scenario. Using data from a station-based bike-sharing system in the city of Chicago, we specify and evaluate the new methodological approach with transformed spatial regression models. Results identify neighborhoods that have potential demand but are under-served. Our approach offers a tool for planning an equitable supply of bike-sharing services and promoting wide adoption of bike-sharing across diverse neighborhoods.

Keywords: micro-mobility, spatial error model, spatial lag model, travel demand, forecast

Bike-sharing has grown rapidly in the U.S., particularly in large cities (NACTO Bike Share Initiative, 2018) and during the COVID-19 pandemic (Hu et al., 2021; Wang & Noland, 2021). Associated with the growth, research on bike-sharing services has boomed, and most of the existing research investigates service designs, user profiles, and determinants of usage. As a result, research tends to focus on existing bike-sharing services and their utilization. However, it is unclear whether the existing bike-sharing services can meet potential demand. Specifically, in places where bike-sharing services are scarce or unavailable, results based on the revealed demand would under-estimate potential demand, possibly leading to further under-investment. Objectively estimating potential demand with equitable service provision can inform service providers and policy-makers as well as facilitate the broad adoption of bike-sharing among diverse users.

Therefore, this research aims to predict potential demand, i.e., bike-sharing usage, based on a hypothetical scenario in which the location and number of bike-sharing stations are decided based on neighborhood-level socioeconomic, spatial, and transportation features. The research contribution is methodological. The analysis framework is composed of two major steps: the first step proposes an equitable supply of bike-sharing services, and the second step predicts potential demand based on the proposed service scenario. Transformed spatial regression models are adopted to tackle spatial autocorrelation of bike-sharing services and usage in places with and without actual bike-sharing stations and usage data.

The analysis results have policy implications. The first-step results can provide a relatively objective and equitable benchmark to decide bike-sharing service levels in different neighborhoods, and results from the two steps of analysis can inform gaps between potential demand for bike-sharing services and actual level of bike-sharing services, therefore drawing policy attention to places where the gaps are large.

The paper is organized as follows. Section 2 reviews literature on bike-sharing utilization and summarizes common factors that affect the supply of services and actual usage, as well as methodological complexities that should be tackled when predicting potential bike-sharing stations and usage. The next section introduces the study area, the city of Chicago, and its station-based bike-sharing system Divvy. It also elaborates the two-step framework to predict bike-sharing stations and usage across the study area, using spatial regression models. Section 4 reports results, and the last section provides discussion and contributions.

2.1 Planning Bike-sharing Services and Predicting Usage

Research has quantified the supply of bike-sharing services. The decisions of bike-sharing station locations often consider potential demand (Frade & Ribeiro, 2015; Ursaki & Aultman-Hall, 2015). Commonly, places with high population density (Ursaki & Aultman-Hall, 2015) and proximate to transit stations and commercial properties (Cheng et al., 2020; Ursaki & Aultman-Hall, 2015) tend to have more stations. Meanwhile, marginalized communities tend to have fewer bike-sharing stations (Hosford & Winters, 2018; Smith et al., 2015), and the reason could be associated with perceived low demand since bike-sharing usage can be unaffordable for low-income people (Hoe & Kaloustian, 2014). Nevertheless, the lack of bike-sharing services can further discourage bike-sharing usage without convenient checking-out and returning locations (National Association of City Transportation Officials, 2015).

It is imperative to address the spatial disparity in the supply of bike-sharing services since the disparity can enlarge the gaps in bike-sharing usage (Fishman et al., 2013; Lee et al., 2017) and may subsequently discourage investment in such services for some neighborhoods, further perpetuating environmental injustice which has already hampered transport mobility for marginalized neighborhoods (Hosford & Winters, 2018; Médard de Chardon, 2019). Equitable supply of bike-sharing services is particularly important during public health crises such as COVID-19. Private transportation modes (e.g., cars and bike-sharing) are perceived to be relatively safe during the pandemic due to lower levels of infection risks relative to public transit (Wee & Witlox, 2021), and much research has observed a dramatic increase in bike-sharing trips in U.S. cities during the COVID-19 pandemic (Hu et al., 2021; Wang & Noland, 2021). Under such circumstances, without bike-sharing services, people, particularly those who do not have cars, need to take public transportation and thus are more likely to be exposed to the virus. Therefore, developing an objective method to provide bike-sharing services in an equitable way is an important first step. This research uses neighborhood-level characteristics to propose the locations and number of bike-sharing stations.

Much research has investigated the factors that explain bike-sharing usage. Common determinants of usage include transportation services, spatial, and socioeconomic characteristics. The most consistent determinants of bike-sharing usage are infrastructure, including bicycle-sharing station locations/capacity as well as bicycle lanes (Conrow et al., 2018; Faghih-Imani et al., 2014; Mateo-Babiano et al., 2016; Noland et al., 2016). Naturally, the availability of bike-sharing services decides bike-sharing usage: no service means no usage. Additional important transportation and spatial features include multimodal transportation connections, especially the availability of bus and subway services (Noland et al., 2016; Sun et al., 2018) and proximity to the city center or other important destinations (Guidon et al., 2020; Scott & Ciuro, 2019). Other neighborhood-level built environment characteristics can play a role in affecting bike-sharing usage, but their effects are uncertain. Concentrations of population and/or employment are found to encourage the usage in some studies (e.g., Conrow et al., 2018; Faghih-Imani et al., 2014) but not in others (e.g., Scott & Ciuro, 2019). Similarly, land use has mixed effects on bike-sharing trip usage. For example, the percentage of commercial land area is negatively correlated to bike-sharing trip generation (Guidon et al., 2020; Sun et al., 2018), but intersection density shows a positive association with bike-sharing usage (Wang et al., 2018).

Socioeconomic characteristics of riders affect their bike-sharing usage. Bike-sharing users tend to be male (Raux et al., 2017), highly educated (Fishman et al., 2013), young adults (Fishman et al., 2013), with high income (Raux et al., 2017), without a household vehicle (Orvin & Fatmi, 2021), white (Fishman et al., 2013), and living in urban areas or places with a high level of land-use mix (Fishman et al., 2013; Shen et al., 2018). Such socioeconomic characteristics at aggregated spatial levels similarly affect bike-sharing usages: neighborhoods with a large white population or higher household income generate more bike-sharing trips (Wang et al., 2016). However, it is unclear whether the reason for the correlation is that neighborhoods of certain socioeconomic characteristics generate more bike-sharing trips or that good bike-sharing services are provided in these neighborhoods.

2.2 Methodological Considerations

A complexity in predicting potential bike-sharing usage is the endogeneity between potential usage and the provision of bike-sharing facilities. Commonly, research predicts actual or future bike-sharing usage with exogenously determined station locations (El-Assi et al., 2017; Faghih-Imani et al., 2014; Noland et al., 2016; Qian & Jaller, 2020; Wang et al., 2016), although it is argued that location decisions for bike-sharing stations consider potential demand (Frade & Ribeiro, 2015). To address this endogeneity, this research adopts a two-step framework that proposes bike-sharing service supply in the first step and predicts bike-sharing usage with the proposed supply scenario in the second step.

Modeling bike-sharing supply and usage requires some methodological innovations. One particular challenge is spatial autocorrelation. Two common types of spatial autocorrelation—spatial lag and spatial error (Anselin et al., 2006)—can both affect the model estimation. Spatial lag could be significant since bike-sharing stations form a network; a large supply and usage of bike-sharing in one location can affect those in nearby locations. Similarly, spatial error might occur as unobserved spatially-correlated factors, such as social norms and perception of neighborhoods, can affect bike-sharing usage. Models that do not consider such spatial relationships are expected to produce biased estimations.

Spatial regression models are commonly applied to address the spatial autocorrelation of bike-sharing data (Faghih-Imani & Eluru, 2016; Ma et al., 2018). However, conventional spatial regression models cannot be directly applied to conduct forecasting for spatial units without actual data for the dependent variables: in this paper, the number of bike-sharing stations or bike-sharing usage. Therefore, we introduce a transformed spatial model to address the methodological challenge. This method was originally developed by Lan et al. (2019) to study the spatial effect of geotagged tweets on crime but, to our knowledge, has not been applied in transportation research yet.

Another consideration is that bike-sharing data are not normally distributed: a few stations generate a lot of trips. Much research relies on non-linear models that conform to the features of bike-sharing data, and the negative binomial regression is commonly used (Corcoran et al., 2014; Ghaffar et al., 2020; Qian & Jaller, 2020; Schimohr & Scheiner, 2021). Some studies use linear regression models with log transformations (El-Assi et al., 2017; Wang et al., 2016). Other methodological innovations have also been tried. For example, Noland et al. (2016) used a negative binomial conditional autoregressive model to understand bike-sharing utilization in New York. Although the model takes into account both non-linearity and spatial autocorrelation, it can only estimate bike-sharing usage based on bike-sharing stations that were built or planned by service providers. Our research scope, however, includes proposing an equitable bike-sharing service supply in the first step of the analysis.

Therefore, we apply a two-step analysis with conventional and transformed spatial regression models, using log-transformed variables. The approach efficiently serves our research goal.

3.1 Study Area

The study area, the city of Chicago, offers convenient transportation services. Chicago Transit Authority (CTA) operates eight rail lines and nearly a hundred bus routes, establishing an extensive transit network for residents. Metra, the regional (Northeast Illinois) commuter rail system, connects suburban communities with the city of Chicago. Figure 1 shows the study area and its transit network.

The geographic unit of our analysis is the census tract, and the city of Chicago has 801 census tracts. Census tracts are proxies for neighborhoods with a relatively homogenous population of between 2,500 and 8,000 (U.S. Bureau of the Census, 1994, pp. 10–11). Census tracts are also the smallest spatial unit of the different data sources used in this research.

3.2 Data

We gathered data from multiple sources. Bike-sharing data were obtained from the City of Chicago (Chicago Data Portal, 2020b). The Chicago Department of Transportation launched its stationed bike-sharing system (Divvy) in June 2013. After several expansions, the Divvy system consisted of a total of 654 stations within 367 census tracts by 2019. We focus on Divvy trips conducted during weekday peak hours (7:00 a.m. to 9:59 a.m.), defined by Chicago Metropolitan Agency for Planning (CMAP), during the 12-month period in 2019, before COVID-19. Divvy trip data are provided at the station level, and we aggregated the trip data by the origin census tracts.
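The filtering and aggregation described above can be sketched as follows. This is a minimal illustration only: the record layout (ISO start time plus origin station id), the station-to-tract lookup, and all names are hypothetical and are not taken from the Divvy data files.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record layout: (start_time ISO string, origin station id).
# The station-to-tract lookup is also an assumption for illustration.
PEAK_START_HOUR = 7   # 7:00 a.m.
PEAK_END_HOUR = 10    # exclusive bound, so 9:59 a.m. is the last minute kept


def is_weekday_peak(start_time: datetime) -> bool:
    """True for trips starting Monday-Friday between 7:00 and 9:59 a.m."""
    return start_time.weekday() < 5 and PEAK_START_HOUR <= start_time.hour < PEAK_END_HOUR


def peak_trips_by_origin_tract(trips, station_to_tract):
    """Count weekday peak-hour trips per origin census tract."""
    counts = Counter()
    for start_iso, station_id in trips:
        tract = station_to_tract.get(station_id)
        if tract is not None and is_weekday_peak(datetime.fromisoformat(start_iso)):
            counts[tract] += 1
    return counts


trips = [
    ("2019-03-04T08:15:00", "s1"),  # Monday, peak
    ("2019-03-04T12:00:00", "s1"),  # Monday, off-peak
    ("2019-03-09T08:15:00", "s2"),  # Saturday
    ("2019-03-05T09:59:00", "s2"),  # Tuesday, last peak minute
]
station_to_tract = {"s1": "tract_A", "s2": "tract_B"}
tract_counts = peak_trips_by_origin_tract(trips, station_to_tract)
```

Tracts whose stations never appear as a trip origin simply receive no entry, which corresponds to the zero-usage tracts discussed later.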

Figure 2 shows the number of Divvy stations by census tract. Of the 801 census tracts, 367 have Divvy stations. Tracts within and close to downtown Chicago tend to have more Divvy stations. Accordingly, Divvy trips tend to concentrate near downtown but are scarce in remote areas. Figure 2 also shows that Divvy trips departed from 590 stations in 313 census tracts with Divvy stations, which are located in the central and northern parts of the city, whereas the remaining 54 (= 367 − 313) census tracts with Divvy stations, mostly on the south side of Chicago, did not originate any trips. In the following analysis, trip data in the 367 tracts are used to examine the association between neighborhood characteristics and bike-sharing usage.

We collected census-tract level socioeconomic, employment, transportation services, and built environment data from five sources. (1) The socioeconomic data come from the 2013–2018 5-year average American Community Survey (ACS) (U.S. Bureau of the Census, 2018). (2) Employment data come from the 2017 LEHD Origin-Destination Employment Statistics (LODES) (U.S. Bureau of the Census, 2020). (3) Transportation services data, i.e., the number of bus stops and train stations, are calculated based on data from the City of Chicago data portal. (4) The built environment data come from the 2013 Smart Location Database (U.S. Environmental Protection Agency, 2013). (5) We calculated the straight-line distance between downtown (the Loop) and each census tract in GIS (Chicago Data Portal, 2020a).
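As an illustration of item (5), a distance to downtown can be approximated from latitude and longitude with the haversine formula. This is a swapped-in technique for illustration: the text only states that a straight-line distance was computed in GIS, and the coordinates below are approximate placeholders, not values from the paper's data.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative coordinates only (approximate, not taken from the paper's data).
LOOP = (41.88, -87.63)
tract_centroid = (41.95, -87.65)
distance_km = great_circle_km(*LOOP, *tract_centroid)
```

At city scale the difference between a great-circle distance and a planar distance in a projected coordinate system is small, so either would serve as a proxy for remoteness from the Loop.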

Table 1 gives descriptive statistics for all tracts in the city of Chicago (801 tracts) and tracts with Divvy stations (367 tracts) and without Divvy stations (434 tracts). The number of Divvy stations ranges from 1 to 17. The last column shows that tracts with Divvy stations tend to have a larger population, more carless households, a higher percentage of the high-education population, and a lower median household income, compared with tracts without Divvy stations. Additionally, total employment, employment density, rail density, bike lane density, and road density in the 367 census tracts are also higher.

Table 1

Descriptive statistics
Variable	Min. (all tracts, N = 801)	Max. (all tracts)	Mean (all tracts)	Mean (w/ Divvy stations, N = 367)	Mean (w/o Divvy stations, N = 434)	Diff. between w/ and w/o Divvy stations
Bike-sharing data
Origin trips (peak hours) for 2019	0	46,951	1,884	1,884
Stations	1	17	2	2
Docks	6	467	30	30
Socioeconomic characteristics
Population	0	19,889	3,422	3,520	3,424	96*
# of zero-vehicle households	0	3,473	354	460	264	196***
Median household income ($)	9,787	178,750	57,298	52,152	52,618	-466***
% population 25 years and over with a bachelor's degree or higher	0.5	0.94	0.36	0.45	0.27	0.18***
Employment
Total employment	2	331,288	1,734	2,952	703	2,249***
Employment density (per km²)	6	279,308	2,908	4,631	1,448	3,183***
Transportation services and infrastructure
Rail density (station/km²)	0	22.7	0.62	0.99	0.31	0.68***
Bus density (station/km²)	0	723	33.8	35.8	32.2	3.0
Bike lane density (km/km²)	0	11.9	1.7	2.3	1.2	1.1***
Built environment
Road density (km/km²)	4.5	64.1	24.0	24.9	22.2	2.7***
Distance to downtown (km)	0	36.1	14.6	12.6	16.2	-1.6***
(*** significance level < 0.01 based on two-sample t-tests; ** 0.01 < significance level < 0.05; * 0.05 < significance level < 0.1)

3.3 Methodology

We present our methodology in a two-step conceptual diagram (Fig. 3). This section explains the two major steps and several sub-steps in detail and how each step leads to the final goal—predicting bike-sharing usage with an equitable supply of bike-sharing services.

Both major steps consider spatial autocorrelation. We construct the spatial weights matrix based on distance, using GeoDa (Anselin et al., 2006). Note that there are two matrices: one for actual bike-sharing data among the 367 census tracts with Divvy bike-sharing stations, and the other for all 801 census tracts in Chicago.
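A distance-based weights matrix of this kind can be sketched as follows. The distance threshold, the binary neighbor rule, and the row standardization are assumptions for illustration; the text does not report the exact GeoDa settings, and the centroids are hypothetical.

```python
import math


def distance_band_weights(points, threshold):
    """Row-standardized binary weights: j is a neighbor of i if 0 < d(i, j) <= threshold."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        neighbors = [
            j for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= threshold
        ]
        for j in neighbors:
            weights[i][j] = 1.0 / len(neighbors)  # each non-empty row sums to 1
    return weights


def spatial_lag(weights, values):
    """Weighted average of neighbors' values (the product W x)."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Hypothetical tract centroids on a line, in km.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
W = distance_band_weights(centroids, threshold=1.5)
stations = [4, 2, 0, 1]
lagged_stations = spatial_lag(W, stations)
```

In this toy layout the fourth tract has no neighbor within the threshold, so its row of weights is all zeros and its spatial lag is 0; in practice the threshold is usually chosen so that every unit has at least one neighbor.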

The first step is to predict bike-sharing station locations across the 801 census tracts in the whole city of Chicago, based on neighborhood characteristics. We investigate two spatial models: spatial lag and spatial error. The spatial lag model assumes that the number of Divvy stations, i.e., the dependent variable, in one tract is influenced by Divvy stations in its neighboring tracts, whereas the spatial error model assumes there is a correlation between the error terms of different census tracts in predicting Divvy station numbers. We use the open-source software GeoDa (Anselin et al., 2006) to conduct spatial modeling and then predict the number of Divvy stations for the whole study area, including the census tracts that currently have no Divvy stations available.

The spatial lag model is expressed as:

Station_i = f(SE_i, EMP_i, T_i, BE_i, W_{i−1, station})    (1)

The spatial error model is expressed as:

Station_i = f(SE_i, EMP_i, T_i, BE_i, W_{i−1, ε})    (2)

Where

Station_i is the number of Divvy stations in census tract i (i ∈ 801 tracts).

SE_i denotes the socioeconomic characteristics of census tract i. Socioeconomic variables include the total population, household income, the number of zero-vehicle households, and educational attainment.

EMP_i represents the total employment and employment by income level of census tract i.

T_i is transportation services and infrastructure, including bike lanes, the number of transit stations, and rail density in census tract i.

BE_i denotes built environment characteristics of census tract i, such as intersection density and distance to downtown.

W_i is the spatial weights matrix of the 801 census tracts.

W_{i−1, station} and W_{i−1, ε} are the lag term and error term of the number of Divvy stations, respectively.
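For reference, Eqs. (1) and (2) are written in general functional form. Assuming the usual linear specifications of the two models, they can be stated in matrix notation as below, where y stacks the station counts, X stacks the SE, EMP, T, and BE variables, W is the spatial weights matrix, and ρ and λ are the spatial autoregressive parameters. The symbols y, X, β, ρ, u, and ε are introduced here for illustration and do not appear in the original equations.

```latex
% Spatial lag model (assumed linear form of Eq. 1)
y = \rho W y + X\beta + \varepsilon

% Spatial error model (assumed linear form of Eq. 2)
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

Under this reading, the "Spatial error-lambda" reported with the model results corresponds to λ.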

Results of this step will show the over- and under-supply of bike-sharing stations based on the neighborhood characteristics. The predicted number of bike-sharing stations is used in Step 2 to predict potential bike-sharing usage.

To predict potential bike-sharing usage, we need to first specify models based on actual Divvy trip data in Step 2.1. This step includes two sub-steps (steps 2.1.1 and 2.1.2), and resultant models from these two sub-steps are used subsequently to predict potential bike-sharing usage.

The reason for the two sub-steps is that the Divvy system covers 367 census tracts, 313 of which have actual bike-sharing trips and 54 of which do not. We tried a log-linear model to predict the number of trips based on the 367 observations, but the model yields many negative usage estimates due to the negative intercept of the model. Therefore, we adopt two sub-steps to address the problem of negative estimates.

The first sub-step is a binomial logit model to estimate the association between neighborhood characteristics and the generation of Divvy trips (having Divvy trips or not) in the 367 census tracts.

Non-ZeroUsage_k = f(SE_k, EMP_k, T_k, BE_k)    (3)

Where

Non-ZeroUsage_k indicates whether census tract k generates Divvy trips (k ∈ 367 tracts with Divvy stations). (Non-ZeroUsage_k = 1 when tract k has actual Divvy trips; otherwise, Non-ZeroUsage_k = 0.)

SE_k denotes the socioeconomic characteristics of census tract k.

EMP_k represents the total employment and employment by income level of census tract k.

T_k is transportation services and infrastructure in census tract k.

BE_k denotes built environment characteristics of census tract k.

The specified binomial model is then used in the “predict trip generation step” in Step 2.2.1 to estimate whether the proposed bike-sharing service supply (results of Step 1) can generate bike-sharing trips. Only tracts that are predicted to have non-zero bike-sharing trips will be put in the final transformed spatial model to predict bike-sharing usage (in step 2.2.2).

The second sub-step in Step 2.1 is to use the 313 census tracts with actual Divvy trips to specify a transformed spatial model that predicts non-zero Divvy usage with neighborhood characteristics. The transformed spatial model considers spatial autocorrelation while still being able to predict a dependent variable, i.e., bike-sharing usage, for census tracts without actual data for the variable.

The transformed model is developed based on the conventional spatial lag model (Lan et al., 2019). Bike-sharing usage of census tract j (Usage_j) can be predicted in the following way:

Usage_j = f(SE_j, EMP_j, T_j, BE_j, W_{j−1, usage})    (4)

Usage_{j−1} = f(SE_{j−1}, EMP_{j−1}, T_{j−1}, BE_{j−1}, W_{j−2, usage})    (5)

Therefore,

Usage_j = f(SE_j, EMP_j, T_j, BE_j, SE_{j−1}, EMP_{j−1}, T_{j−1}, BE_{j−1}, W_{j−2, usage})    (6)

We treat W_{j−2, usage} in Eq. (6) as a random error, as we expect an insignificant association between Usage_j and Usage_{j−2}. Eq. (7) is the final transformed spatial lag model to predict bike-sharing usage. The equation enables us to estimate the coefficients of each variable and test the robustness of the model before doing the usage prediction.

Usage_j = f(SE_j, EMP_j, T_j, BE_j, SE_{j−1}, EMP_{j−1}, T_{j−1}, BE_{j−1}, Station_j, Station_{j−1})    (7)

Where

SE_j, EMP_j, T_j, and BE_j represent neighborhood characteristics in census tract j.

SE_{j−1}, EMP_{j−1}, T_{j−1}, and BE_{j−1} represent the spatial lags of respective neighborhood characteristics in census tract j−1.

Station_j is the number of actual Divvy stations in census tract j.

Station_{j−1} is the spatial lag of actual Divvy stations in census tract j−1.

Note that Eq. (7) can be difficult to estimate, with many spatial lags of the independent variables. Therefore, neighborhood characteristics need to be carefully selected (please see Appendix for details).
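The substitution behind Eqs. (4) to (6) can be written out explicitly if one assumes a linear spatial lag model with a scalar autoregressive parameter ρ. This is a sketch under that assumption; the matrix symbols y, X, β, ρ, and ε are introduced here for illustration. Premultiplying the model by W gives an expression for the lagged usage, which is then substituted back:

```latex
\begin{aligned}
y   &= \rho W y + X\beta + \varepsilon \\
W y &= \rho W^{2} y + W X \beta + W \varepsilon \\
y   &= X\beta + \rho\, W X \beta
       + \underbrace{\rho^{2} W^{2} y + \rho W \varepsilon + \varepsilon}_{\text{treated as error}}
\end{aligned}
```

Here Wy plays the role of the first-order usage lag in Eq. (4), W²y plays the role of the second-order lag in Eqs. (5) and (6), and WX collects the spatially lagged neighborhood characteristics. Because the final expression contains only X and WX on the right-hand side, it can be evaluated for tracts where usage is not observed.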

Once the models based on actual Divvy data are specified, we conduct step 2.2, which first predicts whether the proposed number of bike-sharing stations can generate bike-sharing trips and second how many trips can be generated. In this way, potential bike-sharing usage with objectively planned bike-sharing stations can be estimated.
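The prediction step can be illustrated with a small sketch: each explanatory variable is paired with its spatial lag, and a linear predictor is evaluated for every tract, whether or not usage was observed there. The weights, station counts, and coefficients below are made up for illustration and are not the estimates reported in this paper.

```python
def spatial_lag(weights, values):
    """Neighbor-weighted average of a variable (one row of W per tract)."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def transformed_design(weights, columns):
    """Return each explanatory column followed by its spatial lag (X and WX)."""
    design = {}
    for name, values in columns.items():
        design[name] = list(values)
        design[name + "_lag"] = spatial_lag(weights, values)
    return design


def predict(design, coefficients, intercept):
    """Linear prediction; needs no observed dependent variable."""
    n = len(next(iter(design.values())))
    return [
        intercept + sum(coefficients[name] * design[name][i] for name in coefficients)
        for i in range(n)
    ]


# Hypothetical 3-tract example with row-standardized weights.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
columns = {"stations": [3.0, 1.0, 0.0]}
design = transformed_design(W, columns)
# Made-up coefficients for illustration only.
log_usage = predict(design, {"stations": 0.5, "stations_lag": 0.25}, intercept=1.0)
```

The third tract has no stations of its own, yet it still receives a prediction through the lagged term, which is the property that makes the transformed model usable for tracts without actual usage data.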

4.1 Planning for Bike-sharing Stations

The first step of the analysis is to propose the supply of bike-sharing stations based on neighborhood characteristics. We estimated both equations (1) and (2) and chose the spatial error model based on the statistical significance of Lagrange Multiplier (LM) scores. Table 2 reports the model results of the spatial error model.

The results are consistent with general findings in the literature. Socioeconomic characteristics and transportation services have positive associations with bike-sharing stations. Census tracts with a larger population and employment have more stations. Educational attainment also plays a role. The share of people aged 25 and over with a bachelor’s degree or higher is positively associated with the number of bike-sharing stations. Additionally, the presence of more bus and rail stations increases the number of stations. This is expected as shared bikes are often used as the first- and last-mile solution: bikes near transit stations are likely to yield more usage, which motivates operators to install more bike-sharing stations in areas with dense transit services.

Table 2

Modeling the number of bike-sharing stations (N = 801)
Category	Variable	Coefficient	Std. Error
Socioeconomic	Population (in thousands)	0.111***	0.021
	Employment (in thousands)	0.031***	0.004
	% Population 25 years and over with a bachelor's degree or higher	1.012***	0.240
Transportation services	Number of bus stations	0.043***	0.005
	Number of rail stations	0.341***	0.005
Spatial characteristics	Distance to downtown (km)	-0.035**	0.011
Spatial error	Spatial error-lambda	0.645***	0.061
Constant		-0.016*	0.242
R-squared		0.58
*** significance level < 0.01; ** 0.01 < significance level < 0.05; * 0.05 < significance level < 0.1

GeoDa can automatically estimate predicted values of the dependent variable for all observations in the regression model. The estimation yields 913 stations, which are located across the city, covering 638 census tracts. Based on our analysis, nearly 40% of the study area (349 census tracts) should be supplied with bike-sharing stations but currently is not. Figures 4a-b show the distribution of predicted and existing bike-sharing stations. Census tracts with predicted stations but without existing stations tend to be in the western part of the city (tracts with the red boundaries), while 42 tracts on the south side have existing stations but no predicted stations.

We compare the errors between the actual and the predicted bike-sharing stations for the 367 tracts that currently have Divvy stations. Figure 5 presents the distribution of the errors. The errors appear randomly distributed, although they tend to be large near downtown Chicago. The model estimation is more accurate for census tracts with the number of stations at the lower end of its range. Specifically, for census tracts with five Divvy stations or fewer, 44.1% have no differences between the actual and predicted numbers of stations, and another 44.6% have an error of one. Only 22 census tracts currently have more than five Divvy stations, and they concentrate near downtown. About 80% of these tracts have an error greater than three. Divvy stations can be perceived as over-supplied in downtown Chicago relative to the rest of the city, and this over-supply is consistent with the expectation of peak demand during peak hours at these locations.
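The error shares quoted above are simple tabulations of the absolute difference between actual and predicted station counts. A minimal sketch, assuming predicted counts have already been rounded to whole stations (the rounding rule is an assumption) and using hypothetical numbers rather than the paper's data:

```python
from collections import Counter


def absolute_error_shares(actual, predicted):
    """Share of tracts at each absolute error between actual and predicted station counts."""
    errors = Counter(abs(a - p) for a, p in zip(actual, predicted))
    total = sum(errors.values())
    return {err: count / total for err, count in sorted(errors.items())}


# Hypothetical station counts for five tracts (not the paper's data).
actual = [1, 2, 2, 5, 3]
predicted = [1, 1, 2, 3, 4]
shares = absolute_error_shares(actual, predicted)
```

Restricting the inputs to tracts with five stations or fewer, or to tracts with more than five, reproduces the kind of split reported in the text.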

4.2 Predicting Bike-sharing Usage

4.2.1 Specify Models

Step 2.1 is to specify models to predict bike-sharing usage. Since some census tracts with Divvy stations did not generate any Divvy usage, we first model trip generation (zero or not) with neighborhood characteristics for all tracts with Divvy stations (367 census tracts) based on Eq. (3) (step 2.1.1).

Table 3 reports the model results. Census tracts with more Divvy stations, more no-vehicle households, and higher median household income tend to have higher probabilities of generating Divvy trips. Locating closer to downtown also increases the probabilities of Divvy trip generation. Employment and transportation services are statistically insignificant and hence are removed in the final model.

Table 3

Modeling bike-sharing usage (zero or not) with neighborhood characteristics (N = 367)
Category	Variable	Coefficient
Bike-sharing facilities	Number of stations	1.242**
Socioeconomic	Median household income (log)	1.461**
Socioeconomic	Number of no-vehicle households (log)	1.411***
Employment	Employment density (log)	-
Transportation services	Bike lane density (log)	-
Transportation services	Rail density (log)	-
Spatial characteristics	Distance to downtown	-0.540***
Constant		-13.72*
Model fit	Pseudo R-squared	0.727
*** significance level < 0.01; ** 0.01 < significance level < 0.05; * 0.05 < significance level < 0.1

Then, we conduct step 2.1.2 to specify a transformed spatial lag model to predict Divvy usage based on neighborhood characteristics. As we cautioned in the methodology section, neighborhood characteristics need to be carefully selected because these characteristics and their respective spatial lags are both included in the model (Eq. 7). After careful examination, we chose three key neighborhood characteristics: median household income, the number of households without vehicles, and the number of bike-sharing stations. Please see Appendix for the model specification process. We estimate the transformed spatial model with these three neighborhood characteristics and their spatial lags in Eq. (7).

Table 4 reports the results of the transformed spatial lag model. All independent variables except the spatial lag of zero-vehicle households are significant at the 10% level, and the R-squared of the model is 0.72.

Table 4

Transformed spatial lag model to predict Divvy usage (log) (N = 313)
Category	Variable	Coefficient	Std. Error
Socioeconomic	Median household income (log)	0.664**	0.227
	Median household income (log) spatial lag	2.845***	0.271
	Number of no-vehicle households (log)	0.203**	0.096
	Number of no-vehicle households (log) spatial lag	-	-
Bike-sharing facilities	Number of Divvy stations	0.264***	0.049
	Number of Divvy stations spatial lag	0.207*	0.116
Constant		-34.68***
R-squared		0.72
*** significance level < 0.01; ** 0.01 < significance level < 0.05; * 0.05 < significance level < 0.1

Specifically, Divvy usage is positively associated with the income level of this census tract and neighboring census tracts. Places with higher household income or surrounded by high-income neighborhoods tend to generate more Divvy trips, suggesting the popularity of using bike-sharing in affluent neighborhoods. Similarly, the number of Divvy stations in one census tract and the average number of Divvy stations nearby both play a positive role in encouraging Divvy usage. Naturally, clusters of stations can bring great convenience for checking out and returning shared bikes. Additionally, the presence of more carless households increases Divvy usage, as autoless households, by choice or by constraints, need alternative transportation modes, including bike-sharing.

4.2.2 Predicting Bike-sharing Usage

The usage prediction includes two sub-steps. Using the specified model reported in Table 3, we first calculate the probability of having non-zero bike-sharing usage based on neighborhood characteristics for the 638 census tracts with proposed bike-sharing stations. We classify census tracts with a probability higher than 0.5 as having non-zero usage, and 545 census tracts with proposed bike-sharing stations fall into this category. The remaining 93 census tracts have hypothetically planned stations but are not likely to produce bike-sharing trips. With the specified model presented in Table 4, we calculate predicted bike-sharing usage for the 545 census tracts based on the predicted number of bike-sharing stations from Step 1.
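The first sub-step can be sketched as a logistic probability followed by the 0.5 cutoff. The coefficients below are those reported in Table 3; reading "log" as the natural logarithm, and the input values for the two example tracts, are assumptions made for illustration.

```python
import math

# Coefficients as reported in Table 3; treating "log" as the natural log is an assumption.
COEF = {
    "constant": -13.72,
    "stations": 1.242,
    "log_median_income": 1.461,
    "log_no_vehicle_households": 1.411,
    "distance_to_downtown_km": -0.540,
}


def nonzero_usage_probability(stations, median_income, no_vehicle_households, distance_km):
    """Logistic probability that a tract generates any bike-sharing trips.

    Income and household counts must be positive because they enter in logs.
    """
    linear = (
        COEF["constant"]
        + COEF["stations"] * stations
        + COEF["log_median_income"] * math.log(median_income)
        + COEF["log_no_vehicle_households"] * math.log(no_vehicle_households)
        + COEF["distance_to_downtown_km"] * distance_km
    )
    return 1.0 / (1.0 + math.exp(-linear))


def has_predicted_usage(probability, cutoff=0.5):
    """Apply the 0.5 cutoff described in the text."""
    return probability > cutoff


# Hypothetical tracts that differ only in distance to downtown (not the paper's data).
p_near = nonzero_usage_probability(2, 55000, 350, 5.0)
p_far = nonzero_usage_probability(2, 55000, 350, 30.0)
```

With identical socioeconomic inputs, the tract farther from downtown receives the lower probability, reflecting the negative distance coefficient.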

Figure 6 compares the actual Divvy usage for the 367 tracts (Fig. 6a) with the predicted bike-sharing usage of the hypothetical bike-sharing stations in the 545 census tracts (Fig. 6b). The hypothetical bike-sharing stations generate 1,034,914 trips, nearly 80% of which are from census tracts that are currently served by the Divvy system; the rest are from tracts without actual bike-sharing stations. The results suggest a potential under-supply of the current Divvy system, resulting in unmet bike-sharing demand.

Still, the spatial patterns of actual and predicted bike-sharing usage are similar. For both actual and predicted usage, census tracts in downtown show the highest amount, and the number of bike-sharing trips decreases as the distance to downtown increases. Meanwhile, with the hypothetical, objectively planned bike-sharing stations, the spatial coverage of the predicted bike-sharing trips is more extensive than that of the actual Divvy trips. Additionally, both scenarios have zero bike-sharing usage in census tracts on the far south side.

To test the validity of the transformed spatial lag model, we also use the model to calculate predicted trips with existing Divvy stations and then compare the actual and the predicted bike-sharing trips for the 313 census tracts that have actual Divvy usage. The differences are presented in Fig. 7. Positive and negative signs represent over-estimation and under-estimation of bike-sharing trips, respectively; 59 census tracts have negative errors, indicating that predicted numbers are lower than the actual, and in the other census tracts, actual Divvy trips are fewer than the predictions. Our estimation is more accurate for census tracts located at the center of Chicago than for remote areas. More than half of the 313 census tracts, 177 (= 59 + 118), have errors within 25% of the actual numbers (-25% to 24.9%), and the errors show no clear spatial pattern.
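
The validation comparison can be illustrated with a short sketch. It assumes the error is measured as the predicted minus the actual trips, relative to the actual trips, so that negative values mean under-estimation, in line with the sign convention above. The helper names and the sample numbers are hypothetical and are not taken from Fig. 7.

```python
import numpy as np

def percent_error(actual, predicted):
    """Signed error in percent of actual trips:
    negative = under-estimation, positive = over-estimation."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * (predicted - actual) / actual

def share_within(errors_pct, bound=25.0):
    """Share of tracts whose error lies in [-bound, bound)."""
    errors_pct = np.asarray(errors_pct, dtype=float)
    inside = (errors_pct >= -bound) & (errors_pct < bound)
    return float(np.mean(inside))

# Hypothetical actual and predicted trips for four tracts.
actual = [1000, 400, 250, 80]
predicted = [900, 520, 260, 40]
err = percent_error(actual, predicted)
print(err)
print(share_within(err))
```

For the counts reported above, 177 of 313 tracts corresponds to a share of about 56.5%.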

We summarize two major findings. First, unmet demand exists, and an equitable supply of bike-sharing services is needed. Our analysis predicts that about 20% of potential demand is not met by the current Divvy system. Recognizing that the unmet demand occurs mainly in peripheral locations with relatively low, although non-zero, potential usage, we suspect that providing bike-sharing stations in these neighborhoods might not be cost-efficient if economic rationale is the main decision factor. Nevertheless, we argue that an equitable and just bike-sharing system needs to consider the unmet bike-sharing demand. Second, there are gaps between the predicted demand and the actual bike-sharing usage. However, we cannot detect clear spatial patterns in the gaps, in terms of where gaps are large or small or whether we over- or under-estimate demand. This suggests that some idiosyncratic, unsystematic factors affect bike-sharing usage.

5. Discussion and Conclusion

In this research, we develop a methodological approach to plan for bike-sharing stations based on neighborhood characteristics and predict bike-sharing usage with the hypothetically planned bike-sharing stations. The approach connects service design with meeting potential demand.

We use Divvy, a station-based bike-sharing system in the city of Chicago, as the case study for this research, and we reach two conclusions. First, the current bike-sharing system might not meet potential demand, as both the spatial range of the objectively planned bike-sharing system and the predicted bike-sharing usage under the proposed service scenario exceed their actual counterparts. Specifically, many peripheral areas of the city need bike-sharing services but are under-served by the existing bike-sharing system. Second, providing new stations can be an effective way to increase bike-sharing usage in some neighborhoods. Our analysis predicts new, unmet demand of about 20% (171,249 trips). Although it may be argued that it is cost-inefficient to serve the unmet demand, as the demand spreads across 276 census tracts that are mainly in peripheral locations and do not currently have bike-sharing stations, we believe that the new service provision could promote broad adoption of bike-sharing among diverse users.

This research has some limitations. First, we develop this approach based on a station-based bike-sharing system, which may constrain the applicability to dockless shared mobility services. Second, our prediction is solely based on neighborhood characteristics in the city of Chicago, and we acknowledge that other factors like community culture or the topography of other study areas can influence the accuracy of the usage prediction.

Still, we believe that this research introduces a practical and objective methodology for planning bike-sharing systems. We use neighborhood characteristics to propose an equitable supply of bike-sharing stations and then predict potential bike-sharing usage with the proposed service scenario. To our knowledge, few studies have done so. Using conventional and transformed spatial models, the two-step approach considers endogeneity between bike-sharing stations and their usage, as well as spatial autocorrelation of bike-sharing data.

The approach has policy and planning implications. Objectively estimating potential demand with equitable service provision can assist decision-making on the coverage and expansion of bike-sharing systems. Our analysis also shows gaps between potential and actual levels of bike-sharing usage. Policy-makers and planners might want to focus on the neighborhoods that show potential demand but do not produce much actual usage and investigate barriers to bike-sharing usage. Potential barriers can include social norms, the built environment, or the relative affordability of bike-sharing.

A1 Selecting neighborhood characteristics for the transformed spatial lag model

We rely on conventional spatial models to explore the association between neighborhood characteristics and actual bike-sharing usage. Neighborhood characteristics that are significantly associated with actual bike-sharing trips are used to predict potential bike-sharing usage using the transformed spatial model.

We estimate both spatial error and spatial lag models of actual Divvy usage on neighborhood characteristics for the 313 census tracts that generate Divvy trips. We choose the spatial lag model because its Lagrange Multiplier (LM) statistic is more significant and because we subsequently apply a transformed spatial lag model.
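
For reference, the two candidate specifications can be written in their standard textbook form. The notation below is an assumption for illustration and may differ from the symbols used in the methodology section: y is the vector of (log) Divvy trips, X the matrix of neighborhood characteristics, W a row-standardized spatial weights matrix, and ε an i.i.d. error term.

```latex
% Spatial lag model: dependence enters through the spatially lagged outcome Wy
y = \rho W y + X\beta + \varepsilon
% Spatial error model: dependence enters through the error term u
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
% Reduced form of the spatial lag model (valid when I - \rho W is invertible)
y = (I - \rho W)^{-1}\,(X\beta + \varepsilon)
```

The LM tests compare each specification against ordinary least squares, testing ρ = 0 for the lag model and λ = 0 for the error model.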

Table A1 gives the spatial lag model results. Four neighborhood characteristics significantly explain Divvy trips: median household income, the number of households without vehicles, the number of Divvy bike stations, and distance to downtown. Specifically, the median household income is positively associated with Divvy usage. Similarly, a larger number of no-vehicle households in a census tract is associated with higher Divvy usage. Distance to downtown (the Loop) has a negative effect: Divvy usage decreases as the distance to downtown increases. After controlling for socioeconomic, spatial, and transportation characteristics, the number of bike stations has a positive effect on Divvy usage. This finding is consistent with existing literature showing that bicycling infrastructure is an important factor that affects bike-sharing usage (Conrow et al., 2018; Faghih-Imani et al., 2014; Mateo-Babiano et al., 2016; Noland et al., 2016).

Table A1 Modeling bike-sharing usage (log) with neighborhood characteristics (N=313)

Category	Variable	Coefficient	Std. Error
Bike-sharing facilities	Number of stations	0.233***	0.036
Socioeconomic	Median Household income (log)	0.659***	0.128
Socioeconomic	Number of no-vehicle households (log)	0.189**	0.067
Employment	Employment density (log)	0.043	0.054
Transportation services	Bike lane density (log)	0.175	0.095
Transportation services	Rail density (log)	-0.066	0.083
Spatial characteristics	Distance to downtown	-0.013*	0.011
Constant		-7.317***
Model fit	Akaike Information Criterion (AIC)	854.5
	Schwarz Criterion (SC)	888.2

*** significance level < 0.01; ** 0.01 < significance level < 0.05; * 0.05 < significance level < 0.1
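
To make the magnitudes in Table A1 easier to read, the sketch below converts two coefficients into approximate percentage changes in trips. It is an illustration only: it treats the coefficients as direct effects and ignores the spatial feedback implied by the lag term, so the actual total effects in a spatial lag model would differ. The variable names are hypothetical.

```python
import math

# Coefficients copied from Table A1 (outcome: log of bike-sharing usage).
beta_stations = 0.233  # number of stations, entered in levels
beta_income = 0.659    # median household income, entered in logs

# Log-level term: one additional station multiplies trips by exp(beta).
pct_per_station = (math.exp(beta_stations) - 1.0) * 100.0

# Log-log term: a 10% rise in income multiplies trips by 1.10 ** beta.
pct_per_10pct_income = (1.10 ** beta_income - 1.0) * 100.0

print(round(pct_per_station, 1))
print(round(pct_per_10pct_income, 1))
```

Under these simplifying assumptions, one additional station corresponds to roughly 26% more trips, and a 10% higher median household income to roughly 6.5% more trips.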

We also tested other variables but omitted them from the final model because of multicollinearity. For example, the population with a bachelor’s degree or above is correlated with the number of carless households and median household income. Total employment and employment by income level are correlated with employment density, but they do not have significant effects on bike-sharing usage in the study area.

Finally, three neighborhood characteristics (median household income, the number of households without vehicles, and the number of Divvy bike stations) are used to specify the transformed spatial model and subsequently predict bike-sharing trips. Distance to downtown is removed to avoid multicollinearity, since it is already used to predict Divvy stations in Step 1.

Acknowledgments

We are grateful to Dr. Xiao Huang for his assistance on the transformed spatial lag model and Dr. Jie Yu and Xinyu Liu for providing Divvy data.

Statements and Declarations

The authors have no conflicts of interest to declare.

Author Contributions:

Sai Sun did data preparation and analysis and wrote the original draft. Lingqian Hu revised the manuscript. All authors reviewed the final manuscript.

References

Anselin, L., Syabri, I., Kho, Y.: GeoDa: An Introduction to Spatial Data Analysis. Geographical Anal. 38(1), 5–22 (2006)
Cheng, L., Yang, J., Chen, X., Cao, M., Zhou, H., Sun, Y.: How could the station-based bike sharing system and the free-floating bike sharing system be coordinated? J. Transp. Geogr. 89, 102896 (2020). https://doi.org/10.1016/j.jtrangeo.2020.102896
Chicago Data Portal: Boundaries - Census Tracts - 2010 | City of Chicago | Data Portal. Chicago. (2020a). https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Census-Tracts-2010/5jrd-6zik
Chicago Data Portal: Divvy Trips | City of Chicago | Data Portal. (2020b). https://data.cityofchicago.org/Transportation/Divvy-Trips/fg6s-gzvg
Conrow, L., Murray, A.T., Fischer, H.A.: An optimization approach for equitable bicycle share station siting. J. Transp. Geogr. 69, 163–170 (2018). https://doi.org/10.1016/j.jtrangeo.2018.04.023
Corcoran, J., Li, T., Rohde, D., Charles-Edwards, E., Mateo-Babiano, D.: Spatio-temporal patterns of a Public Bicycle Sharing Program: The effect of weather and calendar events. J. Transp. Geogr. 41, 292–305 (2014). https://doi.org/10.1016/j.jtrangeo.2014.09.003
El-Assi, W., Salah Mahmoud, M., Nurul Habib, K.: Effects of built environment and weather on bike sharing demand: A station level analysis of commercial bike sharing in Toronto. Transportation. 44(3), 589–613 (2017). https://doi.org/10.1007/s11116-015-9669-z
Faghih-Imani, A., Eluru, N.: Incorporating the impact of spatio-temporal interactions on bicycle sharing system demand: A case study of New York CitiBike system. J. Transp. Geogr. 54, 218–227 (2016). https://doi.org/10.1016/j.jtrangeo.2016.06.008
Faghih-Imani, A., Eluru, N., El-Geneidy, A.M., Rabbat, M., Haq, U.: How land-use and urban form impact bicycle flows: Evidence from the bicycle-sharing system (BIXI) in Montreal. J. Transp. Geogr. 41, 306–314 (2014). https://doi.org/10.1016/j.jtrangeo.2014.01.013
Fishman, E., Washington, S., Haworth, N.: Bike Share: A Synthesis of the Literature. Transp. Reviews. 33(2), 148–165 (2013). https://doi.org/10.1080/01441647.2013.775612
Frade, I., Ribeiro, A.: Bike-sharing stations: A maximal covering location approach. Transp. Res. Part A: Policy Pract. 82, 216–227 (2015). https://doi.org/10.1016/j.tra.2015.09.014
Ghaffar, A., Mitra, S., Hyland, M.: Modeling determinants of ridesourcing usage: A census tract-level analysis of Chicago. Transp. Res. Part C: Emerg. Technol. 119, 102769 (2020). https://doi.org/10.1016/j.trc.2020.102769
Guidon, S., Reck, D.J., Axhausen, K.: Expanding a(n) (electric) bicycle-sharing system to a new city: Prediction of demand with spatial regression and random forests. J. Transp. Geogr. 84, 102692 (2020). https://doi.org/10.1016/j.jtrangeo.2020.102692
Hoe, N., Kaloustian, T.: Bike Sharing in Low-Income Communities: An Analysis of Focus Groups Findings. Institute for Survey Research: Temple University. (2014). http://static.peopleforbikes.org.s3.amazonaws.com/REPORT_Low%20Income%20Bike%20Share%20Focus%20Groups_FINAL.pdf
Hosford, K., Winters, M.: Who Are Public Bicycle Share Programs Serving? An Evaluation of the Equity of Spatial Access to Bicycle Share Service Areas in Canadian Cities. Transp. Res. Rec. 2672(36), 42–50 (2018). https://doi.org/10.1177/0361198118783107
Hu, S., Xiong, C., Liu, Z., Zhang, L.: Examining spatiotemporal changing patterns of bike-sharing usage during COVID-19 pandemic. J. Transp. Geogr. 91, 102997 (2021). https://doi.org/10.1016/j.jtrangeo.2021.102997
Lan, M., Liu, L., Hernandez, A., Liu, W., Zhou, H., Wang, Z.: The Spillover Effect of Geotagged Tweets as a Measure of Ambient Population for Theft Crime. Sustainability. 11(23), 6748 (2019). https://doi.org/10.3390/su11236748
Lee, R.J., Sener, I.N., Jones, S.N.: Understanding the role of equity in active transportation planning in the United States. Transp. Reviews. 37(2), 211–226 (2017). https://doi.org/10.1080/01441647.2016.1239660
Ma, X., Ji, Y., Jin, Y., Wang, J., He, M.: Modeling the Factors Influencing the Activity Spaces of Bikeshare around Metro Stations: A Spatial Regression Model. Sustainability. 10(11), 3949 (2018). https://doi.org/10.3390/su10113949
Médard de Chardon, C.: The contradictions of bike-share benefits, purposes and outcomes. Transp. Res. Part A: Policy Pract. 121, 401–419 (2019). https://doi.org/10.1016/j.tra.2019.01.031
NACTO Bike Share Initiative: Bike Share in the U.S.: 2017. National Association of City Transportation Officials. (2018). https://nacto.org/bike-share-statistics-2017/
National Association of City Transportation Officials: Walkable Station Spacing Is Key to Successful, Equitable Bike Share. National Association of City Transportation Officials. (2015, April 28). https://nacto.org/2015/04/28/walkable-station-spacing-is-key-to-successful-equitable-bike-share/
Noland, R.B., Smart, M.J., Guo, Z.: Bikeshare trip generation in New York City. Transp. Res. Part A: Policy Pract. 94, 164–181 (2016). https://doi.org/10.1016/j.tra.2016.08.030
Orvin, M.M., Fatmi, M.R.: Why individuals choose dockless bike sharing services? Travel Behav. Soc. 22, 199–206 (2021). https://doi.org/10.1016/j.tbs.2020.10.001
Qian, X., Jaller, M.: Bikesharing, equity, and disadvantaged communities: A case study in Chicago. Transp. Res. Part A: Policy Pract. 140, 354–371 (2020). https://doi.org/10.1016/j.tra.2020.07.004
Raux, C., Zoubir, A., Geyik, M.: Who are bike sharing schemes members and do they travel differently? The case of Lyon’s “Velo’v” scheme. Transp. Res. Part A: Policy Pract. 106, 350–363 (2017). https://doi.org/10.1016/j.tra.2017.10.010
Schimohr, K., Scheiner, J.: Spatial and temporal analysis of bike-sharing use in Cologne taking into account a public transit disruption. J. Transp. Geogr. 92, 103017 (2021). https://doi.org/10.1016/j.jtrangeo.2021.103017
Scott, D.M., Ciuro, C.: What factors influence bike share ridership? An investigation of Hamilton, Ontario’s bike share hubs. Travel Behav. Soc. 16, 50–58 (2019). https://doi.org/10.1016/j.tbs.2019.04.003
Shen, Y., Zhang, X., Zhao, J.: Understanding the usage of dockless bike sharing in Singapore. Int. J. Sustainable Transp. 12(9), 686–700 (2018). https://doi.org/10.1080/15568318.2018.1429696
Smith, C.S., Oh, J.S., Lei, C.: Exploring the equity dimensions of US bicycle sharing systems (No. TRCLC 14-01). Western Michigan University. (2015). https://ntl.bts.gov/public-access
Sun, F., Chen, P., Jiao, J.: Promoting public bike-sharing: A lesson from the unsuccessful Pronto system. Transp. Res. Part D: Transp. Environ. 63, 533–547 (2018). https://doi.org/10.1016/j.trd.2018.06.021
Ursaki, J., Aultman-Hall, L.: Quantifying the equity of bikeshare access in US cities (No. TRC Report 15-011). University of Vermont Transportation Research Center. (2015). https://ntl.bts.gov/public-access
U.S. Bureau of the Census: Chapter 10 Census Tracts and Block Numbering Areas. In Geographic Areas Reference Manual. (1994)
U.S. Bureau of the Census: 2013–2018 American Community Survey 5-Year Estimate. (2018)
U.S. Bureau of the Census: LEHD Origin-Destination Employment Statistics Data. Longitudinal Employer-Household Dynamics Program, U.S. Census Bureau. (2020). https://lehd.ces.census.gov/data/#lodes
U.S. Environmental Protection Agency: Smart Location Database. (2013)
Wang, H., Noland, R.: Changes in the Pattern of Bikeshare Usage due to the COVID-19 Pandemic. Findings. 18728 (2021). https://doi.org/10.32866/001c.18728
Wang, K., Akar, G., Chen, Y.-J.: Bike sharing differences among Millennials, Gen Xers, and Baby Boomers: Lessons learnt from New York City’s bike share. Transp. Res. Part A: Policy Pract. 116, 1–14 (2018). https://www.sciencedirect.com/science/article/abs/pii/S0965856417306419
Wang, X., Lindsey, G., Schoner, J.E., Harrison, A.: Modeling Bike Share Station Activity: Effects of Nearby Businesses and Jobs on Trips to and from Stations. J. Urban. Plan. Dev. 142(1), 04015001 (2016). https://doi.org/10.1061/(ASCE)UP.1943-5444.0000273
van Wee, B., Witlox, F.: COVID-19 and its long-term effects on activity participation and travel behaviour: A multiperspective view. J. Transp. Geogr. 95, 103144 (2021). https://www.sciencedirect.com/science/article/pii/S0966692321001976
