Our method consisted of three steps: identification of the urbanized rural areas, analysis of their spatiotemporal patterns, and analysis of their socioeconomic characteristics.
Identification strategy. This study proposes three indicators for identifying the urbanized rural areas: population, GDP, and built-up land area. These indicators were chosen for the following reasons: (1) the urbanized rural areas experience bidirectional population mobility, so their population is higher than that of ordinary rural areas and, in some cases, of urban areas; (2) the urbanized rural areas display economic prosperity (e.g., tourism, e-commerce, and township enterprises) and local people enjoy affluence, as reflected by economic output (e.g., GDP); (3) the urbanized rural areas retain the original countryside landscape, but conversion of land to built-up land is prevalent to meet the needs of non-agricultural activities.
After selecting the indicators, we needed to define the urban and rural areas, that is, to divide the 14,136 village-level administrative units into urban and rural areas. The village-level administrative units are legally recognized grassroots governance units, each of which has a defined boundary and a unique 12-digit administrative division code. The division code is published by the National Bureau of Statistics of the People's Republic of China (NBS) and is used for the census. Using the 12-digit division code, we looked up the 3-digit urban-rural continuum code from the NBS (http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2010/index.html). There are seven urban-rural continuum codes: 111 denotes the urban districts of a prefectural city; 112 denotes urban-rural conjunction areas; 121 denotes the central areas of a town; 122 denotes town-rural conjunction areas; 123 denotes special urban areas; 210 denotes the central areas of a rural town; 220 denotes rural areas. The first digit of the urban-rural continuum code distinguishes urban areas (1) from rural areas (2). Finally, we defined 10,926 village-level units as rural areas and 3,120 village-level units as urban areas (Supplementary Fig. 1).
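The urban-rural split by the first digit of the continuum code can be illustrated with a minimal sketch. The function name and the unit identifiers below are hypothetical placeholders and are not taken from the NBS data; only the seven code values and the first-digit rule come from the text.

```python
# The seven urban-rural continuum codes listed in the text.
VALID_CODES = {"111", "112", "121", "122", "123", "210", "220"}

def classify_unit(continuum_code: str) -> str:
    """Return 'urban' or 'rural' from a 3-digit urban-rural continuum code."""
    if continuum_code not in VALID_CODES:
        raise ValueError(f"unknown continuum code: {continuum_code!r}")
    # First digit 1 means urban, first digit 2 means rural.
    return "urban" if continuum_code[0] == "1" else "rural"

# Hypothetical units (placeholder identifiers, not real division codes).
example_units = {"unit_a": "111", "unit_b": "122", "unit_c": "210", "unit_d": "220"}
example_labels = {uid: classify_unit(code) for uid, code in example_units.items()}
```

In this sketch an unrecognized code raises an error rather than being silently assigned to either class, which is one possible design choice and not necessarily how the original processing handled such cases.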
The last challenge was to distinguish the urbanized rural areas from the ordinary rural areas, that is, to determine how high the population, GDP, and built-up land area of an urbanized rural area must be. We used spatial statistics tools in ArcGIS 10.6 to extract population, GDP, and built-up land area for the 14,136 village-level units in 1995, 2005, and 2015. The 3,120 village-level units in urban areas were used as a reference sample, and their mean values of the three indicators provided the identification thresholds. For a village-level administrative unit to be classified as an urbanized rural area, its population, GDP, and built-up land area must all be greater than the corresponding mean values of the urban areas. If any one of the three conditions was not satisfied, the village-level administrative unit was classified as an ordinary rural area.
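The identification rule can be sketched as follows. This is an illustrative re-expression of the rule in plain Python, not the ArcGIS workflow used in the study; the dictionary keys, function names, and all numeric values are assumptions made for the example.

```python
from statistics import mean

INDICATORS = ("population", "gdp", "built_up")

def urban_thresholds(urban_units):
    """Mean of each indicator over the urban reference units."""
    return {k: mean(u[k] for u in urban_units) for k in INDICATORS}

def classify_rural_unit(unit, thresholds):
    """'urbanized rural' only if all three indicators exceed the urban means."""
    if all(unit[k] > thresholds[k] for k in INDICATORS):
        return "urbanized rural"
    return "ordinary rural"

# Toy values for illustration only (not data from the study).
toy_urban = [
    {"population": 1000.0, "gdp": 50.0, "built_up": 2.0},
    {"population": 3000.0, "gdp": 150.0, "built_up": 4.0},
]
toy_thresholds = urban_thresholds(toy_urban)  # means: 2000.0, 100.0, 3.0
```

In practice the thresholds would be computed separately for each study year, since the urban means differ between 1995, 2005, and 2015.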
Analysis of spatiotemporal patterns. To identify the spatial factors influencing the development of the urbanized rural areas, we investigated the spatiotemporal patterns of the urbanized rural areas identified in 1995, 2005, and 2015. We focused on three aspects: (1) the average nearest neighbor ratio; (2) urban proximity; (3) transportation accessibility.
The average nearest neighbor ratio is a nearest neighbor index based on the average distance from each urbanized rural area to its nearest neighboring feature. When the ratio is less than 1, the pattern of the urbanized rural areas exhibits clustering at the provincial level; when the ratio is greater than 1, the pattern is dispersed.
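As an illustration, the ratio can be sketched under the assumption that it follows the usual Clark-Evans form: the observed mean nearest-neighbor distance divided by the mean distance expected for a random pattern, 0.5 / sqrt(n / A), where n is the number of features and A is the study area. The function name, the use of point coordinates in place of polygon features, and the toy coordinates are assumptions of this sketch.

```python
import math

def average_nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbor distance over the random-pattern expectation."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Toy patterns in a study area of 100 square units (illustrative only).
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
dispersed = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
```

The brute-force neighbor search is quadratic in the number of points, which is adequate for a sketch; a spatial index would be preferable for large samples.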
Urban proximity is the Euclidean distance between an urbanized rural area and the nearest urban center. The urban centers are the government seats of the 9 prefectural cities and 84 counties in Fujian Province.
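A minimal sketch of the urban proximity measure follows, assuming each urbanized rural area and each urban center is represented by a single point in a projected coordinate system (so that Euclidean distance is meaningful). The function name and coordinates are hypothetical.

```python
import math

def distance_to_nearest_center(unit_xy, centers):
    """Euclidean distance from one unit to the closest urban center."""
    if not centers:
        raise ValueError("at least one urban center is required")
    return min(math.dist(unit_xy, c) for c in centers)

# Toy projected coordinates (illustrative only).
toy_centers = [(3.0, 4.0), (10.0, 10.0)]
toy_distance = distance_to_nearest_center((0.0, 0.0), toy_centers)
```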
Transportation accessibility was measured as the Euclidean distance between the urbanized rural areas and main roads. As main roads, we selected national highways, provincial roads, and county roads, which form a network of trunk roads connecting all provincial capitals, prefectural cities, and most counties across China.
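The distance to main roads can be sketched as the minimum point-to-segment distance over all road segments. This sketch assumes that roads are polylines in a projected coordinate system, that each urbanized rural area is reduced to a point, and that the distance of interest is the one to the nearest road; none of these details is specified in the text, and the function names are hypothetical.

```python
import math

def point_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_roads(p, roads):
    """Minimum distance from p to any segment of any road polyline."""
    return min(
        point_to_segment(p, a, b)
        for road in roads
        for a, b in zip(road, road[1:])
    )

# One toy road running along the x-axis (illustrative only).
toy_roads = [[(0.0, 0.0), (10.0, 0.0)]]
```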
Analysis of socioeconomic characteristics. The third aim of our research was to investigate the socioeconomic characteristics of the urbanized rural areas. To do so, we used POI data, which have been commonly used as a reliable source for measuring regional socioeconomic activities42–44. A POI is a specific point location with a name, category, latitude, and longitude. A familiar application of POI data is marking a user's location or destination in Google Maps. First, we counted the frequency of each third-level POI category within the urbanized rural areas, ordinary rural areas, and urban areas. We then divided these POI frequencies by the number of urbanized rural areas, ordinary rural areas, and urban areas, respectively. Finally, we selected the top 25 POI categories according to the average frequency.
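The three steps above can be sketched for a single group of units (for example, the urbanized rural areas). The function name, the alphabetical tie-breaking rule, and the category labels are assumptions of this sketch and are not taken from the POI data.

```python
from collections import Counter

def top_poi_categories(poi_categories, n_units, top_n=25):
    """Average POI count per unit for each category, highest first."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    counts = Counter(poi_categories)  # step 1: frequency per category
    averages = {cat: c / n_units for cat, c in counts.items()}  # step 2
    # Step 3: rank by average frequency; ties broken alphabetically (an assumption).
    ranked = sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Toy third-level categories of POIs falling inside two units (illustrative only).
toy_pois = ["clinic", "grocery", "clinic", "hotel", "clinic", "grocery"]
toy_top = top_poi_categories(toy_pois, n_units=2, top_n=2)
```

Running the same function on each of the three groups gives rankings that can be compared across the urbanized rural, ordinary rural, and urban areas.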
Data on the village-level administrative unit. We used the 14,136 village-level administrative units in the study area in 2010 as research units, which were divided into two categories: rural areas and urban areas. The data were collected from the local government.
Data on population, economy, and built-up land. For population and economy, we used raster data with a spatial resolution of 1×1 km for 1995, 2005, and 2015, in which each cell's value represents population or GDP. The data were interpolated from county-level economic and demographic census data, land use data, nighttime light data, and settlement distribution data. To validate the data accuracy, we compared the population and GDP calculated from the raster data with those reported in the statistical yearbook of Fujian Province (Supplementary Table 2). Except for the population in 1995, the differences between the raster data and the statistical yearbook are under 5%. We therefore consider the data accuracy acceptable for investigating population distribution and economic development at the village-level unit.
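The validation check can be sketched as a relative difference between the raster-derived total and the yearbook total. The text does not state how the difference was computed, so the formula below (absolute difference divided by the yearbook value) is an assumption, as are the function names and the toy numbers.

```python
def relative_difference_pct(raster_total, yearbook_total):
    """Absolute difference as a percentage of the yearbook value (assumed definition)."""
    if yearbook_total == 0:
        raise ValueError("yearbook_total must be non-zero")
    return abs(raster_total - yearbook_total) / abs(yearbook_total) * 100.0

def within_tolerance(raster_total, yearbook_total, tolerance_pct=5.0):
    """True if the raster total differs from the yearbook by less than tolerance_pct."""
    return relative_difference_pct(raster_total, yearbook_total) < tolerance_pct

# Toy totals for illustration only (not values from Supplementary Table 2).
toy_diff = relative_difference_pct(103.0, 100.0)
```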
We used land-use data that were manually interpreted from Landsat Thematic Mapper (TM) images from 1995, 2005, and 2015. Image pre-processing, classification, and classification accuracy assessment were completed by the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences. The land-use data contain six classes: built-up land, arable land, forest land, grassland, water, and bare land. Built-up land, the class used in our study, comprises urban land, rural residential land, and other independent built-up land. We obtained the three datasets (population, GDP, and land use) from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (http://www.resdc.cn).
Data on POI. The POI data contain 920,569 records for 2015. Each record has a longitude, latitude, name, and three levels of category. There are 20 first-level categories, 185 second-level categories, and 578 third-level categories in the POI data. We used the third-level POI categories, which enable us to investigate socioeconomic characteristics in more detail. We collected the POI data from the NavInfo company (http://www.navinfo.com/en/index.aspx), which is the largest digital map provider in China.