Study area
Lujiang County is located in central Anhui Province, China, between 30°57′~31°33′ N and 117°01′~117°34′ E (Fig. 1). It has a humid northern subtropical monsoon climate. The terrain comprises low mountains, hills, polder areas, and lakes, and slopes from high in the southwest to low in the northeast. The rivers in the county belong to the Yangtze River system. The county covers 2343.7 km², has 17 townships and 1 economic development zone under its jurisdiction, and has a registered population of approximately 1.2 million. The area is rich in mineral resources such as lead, zinc, copper, and aluminum.
Data sources and processing
The study population comprised residents with household registration in Lujiang County. A database of DTC incidence was established from the registration data of DTC (here including liver cancer, gastric cancer, and esophageal cancer) among residents of Lujiang County from 2012 to 2017. The demographic data were obtained from the demographic department, and the case data were obtained from the Center for Disease Control and Prevention in Hefei, Anhui Province, China (http://www.hfcdc.ah.cn/). Case data were coded according to the International Classification of Diseases (ICD-10). After the completeness and validity of the data were reviewed with reference to relevant standards [15], the registration data that met the standards were selected for statistical analysis. The administrative division map was obtained from the resource and environment center data cloud platform (http://www.resdc.cn/) and follows the 2018 administrative division of Lujiang County, which includes 17 townships and 1 economic development zone. Because the case registration data did not distinguish between the economic development zone and Lucheng town, these two units were merged in ArcGIS 10.6, so the final administrative divisions comprised 17 townships. The 1:300,000 Lujiang County sub-township layer was selected as the base map. To ensure accurate overlay analysis, all layers used the same geographic coordinate system (GCS_WGS_1984, in degrees) and the same projected coordinate system (WGS_1984_UTM, in meters). The geodetic datum was D_WGS_1984.
Statistical methods
Spatial empirical Bayes smoothing (SEBS) analysis
Based on the spatial empirical Bayes smoothing model, a neighborhood of K neighbors (represented by the spatial weights matrix) was defined for each township in Lujiang County. The population of the neighboring townships was then accumulated in order of distance, and each township's rate was finally smoothed toward the rate of its neighboring townships [11-12].
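The smoothing step can be illustrated with a minimal sketch. It assumes a local method-of-moments empirical Bayes estimator, in which each township's raw rate is shrunk toward the pooled rate of its own neighborhood; the function name, the neighbor-list format, and the toy numbers are hypothetical, and the estimator implemented in the software used for the study may differ in detail.

```python
def spatial_eb_smooth(cases, pop, neighbors):
    """Locally smoothed rates (assumed method-of-moments estimator).

    cases[i], pop[i]: case count and population of unit i.
    neighbors[i]: indices of the neighbors of unit i (excluding i itself).
    """
    raw = [c / p for c, p in zip(cases, pop)]
    smoothed = []
    for i in range(len(cases)):
        window = [i] + list(neighbors[i])          # unit i plus its neighbors
        tot_cases = sum(cases[j] for j in window)
        tot_pop = sum(pop[j] for j in window)
        mu = tot_cases / tot_pop                   # local prior mean (pooled rate)
        mean_pop = tot_pop / len(window)
        # Local prior variance; negative estimates are truncated to zero.
        var = sum(pop[j] * (raw[j] - mu) ** 2 for j in window) / tot_pop
        var = max(var - mu / mean_pop, 0.0)
        denom = var + mu / pop[i]
        weight = var / denom if denom > 0 else 0.0  # shrinkage weight in [0, 1]
        smoothed.append(weight * raw[i] + (1.0 - weight) * mu)
    return smoothed


# Hypothetical toy data: three units arranged in a row.
toy_cases = [5, 50, 8]
toy_pop = [1000, 2000, 1500]
toy_neighbors = [[1], [0, 2], [1]]
toy_smoothed = spatial_eb_smooth(toy_cases, toy_pop, toy_neighbors)
```

Units with small populations receive a small weight and are therefore pulled more strongly toward the neighborhood rate, which is the intended stabilizing effect of the smoothing.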
Spatial autocorrelation analysis
The strength of the spatial correlation of DTC in Lujiang County was evaluated by spatial autocorrelation analysis using GeoDa 2.0. The Global Moran's I index was used to describe the global autocorrelation among all 17 township administrative regions. After 9,999 Monte Carlo simulations, the standardized statistic Z was used for the statistical test, and the test level was α=0.05. In general, the Moran's I index ranges from −1.0 to +1.0. A value close to +1.0 indicates that the distribution of cancer patients in Lujiang County is clustered, whereas a value close to −1.0 indicates that the distribution is dispersed. A value close to zero indicates that cancer patients are randomly distributed over space without any spatial clustering [15-17]. The Local Indicators of Spatial Association (LISA) were used to describe the local autocorrelation among all 17 township administrative regions. The LISA index was calculated in GeoDa 2.0 to detect the specific location and type of cancer incidence clusters at the township level. The identified local cluster types were then exported to ArcGIS 10.6 to produce a LISA cluster map and determine the type of spatial aggregation.
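As an illustration of the global statistic and its Monte Carlo test, the following minimal sketch computes Moran's I for a list of rates and a spatial weights matrix, then derives a Z score and a one-sided pseudo p-value from random permutations. The function names, the dense list-of-lists weights format, and the toy chain of four units are assumptions made for the example; they are not taken from the study.

```python
import random
import statistics


def morans_i(x, w):
    """Global Moran's I for values x and a square weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                       # deviations from the mean
    s0 = sum(sum(row) for row in w)                 # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)


def permutation_test(x, w, n_perm=9999, seed=0):
    """Return (I, Z, pseudo p) from a random-permutation reference distribution."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    values = list(x)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(values)                         # reassign values to locations
        sims.append(morans_i(values, w))
    z_score = (observed - statistics.mean(sims)) / statistics.pstdev(sims)
    # One-sided pseudo p-value for positive autocorrelation.
    p_value = (sum(1 for s in sims if s >= observed) + 1) / (n_perm + 1)
    return observed, z_score, p_value


# Hypothetical toy example: four units in a chain (0-1-2-3).
chain_w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(4)] for i in range(4)]
i_trend = morans_i([1, 2, 3, 4], chain_w)           # similar neighbors, positive I
i_alternating = morans_i([1, -1, 1, -1], chain_w)   # dissimilar neighbors, negative I
```

With only a handful of units the permutation distribution is coarse, so the toy chain is meant to show the mechanics rather than a realistic test.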
Hot spot analysis
Hot spot analysis identifies statistically significant spatial clusters of high values (i.e., hot spots) and low values (i.e., cold spots) using the Getis-Ord Gi* statistic. Here we applied the Gi* statistic to estimate the degree of spatial clustering of DTC in Lujiang County. In particular, the Z value of the Gi* statistic was used to identify the cold and hot spots in the distribution of cancer cases among residents of Lujiang County. For example, Z(Gi*) > 1.96 indicates a high-value spatial cluster (hot spot), while Z(Gi*) < −1.96 indicates a low-value spatial cluster (cold spot) [18,19].
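The Z value of the Gi* statistic can be sketched as follows. The example assumes binary contiguity weights with each unit counted as its own neighbor, which is the usual convention for Gi*; the function name and the five-unit toy data are hypothetical, and the sketch does not handle the degenerate case in which one unit neighbors every other unit.

```python
import math


def getis_ord_gi_star(x, w):
    """Z scores of the Getis-Ord Gi* statistic for each unit.

    x: observed values; w: square weights matrix (self-weights are set to 1).
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)   # global std. deviation
    scores = []
    for i in range(n):
        row = list(w[i])
        row[i] = 1.0                                        # Gi* includes unit i itself
        sum_w = sum(row)
        sum_w2 = sum(v * v for v in row)
        numerator = sum(wij * xj for wij, xj in zip(row, x)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical toy data: five units in a chain, high values at one end.
toy_values = [10, 9, 1, 1, 1]
toy_w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
toy_scores = getis_ord_gi_star(toy_values, toy_w)
hot = [i for i, z in enumerate(toy_scores) if z > 1.96]     # candidate hot spots
cold = [i for i, z in enumerate(toy_scores) if z < -1.96]   # candidate cold spots
```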
Retrospective spatiotemporal scan statistical analysis
Kulldorff’s space-time scan statistic was used to explore the spatial and temporal clustering of DTC in Lujiang County from 2012 to 2017 at the township scale. The method builds a moving cylinder whose circular base varies in radius so that it covers from 0 to 50% of the total population, and whose height corresponds to the time period under study. The difference in incidence between the inside and outside of the window was calculated [20]. The window with the maximum log-likelihood ratio (LLR) was defined as the most likely cluster, and other windows with statistically significant LLRs were defined as secondary clusters. In addition, the relative risk (RR) of each cluster was calculated, and 9,999 Monte Carlo simulations with a test level of α=0.05 were used to test whether the difference was statistically significant. Since cancer incidence is a rare event, the discrete Poisson probability model was used for scanning. In this study, the maximum spatial scanning window was set to 50% of the total population of Lujiang County, the scanning period was 1 year, and no geographic overlap of clusters was allowed. We entered the time of onset and the actual number of cases as basic information and then calculated the LLR and RR values to determine the high-incidence periods and high-incidence areas. Finally, we used ArcGIS 10.6 to visualize the relative risk of DTC in the high-risk cluster areas.
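For a single candidate window, the LLR and RR under the discrete Poisson model can be sketched as below. The sketch assumes that the expected count inside the window is the total case count multiplied by the window's share of the population (no covariate adjustment) and that only high-rate clusters are of interest; the function names and the numbers in the example are hypothetical.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio of one window when scanning for high rates."""
    if cases_in <= expected_in:
        return 0.0                      # not an excess of cases, so no cluster
    llr = cases_in * math.log(cases_in / expected_in)
    cases_out = total_cases - cases_in
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / (total_cases - expected_in))
    return llr


def relative_risk(cases_in, expected_in, total_cases):
    """Risk inside the window divided by the risk outside it."""
    inside = cases_in / expected_in
    outside = (total_cases - cases_in) / (total_cases - expected_in)
    return inside / outside


# Hypothetical window: 15% of the population but 30 of the 100 cases.
total = 100
expected = total * 0.15
window_llr = poisson_llr(30, expected, total)
window_rr = relative_risk(30, expected, total)
```

In a full scan, this LLR would be evaluated for every candidate cylinder, the maximum would be retained, and its significance would be assessed against the maxima obtained from the Monte Carlo replicates.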
Standard deviational ellipse (SDE) analysis
SDE is a versatile GIS tool for delineating the geographic distribution of the research target. SDE mainly uses the center, major axis, and minor axis as basic parameters to quantitatively describe geographic elements. The center of the SDE reflects the relative position of the elements in space, and the major and minor axes reflect the direction and extent of their spatial distribution [21, 22]. Therefore, the annual shift trajectory of the onset of DTC in Lujiang County can be generally revealed.
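The three SDE parameters can be sketched from point coordinates as follows. This is a minimal illustration that derives the center, axis lengths, and orientation from the weighted covariance of the coordinates; the function name, the optional case-count weights, and the toy points are assumptions, and GIS packages may scale the axes or report the rotation angle using a different convention.

```python
import math


def standard_deviational_ellipse(xs, ys, weights=None):
    """Return (center, major_sd, minor_sd, angle_deg) of a one-SD ellipse.

    angle_deg is the direction of the major axis, measured counterclockwise
    from the x axis (an assumed convention).
    """
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    cx = sum(w * x for w, x in zip(weights, xs)) / total    # weighted mean center
    cy = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - cx) ** 2 for w, x in zip(weights, xs)) / total
    syy = sum(w * (y - cy) ** 2 for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - cx) * (y - cy) for w, x, y in zip(weights, xs, ys)) / total
    # Eigenvalues of the 2x2 covariance matrix give the squared axis lengths.
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    major_sd = math.sqrt(half_trace + radius)
    minor_sd = math.sqrt(max(half_trace - radius, 0.0))
    angle_deg = math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))
    return (cx, cy), major_sd, minor_sd, angle_deg


# Hypothetical toy points lying on the line y = x.
center, major_sd, minor_sd, angle_deg = standard_deviational_ellipse(
    [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]
)
```

Computing the ellipse separately for each year and connecting the yearly centers gives the kind of trajectory described above.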
GM (1,1) model analysis
The GM (1,1) model is a time series forecasting model with simple principles and high prediction accuracy. It preprocesses the original data by accumulation to obtain a smoother series, which makes the prediction more effective [23]. This study used the grey GM (1,1) model to predict the incidence of DTC in each township in Lujiang County and to explore the future evolution of the spatial pattern of DTC cases in the county.
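The standard GM (1,1) procedure can be sketched as follows: the series is accumulated, the development coefficient a and grey input b are estimated by least squares on the background values, and forecasts are obtained by differencing the fitted accumulated series. The function names and the toy series are hypothetical, and the sketch assumes a positive series with an estimated a that is not zero.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from a positive series x0."""
    x1, running = [], 0.0
    for value in x0:                                # accumulated generating series
        running += value
        x1.append(running)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]  # background values
    y = list(x0[1:])
    m = len(z)
    sum_z, sum_y = sum(z), sum(y)
    sum_zz = sum(v * v for v in z)
    sum_zy = sum(zi * yi for zi, yi in zip(z, y))
    # Least squares for the grey equation x0(k) = -a * z(k) + b.
    slope = (m * sum_zy - sum_z * sum_y) / (m * sum_zz - sum_z ** 2)
    b = (sum_y - slope * sum_z) / m
    return -slope, b


def gm11_predict(x0, steps):
    """Return (fitted values, forecasts for the next `steps` periods)."""
    a, b = gm11_fit(x0)
    n = len(x0)

    def x1_hat(k):                                  # fitted accumulated series
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    series = [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + steps)]
    return series[:n], series[n:]


# Hypothetical toy series growing by about 10% per period.
toy_series = [100.0, 110.0, 121.0, 133.1, 146.41]
toy_fitted, toy_forecast = gm11_predict(toy_series, steps=2)
```

Because the model assumes approximately exponential behavior, it is generally better suited to short, smooth series than to data with strong fluctuations.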