Composite characteristics
Population inflow and outflow promoted the spread of the epidemic, and occupation type largely determines population mobility. By occupation, workers, waiters, staff and students are the most infected groups, accounting for nearly 80% of confirmed cases. Among nonlocal cases, workers make up the highest proportion, followed by waiters and students. Among local cases, staff are the mainstay, followed by workers and waiters (Fig. 1A). By industry, commercial service has the most confirmed cases, accounting for nearly 55%, followed by sales service (22%) and transport service (13.6%) (Fig. 1B). In general, more nonlocal cases occur in occupations with high exposure and frequent contact with others, such as workers, staff, and service workers, especially those in commercial service. Such service workers typically work in shopping malls or supermarkets, which are the main infection hubs of COVID-19. As a large group returning home during the winter vacation, students also show a very high risk of infection.
Furthermore, keywords are extracted from the manually collected text of 8,911 confirmed cases with detailed information, after removing "patient", "diagnosis", gender, origin, and other irrelevant terms. Jieba is used to generate the top 100 keywords and their word frequencies; after semantic merging and screening of relevant feature words, a co-occurrence matrix of the high-frequency words is constructed. A frequency of 0 means that there is no co-occurrence relation between the two keywords. Gephi is used to describe the degree of dependence between these keywords, and the ForceAtlas2 algorithm is used to lay out the keyword co-occurrence map. Each node represents a high-frequency word, and the larger the node, the greater the centrality of the keyword (Fig. 2). The average degree of the nodes is 28.222, the average path length is 1.194, and the average clustering coefficient is 0.868, indicating a high degree of connection among the keywords; that is, a susceptible population can be identified. "Wuhan", "close contact", "quarantine" (15.31) and "fever" (13.43) have the highest centrality, followed by "afternoon" (12.85), "consult" (10.95), "people's hospital" (10.95) and "ante meridiem" (10.64).
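The co-occurrence step described above can be illustrated with a minimal sketch. It assumes the case texts have already been segmented into keyword lists (for example by Jieba) and that two keywords co-occur when they appear in the same case record; the function names, the toy records, and the use of weighted degree as a simple centrality proxy are illustrative assumptions, not the procedure actually run in Gephi.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records, top_n=100):
    """Count keyword frequencies and pairwise co-occurrences.

    records: list of keyword lists, one list per case description
             (assumed to be segmented and cleaned already).
    Returns (freq, vocab, pairs); pairs maps a sorted keyword pair
    to the number of records in which both keywords appear.
    """
    # Document frequency: each keyword is counted once per record.
    freq = Counter(word for rec in records for word in set(rec))
    vocab = [word for word, _ in freq.most_common(top_n)]
    keep = set(vocab)

    pairs = Counter()
    for rec in records:
        present = sorted(set(rec) & keep)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return freq, vocab, pairs


def weighted_degree(pairs):
    """Sum of co-occurrence counts per keyword (a simple centrality proxy)."""
    degree = Counter()
    for (a, b), n in pairs.items():
        degree[a] += n
        degree[b] += n
    return degree


# Hypothetical toy records, not taken from the study data.
records = [
    ["Wuhan", "fever", "close contact"],
    ["Wuhan", "fever", "quarantine"],
    ["close contact", "quarantine"],
]
freq, vocab, pairs = cooccurrence(records)
degree = weighted_degree(pairs)
```

A pair that never appears together simply has a count of 0 in `pairs`, which matches the "no co-occurrence relation" case in the text. The resulting pair counts could be exported as a weighted edge list for a graph tool.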
Thus, most of the confirmed cases are geographically related to "Wuhan"; most are people "returning home" or a "permanent resident" of a nonlocal area; and most of those infected within the family are "relatives" and "husbands", especially in small nuclear families, commonly composed of "couples". Most of those infected outside the family are acquaintances of the confirmed cases, such as "friends" and "colleagues".
The infection route is mostly "close contact", and the symptoms are mainly "fever", "positive", and "cough", which is consistent with the epidemiological characteristics of COVID-19. Disease symptoms are closely related to the patient's age, gender, and overall health status[25]. Thus, the age of each confirmed case is combined with its clinical symptoms (based on 4,859 records), irrelevant words are removed, and synonyms are merged to calculate the frequency of each symptom keyword (its share of all symptom mentions within a given age group). Finally, the keywords with a frequency above 0.01 in each age group are selected and pooled (Fig. 3). The results show that fever and cough are the core symptoms at all ages. In addition, cases aged 0 to 19 years show several other symptoms, including diarrhea, rhinorrhea and feebleness. Feebleness, headache and ache are common in patients aged 20 to 59 years. Patients over 60 years of age have symptoms similar to those of patients aged 20 to 39 years, but "headache" is relatively rare.
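The frequency calculation described above can be sketched as follows. This is a minimal illustration that assumes each record has already been reduced to an age group and a list of cleaned, synonym-merged symptom keywords; the function names, age-group labels and toy records are hypothetical.

```python
from collections import Counter, defaultdict


def symptom_shares(cases):
    """Share of each symptom among all symptom mentions in its age group.

    cases: iterable of (age_group, [symptom, ...]) pairs.
    Returns {age_group: {symptom: share}}.
    """
    counts = defaultdict(Counter)
    for group, symptoms in cases:
        counts[group].update(symptoms)

    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {s: n / total for s, n in counter.items()}
    return shares


def select_symptoms(shares, threshold=0.01):
    """Union of symptoms whose share exceeds the threshold in any age group."""
    selected = set()
    for group_shares in shares.values():
        selected.update(s for s, p in group_shares.items() if p > threshold)
    return sorted(selected)


# Hypothetical toy records, not taken from the study data.
cases = [
    ("0-19", ["fever", "cough", "diarrhea"]),
    ("0-19", ["fever"]),
    ("20-59", ["fever", "headache"]),
    ("60+", ["fever", "cough"]),
]
shares = symptom_shares(cases)
common = select_symptoms(shares, threshold=0.3)
```

With real data the threshold would be the 0.01 stated in the text; the higher value here only makes the filtering visible on four toy records.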
Most confirmed cases are "centralized" in a "community", "street" or "supermarket", and are associated with activities in the "morning" or "afternoon" such as having "dinner" or going "shopping". Relatively independent transportation modes such as "self-driving" are often used, which reduces the risk of onward transmission compared with "bus" and "train". People with a distinct "onset" usually "walk" to a "people's hospital", "designated medical institutions" or "outpatient" departments to "consult"; "nucleic acid detection" and "medical quarantine observation" are the main means of diagnosis and treatment. A few patients without obvious symptoms take measures such as "living at home". COVID-19 is highly infectious but has a low fatality rate, so the patients' state is mostly "stable". Designated hospitals are usually "people's hospitals" and other types of medical institutions.
Temporal evolution properties
The number of confirmed cases shows a single-peak trend over time. The number of cases is high from January 26 to February 6, 2020, and is largest on February 2. The sex ratio is stable overall but fluctuates significantly in the early and late stages. The number of nonlocal cases is higher from January 26 to February 6, while the peak period of local infections is from February 1 to February 11; the lag between them, at 7 days, equals half of the incubation period. Overall, the place and order of infection shifted from the nonlocal first group to the nonlocal second group, and then to the local infection group.
Based on the above analysis, the temporal evolution is divided into the following four stages.
Ⅰ. Sporadic appearance period (December 30, 2019 to January 17, 2020). Before January 17, 2020, COVID-19 had only just begun to spread from Wuhan to other areas. Most infected people were still in the incubation stage, so the number of confirmed cases in this period is small (fewer than 20), and almost all are nonlocal infections. With so few cases, the sex ratio shows no significant difference and is not representative.
Ⅱ. Fluctuating rise period (January 18 to January 25, 2020). The sex ratio of confirmed cases fluctuates to varying degrees during both the fluctuating rise period and the fluctuating decline period, but the fluctuations are greater in the rise period. More males than females were confirmed except for the days before January 13 and January 17. From January 18 to February 5, many cases were infected nonlocally, and before January 25 the first infection group had the highest proportion.
Ⅲ. Stable high-risk period (January 26 to February 6, 2020). The sex ratio in this period remained stable at 1, and the confirmed cases were gender-balanced. From January 26 to January 30, the nonlocal second group's proportion gradually increased, and the transition from the nonlocal first to the second group began; from January 31 to February 5, the trend began to change from nonlocal to local infections.
Ⅳ. Fluctuating decline period (February 7 to February 27, 2020). The sex ratio was relatively low during this period: except for February 17, the confirmed cases were mostly female. After February 6, local infections dominated, especially those of the third group.
Overall, the spread of the COVID-19 epidemic in China shifted from male-dominated cases in the early stage to mostly female cases in the later stage. At the same time, it shifted from nonlocal to local infections: from the nonlocal first group to the nonlocal second group, and finally to the local third group.
Spatial flow network
Source and target provinces. The movement of people dramatically affects the spread of the epidemic, so the flow path information of confirmed cases is extracted from the detailed data (3,770 cases), and degree centrality is used to identify the main inflow and outflow provinces in different periods and to explore the spread patterns of the epidemic in China (Fig. 5). In general, interprovincial flow accounts for most reported cases, as much as 86.7%. Within-province flows occur mostly in Hubei (17.9%), Guangdong (12.5%) and Anhui (11.8%). Standardized by total confirmed cases, the provinces with more than 50% intra-provincial flows are, in descending order, Shanxi, Hainan, Heilongjiang, Gansu, Anhui, Jilin, Guangxi and Sichuan (there is no data for Qinghai). These provinces are low-risk in this pandemic and are mostly located in central, western, and northeast China. Interprovincial flow is concentrated in the provinces with more confirmed cases, such as Hubei, Jiangxi, and Zhejiang. As a whole, the outflow areas are distributed in a "one point and one area" pattern concentrated in the southeast, where the "one point" is Beijing (1.4%) and the "one area" is a sector-shaped area centered on Hubei, which accounts for 70%, followed by Guangdong (3.6%), Henan (2.3%), Zhejiang (2.0%), Jiangsu (1.8%), Hunan (1.8%) and Jiangxi (1.4%). The case outflow from these provinces (except for Hubei) made up 14.2%. The main inflow areas form a "one point and one line" pattern. The "one point" is Guangdong (18.8%), and the "one line" is, in descending order, Anhui (10.8%), Henan (7.5%), and Zhejiang (6.5%). In total, 43.7% of the cases were imported into these four provinces, demonstrating that the spread of COVID-19 has prominent regional agglomeration characteristics. Guangdong, Henan, and Zhejiang have both high out-degree and high in-degree and therefore require special attention.
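As a rough illustration of how in-degree and out-degree shares of this kind can be derived, the sketch below counts one directed edge per case from a source province to a target province. The normalization (shares of all flows, including intra-provincial ones), the function name and the toy flows are assumptions for illustration; the text does not state the exact denominators used.

```python
from collections import Counter


def flow_shares(flows):
    """Summarize case flows between provinces.

    flows: list of (source_province, target_province), one entry per case.
    Shares are expressed relative to the total number of flows (assumption).
    """
    total = len(flows)
    inter = [(s, t) for s, t in flows if s != t]   # interprovincial flows
    intra = Counter(s for s, t in flows if s == t)  # within-province flows

    out_degree = Counter(s for s, _ in inter)  # cases leaving each province
    in_degree = Counter(t for _, t in inter)   # cases arriving in each province

    return {
        "interprovincial_share": len(inter) / total,
        "intra_share": {p: n / total for p, n in intra.items()},
        "out_share": {p: n / total for p, n in out_degree.items()},
        "in_share": {p: n / total for p, n in in_degree.items()},
    }


# Hypothetical toy flows, not taken from the study data.
flows = [
    ("Hubei", "Guangdong"),
    ("Hubei", "Anhui"),
    ("Hubei", "Hubei"),
    ("Guangdong", "Henan"),
]
summary = flow_shares(flows)
```

Provinces that appear with large values in both `out_share` and `in_share` correspond to the "high out-degree and high in-degree" case highlighted in the text.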
To identify the diffusion process of COVID-19, the specific inflow and outflow provinces are examined in the different periods. Because of the limited case counts, the sporadic appearance period is not discussed. ① The diffusion path during the fluctuating rise period follows a "one source and two sinks" pattern. The "one source" is Hubei and the neighboring provinces of Anhui and Zhejiang, with Hubei as the main outflow province. The "two sinks" have begun to take shape, one in the north and one in the south. The north sink is primarily composed of two provinces near Hubei: Anhui and Zhejiang. The south sink is an area centered on Guangdong, including Guangdong, Guangxi and Hainan. ② During the stable high-risk period, the diffusion path changes to "two sources and two sinks". Compared with the previous period, Guangdong becomes the second source; the south sink contains only Guangdong, and the north sink adds Henan. ③ Finally, in the fluctuating decline period, the source gradually spreads from the core provinces to southeast China, which becomes the high-frequency outflow area, while the sinks add Sichuan in the southwest.
Therefore, the overall diffusion path follows a "one area and two sinks" distribution: the "one area" is a southeast sector-shaped area centered on Hubei, and the "two sinks" are Guangdong and the provinces north of Hubei, respectively. Hubei and Guangdong are the chief outflow and inflow provinces.
City-level properties. Key intermediary cities are also identified by calculating the node betweenness (that is, the number of shortest paths between any pair of nodes that pass through an intermediary node), as presented in Fig. 6 and Table 4. Because the pandemic was so widespread, nearly all cities in China were involved, but the node betweenness is generally low. Wuhan and Shenzhen had the most substantial influence on the flow, with node betweenness of 19.9% and 10.4%, respectively. Apart from southeastern coastal cities such as Shenzhen, Wenzhou, and Suzhou, and cities neighboring Hubei such as Xinyang, the remainder of the top 20 cities are all provincial capitals or municipalities.
The edge betweenness (that is, the number of shortest paths between any pair of nodes that run along an edge) of the key paths is listed in Table 4. The top 20 paths account for only 21% of the total. In general, the outflow cities are mainly distributed southeast of the Hu line, which is consistent with China's population distribution. The sources are usually cities in Hubei or provincial capitals, and the destinations are mostly cities within a sector-shaped area centered on Wuhan (Table 4). For example, the main paths often originate in Wuhan, Beijing, Nanchang, Changsha and other cities. The destinations, however, are mostly densely populated and economically developed cities, such as Wuhan's neighboring cities and cities along the southeast coast.
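Both betweenness measures defined above can be computed in a single pass with Brandes' algorithm. The sketch below is a minimal pure-Python version for an unweighted, directed city network; treating the flow network as unweighted and directed, and the toy edges, are assumptions for illustration rather than the configuration used to produce Fig. 6 and Table 4.

```python
from collections import deque, defaultdict


def betweenness(edges):
    """Node and edge betweenness via Brandes' algorithm (unweighted, directed).

    edges: iterable of (source_city, target_city) pairs.
    Returns (node_bc, edge_bc) as raw shortest-path counts; when several
    shortest paths exist between a pair, each contributes a fraction.
    """
    edges = set(edges)  # ignore duplicate edges
    adj = defaultdict(list)
    nodes = set()
    for u, v in edges:
        adj[u].append(v)
        nodes.update((u, v))

    node_bc = dict.fromkeys(nodes, 0.0)
    edge_bc = dict.fromkeys(edges, 0.0)

    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        preds = defaultdict(list)
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Back-propagate dependencies from the farthest nodes toward s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge_bc[(v, w)] += c
                delta[v] += c
            if w != s:
                node_bc[w] += delta[w]
    return node_bc, edge_bc


# Hypothetical toy network, not taken from the study data.
edges = [("Wuhan", "Nanchang"), ("Nanchang", "Zhanjiang"), ("Wuhan", "Xinyang")]
node_bc, edge_bc = betweenness(edges)

# One possible normalization (assumption): each edge's share of the total.
total = sum(edge_bc.values())
edge_share = {e: b / total for e, b in edge_bc.items()}
```

In the toy network, Nanchang is the only intermediary because the single Wuhan to Zhanjiang shortest path passes through it. Percentages such as those reported in the text depend on the normalization chosen, which is why the share calculation is shown separately.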
Table 4
Key paths of confirmed cases with the highest edge betweenness at the city level

| Ranking | Source | Target | Edge betweenness (%) |
|---|---|---|---|
| 1 | Wuhan | Xinyang | 2.41 |
| 2 | Wuhan | Wenzhou | 1.92 |
| 3 | Wuhan | Nanning | 1.52 |
| 4 | Xiangyang | Shenzhen | 1.32 |
| 5 | Shanghai | Suzhou | 1.31 |
| 6 | Wuhan | Nanchang | 1.18 |
| 7 | Taiyuan | Jincheng | 1.14 |
| 8 | Huanggang | Anqing | 1.02 |
| 9 | Wuhan | Taiyuan | 1.01 |
| 10 | Beijing | Zhengzhou | 1.01 |
| 11 | Shenzhen | Haikou | 1.00 |
| 12 | Changsha | Fuyang | 0.93 |
| 13 | Shenzhen | Huizhou | 0.89 |
| 14 | Zhengzhou | Shangqiu | 0.89 |
| 15 | Changsha | Shenzhen | 0.53 |
| 16 | Nanchang | Maanshan | 0.53 |
| 17 | Shanghai | Xianyang | 0.53 |
| 18 | Nanchang | Zhanjiang | 0.51 |
| 19 | Wuhan | Chuzhou | 0.51 |
| 20 | Haikou | Suzhou | 0.51 |

In this study, cities in bold are the provincial capital cities.
The flows fall into three major categories: short-term (11.3%), long-term (45.1%), and other flows (43.6%). Excluding other flows, returning home to visit relatives ranks first (30.4%), followed by migrant workers returning home (13.2%) and returning home after traveling (8.3%). A visual analysis of short-term and long-term mobility was conducted for returning home to visit relatives and for migrant workers returning home. Returning home after travel is not analyzed separately; it is covered only by the overall analysis of short-term flows (Fig. 7). Compared with long-term flows, short-term flows mostly cover short distances, mainly to tourism cities such as Sanya, Kunming, and Guilin, some developed cities, and some cities near Wuhan, whereas long-term flows radiate outward from Wuhan, usually over long distances, and involve most cities in China. At the same time, we also found that about half of confirmed cases migrated with their families while traveling, returning home to visit relatives, or attending gatherings. The ratio of the number of cases who migrated with their families to the total number of confirmed cases in each category is 43.3%, 25% and 17%, respectively. The average number of migrants per household is 3, 2.8, and 4.4, respectively.
There are also significant differences between the flows for visiting relatives and those of migrant workers returning home. Flows for visiting relatives are concentrated to the east of Wuhan, and the top four paths are Wuhan-Shenzhen, Wuhan-Xinyang, Wuhan-Haikou, and Wuhan-Nanyang. Flows for returning home from work are concentrated to the north of Wuhan, and the destinations of the top four paths (Wuhan-Yichun, Wuhan-Xinyang, Wuhan-Anqing, and Wuhan-Chongqing) are all cities near Hubei. Flows for visiting relatives cover longer distances than flows for returning home from work; 55% and 66% of the paths, respectively, originate from provincial capital cities, which runs counter to the usual flow pattern of the Chinese population.