Enhanced surveillance in Guangzhou
Guangzhou, the capital city of Guangdong Province, is a transport hub in South China with over 15 million inhabitants. From February 27 to April 11, 443 inbound flights carrying more than 56,000 passengers arrived at Baiyun Airport, Guangzhou. As local infections associated with overseas arrivals continued to increase, Guangzhou took stringent measures to trace and control the cases. The city has been implementing a “five-shield” system to prevent the spread of SARS-CoV-2 from passengers arriving in Guangzhou. The system includes (1) status checks at checkpoints; (2) medical observation and isolation at designated venues; (3) community screening; (4) close contact screening; and (5) fever clinics and treatment at hospitals.
Since March 21, all travelers entering Guangzhou have been required to undergo central quarantine, which comprises nucleic acid testing for SARS-CoV-2 and a 14-day quarantine at designated venues. The city has also screened people who arrived before the central quarantine measure was put into place, as well as people arriving from high-risk regions or on high-risk flights.
As of April 5, Guangzhou had screened 6,353 people. Between March 15 and April 12, 119 imported cases were confirmed: 25 were foreign nationals and 94 were Chinese nationals. This coincided with an increase in SARS-CoV-2 infections in the local communities (Figure 1a).
Patients’ characteristics and contact history
We obtained nasopharyngeal samples between March 12 and April 15 for this study. These included 109 imported cases (109/119, 91.6%), 69 local cases and 5 cases with missing information. The imported cases came from 25 countries across Asia (n = 26), Africa (n = 28), Europe (n = 36), and North and South America (n = 19) (Stab 1). Most of the local cases had been in close contact with imported SARS-CoV-2 cases or related infections (ref. 17). Additionally, we included 17 samples, collected between January 23 and February 22, from local cases who were confirmed to have contracted the virus from Hubei cases. All 200 cases included in this study were classified by disease severity according to accepted criteria. The majority (98%) were classified as mild, and only three cases were critical or severe (Stab 1).
Viral genome sequencing
We conducted multiplex SARS-CoV-2-specific amplification followed by next-generation sequencing (NGS) to obtain viral genome sequences. We report a high-coverage genome analysis (at least 10X sequence coverage for more than 90% of the genome nucleotides) for 73% (n = 146) of cases, each sequenced to a mean depth of > 12,000-fold (SD = 5,483). The remaining cases (27%) were sequenced to a median genome coverage of 67% (Figure 1b and Stab 1). Genome coverage was correlated with viral load, quantified by the Ct values of a real-time reverse transcription-polymerase chain reaction (qRT-PCR) assay (ref. 18) (Figure 1c). Using the genome sequence of the Wuhan-Hu-1 strain as the reference, we found that the density of single nucleotide polymorphisms (SNPs) was ~0.2 nucleotides (nt) per 1,000 nt, which showed no significant correlation with Ct values (Figure 1c). C-to-U substitutions dominated the variations. In addition, 56.2% of the SNPs were amino acid-changing (non-synonymous, Figure S1).
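The high-coverage criterion and the SNP density quoted above can be illustrated with a minimal sketch. It assumes per-base depths are already available as a list of integers (for example, parsed from a depth report); the function names and the toy depth vectors are hypothetical and are not taken from the study's pipeline.

```python
from typing import Sequence


def coverage_breadth(depths: Sequence[int], min_depth: int = 10) -> float:
    """Fraction of genome positions sequenced to at least `min_depth`."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def is_high_coverage(depths: Sequence[int],
                     min_depth: int = 10,
                     min_breadth: float = 0.90) -> bool:
    """True when more than `min_breadth` of positions reach `min_depth`."""
    return coverage_breadth(depths, min_depth) > min_breadth


def snp_density_per_kb(n_snps: int, n_sites: int) -> float:
    """SNPs per 1,000 nt over the sites that were compared to the reference."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return 1000.0 * n_snps / n_sites


# Toy example (hypothetical depths, not study data): 95 well-covered
# positions and 5 poorly covered ones give a breadth of 0.95.
toy_depths = [12000] * 95 + [3] * 5
print(coverage_breadth(toy_depths), is_high_coverage(toy_depths))
```

The strict `>` comparison mirrors the wording "more than 90%"; a pipeline that treats exactly 90% as passing would use `>=` instead.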
Lineages of imported strains within the global phylogeny
To trace the foreign imported SARS-CoV-2 strains, we constructed a maximum likelihood (ML) phylogenetic tree of high-coverage genomes from 77 foreign imported strains, in combination with 6,453 genomes shared by researchers worldwide (up to April 20, 2020) (Figure 1e). These viral strains were widely distributed in the global SARS-CoV-2 phylogeny and were highly diverse. Under the GISAID nomenclature, 49 of the imported cases could be identified as G clade (characterized by D614G in the S protein, A23403G in the genome), followed by 6 in S clade (L84S in ORF8, U28144C) and 4 in V clade (G251V in ORF3a, G26144U). The rest were assigned to other clades, including 17 from Asia and 1 from Europe. Under another nomenclature suggested by ref. 10, 3 of 77 imported strains (3.9%) could be classified as lineage A, which shares the nucleotides at positions 8782 (U) and 28144 (C) with the closest known bat virus RaTG13 (ref. 6), and 71 of 77 were from lineage B, which carries different nucleotides at these positions (C and U, respectively). Hereafter we adopt the lineage A/B nomenclature.
Closely related strains of the imported SARS-CoV-2 could be identified and assigned to detailed lineages (SFig2–3 and Stab3), as numerous public genomes from Europe and North America are now available. 41 of 77 imported infections were assigned to two sub-lineages mainly sampled in Europe/North America (B.1) and Southeast/West Asia (B.6), in concordance with the countries/regions from which the patients had traveled. A further analysis showed that the B.1 viral strains, imported from Europe (n = 18), North America (n = 9) and South America (n = 1), shared the SNPs C241U, C3037U, C14408U, and A23403G.
Our viral genomic data from the imported cases have broadened the phylogeny of SARS-CoV-2 and expanded the global coverage in areas where viral genomic surveillance data are currently limited. We found that many genomes clustered closely within country, including those from the Philippines (n = 11), Pakistan (n = 5) and Thailand (n = 2), whereas the three genomes from Dubai in the United Arab Emirates diverged into two sub-clusters (Figure 2f). On the other hand, the strains from Ethiopia were heterogeneous, possibly because Ethiopia has a major hub airport in Africa. In particular, five of these Ethiopian strains could be split into four sub-clusters in the phylogenetic tree of Africa, compared with the fewer genetic branches among the Nigerian strains (Figure 1g).
The phylogenetic analyses also offer a unique opportunity to investigate how the virus is transmitted among passengers on the same flight and among family members (Stab3). For instance, among the ten imported cases from five countries who arrived in Guangzhou on the same airplane, the strains could be assigned to two haplotypes of the B.1 lineage (n = 5 and n = 4) and one haplotype of the A lineage (n = 1). A couple who travelled together were infected by different viral strains; one of them lived in Ethiopia and the other in the Republic of the Congo.
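The marker nucleotides listed above can be turned into a simple lookup, sketched below. This is an illustration only: it assumes base calls are supplied as a mapping from 1-based reference positions to bases, the order in which clade markers are checked is an assumption, and the function names are hypothetical. Real clade and lineage assignment relies on the phylogeny, not on single sites.

```python
from typing import Mapping, Optional


def _rna(base: Optional[str]) -> Optional[str]:
    """Upper-case a base call and write T as U; None stays None."""
    return None if base is None else base.upper().replace("T", "U")


def gisaid_clade(calls: Mapping[int, str]) -> str:
    """Assign G, S, V or 'other' from the marker sites named in the text."""
    if _rna(calls.get(23403)) == "G":   # A23403G (D614G in S)
        return "G"
    if _rna(calls.get(28144)) == "C":   # U28144C (ORF8)
        return "S"
    if _rna(calls.get(26144)) == "U":   # G26144U (G251V in ORF3a)
        return "V"
    return "other"


def ab_lineage(calls: Mapping[int, str]) -> str:
    """Lineage A carries U8782 and C28144; lineage B carries C8782 and U28144."""
    pair = (_rna(calls.get(8782)), _rna(calls.get(28144)))
    if pair == ("U", "C"):
        return "A"
    if pair == ("C", "U"):
        return "B"
    return "unassigned"


# Hypothetical genome carrying the G-clade marker on a lineage B background.
example = {8782: "C", 23403: "G", 26144: "G", 28144: "T"}
print(gisaid_clade(example), ab_lineage(example))
```

Sites that are missing from the mapping are treated as uninformative, so a genome lacking calls at both 8782 and 28144 is returned as "unassigned" instead of being forced into a lineage.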
Phylogenetic analysis of imported and local cases
Next, we investigated the relationships between foreign imported and locally spreading SARS-CoV-2. With a cutoff of >90% genome coverage, we included 77 imported and 52 local cases (March to April) in the ML phylogenetic analysis.
Despite the diversity of foreign viral strains, most of the local infections (50 of 52) were related to two specific lineages imported from African countries (Figure 2). The first lineage, denoted L1, included 38 local cases and 5 imported cases from Nigeria or Ethiopia. L1 was characterized by C15324U (Asn5020Asn of ORF1b) and descended from the B.1 lineage. About 4% (260 of 6,453) of the publicly available SARS-CoV-2 genomes share the L1 haplotype. These genomes were reported from five continents (Europe, Africa, Oceania, North America and South America), but the haplotype had not previously been identified in Asia (Stab 2).
The second lineage circulating in local cases, denoted L2, was a descendant of L1 that harbored an additional C19524U (Leu6420Leu of ORF1b). Twelve local cases and three imported cases, from Uganda, Tanzania, and Ivory Coast, respectively, could be assigned to L2, as well as one case from Nigeria with low genome coverage. At the time of writing, L2 had not been reported in any public genome dataset.
Intriguingly, two local cases belonged to a new lineage characterized by G25563U (denoted L3). Although many imported cases from Europe (n = 7), North America (n = 7) and Africa (n = 5) also harbored G25563U, they shared no other SNPs (such as C2416U or C1059U) with the two local cases (Figure 2). Thus, there was no evidence that the L3 virus had passed from international travelers to the local patients. This was consistent with their exposure and contact history (see Methods).
Notably, we found that four imported strains with G25563U, from Nigeria (n = 2), Ethiopia (n = 1), and Angola (n = 1), descended from the lineage carrying G25563U and C2416U. These strains should be considered novel in the current global phylogeny, as they contained a haplotype of two rare SNPs (C5654U and C16846U) (Figure 2). A careful examination of the global phylogeny suggested that this haplotype is likely to be the result of recombination between strains sampled in Europe (GISAID accessions EPI_ISL_428358 and EPI_ISL_420045) and Asia (EPI_ISL_420084) (SFig 4).
To explore the genetic characteristics of viruses from different waves of the COVID-19 outbreak in China, we conducted a separate analysis that included 12 strains with high genome coverage sampled from January to February 2020, seven of which were imported from Hubei Province. Comparison of the phylogenetic information showed that the viral strains obtained in January were distinct from the imported strains identified in March and April (Figure 2).
The transmission in local community
To elucidate the spread of imported strains in local communities from March to April, we conducted a detailed analysis by including both viral genomic data and contact tracing information.
Besides the 51 local cases with high-coverage viral genomes, 13 other local cases, despite having < 90% viral genome coverage, could each be assigned to one of the lineages L1 to L3 based on characteristic variants.
In total, 47 of the 64 local infections were assigned to L1, followed by 13 to L2 and 4 to L3. It should be noted that the sample collection date of the first L3 case preceded those of the corresponding imported cases, and it is likely that genome sequencing data were not available for a few imported cases (Figure 3a-b).
38 of these 64 local infections were in visitors to three locations: a restaurant, a tavern, and a trading market (Figure 3c). Before the onset of the disease, L1 local cases (n = 5) had a history of close contact with L1 imported cases (n = 3) in the restaurant, and L1 imported cases were among the visitors to all of the above-mentioned locations (trading market: n = 2; tavern: n = 1). L3 local cases had no direct contact history with any of the imported cases, but they had previous contact with other visitors to these locations.
One household and two close contacts of visitors to these locations were infected by L1 or L2 viral strains. However, a total of 20 local cases had neither a history of visiting these locations nor any contact with visitors to them. They were assigned to lineage L1 (n = 13) or L2 (n = 7), and included one L1 household (n = 3) and one L2 household (n = 2).
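Assignment of low-coverage genomes by characteristic variants can be sketched as follows. The marker sites (C15324U for L1, the additional C19524U for L2, G25563U for L3) come from the text; everything else is an assumption, including the function name, the use of `None` for uncovered sites, and the rule that the L2 marker alone implies L2 because L2 descends from L1. The sketch is meant for local cases only, since imported strains from other backgrounds also carry G25563U.

```python
from typing import Mapping, Optional


def assign_local_lineage(calls: Mapping[int, Optional[str]]) -> str:
    """Assign a local case to L1, L2 or L3 from marker-site base calls.

    `calls` maps 1-based reference positions to the called base ("U" for
    the derived allele), or None when the site has no usable coverage.
    """
    c15324 = calls.get(15324)
    c19524 = calls.get(19524)
    c25563 = calls.get(25563)

    if c19524 == "U":
        # Assumption: L2 descends from L1, so its marker is sufficient.
        return "L2"
    if c15324 == "U":
        # Without a call at 19524, L1 and L2 cannot be told apart.
        return "L1" if c19524 is not None else "L1 or L2"
    if c25563 == "U":
        return "L3"
    return "unassigned"


# Hypothetical low-coverage genome: L1 marker present, L2 site uncovered.
print(assign_local_lineage({15324: "U", 19524: None, 25563: "G"}))
```

Returning an explicit "L1 or L2" label keeps ambiguous genomes visible, so they can be resolved with contact tracing data or resequencing instead of being silently counted under one lineage.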
In sum, imported viral strains from all three lineages circulated in local communities, but at different scales. Among the local cases, we did not identify any other SNPs that could be used to define a new sub-lineage (Figure 2), suggesting that the circulation of imported SARS-CoV-2 in the local community might have been limited.
Genomic deletions and intra-host variations
Among the genomes that we studied, 10 harbored a total of 12 in-frame deletion events in the SARS-CoV-2 genome, many of which were intra-host variations. As deep sequencing is error-prone for deletions and insertions, we selected only deletions with a mutated allele frequency (MuAF) of >30% and validated the results by Sanger sequencing (SFig 5a). These deletions were located at seven loci of the viral genome, with lengths ranging from one to 14 amino acids (AAs, median 3). At amino acids 82–86 of the ORF1a protein, four cases, including importations from different countries, had deletions of 3–5 AAs. Other in-frame deletions were located in the S, M and N proteins. In the S protein, one in-frame deletion, P589del (MuAF 34.28%), was identified in a Europe-imported case. This deletion was not close to the junction site of the S1 and S2 subunits, and was predicted to be functionally conservative. No enrichment of these events was found in the S protein. We also detected shared frame-shift deletions in ORF8 in three cases who were members of the same household (SFig 5b).
To further investigate the intra-host selection of SARS-CoV-2, we analyzed intra-host single nucleotide variants (iSNVs) with MuAFs ranging from 5% to 95%. Given the noise in iSNV calls when viral loads are low, we included only samples with a Ct value < 29, among which iSNV density showed no significant correlation with Ct value (SFig 6 a-b). The results showed strong purifying selection on intra-host SARS-CoV-2. Moreover, iSNVs whose loci were shared among individuals were under stronger purifying selection than sporadic singleton iSNVs. The MuAFs of shared non-synonymous iSNVs were significantly lower than those of shared synonymous iSNVs (two-sided Wilcoxon rank-sum test, P = 0.037) and those of non-synonymous singleton iSNVs (P = 0.0163, Figure 4c). The MuAF spectrum of shared non-synonymous iSNVs exhibited the largest deviation from the expectation under neutral selection (Figure 4d-e).
We next examined the substitution types of the iSNVs, and found that the distribution for shared iSNVs was not concordant with that of SNPs (Figure 4f-g and SFig 1). Both the singleton iSNVs and the SNPs included many G-to-U substitutions. Shared SNPs were dominated by C-to-U, while shared iSNVs showed a G-to-U and A-to-G pattern. Interestingly, singleton iSNVs differed from shared ones in their MuAF distributions (Figure 4h-i). For the shared iSNVs, both C-to-U and U-to-C had significantly higher MuAFs than the other substitutions (two-sided Wilcoxon rank-sum test, P < 0.001). These findings support a potential selective advantage of C-to-U during viral transmission.
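The frequency filter and the in-frame check described above can be expressed in a few lines. This is a minimal sketch under stated assumptions: deletion calls are represented by a hypothetical `Deletion` record, MuAF is stored as a fraction between 0 and 1, and a deletion inside a coding region is treated as in-frame when its length is a multiple of three nucleotides. Only P589del and its MuAF come from the text; the other records are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Deletion:
    label: str       # free-text name of the event
    length_nt: int   # deleted length in nucleotides
    muaf: float      # mutated allele frequency as a fraction (0 to 1)


def keep_deletion(d: Deletion, min_muaf: float = 0.30) -> bool:
    """Keep only deletions whose MuAF exceeds the 30% threshold."""
    return d.muaf > min_muaf


def is_in_frame(d: Deletion) -> bool:
    """A coding-region deletion preserves the frame if length % 3 == 0."""
    return d.length_nt > 0 and d.length_nt % 3 == 0


calls = [
    Deletion("P589del", 3, 0.3428),                # from the text
    Deletion("hypothetical_low_freq", 6, 0.12),    # illustrative only
    Deletion("hypothetical_frameshift", 4, 0.55),  # illustrative only
]

for d in calls:
    if keep_deletion(d):
        kind = "in-frame" if is_in_frame(d) else "frame-shift"
        print(d.label, kind, d.length_nt // 3, "full codons removed")
```

Deletions that pass this filter would still need orthogonal confirmation, which the study obtained by Sanger sequencing.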
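The iSNV selection and the shared versus singleton split can be sketched as below. The thresholds (MuAF from 5% to 95%, Ct < 29) come from the text; treating the MuAF bounds as inclusive, representing an iSNV as a `(position, MuAF)` pair, and defining "shared" as a locus seen in at least two samples are assumptions made for illustration. The function names and toy values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

ISNV = Tuple[int, float]  # (1-based genome position, MuAF as a fraction)


def select_isnvs(sites: List[ISNV], ct: float, max_ct: float = 29.0,
                 lo: float = 0.05, hi: float = 0.95) -> List[ISNV]:
    """Drop samples with Ct >= max_ct, then keep sites with lo <= MuAF <= hi."""
    if ct >= max_ct:
        return []
    return [(pos, af) for pos, af in sites if lo <= af <= hi]


def split_shared_singleton(
        per_sample: Dict[str, List[ISNV]]) -> Tuple[Set[int], Set[int]]:
    """Return (shared, singleton) loci; shared means seen in >= 2 samples."""
    counts = Counter(
        pos
        for sites in per_sample.values()
        for pos in {p for p, _ in sites}  # count each locus once per sample
    )
    shared = {p for p, n in counts.items() if n > 1}
    singleton = {p for p, n in counts.items() if n == 1}
    return shared, singleton


# Toy data (hypothetical): sample "b" is excluded by its Ct value.
samples = {
    "a": select_isnvs([(100, 0.10), (200, 0.97), (300, 0.50)], ct=24.0),
    "b": select_isnvs([(100, 0.20)], ct=31.0),
    "c": select_isnvs([(100, 0.30), (400, 0.04)], ct=20.0),
}
print(split_shared_singleton(samples))
```

The MuAFs of the resulting groups could then be compared with a two-sided rank-sum test, as the text describes.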