Sampling information of 603 bats in Southern China
In this study, a total of 603 anal swabs from distinct bat individuals were collected from southern China (Fig. 1a, Supplementary Table 2). The samples represented a diverse set of 13 bat species spanning seven genera across four families (Fig. 1b). The samples were collected from 13 locations affiliated with Guangzhou, Foshan, Huizhou, Shenzhen city in Guangdong Province and Qiongzhong city in Hainan Province (Fig. 1c) from 2008 to 2021 (Fig. 1d). Particularly, the majority of Rhinolophus bat individuals (299 out of 351) were collected from two caves (Site6_HZ and Site1_GZ) in Huizhou and Guangzhou from 2012 to 2021, respectively (Fig. 1e). Within Site6_HZ and Site1_GZ, 154 anal swabs of R. affinis and 145 anal swabs of R. sinicus were collected, respectively.
Virome diversity and their host-switching patterns related to Rhinolophus bats
Metatranscriptomic sequencing of the 603 individuals generated 18.4 Tb raw data, with an average data size of 30.6 Gb per individual (Supplementary Table 2). The filtering of low quality, ribosomal RNA (rRNA) and duplicated reads yielded an average of 10.3 Gb clean data per individual. The following de novo assembly and viral genome annotation of the 603 metatranscriptomes revealed 133 vertebrate-infecting viral clusters with 80% average nucleotide identity (hereafter vANI80). The 133 vANI80 clusters comprised 104 RNA vANI80 clusters and 29 DNA vANI80 clusters from 15 families (Supplementary Table 3), including 74 vANI80 clusters identified in R. affinis and R. sinicus (Fig. 2a, Supplementary Table 4). Interestingly, despite the viruses shared between Rhinolophus species, none of the vANI80 clusters were shared between Rhinolophus bats and other bat genera (Supplementary Fig. 1a). The vANI80 representative sequences (n = 133) included 93 medium (completeness > 50%) and 65 complete/high quality (completeness > 90%) genomes (Fig. 2b). The major viral families included Adenoviridae (n = 5), Astroviridae (n = 55), Caliciviridae (n = 5), Circoviridae (n = 6), Coronaviridae (n = 9), Paramyxoviridae (n = 4), Parvoviridae (n = 12), Picornaviridae (n = 24) (Fig. 2a).
Using the RAPs as evolutionary markers, we inferred phylogenies of major viral families. Only 21.8% of the vANI80 representatives were closely related to (> 90% average amino-acid identity, RAP-AAI) the known viruses (Supplementary Table 3), reflecting that most viral diversity in bats remains to be described. For R. affinis and R. sinicus, the comparison against public records from the VIRION database [44] showed that only 21/74 of vANI80 clusters were identified previously (Supplementary Fig. 1b), reflecting the strong specificity of viral distribution. The novelty of the 133 vANI80 representatives also varied across viral families. Among major viral families with more than five vANI80 representatives, the median RAP-AAI against the known viruses was relatively high in Coronaviridae (99.51%) (Fig. 2c). In contrast, the median RAP-AAI of the rest major viral families (n > = 5) were all less than 80%. Several viral families (Astroviridae, Picornaviridae, Parvoviridae) comprised phylogenetic clades with at least five novel vANI80 clusters. Notably, a group of 46 vANI80 clusters of Mamastrovirus was newly identified in Astroviridae (Fig. 2a). Particularly, all the astroviruses in R. affinis and R. sinicus were newly identified (Supplementary Fig. 1b). We also identified two complete viral genomes with high novelty (< 30% RAP-AAI) to known viruses (Supplementary Fig. 2). In Hepeviridae, we identified a virus (HepV-1) in R. sinicus with only 29.7% RAP-AAI to known hepevirus (Supplementary Fig. 2a). The infection of hepevirus causes hepatitis in mammals and birds. The deep phylogenetic divergence between HepV-1 and other hepeviruses suggests that bats may have played an important role in the ancient introduction of hepeviruses into mammals and other animals. In Parvoviridae, we identified a virus (ParV-14/CH035) encoding four open reading frames (ORFs), which is unusual compared to the typical nonstructural and structural ORFs found in classic parvoviruses (Supplementary Fig. 2b). Taken together, the undescribed viral diversity here reflected an imbalanced effort across viral groups of previous surveillance.
Using the RAP-AAI between the vANI80 representatives and their nearest neighbor from other host taxa, our results showed that host-switching tend to occur at different phylogenetic depth across viral families (Fig. 2d), which might relate to varied extent of host specificity. Most (42/74) of the Rhinolophus vANI80s showed close relatedness (> 75% RAP-AAI) to the viruses of other mammalian genera. Among them, 21 Rhinolophus vANI80s showed the closest relatives from the bat genus Hipposideros, mainly distributed in Astroviridae, Parvoviridae and Picornaviridae, reflecting virus-host co-divergence during long-term evolution (Supplementary Table 4). Nonetheless, it seems that host-switching does not always occur between mostly related host-taxa. Among bats, 9 Rhinolophus vANI80s showed close relatedness to viruses from the bat genera (Myotis, Miniopterus, Eptesicus, Scotophilus) of a distinct suborder (Yangochiroptera). Furthermore, 11 Rhinolophus vANI80s were closely related to the viruses of humans (CoV-4), pigs (CoV-8), dogs (CV-8), non-human primates (ParV-2,4,5,12), and Beluga whale (PicoV-2,4,8,13). These observations underscore the frequent switching across host taxa with distant evolutionary distance.
Viruses related to human or livestock infection in Rhinolophus bats
Among the identified viruses, the ones with at least 90% AAI to the known viruses were detected in a considerable proportion of the viral positive Rhinolophus bats (64/100 for R. affinis and 58/79 for R. sinicus), reflecting their broad distribution among natural hosts (Supplementary Table 5). Among them, nine vANI80 clusters were closely related to human or livestock pathogens (Supplementary Table 6).
For SARSr-CoV (CoV-4), our study revealed 10.1% (35/351) positive rate of SARSr-CoVs among Rhinolophus individuals (Supplementary Table 5). The signature of two key deletions[45] within the receptor binding domain (RBD) suggests non-ACE2 usage of the identified SARSr-CoVs (Supplementary Fig. 3a). The RdRp (nsp12) phylogeny showed that the identified SARSr-CoVs grouped with the public SARS-CoVs from Guangdong (Fig. 3a). The genomes of SARSr-CoVs exhibited 93.5% average nucleotide identity between the members of Guangzhou and Huizhou. This contrasted with the 97.5% average nucleotide identity among members of the same bat roosts, suggesting spatial heterogeneity of viral diversity within the same province (Supplementary Fig. 3b).
Another viral group with major spillover risk concern is the SADSr-CoV (CoV-8), which was detected in both R. affinis and R. sinicus. The RdRp phylogeny of SADSr-CoVs was structured corresponding to their host species. In Huizhou, one SADSr-CoV carried by R. affinis (strain SADSr-CoV/HZ/200604) showed the closest genetic relationship with SADS-CoV isolated from swine (Fig. 3b). The S-gene of SADSr-CoV/HZ/200604 showed 96.6% nucleotide similarity to that of SADS-CoV from piglet, with completely identical amino acid sequence in the CTD region (Supplementary Fig. 4a). To verify the infectivity, we constructed the pseudovirus encoding the S-gene of SADSr-CoV/HZ/200604. The results showed that the pseudovirus could infect porcine kidney (PK-15) cells (Fig. 3c), indicating the spillover potential for domestic pig infections. Further recombination analyses showed that SADSr-CoVs have undergone recombination with SADSr-CoV/HZ/200604 and SADSr-CoV/162140 strain (Supplementary Fig. 4b, Supplementary Table 7), with the recombination hotspot surrounding the S-gene (Supplementary Fig. 4c). Remarkably, genome comparisons of SADSr-CoVs (CoV-8) revealed exchange of genomic segments between viruses from distinct host species, suggesting ongoing host sharing during host-specific viral diversification (Supplementary Fig. 4d).
Some other viruses from Rhinolophus bats showed evidence of bat-like origin of human pathogens. In Picornaviridae, one vANI80 cluster (PicoV-12) from both R. affinis and R. sinicus showed close relatedness to (92.7% RAP-AAI) the Aichivirus A (AiV-A) (Fig. 3d), a foodborne pathogen associated with gastrointestinal illnesses. The clade of AiV-A comprised viruses of a wide host range, including kobuviruses in humans (AiV-A1), canines (AiV-A2), sewage (AiV-A3), felines (AiV-A4), birds (AiV-A5), and rats (AiV-A6). Particularly, the viral phylogeny here showed the first-time detection of AiV-A in Rhinolophus bats, expanding the host range among bat taxa. However, it is still hard to infer the direct animal origin of AiV-A1 in humans.
In Caliciviridae, two vANI80 clusters (CalV-1 and CalV-3) from R. affinis and R. sinicus showed close relatedness to (75.6% RAP-AAI) the human Norovirus (Fig. 3d), a foodborne viral agent causing gastroenteritis and diarrhea outbreaks worldwide. For the norovirus-related bat viruses, CalV-1 and CalV-3 and other Rhinolophus viruses formed distinct clades corresponding to their host species, suggesting strict specificity among bat species. The phylogeny of diverse norovirus-related Rhinolophus viruses surrounding the human and canine noroviruses suggested the possible bat origin of the Norovirus.
In Parvoviridae, four vANI80 clusters (ParV-2, ParV-12, ParV-5, ParV-4) were mostly related to Adeno − associated viruses (AAVs, genus: Dependoparvovirus) from humans with the RAP-AAI ranging from 79.8–77.7%. The phylogenetic relationship between Bat AAVs and other mammalian AAVs suggests that bats are the origin of mammalian AAVs (Fig. 3d). The four clusters of Bat AAVs detected in the present study, and the previously reported Bat AAVs, appeared to form an ancestral branch for all known mammalian AAVs, including Adeno-associated virus A and Adeno-associated virus B, which consist of all known human AAVs. Due to the deficiency in replication, Adeno-associated viruses need to co-infect with a helper virus, including adenoviruses or other DNA viruses[46]. The higher diversity across bat species suggests a bat origin of Adeno-associated viruses. Interestingly, our data showed co-occurrence of Adeno − associated viruses and adenoviruses. Most (7/8) AAV positive samples also carried Adenovirus in Rhinolophus bats (Fig. 3e), supporting the biological association between the two viral groups.
Distinct viral community dynamics between R. affinis and R. sinicus
Systematic viral surveillance requires comprehensive understanding of viral community structures. We elected to compare the core viral communities (vANI80s occurred in at least two samples) of R. affinis (n = 154) and R. sinicus (n = 145) in Site1_GZ and Site6_HZ, considering the sufficient sample size in these two regions. The two Rhinolophus species showed some seemingly similar viral community features, including viral family composition (Fig. 4a), viral positive rate (Supplementary Fig. 5a) and high occurrence of coronaviruses (Fig. 4a). Nonetheless, R. affinis and R. sinicus were distinct in the composition of vANI80 clusters and the diversity of their viral communities. The two Rhinolophus species only shared 9/51 of the core vANI80s (Fig. 4a), reflecting a maintained host specificity regardless of their ecological overlaps. In Site1_GZ and Site6_HZ, 36 core vANI80s were detected in R. affinis, while only 24 were detected in R. sinicus. The higher viral diversity in R. affinis was mainly contributed by the high prevalence (Fig. 4b) and higher diversity (Fig. 4a) of Astroviridae and Picornaviridae (Supplementary Table 8). For example, 16 vANI80 clusters of astroviruses were specifically identified in R. affinis while only two astroviruses were specifically identified in R. sinicus, reflecting the distinct viral family competence between R. affinis and R. sinicus despite their close evolutionary relatedness. We compared the number of vANI80s for each bat individual. At individual level, R. affinis showed significantly higher number of vANI80s than R. sinicus per individual in Site6_HZ (Fig. 4c). Among bats, R. affinis has a higher percentage (35.06%, 54/154) of individuals carried more than one core vANI80s compared to R. sinicus bats (24.83%, 36/145). Particularly, eight R. affinis individuals showed co-infection of different CoV vANI80s, while only two R. sinicus individuals showed CoV co-infection. To demonstrate the complexity of viral communities, we further compared viral nucleotide diversity between R. affinis and R. sinicus using the single nucleotide variant data per vANI80 cluster. Rhinolophus affinis showed a higher nucleotide diversity of intra-specific viral populations compared to R. sinicus (Fig. 4d, Supplementary Table 9), suggesting a more complex origin of viral populations in R. affinis. Using the iSNVs, we found out that R. affinis has a higher proportion of individuals with sufficient (> 30X) viral coverage (10/49) carrying genetically distinct viruses of the same vANI80 cluster compared to R. sinicus (4/35), suggesting higher potential for viral genetic exchange in R. affinis (Supplementary Fig. 5b).
For geographic differences among viral communities, R. affinis showed higher percentage (61.11% vs. 41.67%) of core vANI80 clusters shared across Site1_GZ and Site6_HZ, suggesting more frequent viral transmission across geography (Supplementary Fig. 5c). In general, R. affinis showed higher proportion of individuals carrying viruses across geography than that of R. sinicus, supporting the distinct capacity of viral transmission between host taxa (Fig. 4e). The vANI80s of R. sinicus also showed significantly higher relative genetic differentiation between Site1_GZ and Site6_HZ compared to that of R. affinis, suggesting a more geographically structured viral population (Fig. 4f).
For the dynamics of viral populations, the bat species R. affinis showed an elevated intra-specific nucleotide difference of vANI80 clusters within the same site across years. However, we did not observe significant genetic difference across years in R. sinicus. For each viral family, the family of Coronaviridae and Astroviridae showed the highest proportion of vANI80s detected across years (Supplementary Fig. 5d). In Site6_HZ, we could not compare the viral communities across time in R. sinicus, given that only four R. sinicus bats were viral positive in 2020 and 2021. For R. affinis in Site6_HZ, most (21/24) core vANI80 clusters in 2013 were detected in 2020 or 2021, suggesting long-term stability of viral communities (Fig. 4g). In contrast, we observed a small overlap of core vANI80 clusters between 2020 and 2021 of R. affinis in Guangzhou (5/10) and Huizhou (7/24), respectively, suggesting temporal changes of viral community.
Distinct host population patterns between R. affinis and R. sinicus
The activities of host population are expected to be informative when forecasting pathogen distribution. To investigate the geographic structuring of host populations, we detected the SNVs within the coding regions of single-copy orthologouses (SCOs) shared across R. sinicus and R. affinis. Due to the absence of a complete genome assembly for R. affinis, we employed the assembled transcriptomic data for SCO detection in this species. We identified 1,345 SCOs shared between R. sinicus and R. affinis, with over 70% of the samples showing read coverage in more than 50% of the coding regions. Following the filtering of missing data and SNV singletons, 13,302 SNVs from 149 R. affinis samples and 12,127 SNVs from 135 R. sinicus samples were used for subsequent comparison. Using principal component analysis (PCA), R. sinicus comprised two major ancestral populations, while R. affinis did not show geographic differentiation (Fig. 5a). We also estimated FST among bat populations. The genetic differentiation of host populations between and within geographic sites support the major differentiation in R. sinicus (Fig. 5b).
To investigate the host dynamics, we compared the nucleotide differences among samples within and between years. In general, R. affinis population and R. sinicus population showed similar levels of nucleotide diversity (mean: 0.00086 for R. affinis and mean 0.00088 for R. sinicus). For the comparison across time, R. affinis showed a decreased nucleotide diversity from 2013 to 2021, while R. sinicus showed an increased nucleotide diversity, suggesting a different trend of population dynamics within the two host taxa. Interestingly, for R. sinicus in Huizhou, the genetic differentiation between Guangzhou and Huizhou decreased from 2020 to 2021 (Fig. 5c), suggesting that the increased diversity of R. sinicus was mainly contributed by the increased gene flow across geography.
Genome comparison of SARSr-CoVs reveals modular viral evolution and dispersal among Rhinolophus species
Host specificity and geographic dispersal are expected to define the extent of genetic exchange among viral populations. Given the extensive availability of public sequences, we chose SARS-related CoVs (SARSr-CoVs) in China defined (L1 lineage) in a recent survey[6] as a representative viral group to compare patterns genetic exchange among Rhinolophus bat species and among provinces (Supplementary Fig. 6a, Supplementary Table 10).
After the removal of nearly identical genomes (> 99.9% ANI), the SARSr-CoVs showed major inter-host genetic differentiation (FST) among host species, especially when compared to the differentiation across provinces (Fig. 6a). This suggests a major impact of host specificity in structuring the viral population. We then estimated the recombination relationship among viral genomes (Supplementary Table 11). The major and minor parents contributing to the recombination event showed lower genome identity when their host taxa were different (Fig. 6b). Interestingly, edges of recombination were enriched among viruses from different host species (P-value = 3e-39), when compared to the null distribution of edges under a Bernoulli process within a random graph model (Fig. 6c). Our findings here suggest the underestimated genetic exchange among viral communities across host taxa.
Given the importance of pathogen surveillance, we tried to identify the host or geographic hotspot for recombination using the subsampled viruses from each group. Across host taxa, SARSr-CoVs of R. pusillus had the highest number of recombination connections and the highest nucleotide diversity of S-gene (Fig. 6d). Across geography, SARSr-CoVs from Hunan province showed the highest number of recombination connections (Supplementary Fig. 6a,b). Consistently, provinces with higher number of viral host species also showed higher level of viral diversity (Spearman, 0.6394, P-value = 0.02519) (Supplementary Fig. 6c), with Hunan showing the highest diversity of S-gene (Supplementary Fig. 6d). Those viral populations may therefore comprise higher genetic plasticity for inter-host recombination and the development of spillover risk.
To unravel how recombination shapes diversification across the viral genome, we compared FST value among genes. Here, we assume that host specificity represents a more significant force shaping population structure compared to geographic difference within the same host. Interestingly, we did not observe similar genome-wide profile of gene FST between inter-host populations and inter-geography populations (Supplementary Fig. 7a), suggesting distinct patterns of viral diversification between short-term versus long-term isolation. For example, regions encoding RdRp-related nonstructural proteins (nsp8-14) showed the lowest FST among host taxa, while they were consistently elevated across provinces within R. sinicus. The RdRp-related nsps showed a higher proportion of synonymous mutations compared to other genes (Supplementary Fig. 7a). The low FST observed in RdRp-related nsps might be driven by their high saturation of synonymous polymorphisms, potentially hindering fixation during long-term between-host isolation.
Across the viral genome, gene FST among host taxa negatively correlated with recombination density (Fig. 6e). Indeed, genome-wide distribution of recombination frequency showed a major elevation surrounding the S-gene region of the genome (Fig. 6f). The S-gene also showed the highest nucleotide diversity and highest proportion of non-synonymous mutations among genes (Supplementary Fig. 7a), reflecting adaptive evolution. Based on pairwise comparison of sequences, we observed valley between peaks in the identity distribution of RdRp (95% ANI) and S-gene (92% ANI), which typically represents the boundary between evolutionary clusters (Supplementary Fig. 7b, Supplementary Table 12) [47]. Using the cut-off of 95% ANI on RdRp (nsp8-14), the viral genome pairs showed higher divergence of S-gene when their viral host species were different (Fig. 6g). In contrast, S-genes with high similarity (> 92% ANI) were shared across Rhinolophus species (Supplementary Fig. 7c). The result suggests that the elevated recombination may introduce gene flow and, thus, reduce genetic differentiation and maintains an interconnected gene pool across viral populations[48]. In conclusion, our findings reveal a complex interplay between recombination and selective pressure, leading to diverse patterns of genetic differentiation across viral genome.