Characteristics of the study dataset
Ultimately, a total of 6578 HIV-1 pol gene sequences from treatment-naive patients from 24 provinces in China between 2000 and 2016 were obtained. Of these patients, 5155 were male (78.37%) and 313 were female (4.76%). The patient characteristics are shown in Table 1. The largest group of patients (3067, 46.6%) were infected with CRF01_AE, followed by CRF07_BC (2013, 30.6%) and subtype B (836, 12.7%) (Table 2). The remaining patients were infected with other pure subtypes or circulating recombinant forms (CRFs, 662, 10.1%). Subtype B infections decreased from 34.1% to 11.2% from 2000 to 2016 (p < 0.001). CRF01_AE and CRF07_BC infections increased over time (29.5% to 47.1%, p < 0.001; 11.4% to 31.6%, p = 0.0229) (Fig. 1). The composition of the risk behaviors was 4414 (67.1%) men having sex with men (MSM), 654 (9.9%) heterosexual sex, 622 (9.5%) injected drug use (IDU), and 888 (13.5%) other risk behaviors or data unrecorded (Table 2).
Prevalence and temporal trend of transmitted drug resistance
HIV with TDR was demonstrated in 274 (4.51%) patients in the period from 2000 and 2016. Table S1 shows information for all the HIV-1 sequences with TDR mutations. Mutations conferring resistance to NRTIs, NNRTIs, PIs, and multiclass drugs were found in 92 (1.40%), 100 (1.52%), 123 (1.87%), and 37 (0.56%) patients, respectively. The prevalence of TDR in both CRF01_AE (5.31%) and subtype B (6.58%) was higher than that in the other subtypes (Table 2). Among subtypes, CRF01_AE had the highest prevalence of PI-TDR, while subtype B had the highest prevalence of NRTI- and NNRTI-associated TDR. Among the different risk behavior populations, the prevalence of HIV with TDR in heterosexual individuals was relatively high (7.34%). MSM had the highest prevalence of PI-TDR among the risk groups, while the prevalence of NNRTI-TDR in heterosexuals was higher than that in IDU and MSM groups (Table 2).
The prevalence of drug-resistance mutations in all sequences is shown in Table 3. Overall, the most frequent mutation was M46L, which was found in 0.89% (n = 58) of patients, followed by K103N (0.55%, n = 36), M46I (0.55%, n = 36), and M184V (0.40%, n = 26). Furthermore, in patients infected with strain CRF01_AE, the most frequent mutation was M46L (1.76%), followed by M46I (1.04%), M184V (0.46%), and K103N (0.39%). The most frequent TDR mutation in patients infected with CRF07_BC was M184V (0.39%), followed by T215S (0.20%) and Y181C (0.15%). The most frequent TDR mutation in subtype B was K103N (2.27%), followed by G190A (0.96%), Y181C/Y188L/M41L/D67N (0.84%), and M184V (0.72%). The prevalence of TDR mutations in each HIV-1 subtype is shown in Fig. 2.
The prevalence of HIV with TDR varied throughout the study period, it was 2.73% in 2011–2012 and 9.09% in 2000–2003, with no clear trend observed (OR = 0.94, 95% CI 0.86–1.04, p = 0.233). The prevalence of NRTI (OR = 0.76, 95% CI 0.66–0.89, p <0.001) and NNRTI-associated (OR = 0.84, 95% CI 0.73–0.98, p = 0.020) TDR slightly decreased from 2000 to 2016, while the prevalence of PI-associated (OR = 1.09, 95% CI 0.95–1.27, p = 0.233) TDR was stable during the same period.
However, when we examined the trends in a smaller window, the prevalence of total TDR was found to decrease from 2000 to 2010 (OR = 0.83, 95% CI 0.73–0.95, p = 0.006), but it increased after 2010 (OR = 1.50, 95% CI 1.13–1.97, p = 0.004). The prevalence of NNRTI-associated TDR decreased from 2000 to 2010 (OR = 0.77, 95% CI 0.64–0.94, p = 0.007), and increased after 2010 (OR = 1.91, 95% CI 1.13–3.18, p = 0.014). PI-associated TDR was found to slightly increase after 2006, with no statistical significance, but it increased after 2010, with marginal significance (OR = 1.44, 95% CI 0.99–2.09, p = 0.055) (Fig. 3).
Clusters and transmission fitness of TDR mutations
Finally, Cluster Picker identified 788 transmission clusters containing 3189 of 6578 patients (48.4%). Strains CRF01_AE and CRF07_BC had higher rates of phylogenetic clustering than the other subtypes, while the rate of clustering was higher in MSMs than in the other risk groups (Table 2).
Of these 788 clusters, 69 had one or more sequences with TDR mutations, comprising 111 of 274 (40.5%) TDR sequences. The network construction of these 69 transmission clusters is shown in Fig. S1. The clustering rate of sequences with TDR increased annually from 27.3% in 2005–2006 to 63.6% in 2015–2016 (p <0.001). Of these 69 TDR clusters, 52 had only one TDR sequence, 10 had two TDR sequences, and seven had three or more TDR sequences. Among the 17 clusters with at least two TDR sequences, 15 had the same TDR mutation, seven of which were M46L, followed by M46I (n = 2).
Overall, the sequences with TDR were less likely to form phylogenetic clusters than the sequences no resulting in TDR, even after adjusting for subtype and at-risk populations (40.5% vs. 48.8%, p = 0.023). The clustering rates of NNRTI- and NRTI-associated TDR mutations were significantly lower than those of the wild-type sequences (Table 3). Mutation M46L had the highest frequency in the clusters (n = 32), followed by M46I (n = 26), L210W (n = 7), M184V (n = 7), T215S (n = 5), and K103N (n = 4).
Of the 58 TDR mutations, seven had clustering frequencies that deviated significantly from the neutral expectation, most of which resulted in a significant reduction in transmission fitness (NNRTI and NRTI mutations: K103N, G190A, Y181C, K101E). Conversely, M46I had a relative fitness significantly greater than 1.0 (Table 3).
The spatiotemporal origins and epidemic dynamics of large TDR clusters
There were four large HIV with TDR clusters containing ≥ 5 sequences. All these sequences were from patients infected with strain CRF01_AE in the MSM risk group. We extracted the two largest clusters containing mutants M46I (cluster 300) and M46L (cluster 67) to implement Bayesian phylogeographic inference. The most recent common ancestors (tMRCAs) of clusters 67 and 300 were 2007.2 (95% HPD 2004.1–2008.5) and 2006.9 (95% HPD 2004.1–2008.5), respectively. The evolutionary rates of clusters 67 and 300 were 2.144×10−3 (95% HPD 1.075×10-3–3.213×10-3) and 2.933×10-3 (95% HPD 1.385×10-3–4.873×10-3), respectively. From the phylogeographical analysis, the potential transmission source locations for clusters 67 and 300 were Beijing [posterior probability (PP) = 0.80] and Liaoning (PP = 0.69), respectively (Fig. 4).
Interestingly, two other clusters, 65 and 68, together with clusters 62, 63, and 69, were adjacent to cluster 67 in the phylogenetic tree, and most of them shared the same TDR mutation, M46L. Analysis of the sequences from these clusters found that the TN93 pairwise genetic distances were < 3.0% for the majority of the sequences (Fig. S2). Therefore, we applied the same analyses to these taxa with the tMRCA estimated at 2002.9 (95% HPD 2000.4–2004.8) and its evolutionary rate estimated at 2.087×10-3 (1.467×10-3–2.818×10-3) while Beijing was estimated as the source location with the highest support (PP = 0.99) (Fig. 4). Analysis of the geographical migration as estimated from the Bayes factors indicated that the migration pathways, from Beijing to Hebei, Hubei, Hunan, and Guangdong, and from Hubei to Hunan and Shanghai, were all well supported for the M46L taxa, while in the large M46I cluster, viral migration from Beijing to Liaoning was well supported. Table 4 illustrates the best-supported pathway of migration of the M46I/L large clusters in China.