During the study period, we collected most strains from areas with high and low reported incidence of tuberculosis. The data may therefore be representative, at least to some extent, of the genetic and demographic diversity in these areas.
In this study, whole genome sequencing confirmed that Beijing family strains were the dominant genotype in both cold and hot spot areas. Beijing family strains, which were first reported in 1995, have spread worldwide [19]. These strains originated in Beijing and Mongolia and have highly conserved spoligotyping patterns and characteristic IS6110 RFLP patterns [20, 21]. In addition to a high rate of multidrug resistance [22–24], Beijing family strains may also be more virulent than strains of other genotypes. A study conducted in The Gambia suggested that patients infected with Beijing family strains were more likely to progress to disease than those infected with Mycobacterium africanum [25]. However, the sample size of that study was small, and the other M. tb complex genotypes in the control group were not clearly described. Interestingly, animal models provide clear evidence that the expression of proteins, glycolipids and triglycerides is altered in Beijing strains, which may contribute to increased pathogenicity. In our study, the proportion of Beijing family strains was significantly higher in hot spots than in cold spots (P < 0.05). This suggests that strains circulating in hot spots may be more virulent than those in cold spots.
In addition to genotype composition, there were statistically significant differences in the distribution of age, ethnicity, income, BMI, history of contact with TB patients and migration history between participants from cold and hot spots. A study performed in The Netherlands found that disease transmission was higher among younger people [26], a finding that contrasts with ours. In our study, the proportion of elderly TB cases was higher in areas with a high TB incidence. This phenomenon may be related to the socio-economic status of the different regions. Income is an important indicator of economic level, and the hot spot areas in this study corresponded to regions with a low economic level. The proportion of low-income TB patients was significantly higher in hot spots than in cold spots (91.1% vs 72.5%, P < 0.05). These results are consistent with a survey from the United States [27]. Although low income appears to be an independent predictor of TB incidence, it remains to be seen whether socioeconomic factors or immune factors influence the spread and development of TB. We hypothesize that the high polymorphism of strains in cold spots might be related to the high frequency of travel in this population.
Comparison of population genetic structure is the main method used to determine whether two populations have evolved independently or have exchanged genes [28, 29]. Because M. tb is a highly conserved species, the molecular differences among its mutant subgroups are relatively small. By comparing population genetic structure, this study found that the average population structure difference coefficient between strains collected in hot spots and cold spots was only 0.019. In particular, the dominant strain in this study (Beijing family) is more evolutionarily conserved than other M. tb lineages and is thus less likely to have undergone recent mutations [30]. Nevertheless, some specific SNP positions showed significant differences between the two populations. The gene loci at these positions included Rv1186c, Rv3900c, Rv1508c and Rv0210, of which Rv1508c lies within the RD4 region. It has been reported that knock-in of RD4 can improve the protective efficacy of the BCG vaccine, but the effect of deleting this region on the pathogenicity of M. tb and its clinical phenotype still needs further research. This study found that the mutation rate of Rv1508c was higher in hot spots than in cold spots (58.5% vs 34.0%), which conflicts with other studies; after adjusting for other factors, however, the difference was not statistically significant. The proportion of Rv1186c mutations was significantly lower in hot spots than in cold spots. The product of this gene is a conserved protein called PruC. M. tb is an obligate aerobic bacterium, yet it shows remarkable metabolic flexibility and is able to survive and metabolize for a long time without oxygen [31]. One study showed that M. tb can grow on proline as a carbon and energy source under hypoxic conditions, a process regulated by a unique transcription factor (PruC) [32]. Mutations in this gene may therefore reflect immune escape and changes in pathogenicity that affect the transmissibility of M. tb.
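To make the notion of a per-locus population difference concrete, the following is a minimal sketch of a fixation-index (Fst-type) calculation for a single biallelic SNP. It is an illustration only: the text does not specify which estimator underlies the reported coefficient of 0.019, so the heterozygosity-based formula, the equal weighting of the two populations, and the function names `heterozygosity` and `fst_biallelic` are all assumptions rather than the method used in this study. The example call reuses the Rv1508c mutation proportions quoted above purely as input frequencies.

```python
from statistics import mean


def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic site."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs: list[float]) -> float:
    """Fst = (H_T - H_S) / H_T for one biallelic SNP.

    `freqs` holds the mutant-allele frequency in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("at least one population frequency is required")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("frequencies must lie in [0, 1]")

    p_bar = mean(freqs)                      # pooled frequency
    h_total = heterozygosity(p_bar)          # H_T, pooled population
    if h_total == 0.0:                       # site is monomorphic overall
        return 0.0
    h_within = mean([heterozygosity(p) for p in freqs])  # H_S
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Rv1508c mutation proportions from the text: hot spots vs cold spots.
    print(round(fst_biallelic([0.585, 0.340]), 4))
```

Under these assumptions, a locus with identical frequencies in both areas yields 0 and a locus fixed for different alleles yields 1, so a genome-wide average close to zero is compatible with a few individual loci showing much larger differences.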
The long-term global persistence of M. tb infection suggests strong evolutionary pressure on the interaction between host and pathogen genomes [33–35]. Ethnic differences and polymorphisms in host susceptibility genes that interact with M. tb play a role in the development of TB [36, 37]. The GG genotype of IL-17 rs2275913 is associated with a high risk of tuberculosis in the Spanish population [38], while the CC genotype of rs763780 increases the risk of tuberculosis in the Chinese population [39]. In this study, the proportion of the Zhuang population was significantly higher in TB hot spots, and interaction effect analysis showed that Zhuang ethnicity had a synergistic effect with mutations in the Rv1186c and Rv0210 genes. There is therefore reason to suspect that certain genetic polymorphisms in the Zhuang population are associated with susceptibility to tuberculosis. Further biochemical and immunological evidence is needed to confirm this hypothesis. In addition, the results of this study suggest a strong negative interaction between income and ethnic group: in areas with high rates of TB, minorities have lower income levels. Beyond socioeconomic factors, the impact of nutritional deficiencies on the development of TB also needs to be considered.
In conclusion, we found significant evidence for an association between SNP differences in M. tb, the host environment and the TB epidemic. Our data suggest a statistically significant role of ethnicity, income and polymorphisms of the Rv1186c and Rv0210 genes in the transmission and development of M. tb. The results provide clues for the study of genes conferring susceptibility to M. tb in different ethnic groups.
Supplementary file: The questionnaire used in this study