3.1 General situation
In total, 291 TB patients and their isolates were included in the study: 147 isolates from the 3 hot spot areas and 144 from the 3 cold spot areas. Whole genome sequencing and molecular typing showed that the dominant strains in both areas belonged to the Beijing family (spoligotyping, SITVIT2 database). However, the proportion of Beijing family strains was significantly higher in hot spots than in cold spots (64.6% vs 50.7%, p = 0.022). Other genotypes included T1, T2, T3, H, H3 and LAM, which belong to the Euro-American lineage.
3.2 Comparison of demographic characteristics
As shown in Table 1, TB patients in hot spot areas were more often elderly, from ethnic minorities (91.5% were Zhuang), on a low income, of low BMI, and more likely to have a history of contact with a former TB patient (p < 0.05). Individuals from cold spot areas travelled out of town more frequently (p < 0.05). However, because ethnicity and economic level were markedly imbalanced between cold and hot spots, the ethnicity and income variables are not comparable [20].
Table 1. Comparison of demographic characteristics in TB cold and hot spots
| Variables | Cold spot, n (%) | Hot spot, n (%) | P-value |
|---|---|---|---|
| Age group | | | |
| < 30 | 35 (24.3) | 10 (6.8) | < 0.001 |
| 30-49 | 40 (27.8) | 47 (32.0) | |
| ≥ 50 | 69 (47.9) | 90 (61.2) | |
| Gender | | | |
| Male | 114 (79.2) | 113 (76.9) | 0.74 |
| Female | 30 (20.8) | 34 (23.1) | |
| Ethnicity | | | |
| Han | 139 (96.5) | 5 (3.4) | < 0.001 |
| Others† | 5 (3.5) | 142 (96.6) | |
| Income (yuan) | | | |
| < 3,000 | 100 (72.5) | 133 (91.1) | < 0.001 |
| 3,000-4,999 | 21 (15.2) | 12 (8.2) | |
| ≥ 5,000 | 17 (12.3) | 1 (0.7) | |
| Migration | | | |
| No | 110 (76.4) | 129 (87.8) | 0.017 |
| Yes | 34 (23.6) | 18 (12.2) | |
| TB patient contact history | | | |
| No | 130 (91.5) | 118 (80.8) | 0.014 |
| Yes | 12 (8.5) | 28 (19.2) | |
| BMI | | | |
| < 18 | 11 (7.6) | 39 (26.5) | < 0.001 |
| 18-24.99 | 127 (88.2) | 100 (68) | |
| ≥ 25 | 6 (4.2) | 8 (5.4) | |
| BCG vaccination | | | |
| No | 18 (12.5) | 23 (15.6) | 0.624 |
| Yes | 112 (77.8) | 113 (76.9) | |
| Unknown | 14 (9.7) | 11 (7.5) | |
| History of DM | | | |
| No | 122 (84.7) | 111 (75.5) | 0.134 |
| Yes | 21 (14.6) | 33 (22.4) | |
| Unknown | 1 (0.7) | 3 (2) | |
| Drinking status | | | |
| Never | 98 (68.1) | 87 (59.2) | 0.289 |
| Former | 28 (19.4) | 36 (24.5) | |
| Current | 18 (12.5) | 24 (16.3) | |
| Smoking status | | | |
| Never | 73 (50.7) | 70 (47.6) | 0.696 |
| Former | 48 (33.3) | 48 (32.7) | |
| Current | 23 (16.0) | 29 (19.7) | |
| Status of MDR | | | |
| Yes | 2 | 6 | 0.296 |
| No | 142 | 141 | |
† More than 95% were from the Zhuang ethnic minority group. BCG: Bacillus Calmette–Guérin. BMI: Body mass index. DM: Diabetes mellitus. TB: Tuberculosis. MDR: Multi-drug resistance.
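The two-level comparisons in Table 1 can be checked from the counts alone. The sketch below assumes a Pearson chi-square test with Yates continuity correction, which this section does not state; the function name `yates_chi2_2x2` is illustrative. Under that assumption, the Gender and Migration rows give p-values close to the 0.74 and 0.017 reported in the table.

```python
import math

def yates_chi2_2x2(a, b, c, d):
    """Yates-corrected chi-square statistic and p-value (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts from Table 1; first row is cold spot, second row is hot spot.
# ASSUMPTION: the test and correction used here are not confirmed by the text.
gender_chi2, gender_p = yates_chi2_2x2(114, 30, 113, 34)        # male, female
migration_chi2, migration_p = yates_chi2_2x2(110, 34, 129, 18)  # no, yes

print(f"Gender:    chi2 = {gender_chi2:.3f}, p = {gender_p:.3f}")
print(f"Migration: chi2 = {migration_chi2:.3f}, p = {migration_p:.3f}")
```

Variables with three levels (age group, income, BMI) would need a general r x c chi-square test and are outside this sketch.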
3.3 Population genetic structure differences
Figure 1 shows the frequency distribution of the fixation index (Fst) of mutations between cold and hot spots across the tuberculosis strains. The distribution covers 14,250 mutated loci in the two regions, extracted from the SNP file for analysis of molecular variance. After filtering and Weir-Cockerham weighting, the mean fixation index between the two groups was 0.019462. Five SNP sites (1328687, 4386228, 3847237, 1699849 and 251575) had a fixation index greater than 0.1, indicating marked differentiation in mutation frequency at these sites between the two populations. Table 2 lists the gene loci and gene products of these SNP sites. Only one SNP site (3847237) is located in an intergenic region.
As shown in Table 3, the mutation frequency at each of these SNP sites differed significantly between TB cold and hot spots. The mutation at 1328687 (Rv1186c) was significantly more frequent in cold spots (OR = 0.32, 95% CI: 0.20-0.52), whereas the mutations at 4386228 (Rv3900c) (OR = 2.79, 95% CI: 1.74-4.50), 3847237 (IGR) (OR = 2.84, 95% CI: 1.75-4.62), 1699849 (Rv1508c) (OR = 2.73, 95% CI: 1.70-4.40) and 251575 (Rv0210) (OR = 2.65, 95% CI: 1.65-4.27) were significantly more frequent in hot spots.
Table 2. Information on the 5 SNP sites with a high fixation index
| Reference sequence | SNP position | Fst | Gene locus | Gene product |
|---|---|---|---|---|
| AL123456.3 | 1328687 | 0.133141 | Rv1186c | Conserved protein |
| AL123456.3 | 4386228 | 0.11225 | Rv3900c | Conserved hypothetical alanine rich protein |
| AL123456.3 | 3847237 | 0.111604 | IGR* | |
| AL123456.3 | 1699849 | 0.107462 | Rv1508c | Probable membrane protein |
| AL123456.3 | 251575 | 0.100608 | Rv0210 | Hypothetical protein |
* IGR: Intergenic Region
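The per-site values in Table 2 can be approximated from the mutation counts given in Table 3. The sketch below is an assumption about the calculation, not the study's pipeline: it treats each isolate as one haploid sample and applies the single-locus Weir-Cockerham estimator for haploid data. The function name `wc_fst_haploid` is illustrative. For the two sites shown it gives roughly 0.133 and 0.112, close to the tabulated values.

```python
def wc_fst_haploid(counts):
    """Weir-Cockerham theta for one biallelic site with haploid samples.

    counts: list of (mutant_isolates, total_isolates), one pair per population.
    ASSUMPTION: haploid calls, one sample per isolate.
    """
    r = len(counts)
    n_total = sum(n for _, n in counts)
    p_bar = sum(m for m, _ in counts) / n_total
    # Between-population mean square of the mutant frequency.
    msp = sum(n * (m / n - p_bar) ** 2 for m, n in counts) / (r - 1)
    # Within-population mean square.
    msg = sum(n * (m / n) * (1 - m / n) for m, n in counts) / (n_total - r)
    # Effective sample size, correcting for unequal population sizes.
    n_c = (n_total - sum(n * n for _, n in counts) / n_total) / (r - 1)
    return (msp - msg) / (msp + (n_c - 1) * msg)

# (mutant, total) for cold spot then hot spot, taken from Table 3.
sites = {
    "1328687": [(98, 144), (60, 147)],
    "4386228": [(59, 144), (97, 147)],
}
fst = {site: wc_fst_haploid(c) for site, c in sites.items()}
for site, value in fst.items():
    print(site, round(value, 4))
```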
Table 3. Comparison of high-difference SNP sites
| SNP locus | Cold spot, n (%) | Hot spot, n (%) | Odds ratio (95% CI) | P-value |
|---|---|---|---|---|
| 1328687 (Rv1186c) | | | | |
| None (Ref.) | 46 (31.9) | 87 (59.2) | 0.32 (0.2, 0.52) | < 0.001 |
| Mutation | 98 (68.1) | 60 (40.8) | | |
| 4386228 (Rv3900c) | | | | |
| None (Ref.) | 85 (59) | 50 (34) | 2.79 (1.74, 4.5) | < 0.001 |
| Mutation | 59 (41) | 97 (66) | | |
| 3847237 (IGR) | | | | |
| None (Ref.) | 103 (71.5) | 69 (46.9) | 2.84 (1.75, 4.62) | < 0.001 |
| Mutation | 41 (28.5) | 78 (53.1) | | |
| 1699849 (Rv1508c) | | | | |
| None (Ref.) | 95 (66) | 61 (41.5) | 2.73 (1.7, 4.4) | < 0.001 |
| Mutation | 49 (34) | 86 (58.5) | | |
| 251575 (Rv0210) | | | | |
| None (Ref.) | 81 (56.2) | 48 (32.7) | 2.65 (1.65, 4.27) | < 0.001 |
| Mutation | 63 (43.8) | 99 (67.3) | | |
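The odds ratios in Table 3 compare the odds of carrying the mutation in hot spots with the odds in cold spots, with the non-mutated genotype as the reference. A minimal sketch of that calculation follows, assuming an unadjusted odds ratio with a Wald 95% confidence interval; the text does not say how the intervals were obtained, and the function name `odds_ratio_wald` is illustrative.

```python
import math

def odds_ratio_wald(ref_cold, mut_cold, ref_hot, mut_hot, z=1.96):
    """Odds of mutation in hot vs cold spots, with a Wald confidence interval.

    ASSUMPTION: unadjusted 2x2 odds ratio; z = 1.96 gives a 95% interval.
    """
    odds_ratio = (mut_hot / ref_hot) / (mut_cold / ref_cold)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se = math.sqrt(1 / ref_cold + 1 / mut_cold + 1 / ref_hot + 1 / mut_hot)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Counts from Table 3: (no mutation, mutation) in cold spots, then hot spots.
rv1186c = odds_ratio_wald(46, 98, 87, 60)
rv0210 = odds_ratio_wald(81, 63, 48, 99)

for name, (or_, lo, hi) in {"Rv1186c": rv1186c, "Rv0210": rv0210}.items():
    print(f"{name}: OR = {or_:.2f} ({lo:.2f}, {hi:.2f})")
```

An odds ratio below 1, as for Rv1186c, means the mutation is less common in hot spots than in cold spots.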
3.4 MDR analysis of gene-host environment interaction
To test the gene-host environment interaction, the 5 SNP loci mentioned above and the host factors that differed significantly between hot and cold spots were included in the model. Ethnicity and income were excluded because they were not comparable. Table 4 summarizes the cross-validation consistency and the training and testing accuracy obtained by multifactor dimensionality reduction for the mutations at the high-difference SNP sites and the host factors. The “Rv0210-BMI” model had the highest testing accuracy (61.2%) and the highest cross-validation consistency (8/10, P < 0.0001). Figure 2 shows three combinations associated with high risk and low risk for the “Rv0210-Age groups-BMI” model. The distribution of high-risk and low-risk combinations of host factors and SNP mutations shows that a strain carrying the Rv0210 mutation whose host had a normal BMI and belonged to the low age group was more likely to come from a cold spot area.
A hierarchical interaction graph based on all alternative models is shown in Figure 3A. It shows a clear negative interaction of age group with BMI (interaction entropy: -3.55%) and with the Rv0210 mutation (interaction entropy: -2.39%). By contrast, the interaction between the Rv0210 mutation and BMI was weak (interaction entropy: -1.46%).
The interaction dendrogram in Figure 3B shows that age group and BMI were located on the same branch. These two factors were estimated to have the strongest redundant interaction, as indicated by the blue line. The Rv0210 mutation was on a different branch, indicating a weak redundant interaction with the other factors.
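The interaction entropy values above can be read as interaction information: the information two factors carry jointly about the hot or cold spot label, minus what each carries alone, expressed as a share of the label entropy. Negative values indicate redundancy and positive values indicate synergy. The sketch below assumes that definition, which this section does not spell out, and uses toy data rather than study records; all function names are illustrative.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a sequence of hashable values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def interaction_gain(a, b, label):
    """IG = I(A,B;label) - I(A;label) - I(B;label), as a fraction of H(label).

    ASSUMPTION: this is the quantity behind 'interaction entropy'.
    The label must take at least two values, otherwise H(label) is zero.
    """
    joint = list(zip(a, b))
    gain = mutual_info(joint, label) - mutual_info(a, label) - mutual_info(b, label)
    return gain / entropy(label)

# Toy example 1: the label is the XOR of two factors (pure synergy).
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
synergy = interaction_gain(a, b, [x ^ y for x, y in zip(a, b)])

# Toy example 2: both factors duplicate the label (pure redundancy).
redundancy = interaction_gain(a, a, a)

print(f"synergy: {synergy:+.0%}, redundancy: {redundancy:+.0%}")
```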
Table 4. The best model for predicting the likelihood of M. tb strains appearing in hot or cold spots
| Best model | Training accuracy (%) | Testing accuracy (%) | CVC | χ² | P-value |
|---|---|---|---|---|---|
| Rv0210 | 63.8 | 59.8 | 8/10 | 21.75 | < 0.001 |
| Rv0210, BMI | 67.4 | 61.2 | 8/10 | 35.47 | < 0.001 |
| Rv0210, Age groups, Ethnicity | 70.1 | 60.5 | 4/10 | 44.04 | < 0.001 |
CVC: Cross-validation consistency
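The core step of multifactor dimensionality reduction can be sketched as follows: each combination of factor levels is labelled high risk when its ratio of hot spot to cold spot isolates exceeds the overall ratio, and candidate models are then ranked by balanced accuracy. The sketch is a simplification under stated assumptions: it scores models on the training data only, omitting the 10-fold cross-validation and significance testing behind Table 4, and the records, factor names and functions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mdr_high_risk_cells(rows, labels, factors):
    """Return the factor-level combinations labelled high risk (hot spot = 1)."""
    n_hot = sum(labels)
    threshold = n_hot / (len(labels) - n_hot)  # overall hot:cold ratio
    hot, cold = Counter(), Counter()
    for row, y in zip(rows, labels):
        cell = tuple(row[f] for f in factors)
        (hot if y else cold)[cell] += 1
    return {cell for cell in hot if hot[cell] > threshold * cold[cell]}

def balanced_accuracy(rows, labels, factors, high_risk):
    """Mean of sensitivity and specificity of the high/low risk labelling."""
    hits = Counter()
    for row, y in zip(rows, labels):
        predicted = tuple(row[f] for f in factors) in high_risk
        hits[(y, predicted)] += 1
    sensitivity = hits[(1, True)] / (hits[(1, True)] + hits[(1, False)])
    specificity = hits[(0, False)] / (hits[(0, False)] + hits[(0, True)])
    return (sensitivity + specificity) / 2

# HYPOTHETICAL toy records, not study data: the label depends only on the
# combination of the first two factors.
rows = [{"rv0210": r, "bmi_low": b, "age_50plus": a}
        for r in (0, 1) for b in (0, 1) for a in (0, 1)]
labels = [row["rv0210"] ^ row["bmi_low"] for row in rows]

scores = {}
for k in (1, 2):
    for factors in combinations(list(rows[0]), k):
        cells = mdr_high_risk_cells(rows, labels, factors)
        scores[factors] = balanced_accuracy(rows, labels, factors, cells)

best = max(scores, key=scores.get)
print(best, scores[best])
```

In a full analysis the same labelling would be fitted on nine tenths of the data and scored on the held-out tenth, and the cross-validation consistency would count how often a model is selected across the ten folds.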