Cluster 113[14] was selected as the source of MTB closely related variants with differential transmission efficacy. We first updated its composition, incorporating WGS data of the new cases identified after its initial description in 2017, in a population-based MIRU-VNTR systematic analysis including all the TB cases in the province during fourteen consecutive years[14]. The updated cluster network (Fig. 1) showed 19 cases, mainly Moroccan migrants (11 cases) and a limited representation of cases from Spain (six cases), Mali (one case), and Nigeria (one case). The network distributed the cases in seven branches, despite sharing identical MIRU-VNTR patterns. The branch with most cases involved Variant 1 (9 cases with 0–1 SNPs pairwise distances, most likely due to uncontrolled recent transmission). Four branches (involving Variants 2–5) were dead-ends, i.e., single cases not causing secondary cases. The remaining three branches (variants 6–8) represented, self-limiting, short transmission events with only two cases.
We tested the behaviour of two representatives for the actively transmitted Variant 1 (Morocco-2010, index case, and Morocco-2014) in an in vitro macrophage infection model (Fig. 2). Variants 2, 5, and 8-2010 (differing 19–22 SNPs from the successful variant) were selected as non-transmissible controls, which had had the opportunity to be transmitted; the corresponding cases were diagnosed in 2010, 2015 and 2016, respectively, but did not cause any secondary case (Fig. 2).
We first assessed dissimilarities in invasiveness. No statistical differences were found between Variant 1 and the non-transmitted control variants (Fig. 3a).
Next, we evaluated their intramacrophage multiplication rates. We observed a statistically significant superiority of Variant 1 based on intracellular growth (Fig. 3b). The multiplication ratios for Variant 1 were 2–4 folds higher than the obtained for the non-transmitted variants (ratios ranging from 5.3 to 10.7 vs from 1.5 to 3.95, respectively; p = 0.0244).
Certain outbreak strains, namely CDC1551, demonstrated high transmissibility but were not found to be more virulent[21, 22]. For Cluster 113, the higher intramacrophage multiplication observed for Variant 1 led us to a further detailed analysis to determine if the differential SNPs found between this variant and the control ones could explain that behaviour. Among the nine specific SNPs identified exclusively in Variant 1 (Table 1), seven were homozygous calls for all isolates, but the remaining two were heterozygous in the index case and were thus fixed in the rest. Two SNPs were located in intergenic regions, which could play a role, if they relate to regulatory elements. For those intragenic, we focused on the (four) corresponding to non-synonymous mutations. Two of them mapped on genes without a described function (the Rv0090 and Rv1517 hypothetical proteins), while the other two may have a functional meaning (Rv0001, alias dnaA, and Rv1028c, alias kdpD).
Table 1
Specific SNP annotation for the successfully transmitted Cluster 113 variant
SNP position | Nucleotide change | Allele freq index* | Gene | Gene coordinates | Gene direction | Essential character | Protein length | Aminoacid change | Function |
1163 | C > T | 0.29 | Rv0001 (dnaA) | 1… 1524 | + | essential | 507 | T > I | chromosomal replication initiation protein |
1623635 | G > A | 0.5 | Rv1444c | 1623287… 1623697 | - | non-essential | 136 | Synonymous | hypothetical protein |
99141 | T > G | > 0.9 | Rv0090 | 98480… 99250 | + | non-essential | 256 | L > R | hypothetical protein |
1151418 | C > T | > 0.9 | Rv1028c (kdpD) | 1149104… 1151686 | - | non-essential | 860 | R > Q | sensor protein KdpD |
1163957 | C > T | > 0.9 | IG1057 (Rv1040c-Rv1041c) | 1163377… 1164571 | | | | | |
1709423 | G > T | > 0.9 | Rv1517 | 1708871… 709635 | + | non-essential | 254 | G > C | hypothetical protein |
2142832 | C > T | > 0.9 | Rv1895 | 2142521… 2143675 | + | non-essential | 384 | Synonymous | possible dehydrogenase |
3659433 | G > A | > 0.9 | Rv3276c (purK) | 3658635… 3659924 | - | non-essential | 429 | Synonymous | phosphoribosylaminoimidazole carboxylase ATPase subunit |
4314130 | T > C | > 0.9 | IG3905 (Rv3840-Rv3841) | 4313981… 4314177 | | | | | |
* Allele frequency in all secondary cases was > 0.9 |
dnaA encodes a key protein for triggering the chromosomal replication machinery at oriC through interactions with the DnaA-boxes, which assembles a nucleoprotein complex responsible for the ATP-dependent opening of the double strand of DNA[23]. The differential SNP found between this variant and the control ones maps at residue 388 of the DNA binding domain of DnaA[24] and causes a threonine (polar uncharged amino acid) to isoleucine (hydrophobic amino acid) substitution. Although modelling data are required to interprete the meaning of this substitution, we might speculate that the nature of this substitution may improve DnaA-oriC interactions. The fact that the SNP mapping in dnaA was initially detected in heterozygosis in the index case and was then fixed in all secondary cases, may reinforce the advantageous character of this substitution.
kdpD is one of the two genes that constitute the kdpDE operon, one of the scarce two-component regulatory systems (2CRSs) found in MTB, and it encodes its sensor protein[25]. It has been shown that strains with deletions in kdpDE, and in some other MTB 2CRSs, increase their virulence in an immunodeficient mouse model[26]. Folkvardsen et al.[15] identified a specific SNP in a 2CRS from a strain responsible for one of the major transmission events worldwide -currently active- not present in a closely related variant that caused few secondary cases. This SNP is located in the tcrY[15] gene, which also codes for the sensor TcrXY protein of a 2CRS system. It has also been proven that deletions in this 2CRS system results in a hypervirulent phenotype in SCID mice[26]. We must admit that several of the observations related to the virulence of mutants in 2CRS systems were obtained from immunocompromised models and might not be transferable to an inmunocomponent system, such as the cases from which Variant 1 was obtained.
The KdpDE 2CRS system is involved in the detection and response of pH and K + changes (related with turgor pressure, regulation of cytoplasmic pH, osmolarity, transmembrane electrical potential, etc.) by modulating the expression of a K + transport system[27]. This system may have a role in MTB efficiency to survive acidic environments, e.g., phagosome or autophagosome vacuoles[27]. The specific SNP harboured by Variant 1 at the N-terminal domain of this protein implies an arginine (positively charged amino acid) to glutamine (polar uncharged amino acid) substitution. The domain interacts with two lipoproteins, LprF and LprJ, to form a ternary complex that modulates the sensing ability of KdpD[28]. Thus, the substitution described for Variant 1 may affect the binding affinity of Lpr and the subsequent expression of the K + transport system, and consequently the modulation of pH and K + uptake. However, these are mere hypothesis that require further validation/evidence.
In summary, we have identified two specific SNPs in Variant 1, located in dnaA and kdpD, not found in other closely related non-transmitted variants. The changes caused by these SNPs may be associated to the higher infectivity shown by Variant 1 in macrophages. We cannot fully exclude the role of other potentially undetected mutations. This is because repetitive regions, including, among others, the PE and PPE genes, must still be systematically excluded from any Illumina-based WGS analysis. It causes the existence of black boxes with unknown diversity[29], in regions known to be essential for MTB virulence and interaction with the host immune system[30].
In addition to environmental and host-related factors, our study suggests that bacterial factors must also be considered when aiming to understand major transmission events. WGS analysis coupled with a simple in vitro infection model may provide a rapid screening platform of closely related variants with a differential transmission success. Once the involvement of bacterial factors is shown, additional studies on the candidate genes and polymorphisms should be required to more thoroughly characterize them and finally define determinants of MTB virulence, transmissibility, or evasion of the immune system. Targeting these SNPs with tailored allele-specific PCRs may allow fast tracking of strains requiring special surveillance.