H. pylori has a complex, long-term coexistence with humans and is closely associated with the occurrence of GI and GC [15, 16]. Whether the pathogenicity of H. pylori is specific to particular populations, sources, or genes, and whether specific highly pathogenic strains exist, have been active research questions in the field [17, 18]. It is therefore critical to identify the key determinants of strain pathogenicity and to characterize regional highly pathogenic strains. Here, we sequenced 12 H. pylori isolates derived from GC or GI patients and performed whole-genome comparative analysis to investigate the genetic structure of, and genetic differences between, GC and GI strains.
East Asia has long had a high incidence of GI and GC, and H. pylori infection is strongly correlated with the occurrence of both [19, 20]. In this study, 12 H. pylori strains from GI and GC patients in China were selected as a representative sample for exploring the pathogenicity and genetic differences of the two cohorts. We also selected strains representative of other regions of the world from public databases as references [15]. The results of our study can therefore reflect, to a certain extent, the homology and differences between GI and GC strains, and between these strains and the reference strains.
In the present study, we used next-generation sequencing to sequence the whole genomes of H. pylori strains isolated from GI and GC patients in Shanghai (China). The genome of the GC strains consists of a circular chromosome with an average total length of 1,579,639 bp and an average of 1594 coding sequences (CDS). The average genome size and number of CDS in the GI strains were lower than those in the GC group (size: 1,569,508 bp; CDS: 1589). These values are in line with previous reports that the whole genome of Helicobacter pylori is about 1.5–1.9 Mb in size and contains about 1600 coding genes [21, 22]. Additionally, the VFs of H. pylori from GC and GI patients were highly consistent. However, the cagA gene sequence was found in GC strains, in contrast to GI strains, which may indicate higher virulence and aggressiveness of GC strains. cagA-positive H. pylori strains are associated with acute GI, peptic ulcer, and GC [23]. As first reported in 1995, infection with cagA-positive strains increases the risk of GC by at least an order of magnitude compared with infection with cagA-negative strains [24].
In this study, the SNP-based phylogenetic tree first showed that the 12 strains from GC and GI patients in China did not belong to a single H. pylori lineage. The overlap between the GC and GI lineages sequenced in this study, especially for strains from GI patients, is consistent with the co-evolution of H. pylori lineages in Japan. In addition, neither the phylogenetic tree based on homologous proteins nor the visualization of predicted genomic islands distinguished H. pylori strains from GC and GI patients. The observation of two distinct lineages has also been reported previously. Previous genome-wide association studies of hp-East Asia strains by Japanese researchers have shown that differences in SNPs between strains from GC and duodenal ulcer patients can be detected, and potential pathogenic mechanisms such as charge changes in ligand-binding pockets, changes in subunit interactions, and pattern switching in DNA methylation have been proposed [25]. Yamaoka et al. found that virulence genes in the genomes of strains CHC155 (GC) and VN1291 (duodenal ulcer) are key risk factors for H. pylori pathogenicity [26]. Another study comparing H. pylori infection in GC and duodenal ulcer suggests that vacA genotype status may help to identify patients at high risk of developing GC [8]. The VF comparison described above also showed that cagA status differed between the GC and GI strains, which may be an important genomic feature distinguishing the two cohorts.
Subsequently, the pan-genomic and ANI analyses suggested that the GC, GI, and reference H. pylori strains were highly homologous, which is similar to previous findings. H. pylori lineages are related to geographical regions and can be divided into 7 lineages: hp-Africa1, hp-Africa2, hp-Sahul, hp-Europe, hp-Asia2, hp-Amerind, and hp-East Asia [15]. However, these coexisting lineages are expected to converge over time owing to the plasticity of the strains at the genomic level [16]. In addition to genome homology, the COG, GO, and KEGG functional annotation results further suggested that the GC and GI strains were also highly similar in gene function, and that their specific gene functions were mainly concentrated in metabolism, transcription, and repair. Gene annotations previously reported for strains from other regions were similar to our results [10, 27].
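To illustrate how a pan-genome comparison of this kind separates shared genes from group- or strain-specific genes, the following minimal Python sketch partitions gene families by the number of strains that carry them. It is only an illustration under stated assumptions: the strain labels, the gene lists, and the function name partition_pan_genome are hypothetical, and the code is not the pipeline used in this study.

```python
from collections import Counter


def partition_pan_genome(gene_sets):
    """Split gene families into core, accessory and strain-specific sets.

    gene_sets maps a strain name to the gene families annotated in it.
    Assumes at least two strains; the data passed in below are toy values.
    """
    n_strains = len(gene_sets)
    # Count in how many strains each gene family occurs (once per strain).
    counts = Counter(g for genes in gene_sets.values() for g in set(genes))
    core = {g for g, c in counts.items() if c == n_strains}
    unique = {g for g, c in counts.items() if c == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical presence lists, NOT data from this study.
toy_strains = {
    "GC_1": ["ureA", "vacA", "cagA"],
    "GC_2": ["ureA", "vacA", "cagA", "geneX"],
    "GI_1": ["ureA", "vacA"],
}

core, accessory, unique = partition_pan_genome(toy_strains)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("strain-specific:", sorted(unique))
```

In this toy input, the gene families present in all three strains form the core genome, whereas a gene carried by only some strains falls into the accessory set; a large core relative to the accessory set is what high homology between groups would look like in such a partition.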
Our study has certain limitations. First, because of the limited sample size and the regional nature of the samples, the results need to be confirmed by multi-center studies with larger sample sizes. Second, only a few representative strains from each continent were selected as reference genomes, which may introduce selection bias. Third, the differences in VFs and SNPs between the GC and GI strains were not explored further, and no experiments were designed to verify them; future studies are needed to provide more detailed answers.