4.1 Study Methodology
In this study, Sanger sequencing of genomic DNA was used to detect SNPs in the region [-687_+297] of IL1B in 14 H. pylori-infected patients. Computational analyses of the IL-1B promoter region [-687_+297] were then applied in two steps: 1) in silico prediction of the promoter region and 2) in silico analysis of the predicted promoter region [−368_+10]. Furthermore, the IL1B-31 C>T polymorphism was genotyped using PCR-CTPP in 122 participants to study its association with susceptibility to H. pylori infection in the Sudanese population. The methodology followed in this study is summarized in Figure 6.
4.2 Study setting and study population
This study was carried out at public and private hospitals in Khartoum state. The hospitals included Ibin Sina specialized hospital, Soba teaching hospital, Modern Medical Centre and Al Faisal Specialized Hospital. Sample size was calculated using Epi Info software version 7 (65, 66). The matched case-control formula was selected, assuming a 95% confidence level, 80% power, a 1:1 ratio of controls to cases, 15% of controls exposed, an odds ratio of 3.36 and 37.2% of cases exposed. Based on the sample size calculation, a total of 122 individuals were recruited for this study.
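The stated inputs can be cross-checked with a short calculation. The sketch below is illustrative only: it assumes the Fleiss formula for unmatched case-control designs without continuity correction, which is not necessarily the matched formula used by Epi Info, and the function names are hypothetical. It first confirms that 15% of controls exposed and an odds ratio of 3.36 imply about 37.2% of cases exposed, then estimates the number of cases required.

```python
from math import ceil, sqrt
from statistics import NormalDist


def exposed_cases_proportion(p0: float, odds_ratio: float) -> float:
    """Proportion of cases exposed implied by control exposure p0 and the odds ratio."""
    return p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))


def fleiss_cases_needed(p0, odds_ratio, alpha=0.05, power=0.80, ratio=1.0):
    """Cases required by the Fleiss formula (assumption: unmatched, no continuity correction).

    ratio is the number of controls per case.
    """
    p1 = exposed_cases_proportion(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided confidence level
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + ratio * p0) / (1 + ratio)
    numerator = (
        z_alpha * sqrt((ratio + 1) * p_bar * (1 - p_bar))
        + z_beta * sqrt(ratio * p1 * (1 - p1) + p0 * (1 - p0))
    ) ** 2
    return ceil(numerator / (ratio * (p1 - p0) ** 2))


if __name__ == "__main__":
    print(round(exposed_cases_proportion(0.15, 3.36), 3))
    print(fleiss_cases_needed(0.15, 3.36))
```

Under these assumptions the sketch returns 61 cases, that is, 122 participants at a 1:1 ratio, which is consistent with the number recruited.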
The 122 participants had been referred for endoscopy. Of these, 15 had gastric cancer, 27 had peptic ulceration, 61 had gastroduodenal inflammation, 10 had esophageal diseases, and nine showed normal upper gastroduodenoscopic features. Gastroduodenal diseases were diagnosed by an experienced gastroenterologist during the upper gastrointestinal (GI) endoscopy procedure, while gastric cancer was diagnosed based on histology. Participants’ demographic and clinical data were obtained through a structured questionnaire, personal interviews, and a review of case records. The selection criteria were Sudanese nationality, either sex, and no use of antibiotics or non-steroidal anti-inflammatory drugs (NSAIDs). All participants were informed of the objectives and purposes of the study, and written informed consent was obtained. The demographic characteristics of the participants are presented in Table 8.
Table 8. Demographic characteristics of participants
| Variables | Total (n=122) | H. pylori (+ve) (n=61) | H. pylori (-ve) (n=61) | P-value |
|---|---|---|---|---|
| Age, years ± Std. Deviation (range) | 44.37±17.48 (15-89) | 44.00±16.99 (15-85) | 44.74±18.10 (17-89) | 0.9184 |
| Gender: Male | 72 (59.02%) | 36 (50%) | 36 (50%) | 1.0000 |
| Gender: Female | 50 (40.98%) | 25 (50%) | 25 (50%) | |
| Residence: Urban | 54 (44.26%) | 24 (44.44%) | 30 (55.56%) | 0.3622 |
| Residence: Rural | 68 (55.74%) | 37 (54.41%) | 31 (45.59%) | |
4.3 DNA extraction
Gastric biopsies were collected in 400 µl of phosphate-buffered saline (PBS). For histological examination, the biopsies were transported in formalin. DNA was extracted using the innuPREP DNA Mini Kit (Analytik Jena AG, Germany) according to the manufacturer's protocol, as previously described (67).
4.4 PCR amplification of specific 16S rRNA of H. pylori
The H. pylori-specific 16S rRNA gene was amplified using the following primers: F: 5'-GCGCAATCAGCGTCAGGTAATG-3' and R: 5'-GCTAAGAGAGCAGCCTATGTCC-3' (68). The PCR conditions were as previously described (69).
4.5 PCR amplification and sequencing of the IL1B promoter region
The IL-1B promoter region containing the -511 and -31 polymorphisms was amplified using the following primers: F: 5'-CATCCATGAGATTGGCTAG-3' and R: 5'-AGCACCTAGTTGTAAGGAAG-3' (70). The cycling conditions were an initial denaturation at 94°C for 5 min, followed by 35 cycles of 94°C for 1 min, 60°C for 1 min and 72°C for 1 min, with a final extension at 72°C for 7 min. The amplified PCR product was 800 bp and was located between -687 bp (upstream) and +297 bp (downstream) of the IL-1B gene.
Fourteen PCR products from H. pylori-infected subjects, selected because they showed the clearest bands, were sent for DNA purification and Sanger dideoxy sequencing. Both DNA strands were sequenced commercially by Macrogen Inc., Korea.
4.6 Sequence analysis and SNPs detection
The sequencing results, two chromatograms for each patient (forward and reverse), were visualized, checked for quality, and analyzed using the Finch TV program version 1.4.0 (71). The nucleotide Basic Local Alignment Search Tool (BLASTn; https://blast.ncbi.nlm.nih.gov/) was used to assess nucleotide sequence similarities (72).
To determine the SNPs in the IL-1B promoter region, multiple sequence alignment (MSA) of the tested sequences against a reference sequence (NG_008851) was performed using BioEdit software (73).
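The SNP-calling step can be illustrated with a minimal sketch that compares an aligned sample sequence with the aligned reference and reports substitutions in promoter coordinates. This is not the BioEdit procedure itself. The function names are hypothetical, and the sketch assumes that the alignment starts at position -687 and that numbering skips position 0 (so -1 is followed by +1).

```python
def promoter_coordinate(index: int, start: int = -687) -> int:
    """Map a 0-based reference index to a coordinate, assuming numbering skips 0."""
    coord = start + index
    return coord if coord < 0 else coord + 1


def find_substitutions(ref_aligned: str, sample_aligned: str, start: int = -687):
    """Return (coordinate, ref_base, sample_base) for each mismatch in a pairwise alignment."""
    if len(ref_aligned) != len(sample_aligned):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_index = 0  # counts reference bases only, so gaps do not shift coordinates
    for ref_base, sample_base in zip(ref_aligned.upper(), sample_aligned.upper()):
        if ref_base == "-":
            continue  # insertion relative to the reference; ignored in this sketch
        if sample_base not in ("-", "N") and sample_base != ref_base:
            variants.append((promoter_coordinate(ref_index, start), ref_base, sample_base))
        ref_index += 1
    return variants


if __name__ == "__main__":
    # Toy sequences, not real IL-1B data.
    print(find_substitutions("ACGT-A", "ACTTGA", start=-2))
```

Candidate substitutions found this way would still need to be confirmed on both the forward and reverse chromatograms.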
4.7 Bioinformatics analysis of the IL-1B promoter region in H. pylori-infected subjects
4.7.1 In silico prediction of the promoter
The promoter sequence, which is generally located in the 5’ upstream region of a structural gene, is the crucial element for initiating and regulating messenger RNA transcription (44). Promoters have a complex and specific architecture and contain binding sites for multiple transcription factors (TFs) involved in the specific regulation of transcription (74). Different features of a promoter region may have different power for promoter identification (49); therefore, we applied several programs to predict promoter regions in order to obtain accurate results for subsequent experimental validation. These programs were: (1) Promoter 2.0 Prediction Server (http://www.cbs.dtu.dk/), which uses a combination of elements similar to neural networks and genetic algorithms to recognize a set of discrete sub-patterns with variable separation as one pattern, a promoter (75); (2) Neural Network Promoter Prediction (NNPP2.2) (http://www.fruitfly.org/), which applies multiple hidden layers and time-delay neural networks (TDNNs) for promoter annotation (76); (3) TSSW (http://softberry.com/), which uses functional motifs from the Wingender et al. database (77) and a linear discriminant function combining characteristics that describe functional motifs and the oligonucleotide composition of these sites (78); (4) TSSG (http://softberry.com/), which uses the same approach as TSSW but with the TFD database of functional motifs (79); and (5) Fprom (http://softberry.com/), a TSSG variant with a different learning set of promoter sequences (49).
4.7.2 In silico analysis of the predicted promoter region
4.7.2.1 Assessment for the presence of promoter associated features
The in silico predicted promoter region was additionally assessed for the presence of promoter-associated features, including promoter-associated histone marks, broad chromatin state segmentation, transcription factor ChIP-seq, and DNase I hypersensitivity clusters, using the ENCODE data (https://epd.epfl.ch/cgi-bin/get_doc?db=hgEpdNew&format=genome&entry=IL1B_1) (45-47).
4.7.2.2 Prediction of CpG Islands
A CpG island is often regarded as a marker for the initiation of gene expression. It is a segment of DNA with high GC and CpG dinucleotide content, located in the 5’ untranslated region (UTR) of genes. In this study, MethPrimer (44, 80) and CpG finder (http://www.softberry.com/berry.phtml?topic=cpgfinder&group=programs&subgroup=promoter) were employed to predict CpG islands in the promoter. CpG finder searches for CpG islands in sequences, while MethPrimer was developed to design PCR primers for methylation mapping, with primers picked around the predicted CpG islands. CpG islands are predicted using a simple sliding window algorithm that examines the GC content and the observed/expected (Obs/Exp) CpG ratio across the sequence. The search parameters were CpG island length > 200 bp, GC content > 50%, and Obs/Exp > 0.6.
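The sliding window criterion can be sketched as follows. This is an illustration rather than the MethPrimer or CpG finder implementation. It assumes the commonly used definition Obs/Exp = (number of CpG × window length) / (number of C × number of G), a fixed 200 bp window and hypothetical function names. It flags qualifying windows only and does not merge them into islands.

```python
def window_stats(window: str):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    n = len(window)
    c, g = window.count("C"), window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Yield 0-based start positions of windows that pass both thresholds."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = window_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_obs_exp:
            yield start


if __name__ == "__main__":
    toy = "AT" * 100 + "CG" * 150 + "AT" * 100  # toy sequence, not IL-1B
    starts = list(cpg_island_windows(toy))
    if starts:
        print(f"qualifying windows start between {starts[0]} and {starts[-1]}")
```

A sequence shorter than the window yields no result, which is the expected behaviour when the region analyzed is below the minimum island length.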
4.7.2.3 Prediction of Transcription Factor Binding Sites (TFBSs)
One of the important steps in promoter analysis is the prediction of potentially functional TFBSs. Protein binding sites are the most important elements of a promoter, and the proteins that bind them are called transcription factors (TFs). In this step, the promoter region was analyzed for possible TFBSs using five prediction tools: (1) Alggen Promo (http://alggen.lsi.upc.es/cgi-bin/promo_v3/), in which positional weight matrices (PWMs) are constructed from known binding sites extracted from TRANSFAC (38) and used to identify potential binding sites in sequences (81, 82); (2) AliBaBa2 (http://www.gene-regulation.com/), which assumes that each binding site has an unknown context that determines its sequence and therefore constructs specific matrices for each sequence analyzed, through a context-specific process that starts from a dataset of known binding sites and ends with the identification of a potential new binding site (83); (3) Gene Promoter Miner (GPMiner) (http://GPMiner.mbc.nctu.edu.tw/), an integrated system that identifies promoter regions, regulatory elements and DNA stability by incorporating a support vector machine (SVM) with nucleotide composition features, over-represented hexamer nucleotides, and DNA stability; for predicting TFBSs, it uses the MATCH tool (84) to scan an input sequence with the TF binding profiles from TRANSFAC public release version 7.0 (85) and JASPAR (86, 87); (4) TF-Bind (http://tfbind.hgc.jp/), which uses PWMs and Bucher’s calculating method (88) to calculate the matching score between an input sequence and a set of known TF binding sites, with a robust cut-off value determined from the background rate estimated on non-promoter sequences (89); and (5) Tfsitescan (http://www.ifti.org), an object-oriented transcription factors database (ooTFD) retrieval tool used for transcription factor site analysis, which constructs an image map of the sequence analysis results linked to individual site entries (90).
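Several of these tools rely on positional weight matrices. The sketch below shows the general idea under simple assumptions: a log-odds matrix with a pseudocount, a uniform background of 0.25 per base, and forward-strand scanning only. It does not reproduce the scoring of PROMO, MATCH or TF-Bind, and the binding sites, threshold and function names are hypothetical.

```python
from math import log2

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length binding sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for each forward-strand window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Hypothetical TATA-like sites and toy sequence, for illustration only.
    known_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
    pwm = build_pwm(known_sites)
    for start, score in scan("GGCGTATAAAGGC", pwm, threshold=5.0):
        print(start, round(score, 2))
```

The sketch expects upper-case A, C, G and T only; ambiguous bases such as N would need to be handled before scanning. The choice of threshold controls the trade-off between missed sites and false positives, which is why the tools above estimate cut-offs from background sequences.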
4.7.2.4 Prediction of composite regulatory elements (CEs)
A CE is the minimal functional unit that can provide combinatorial transcriptional regulation of gene expression. Structurally, a CE consists of two closely located DNA binding sites (BSs) for distinct transcription factors, but its regulatory function is qualitatively different from the regulatory effects of either individual binding site. In this study, we identified the composite regulatory elements in our region using the MatrixCatch algorithm (http://gnaweb.helmholtz-hzi.de/cgi-bin/MCatch/MatrixCatch.pl). The basic idea of MatrixCatch is to use data collected separately for the respective binding sites to compensate for the lack of knowledge on the sequence variation of each DNA BS within CEs; this information is compiled in position weight matrices (PWMs). The CE model consists of two PWMs together with their minimal scores, relative orientation and distance. Moreover, MatrixCatch is supplied with a library of 265 matrix models for recognition, which represents the widest scope of known CEs available to date (91).
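The pairing logic of a CE model (two sites, a distance range and a relative orientation) can be sketched as below. This is not the MatrixCatch implementation: it assumes the individual binding site hits have already been obtained and filtered by their minimal scores, and the hit format, parameter names and default values are hypothetical.

```python
def find_composite_elements(hits_a, hits_b, min_gap=0, max_gap=20, same_strand=None):
    """Pair hits of two factors whose spacing and relative orientation fit a model.

    Each hit is (start, end, strand) with end exclusive and strand '+' or '-'.
    same_strand=None ignores orientation; True or False requires matching or opposite strands.
    """
    pairs = []
    for a in hits_a:
        for b in hits_b:
            first, second = (a, b) if a[0] <= b[0] else (b, a)
            gap = second[0] - first[1]  # bases between the sites; negative if they overlap
            if not (min_gap <= gap <= max_gap):
                continue
            if same_strand is not None and (a[2] == b[2]) != same_strand:
                continue
            pairs.append((a, b, gap))
    return pairs


if __name__ == "__main__":
    # Hypothetical hits for two factors, for illustration only.
    factor_a = [(10, 16, "+"), (100, 106, "+")]
    factor_b = [(20, 28, "-"), (300, 308, "+")]
    print(find_composite_elements(factor_a, factor_b, max_gap=10))
```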
4.7.2.5 Comparative analysis
The promoter region was analyzed for possible conservation using the ECR Browser (http://ecrbrowser.dcode.org) (92), NCBI BLASTn (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and ClustalW (https://www.genome.jp/tools-bin/clustalw). Conservation was assessed in 12 species: chimpanzee (Pan troglodytes), rhesus monkey (Macaca mulatta), mouse (Mus musculus), rat (Rattus norvegicus), dog (Canis familiaris), cow (Bos taurus), opossum (Monodelphis domestica), chicken (Gallus gallus), frog (Xenopus laevis), zebrafish (Danio rerio), fugu pufferfish (Takifugu rubripes), and spotted green pufferfish (Tetraodon nigroviridis).
In addition, the conservation of SNPs was evaluated, and the possible conservation of TFBSs at these SNP locations was screened with the Multiple-sequence local alignment and visualization (Mulan) tool (https://mulan.dcode.org/) (93).
4.8 Detection of the IL-1B-31 C/T polymorphism using PCR with confronting two-pair primers (PCR-CTPP)
The IL-1B-31 polymorphism was detected using PCR-CTPP. The primers for the C allele were F: 5′-ACT TCT GCT TTT GAA GGC C-3′ and R: 5′-TAG CAC CTA GTT GTA AGG A-3′, and those for the T allele were F: 5′-AGA AGC TTC CAC CAA TAC T-3′ and R: 5′-CTC CCT CGC TGT TTT TAT A-3′ (94). Each 25 µl reaction mixture was prepared with the Maxime PCR PreMix Kit (i-Taq) (iNtRON BIOTECHNOLOGY, Seongnam, Korea) and contained 1 µl of extracted DNA, 23 µl of de-ionized sterile water and 0.25 µl of each primer. PCR conditions were as follows: initial denaturation at 94°C for 5 min, followed by 25 cycles of 1 min at 94°C, 1 min at 54°C and 1 min at 72°C, and a final incubation at 72°C for 5 min. The PCR products were visualized by electrophoresis on a 2% agarose gel stained with ethidium bromide. Genotypes were assigned from the band patterns as follows: 240 and 155 bp for the CC genotype; 240, 155 and 122 bp for the CT genotype; and 240 and 122 bp for the TT genotype (94).
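The band-pattern rule can be written as a small lookup, which may help when genotype calls are recorded from gel images. The sketch below is illustrative: the function name is hypothetical, and the ±10 bp tolerance for reading band sizes from a gel is an assumption, not a value from the protocol.

```python
EXPECTED_BANDS = {
    frozenset({240, 155}): "CC",
    frozenset({240, 155, 122}): "CT",
    frozenset({240, 122}): "TT",
}
REFERENCE_SIZES = (240, 155, 122)


def call_genotype(observed_bands, tolerance=10):
    """Snap observed band sizes (bp) to the nearest expected size and look up the genotype.

    Returns None when a band is unexpected or the pattern matches no genotype.
    """
    snapped = set()
    for band in observed_bands:
        nearest = min(REFERENCE_SIZES, key=lambda size: abs(size - band))
        if abs(nearest - band) > tolerance:
            return None  # unexpected band: the lane should be reviewed or repeated
        snapped.add(nearest)
    return EXPECTED_BANDS.get(frozenset(snapped))


if __name__ == "__main__":
    for lane in ([240, 155], [238, 157, 120], [240, 122], [240]):
        print(lane, call_genotype(lane))
```

A lane showing only the 240 bp band returns None, since the common band alone does not identify either allele.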
4.9 Statistical analysis
Deviations from Hardy-Weinberg equilibrium in controls were examined by the χ² test. Differences in age distribution between H. pylori-positive and H. pylori-negative participants were assessed by the Mann-Whitney test, while differences in the distribution of categorical variables were examined by the χ² test or Fisher's exact test. Odds ratios (ORs) were calculated and reported with 95% confidence intervals (CIs). P<0.05 was considered statistically significant. The statistical analyses were performed using GraphPad Prism 5.
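For readers who wish to reproduce these calculations outside GraphPad Prism, the sketch below shows a Hardy-Weinberg χ² test (1 degree of freedom) and an odds ratio with a 95% CI. It is an illustration under stated assumptions: the CI uses the Woolf (log-based) method, which may differ from the method applied by Prism, the function names are hypothetical, and the counts in the example are invented rather than study data.

```python
from math import erfc, exp, log, sqrt


def hardy_weinberg_chi2(n_cc, n_ct, n_tt):
    """Chi-square statistic and p-value (1 df) for deviation from Hardy-Weinberg equilibrium."""
    n = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n)  # C allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cc, n_ct, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(chi2 / 2))  # chi-square survival function for 1 degree of freedom
    return chi2, p_value


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(hardy_weinberg_chi2(25, 50, 25))
    print(odds_ratio_ci(20, 10, 10, 20))
```

Both functions require non-zero counts in every cell; tables with a zero cell need a continuity correction or an exact method such as Fisher's exact test.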