Here we present the highlighted findings for the most studied phenotypes in this population. To characterize population-specific genetic markers in line with precision medicine, we describe the aims of studying each phenotype and the corresponding findings.
1. Blood pressure
Overall aim and focus areas
High blood pressure (BP), the most frequent risk factor for cardiovascular and kidney diseases, is the leading cause of morbidity and mortality worldwide[9]. High BP is affected by both genetic and environmental factors. To investigate the genomic aspects of BP, we performed familial aggregation analysis, heritability analysis[10], a family-based linkage study[11], GWAS, epistasis analysis, and polygenic risk score (PRS) estimation[12].
Key methods and data collection
All study subjects were included, and their average systolic blood pressure (SBP) and diastolic blood pressure (DBP) across follow-up visits were used in the subsequent analyses. The main outcomes of interest were hypertension (HTN) incidence and the corresponding SBP and DBP in three age groups (children: 1-9 years, adolescents: 10-17 years, and adults: ≥18 years). In adults, HTN was defined as SBP ≥140 mmHg, DBP ≥90 mmHg, or taking antihypertensive treatment[13]. We imputed missing values for BMI and waist circumference (WC) using the Expectation-Maximization with Bootstrapping (EMB) approach implemented in the Amelia package in R, and then analyzed the quantitative and binary traits. We also used the regression-based two-level Haseman-Elston (HE) method in SAGE for linkage analysis of familial risk factors among index families (two or more hypertensive cases in at least two generations), including BMI, WC, and SNPs previously reported in GWAS of BP quantitative and binary traits in diverse populations.
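As a minimal sketch of the outcome definition above, the age strata and the adult HTN rule can be written as two small functions. The function names and the choice to express age in completed years are illustrative assumptions; they are not part of the original analysis pipeline, which was carried out in R and SAGE.

```python
def age_group(age_years: int) -> str:
    """Return the age stratum used in the text (age in completed years)."""
    if age_years < 1:
        raise ValueError("the strata in the text start at 1 year")
    if age_years <= 9:
        return "child"        # 1-9 years
    if age_years <= 17:
        return "adolescent"   # 10-17 years
    return "adult"            # >= 18 years


def is_hypertensive_adult(sbp: float, dbp: float, on_treatment: bool) -> bool:
    """Adult HTN rule from the text: SBP >= 140 mmHg, DBP >= 90 mmHg,
    or taking antihypertensive treatment."""
    return sbp >= 140 or dbp >= 90 or on_treatment


# Illustrative values only.
print(age_group(35), is_hypertensive_adult(sbp=138, dbp=92, on_treatment=False))
```

The rule is applied to adults only, since the text gives the numeric thresholds for that stratum alone.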
Main results
Our findings in Iranian families showed that SBP, DBP, BMI, and WC were highly correlated in the mother-offspring interclass and the sister-sister intraclass, with heritability of 30% and 25% for DBP and SBP, respectively, during 15 years of follow-up. Among index families, members with higher familial BMI or WC had a significantly increased risk of hypertension. We also observed consistent, strong signals in the AGT gene linked with SBP and DBP, in the NLGN1 gene linked with SBP and HTN, and epistasis between the TNXB gene and known genetic variants linked with all BP traits. In the GWAS analysis, we identified consistent signals in the ZBED9 gene associated with HTN at the borderline genome-wide threshold after adjusting for different environmental predictors. The ZBED9 finding was confirmed for all BP traits by linkage analysis in an independent sample. The single-locus analysis identified two missense variants, in ZBED9 (rs450630) and AGT (rs4762), associated with hypertension. The G allele of rs450630 showed an antagonistic effect on hypertension; interestingly, further IGENT analysis revealed significant epistasis effects for different combinations of the ZBED9, AGT, and TNXB loci[14].
2. Diabetes
Overall aim and focus areas
Type 2 diabetes (T2D) is one of the major life-threatening diseases, contributing to increased mortality and the development of cardiovascular disease (CVD) worldwide[15]. Our main aim was to characterize the genetic architecture of T2D in the Iranian population.
Key methods and data collection
As an initial step toward characterizing the genetic architecture of T2D in Iranians, we assessed segregation, familial aggregation, and family-based heritability of T2D among TCGS participants. We also detected T2D-associated SNPs significantly enriched in the Iranian population, calculated the weighted PRS for TCGS participants, and tested its association with lifetime T2D incidence. For the restricted PRS, we selected associated SNPs at the genome-wide p-value threshold of 5e-8.
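A weighted PRS is conventionally the sum of risk-allele dosages weighted by per-SNP effect sizes, PRS_i = Σ_j β_j × dosage_ij. The sketch below assumes that convention, together with the 5e-8 selection threshold quoted above. The SNP identifiers, effect sizes, and field names are hypothetical placeholders, not values from the TCGS analysis.

```python
from typing import Dict, List

GENOME_WIDE_P = 5e-8  # threshold quoted in the text for the restricted PRS


def select_snps(summary_stats: List[dict],
                p_threshold: float = GENOME_WIDE_P) -> Dict[str, float]:
    """Keep SNPs whose p-value is below the threshold; return {snp_id: beta}."""
    return {row["snp"]: row["beta"]
            for row in summary_stats if row["p"] < p_threshold}


def weighted_prs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of beta * risk-allele dosage over SNPs present in both inputs."""
    return sum(beta * dosages[snp]
               for snp, beta in weights.items() if snp in dosages)


# Hypothetical summary statistics (illustrative values only).
stats = [
    {"snp": "snpA", "beta": 0.20, "p": 1e-10},
    {"snp": "snpB", "beta": -0.10, "p": 3e-9},
    {"snp": "snpC", "beta": 0.50, "p": 1e-4},  # fails the 5e-8 threshold
]
weights = select_snps(stats)
score = weighted_prs({"snpA": 2, "snpB": 1, "snpC": 2}, weights)
print(sorted(weights), round(score, 3))
```

SNPs missing from an individual's genotype data are skipped here; imputing them or dropping the individual are alternative choices that this sketch does not cover.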
Main results
As the first study on the Iranian population, Akbarzadeh et al. (2019) exploited the TCGS family structure to estimate familial aggregation, heritability, and mode of inheritance of T2D[16]. A total of 2594 constituent pedigrees were constructed from 13741 individuals aged >20 years (mean±SD: 39.71 ± 16.56 years). T2D familial aggregation was significant (p<0.05), and the family-based heritability estimate indicated that 65% of the variation in T2D development and expression among individuals was attributable to genetic factors (S.E.=0.034). Within first-degree relatives (parent-offspring and siblings), the parental effect on risk was higher than the sibling effect (OR=4.11 vs. OR=1.65). A family history of T2D among first-degree relatives had a stronger effect than one among second-degree relatives (OR=3.84 vs. OR=0.59). Complex segregation analysis revealed that the polygenic model described the mode of inheritance of T2D among the TLGS participants well. As a first step toward predicting T2D development from a person-specific genetic profile, Moazzam-Jazi et al. (2020) identified multiple T2D-associated SNPs significantly enriched in the TCGS cohort compared to the global population[17]. This enrichment can partly account for the differences in drug response and subsequent treatment efficiency among cases with diverse ancestries. They also assessed the cumulative effect of the enriched risk SNPs by computing the PRS for adult participants (≥20 years) and found a significant association between the PRS and T2D incidence in the TCGS cohort. Hence, the high genetic burden of T2D in the Iranian population can contribute to the higher prevalence of the disease in this population. In line with this finding, they demonstrated a higher hazard of T2D development in genetically high-risk individuals than in genetically low-risk individuals in a model adjusted for age, sex, BMI, and other biochemical T2D risk factors[17].
3. Lipid profile
Overall aim and focus areas
As the lipid profile is a main contributing factor for cardiometabolic disease development, we performed candidate gene analyses to find lipid trait-associated variants among Iranians. In addition, we have been conducting studies assessing the power of genomic prediction for lipid profile traits.
Key methods and data collection
In the Iranian population, a nested case-control study was conducted to assess the association of two SNPs with coronary heart disease (CHD) incidence. In addition, we proposed a strategy based on 10-fold cross-validation with 10 repeats, in which both whole-genome resequencing (WGR) and GWAS are employed to find the optimal number of SNPs that contribute most to explaining genomic phenotypic variation and to make the genomic relationship matrix (GRM) computationally efficient in gBLUP[18]. Furthermore, we tested the strategy on lipid traits, including high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), triglyceride (TG), and cholesterol (CHOL), in TCGS participants.
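The resampling scheme behind a 10-fold, 10-repeat cross-validation can be sketched as follows. This shows only how the train and test index sets are generated; the SNP selection and gBLUP fitting that the strategy evaluates on each split are not shown, and the function name, seed, and sample size are illustrative assumptions.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, n_folds: int = 10, n_repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_indices, test_indices) for every split."""
    if n_folds > n_samples:
        raise ValueError("n_folds must not exceed n_samples")
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for each repeat
        for fold in range(n_folds):
            test = indices[fold::n_folds]  # every n_folds-th shuffled sample
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield repeat, fold, train, test


# 10 repeats x 10 folds gives 100 train/test splits.
n_splits = sum(1 for _ in repeated_kfold(n_samples=50))
print(n_splits)
```

Prediction accuracy would then be averaged over all splits for each candidate SNP subset, which is what makes the subsets comparable.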
Main results
Our findings demonstrated that rs2048327-G (SLC22A3) and rs17465637-C (MIA3) could significantly increase the risk of CHD development about twofold, in males only and in females only, respectively. Also, in male carriers of the risk allele (G) of rs2048327, the HDL level can significantly predispose them to develop coronary heart disease in the future[19]. In another association study, we found a significant association between the presence of risk alleles of rs7865618 and coronary heart disease (CHD) development in the TCGS population (p=0.03, OR=1.73, 95% CI: 1.04-2.88)[20]. To determine whether the cholesteryl ester transfer protein (CETP) gene polymorphisms rs5882 and rs3764261 influence the association between diet and changes in serum lipid profiles, a total of 4700 individuals aged ≥18 years were selected among the TCGS participants, and their changes in serum lipid profiles were surveyed after 3.6 years of follow-up. Mean changes in total CHOL decreased in the higher quartiles of fish intake in carriers of rs3764261-A compared to the CC genotype. TG levels showed ascending trends across quartiles of total fat, monounsaturated fat, and saturated fat consumption in carriers of rs5882-G compared to the AA genotype. There was also a declining trend in mean changes in TG concentrations across quartiles of carbohydrate intake in carriers of rs5882-G compared to the AA genotype[21].
In another study aimed at detecting the informative SNPs that can explain the genotypic heritability of lipid traits, Akbarzadeh et al. found that the highest prediction accuracy was achieved when all SNPs were considered for each trait. In contrast, including only the subsets of associated SNPs obtained from previous GWAS produced the lowest prediction accuracy for each trait. However, the subset of SNPs called "truly influential SNPs" captured a marked share of the genotypic variance[18].
Additionally, Sung's two-step method[22] was used to identify pleiotropic genetic variants significantly associated with the longitudinal data of HDL-C, LDL-C, CHOL, and TG. First, a three-level generalized linear mixed model (GLMM) was fitted for each longitudinal trait as the response variable. Second, a simultaneous genetic association test was conducted for each SNP by the generalized quasi-likelihood scoring method (GQLSM). Twenty variants belonging to the AC009035.1, SLC12A3, CETP, NLRC5, ESRP2, and C16orf95 genes were strongly associated (p-value < 6.6 × 10⁻⁵) with HDL-C, cholesterol, and triglycerides[23].
4. Obesity
Overall aim and focus areas
The genetic factors involved in obesity incidence have not been studied in the Iranian population, which could be a substantial limitation for personalized medicine in the future. Therefore, to fill this gap, we aimed to characterize the genetic variants associated with obesity and obesity-related traits, including WC, waist-to-hip ratio (WHR), TG, TC, LDL-C, and HDL, and to examine their aggregated effects on obesity incidence among Iranians.
Key methods and data collection
Following data quality control, we used various regression tests embedded. All models were adjusted for the relevant covariates, such as age and sex. The genetic risk score (GRS) was calculated using the weighted method. Furthermore, the false discovery rate (FDR) at the 5% significance level was considered for correcting multiple testing.
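The text does not name the FDR procedure that was used. Assuming the widely used Benjamini-Hochberg step-up procedure, a correction at the 5% level can be sketched as below; the function name and the p-values are illustrative only.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject/keep flag per test, in the input order.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    with p_(k) <= k / m * alpha, then reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four association tests.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

In this example only the first test survives: 0.03 and 0.04 exceed their rank-specific thresholds of 0.025 and 0.0375.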
Main results
As the first comprehensive study on Iranian pedigrees, we conducted a family-based joint linkage and linkage disequilibrium analysis of 3109 pedigrees. We found that RPGRIP1L is the key gene within the 16q12.2 region whose polymorphisms could be associated with obesity risk factors among TCGS participants[24]. Moreover, different SNP clusters composed of rare and common SNPs within the 16q12.2 region significantly increased BMI among Iranians. They were randomly distributed across the region, with a higher density around the fat mass and obesity-associated (FTO), AKTIP, and MMP2 genes[25].
In another study, we found that nine correlated SNPs upstream of the PPARG gene are significantly involved in long-term and persistent obesity[26]. Four SNPs in the MC4R gene were also significantly associated with the percentage of excess weight loss (EWL%) and excess BMI loss (EBMIL%), especially 12 months after bariatric surgery[27]. Furthermore, rs13107325 was significantly associated with an increased likelihood of persistent metabolically healthy obesity in postmenopausal women[28].
FTO is regarded as one of the central genes involved in obesity and its corresponding traits. One study indicated that some FTO variants (rs1421085, rs1558902, rs1121980, and rs8050136) were significantly associated with the metabolically unhealthy obesity (MUO) phenotype even after adjusting for lipid profile, whereas no significant association was detected between those SNPs and metabolically healthy obesity[29]. Another study was designed to investigate the interaction between dietary patterns and FTO polymorphisms regarding changes in BMI and WC over 3.6 years of follow-up[30]. Six common SNPs (rs1421085, rs1121980, rs17817449, rs8050136, rs9939973, and rs3751812) within the FTO gene region were chosen. BMI was nearly 2-fold higher in individuals who carried the risk alleles of rs1121980, rs1421085, rs8050136, rs17817449, and rs3751812 in the higher quartile of the Western dietary pattern (WDP) score.
BMI and WC increased progressively in the high GRS group with increasing quartiles of the WDP score[29]. Therefore, it can be inferred that adults with a higher genetic predisposition to obesity are more susceptible to the harmful effects of adherence to the WDP, which emphasizes the need to reduce the consumption of unhealthy foods to prevent obesity. Moreover, WC increased with increasing WDP score in carriers of the risk alleles of rs1121980 and rs3751812 but not in individuals without any risk alleles. A higher intake of trans-fatty acids (TFAs) in adults carrying the FTO rs8050136 polymorphism could significantly enhance the changes in BMI and WC during an average of 3.6 years of follow-up[31]. However, there was no significant interaction between the combined FTO variants (rs1121980, rs1421085, and rs8050136) and dietary diversity score in general obesity, implying that dietary diversity patterns may play a mediatory role in the presentation of obesity-related factors[32].
A healthy dietary pattern could modify the effect of MC4R rs17782313 on general obesity. As a result of this interaction, individuals with the risk allele of rs17782313 and a higher healthy dietary pattern score have a lower risk of prevalent obesity than those without the risk allele[33]. Moreover, Moazzam-Jazi et al. recently found that eight SNPs in or near the MC4R gene are significantly associated with increased BMI, WC, and WHR over a lifetime. Interestingly, they showed that the aggregated effect of these SNPs significantly influences increased BMI and WC only in early adulthood, not during the middle or late adulthood stages. Therefore, the effect of the MC4R risk SNPs is not constant during the lifetime[34].
Furthermore, we reported two rare signals that were strongly associated with total hip replacement (THR): a missense variant, c.1141G>C (p.Asp369His), in the COMP gene and a frameshift mutation, rs532464664 (p.Val330Glyfs*106), in the CHADL gene. Moreover, c.1141G>C heterozygotes and individuals homozygous for rs532464664 had their hip replacement operation 13.5 years and 4.9 years earlier than others, respectively. It was also shown that the full-length CHADL transcript is upregulated in cartilage and that the premature stop codon introduced by the CHADL frameshift mutation results in nonsense-mediated decay of the mutant transcripts[35].
5. Metabolic syndrome
Overall aim and focus areas
Metabolic syndrome (MetS) is a multifactorial disease characterized by metabolic disorders such as abdominal obesity, dyslipidemia, hyperglycemia, and HTN. Environmental effects (e.g., inappropriate diet and physical inactivity) play the most important role in the development of MetS[36,37]. Furthermore, familial aggregation and heritability studies suggest that genetic variants mainly contribute to the etiology of this syndrome[38,39]. Accordingly, genetic variability at several loci was shown to be associated with an increased risk of this syndrome[40]. Investigating the modifying effects of genetic variants and dietary determinants associated with the risk of MetS is a novel approach to preventing and treating MetS. Therefore, we studied the risk of MetS in two areas: genetic variations with the related advanced statistical models, and gene-nutrition interactions relevant to the risk of MetS.
Key methods and data collection
In the first study, a retrospective cohort study was performed among 5666 participants of the TCGS. To investigate the association of MetS and its components with glucokinase regulator (GCKR) polymorphisms (rs780093, rs780094, and rs1260326), linear and logistic regression analyses were used in an additive genetic model. Moreover, Cox regression analysis was carried out to show the variants' association with the incidence of MetS[41]. In a candidate gene study, to find the optimal prediction model(s) for MetS, GCKR gene variants, along with clinical and demographic information, were used on 4756 eligible TCGS participants. Then, predictive models were compared using logistic regression (LR), Random Forest (RF), decision tree (DT), support vector machine (SVM), and discriminant analyses[42].
Furthermore, following the previous study to find the optimal prediction statistical models for this disease, the association of MetS and BUD13, ZPR1, and APOA5 genes was assessed with 18 SNPs in 5421 MetS-affected and non‑affected TCGS participants. In this cross-sectional study, two models were used to analyze the data. The first model investigated the association between variants and MetS, while the second one (HTG-MetS) evaluated variants' associations with MetS patients with high plasma TG levels. Besides, to make SNP sets from correlated SNPs, four-gamete rules were used. To estimate the association between SNP sets and MetS, the kernel machine regression models and single SNP regression were used[43].
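The four-gamete rule treats two biallelic SNPs as belonging to the same set when fewer than all four possible two-locus haplotypes (00, 01, 10, 11) are observed, since observing all four suggests historical recombination between them. The sketch below assumes phased haplotypes encoded as 0/1 strings and a simple greedy grouping of adjacent SNPs, with no minimum frequency for the fourth gamete. These are illustrative assumptions, and the software actually used for the study is not stated here.

```python
from typing import List, Sequence


def four_gametes_observed(haplotypes: Sequence[str], i: int, j: int) -> bool:
    """True if all four allele combinations occur at SNP positions i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def group_snps(haplotypes: Sequence[str]) -> List[List[int]]:
    """Greedily extend the current SNP set while the next SNP shows fewer
    than four gametes with every SNP already in the set."""
    n_snps = len(haplotypes[0]) if haplotypes else 0
    sets: List[List[int]] = []
    for snp in range(n_snps):
        current = sets[-1] if sets else None
        if current is not None and not any(
            four_gametes_observed(haplotypes, k, snp) for k in current
        ):
            current.append(snp)
        else:
            sets.append([snp])  # start a new set at this SNP
    return sets


# Hypothetical phased haplotypes over three SNPs.
haps = ["000", "110", "001", "111"]
snp_sets = group_snps(haps)
print(snp_sets)
```

Here SNPs 0 and 1 show only two gametes and are grouped, while SNP 2 shows all four gametes with SNP 0 and starts a new set.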
Gene-diet interactions have also been investigated in TCGS participants. Two studies examined the risk of MetS in relation to dietary intakes and CETP gene polymorphisms (rs5882 and rs3764261) among 441 cases and 844 matched controls, and TCF7L2 gene variants (rs7903146 and rs12255372) among 1423 individuals[44,45].
Main results
The findings indicated an association between functional GCKR variants and higher TG and lower fasting blood sugar (FBS) levels. Moreover, the adjusted Cox regression model revealed that TT carriers of rs780094, rs780093, and rs1260326 had a 20%, 23%, and 21% higher risk of MetS incidence, respectively[41]. The following study reported that the logistic regression model showed a significant association of MetS with age, gender, schooling years, BMI, physical activity, rs780094, and rs780093. Furthermore, the Random Forest analysis showed that BMI, physical activity, and age were the most influential model features. Besides, based on decision tree analysis, a person with BMI<24 and physical activity<8.8 had a 4% chance of MetS development[42]. In another study, the kernel machine showed that two of the three sets of correlated SNPs had a significant joint effect in both models. Moreover, single SNP regression analysis indicated that although the ORs of both models were the same, the p-values in the HTG-MetS model were marginally more significant. In addition, the highest ORs in the HTG-MetS model were for the G allele of rs2266788 (MetS: OR = 1.3, HTG-MetS: OR = 1.4) and the T allele of rs651821 (MetS: OR = 1.3, HTG-MetS: OR = 1.4)[43].
The gene-diet interaction analysis showed no interaction between rs5882 of the CETP gene and dietary macronutrient intakes for MetS risk. However, the first quartile of monounsaturated fatty acid (MUFA) and total fat consumption among G allele carriers was associated with a lower risk of low HDL-C. Moreover, a higher quartile of trans-fatty acid intake among these allele carriers was associated with the risk of high BP[44]. We also observed that the highest tertile of nut consumption was associated with a 34% reduction in MetS risk among T allele carriers of rs12255372[45].
6. Others
6.1 Runs of homozygosity (ROH)
Overall aim and focus areas
Consanguineous marriage has long been a culturally preferred form of matrimony. This kind of inbreeding can increase the inbreeding coefficient and the extent of runs of homozygosity (ROH). Inbreeding lowers fitness-related characteristics in humans and results in inbreeding depression. Major abnormalities are more frequent in inbred (consanguineous) families than in outbred ones. These abnormalities include mutant phenotypes and severe genetic diseases that are lethal in early life. Hence, it is essential to understand the genetic basis of these effects. The initial aim was to estimate the effect of the genomic fraction in ROH (FROH) on a number of quantitative and binary traits in TCGS participants, along with 118 other studies, and to compare the ROH pattern of Iranian ethnicity with others[46].
Key methods and data collection
TCGS was one cohort among 119 independent genetic epidemiological study cohorts that contributed to the ROHgen consortium. For this analysis, 11,760 participants were included with the genotype information of 675,088 SNPs. These participants were classified into Western Asian/Persian groups, excluding ethnic outliers, duplicates, gender mismatch, and unresolvable pedigree mismatch. Runs of homozygosity (ROH) of >1.5 Mb in length were identified for 18 traits in ten categorized groups (Table 7 S).
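FROH is conventionally the fraction of the autosomal genome covered by ROH above a length threshold, here 1.5 Mb. A minimal sketch under that definition follows. The segment coordinates and the autosome length passed in the example are hypothetical, and in practice ROH segments are called by dedicated software from the genotype data.

```python
from typing import Iterable, Tuple

MIN_ROH_BP = 1_500_000  # the 1.5 Mb length threshold quoted in the text


def froh(roh_segments_bp: Iterable[Tuple[int, int]],
         autosome_length_bp: int,
         min_length_bp: int = MIN_ROH_BP) -> float:
    """Fraction of the autosome covered by ROH longer than min_length_bp.

    roh_segments_bp holds (start, end) positions in base pairs for one person.
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    covered = sum(end - start
                  for start, end in roh_segments_bp
                  if end - start > min_length_bp)
    return covered / autosome_length_bp


# Hypothetical segments: 2 Mb, 1 Mb (below threshold, ignored), and 3 Mb.
segments = [(0, 2_000_000), (5_000_000, 6_000_000), (10_000_000, 13_000_000)]
value = froh(segments, autosome_length_bp=100_000_000)
print(value)
```

The autosome length is passed explicitly because it depends on the genome build and on the regions covered by the genotyping array.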
Main results
The mean FROH>1.5Mb for TCGS participants was 0.017 (SE=0.026). We calculated FIS, which measures inbreeding as reflected by non-random mating in the most recent generation. The correlations of FROH with FSNP, FGRM, and FSNP_outsideROH were 0.98, 0.98, and 0.091, respectively. In this consortium data analysis, the authors concluded that the TCGS cohort has a high consanguinity rate compared with other studies.
6.2 COVID-19
Overall aim and focus areas
Our main goal was to determine the molecular mechanism of SARS-CoV-2 pathogenesis from different genetic aspects of humans and the virus. Thus, we designed three studies: to evaluate the role of genetic polymorphisms in the ACE2 gene, which codes for one of the main SARS-CoV-2 receptors; to identify SARS-CoV-2-responsive long non-coding RNAs (lncRNAs); and to simulate the critical molecular interactions between coronaviruses and the human genome.
Key methods and data collection
In the first study, all genetic polymorphisms of the ACE2 and TMPRSS2 genes were identified in the TCGS participants, and their effects on the virus affinity for the corresponding receptor were evaluated through structural bioinformatics simulation methods. The second study investigated the interplay between human long non-coding RNAs and the SARS-CoV-2 genome using publicly available RNA sequencing data and different in silico approaches. Moreover, we computationally identified the regions of the SARS-CoV-2 genome that physically interact with the infection-responsive lncRNAs. In the third study, a high-throughput computational approach was applied to assess the probability of host RNA-viral protein and viral RNA-host protein interactions.
Main results
We detected 570 genetic variations, including SNPs and INDELs, near or in the ACE2 gene among TCGS participants. Interestingly, two observed missense variants, K26R and S331F, of which only the first was previously reported, can reduce the receptor affinity for the viral spike protein. Moreover, we demonstrated important details of the ACE2-Spike and ACE2-TMPRSS2 interactions, especially the critical role of Arg652 of ACE2 for the protease function of TMPRSS2[47]. Through the transcriptome analysis, we found that more than half of the interactions between lncRNAs and protein-coding genes (PCGs) in bronchoalveolar lavage fluid samples were established by three trans-acting lncRNAs (HOTAIRM1, PVT1, and AL392172.1), which also exhibited a high affinity for binding to the SARS-CoV-2 genome, suggesting a major regulatory role of these lncRNAs during SARS-CoV-2 infection. Besides, the lncRNAs MALAT1 and NEAT1 are possibly involved in the development of inflammation in SARS-CoV-2-infected cells. We also found that the 5′ part of the SARS-CoV-2 genome can interact with many human lncRNAs, in contrast to the 3′ part[48]. Finally, the RNA-protein interaction study suggests that evolution tends to conserve key viral proteins involved in viral genome replication and transcription and restricts their interaction ability, so that undesired interactions do not perturb the functions of these proteins. In contrast, the hypermutation rate of non-structural protein 3 (NSP3) endows it with an affinity to interact with diverse host cell RNAs[49].
Main strengths and weaknesses
The longitudinal tracking of the TCGS participants since the early decades has not only enabled the identification of risk factors for complex cardiovascular outcomes but has also shed light on the natural progression of risk factors over time. This study has enhanced our understanding of familial and genetic determinants of cardiometabolic risk factors across multiple generations and over time. Besides, the measurements of the TCGS project were performed by trained technicians and not self-reported. Another strength of the present study is the collection of phenotypes from different organ systems, in all ethnicities and all age groups with a maximum of 22 years of follow-up, coded based on the latest version of the ICD. Thus, there is an opportunity to validate the self-reports via patient visits in the clinic in the future and to perform further examinations to obtain more precise measurements. Nonetheless, our work has some limitations. First, some genotyped participants do not have sufficient information about CRF, or there might be differences in phenotype measurement and definition between parents' study; hence, some information will be lost during genetic analysis. Second, due to the small sample size of TCGS compared to the Iranian population and possible ethnic differences, the findings may not be generalizable to all Iranians, indicating the need for national and international replication. The last limitation is the lack of diversity in the clinical phenotypes, together with an insufficient sample size, which reduces the possibility of examining and evaluating all disease categories.