Study populations
Our study used part of the KoGES dataset obtained from the Korea Centers for Disease Control and Prevention. The largest cohort of KoGES is the health examination cohort (KoGES_HEXA), whose dataset comprises participants' medico-pharmacologic history, anthropometric traits, and blood biochemistry traits.[15] Briefly, KoGES_HEXA is a population-based prospective cohort of 173,357 urban Korean adults who underwent health examinations at medical centers and were recruited from the national health examinee registry. Participants were men and women aged 40-69 years from 14 major cities across Korea, recruited at baseline between 2004 and 2013. Of the participants in this city-based cohort, the 58,701 for whom genome-wide SNP genotype data were available formed the baseline population of this study. All participants voluntarily signed an informed consent form before the study, and the study protocol was approved by the Institutional Review Boards (IRBs) of the institutions that participated in KoGES. This study was performed in accordance with the Declaration of Helsinki and approved by the IRB of Theragen Etex (approval number: 700062-20190819-GP-006-02).
Measurement of anthropometric and laboratory data & Definition of lifestyle factors
Study participants completed a standardized medical history and lifestyle questionnaire and underwent a comprehensive health examination by trained medical staff according to a standard protocol. Smoking status was classified into three groups: ex-smokers (participants who had smoked more than 100 cigarettes in their lifetime but had quit before this study), current smokers (participants who were smoking at the time of the study), and non-smokers (all remaining participants). Drinking status (alcohol intake) was likewise classified into three groups: current drinkers, ex-drinkers, and non-drinkers. Regular physical activity was defined as regular participation in any sport to the point of sweating.
Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters (kg/m²). Systolic (SBP) and diastolic blood pressure (DBP) were measured twice with a standardized mercury sphygmomanometer (Baumanometer-Standby; W.A. Baum Co. Inc., New York, NY, USA). Venous blood samples were drawn after overnight fasting and collected in plain tubes. Biochemical parameters, including fasting glucose, hemoglobin A1c (HbA1c), total cholesterol, HDL cholesterol, and triglycerides (TG), were determined by enzymatic methods (ADVIA 1650, Siemens, Tarrytown, NY, USA). LDL cholesterol was calculated using the Friedewald equation (LDL cholesterol = total cholesterol − HDL cholesterol − [TG/5]).
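The two derived quantities above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual pipeline: the function names are hypothetical, lipid inputs are assumed to be in mg/dL, and no special handling is applied to very high TG values (the Friedewald equation is usually considered unreliable at TG of 400 mg/dL or above, and the text does not state how such values were treated).

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def friedewald_ldl(total_chol: float, hdl: float, tg: float) -> float:
    """LDL cholesterol (mg/dL) = total cholesterol - HDL cholesterol - TG/5.

    All inputs are assumed to be in mg/dL (an assumption of this sketch).
    """
    return total_chol - hdl - tg / 5.0


# Illustrative values only, not study data.
print(round(bmi(70.0, 1.75), 1))            # BMI for 70 kg at 1.75 m
print(friedewald_ldl(200.0, 50.0, 150.0))   # LDL for TC=200, HDL=50, TG=150
```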
Definition of study phenotypes
HTN was defined as systolic BP ≥ 140 mmHg or diastolic BP ≥ 90 mmHg on health examination, currently taking an anti-hypertensive drug, or diagnosed by a physician. DM was defined as fasting blood glucose ≥ 126 mg/dl, HbA1c ≥ 6.5% (48 mmol/mol), currently taking an anti-diabetic drug or insulin, or diagnosed by a physician.
DL was defined as diagnosis by a physician, current use of lipid-lowering medication, or meeting at least one of the following National Cholesterol Education Program Adult Treatment Panel III (NCEP-ATP III) criteria: (1) hypercholesterolemia (serum TC ≥ 240 mg/dl), (2) hypertriglyceridemia (serum TG ≥ 200 mg/dl), (3) hyper-LDL cholesterolemia (serum LDL-C ≥ 160 mg/dl), or (4) hypo-HDL cholesterolemia (serum HDL-C < 40 mg/dl).
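The three phenotype definitions are threshold rules and can be written as simple predicates. The sketch below is illustrative only: the function and argument names are hypothetical, and it assumes that meeting any single NCEP-ATP III lipid criterion is sufficient for DL.

```python
def has_htn(sbp, dbp, on_drug=False, physician_dx=False):
    """Hypertension: SBP >= 140 or DBP >= 90 mmHg, medication, or diagnosis."""
    return sbp >= 140 or dbp >= 90 or on_drug or physician_dx


def has_dm(fasting_glucose, hba1c, on_drug=False, physician_dx=False):
    """Diabetes: fasting glucose >= 126 mg/dl, HbA1c >= 6.5%, medication, or diagnosis."""
    return fasting_glucose >= 126 or hba1c >= 6.5 or on_drug or physician_dx


def has_dl(tc, tg, ldl, hdl, on_drug=False, physician_dx=False):
    """Dyslipidemia: any one lipid criterion (assumed), medication, or diagnosis."""
    lipid_criteria = (
        tc >= 240,   # hypercholesterolemia
        tg >= 200,   # hypertriglyceridemia
        ldl >= 160,  # hyper-LDL cholesterolemia
        hdl < 40,    # hypo-HDL cholesterolemia
    )
    return any(lipid_criteria) or on_drug or physician_dx


# Illustrative participant, not study data: normotensive, normoglycemic, low HDL-C.
print(has_htn(128, 82), has_dm(98, 5.4), has_dl(190, 140, 120, 38))
```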
Outcome measurements
We defined CAD as the participant-reported history of the diagnosis or treatment of angina pectoris or myocardial infarction. Ischemic stroke (IS) was defined in the same manner, in that it was based on the participant-reported history of the diagnosis or treatment of ischemic stroke. Cardio-cerebrovascular disease (CCD) was defined as the combination of CAD and IS per our study outcome definition.
Study design
This study investigated the genetic risk factors for CVDs in patients with metabolic diseases (HTN, DM, or DL). For this analysis, we applied the exclusion criteria illustrated schematically in Figure 1. From the baseline population, we first excluded participants with missing values for smoking, alcohol intake, exercise history, or body mass index (BMI) (n = 471). We then excluded participants with a history of malignancy or with no response regarding malignancy (n = 2,202). After these exclusions, 56,028 participants remained; the final sample for the present analysis comprised 16,313 participants with HTN, 5,394 participants with DM, and 20,788 participants with DL.
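The exclusion flow can be checked with simple arithmetic. The sketch below uses the counts reported in this paper (58,701 genotyped participants at baseline, then the two exclusion steps); the helper name is hypothetical.

```python
def apply_exclusions(start, excluded_counts):
    """Return the number remaining after each sequential exclusion step."""
    remaining = []
    current = start
    for n_excluded in excluded_counts:
        current -= n_excluded
        remaining.append(current)
    return remaining


baseline = 58_701                 # participants with genome-wide SNP data
steps = [471, 2_202]              # missing lifestyle/BMI data; malignancy or no response
print(apply_exclusions(baseline, steps))
```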
Genotyping and quality control procedures
The genotype data were kindly provided by the Center for Genome Science, Korea National Institute of Health. The genotype data were generated using the Korea Biobank Array (Affymetrix, Santa Clara, CA, USA) (Moon et al., 2019, PMID 30718733). The Korea Biobank Array results were filtered by quality control procedures: SNPs were retained if the call rate was higher than 97% and the minor allele frequency was higher than 1%, and SNPs with a Hardy-Weinberg equilibrium test p < 1 × 10⁻⁵ were excluded. After quality control, the genotypes were imputed using the 1000 Genomes Phase 1 and 2 Asian panel as the reference. The final GWAS dataset comprised 7,975,321 SNPs on chromosomes 1 to 22.
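As a rough illustration of the quality control thresholds, the predicate below keeps a SNP only if it passes all three filters. It is a sketch under stated assumptions, not the pipeline actually used: the function name is hypothetical, the call rate is assumed to be per SNP, and the Hardy-Weinberg threshold is read as "exclude SNPs with p < 1 × 10⁻⁵".

```python
MIN_CALL_RATE = 0.97   # call rate must be higher than 97%
MIN_MAF = 0.01         # minor allele frequency must be higher than 1%
HWE_P_CUTOFF = 1e-5    # SNPs with HWE p below this are excluded (assumed reading)


def passes_qc(call_rate: float, maf: float, hwe_p: float) -> bool:
    """Return True if a SNP meets all three quality control criteria."""
    return call_rate > MIN_CALL_RATE and maf > MIN_MAF and hwe_p >= HWE_P_CUTOFF


# Illustrative SNP summaries (call rate, MAF, HWE p), not study data.
candidates = [(0.99, 0.05, 0.5), (0.96, 0.20, 0.5), (0.99, 0.005, 0.5), (0.99, 0.05, 1e-6)]
print([passes_qc(*snp) for snp in candidates])
```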
Statistical analysis
All data are presented as the mean ± standard deviation (SD) or as number (%). For the discovery GWAS, the associations between individual SNP genotypes and the risks of CAD, IS, and CCD were tested under an additive model (per copy of the minor allele) using logistic regression in PLINK version 1.9,[16] with age, sex, BMI, exercise status, smoking status, alcohol intake, and the first two principal components (PC1 and PC2) as covariates. PC1 and PC2 were obtained from a principal component analysis conducted to reduce bias in the genomic data arising from regional differences in sample collection. From the top significant SNPs, we selected clusters of SNPs in high linkage disequilibrium (LD; r² > 0.8) in which no gap between adjacent SNPs exceeded 50 kb. Associations were considered significant at the genome-wide level (p < 5.00 × 10⁻⁸). Gene-region plots of the top SNP associations were generated with LocusZoom version 0.4.8.2.[17]
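The significance threshold and the 50 kb gap rule can be illustrated with a small grouping routine. This is a simplified sketch, not the procedure actually run: it applies only the genome-wide p-value cutoff and the distance criterion, omits the r² > 0.8 LD check (which would require genotype-level data), and uses hypothetical SNP identifiers and a hypothetical function name.

```python
GENOME_WIDE_P = 5e-8   # genome-wide significance threshold
MAX_GAP_BP = 50_000    # maximum allowed gap between adjacent SNPs in a cluster


def cluster_significant_snps(snps):
    """Group genome-wide significant SNPs into positional clusters.

    snps: iterable of (snp_id, chromosome, position_bp, p_value).
    Returns a list of clusters, each a list of SNP ids ordered by position.
    The LD (r^2) criterion is intentionally not applied in this sketch.
    """
    hits = sorted(
        (s for s in snps if s[3] < GENOME_WIDE_P),
        key=lambda s: (s[1], s[2]),
    )
    clusters = []
    previous = None
    for snp in hits:
        same_cluster = (
            previous is not None
            and snp[1] == previous[1]
            and snp[2] - previous[2] <= MAX_GAP_BP
        )
        if same_cluster:
            clusters[-1].append(snp[0])
        else:
            clusters.append([snp[0]])
        previous = snp
    return clusters


# Hypothetical association results, not study data.
results = [
    ("snpA", 1, 100_000, 1e-9),
    ("snpB", 1, 130_000, 3e-8),
    ("snpC", 1, 400_000, 2e-10),
    ("snpD", 1, 410_000, 1e-3),   # not genome-wide significant
    ("snpE", 2, 120_000, 4e-9),
]
print(cluster_significant_snps(results))
```

In this toy input, snpA and snpB fall within 50 kb of each other and form one cluster, snpD is dropped by the p-value filter, and snpC and snpE each form their own cluster.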