22q11.2DS cohort and phenotypic and demographic variables
The study cohort consisted of individuals aged ≥17 years with a typical 22q11.2 microdeletion (i.e., the common low copy repeat (LCR)22A–LCR22D extent, or the proximal nested LCR22A–LCR22B or LCR22A–LCR22C extents) ascertained through a specialized adult 22q11.2DS clinic in Toronto, Canada. Before filtering and application of exclusion criteria, there were 360 individuals; 259 met the criteria for inclusion in the primary genetic analyses (Table 1, Supplementary Figure 1). Typical 22q11.2 microdeletions were identified through standard clinical laboratory methods [13,14], and breakpoints were confirmed using genome sequencing data.
To be included in the study, participants had to have at least one height or BMI (height and weight) measurement obtained from clinical examination. For individuals with multiple height and weight measurements, the most recent measurement was used, along with the corresponding age at measurement. All but four individuals in this adult cohort were aged 18 years or older. For four individuals in the primary analyses, the most recent height and weight measurements were taken at age 17 years; none of these four individuals had short stature. Additionally, we included other phenotypic and demographic variables that may affect height and/or BMI, based on their inclusion in the phenotype-only studies of these traits [8,9]: sex, age, moderate-to-severe congenital heart disease (CHD), intellectual disability (ID), psychotic illness, and clinically documented ancestry verified using genetic principal component analysis (PCA) (Supplementary Figure 2). As in previous work [20], we defined "psychotic illness" as a diagnosis of schizophrenia or schizoaffective disorder.
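The measurement-selection rule above can be sketched as follows. This is a minimal Python illustration only (the analyses themselves were performed in R); the `Measurement` record, its field names, and the helpers `most_recent()` and `bmi()` are hypothetical, and the sketch assumes each record carries a height with an optional weight taken at the same examination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    """Hypothetical clinical record: one examination for one person."""
    person_id: str
    age_years: float
    height_cm: float
    weight_kg: Optional[float] = None  # weight may be missing; BMI is then undefined


def most_recent(records):
    """Keep, for each person, the record taken at the oldest age (first one wins on ties)."""
    latest = {}
    for r in records:
        best = latest.get(r.person_id)
        if best is None or r.age_years > best.age_years:
            latest[r.person_id] = r
    return latest


def bmi(r):
    """BMI in kg/m^2, or None when no weight was recorded."""
    if r.weight_kg is None:
        return None
    metres = r.height_cm / 100
    return r.weight_kg / metres ** 2


records = [
    Measurement("p1", 20, 170, 65),
    Measurement("p1", 25, 171, 70),
    Measurement("p2", 17, 160),
]
latest = most_recent(records)
print(latest["p1"].age_years, bmi(latest["p2"]))  # 25 None
```

Keying the selection on age at measurement keeps the age covariate aligned with the height or weight value that enters the models.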
For details on genome sequencing methods and variant annotation, quality control of common variants, and PCA for ancestry assignment, see Supplementary Methods.
Polygenic risk score regression analyses
We used previously published gold-standard PRSs for height [2] and BMI [21]. Variant positions and effect sizes were retrieved from the PGS Catalog [22] (height: PGS002804; BMI: PGS000027). After standard quality control of common variants, 1,004,205 variants were used in the height-PRS (91.4% of the 1,099,005 variants in PGS002804) and 2,062,833 variants in the BMI-PRS (98.2% of the 2,100,302 variants in PGS000027). The PRS for each individual in the cohort was calculated using PRSice-2 [23]. Both PRSs were derived from entirely European-ancestry cohorts, and when we tested them separately in the European and non-European subsets of our cohort, effect sizes were markedly smaller in the non-European subset (Supplementary Figure 3). We therefore restricted our primary analyses to individuals of European ancestry.
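As a rough illustration of what a PRS is, the sketch below computes a score as a weighted sum of effect-allele dosages over the variants shared between an individual's QC-passed genotypes and a scoring file, and reproduces the variant-retention percentages quoted above. It is not PRSice-2's implementation: PRSice-2 provides its own scoring options (for example, averaging rather than summing over alleles), and the dictionaries and helper names here are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Hypothetical helper: weighted sum of effect-allele dosages.

    dosages: variant ID -> effect-allele count (0, 1 or 2) for one person
    weights: variant ID -> per-allele effect size from the scoring file
    Only variants present in both mappings (i.e., passing QC) contribute.
    """
    shared = dosages.keys() & weights.keys()
    score = sum(weights[v] * dosages[v] for v in shared)
    return score, len(shared)


def percent_retained(n_used, n_in_score):
    """Share of scoring-file variants kept after QC, as a percentage."""
    return 100.0 * n_used / n_in_score


# Toy example: three variants in the score, one (rs3) lost during QC.
weights = {"rs1": 0.02, "rs2": -0.01, "rs3": 0.05}
dosages = {"rs1": 2, "rs2": 1}
print(polygenic_score(dosages, weights))  # score ≈ 0.03 from 2 shared variants
print(round(percent_retained(1_004_205, 1_099_005), 1))  # 91.4 (height, PGS002804)
print(round(percent_retained(2_062_833, 2_100_302), 1))  # 98.2 (BMI, PGS000027)
```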
We tested for associations between each PRS and the corresponding height or BMI measure using linear regression (lm() function in R) in 1) a univariable model and 2) a multivariable model adjusted for phenotypic/demographic variables, 22q11.2 deletion extent (LCR22A–D vs LCR22A–B/LCR22A–C), sequencing platform/batch (categorical variable with four levels: TCAG HiSeqX, TCAG NovaSeq6000, CIDR NovaSeq6000, and IBBC HiSeq2500 or HiSeqX), and the first four principal components (PCs) of ancestry:
1) BMI or sex-adjusted height ~ corresponding PRS
2) BMI or sex-adjusted height ~ corresponding PRS + sex^a,b + age + deletion extent^a + CHD^a + ID^a + psychotic illness^a + sequencing platform/batch^a + PC1 + PC2 + PC3 + PC4
^a Categorical variable.
^b Included only in the regression model for BMI.
To account for the large difference in height between the sexes, height was represented in the regression analyses as the difference between the individual's height and the mean height for the individual's sex (i.e., "sex-adjusted height"), an approach previously used in the study that generated the height-PRS [2]. Sex was therefore not included as a covariate in the multivariable regression model for height. Categorical variables were treated as factors (as.factor() in R), and all continuous variables were standardized using the scale() function in R to obtain standardized (beta) coefficients. For the sequencing platform/batch variable, CIDR NovaSeq6000 was used as the reference level in the regression model (and is thus not shown in Table 2). The increase in height or BMI per standard deviation (SD) increase in PRS was obtained from the "Estimate" coefficient ± standard error from lm() in R, in a model where only the PRS values were standardized and the raw values of sex-adjusted height (cm) or BMI (kg/m²) were used. The variance in height or BMI explained by the multivariable model was quantified by the multiple R² reported in R. The variance explained by the PRS alone within the multivariable model was taken as the difference in multiple R² (ΔR²) between the full multivariable model and the same model without the PRS. Interactions between sex and PRS for height and BMI were tested by adding a sex × PRS interaction term to the linear regression models.
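The quantities defined above (sex-adjusted height, standardization, multiple R² and ΔR²) can be sketched in plain Python. This is an illustrative re-derivation of what scale() and lm() report, not the authors' code; the function names are hypothetical, and categorical covariates are assumed to be already encoded as 0/1 dummy columns (one per non-reference level), mirroring what R does internally for factors.

```python
from statistics import mean, stdev


def sex_adjusted(heights, sexes):
    """Height minus the mean height of the individual's sex (in cm)."""
    groups = {}
    for h, s in zip(heights, sexes):
        groups.setdefault(s, []).append(h)
    sex_mean = {s: mean(v) for s, v in groups.items()}
    return [h - sex_mean[s] for h, s in zip(heights, sexes)]


def scale(xs):
    """Centre and divide by the sample SD (n - 1), as R's scale() does for one column."""
    m, sd = mean(xs), stdev(xs)
    return [(x - m) / sd for x in xs]


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_r2(y, predictors):
    """Multiple R^2 of an OLS fit of y on an intercept plus the predictor columns."""
    X = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = _solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def delta_r2(y, prs, covariates):
    """R^2 of the full model minus R^2 of the same model without the PRS column."""
    return multiple_r2(y, [prs] + covariates) - multiple_r2(y, covariates)
```

For example, `delta_r2(bmi_values, scale(prs_values), covariate_columns)` (all hypothetical names) would give ΔR² for BMI. A sex × PRS interaction corresponds to adding, alongside both main effects, one extra column whose entries are the product of the 0/1 sex indicator and the PRS.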
Risk stratification for short stature using polygenic risk score
To assess the ability of the height-PRS to stratify risk for short stature, we first divided our cohort into quintiles of height-PRS. Short stature was defined as height below the sex-specific third percentile of the World Health Organization growth curves at age 18 years (https://www.dietitians.ca/Advocacy/Interprofessional-Collaborations-(1)/WHO-Growth-Charts/WHO-Growth-Charts-Set-2). This corresponds to height cut-offs of <163 cm for males and <151 cm for females. Fisher's exact test was used to compare the proportion of individuals with short stature between the lowest and highest height-PRS quintiles.
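The short-stature definition, the quintile split, and the lowest vs highest quintile comparison can be sketched as below. The cut-offs are those stated above; the quintile rule (near-equal groups by rank, ties broken by input order) and the helper names are assumptions for illustration. The two-sided Fisher p-value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table, which is the convention used by R's fisher.test() for 2×2 tables.

```python
from math import comb

# Sex-specific short-stature cut-offs (cm) stated in the text.
SHORT_STATURE_CM = {"M": 163.0, "F": 151.0}


def is_short_stature(height_cm, sex):
    return height_cm < SHORT_STATURE_CM[sex]


def quintiles(values):
    """Assign 1..5 by rank into near-equal groups (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    q = [0] * n
    for rank, i in enumerate(order):
        q[i] = rank * 5 // n + 1
    return q


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Toy counts, not study data. Rows: lowest vs highest PRS quintile;
# columns: short stature yes / no.
print(fisher_exact_two_sided(3, 1, 1, 3))  # ≈ 0.4857 (= 34/70)
```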
Furthermore, we constructed receiver operating characteristic (ROC) curves from logistic regression models predicting short stature, to assess the sensitivity and specificity of these models across all classification thresholds rather than at an arbitrary cut-off. We compared three logistic regression models:
1) Covariate only: short stature* ~ sex* + age + deletion extent* + CHD* + ID* + psychotic illness*
2) PRS only: short stature* ~ height-PRS
3) PRS + covariates: short stature* ~ height-PRS + sex* + age + deletion extent* + CHD* + ID* + psychotic illness*
*binary variable
ROC curves were created using the R package "pROC". The areas under the curve (AUCs) of two ROC curves were compared using DeLong's test for two correlated ROC curves, and the optimal sensitivity and specificity of each ROC curve were determined using Youden's J statistic.
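Given predicted probabilities from any of the three logistic models (the model fitting itself is not shown), the AUC and the Youden-optimal operating point can be sketched as follows. The AUC is computed through its Mann-Whitney interpretation (the probability that a randomly chosen individual with short stature receives a higher predicted probability than one without, counting ties as one half), which equals the trapezoidal area under the empirical ROC curve. The helper names are hypothetical, and the thresholds here are the observed scores themselves, whereas pROC places thresholds between adjacent scores, so reported threshold values may differ. DeLong's test is not reimplemented here; in pROC it is available through roc.test().

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each distinct observed score.

    An individual is classified as short stature when score >= threshold.
    labels: 1 = short stature, 0 = not.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / n_pos, 1 - fp / n_neg))
    return points


def auc(scores, labels):
    """AUC as P(score_case > score_control), counting ties as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            wins += 1.0 if c > d else 0.5 if c == d else 0.0
    return wins / (len(cases) * len(controls))


def youden_best(scores, labels):
    """Operating point maximising Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


# Toy predicted probabilities of short stature, not study data.
probs = [0.1, 0.4, 0.35, 0.8]
short = [0, 0, 1, 1]
print(auc(probs, short))  # 0.75
print(youden_best(probs, short))  # (0.8, 0.5, 1.0); ties with 0.35, first maximum kept
```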
All statistical analyses were performed using R version 4.0.3. Statistical significance was defined as p < 0.05, and all tests were two-tailed. P-values were not adjusted for multiple testing.