Ethics statement
All research conducted as part of this study was approved by the Research Ethics Committee for Wales and consistent with regulatory and ethical guidelines.
Schizophrenia de novo data
De novo variants from 3,444 schizophrenia proband-parent trios (2,121 male and 1,323 female probands) were obtained from 11 published studies (Supplementary Table S8)6–8, 29,31–37. The probands were ascertained from psychiatric wards or outpatient clinics, and all had received a DSM-IV (Diagnostic and Statistical Manual of Mental Disorders; fourth edition) or ICD-10 (International Statistical Classification of Diseases and Related Health Problems; 10th revision) research diagnosis of schizophrenia or schizoaffective disorder, apart from 5 probands who had a diagnosis of non-organic psychosis (details in Supplementary Table S8).
De novo variants were re-annotated using Ensembl Variant Effect Predictor (version 96)38. Protein-truncating variants (PTVs) included stop-gain, frameshift, or splice donor/acceptor variants. Missense variants were annotated with their “Missense badness, PolyPhen-2, constraint” (MPC) score, a pathogenicity metric that combines predictions of variant deleteriousness with measures of regional missense constraint21. We prioritised missense variants with MPC scores ≥ 2 in our analyses, as this class of variant has been shown to be enriched in ASD cases compared with controls20.
Genic pleiotropy
Neurodevelopmental disorder gene sets
NDD-associated genes were identified from the Deciphering Developmental Disorders study13. In that study, 180 and 156 genes were, respectively, associated with de novo PTV and missense variants at exome-wide significance (P value < 2.5 × 10⁻⁶). Fifty-three genes were independently associated at this threshold with both PTVs and missense variants. We stratified the NDD-associated genes into 3 independent groups, PTV specific (127 genes), missense specific (103 genes) and PTV + missense (53 genes), and tested each group for enrichment for de novo variants in the schizophrenia probands. The genes included in these sets are provided in Supplementary Table S9. We did not include ASD-associated genes in these sets as independent PTV and missense P values were not reported in the largest published ASD study20.
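The stratification into three independent groups amounts to simple set operations on the two exome-wide significant gene lists. The sketch below is illustrative only: the function name and the gene identifiers are placeholders, not the genes in Supplementary Table S9.

```python
def stratify_gene_sets(ptv_genes, missense_genes):
    """Split two significance-defined gene lists into three disjoint groups."""
    ptv_genes, missense_genes = set(ptv_genes), set(missense_genes)
    return {
        "ptv_specific": ptv_genes - missense_genes,
        "missense_specific": missense_genes - ptv_genes,
        "ptv_and_missense": ptv_genes & missense_genes,
    }


# Placeholder identifiers, not real results.
groups = stratify_gene_sets(
    ptv_genes=["GENE_A", "GENE_B", "GENE_C"],
    missense_genes=["GENE_C", "GENE_D"],
)
```

Because the three groups are disjoint, their sizes sum to the number of genes associated with either variant class.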
Statistics
For the 3,444 schizophrenia trios, we used published gene mutation rates to estimate the number of de novo variants expected to occur under the null in the NDD gene sets39,40. Where possible, gene mutation rates were adjusted for sequencing coverage; the use of unadjusted per-gene mutation rates would overestimate the expected number of de novo variants in these trios, and produce more conservative enrichment results (see ref. 8 for further details). A two-sample Poisson rate ratio test was used to compare the enrichment of de novo variants in NDD genes, relative to the expected number, with the enrichment observed for all genes outside of the NDD gene set relative to the expected number, thereby controlling for the minimal elevation in the background schizophrenia de novo rates. Gene set enrichment tests were conducted for two mutation classes: PTVs and missense variants with MPC scores ≥ 2.
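As a minimal sketch of a two-sample Poisson rate ratio test, the comparison can be framed as a conditional binomial test: given the total number of observed variants, the count falling in the gene set is binomial under the null of equal enrichment. The function names, the one-sided alternative and the counts below are assumptions made for illustration; they are not taken from the study.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_rate_ratio_test(obs_set, exp_set, obs_bg, exp_bg):
    """Compare obs_set/exp_set (gene set) with obs_bg/exp_bg (all other genes).

    Conditional on the total count, obs_set ~ Binomial(n, p0) under the null,
    with p0 = exp_set / (exp_set + exp_bg). Returns the rate ratio and a
    one-sided P value for greater enrichment in the gene set (an assumption).
    """
    n = obs_set + obs_bg
    p0 = exp_set / (exp_set + exp_bg)
    rate_ratio = (obs_set / exp_set) / (obs_bg / exp_bg)
    p_upper = sum(binom_pmf(k, n, p0) for k in range(obs_set, n + 1))
    return rate_ratio, min(1.0, p_upper)


# Made-up counts for illustration only.
rr, p = poisson_rate_ratio_test(obs_set=20, exp_set=8.0, obs_bg=300, exp_bg=280.0)
```

Dividing by the background enrichment (obs_bg / exp_bg) is what controls for the overall elevation of the schizophrenia de novo rate.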
A Poisson regression was used to test for differences in the degree of enrichments for schizophrenia PTV and missense de novo variants in NDD gene sets. Here, a regression was first performed for each mutation class, with the number of observed de novo variants being the outcome variable, gene set membership (e.g. NDD associated or not) a categorical predictor, and the log of the expected number of de novo variants in each gene set category the offset. The log of the rate ratio for the enrichment of NDD-associated genes in PTVs relative to missense variants is then the difference in the log rate ratios for NDD genes in the two Poisson regressions (i.e. the difference between the regression coefficients for NDD gene membership). The variance of this difference is the sum of the variances of the regression coefficients, enabling confidence intervals to be generated. The square of the difference in regression coefficients divided by the sum of the variances can be compared to a χ² distribution with one degree of freedom to give a test of significant differences in the schizophrenia enrichment of PTV and missense de novo variants in NDD-associated genes. This approach allows for the background enrichment of schizophrenia de novo variants in non-NDD genes to differ between PTV and missense variants.
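The comparison of the two regression coefficients can be sketched as follows. With a single binary predictor and a log(expected) offset, the Poisson regression coefficient has a closed form (the log rate ratio), and its model-based variance is the sum of the reciprocals of the two observed counts; the sketch uses these closed forms instead of fitting a regression. Function names and counts are hypothetical.

```python
import math


def log_rr_and_var(obs_set, exp_set, obs_bg, exp_bg):
    """Coefficient and variance for gene-set membership in a Poisson
    regression with one binary predictor and a log(expected) offset."""
    log_rr = math.log((obs_set / exp_set) / (obs_bg / exp_bg))
    var = 1.0 / obs_set + 1.0 / obs_bg
    return log_rr, var


def compare_enrichment(ptv, missense):
    """Test whether PTV and missense enrichments differ.

    Each argument is a tuple (obs_set, exp_set, obs_bg, exp_bg).
    """
    b_ptv, v_ptv = log_rr_and_var(*ptv)
    b_mis, v_mis = log_rr_and_var(*missense)
    diff = b_ptv - b_mis
    var = v_ptv + v_mis
    chi2 = diff ** 2 / var
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    half_width = 1.959964 * math.sqrt(var)  # 95% confidence interval
    return {
        "ratio": math.exp(diff),
        "ci": (math.exp(diff - half_width), math.exp(diff + half_width)),
        "chi2": chi2,
        "p": p,
    }


# Made-up counts for illustration only.
result = compare_enrichment(ptv=(30, 8.0, 300, 280.0),
                            missense=(20, 16.0, 600, 580.0))
```

Because each mutation class is normalised by its own background before the coefficients are compared, a different baseline enrichment for PTVs and missense variants does not by itself produce a significant difference.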
We also used a Poisson regression model to evaluate the relationship between schizophrenia de novo variant enrichment and gene-level P values for PTV and missense variants in NDDs simultaneously. NDD gene P values were taken from ref. 13. Unlike the gene set analysis, which required an arbitrary significance threshold for a gene being considered NDD associated (i.e. P < 2.5 × 10⁻⁶), this Poisson regression was applied to all genes.
N SZ variants (per gene) ~ -log(DD PTV P value) + -log(DD missense P value), offset(log(N SZ expected variants))
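Written as a log-linear model, the formula above corresponds to the following, where O_g and E_g are the observed and expected numbers of schizophrenia de novo variants in gene g. The symbols are introduced here for illustration; the intercept β0 is an assumption (it is the default in most regression software), and the base of the logarithm applied to the P values is not stated in the text.

```latex
O_g \sim \mathrm{Poisson}(\mu_g), \qquad
\log \mu_g = \log E_g + \beta_0
  + \beta_{\mathrm{PTV}} \left( -\log P^{\mathrm{PTV}}_{\mathrm{DD},g} \right)
  + \beta_{\mathrm{mis}} \left( -\log P^{\mathrm{mis}}_{\mathrm{DD},g} \right)
```

Under this reading, a positive β_PTV or β_mis indicates that schizophrenia de novo variants become more enriched, relative to expectation, as the corresponding DD association becomes stronger.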
Allelic pleiotropy
Neurodevelopmental disorder variants
NDD variants were identified from de novo variants observed in the largest ASD and DD proband-parent sequencing studies (total NDD trios = 37,488; Table 5), which together reported a total of 48,155 single-nucleotide de novo variants, corresponding to 46,772 unique single-nucleotide variants (summarised in Table 5, full list of variants in Supplementary Table S10). We divided these variants into primary and negative control sets. The primary set contains variants with characteristics known to be associated with pathogenicity for NDDs13,41, namely PTVs in loss-of-function intolerant genes (genes with gnomAD pLi scores ≥ 0.9; ref. 22) and missense variants with MPC scores ≥ 2 (ref. 21). The negative control set contained all remaining variants (PTVs in genes with pLi scores < 0.9, missense variants with MPC scores < 2 and all synonymous variants), whose properties do not predict NDD pathogenicity. Under an allelic pleiotropy model, we predicted that schizophrenia de novo variants would be more enriched among the primary variant set than the negative control variant set.
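The split into primary and negative control sets can be expressed as a small classification rule, using the thresholds implied by the negative control definition (pLi < 0.9 and MPC < 2 fall in the negative control set). This is a sketch: the consequence labels, the function name and the treatment of missing scores as below threshold are assumptions, not details from the study.

```python
PLI_THRESHOLD = 0.9
MPC_THRESHOLD = 2.0


def classify_ndd_variant(consequence, pli=None, mpc=None):
    """Return 'primary' or 'negative_control' for one NDD de novo SNV.

    consequence: 'ptv', 'missense' or 'synonymous' (hypothetical labels).
    A missing score is treated as below the threshold (an assumption).
    """
    if consequence not in {"ptv", "missense", "synonymous"}:
        raise ValueError(f"unexpected consequence: {consequence!r}")
    if consequence == "ptv" and pli is not None and pli >= PLI_THRESHOLD:
        return "primary"
    if consequence == "missense" and mpc is not None and mpc >= MPC_THRESHOLD:
        return "primary"
    return "negative_control"


# Made-up annotations for illustration only.
examples = [
    classify_ndd_variant("ptv", pli=0.99),
    classify_ndd_variant("ptv", pli=0.10),
    classify_ndd_variant("missense", mpc=2.4),
    classify_ndd_variant("missense", mpc=0.7),
    classify_ndd_variant("synonymous"),
]
```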
| Phenotype | N trios | Total DNVs (PTVs, miss, syn) | Unique DNVs (PTVs, miss, syn) |
|---|---|---|---|
| Schizophrenia | 3,444 | 3,208 | 3,207 (186, 2,197, 824) |
| DD13 | 31,058 | 40,818 (3,638, 28,193, 8,987) | 39,560 (3,400, 27,211, 8,949) |
| ASD20 | 6,430 | 7,337 (516, 4,954, 1,867) | 7,306 (514, 4,934, 1,858) |
| NDD (ASD + DD) | 37,488 | 48,155 (4,154, 33,147, 10,854) | 46,772 (3,900, 32,076, 10,796) |
Table 5. Summary of single-nucleotide variants included in the allelic pleiotropy de novo analysis. The ‘Total DNVs’ column shows the total number of de novo missense, synonymous, stop-gain, splice-donor or splice-acceptor variants reported in the respective phenotype after excluding variants on the Y chromosome or in mitochondrial DNA. The ‘Unique DNVs’ column shows the number of de novo variants observed in the respective phenotype after excluding duplicate variants. DNV = de novo variant; PTV = protein-truncating variant; miss = missense variant; syn = synonymous variant.
Statistics
Tri-nucleotide mutation rates were used to estimate the expected per-generation mutation rates for NDD variants21. These mutation rates were then used to derive the number of NDD variants expected to occur de novo under the null hypothesis in the 3,444 schizophrenia trios. As mutation rates have not been empirically established for indels, only single-nucleotide variants were considered (Table 5).
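One way to turn per-variant mutation rates into an expected count is sketched below. It assumes autosomal sites and rates expressed per haploid genome per generation, so that each proband contributes two opportunities for a given de novo variant; the factor of two, the function name and the rates are illustrative assumptions, not values from the study.

```python
import math


def expected_de_novo_count(per_variant_rates, n_trios, chromosomes_per_proband=2):
    """Expected number of de novo observations of a variant set under the null.

    per_variant_rates: per-generation mutation rates for the specific
    single-nucleotide changes in the set (assumed per haploid genome).
    """
    return chromosomes_per_proband * n_trios * math.fsum(per_variant_rates)


# Hypothetical rates for three variants; 3,444 is the number of trios.
expected = expected_de_novo_count([1e-8, 2.5e-8, 5e-9], n_trios=3444)
```

Sites on the X chromosome would need a different multiplier that depends on proband sex, which this sketch does not handle.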
The numbers of schizophrenia de novo variants overlapping our primary and negative control variant sets were each compared with the number expected under the null using a two-tailed Poisson exact test. We also used a two-sample Poisson exact test to evaluate whether the enrichment of schizophrenia de novo variants in the primary variant set was greater than the schizophrenia background de novo rate of all PTVs in LoF intolerant genes and missense variants with MPC scores ≥ 2. Statistics were generated using R statistical software (version 3.4.3) and the poisson.test() function.
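For readers without R, a two-tailed one-sample Poisson exact test can be sketched in Python. The version below defines the two-sided P value as the total probability of all counts that are no more likely than the observed count, which is our understanding of how poisson.test() treats the two-sided case; treat that equivalence, the function names and the example numbers as assumptions.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability mass, computed in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_exact_test(observed, expected):
    """Return (rate_ratio, two_sided_p) for an observed count vs its expectation."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    # Small relative tolerance so that ties with the observed count are included.
    threshold = poisson_pmf(observed, expected) * (1.0 + 1e-7)
    p_value = 0.0
    k = 0
    while True:
        pk = poisson_pmf(k, expected)
        if pk <= threshold:
            p_value += pk
        # Past both the mean and the observed count the mass only shrinks,
        # so stop once further terms cannot change the sum.
        if k > max(observed, expected) and pk <= threshold * 1e-17:
            break
        k += 1
    return observed / expected, min(1.0, p_value)


# Made-up numbers for illustration only.
ratio, p = poisson_exact_test(observed=10, expected=3.0)
```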
NDD variants in our primary and negative control sets were further evaluated using a Swedish schizophrenia case-control exome sequencing data set, which consists of 4,079 cases and 5,712 controls9. Case-control exome sequencing data were analysed using Hail (https://github.com/hail-is/hail). To test for an excess burden of NDD variants in cases compared with controls, a one-tailed Firth’s penalized-likelihood logistic regression model was used, correcting for the first 10 principal components derived from the sequencing data, and for the exome-wide burden of synonymous variants, sequencing platform and sex. To focus the case-control analysis on ultra-rare alleles, which are more likely to be pathogenic, we excluded variants with an allele count > 5 in gnomAD22. Frameshift variants were included in the case-control analysis.