Participants. We studied participants in the Simons Simplex Collection (SSC), a collection of 1,651 carefully ascertained families made up of both parents, one affected child, and one unaffected child, each with high quality (at least 30x read depth) whole genome sequence (WGS) data. Because the polygenic risk scores essential to our design are sensitive to ancestry effects, we restricted our analysis to only families of European ancestry. This restriction was done by matching the SSC to the European population in the 1,000 Genomes reference resource (The 1000 Genomes Consortium, 2015) using the Peddy software package (Pedersen & Quinlan 2017), reducing our sample to 1,300 families. Of these, 1,136 probands were male (87.4%).
Genomic data. All genomic analyses were conducted using whole genome sequence (WGS) data. We combined the two available batches of the Simons Foundation Autism Research Initiative (SFARI) WGS data by converting all genomic coordinates to hg19 with Bioconductor package liftOver using the hg38ToHg19.over.chain chain file provided within the package (https://www.bioconductor.org/packages/devel/workflows/vignettes/liftOver/inst/doc/liftov.html, version 1.8.0). For polygenic risk score (PRS) calculations, variants were extracted and filtered to include only those with minor allele frequency greater than 0.01 using the PLINK software package (Purcell et al., 2007; Chang et al., 2015). Single nucleotide polymorphisms (SNPs) with minor allele frequency discrepancies between batches of over 0.01 were excluded from the PRS calculations. For de novo gene enrichment testing in probands and comparison enrichment in unaffected siblings, de novo variants in probands and siblings were curated for quality and called using standard GATK (Auwera et al., 2013) and RUFUS (Ostrander et al., 2018) variant calling pipelines, resulting in a total of 216,476 de novo putatively functional variants.
Polygenic risk score generation and quartile selection. We obtained genome-wide association study (GWAS) summary statistics from a published study of BMI based on a sample size of 339,224 individuals (Locke et al., 2015). PRS on the SSC participants were then calculated using PRSice2.0 (Choi & O’Reilly, 2019) using p-value thresholds from 0.001 to 1.0. Using regression, PRS were adjusted for sex and for the first 20 principal components of ancestry to account for any residual ancestry stratification. A PRS p-value threshold of 0.2 was selected as optimal by determining the correlations in adult parents between measured BMI and the PRS at each p-value threshold, then selecting the PRS p-value corresponding to the most stable peak (Supplementary Fig. 1). Though sequence batches were carefully combined, we assessed for evidence of residual WGS batch effects by analyzing the distribution of PRS, testing for differences in mean PRS by batch. We observed no significant residual batch differences (Supplementary Fig. 2).
PLACE FIGURE 1 ABOUT HERE
Our PRS selection design is outlined in Fig. 1. We selected mothers in the top quartile of the distribution of maternal PRS for BMI (N = 325). Affected offspring of these selected mothers (and their unaffected siblings as a control test) were then the focus of subsequent GO de novo gene pathway enrichment tests (Figure. 2). Similarity of maternal quartile membership was tested varying the PRS p-value threshold from 0.1 to 0.4. Quartile membership was found to be stable (r = 0.7–0.9, Supplementary Fig. 3), demonstrating that quartile membership is robust to our selection of PRS p-value threshold selection.
GO enrichment testing of de novo functional variants. Of the 216,476 curated WGS variants, we selected variants with predicted medium or high functional consequences on gene function using VEP (McLaren et al. 2016). Variants were extracted using the GEMINI database tool (Paila et al. 2013), producing a final sey of 3,001 putatively functional, genome-wide de novo variants. Gene set enrichment analysis was then carried out using the topGO R package (https://bioconductor.org/packages/release/bioc/html/topGO.html). For the probands and siblings defined by the quartile selections described above, we included all genes in the enrichment if they had at least one de novo variant that passed our predicted medium to high VEP impact filter. We used annotations provided by the org.Hs.eg.db R package, as suggested by the topGO vignette. We limited our analysis to the Biological Processes ontology, which contains 16,113 GO pathways. We used a node size (minimum number of genes associated with a given GO pathway for it to be included) of 10, which reduced the number of GO pathways we considered to 6,917. We applied a FDR correction to raw p-values generated by topGO (6,917 tests, one for each GO pathway passing the node size filter) with the p.adjust function (base R, https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/p.adjust), referred to hereafter as q-values.
Monte Carlo simulation. To provide a null distribution against which to assess our observed results for enrichment testing, we conducted gene set enrichment analysis (as described above) of de novo mutations in randomly selected groups of 325 probands (the same sample size as our selected high maternal BMI PRS quartile). We repeated this process 10,000 times, recording all data provided by topGO during each iteration. We report these null distributions as the mean and two standard deviations of q-values from these 10,000 iterations (multiple testing correction was applied within each iteration) for relevant GO pathways.
Secondary tests to assess specificity of pathway associations. We expected that patterns of GO enrichment that implicate mechanism in ASD would be specific to the probands and not present in the unaffected siblings. Unaffected sibling pathway enrichment was therefore identified using the methods outlined above, and pathways enriched in both probands and unaffected siblings were flagged and omitted, leaving pathways only specific to probands. In addition, our hypothesis focused on enrichment specific to ASD offspring of mothers with high PRS for BMI. It is possible that observed significant pathway enrichment could highlight processes involved in more general metabolic disturbance not specific to high BMI. We therefore also tested for pathway enrichment in probands of mothers in the lowest PRS quartile for BMI and omitted any overlap under the assumption that overlapping pathways likely reflect more general metabolic processes. Specificity of a maternal effect was also part of our hypothesis. We therefore tested for pathway enrichment ranking probands themselves based on their PRS for BMI and selected the top proband quartile. This enrichment would reflect within-proband interaction of background polygenic high BMI risk and de novo events, rather than a maternal effect. We therefore omitted any within-proband de novo enrichments to retain only those due to maternal risk. Finally, we stratified the sample using PRS for BMI of fathers and tested for GO enrichment in the probands in the top paternal BMI quartile. Any overlapping enrichments were deleted, leaving pathways reflecting only maternal and not paternal specificity of the observed enrichments.
Specificity of stratification on maternal polygenic risk for BMI vs. maternal measured BMI. We have focused on underlying polygenic risk of BMI rather than measured BMI to avoid age effects and other environmental effects that can substantially confound measured maternal BMI. To test the correspondence of findings using measured BMI rather than underlying polygenic BMI risk, we also ranked mothers by their measured BMI adjusted for age and selected the top quartile for subsequent proband and unaffected sibling de novo enrichment tests.
Phenotypic tests. All phenotypic data were provided by the Simons Foundation (https://www.sfari.org/resources/ssc-instruments/). A binomial test was used to determine differences in the sex ratio of probands in the selected highest quartile for maternal BMI PRS compared to other ASD probands. For quantitative phenotypes, comparisons of this quartile to the other probands were done using linear regression models. Quantitative assessments included full scale IQ scores measured using assessments appropriate for age and level of development (Fischbach & Lord 2010), available on 277 cases with high maternal BMI PRS and 830 cases without high maternal BMI PRS. In addition, we tested proband scores on the Social Responsiveness Scale (SRS; Constantino et al., 2003), a quantitative measure of autism severity with demonstrated reliability and validity (Parks, 1983; Bolte et al., 2008; Frazier et al., 2010). We included tests of the total SRS score and the individual domain scores for cognition, awareness, communication, mannerisms, and motivation. We similarly tested two other evidence-based behavioral measures, the Autism Behavior Checklist (ABC; Krug et al., 1980) and the multi-axial Child Behavior Checklist (CBCL; Achenbach, 1999), a measure of global emotional/behavioral problems. All tests were subjected to false discovery rate (FDR) correction for multiple testing.