Study cohort
All complete trios were drawn from among 1 179 individuals ascertained, assessed, and whole-exome sequenced as part of the Amish Mennonite Bipolar Genetics Study (AMBiGen). AMBiGen consists of families ascertained through probands with bipolar disorder and related conditions (Detera-Wadleigh et al in press). DNA was extracted from whole blood (n = 771), lymphoblastoid cell lines (n = 403), or saliva (n = 4) using Qiagen DNeasy Blood & Tissue or OraGene saliva kits. One sample had an unknown DNA source. Supplementary Table S1 provides details of the cohort.
Definition of Phenotypes
AMBiGen recruited probands with bipolar disorder and their family members, some of whom also have a psychiatric disorder. All diagnoses were based on the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM 5). The offspring of 199 trios that passed filtering included individuals with a variety of major psychiatric disorders: 107 cases (49 bipolar I [BD-I], 11 bipolar II [BD-II], 6 schizophrenia [Scz], 6 schizoaffective disorder [SczAD], 11 recurrent [MDD-R] and 12 single episode [MDD-S] cases of major depressive disorder, 2 social anxiety disorder [SAD], 10 unspecified psychiatric disorder. The 92 trios with offspring unaffected by any psychiatric disorder were used as controls. Since a variety of psychiatric disorders show familial aggregation with BD 18 and there is strong genetic overlap among psychiatric disorders across the allele frequency spectrum 3, 8, we divided the trios into four groups based on offspring phenotype: 1) Narrow phenotype (severe bipolar subtypes: BD-I and SczAD, n = 55); 2) Broad phenotype (the narrow group plus BD-II, Scz, and MDD-R, n = 83); 3) All cases (the broad group plus MDD-S, SAD, and unspecified major psychiatric disorder, n = 107); and 4) Controls (none of the listed disorders, n = 92). (Supplementary Table S2)
Whole-exome sequencing (WES)
WES was performed by the Regeneron Genetics Center (RGC) (RGC, Tarrytown, NY, USA). Library capture and sequencing has been described in detail previously 19. Briefly, the IDT xGen Exome Research Panel v1.0 (Integrated DNA Technologies, Coralville, IA, USA) capture was used and 75bp paired-end read sequencing was performed on the Illumina NovaSeq 6000 platform (Illumina San Diego, CA, USA). All samples were randomized before library preparation and sequencing.
Read alignment and variant calling
Reads were aligned to the human genome build 38 (GRCh38) reference genome provided by UCSC using BWA-mem2 version 2.2.1 20. We used Genome analysis Tool Kit (GATK) version 4.2.4.1 21 for variant calling based on the GATK4 best practices workflow 22. Single nucleotide variants (SNVs) and insertions/deletions were called jointly across all 1,179 samples using the GATK HaplotypeCaller package to produce a version 4.2 variant callset file (VCF). Variant call accuracy was estimated using the GATK variant quality score recalibration (VQSR) approach 23.
Dataset QC
The VCF file, containing 1 179 samples, was loaded into Hail 0.2 (https://hail.is/) to perform basic QC steps. Multi-allelic sites were split into bi-allelic sites using HAIL 0.2. A total of 5 438 676 variants in 19 396 genes were included in the VCF file. An overview of the QC and data cleaning process is presented in Supplementary Figure S1.
Initial variant filtering
Low-complexity regions defined by RepeatMasker (downloaded from the UCSC Table browser: http://genome.ucsc.edu/cgi-bin/hgTables) were removed, as were SNVs and Indels that failed VQSR (tranche filter level of 99 for both SNVs and Indels).
Genotype filtering
Samples with mismatched genotyped gender were excluded from the downstream analysis. Variants with < 10 reads, homozygous reference calls with a genotype quality (GQ) < 25; homozygous variant calls with < 0.9 of the read depth supporting the alternate allele or with a Phred-scaled likelihood (PL) of being homozygous reference of < 25 were excluded. Additionally heterozygous calls with variant call rate < 0.9 ((Reference allele depth + alternative allele depth) divided by total depth < 0.9), with a PL of being homozygous for the reference allele < 25, or with < 0.25 of the read depth supporting the alternate allele (i.e., an allele balance of < 0.25) were excluded. Heterozygous calls in the X or Y non-pseudoautosomal regions in males were excluded.
Sample QC
We removed samples with estimated contamination levels using FREEMIX > 2.0% 24 or chimeric reads > 8.5%. We also removed low quality samples with call rate < 95%. To check the accuracy of the reported pedigree information, relatedness was calculated between each pair of samples using Hail’s King function and sex was imputed for each sample using Hail’s impute_sex function. Combined with the imputed sex, these inferred pedigrees were compared to reported pedigrees and checked for discrepancies. We defined duplicate and 1st-degree relative samples using a KING 25 kinship value of greater than 0.354 and 0.117, respectively. No duplicate samples were identified. As a result of the above QC steps, a total of 23 samples in the dataset were excluded, leaving 1 156 samples in the analysis.
Final variant filtering
For final variant filtering, variants with call rate < 90% or a Hardy-Weinberg equilibrium p-value less than 1x10− 6 were excluded, leaving 1 156 samples and 1 082 271 unique variants. This dataset was then used as the starting point for the de novo workflow.
Variant annotation
We used the Variant Effect Predictor (VEP 26) version 104 to annotate variants against GRCh38. VEP assigns properties such as gene name, consequence, and pathogenicity inference by Combined Annotation Dependent Depletion (CADD) version 1.6 to each variant 27. In addition, we annotated with allele frequencies in the Genome Aggregation Database (gnomAD) r2.1.1 in non-neuro samples 28 (https://gnomad.broadinstitute.org/), and allele frequencies in Anabaptist populations (from the Anabaptist Variant Server (AVS), https://edn.som.umaryland.edu/Anabaptist/query.htm) after lifting the genome coordinates over to GRCh38. Finally, we annotated constraint matrix with probability of being loss-of-function intolerant (pLI) scores, loss-of-function observed/expected upper bound fraction (LOEUF) scores, using the gnomAD loss-of-function metrics table from release 2.1.1 28.
We processed the VEP annotated consequences, and we defined variant-specific consequences and gene annotations as the most severe consequence of a canonical transcript in which that variant lies. We then assigned variants to four distinct consequence classes: lof, missense, synonymous and noncoding. We subdivided missense variants into ‘missense damaging’ (misD) if the CADD Phred-scaled scores is greater than 15. The threshold for the CADD Phred-scaled scores was preset according to previous studies 10, 11. Lof or misD DNMs were referred to as deleterious or dDNMs. We defined evolutionarily-constrained genes as those with LOEUF scores < 0.35, as recommended by the gnomAD team.
Detection of DNMs
De novo variants were called using the de novo function of Hail 0.2, developed by Samocha et al. 29 (https://github.com/ksamocha/de_novo_scripts). Population allele frequencies for variants were obtained from the non-neuro subset of gnomAD 28 and these frequencies were used as the input priors. As additional parameters, parents’ homozygous reference genotypes were required to have no more than 3% of reads supporting the alternate allele, offspring’s heterozygous calls were required to have at least 30% of reads supporting the alternate allele, and the ratio of offspring read depth to parental read depth was required to be at least 0.3.
DNM filtering and Sample QC
This process identified 4 415 putative de novo variants at 3 729 distinct genomic locations in the 208 offspring in this dataset. For QC on the de novo variants, we retained variants if they were high confidence as indicated by the calling algorithm (Hail 0.2), medium confidence and a singleton in the dataset (N = 1 156). To remove variants stemming from cell line artifacts, an allele balance of at least 0.4 was required for the 104 offspring whose data were generated from lymphoblastoid cell line DNA. Since true de novo variants should be rare, variants were removed if they had an allele frequency > 0.1% in the non-neuro subset of gnomAD, or > 1% in Anabaptist populations (based on the AVS). Variants were excluded if they appeared more than twice in the remaining list of putative DNMs and were then limited to one variant per person per gene, retaining variants with the most severe consequences. For sample QC, samples whose DNA source was whole-blood or saliva were excluded if they had more than seven protein-coding putative de novo variants. Samples whose DNA source was from cell lines were dropped if they had more than five protein-coding putative de novo variants. We subsequently performed manual inspection with IGV-2.11.8 30 and excluded remaining DNMs with either of the following criteria as in the previous study11: (1) supported by less than two reads in IGV visualization, (2) coinciding with other two or more variant positions in the same read (likely due to misalignment), and (3) with two or more reads supporting the variant in the parent(s) (suggestive of transmission or systematic errors). For the final sample QC, a total of five samples with more than seven DNMs were excluded from the remaining samples.
On average, the 199 offspring in the final dataset had 0.97 [range: 0.91–0.99] of the exome target meeting 15× sequencing depth, 0.0027 [range: 0.0019–0.0179] of free-mix contamination, and 0.02 [range: 0.003–0.083] of chimeric read percentage. Sample information for the 199 trios passing QC is available in Supplementary Table S3.
DNM validation
We validated a subset of the DNMs with Sanger sequencing prioritizing dDNMs that were lof, misD in the narrow phenotype (BD-I and SczAD). Sequencing primers were designed using NCBI Primer-Blast (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) and synthesized by Integrated DNA Technologies (IDT Inc, Coralville, Iowa, USA). Forward and reverse primer sequences are shown in Supplementary Table S6. Sanger sequencing was performed by Psomagen Inc. (Gaitthersburg, Maryland, USA).
Incorporation of published DNMs in controls
To increase statistical power of our case-control comparisons, we incorporated DNMs from published control trios comprising the unaffected siblings of ASD probands in SPARK 31. We chose SPARK trios because their DNMs were detected in a similar method to ours: They were exome-sequenced at RGC, aligned to GRCh38, genotype-called using the GATK4 pipeline, and DNM-called using Hail0.2. Variant and sample filtering methods are also similar to what we used. However, due to data availability, we only used autosomal DNMs for our case-control comparison. After re-annotation with our procedures (described above), 3 583 exonic DNMs from 3 032 control trios were combined with the autosomal exonic DNMs from our 92 control trios for the downstream analysis.
Statistical analysis of the patterns of dDNM enrichment and genes recurrently hit by dDNMs by phenotype
All statistical tests were performed using R (http://www.r-project.org/). To test for overall rates and recurrence of DNMs, we fit the data to Poisson-distributed models. For comparisons against control DNMs, we used a one-tailed two-sample exact Poisson test. For comparisons against mutation model expectations, we used denovolyzeR 32 that estimates the number of expected DNMs by incorporating the triplet context, indel rates, the gene length, and null expectation based on macaque–human gene comparisons. We used a previously developed mutability model 29 to compute a mutability table containing the expected number of DNMs per gene per variant class. The mutation rate for damaging missense DNMs (CADD score ≥ 15) in a gene was used from the mutability table generated by Dong et al 33, 34. We did not include inframe indels in this analysis because mutation rates for inframe indels are not evaluated within the statistical framework. The observed versus expected number of DNMs for each variant class were compared using a one-tailed Poisson test. The mutation rates were adjusted with the overall rate of synonymous DNMs in SPARK control trios, assuming that the rates of synonymous DNMs are not greatly different across case and control groups from different ancestries, based on the results in the Iossifov et al. 35 and Howrigan et al. 36 studies.
Statistical significance for the observed numbers of dDNMs in a gene was assessed by using the denovolyzeByGene function in DenovolyzeR 32. As above, we used the mutation rate in a gene from the mutability table generated by Samocha et al. 29 and Dong et al. 33, 34, and excluded inframe indels from the analysis. The exome-wide significance threshold was defined as P = 2.74 × 10− 6 based on the number of genes with available mutation rates in Samocha et al. (n = 18 271) 29.
All the analyses except for overall DNM rate comparison against the mutation model expectation were restricted to autosomal DNMs since Chr X DNMs are hard to interpret and none were validated.
Incorporation of published DNMs in Bipolar disorder
We incorporated published DNMs in BD to further refine their effect on BD risk. We collected results from independent exome-sequenced BD trios from three previous studies 9–11. Specific publications and descriptive data are listed Supplementary Table S4. In total, we assessed 354 published BD trios. DNMs extracted from Supplementary Data 1 of Nishioaka et al 11 which included Fromer et al9 and Goes et al10. These were re-annotated using the same procedures as the present study and combined with our list of DNMs. The combined DNM list in cases includes DNMs from the 355 narrow, 437 broad (including BD-NOS), and 461 total case phenotype groups.
Functional enrichment analysis
To explore functional enrichment of genes carrying dDNMs we employed g:Profiler (https://biit.cs.ut.ee/gprofiler/gost) 37, which provides up-to-date information from numerous databases, including gene ontology (GO).
Co-expression analysis of DNMS
In the co-expression analysis, we considered variants based on their phenotype and function. We used six independent gene lists extracted from the DNM list combined with previous studies: 1) genes with lof variants from narrow group; 2) genes with misD variants from narrow group; 3) genes with lof variants from broad group; 4) genes with misD variants from broad group; 5) lof variants from control group; 6) genes with misD variants from control group.
Reference module overlap
To characterize which functional pathways are associated with the DNMs from six gene lists, we used published co-expression modules. Weighted gene co-expression network analysis (WGCNA) identifies clusters, or ‘modules’, of genes that are highly correlated due to similar expression patterns 38. The reference modules used were identified in healthy controls across multiple brain regions in Gandal et al. 39 and Hartl et al. 40. A hypergeometric test was run on the gene overlap between each of DNM gene lists with each reference module from Gandal et al. 39 & Hartl et al. 40. The background genes were set as the 19 396 unique protein-coding genes compiled from the list of the IDT xGen Exome Research Panel v1.0 (Integrated DNA Technologies, Coralville, IA, USA) capture.
To determine if the gene list overlap is greater than what would be expected by random chance, a permutation test was also performed on each overlap greater in size than 1 gene and not with the grey modules. To do this, a set of genes equal in size to the list was selected from the AMBIGen background. The gene set selected was random but corrected for gene length, i.e., selected from a similar distribution of gene lengths as the original AMBIGen list (with small defined as < 13 Mbp, medium 13–46 Mbp, and large > 46 Mbp; these groupings were selected by creating 3 groups of approximately equal size among the union of the AMBIGen gene lists). The overlap of the permuted gene list with each reference module was calculated; this process was repeated 1000 times for each overlap. Modules that significantly overlapped with the control lists were removed from consideration, so that the reported modules are exclusive to phenotype-associated DNM lists. All p-values were adjusted using the Benjamini- Hochberg (BH) method.
Characterization of reference modules
Functional enrichment
Functional enrichment for each of the reference modules was performed using the R package topGO 41, which accounts for the hierarchical structure of the GO database by penalizing ‘parent’ pathways with enriched ‘children’. Pathways are scored by the number of genes in that pathway also found in DNM list (‘significant’) vs the number of total genes in that pathway from the background (‘annotated’). Using the expected number of significant genes given the size of the list and background, Fischer’s exact test is performed. All p-values were corrected for multiple comparisons using the BH method. We used the same background genes (n = 19 396) as in the co-expression analysis.
To further synthesize and extract meaning from these GO results, topic modeling was employed on all significant GO terms across models. Briefly, topic modeling is a text mining method for unsupervised classification of text. The algorithm identifies natural groups of co-occurring words in significant GO terms, or “topics”. The algorithm can then quantify the mixture of words associated within each topic, while also determining the mixture of topics that describes each grouping (in this case – reference module). Here, Frequently in and Exclusively (FREX) words are used to characterize each topic. The strength of association of each word with each topic is described by the parameter beta. The strength of association of each gene list module with each topic is described by the parameter gamma, which is the estimated proportion of words from the GO terms in that module that are generated from the respective topic. Thus, each module is summarized by general biological function across all pathway results. Number of topics (K = 4) was chosen based on optimal exclusivity and semantic coherence, biological knowledge, strength of association between gene lists and topics.
Cell type enrichment
Cell-type enrichment for each module significantly overlapped with at least one DNM gene list was calculated as the hypergeometric overlap between the module genes list and the list of genes associated with each cell type 42. All p-values were corrected for multiple comparisons using the BH method.
Developmental trajectories
In order to explore the expression of DNMs during development, we performed a developmental trajectory analysis on each reference module that significantly overlapped with at least one DNM gene list. Neocortical gene expression values across different windows of life from 421 samples from 41 human brains were accessed from Li et al. 43. For each module, genes were averaged across samples at each time window and plotted as a smooth curve to visualize periods of average higher and lower expression.