Curation of disorder-type specific gene lists
We retrieved variants associated with neurological, psychiatric, and neurodevelopmental disorders from the NHGRI-EBI GWAS and ClinVar databases. Intronic and exonic variants were used to create disorder-specific gene lists. In total, 2680 unique genes were found to be associated with disorders of the brain and nervous system: 1390 genes were related to neurodevelopmental disorders (e.g., intellectual disability, autism spectrum disorders), 593 to neurological disorders (e.g., Alzheimer’s disease, cognitive decline, epilepsy), and 998 to psychiatric disorders (e.g., major depressive disorder, bipolar disorder, anxiety). These counts sum to more than 2680 because some genes are associated with more than one disorder type. There was limited overlap between genes derived from GWAS and ClinVar (Supplementary Table 1). For neurodevelopmental genes, 19 genes were common to both the GWAS and ClinVar disease gene lists. Similarly, only 9 genes were present in both neurological gene lists, and there was no overlap between the psychiatric disorder gene lists. This indicates that the ClinVar and NHGRI-EBI GWAS databases capture different sets of genes associated with these conditions, highlighting the importance of integrating multiple data sources.
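The overlap counts reported above amount to set operations on the two source-specific gene lists for a disorder type. The following is a minimal sketch of that comparison, not the pipeline used in this study; the function name `source_overlap` and the toy gene symbols are hypothetical and chosen only for illustration.

```python
def source_overlap(gwas_genes, clinvar_genes):
    """Summarize how two source-specific gene lists relate to each other."""
    gwas, clinvar = set(gwas_genes), set(clinvar_genes)
    return {
        "shared": gwas & clinvar,        # genes reported by both sources
        "gwas_only": gwas - clinvar,     # genes reported by GWAS only
        "clinvar_only": clinvar - gwas,  # genes reported by ClinVar only
        "union": gwas | clinvar,         # combined disorder-specific list
    }


# Toy gene symbols, for illustration only (not taken from the study).
gwas_toy = ["GENE_A", "GENE_B", "GENE_C"]
clinvar_toy = ["GENE_C", "GENE_D"]
overlap = source_overlap(gwas_toy, clinvar_toy)
print(sorted(overlap["shared"]), len(overlap["union"]))
```

Taking the union, as in the last entry of the returned dictionary, is one simple way to merge two sources into a single list per disorder type.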
Enriched terms in curated disorder-type specific gene lists align with disorder gene list type
To assess whether our curated disorder-type specific gene lists align with known disorders, we performed standard enrichment analysis for HPO and DisGeNET terms. As expected, enrichments confirmed over-representation of matching disease terms. For example, the HPO term “delayed speech and language development” was enriched in the neurodevelopmental disorder list, while terms such as “abnormal nervous system electrophysiology” and “mental deterioration” were enriched in the neurological disorder list (Fig. 1A, B, Supplementary Tables 2-4). Overall, our curated gene lists were enriched for matching disease and phenotypic terms.
Interestingly, several disease terms relating to facial abnormalities were found, particularly in the neurodevelopmental disease gene lists. Many disorders with neurodevelopmental phenotypes also present with facial abnormalities, and this has been a focus of clinical research in recent years, given that craniofacial and nervous system development are linked (62, 63).
Altogether, these results show that integrating data from ClinVar and the NHGRI-EBI GWAS database results in gene lists enriched for known disease signatures. Although the GWAS and ClinVar disorder-type specific gene lists show limited overlap with one another, each contains key genes associated with the corresponding disorder type, and the two are therefore complementary. We combined the two gene lists associated with each disorder to capture as many known disease genes as possible, creating a unified, disorder-specific gene list for subsequent analysis.
To further investigate the properties of our curated gene lists, we conducted a complementary analysis. We took the genes associated with the significantly enriched terms in Fig. 1A and 1B and compared them to the three curated gene lists using Jaccard indices. Unsurprisingly, terms with a significant enrichment have a correspondingly high Jaccard index in the relevant list (Supplementary Table 5). Based on this simple similarity measure, we observe substantial overlap in the representation of HPO terms between the neurological and neurodevelopmental gene lists. For example, genes associated with abnormal communication, a term significantly enriched in the neurodevelopmental list, are also present in the neurological list. In contrast, terms enriched in the neurological list are more specific and have lower similarity to the other two gene lists (Fig. 1C). This pattern is repeated in DisGeNET terms, although some terms, including “Intelligence” and “Ataxia”, are more specific to one gene list (Fig. 1D). Our list of genes associated with psychiatric disorders is enriched for fewer HPO and DisGeNET terms, and the genes underlying these terms are under-represented in the other two curated gene lists. This analysis highlights the potential value of our curated gene lists in uncovering previously unrecognized connections between disease genes from ClinVar and GWAS studies and related disease categories. The significant overlaps also suggest that these genes may play roles in multiple related diseases, and additional work is required to disentangle the complex gene-phenotype relationships. In the present study, we use our curated gene lists, along with the HPO and DisGeNET terms, to test for potential disease associations in the subsequent analyses.
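For reference, the Jaccard index between two gene sets is the size of their intersection divided by the size of their union. The sketch below illustrates the calculation on toy data; the function name `jaccard_index` and the gene symbols are hypothetical, and returning 0.0 for two empty sets is a convention assumed here, not one stated in this study.

```python
def jaccard_index(genes_a, genes_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two collections of gene symbols."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as having no similarity.
        return 0.0
    return len(a & b) / len(union)


# Toy example: genes annotated to an enriched term vs. a curated gene list.
term_genes = {"G1", "G2", "G3"}
curated_list = {"G2", "G3", "G4", "G5"}
score = jaccard_index(term_genes, curated_list)  # 2 shared / 5 in the union
print(score)
```

Because the union appears in the denominator, the index is bounded by the ratio of the smaller to the larger set size. A small term gene set compared against a large curated list therefore yields a low score even when every term gene is in the list, which is worth keeping in mind when comparing scores across terms of different sizes.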
a HPO phenotypic enrichments across SNV and GWAS gene lists for neurological, neurodevelopmental, and psychiatric disorders, demonstrating distinct enrichment patterns by disease category.
b Comparison of DisGeNET phenotypic enrichments in SNV and GWAS gene lists, highlighting significant overlaps in enrichment patterns across neurological and neurodevelopmental disorders.
c Jaccard scores of HPO terms significantly enriched in curated disorder-type specific gene lists, with terms subset according to the curated gene list they are enriched in.
d Jaccard scores of DisGeNET terms significantly enriched in curated disorder-type specific gene lists, with terms subset according to the curated gene list they are enriched in.
Contrasting our curated gene lists to gene lists in HPO and DisGeNET
Given that our curated gene lists contain more, and different, genes, we examined their overlap with known disease gene lists. Genes unique to our lists but reported alongside known disease genes could potentially represent additional disease genes. A significant overlap between the two types of gene lists could allow us to identify such genes. There are notable overlaps between our gene lists and known disease genes in HPO and DisGeNET, including 499 genes from the psychiatric gene list that are present in the DisGeNET gene list. In addition, 263, 151 and 28 genes are unique to the curated neurodevelopmental, psychiatric, and neurological gene lists, respectively, with a total of 456 genes absent from the HPO and DisGeNET gene lists (Supplementary Fig. 1). These findings suggest that our curated gene lists may harbor novel disease-associated genes warranting further investigation.
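Identifying genes that are present in a curated list but absent from the reference resources is a set difference against the union of the reference gene sets. The following is a minimal sketch under that assumption; the function name `genes_absent_from_references` and all gene symbols are hypothetical.

```python
def genes_absent_from_references(curated_genes, *reference_gene_sets):
    """Return curated genes that appear in none of the reference gene sets."""
    known = set().union(*reference_gene_sets)  # all genes seen in any reference
    return set(curated_genes) - known


# Toy data, for illustration only.
curated_toy = {"G1", "G2", "G3", "G4"}
hpo_toy = {"G1", "G9"}
disgenet_toy = {"G2", "G8"}
candidates = genes_absent_from_references(curated_toy, hpo_toy, disgenet_toy)
print(sorted(candidates))
```

Genes returned by such a comparison are candidates only; absence from a reference resource does not by itself establish a novel disease association.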
Temporal clusters are enriched for disease terms
We generated clusters of genes sharing common gene expression patterns over time within each of the 18 major cell types as defined by Herring et al. (principal neurons: PN-dev, L2-3-CUX2, L4-RORB, L5-6-THEMIS and L5-6-TLE4; inhibitory neurons: early developing MGE-dev, CGE-dev, ID2, VIP, SST, PV, PV_SCUBE3; other cell types: Astro, Oligo, OPCs, Micro, Vas; Supplementary Table 6). For each cell type we generated 12 temporal clusters. Of the resulting 216 clusters, 7 contained fewer than 200 genes (Supplementary Table 7). To assess whether temporal clusters are made up of genes sharing a common function, we performed enrichment analysis using gene ontology terms. We identified 152 unique enriched terms across 17 cell types and 57 clusters, including 52 brain and nervous system-specific terms (Supplementary Table 8). These results demonstrate that an unsupervised approach can discover temporal clusters of functionally related genes within individual cell types. The enrichment of brain and nervous system-specific terms indicates that our temporal clusters capture gene expression patterns of potential importance to disease onset and progression. To test this, we performed overrepresentation analysis (ORA) using DisGeNET and HPO terms. We found 139 unique disease term enrichments, of which 60 were related to the brain and nervous system (Fig. 2A, Supplementary Tables 9, 10). Further, we tested whether our manually curated disorder-type specific gene lists were over-represented in specific temporal gene clusters.
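Overrepresentation analysis is commonly formulated as a one-sided hypergeometric test: given a background of N genes of which K are annotated to a term, it asks how likely a cluster of n genes is to contain at least k annotated genes by chance. The sketch below illustrates this formulation only. It is an assumption that this matches the test used here, and the function name `ora_p_value`, the choice of background, and the toy data are hypothetical. Multiple-testing correction, which any real analysis over many terms and clusters would require, is omitted.

```python
from math import comb


def ora_p_value(cluster_genes, term_genes, background_genes):
    """One-sided hypergeometric p-value P(X >= k) for term overrepresentation."""
    background = set(background_genes)
    # Restrict both gene sets to the background so the counts are consistent.
    cluster = set(cluster_genes) & background
    term = set(term_genes) & background
    n_background, n_term, n_cluster = len(background), len(term), len(cluster)
    k = len(cluster & term)
    # Sum the upper tail of the hypergeometric distribution using exact integers.
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_cluster - i)
        for i in range(k, min(n_term, n_cluster) + 1)
    )
    return tail / comb(n_background, n_cluster)


# Toy example: 10 background genes, 3 term genes, 3 cluster genes, 2 shared.
background_toy = [f"G{i}" for i in range(10)]
term_toy = ["G0", "G1", "G2"]
cluster_toy = ["G1", "G2", "G7"]
p_value = ora_p_value(cluster_toy, term_toy, background_toy)
print(p_value)
```

The choice of background matters: using all genes expressed in the relevant cell type rather than the whole genome generally gives more conservative p-values.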
Genes transiently upregulated during the neonatal stage in SST interneurons were associated with dystonia and parkinsonian disorders (Fig. 2A, C). Additionally, this temporal gene cluster shows a higher Jaccard score for genes present in our manually curated neurological gene list compared to other clusters (Fig. 2B). This points to early developmental disturbances having an important role in the emergence of neurological disorders in adulthood.
Further, the DisGeNET terms “progressive supranuclear palsy” and “Drug dependence” are enriched in genes upregulated through development in SST interneurons (Fig. 2A, C). In this cluster of temporally expressed genes, the Jaccard score for genes belonging to the curated psychiatric disease gene list is high relative to other SST correlation clusters (Fig. 2B). This suggests that the upregulation of these genes may play a crucial role in the onset and progression of psychiatric disorders, particularly those relating to substance abuse. Further, these genes are of interest for studies relating to neurodegenerative disorders, such as progressive supranuclear palsy, which is currently poorly understood.
Temporal clusters in specific cell types also align with the observed HPO enrichments. For example, the HPO term “Microcephaly” is enriched in genes upregulated in the fetal stage followed by a decline and subsequent plateau in remaining developmental stages for ID2 inhibitory interneurons (Fig. 2D). Microcephaly is an early developmental condition wherein an infant’s head circumference is more than 2 standard deviations below the mean for their age and sex (64–67). This is typically indicative of an infant’s brain not developing properly during pregnancy (68, 69). As such, the temporal expression patterns align well with enriched disease terms and the stages expected to be important in disease.
A further example of interest is temporal cluster 11 in PN-dev neurons. This cluster is transiently upregulated during the childhood developmental stage and shows enrichment for ten neurodevelopment-associated HPO terms (Fig. 2D, F, Supplementary Table 10). In this cluster, the curated neurodevelopmental disease gene list has the highest Jaccard score among the curated gene lists (Fig. 2E). Given that neurodevelopmental disorders typically manifest in childhood (18), this is to be expected. A similar pattern is seen in cluster six in PN-dev cells, where genes are transiently upregulated during childhood, and the curated neurodevelopmental disease gene list has a high Jaccard score relative to other temporal gene clusters (Fig. 2C).
a Enrichment of DisGeNET terms in temporal gene clusters in the prefrontal cortex identifies disease associations.
b Comparison of Jaccard scores between temporal gene clusters in SST cells and curated disorder-type specific gene lists, and temporal gene clusters in SST cells and DisGeNET terms enriched in SST cells. The x-axis represents different temporal gene clusters in SST cells, while the y-axis shows the Jaccard scores, indicating the degree of similarity. Each point corresponds to a Jaccard score between a temporal gene cluster and either a curated disorder-type specific gene list or a DisGeNET term. Of particular note are temporal clusters six, seven and twelve.
c Overview of disease-relevant temporal gene expression patterns in clusters showing significance to specific disease classifications.
d Enrichment of HPO terms identifies disease-associated terms in temporal clusters in the prefrontal cortex.
e Comparison of Jaccard scores between temporal gene clusters in PN-dev cells and curated disorder-type specific gene lists, and temporal clusters in PN-dev cells and HPO terms enriched in PN-dev cells. The x-axis represents different temporal gene clusters in PN-dev cells, while the y-axis shows the Jaccard scores, indicating the degree of similarity. Each point corresponds to a Jaccard score between a cluster and either a curated disorder-type specific gene list or an HPO term.
f Temporal gene expression patterns of clusters that appear important to specific disease classifications. Enrichments relating to structural abnormalities are associated with a pattern in which genes are highly expressed in the fetal stage, followed by a decline in expression through to adulthood. In contrast, enrichments for neurodevelopmental disease terms are associated with a pattern in which gene expression transiently increases during childhood before decreasing.
In summary, we clustered genes based on their temporal gene expression patterns in each cell type. We found that genes associated with known disease terms were over-represented in these clusters. Our analysis demonstrates that our curated disorder-type specific gene lists and overrepresented disease terms match the expected temporal expression patterns of various disease phenotypes. Given the overrepresentation of known disease genes in our temporal clusters, we hypothesize that other genes in these clusters could also be linked to diseases, potentially providing an avenue to uncover new biological pathways or mechanisms critical to disease progression. These observations could contribute to the development of novel therapeutic targets. Moreover, their association with established disease genes suggests these genes might play as-yet unrecognized roles in disease etiology and severity, providing opportunities for early diagnosis and personalized treatment strategies.
Network derived gene modules provide further insight into disease associations
Using hdWGCNA, we performed weighted gene co-expression network analysis for each cell type independently to obtain gene modules. Mirroring the approach for the temporal clusters, we performed enrichment analysis on these modules. We derived between 3 and 17 modules for each cell type and identified 101 enriched disease terms, of which 40 were brain-specific (Supplementary Tables 11, 12).
Three gene modules were responsible for all brain-related HPO term enrichments: two modules in L2-3 CUX2 cells and one in VIP cells. One of the L2-3 CUX2 modules appeared to be specific to neurological and morphological abnormalities, while the other L2-3 CUX2 module and the VIP module were driven primarily by psychiatric enrichments (Fig. 3A).
Gene modules in eight cell types (Astro, ID2, L2-3 CUX2, L4-RORB, L5-6 THEMIS, LAMP5 NOS1, PV SCUBE3 and VIP cells) were enriched for brain-related DisGeNET terms. Interestingly, ventriculomegaly appeared in both HPO and DisGeNET enrichments for the same L2-3 CUX2 module (Fig. 3A, B). While enrichments between HPO and DisGeNET have followed similar “themes”, there has been limited overlap in the specific enrichments observed in DisGeNET and HPO terms thus far in the analysis. DisGeNET enrichments were varied, with one L5-6 THEMIS module producing the majority of the enrichments, including “Specific Learning Disability” and “Frontotemporal Lobar Degeneration”.
By combining module eigengene values with sample metadata, we calculated the average module eigengene value for each developmental stage. A high value in a stage indicates that the co-expression module is highly active during that developmental stage, suggesting that the underlying genes are likely to play an important role. We examined how genes from the curated disorder-type specific gene lists are represented in these modules and paired this with module activity at the different developmental stages. We found that while Jaccard scores are relatively low, the stages contributing to each module reflect the known trajectory of presentation for these curated disease gene lists. For example, in the L2-3 CUX2 turquoise module, the highest scoring curated disorder-type specific gene list is the neurodevelopmental list. In this co-expression module, the contributing developmental stages are infancy and adolescence (Fig. 3C, D).
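Averaging module eigengene values per developmental stage is a group-by-and-mean operation over samples. The sketch below shows the idea on toy values. It assumes the eigengene values and stage labels have been exported as plain mappings keyed by a sample identifier; the function name `mean_eigengene_by_stage`, the identifiers, and the values are all hypothetical and do not reflect the hdWGCNA data structures.

```python
from collections import defaultdict
from statistics import mean


def mean_eigengene_by_stage(eigengene_by_sample, stage_by_sample):
    """Average one module's eigengene values within each developmental stage."""
    grouped = defaultdict(list)
    for sample_id, value in eigengene_by_sample.items():
        # Look up the stage label for this sample and collect its value.
        grouped[stage_by_sample[sample_id]].append(value)
    return {stage: mean(values) for stage, values in grouped.items()}


# Toy values for a single module, for illustration only.
eigengenes_toy = {"s1": 0.2, "s2": 0.4, "s3": -0.1, "s4": -0.3}
stages_toy = {"s1": "infancy", "s2": "infancy", "s3": "adulthood", "s4": "adulthood"}
stage_means = mean_eigengene_by_stage(eigengenes_toy, stages_toy)
print(stage_means)
```

In this toy case the module would be read as more active in infancy than in adulthood, since its average eigengene value is higher in that stage.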
Interestingly, the majority of enrichments reaching significance arise from L2-3 CUX2 modules (Fig. 3A). One co-expression module contains neurological and morphological enrichments such as “abnormal PNS morphology”, “abnormal cerebral cortex morphology”, and “peripheral neuropathy”. Further, the only stage contributing to this co-expression module is infancy (Fig. 3C). Given that these disease terms relate to the structure of the CNS and PNS, it follows that an early developmental stage would be an important timepoint.
In the second co-expression module, enrichments such as “depression”, “impairment in personality functioning”, and “dystonia” are present (Fig. 3A). The Jaccard scores are similar for the curated neurological and psychiatric disease gene lists, and the stages contributing to the co-expression module are adolescence and adulthood (Fig. 3C, D). Psychiatric disorders, particularly mood disorders such as depression, typically have an onset during adolescence and adulthood, with a quarter of individuals having their first symptoms before the age of 17 and three quarters before 34 (70). This aligns with adolescence and adulthood being the major contributing stages in the co-expression module.
a Enrichment of HPO terms in gene co-expression modules
b Enrichment of DisGeNET terms in gene co-expression modules
c Jaccard scores in L2-3 CUX2 gene co-expression modules for enriched HPO and DisGeNET terms, and curated disorder-type specific gene lists. Scores emphasize the variable presence of curated disorder-type specific gene lists in co-expression modules.
d Importance of developmental stages in gene co-expression modules for L2-3 CUX2 cells.
In summary, while network-derived gene modules overall do not show numerous associations with either our curated gene lists or the HPO and DisGeNET gene lists, the activity of a network module at particular stages is relevant. The difference in enrichments between the temporal clusters and network-derived gene modules underscores the importance of interrogating single-cell datasets with multiple methods to discover novel gene-disease associations.
Presence of manually curated disease gene lists in co-expression modules and temporal expression clusters
In the context of the developing brain, studying gene expression patterns can contribute to our understanding of the molecular mechanisms underlying brain functions, development, and disorders. We wanted to determine whether co-expression modules in which brain-related disease terms were enriched had a corresponding “signature” for a specific curated disease gene list. We found that co-expression modules enriched with significant brain-related disease terms did not consistently align with our curated disorder-type specific gene lists (Fig. 3C). This is in contrast to the results using temporal clustering, in which an increase in the Jaccard score of a curated disorder-type specific gene list contributing to a temporal gene cluster corresponded directly with the type of disease term that was enriched (Fig. 2B, E). This inconsistency in the relationship between enrichments and curated disorder-type specific disease gene lists in co-expression modules highlights a distinct difference between how disease terms and curated disorder-type specific gene lists are represented in clusters obtained by temporal clustering or network-based identification of gene modules.