Forward and reverse genomic screens enhance the understanding of phenotypic variation in a large Chinese rhesus macaque cohort

doi:10.21203/rs.3.rs-4800799/v1


Article


This work is licensed under a CC BY 4.0 License

Version 1


Combining genotype and phenotype data promises to greatly increase the value of macaques as biomedical models for human disease. Here we launch the Macaque Biobank project by deeply sequencing 919 captive Chinese rhesus macaques (CRM) while assessing 52 phenotypic traits. Genomic analyses revealed that CRMs exhibit 1.7-fold higher nucleotide diversity and significantly lower mutational load than their Indian counterparts. We identified hundreds of loss-of-function variants linked to human inherited disease and drug targets, at least seven of which exert significant effects on phenotypes in forward genomic screens. Genome-wide association analyses revealed 30 independent loci associated with phenotypic variation. Using reverse genomic approaches, we identified DISC1 (p.Arg517Trp) as a genetic risk factor for neuropsychiatric disorders, with macaques carrying this deleterious allele exhibiting impairments in working memory and cortical architecture. This study demonstrates the potential of macaque cohorts for investigating genotype-phenotype relationships and exploring potential spontaneous models of human genetic disease.

Biological sciences/Genetics/Population genetics

Biological sciences/Genetics/Sequencing

Macaque Biobank

Chinese rhesus macaques

loss-of-function

forward genomics

reverse genomics

GWAS

DISC1.

Over the past decades, rhesus macaque (Macaca mulatta) bioresources have played a crucial role in deepening our understanding of human physiology, metabolism, reproduction, development, cognition, and pathology^1–3. More recently, the importance of this species as an experimental model increased substantially during the COVID-19 pandemic, a dire public health crisis that urgently necessitated the recruitment of many animal models for vaccine testing and drug treatments⁴. However, this global pandemic also triggered, either directly or indirectly, a worldwide shortage of rhesus macaques for research⁵. In consequence, fully appreciating and efficiently utilizing macaque bioresources has become a major challenge currently faced by all biologists⁶.

Effective use of rhesus macaques as an experimental animal model benefits from higher-resolution characterization of genetic variation carried out in parallel with detailed phenotypic examination^7,8. Additionally, insights into the genetic diversity of macaque populations will greatly assist in the rational genetic management of research colonies⁹. Rhesus macaques are geographically widespread and consequently genetically diverse, although the number of recognized subspecies varies between studies^10–12. Three distinct lineages are nevertheless well recognized: Indian, Chinese and Indochinese¹³. Currently, the most significant macaque bioresource, macaque genotype and phenotype (mGAP)¹⁴, primarily concentrates on Indian rhesus macaques (IRM), with only a limited number of samples being of Chinese origin. However, it is now clear that Chinese rhesus macaque (CRM) populations exhibit considerable genetic variation, potentially surpassing that of their Indian counterparts¹⁵, and they vary markedly in traits such as body size, pelage and other morphological characteristics^16,17. To effectively monitor and preserve the diversity of CRM, and with an eye to utilizing them as biomedical experimental models, a national primate facility known as the "National Research Facility of Phenotypic and Genetic Analyses of Model Animals (Primate Facility)" has been established at the Kunming Institute of Zoology (KIZ), Chinese Academy of Sciences (CAS)¹⁸. This invaluable bioresource offers an opportunity to explore the genetic variation that underlies observable phenotypic, physiological and behavioral differences between macaques. In addition, the identification of functionally significant genetic variants will enhance our understanding of existing models, thereby paving the way for the discovery of novel genetic models for inherited human diseases.

Two complementary approaches, namely forward genomics and reverse genomics, can be utilized to achieve these goals. Forward genomics, a phenotype-driven strategy (e.g., genome-wide association study [GWAS]), starts with the measurement or observation of a phenotype and proceeds to the mapping of the causative loci or genes¹⁹. This method is particularly powerful for deciphering the molecular mechanisms underlying natural phenotypic variation in cases where we have no prior knowledge of the genes involved in the biological process. Conversely, reverse genomics is a gene-driven approach that involves identifying mutations in specific genes of interest, followed by phenotypic assessment²⁰. Whereas reverse genetic studies tend to be more straightforward and shorter in duration than forward genetic studies, they can be hampered by challenges such as inefficient gene knockdown or genetic background effects^21,22. To date, both approaches have been successfully applied to a number of model organisms, including mouse^23,24, zebrafish²⁵, Drosophila²⁶, and Arabidopsis²¹.

Accordingly, we have launched the Macaque Biobank (MB) project, with the aim of capturing a wide range of phenotypic and omics data across large numbers of individual macaques. In the initial phase, we densely genotyped 919 CRMs and assessed 52 phenotypic traits that were collected from the colony of KIZ. We first explored the ancestry, genetic diversity and sequence variations present in this cohort. Next, we performed forward genomic screens to identify the genetic variants responsible for specific phenotypes. Finally, we employed reverse genomic screens, focusing mainly on neurological disease genes, to examine the phenotypic consequences arising from specific mutations. Overall, the MB introduced here promises to serve as an invaluable resource for the study of the genotype-phenotype relevance of macaques to molecular medicine, as well as for the discovery of new naturally occurring models of human genetic diseases.

Genetic ancestry and status of the CRM cohort

The initial dataset comprised 919 captive CRM individuals that were sequenced to a high mean depth (~30.47×) (Supplementary Table 1) and 80 wild CRM samples¹⁶ with moderate genomic coverage (~11.71×). After applying a series of sample and variant quality controls (see Methods), we obtained a total of 84,480,388 high-quality sequence variants across 961 individuals, including 74,752,163 single-nucleotide variants (SNVs) and 9,728,225 insertions or deletions (Indels) (Fig. 1a). This corresponds to an average of one variant per 35 base pairs (bp) of genomic DNA. Nearly 59% of these variants occurred at low allele frequencies (AF < 0.01) whereas approximately 17.9% were classified as very common (AF > 0.05). The intersection with the largest IRM cohort (mGAP v2.2)¹⁴ revealed that more than 62 million of the variants (73.94%, Fig. 1a) were newly identified, despite the much smaller sample size of our cohort compared to that of the mGAP project (961 vs. 2,425). This is perhaps not surprising given that the reference genome itself is of Indian origin, a lineage that is phylogenetically distinct from the CRM²⁷. Nevertheless, we cannot exclude the possibility that our CRM cohort possesses higher levels of genetic diversity than the mGAP cohort, as suggested by the results presented below.

We traced the genetic ancestry of the CRM cohort by incorporating samples from diverse geographical regions of China alongside samples from India. The PCA results show a clear separation between the CRMs and the IRMs (Fig. 1b), thereby corroborating the marked genetic divergence of these two geographically separated subpopulations^16,28. Within the Chinese samples, the captive CRMs were indistinguishable from the wild population, irrespective of whether or not the Indian-origin samples were excluded. Such pronounced admixture between captive CRM samples and the wild population was further corroborated by FRAPPE-inferred ancestral clusters (Fig. 1c), implying that the captive CRMs are likely a recent admixture of multiple wild sources, in line with the maintenance history of the cohort. The combination of multiple genetic ancestries introduces increased nucleotide variation into the recipient population. As expected, we found that the captive CRMs showed the highest genetic diversity (mean π = 0.0016), which is comparable to that of the wild population (average π = 0.0015) and 1.7-fold higher than the mGAP cohort (average π = 0.0001) (Fig. 1d). This notwithstanding, the mutational load pattern indicated that both the captive CRMs and the wild population carried significantly fewer deleterious mutations (Fig. 1e) and homozygous loss-of-function (LoF) variants (Fig. 1f) than the mGAP cohort (Mann-Whitney U test, p-value < 2.2×10^−16). High genetic diversity and low genetic load are reliable indicators of a population's long-term viability^29,30. These results imply that the genetic status of our captive CRMs compares favorably with the mGAP samples, with a lower risk of inbreeding, germplasm degradation and loss of genetic diversity.
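Nucleotide diversity (π) is the average proportion of sites at which two randomly chosen chromosomes differ. The following is a minimal sketch for biallelic variants, assuming per-site alternate-allele counts and a known number of callable sites; the function name and inputs are hypothetical and do not reproduce the authors' pipeline.

```python
def nucleotide_diversity(alt_counts, n_chromosomes, n_sites):
    """Per-site nucleotide diversity for biallelic variants.

    alt_counts    -- alternate-allele count at each variant site
    n_chromosomes -- number of sampled chromosomes (2 x diploid individuals)
    n_sites       -- total callable sites, including invariant ones
    """
    if n_chromosomes < 2 or n_sites <= 0:
        raise ValueError("need at least two chromosomes and one site")
    n = n_chromosomes
    total = 0.0
    for k in alt_counts:
        # k * (n - k) chromosome pairs differ at this site,
        # out of n * (n - 1) / 2 possible pairs.
        total += 2.0 * k * (n - k) / (n * (n - 1))
    # Invariant sites contribute zero, so divide by all callable sites.
    return total / n_sites
```

Dividing by all callable sites rather than by the number of variant sites is what makes values comparable between cohorts sequenced to different depths, provided the callable-site count is estimated consistently.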

Variant annotation and mutational profiling

We classified the variants into different categories based on their location and functional impact. As seen in human cohorts^31,32, the majority of the CRM variants were found in intergenic and intronic regions, accounting for 45.13% and 39.75%, respectively, whereas the variants located in coding and splicing regions made up 0.89% of the total (Fig. 2a and Extended Data Fig. 1a–c). The number of synonymous variants (~ 328K) was slightly higher than the non-synonymous variants (~ 315K); they together comprised 85.22% of the variants in coding and splice regions. The allele frequency distribution indicated that the non-synonymous and frameshift mutations, start/stop gains or losses, and splice site variants are more likely to be rare or singletons (Fig. 2b), reflecting the putative purifying selection acting on them.

We further examined the mutational constraint on different genes and pathways. To control for population genetic forces and local sequence features (e.g., gene length, mutation rate) that may have influenced the amount of variation, we used the number of synonymous variants as the control baseline³³. Specifically, for each gene, we computed the ratio of non-synonymous to synonymous substitutions (dN/dS), a statistic that is also commonly used to measure the strength and mode of natural selection acting on protein-coding genes³⁴. After controlling for the false discovery rate (FDR), our results showed that the most evolutionarily constrained pathways (involving genes with no observed non-synonymous mutations) were related to core biological processes, e.g., ribosome, spliceosome and proteasome components (adjusted p-value < 0.05, Fig. 2d), consistent with previous findings in human cohorts^33,35. By contrast, immune-related pathways, such as the chemokine signaling pathway, cytokine-cytokine receptor interactions, and viral protein interactions with cytokines and cytokine receptors, were among the least constrained pathways (dN/dS > 4). Interestingly, several neurodegeneration pathways, such as those implicated in amyotrophic lateral sclerosis (ALS), Parkinson disease (PD), Huntington disease (HD) and Alzheimer disease (AD), were also found to be markedly conserved (adjusted p-value < 0.05), implying their functional importance and strong purifying selection in rhesus macaques. It is reasonable to suppose that these categories of conserved pathways are also less tolerant of deleterious mutations.
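The gene-level comparison described above can be sketched as a simple count ratio of non-synonymous to synonymous variants per gene. This is a simplification made for illustration: formal dN/dS estimators also normalise by the numbers of non-synonymous and synonymous sites, and the authors' exact procedure is not given in this section. The function name and consequence labels are hypothetical.

```python
from collections import Counter

def nonsyn_to_syn_ratio(variants):
    """Return {gene: N/S count ratio} from (gene, consequence) pairs.

    Genes without any synonymous variant map to None, because the
    ratio is undefined when the baseline count is zero.
    """
    nonsyn = Counter()
    syn = Counter()
    for gene, consequence in variants:
        if consequence == "synonymous":
            syn[gene] += 1
        elif consequence == "non-synonymous":
            nonsyn[gene] += 1
    ratios = {}
    for gene in set(nonsyn) | set(syn):
        # Counter returns 0 for genes missing from one of the tallies.
        ratios[gene] = nonsyn[gene] / syn[gene] if syn[gene] else None
    return ratios
```

A gene with synonymous variants but no non-synonymous ones gets a ratio of 0.0, which corresponds to the "no observed non-synonymous mutations" class of highly constrained genes mentioned in the text.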

Loss of function (LoF) variants and association with phenotypes

LoF variants, including nonsense, frameshift or canonical splice-site mutations, are of particular interest because they have the potential to severely disrupt the function of protein-coding genes and could thereby serve as naturally occurring gene knockouts with which to explore gene function³⁶. However, LoF variants are known to have a high false-positive rate due to various factors, including incomplete and imperfect genome annotation and occurrence on non-canonical transcripts or within the last 5% of the transcript^37,38. To increase the probability of a given variant being accurately annotated as a predicted loss-of-function (pLoF) mutation, we applied a set of filtering strategies to the raw LoF variants derived from the SnpEff prediction (see Methods for details). In total, we identified 4,166 high-confidence pLoF variants across 2,746 genes (Supplementary Table 2), where at least one copy of the gene was predicted to be inactivated based on both rhesus macaque and human genome annotations. Of these, the majority (83.08%) were found to be rare (MAF < 0.01) and only 5.61% of the pLoF variants were very common (MAF > 0.05). On average, each individual macaque carried 97 pLoF variants, similar to the numbers found in human genomes^36,37.

The very common pLoF alleles are likely to be LoF-tolerant because they are less constrained by purifying selection. We observed a significant enrichment of olfactory receptors among these alleles (adjusted p-value = 1.809×10^−4, Supplementary Table 3), consistent with the findings of previous studies^33,35,39. Intriguingly, seven mouse essential genes (PPP1R15B, IFT52, CYP1A2, ETV2, NFASC, SLC2A9, PLRG1) and two human essential genes (MAK16, PLRG1) were tolerant of biallelic inactivation in CRMs (Fig. 3b and Supplementary Table 2). For example, the PLRG1 gene, which encodes a core component of the cell division cycle 5-like (CDC5L) complex, is crucial for the viability of both mouse embryos and human cells^40,41. However, our observations suggest that the homozygous knockout of this gene does not result in severe consequences or a disease state in CRMs, probably owing to evolutionary changes in gene essentiality across species⁴² or to compensation by gene family members⁴³. By contrast, rare pLoF alleles (MAF < 0.01) are expected to be less tolerated and are likely to be associated with a strong functional effect. We found a strong depletion of homozygosity among rare pLoF variants, with only 78 (2.29%) of the variants being homozygous. The genes carrying these variants were significantly enriched for metabolic pathways, such as arachidonic acid metabolism, glycerophospholipid metabolism, and glycerolipid metabolism (adjusted p-value < 0.05, Supplementary Table 4). Interestingly, we identified 338 genes as potential drug targets within the high-quality pLoF catalog (Fig. 3b and Supplementary Table 2). These genes exhibited varying degrees of gene loss, which could potentially lead to inter-individual differences in pharmacological efficacy. Consequently, the compilation of high-confidence LoF variants could serve as a key resource to guide the selection of suitable "druggable" targets, and a primary screen of these targets in CRMs would help to select appropriate individuals for pharmacological evaluation.

To further characterize the phenotypic consequences of the rare pLoF variants, we performed an association screen against 52 distinct phenotypes (Supplementary Tables 5 and 6). Association results surpassed the Bonferroni significance threshold (p-value = 2.83×10^−5, see Methods) for seven pLoF-trait pairs (Supplementary Table 7). The most significant association was a splice acceptor variant in ANO10 (c.203-2 AG > G), which was related to full-leg length (p-value = 8.97×10^−6). Compared to non-carriers, ANO10 (c.203-2 AG > G) heterozygotes displayed a significant reduction in full-leg length (Mann-Whitney U test, p-value = 0.0251, Fig. 3c). Notably, ANO10 (c.203-2 AG > G) heterozygous carriers also exhibited a nominally significant reduction in full-arm length (Mann-Whitney U test, p-value = 0.0139, Fig. 3d), although the association test (p-value = 4.12×10^−5) did not surpass the Bonferroni-corrected threshold, likely because this correction is highly conservative and tends to "overcorrect" for variants with mild or small effects⁴⁴. ANO10 encodes a transmembrane protein that belongs to the transmembrane 16 family. Defects in this gene can cause ataxia, a neurological condition characterized by gait and balance impairment, upper limb coordination problems, as well as impairment of speech and eye movements^45,46. However, to our knowledge, ANO10 has never been reported to be associated with limb length. Similarly, a heterozygous splice acceptor mutation in PRRC2B (c.6379-2A > G), a gene predicted to play a role in embryonic development⁴⁷, was significantly associated with higher body weight (p-value = 9.67×10^−6, Fig. 3c). Using a less conservative association p-value threshold (e.g., 1×10^−4), we identified another 13 associations that were consistent with known gene function (Fig. 3d). For instance, carriers of a stop-gain mutation in the ATR gene had a shorter head length (Mann-Whitney U test, p-value = 0.0072). Defects in this gene have been reported to cause Seckel syndrome 1, a syndrome characterized by severe intrauterine and postnatal growth retardation, microcephaly and mental retardation⁴⁸. In addition, the heterozygous knock-out of ALOX15, which encodes an enzyme that acts on various polyunsaturated fatty acid substrates⁴⁹, was associated with lower high-density lipoprotein (HDL) and low-density lipoprotein (LDL) concentrations in the serum of CRMs (p-value = 0.0042 and 0.0073, respectively).
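The Bonferroni decision applied to the pLoF-trait pairs above can be sketched as follows. The number of tests is not stated in this section, so `N_TESTS = 1767` is a hypothetical value chosen only because 0.05/1767 rounds to the reported threshold of 2.83×10^−5; the p-values are those quoted in the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def passes(p_value, alpha, n_tests):
    """True if the p-value survives Bonferroni correction."""
    return p_value < bonferroni_threshold(alpha, n_tests)

# Hypothetical test count (assumption, not taken from the Methods).
N_TESTS = 1767

# Association p-values quoted in the text.
reported = {
    "ANO10 / full-leg length": 8.97e-6,
    "PRRC2B / body weight": 9.67e-6,
    "ANO10 / full-arm length": 4.12e-5,
}
significant = {pair: passes(p, 0.05, N_TESTS) for pair, p in reported.items()}
```

Under these assumptions the first two pairs pass and the ANO10 full-arm association does not, matching the description in the text.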

Genome-wide association for 52 phenotypes in CRMs

The availability of multiple genomes coupled with phenotypic data also provides an unprecedented opportunity to investigate the genetic foundations of phenotypic variation in CRMs. To this end, we performed GWAS analyses for each quantified trait on the common variants (SNVs + Indels) with a mixed linear model, fitting relevant covariates, e.g., age, sex, genetic relationship, and population structure (see Methods). The genomic control factor λ showed no sign of inflation in any test (λ < 1.03), suggesting that population structure was well controlled. In total, we identified 44 variants associated with 16 phenotypic traits that passed the genome-wide significance threshold (p-value = 5.13×10^−8). These variants were clumped into 30 independent loci across 18 chromosomes, explaining 3.36%-5.97% of phenotypic variation (Extended Data Fig. 2 and Supplementary Table 8).
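The genomic control factor λ is conventionally computed as the median observed 1-df chi-square statistic divided by the median of the null chi-square distribution (about 0.455). A minimal standard-library sketch, assuming two-sided association p-values in the interval (0, 1]; the function name is hypothetical and this is not the authors' code:

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Genomic control factor lambda from two-sided p-values in (0, 1]."""
    norm = NormalDist()
    # Convert each p-value to a 1-df chi-square statistic via the
    # squared normal quantile; the lower tail (p / 2) avoids rounding
    # 1 - p / 2 to exactly 1.0 for very small p-values.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 if p < 1.0 else 0.0 for p in p_values]
    # Median of the null chi-square(1) distribution, about 0.4549.
    null_median = norm.inv_cdf(0.25) ** 2
    return median(chi2) / null_median
```

Values close to 1 indicate that the bulk of the test statistics follow the null distribution; values well above 1 suggest residual stratification or cryptic relatedness.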

The annotation of these significant variants revealed that six genes (DCDC2C, TRIB1, EDIL3, GGT1, SHISA9, WWOX) have been reported to be associated with specific human traits (Supplementary Table 8). For instance, the EDIL3 gene, which encodes an integrin ligand, has previously been suggested to be related to human body mass index (BMI)⁵⁰. In this study, we discovered that a downstream variant of this gene was significantly associated with a reduction in BMI in rhesus macaques (beta = −1.0737, p-value = 6.30×10^−9, Extended Data Fig. 2c). We also observed associations of the SHISA9 locus with hip circumference⁵¹ (beta = −0.6693, p-value = 3.69×10^−8), and of the WWOX locus with body weight⁵² (beta = 0.3909, p-value = 3.87×10^−8) (Extended Data Fig. 2 and Supplementary Table 8). Apart from these known associations, we identified 11 significant associations that had not previously been reported in the human GWAS catalog⁵³ (Supplementary Table 8). Of these, the most significant association was observed for a 5'-UTR variant at the IGLL1 locus (c.-1436C > T), which was related to the serum gamma-glutamyl transpeptidase (γ-GGT) concentration in CRMs (p-value = 2.76×10^−11, Fig. 4a,b and Supplementary Table 8). This gene encodes the immunoglobulin lambda-like polypeptide 1 protein, which plays an important role in B cell development⁵⁴. In CRMs, heterozygous and homozygous carriers exhibited a stepwise increase in γ-GGT concentration compared to non-carriers (Fig. 4d). Interrogation of human ENCODE databases⁵⁵ revealed that this signal region exhibited distinct active enhancer signatures in a range of human cell types (Extended Data Fig. 3). It is noteworthy that this peak also encompassed an independent locus at GGT1 (p-value = 2.59×10^−8), which has previously been reported to be associated with γ-GGT level in humans⁵⁶. However, regional association analysis indicated that these two variants were in weak linkage disequilibrium (LD) (r² = 0.01, Fig. 4c), suggesting that they represent independent associations, both linked to the γ-GGT level.
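The pairwise LD check between two variants can be approximated as the squared Pearson correlation of their genotype dosages (0, 1 or 2 copies of the alternate allele). This is a sketch under that assumption; haplotype-based r² from phased data can differ, and the tool the authors used is not named in this section. The function name is hypothetical.

```python
def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two variants' genotype dosages."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equal-length dosage lists of length >= 2")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        # A monomorphic variant has no variance, so r2 is undefined.
        raise ValueError("r2 is undefined for a monomorphic variant")
    return cov * cov / (var_a * var_b)
```

An r² near 0, as reported for the IGLL1 and GGT1 variants, means that the genotype at one variant carries almost no information about the genotype at the other.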

Reverse genetic screen identifies DISC1 (p.Arg517Trp) as a genetic risk factor for neuropsychiatric disorders

The classical forward genetic approaches described above enabled the identification of multiple genetic variants associated with phenotypic variation in CRMs. It is also of interest to test whether a distinct genotype can predict a specific phenotype. In a reverse genetic screen, we identified 3,192 non-synonymous mutations across 2,216 genes that were predicted to be deleterious by both SIFT4G⁵⁷ and PolyPhen-2⁵⁸ (Supplementary Table 9). We were particularly interested in genes related to human neurological disorders (NDs), as these complex diseases are difficult to investigate using rodent models^3,59. Non-human primates (NHPs) are not only phylogenetically close to humans but also share similar brain structure and function, making them more suitable for the study of human NDs than other mammalian species⁶⁰. Below, we highlight the phenotypic consequences arising from a deleterious missense mutation in the DISC1 gene (p.Arg517Trp, c.1549C > T, SIFT4G score = 0.01).
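Intersecting two deleteriousness predictors can be sketched as a simple filter. The SIFT cutoff of 0.05 is the conventional threshold and is assumed here, as are the PolyPhen-2 class labels counted as damaging; the record fields and function name are hypothetical and the authors' exact criteria are not given in this section.

```python
# Assumed cutoffs and labels (not taken from the Methods).
SIFT_DELETERIOUS_BELOW = 0.05
POLYPHEN_DAMAGING = {"probably_damaging", "possibly_damaging"}

def deleterious_by_both(variants):
    """Keep variants that both predictors call deleterious.

    Each variant is a dict with 'sift4g_score' (float) and
    'polyphen2_class' (str) keys.
    """
    return [
        v for v in variants
        if v["sift4g_score"] < SIFT_DELETERIOUS_BELOW
        and v["polyphen2_class"] in POLYPHEN_DAMAGING
    ]
```

Requiring agreement between the two tools trades sensitivity for specificity, which suits a screen whose hits are followed up with labour-intensive phenotyping.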

DISC1 (Disrupted-In-Schizophrenia 1) is a well-established risk gene for schizophrenia and various other neuropsychiatric disorders, including affective disorder, bipolar disorder, autism spectrum disorder, and major depressive disorder^61,62. This gene encodes a multi-compartmentalised protein that functions as a scaffold hub, interacting with numerous partners involved in brain development and disease processes. Defects in DISC1 have been reported to be associated with impaired working memory⁶³. Anatomical changes mostly involve cortical abnormalities, including the prefrontal cortex as this area plays an important role in executive functions and working memory⁶⁴. In this cohort, we identified eight CRMs that carried the DISC1 p.Arg517Trp mutation in the homozygous state. These macaques included three adults (aged 5 to 7 years) and five elderly individuals (aged over 19 years). Given that aging could potentially affect the results obtained (e.g., working memory), we focused on the three adults and excluded the elderly monkeys from the behavioral and brain imaging experiments.

We observed a significant reduction in neurological function in carriers of the risk allele (Trp, two-tailed t-test, p-value < 0.0001, Fig. 5d). This reduction was manifested by diminished limb reflexes, as well as a decreased response to pain and teasing. We further assessed working memory under mildly stressful and non-stressful conditions. Our results showed that risk allele carriers consistently exhibited lower working memory performance with increasing delay lengths, and this pattern was particularly evident in the trials with 30-second delays (Fig. 5a,b). When restraint stress was applied, the risk allele carriers made markedly more errors (two-tailed t-test, p-value = 0.0363, Fig. 5c). Since stress is a risk factor for psychiatric disorders associated with impaired prefrontal function^65,66, these data may help to explain why the deleterious missense mutation of DISC1 increases the risk of psychiatric disorders.

Next, we carried out magnetic resonance imaging (MRI) to examine whether any cortical structure was altered in DISC1 Trp carriers. Although we did not detect a significant reduction in gray matter volume and thickness (Extended Data Fig. 4), we observed an increase in gray matter surface area in the Trp risk allele carriers, particularly in the motor cortices of the caudal frontal lobe (corrected p-value = 0.0338, Fig. 5e). Conversely, we detected a significant reduction of white matter volume in the temporal lobe (corrected p-value = 0.0064, Fig. 5f) and a significant increase in ventricular volume (corrected p-value = 0.0169, Fig. 5g). Further region-level results confirmed that the majority of changes in gray matter surface area and white matter volume were localized to the frontal lobe and temporal lobe, respectively (Extended Data Fig. 5–6). We also collected resting-state functional magnetic resonance imaging (rs-fMRI) data. The network-based statistic (NBS) was conducted across a range of primary thresholds (t = 3.0-3.4) to identify differences in functional connectivity between the Trp-bearing macaques and the Arg controls under general anesthesia. As the primary threshold increased, a stable set of differing functional connectivity persisted (Extended Data Fig. 7), with the results at the median threshold (t = 3.2) presented in Fig. 5h,i. Among these findings, the majority of increased functional connectivity measures in Trp-bearing monkeys were localized within the frontal lobe (n = 11), while a subset was observed between the frontal lobe and subcortical regions (n = 7) (Fig. 5h). Additionally, we identified 27 connections that displayed a reduction in strength in the Trp-bearing macaques compared to controls, with the majority of these reductions occurring between the frontal lobe and parietal lobe (n = 13) (Fig. 5i). 
It should be noted that functional connectivity under anesthesia is a measure of correlative firing, not functional efficacy, and increased values can represent correlated slow wave firing. Nevertheless, these data collectively indicate that the macaques carrying the risk allele of DISC1 p.Arg517Trp exhibited alterations in cortical architecture and functional connectivity, which may ultimately contribute to the observed impairment of working memory. As working memory impairment is a contributing symptom in most neuropsychiatric disorders linked to DISC1 mutations, these data provide a promising bridge between humans and macaques, although the number of macaques carrying DISC1 p.Arg517Trp was relatively small.

The macaque cohort presented here represents one of the most extensive sequencing studies so far performed in rhesus macaques, although our data have primarily been derived from the CRM population. This notwithstanding, we have for the first time incorporated a diverse array of phenotypic data from numerous macaque individuals. The current cohort comprises genomic data from 961 CRMs, supported by 52 hematological, biochemical and anthropometric measurements. Our preliminary analyses indicate that the captive CRMs are a mixture of animals from multiple wild sources, which is consistent with the introduction of wild animals into the colony to avoid potential inbreeding. Together they harbor over 62 million variants (74%) that were previously undetected in the mGAP project¹⁴, thereby demonstrating the distinctness of the CRM and IRM lineages, which serves as a caveat for their use as nonhuman primate models. Our data also support the higher nucleotide diversity of the CRM cohort and, given the large sample size and high-coverage genomic sequencing, indicate that the captive CRMs carry a significantly lower genetic load and hence are less susceptible to inbreeding than mGAP individuals. However, we cannot definitively conclude that captive IRM colonies suffer from serious genomic erosion (i.e., a decrease in average individual fitness) because the requisite genetic information from wild IRM populations is lacking. Future studies incorporating wild IRM populations will be necessary in order to classify the genetic status of captive IRM colonies.

The relatively large sample size of the genomic data obtained enables us to assess the sensitivity of genes to functional variations in non-human primates, thereby enhancing our capacity to discover disease-related genes^67,68. Our results corroborate previous findings performed on large human cohorts^33,35, indicating that genes implicated in core biological processes (e.g., ribosome, spliceosome and proteasome components) belong to the most constrained categories, whereas immune-related genes are the least constrained. Notably, we discovered that human orthologous genes associated with neurological disorders, such as ALS, PD, HD and AD, are also under strong selective constraints (Fig. 2d). This implies that these neural genes are of functional importance and have been conserved in rhesus macaques, making them less tolerant to LoF mutations or detrimental non-synonymous mutations. Our findings therefore provide compelling genetic evidence to support the use of rhesus macaques as a suitable model for studying neurological disease^60,69. Employing a reverse genomic approach, we successfully demonstrated a case arising from a deleterious missense mutation in the macaque DISC1 gene (p.Arg517Trp, Fig. 5), a recognized risk gene for several types of human neuropsychiatric disorder^61,62. Our analyses of behavior, working memory, brain structure and function in these macaques with the risk allele demonstrated the pathogenicity of the mutation. Additionally, we identified hundreds of pLoF variants and non-synonymous nucleotide substitutions that were matched to known human diseases and drug target genes. However, as demonstrated in previous studies^70,71, we cannot be certain that these functional mutations will be associated with an increased susceptibility to certain inherited diseases. 
Nevertheless, as a complementary approach, forward genomic analyses, coupled with phenotypic data, offer a promising method for understanding these functional mutations, as exemplified by the phenotypic consequences observed for ANO10, PRRC2B, ATR and ALOX15 (Fig. 3c,d and Supplementary Table 7). The natural occurrence of disease in captive macaques provides a unique resource for establishing non-human primate models of human diseases. Guided by genome-based predictions, it would be interesting to monitor the onset of spontaneous diseases in macaques carrying pLoF variants.

Despite these significant observations, several limitations deserve attention. First of all, although we have applied a series of filtering strategies, any given mutation annotated as pLoF may not truly lead to loss of protein function. Therefore, experimental validation such as reverse-transcription PCR of transcript and/or western blot of protein will ultimately be required in order to address this issue. A second limitation is reduced statistical power to establish unambiguous genotype–phenotype correlations if the pLoF is observed in only one or two participants. However, this could be improved if larger sample sizes were employed. Finally, our analysis was limited to readily available phenotypes; in future analyses, a standardized clinical phenotyping protocol would be desirable for each participant. Given the apparent worldwide shortage in experimental monkeys today⁵, the macaque cohort presented here could serve as an invaluable resource with which to obtain new insights into primate-specific biology, guiding the selection of appropriate models for experimental and pharmaceutical tests, facilitating the discovery of new genetic models for human disease research, and further improving and refining the rational genetic management of macaque colonies.

Sample collection and sequencing

We enrolled a total of 919 Chinese rhesus macaques (Supplementary Table 1) housed at KIZ for genomic sequencing during their annual physical checks (normally in September or October, outside the breeding season) from 2021 onwards. The initial cohort comprised 293 males and 626 females, aged from 3 to 30 years. To ensure that our blood collection did not adversely affect the safety of the monkeys, we extracted a 3–5 ml peripheral blood sample from each individual using a conventional intravenous sampling method. One half of each blood sample was used for hematological trait examination while the other half was used for genomic DNA extraction using the QIAGEN® extraction kit. After DNA quality assessment, libraries were prepared following the standard protocol of the DNBseq platform and sequenced to a target depth of ~30× per individual, generating about 90 GB of sequencing data. All samples were collected in accordance with the policy of the Institutional Animal Care and Use Committee (IACUC) of KIZ, CAS (Approval ID: IACUC-PE-2021-12-001 and IACUC-PE-2022-11-003), which conforms to the regulatory standards for the humane care and treatment of animals in research.

Phenotypic data collection

Hematological trait examination was performed using a hematology analyzer (Mindray, BC-5000Vet, China), which recorded 21 standard blood cell traits. We also obtained a number of biochemical and anthropometric body measurements (summarized in Supplementary Tables 5 and 6) during the following year (2022). For biochemical testing, the animals were fasted overnight, or for at least 6 hours, before the peripheral blood sample was drawn, and the blood was centrifuged within 60 min of venipuncture. The serum samples were subsequently used to measure the biochemical traits on an automated analyser (Dimension EXL200). For anthropometric body measurements, each animal received an intramuscular injection of 5 mg/kg ketamine to ensure sedation on the operating table while the measurements were being obtained. We took 11 body measurements as well as the body weight for each animal, following the standardized procedures described in Supplementary Table 5.

Variant calling and filtration

To explore the genetic ancestry of our sequenced individuals, we additionally included 80 wild CRMs¹⁶ in our cohort. We followed the Genome Analysis Toolkit (GATK) best practices pipeline⁷² to call the variants. Briefly, raw sequence reads were mapped to the reference genome of IRM (Mmul_10)⁷³ using BWA-MEM v0.7.17-r11198⁷⁴ with default parameters. Sambamba⁷⁵ was used to remove multiply aligned, duplicated and unaligned reads. We first obtained the GVCF file for each sample using the HaplotypeCaller function in GATK version 4.1⁷⁶. Joint calling was then performed to generate ‘raw’ variant data via the GenotypeGVCFs function. We used the following hard quality filter criteria (QD < 2.0 || QUAL < 50.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0) for SNP filtering, and (QD < 2.0 || QUAL < 50.0 || FS > 200.0 || MQ < 40.0 || ReadPosRankSum < -20.0) for indel filtering, as suggested by the pipeline. After this, the filtered variant call files were merged for subsequent quality control.
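
The hard-filter expressions quoted above can be read as a simple "fail if any criterion is met" rule. The following is a minimal Python sketch of that logic, not the GATK implementation; the function name, the rule tables and the handling of missing annotations are assumptions made for illustration.

```python
from typing import Mapping

# Thresholds copied from the filter expressions quoted in the text.
# Each entry maps an annotation to (comparison, cutoff); a variant FAILS
# when the comparison holds for any annotation.
SNP_RULES = {
    "QD": ("<", 2.0),
    "QUAL": ("<", 50.0),
    "FS": (">", 60.0),
    "MQ": ("<", 40.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}
INDEL_RULES = {
    "QD": ("<", 2.0),
    "QUAL": ("<", 50.0),
    "FS": (">", 200.0),
    "MQ": ("<", 40.0),
    "ReadPosRankSum": ("<", -20.0),
}


def fails_hard_filter(annotations: Mapping[str, float], rules) -> bool:
    """Return True if any criterion is met (the '||' in the expression)."""
    for name, (op, cutoff) in rules.items():
        value = annotations.get(name)
        if value is None:
            # Assumption: a missing annotation does not trigger its criterion.
            continue
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            return True
    return False
```

For example, a site with FS = 100 would be flagged under the SNP rules but retained under the indel rules, because the strand-bias cutoff differs between the two variant classes.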

Variant level quality control. To reduce false positive calls, we removed SNPs occurring in a cluster (more than three SNPs within 10 bp) using the VariantFiltration function in GATK (--cluster-size 3 --cluster-window-size 10) because these tightly spaced SNPs are more likely to result from read mis-alignment. In addition, variants located within 6 bp of predicted indels, variants genotyped in fewer than 80% of individuals, and variants whose approximate read depth fell above the 97.5% quantile or below the 2.5% quantile of the depth distribution were also filtered using BCFtools v1.9⁷⁷. Triallelic sites were further filtered out in the population genetic analyses (e.g., PCA, STRUCTURE).

Sample level quality control. For quality control of samples, we first removed duplicate samples with a kinship coefficient > 0.35 based on the estimates from KING software⁷⁸. Then we removed samples with an excess of heterozygous calls (inbreeding coefficient < −0.1) or an outlying number of SNPs (> 17,000,000). We also examined whether the self-reported sex of each animal could be verified by the "--check-sex" option implemented in PLINK software (v1.90b6.9)⁷⁹. Finally, having removed samples with high missingness (> 0.05), we retained 961 samples in the final cohort.
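
The sample-level thresholds above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the commands actually run: the `SampleStats` record, the function names and the choice of which member of a duplicate pair to drop are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    sample_id: str
    inbreeding_f: float   # inbreeding coefficient; strongly negative = excess heterozygosity
    n_snps: int           # number of SNPs called in the sample
    missing_rate: float   # fraction of missing genotypes


def passes_sample_qc(s, min_f=-0.1, max_snps=17_000_000, max_missing=0.05):
    """Apply the per-sample thresholds quoted in the text."""
    return (s.inbreeding_f >= min_f
            and s.n_snps <= max_snps
            and s.missing_rate <= max_missing)


def duplicates_to_drop(kinship_pairs, cutoff=0.35):
    """kinship_pairs: iterable of (id_a, id_b, kinship_coefficient).

    Assumption: for each duplicate pair the second member is dropped,
    unless one member of the pair has already been dropped.
    """
    dropped = set()
    for id_a, id_b, kinship in kinship_pairs:
        if kinship > cutoff and id_a not in dropped and id_b not in dropped:
            dropped.add(id_b)
    return dropped
```

The sex check is left out of the sketch because it depends on X-chromosome genotype data rather than on a single threshold.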

Variant annotations

Identification of loss-of-function variants. The effects of filtered variants were annotated and classified using SnpEff version 4.3⁸⁰ based on the latest rhesus macaque gene build (Mmul_10). The putative loss-of-function (LoF) annotations, e.g., stop gains, stop losses, start losses, frameshifts and splice-disrupting mutations, were extracted and filtered using the accompanying SnpSift software⁸⁰. We retained those LoF variants that were predicted to affect more than 50% of transcripts and where the nonsense-mediated mRNA decay (NMD tag) occurred within more than half of the transcripts. The LoF variants located within the last 5% of the length of the transcript were filtered out using in-house Perl scripts. Despite these filtering strategies, LoF variants are known to be enriched for annotation artefacts, e.g., exons flanked by non-canonical splice sites or incomplete transcripts^37,38. We utilized LOFTEE³³, a plugin of Ensembl Variant Effect Predictor (VEP)⁵⁸, to filter out such artefactual LoFs. As LOFTEE is currently only available for the human genome, we utilized the LiftOver function in Picard (v2.23.9) (http://broadinstitute.github.io/picard) to transfer the variant positions from the macaque genome (Mmul_10) to the human genome (hg38) based on the over.chain file downloaded from the UCSC database. Only the successfully transferred and high-confidence LoF variants (labeled as HC) were then considered as predicted LoF variants (pLoFs) in the following analyses.

Inferring the pathogenicity of missense variants. We used SIFT4G⁵⁷ to predict the deleteriousness of missense variants. Prior to this step, a custom database was built with the genomic annotation file of Mmul_10. SIFT4G scores range from 0 to 1, and SNPs are predicted to be deleterious if the score is < 0.05 and tolerated if the score is ≥ 0.05. We also utilized human genome annotation to further infer the potential pathogenicity of missense variants detected in the macaque genome. Again, the LiftOver tool in Picard (v2.23.9) (http://broadinstitute.github.io/picard) was used to transfer the variant positions from the macaque genome (Mmul_10) to the corresponding human coordinates (hg38) based on the over.chain file downloaded from the UCSC database. Then, the functional impact of amino acid substitutions was predicted by SIFT and PolyPhen-2 as implemented in VEP⁵⁸.

Function enrichment analyses

The web-server g:Profiler⁸¹ was used to explore whether specific types of biological function were over-represented among the discovered genes. The species Macaca mulatta (Rhesus macaque) was selected as the background organism. P-values were adjusted by means of the Benjamini–Hochberg correction algorithm and the terms with false discovery rate (FDR) q < 0.05 were deemed to be significant.
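
The Benjamini–Hochberg adjustment mentioned above can be illustrated with a short, dependency-free Python sketch. It is a generic implementation of the step-up procedure, not the g:Profiler code; the function name is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


if __name__ == "__main__":
    q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    significant = [value < 0.05 for value in q]
    print(q, significant)
```

Terms would then be called significant when their adjusted value falls below the 0.05 cut-off stated in the text.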

Analyses of genetic ancestries

We performed principal component analysis (PCA) using GCTA (v. 1.94.0)⁸² to infer the genetic ancestries of the sequenced rhesus macaques. Two sample sets were used: one in which the Indian-origin rhesus macaques (mGAP v2.2)¹⁴ were added to our cohort and one without them. The variant data from the mGAP project were filtered in the same manner and then merged with our cohort via BCFtools software⁷⁷. For each sample set, we restricted our analyses to bi-allelic SNPs on autosomes and common variants with MAF above 1%. We further reduced the number of sites by applying a linkage disequilibrium (LD) pruning filter using PLINK v1.90b6.9 (--indep-pairwise 50 5 0.1)⁷⁹. We also used Frappe 1.1 (EM algorithm)⁸³ to infer the individual ancestries. The postulated number of ancestral clusters (K) was set to range from 2 to 6, and the maximum number of EM iterations was set to 10,000.

Analyses of genetic diversity and genetic load

The level of nucleotide diversity (π) was estimated in non-overlapping 50-kb windows using VCFtools (v0.1.17)⁸⁴. Estimating genetic load, however, is challenging without information on the fitness effects of deleterious mutations. An alternative approach is to estimate changes in mutational load (i.e., the number of deleterious mutations)⁸⁵. For the CRM cohort in this study and the mGAP cohort, we calculated, for each individual, the ratio of the number of homozygous derived LoF variants to the number of homozygous derived synonymous variants, as well as the ratio of the number of homozygous derived missense variants to the number of homozygous derived synonymous variants, based on the SnpEff annotations (version 4.3)⁸⁰. Since no ancestral allele information is available for macaques, we followed the example of a previous study in adopting the minor allele as the derived allele⁸⁶.
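
The per-individual load ratio described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the input layout, the function names and the treatment of an allele frequency of exactly 0.5 (here the ALT allele is then not treated as derived) are hypothetical choices, not details from the study.

```python
def alt_is_derived(alt_allele_count: int, total_alleles: int) -> bool:
    """Treat the minor allele as derived: ALT is derived when its frequency is < 0.5."""
    return alt_allele_count / total_alleles < 0.5


def is_homozygous_derived(alt_copies: int, alt_derived: bool) -> bool:
    """alt_copies is the number of ALT alleles carried by the individual (0, 1 or 2)."""
    return alt_copies == (2 if alt_derived else 0)


def load_ratio(variants, category):
    """Ratio of homozygous derived `category` sites to homozygous derived synonymous sites.

    variants: iterable of (annotation, alt_copies, alt_derived) for ONE individual,
    where annotation is e.g. "LoF", "missense" or "synonymous".
    """
    counts = {category: 0, "synonymous": 0}
    for annotation, alt_copies, alt_derived in variants:
        if annotation in counts and is_homozygous_derived(alt_copies, alt_derived):
            counts[annotation] += 1
    if counts["synonymous"] == 0:
        raise ValueError("no homozygous derived synonymous sites to normalize by")
    return counts[category] / counts["synonymous"]
```

Normalizing by synonymous sites is what makes the ratio comparable between cohorts that differ in overall diversity or sequencing depth.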

Association analyses with rare pLoFs

From the list of high-confidence rare LoF mutations identified above, we sought to determine whether any of the pLoF variants was associated with phenotypic trait variation. We employed a mixed linear model-based association analysis (GCTA-MLMA)^87,88 for each pLoF–trait pairing. Quantitative traits were inverse normalized, and age, sex and the first four ancestral clusters of the FRAPPE results were used as covariates. To reduce the likelihood of false positives, we only considered the pLoF–trait pairs in which there were at least three LoF alleles genotyped, yielding 1,767 (2373) pLoF–trait pairs for analysis. After Bonferroni correction, we considered 2.83 × 10^−5 (0.05/1,767) as the threshold of significance.
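
As a quick check of the arithmetic behind the Bonferroni threshold, a minimal Python sketch follows; the function name is an assumption and the numbers are those quoted in the text.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # 0.05 divided by the 1,767 pLoF-trait pairs tested.
    print(f"{bonferroni_threshold(0.05, 1767):.2e}")
```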

Phenotype data processing and GWAS analyses

In order to focus on determinants of variation in the general population rather than on specific diseases, values lying more than seven standard deviations from the mean were removed from each quantitative trait prior to GWAS analysis. Subsequently, the filtered trait data were standardized by rank-based inverse normal transformation (INT) using in-house R scripts. Genotype data were further filtered to exclude variants with a missing genotype rate greater than 0.02, a minor allele frequency (MAF) less than 0.01, or a Hardy-Weinberg equilibrium test P-value smaller than 1 × 10^−6, leaving 32,588,339 autosomal alleles for downstream analysis. After that, GWAS analyses were performed for each quantitative trait using the mixed linear model with the leave-one-chromosome-out option (--mlma-loco) implemented in GCTA software⁸². This GCTA-LOCO approach⁸⁷ provides a more robust association estimate by employing a genetic relatedness matrix (GRM) to account for genomic relationships, and the Leave One Chromosome Out (LOCO) method to control for proximal contamination⁸⁹. The data were adjusted for covariates including age, sex and the first four ancestral clusters from the FRAPPE results. We further employed the deep neural network DeepNull⁹⁰ to model and account for potential non-linear or interactive effects among phenotypic data and their covariates. This method allows one to control for type I errors while enhancing phenotypic prediction⁹⁰. The genome-wide significance threshold (5.13 × 10^−8) was determined using a uniform threshold of 1/n, where n is the effective number of independent variants calculated using the Genetic type 1 Error Calculator (v.0.2)⁹¹. The proportion of variance in the phenotype explained by a given SNP (PVE) was estimated using the formula from a previous study⁹².
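
The trait pre-processing described above (outlier removal followed by rank-based INT) can be sketched in Python. This is an illustration only: the study used in-house R scripts, and the Blom offset of 3/8, the averaging of tied ranks and the use of the sample standard deviation are assumptions that the text does not specify.

```python
from statistics import NormalDist, mean, stdev


def drop_outliers(values, n_sd=7.0):
    """Keep values within n_sd sample standard deviations of the mean."""
    mu, sd = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) <= n_sd * sd]


def rank_inverse_normal(values, offset=0.375):
    """Rank-based inverse normal transformation (assumed Blom offset of 3/8)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]
```

The transformed values depend only on the ranks, so the result is insensitive to the scale and skew of the raw measurements.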

Behavioral and brain imaging experimentation on CRMs with DISC1 mutation (p.Arg517Trp)

Animals. We identified three adult animals (ages 5 to 7 years, two males and one female) and five elderly animals (ages > 19 years, all female) in the cohort that were homozygous for the missense mutation (p.Arg517Trp). Considering the old age of some of the monkeys, and our inability to eliminate the potential influence of aging on the results obtained, we performed the behavioral and brain imaging assessments specifically on the three younger adults. All animal experimental procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of KIZ, CAS (IACUC-PE-2022-07-001).

Behavioral experiments. We first estimated the neurological function of 3 homozygous carriers vs. 19 non-carriers using a neurological deficit score developed in our previous study (Supplementary Table 10)⁹³. This scoring system assigned points to three aspects of neurological function: the motor system (16 points), skeletal muscle coordination (9 points) and the sensory system (25 points), totaling a maximum of 50 points. A score of 0 indicated normal behaviors whereas higher scores reflected neurological deficits. Next, we performed a spatial working memory test using the WGTA (Wisconsin General Test Apparatus), which was modified from our previous studies^94,95. Considering the significant amount of time required for the training and experimental stages, we selected three non-carriers of similar age and sex as the controls. Briefly, the macaque was allowed to choose food (e.g., peanut) from one of the two covered wells with six time delays (0s, 6s, 12s, 18s, 24s, 30s; Fig. 5a). The delays were semi-randomly distributed over the trials, with a total of 36 trials conducted in one session. We performed one session per day for each macaque, and 10 sessions were performed in total. To investigate spatial working memory under stress, restraint stress was applied by fixing the macaque in a narrow space in its home cage for 30 min, and working memory was tested immediately after the stress. The next session was conducted after a recovery interval of at least three days, when the macaque attained the average performance level without stress (6s). Three trials were performed for each macaque under stress. The inhibition of working memory was calculated as ((Pre - Post stress)/(Pre + Post stress)) × 100. Differences in behavioral performance were assessed by unpaired t-test.
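
The inhibition formula given above can be written as a small Python function. This is a minimal sketch; the function and argument names are assumptions, and the inputs are taken to be performance scores (for example, percent correct) measured before and after restraint stress.

```python
def inhibition_index(pre: float, post: float) -> float:
    """((Pre - Post stress) / (Pre + Post stress)) * 100, as given in the text."""
    total = pre + post
    if total == 0:
        raise ValueError("pre + post must be non-zero")
    return (pre - post) / total * 100.0


if __name__ == "__main__":
    # Hypothetical scores: 90% correct before stress, 60% correct after.
    print(inhibition_index(90.0, 60.0))
```

Positive values indicate poorer performance after stress, zero indicates no change, and the index is bounded between -100 and 100 for non-negative scores.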

Brain imaging. Magnetic resonance imaging (MRI) and resting state functional MRI (rs-fMRI) data were acquired with a 3.0 T UMR790 MRI scanner (United Imaging, Shanghai, China) at KIZ. T1-weighted images were acquired using a 3D T1-weighted fast spoiled gradient echo (gre_fsp) sequence (voxel size = 0.5 mm isotropic, TE = 5.6 ms, TR = 13.01 ms, flip angle = 8°) and T2-weighted images were acquired using a fse_mx sequence (voxel size = 0.5 mm isotropic, TE = 396.48 ms, TR = 3400 ms, flip angle = 59°) with a 12-channel head coil. The structural data were processed using Analysis of Functional NeuroImages software (AFNI)⁹⁶, FMRIB Software Library (FSL)⁹⁷, Advanced Normalization Tools (ANTs)⁹⁸ and FreeSurfer⁹⁹ (see details in Supplementary materials). Rs-fMRI images were collected using an echo planar imaging (EPI) sequence (voxel size = 1.5 mm isotropic, TE = 29 ms, TR = 1700 ms, flip angle = 80°). During rs-fMRI scanning, macaques were placed under general anesthesia, as for structural imaging, to alleviate stress and minimize motion artifacts. Note that resting-state functional activity is an inherent characteristic of the brain, observed in both humans and macaques, even under anesthesia^100,101. The rs-fMRI data preprocessing was performed using the workflow outlined in a previous study¹⁰² (see details in Supplementary materials).

Quantification and statistical analysis

The Mann-Whitney U test was used to compare phenotypic differences between pLoF allele carriers and non-carriers. A two-tailed Student’s t test was used to determine the significance of behavioral differences between DISC1 (p.Arg517Trp) carriers and controls. Structural differences at the global, lobe, and region levels were assessed using Generalized Linear Mixed Models (GLMMs), with Hemisphere as the random factor, and all structural data were corrected for the intracranial volume of the corresponding hemisphere. Other statistical analyses are described in the relevant sections of the Methods, as well as in the figure legends and supplementary tables.

Data availability

All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. The raw whole-genome sequencing data generated in this study have been deposited in the Genome Warehouse (GWH) database under accession number CRA014717.

Code availability

No specific custom code was developed in this study. All commands and pipelines used for data analyses followed the manuals or protocols provided by the corresponding software development teams, as described in detail in the Methods section. Default parameters were employed if no detailed parameters were mentioned for the software used in this study.

Acknowledgments

We thank Amy F. T. Arnsten for invaluable input and discussions. We also thank the Core Technology Facility, KIZ, CAS, for providing us with MRI services. This work was funded by the National Key Research and Development Program of China (2022YFF0710900), the STI2030-Major Projects (2021ZD0200901, 2022ZD0205100, 2021ZD0203900, 2021ZD0200200, and 2021ZD0204200), Yunnan Province (202305AH340006), and the CAS Light of West China Program (xbzg-zdsys-202213).

Author contributions

D.D.W., J.W., N.L. and Y.G.Y. conceived and supervised the project; X. You, Yijiang Li, Y. Wang, X. Yu, M.M.Y., and L.L. collected the blood samples; B.L.Z., Y.C., Yi Zhang, Y. Lu, Yijiang Li, W.X., and H.D. performed anthropometric body measurements; Q.W. and Y. Wang performed hematological and biochemical measurements; Yali Zhang, Yanling Li, H.D.H., and J.W. performed behavioral experiments and statistical analyses; Yali Zhang, Y.Q., M.H.Q., N.H.C., and N.L. performed brain imaging and statistical analyses; P.Z. provided the macaque monkeys for behavioral and brain imaging tests; B.L.Z., Y.C., and Y. Wu performed genetic association analyses; B.L.Z. and Y.C. performed the overall analysis; B.L.Z. and D.D.W. wrote the original draft; Y. Wu, D.N.C., and Y.G.Y. reviewed and edited the paper; all authors discussed the results and commented on the manuscript.

Declaration of interests

The authors declare no competing interests.

Chiou KL et al (2020) Rhesus macaques as a tractable physiological model of human ageing. Phil Trans R Soc B 375:20190612
Gardner MB, Luciw PA (2008) Macaque models of human infectious disease. ILAR J 49:220–255
Pan MT, Zhang H, Li XJ, Guo XY (2024) Genetically modified non-human primate models for research on neurodegenerative diseases. Zool Res 45:263–274
Yuan LZ et al (2021) SARS-CoV-2 infection and disease outcomes in non-human primate models: advances and implications. Emerg Microbes Infect 10:1881–1889
Tian CY (2021) China is facing serious experimental monkey shortage during the COVID-19 lockdown. J Med Primatol 50:225–227
Rogers J (2022) Genomic resources for rhesus macaques (Macaca mulatta). Mamm Genome 33:91–99
Sanchez-Roige S, Palmer AA (2020) Emerging phenotyping strategies will advance our understanding of psychiatric genetics. Nat Neurosci 23:475–480
Timpson NJ, Greenwood CMT, Soranzo N, Lawson DJ, Richards JB (2018) Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat Rev Genet 19:110–124
Brekke TD, Steele KA, Mulley JF (2018) Inbred or outbred? Genetic diversity in laboratory rodent colonies. G3-Genes Genomes Genet 8:679–686
Tosi AJ, Morales JC, Melnick DJ (2003) Paternal, maternal, and biparental molecular markers provide unique windows onto the evolutionary history of macaque monkeys. Evolution 57:1419–1435
Roos C, Zinner D (2015) Diversity and evolutionary history of macaques with special focus on Macaca mulatta and Macaca fascicularis. In: The nonhuman primate in nonclinical drug development and safety assessment (eds. Joerg, B., Sven, K., Emanuel, S. & Gerhard, F.W.) 3–16 (Elsevier)
Morales JC, Melnick DJ (1998) Phylogenetic relationships of the macaques (Cercopithecidae: Macaca), as revealed by high resolution restriction site mapping of mitochondrial ribosomal genes. J Hum Evol 34:1–23
Srikulnath K, Ahmad SF, Panthum T, Malaivijitnond S (2022) Importance of Thai macaque bioresources for biological research and human health. J Med Primatol 51:62–72
Bimber BN, Yan MY, Peterson SM, Ferguson B (2019) mGAP: the macaque genotype and phenotype resource, a framework for accessing and interpreting macaque variant data, and identifying new models of human disease. BMC Genomics 20:176
Xue C et al (2016) The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences. Genome Res 26:1651–1662
Liu ZJ et al (2018) Population genomics of wild Chinese rhesus macaques reveals a dynamic demographic history and local adaptation, with implications for biomedical research. Gigascience 7:giy106
Wu RF et al (2023) Landscape genomics analysis provides insights into future climate change-driven risk in rhesus macaque. Sci Total Environ 899:165746
Yao YG, Facility KP (2022) Towards the peak: The 10-year journey of the National Research Facility for Phenotypic and Genetic Analysis of Model Animals (Primate Facility) and a call for international collaboration in non-human primate research. Zool Res 43:237–240
Tarantino LM, Eisener-Dorman AF (2012) Forward genetic approaches to understanding complex behaviors. Curr Top Behav Neurosci 12:25–58
Argmann CA, Dierich A, Auwerx J (2006) Uses of forward and reverse genetics in mice to study gene function. Curr Protoc Mol Biol Chapter 29, Unit 29A.1
Alonso JM, Ecker JR (2006) Moving forward in reverse: genetic technologies to enable genome-wide phenomic screens in Arabidopsis. Nat Rev Genet 7:524–536
Lehner B (2013) Genotype to phenotype: lessons from model organisms for human genetics. Nat Rev Genet 14:168–178
Takahashi JS, Pinto LH, Vitaterna MH (1994) Forward and reverse genetic approaches to behavior in the mouse. Science 264:1724–1733
Adams DJ, van der Weyden L (2008) Contemporary approaches for modifying the mouse genome. Physiol Genomics 34:225–238
Lawson ND, Wolfe SA (2011) Forward and reverse genetic approaches for the analysis of vertebrate development in the zebrafish. Dev Cell 21:48–64
Adams MD, Sekelsky JJ (2002) From sequence to phenotype: reverse genetics in Drosophila melanogaster. Nat Rev Genet 3:189–198
He Y et al (2019) Long-read assembly of the Chinese rhesus macaque genome and identification of ape-specific structural variants. Nat Commun 10:4233
Yan G et al (2011) Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nat Biotechnol 29:1019–1023
Kardos M et al (2021) The crucial role of genome-wide genetic variation in conservation. Proc Natl Acad Sci USA 118
Bertorelle G et al (2022) Genetic load: genomic estimates and applications in non-model animals. Nat Rev Genet 23:492–503
Halldorsson BV et al (2022) The sequences of 150,119 genomes in the UK Biobank. Nature 607:732–740
Cong PK et al (2022) Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project. Nat Commun 13:2939
Karczewski KJ et al (2020) The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581:434–443
Yang Z, Nielsen R (1998) Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J Mol Evol 46:409–418
Lek M et al (2016) Analysis of protein-coding genetic variation in 60,706 humans. Nature 536:285–291
MacArthur DG, Tyler-Smith C (2010) Loss-of-function variants in the genomes of healthy humans. Hum Mol Genet 19:R125–R130
MacArthur DG et al (2012) A systematic survey of loss-of-function variants in human protein-coding genes. Science 335:823–828
Saleheen D et al (2017) Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature 544:235–242
Steux C, Szpiech ZA (2024) The maintenance of deleterious variation in wild Chinese rhesus macaques. Genome Biol Evol 16:evae115
Blake JA et al (2021) Mouse Genome Database (MGD): Knowledgebase for mouse-human comparative biology. Nucleic Acids Res 49:D981–D987
Kleinridders A et al (2009) PLRG1 Is an Essential Regulator of Cell Proliferation and Apoptosis during Vertebrate Development and Tissue Homeostasis. Mol Cell Biol 29:3173–3185
Liao BY, Zhang J (2008) Null mutations in human and mouse orthologs frequently result in different phenotypes. Proc Natl Acad Sci USA 105:6987–6992
Xu L et al (2016) Loss of RIG-I leads to a functional replacement with MDA5 in the Chinese tree shrew. Proc Natl Acad Sci USA 113:10950–10955
Duggal P, Gillanders EM, Holmes TN, Bailey-Wilson JE (2008) Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies. BMC Genomics 9:516
Vermeer S et al (2010) Targeted next-generation sequencing of a 12.5 Mb homozygous region reveals ANO10 mutations in patients with autosomal-recessive cerebellar ataxia. Am J Hum Genet 87:813–819
Nanetti L et al (2019) ANO10 mutational screening in recessive ataxia: genetic findings and refinement of the clinical phenotype. J Neurol 266:378–385
Jacobo-Baca G et al (2022) Proteomic profile of preeclampsia in the first trimester of pregnancy. J Matern-Fetal Neo M 35:3446–3452
Alderton GK et al (2004) Seckel syndrome exhibits cellular features demonstrating defects in the ATR-signalling pathway. Hum Mol Genet 13:3127–3138
Benatzy Y, Palmer MA, Brune B (2022) Arachidonate 15-lipoxygenase type B: Regulation, function, and its role in pathophysiology. Front Pharmacol 13:1042420
Huang J et al (2022) Genomics and phenomics of body mass index reveals a complex disease network. Nat Commun 13:7973
Tachmazidou I et al (2017) Whole-genome sequencing coupled to imputation discovers genetic signals for anthropometric traits. Am J Hum Genet 100:865–884
Li L et al (2023) Interactions between genetic variants and environmental risk factors are associated with the severity of pelvic organ prolapse. Menopause 30:621–628
Sollis E et al (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 51:D977–D985
Thompson EC et al (2007) Ikaros DNA-binding proteins as integral components of B cell developmental-stage-specific regulatory circuits. Immunity 26:335–344
Consortium EP (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74
Whitfield JB et al (2019) Biomarker and genomic risk factors for liver function test abnormality in hazardous drinkers. Alcohol Clin Exp Res 43:473–482
Vaser R, Adusumalli S, Leng SN, Sikic M, Ng PC (2016) SIFT missense predictions for genomes. Nat Protoc 11:1–9
McLaren W et al (2016) The ensembl variant effect predictor. Genome Biol 17:122
Park JE, Silva AC (2019) Generation of genetically engineered non-human primate models of brain function and neurological disorders. Am J Primatol 81
Capitanio JP, Emborg ME (2008) Contributions of non-human primates to neuroscience research. Lancet 371:1126–1135
Duff BJ, Macritchie KAN, Moorhead TWJ, Lawrie SM, Blackwood DHR (2013) Human brain imaging studies of DISC1 in schizophrenia, bipolar disorder and depression: a systematic review. Schizophr Res 147:1–13
Hodgkinson CA et al (2004) Disrupted in schizophrenia 1 (DISC1): association with schizophrenia, schizoaffective disorder, and bipolar disorder. Am J Hum Genet 75:862–872
Cannon TD et al (2005) Association of DISC1/TRAX haplotypes with schizophrenia, reduced prefrontal gray matter, and impaired short- and long-term memory. Arch Gen Psychiat 62:1205–1213
Perlstein WM, Carter CS, Noll DC, Cohen JD (2001) Relation of prefrontal cortex dysfunction to working memory and symptoms in schizophrenia. Am J Psychiat 158:1105–1113
Gamo NJ et al (2013) Role of disrupted in schizophrenia 1 (DISC1) in stress-induced prefrontal cognitive dysfunction. Transl Psychiatry 3:e328
Arnsten AF (2009) Stress signalling pathways that impair prefrontal cortex structure and function. Nat Rev Neurosci 10:410–422
Fiziev PP et al (2023) Rare penetrant mutations confer severe risk of common diseases. Science 380:eabo1131
Gao H et al (2023) The landscape of tolerated genetic variation in humans and primates. Science 380:eabn8153
Passingham R (2009) How good is the macaque monkey model of the human brain? Curr Opin Neurobiol 19:6–11
Li J et al (2018) Comparative genome-wide survey of single nucleotide variation uncovers the genetic diversity and potential biomedical applications among six Macaca species. Int J Mol Sci 19
Pritchard JK (2001) Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 69:124–137
Van der Auwera GA et al (2013) From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinf 43:11.10.1–11.10.33
Warren WC et al (2020) Sequence diversity analyses of an improved rhesus macaque genome enhance its biomedical utility. Science 370
Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 1303:3997
Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P (2015) Sambamba: fast processing of NGS alignment formats. Bioinformatics 31:2032–2034
McKenna A et al (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303
Li H et al (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079
Manichaikul A et al (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26:2867–2873
Purcell S et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575
Cingolani P et al (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6:80–92
Reimand J, Kull M, Peterson H, Hansen J, Vilo J (2007) g:Profiler–a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res 35:W193–200
Yang J, Lee SH, Goddard ME, Visscher PM (2013) Genome-wide complex trait analysis (GCTA): methods, data analyses, and interpretations. Methods Mol Biol 1019:215–236
Tang H, Peng J, Wang P, Risch NJ (2005) Estimation of individual admixture: analytical and study design considerations. Genet Epidemiol 28:289–301
Danecek P et al (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158
von Seth J et al (2021) Genomic insights into the conservation status of the world's last remaining Sumatran rhinoceros populations. Nat Commun 12
Zhu Q et al (2011) A genome-wide comparison of the functional properties of rare and common genetic variants in humans. Am J Hum Genet 88:458–468
Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL (2014) Advantages and pitfalls in the application of mixed-model association methods. Nat Genet 46:100–106
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88:76–82
Cheng R, Parker CC, Abney M, Palmer AA (2013) Practical considerations regarding the use of genotype and pedigree data to model relatedness in the context of genome-wide association studies. G3 (Bethesda) 3:1861–1867
McCaw ZR et al (2022) DeepNull models non-linear covariate effects to improve phenotypic prediction and association power. Nat Commun 13
Li MX, Yeung JM, Cherny SS, Sham PC (2012) Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum Genet 131:747–756
Shim H et al (2015) A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS ONE 10:e0120758
Yang L et al (2020) Extracellular vesicle-mediated delivery of circular RNA SCMH1 promotes functional recovery in rodent and nonhuman primate ischemic stroke models. Circulation 142:556–574
Wang JH et al (2013) Interactive effects of morphine and dopaminergic compounds on spatial working memory in rhesus monkeys. Neurosci Bull 29:37–46
Zhang B et al (2019) Chronic phencyclidine treatment impairs spatial working memory in rhesus monkeys. Psychopharmacology 236:2223–2232
Cox RW (1996) AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res 29:162–173
Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM (2012) FSL. Neuroimage 62:782–790
Avants BB, Tustison N, Song G (2009) Advanced normalization tools (ANTS). Insight J 2:1–35
Fischl B (2012) FreeSurfer. Neuroimage 62:774–781
Vincent JL et al (2007) Intrinsic functional architecture in the anaesthetized monkey brain. Nature 447:83–86
Larson-Prior LJ et al (2009) Cortical network functional connectivity in the descent to sleep. Proc Natl Acad Sci USA 106:4489–4494
Jo HJ et al (2013) Effective preprocessing procedures virtually eliminate distance-dependent motion artifacts in resting state FMRI. J Appl Math 2013

The authors declare no competing interests.

Supplementarytables.xlsx
Supplementary Tables 1–11
Supplementalinformation.docx
