Whole-genome Sequencing Reveals De-novo Mutations Associated with Nonsyndromic Cleft Lip/Palate

doi:10.21203/rs.3.rs-1064924/v1

Research Article

This work is licensed under a CC BY 4.0 License

The majority (85%) of nonsyndromic cleft lip with or without cleft palate (nsCL/P) cases occur sporadically, suggesting a role for de novo mutations (DNMs) in the etiology of nsCL/P. To identify high impact DNMs that contribute to the risk of nsCL/P, we conducted whole genome sequencing (WGS) analyses in 130 African case-parent trios (affected probands and unaffected parents). We identified 162 high confidence protein-altering DNMs that contribute to the risk of nsCL/P. These include novel loss-of-function DNMs in the ACTL6A, ARHGAP10, MINK1, TMEM5 and TTN genes; as well as missense variants in ACAN, DHRS3, DLX6, EPHB2, FKBP10, KMT2D, RECQL4, SEMA3C, SEMA4D, SHH, TP63, and TULP4. Experimental evidence showed that ACAN, DHRS3, DLX6, EPHB2, FKBP10, KMT2D, MINK1, RECQL4, SEMA3C, SEMA4D, SHH, TP63, and TTN genes contribute to facial development and mutations in these genes could contribute to CL/P. Association studies have identified TULP4 as a potential cleft candidate gene, while ARHGAP10 interacts with CTNNB1 to control WNT signaling. DLX6, EPHB2, SEMA3C and SEMA4D harbor novel damaging DNMs that may affect their role in neural crest migration and palatal development. This discovery of pathogenic DNMs also confirms the power of WGS analysis of trios in the discovery of potential pathogenic variants.

Nonsyndromic clefts of the lip with or without cleft palate (nsCL/P) represent one of the most common types of birth defect in humans and the most common of the craniofacial region¹. These developmental malformations result from failure of the well-coordinated, complete fusion of the facial prominences during embryogenesis². Because no other anatomical structures are affected, these birth defects are classified as nonsyndromic cleft lip only (nsCLO), nonsyndromic cleft lip and palate (nsCLP) and nonsyndromic cleft palate only (nsCPO). The combined global prevalence of orofacial clefts (OFC) is reported to be 1 in 700 live births³.

Associated impairments due to these malformations include feeding problems, speech defects, malocclusion, and esthetic problems. Studies have reported a significant increase in overall mortality in people with OFC⁴. Families of affected individuals are often stigmatized and have reported huge burdens on their financial, psychological and social well-being⁵. Effective management involves a team of specialists who conduct surgical repair of the defect and manage dental, speech and psychological challenges⁶,⁷. Because of the need for multidisciplinary expertise over the life course from birth to adulthood, the cost of management, and the negative impact on oral health-related quality of life, OFC poses a huge public health burden.

Genetic and environmental factors have been shown to contribute to the risk of nsCL/P at the population level. Despite extensive genome-wide association studies (GWAS), which have identified about 60 common risk loci, ~75% of the estimated heritability of liability to nsCL/P remains unexplained⁸⁻¹³. The contribution of rare coding variants has also been investigated to account for this missing heritability¹⁴⁻¹⁶. However, a significant knowledge gap remains in our understanding of the genetics of nsCL/P. OFCs can be either sporadic or familial, and the sporadic cases suggest a role for de novo mutations (DNMs)¹⁷. A few studies have examined the role of these DNMs in the etiology of nsCL/P through targeted sequencing analysis of candidate genes in affected families¹⁸,¹⁹. With the advent of next-generation sequencing, discovery of DNMs that may contribute to the risk of nsCL/P has yielded more positive results. The first large-scale whole-genome sequencing (WGS) study reported an enrichment of DNMs in multi-ethnic nsCL/P case-parent trios of European, Colombian, or Taiwanese ancestry²⁰. However, the role of these DNMs is yet to be studied on the African continent, which has the most genetically diverse populations and provides an opportunity for novel findings²¹.

Using WGS data from African nsCL/P case-parent trios generated as part of the Gabriella Miller Kids First (GMKF) Pediatric Research Consortium, we investigated the role of high-impact DNMs and identified some that could increase the risk of nsCL/P.

Samples and Variants Filtration

Following deep phenotyping and sample recruitment through the African Craniofacial Anomalies Network (AfriCRAN)¹⁰, 150 case-parent trios (i.e., each trio consists of an affected child and unaffected parents, as depicted in Figure 1A) were selected for whole-genome sequencing at the Broad Institute. The quality control (QC) process checked for completeness of the sequenced genomes, Mendelian errors and relatedness; this resulted in dropping 20 trios. The variants from the remaining 130 case-parent trios were filtered for high quality by ensuring that each had a genotype quality of at least 20 (GQ ≥ 20) and a read depth of at least 10 (DP ≥ 10). These high-quality variants were further filtered to identify those predicted to have a high impact on the gene product and present in the case but not the parents, i.e., a de novo mutation (DNM) (MAF < 1%; loss-of-function (LOF) and missense consequences) (Figure 1C). The number of variants remaining after each data filtering step is shown in Figure 2A. This resulted in the identification of 162 DNMs (10 LOF; 152 missense) (Figure 2B).

Novel DNMs contribute to the risk of nsCL/P

We found 162 DNMs (Supplementary Table 1), 17 of them in genes recognized to play roles in craniofacial development and/or contribute to the risk of nsCL/P (Table 2). Copy number variations (CNVs) involving these genes indicate that they could well be involved in craniofacial morphogenesis (Table 2). Mouse knockouts of Acan, Dhrs3, Kmt2d, Recql4, Shh and Tp63 showed orofacial cleft phenotypes. The remaining genes (ACTL6A, ARHGAP10, FKBP10, MINK1, TMEM5, TTN and TULP4) lack mouse knockout models to support their involvement in OFCs, but other craniofacial dysmorphologies have been reported. Knockout of Actl6a in mice was reported to be embryonically lethal, as the mice did not survive beyond developmental stage E6.5²².

The DNM in TTN was found in exon 49 and changed the codon for the amino acid arginine at position 4738 to a stop codon. The consequence of this premature stop codon is a truncation of the polypeptide chain and hence a loss of function of the gene product. Similar mutation consequences were found in the ARHGAP10 and MINK1 genes. However, the LOF mutations in ACTL6A and TMEM5 are due to an initiator codon variant and a splice donor variant, respectively. The missense mutation in DHRS3 changes a serine to a leucine at position 37. This position falls in the DHRS3 catalytic domain, which is critical for its enzymatic function in vitamin A metabolism. The mutation in TP63 lies in the highly conserved sterile alpha motif (SAM) domain, which is critical for protein-protein interactions of the molecule. Interestingly, the DNM in SHH, p.Ser362Leu, has been reported to cause holoprosencephaly type 3 (OMIM #142945), whose clinical presentations can include midline facial defects with cleft as a feature.

Bioinformatics analysis showed pathogenicity of identified DNMs

The pathogenicity of these DNMs was predicted using in-silico tools to investigate the effect of the DNMs on gene expression, protein structure and function. We used the Combined Annotation Dependent Depletion (CADD) tool to ascertain the deleteriousness of the nucleotide changes that caused the DNMs. Additionally, we used the Sorting Intolerant From Tolerant (SIFT) and Polymorphism Phenotyping (PolyPhen2) tools to predict how damaging the amino acid changes (resulting from the DNMs) are to the protein, and finally used HOPE to identify the effect of the amino acid changes on protein structure and function. DNMs in ACTL6A, ARHGAP10, MINK1, TMEM5 and TTN were predicted to be among the top 0.1% most deleterious mutations in the human genome (see Table 2 for CADD scores). The DNMs in DHRS3, SHH, TP63 and TULP4 were predicted to be among the top 1% most deleterious mutations in the human genome (see Table 2 for CADD scores), while the DNMs in ACAN, FKBP10, KMT2D and RECQL4 are among the top 10% most deleterious mutations (Table 2). Among the missense DNMs, only the variants in ACAN and FKBP10 were predicted by both SIFT and PolyPhen2 to be tolerated and benign. The other missense DNMs were predicted by at least one of the two in-silico tools to be deleterious or damaging (Table 2).
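The percentile tiers quoted above follow from the way CADD scales its scores: a PHRED-scaled score of 10 marks the top 10% of all possible substitutions, 20 the top 1%, and 30 the top 0.1%. The sketch below illustrates that mapping only; the function names are hypothetical and the example scores are made up, not values from Table 2.

```python
def cadd_top_fraction(phred: float) -> float:
    """Fraction of all possible substitutions ranked at or above a
    PHRED-scaled CADD score (score = -10 * log10(fraction))."""
    return 10 ** (-phred / 10)


def cadd_tier(phred: float) -> str:
    """Bin a PHRED-scaled CADD score into the tiers used in the text."""
    if phred >= 30:
        return "top 0.1%"
    if phred >= 20:
        return "top 1%"
    if phred >= 10:
        return "top 10%"
    return "below top 10%"


if __name__ == "__main__":
    # Illustrative scores only (assumption, not taken from the study).
    for score in (35.0, 24.5, 12.1, 4.0):
        print(score, cadd_tier(score), cadd_top_fraction(score))
```

The thresholds are inclusive, so a score of exactly 20 is reported as "top 1%".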

Our analysis of the amino acid changes caused by these missense DNMs showed that the resulting alterations in protein structure could impact the functions of the molecules encoded by the mutated genes. Notably, the secondary structures and the protein interactions are affected (Supplementary Figure 1). The amino acid changes identified in DHRS3, DLX6, SEMA3C, SEMA4D, SHH and TP63 occur in highly conserved regions and/or critical domains (Supplementary Figure 1).

The SHH mutation p.Ser362Leu we discovered lies in a highly conserved region of the protein; serine is the only amino acid found at this position in available vertebrate orthologs going back to zebrafish. This DNM occurs within the Hedgehog (Hh) domain located at the carboxyl terminal of the polypeptide. This region is important for protein autoprocessing, which modifies the N-terminal portion of the protein that is critical for the protein interactions that mediate Hh signaling²³,²⁴. Thermodynamic analysis showed the folding free energy change (∆∆G) associated with this p.Ser362Leu variant to be 4.984 kcal/mol, with a standard deviation of 0.221. The folding free energy change measures the change in protein stability resulting from the amino acid substitution. The mutation causes a positive folding free energy change (∆∆G), which indicates that it destabilizes the protein structure²⁵ (Figure 3).

Palate development and neural crest migration biological processes are significantly disrupted by DNMs

Our gene set enrichment analysis (GSEA) identified significantly enriched processes (p < 0.05) involved in normal development of the lip and palate (Figure 4). Among the biological processes showing at least nominally significant enrichment, we identified palate development and neural crest migration (p = 0.02 and 0.04, respectively). These biological processes have been causally linked to the etiopathogenesis of OFCs, as their disruption could manifest as craniofacial dysmorphology such as OFC.

Among our list of prioritized genes, those known to play significant roles in the development of the palate are DHRS3, DLX6, EPHB2 and SHH, while those contributing to neural crest migration include SEMA3C, SEMA4D and SHH. This GSEA identified DLX6, EPHB2, SEMA3C and SEMA4D as potential cleft candidate genes.

SysFACE analysis informs on gene expression in facial tissue development

Using the Systems tool for craniofacial expression-based gene discovery (SysFACE), we found that several genes with DNMs are expressed during mouse facial tissue development and are likely to contribute to the development of the lip and palate (Figure 5 and Supplementary Figure 2). The expression profiles showed that, except for TULP4 (whose ortholog was not detected in mouse), all the genes in Table 2 were significantly expressed in several facial tissues (Figure 5A). Furthermore, the majority of these candidates exhibit elevated expression in the E10.5 maxillary columnar epithelium compared to the E10.5 mandibular columnar epithelium (Figure 5B). Similarly, the majority of these candidates showed elevated expression in the E10.5 maxillary arch compared to the E10.5 mandibular arch (Figure 5C). Several genes showed their highest expression in the palate tissue (Figure 5A). Moreover, SysFACE identified several other candidate genes that were expressed during mouse facial development (Supplementary Figure 2).

To our knowledge, this whole-genome analysis of case-parent trios to identify high-impact DNMs that could contribute to the risk of nsCL/P is the first such analysis in an African population. Our approach utilized a trio-based study design and analysis to identify potentially pathogenic variants that can explain sporadic cases of nsCL/P. Our analysis also incorporated data from publicly available databases to identify those genes that could play a role in craniofacial development in animal models. We identified several high-impact LOF and missense DNMs that appear to contribute to the risk of nsCL/P. Additionally, we used thermodynamic analysis to investigate the effect of an amino acid change on protein stability.

The LOF DNMs were found in ACTL6A, ARHGAP10, MINK1, TMEM5 and TTN, all of which are loci with substantial annotation, functional and/or animal model data supporting their roles in orofacial clefts. ACTL6A encodes an actin-related protein involved in chromatin remodeling, and knockout mice do not survive beyond E6.5²². ARHGAP10 encodes a Rho GTPase-activating protein; Rho GTPase signaling is important in cell adhesion, migration and proliferation. ARHGAP10 regulates WNT signaling by reducing the expression of CTNNB1²⁶. The dysregulation of WNT signaling has been well reported in the etiology of cleft²⁷. MINK1 encodes misshapen-like kinase 1, which functions in cell-cell adhesion and migration. Mutant mice showed abnormal tooth morphology, indicating that this gene plays some role in craniofacial development²². TMEM5 encodes a transmembrane protein, and mutations have been implicated in neural tube defects, with some affected individuals presenting with clefts²⁸. Furthermore, GWAS found association between common SNPs in this gene family and CL/P⁸. Titin (encoded by TTN) is the largest protein molecule and plays a role in the development of the striated muscles. Mutations in this gene cause congenital titinopathy, a birth defect characterized by myopathies (with cardiomyopathy)²⁹. Cleft palate has also been reported in some individuals with this birth defect³⁰. Mutant mice showed increased apoptosis of cells in the frontonasal process, an important tissue that contributes to the development of the lip and palate; such apoptosis could contribute to the development of OFCs³¹. Our expression analysis using the SysFACE tool provides additional evidence suggesting some role for the DNMs in ACTL6A, ARHGAP10, MINK1 and TMEM5 in the etiopathogenesis of nsCL/P. However, experimental evidence in model animals is still needed to confirm the role of these genes in the development of the lip and palate, and may shed light on the role of these DNMs in the etiology of cleft.

Our analysis also found several missense DNMs in genes recognized as contributing to the risk of nsCL/P. We found a damaging mutation in DHRS3, which encodes an enzyme important in the metabolism of retinol. The DNM we identified is in the catalytic domain, which is critical for enzymatic function. Mouse knockout models for this gene showed a cleft palate phenotype at E14.5³². We also report a damaging missense DNM in TULP4, which encodes tubby-related protein 4, a protein that functions in post-translational modification. This gene has previously been reported only in association studies, where it was found to be associated with orofacial cleft in Filipinos and in Africans (Ethiopia, Ghana and Nigeria)³³,³⁴. We also identified damaging missense DNMs in the cleft candidate genes SHH and TP63. The damaging mutation in TP63 lies within a highly conserved sterile alpha motif (SAM) domain. Damaging mutations within this SAM domain have been reported to cause clefts³⁵,³⁶, and genome-wide approaches found significant association between nsCL/P and common SNPs within TP63⁹,³⁷,³⁸. The damaging DNM discovered in SHH (p.Ser362Leu) in this study has previously been reported as the cause of a syndromic cleft (OMIM #142945)³⁹. Using computational methods, we determined that the mutation significantly affects the protein’s stability and is predicted to be disease-causing²⁵. Following this discovery, we conducted a detailed review of the medical records of the case carrying this DNM but found no evidence of any other structural birth defect. This suggests that this mutation may contribute to the risk of syndromic as well as nonsyndromic clefts.

Other genes with damaging DNMs identified in our study include KMT2D and RECQL4. KMT2D encodes a methyltransferase which functions in transcription activation. It has been reported to be associated with Kabuki syndrome in both humans and mice⁴⁰. Loss of function of Kmt2d in neural crest cells results in a fully penetrant cleft palate phenotype⁴¹. RECQL4 encodes a helicase which plays a role during DNA replication. Mutations in this gene have been associated with autosomal recessive Rapadilino syndrome (OMIM #266280), in which, in addition to limb, joint and knee anomalies, affected individuals may present with cleft palate⁴²,⁴³. Mutant mice recapitulated most of these phenotypes (including cleft palate)⁴⁴. Although the DNMs reported in FKBP10 and ACAN, which encode a binding protein and an extracellular matrix protein (aggrecan), respectively, are predicted to be benign, studies have suggested that these genes play roles in craniofacial development⁴⁵. In-vitro studies and knockout experiments in mice provide evidence of the role of aggrecan in the etiology of cleft⁴⁶,⁴⁷.

Gene-set enrichment analysis identified other genes on our list whose DNMs may also contribute to the risk of nsCL/P. These genes are involved in two biological processes, palate development and neural crest migration, which have been directly linked with the etiopathogenesis of orofacial clefts⁴⁸. Although distal-less homeobox 6 (Dlx6) mouse knockouts show a number of craniofacial defects, cleft phenotypes have not been reported. The ephrin type B receptor 2 (Ephb2) is a member of the Eph family of cell membrane receptors, which bind ephrin ligands and initiate Eph and ephrin signaling. The forward signaling of the bidirectional (forward-Eph and reverse-ephrin) signaling pathways is critical for normal palatogenesis⁴⁹,⁵⁰. The truncation of Ephb2 in Ephb3 null (EphB2^lacZ/lacZ/EphB3^−/−) mice inhibited the proliferation of the palatal mesenchyme, which resulted in cleft palate⁴⁹. Mouse knockouts of the sema domain (semaphorin) 3C and 4D genes (SEMA3C and SEMA4D) do not display craniofacial defects, but these genes are expressed in the first branchial arches, which contribute indirectly to the development of the lip and palate²². Additionally, homozygous Sema3c knockout is perinatally lethal. These genes are significantly expressed in the embryonic tissues critical to normal development of the lip and palate. With the aid of the SysFACE tool, we also identified novel genes contributing to the development of the lip and palate. We deduced that DNMs in these genes may contribute to the risk of cleft in humans. Among other genes identified with the SysFACE tool, MMP9 has been reported to be a key extracellular matrix remodeling protein that could play a role in lip and palate development⁵¹. Junb, Jup and Dnajc3 are among the genes significantly expressed during the three-way fusion that forms the lambdoidal junction in the developing face⁵². This three-way fusion comprises fusion of the medial nasal process (MNP), lateral nasal process (LNP) and maxillary process (MxP), i.e., MNP with LNP, MNP with MxP, and LNP with MxP, and is critical in the development of the lip and palate⁵².

In conclusion, our analysis of WGS data from African nsCL/P case-parent trios led to the discovery of novel pathogenic genetic mutations likely to contribute to the risk of OFC. The finding of nsCL/P-risk DNMs in some of these genes for the first time expands our knowledge of the genetic architecture of sporadic nsCL/P and provides further evidence to support the role of de novo mutations in the risk of the most common craniofacial birth defect.

Study samples

All the individuals recruited were of African ancestry from the two participating countries (Ghana and Nigeria). As per the established AfriCRAN protocol developed by Butali and colleagues, infants with OFC and their parents were recruited during the evaluation of the affected child for surgical repair of the cleft. We recruited children affected with nsCL/P and their unaffected parents (father and mother), i.e., case-parent trios, for this genetic study. In some situations, we recruited just the mother and affected child, i.e., dyads, and on rare occasions we recruited other family members such as siblings and grandparents. For the current study, only case-parent trios were included (Figure 1A). A trio was recruited only if the parents reported no family history of any major birth defect. Following the study design, ethical approvals were obtained from the local institutional review boards (IRBs) at the participating sites: Lagos University Teaching Hospital (ADM/DCST/HREC/VOL.XV/321), Obafemi Awolowo University Teaching Hospital (ERC/2011/12/01), Kwame Nkrumah University of Science and Technology (CHRPE/RC/018/130) and University of Iowa (IRB ID #: 201101720). The methods used in the recruitment at the different centers were carried out in accordance with statutory guidelines and regulations. Informed consent was obtained from all subjects included in this study.

A case-parent trio was recruited following deep phenotyping of the type of cleft and ruling out of other congenital anomalies. This ensured that the case (affected child) had a nonsyndromic cleft phenotype while the parents were unaffected. Surgeons used a standardized phenotyping protocol during the physical examination, taking clinical photographs and recording the cleft phenotypes in a clinical database, as reported in our previously published works¹⁰,⁵³. Echocardiography was used to rule out congenital cardiac defects. For each trio, the type of cleft was recorded. Table 1 shows the number of trios, site of recruitment and their cleft status. The distribution of the cases by cleft type is shown in Figure 1B.

Saliva samples were collected from the parents and the affected child using Oragene saliva kits. Each case-parent trio was assigned a unique identifier number, and their epidemiological and clinical information was remotely uploaded into a secure REDCap database. Following de-identification, the saliva samples were shipped to the Butali laboratory at the University of Iowa for processing.

DNA Extraction and XY Genotyping

Saliva samples received from the recruitment centers were labeled with their unique identifier (UNID) number. DNA was isolated from the saliva samples using the Oragene DNA extraction protocol. Extracted DNA samples were quantified using Qubit (http://www.invitrogen.com/site/us/en/home/brands/Product-Brand/Qubit.html; ThermoFisher Scientific, Grand Island, NY). Stocks and working aliquots of each DNA sample were made for future testing.

We confirmed the reported sex using TaqMan XY genotyping. Confirmation of sex is an in-house quality control (QC) step used in the Butali laboratory. Working aliquots (25 μl) passing QC with DNA concentration ≥250ng were shipped to the Broad Institute for whole genome sequencing supported by the Gabriella Miller Kids First program.

Whole-Genome Sequencing and Variant Calling

Our nsCL/P case-parent trio DNA samples were part of the cohorts sequenced under the Gabriella Miller Kids First (GMKF) Pediatric Research Consortium (https://kidsfirstdrc.org/). This consortium was established and funded to address knowledge gaps in the understanding of the role of genetics in the etiology of structural birth defects and pediatric cancers. WGS was conducted at the Broad Institute, with the entire genome sequenced an average of 30 times (30x WGS). Sequence alignment map (SAM) and binary alignment map (BAM) files were obtained after the sequence data were aligned to the human genome assembly GRCh38 (hg38). Alternate alleles (i.e., variants relative to the reference genome) were called using the Genome Analysis Toolkit (GATK) pipelines at the Broad Institute (https://software.broadinstitute.org/gatk/best-practices/workflow). Briefly, these variants, which include single nucleotide variants (SNVs) and insertions or deletions (indels), were called using HaplotypeCaller in GVCF mode for single-sample variant calling and GenotypeGVCFs for multi-sample joint variant calling. Variants were stored in a variant call format (VCF) file, which was used for further analyses.

Quality Control

Quality control of the WGS data from the African nsCL/P case-parent trios was done using PLINK v.1.9. Each individual in a case-parent trio was evaluated on a variety of quality metrics. Individuals with missingness > 10%, inconsistency between the reported sex and the average homozygosity of the X chromosome, or a Hardy-Weinberg Equilibrium (HWE) p-value < 1E-06 were dropped. Also, trios showing deviation from the expected degree of relatedness between the case (offspring) and parents, or case-parent trios with Mendelian errors outside three standard deviations from the mean, were dropped. Additionally, individuals whose heterozygote/homozygote ratio was more than 4 standard deviations from the mean were dropped.

A case-parent trio was retained only when the offspring and both parent samples met these quality control thresholds. If at least one sample in a trio did not meet these thresholds, the entire case-parent trio was dropped. After quality control, 130 out of 150 case-parent trios were retained for downstream analyses.
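The all-or-nothing trio rule can be sketched as follows. This is a minimal illustration, not the PLINK workflow used in the study: the `Sample` record, its field names and the toy values are hypothetical, and only three of the per-individual checks described above (missingness, sex consistency, heterozygote/homozygote ratio outliers) are modeled.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Sample:
    sample_id: str
    trio_id: str
    missingness: float      # fraction of missing genotype calls
    sex_consistent: bool    # reported sex agrees with X-chromosome homozygosity
    het_hom_ratio: float    # heterozygote/homozygote call ratio


def failing_samples(samples, max_missing=0.10, n_sd=4.0):
    """Return the IDs of samples that fail any per-individual QC check."""
    ratios = [s.het_hom_ratio for s in samples]
    mu, sd = mean(ratios), stdev(ratios)
    failed = set()
    for s in samples:
        outlier = sd > 0 and abs(s.het_hom_ratio - mu) > n_sd * sd
        if s.missingness > max_missing or not s.sex_consistent or outlier:
            failed.add(s.sample_id)
    return failed


def retained_trios(samples):
    """Keep a trio only if every one of its members passes QC."""
    failed = failing_samples(samples)
    dropped = {s.trio_id for s in samples if s.sample_id in failed}
    return sorted({s.trio_id for s in samples} - dropped)


if __name__ == "__main__":
    cohort = [
        Sample("T1_child", "T1", 0.01, True, 1.5),
        Sample("T1_father", "T1", 0.02, True, 1.5),
        Sample("T1_mother", "T1", 0.01, True, 1.5),
        Sample("T2_child", "T2", 0.20, True, 1.5),   # fails missingness
        Sample("T2_father", "T2", 0.01, True, 1.5),
        Sample("T2_mother", "T2", 0.01, True, 1.5),
    ]
    print(retained_trios(cohort))
```

In the toy cohort, one child with 20% missingness is enough to remove the whole of trio T2, which mirrors the rule that reduced 150 trios to 130.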

Analyses for De novo Mutations (DNMs) contributing to risk of nsCL/P

Following variant calling, we filtered for high-confidence protein-altering DNMs using the data filtration pipeline in Figure 1C. Variants were first filtered based on genotype quality (GQ) ≥ 20 and read depth (DP) ≥ 10. The high-quality variants were then filtered for mutations present in the affected case but absent in the unaffected parents (DNMs). Potential DNMs passing these filtering steps were then examined for high-impact, protein-altering mutations. Such mutations lie within the coding regions of genes, and the selected consequences are loss-of-function (LOF) and missense mutations, which create altered gene products.
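The first two filtering steps can be sketched as below. This is an illustrative sketch under stated assumptions rather than the pipeline of Figure 1C: the `Call` record is hypothetical, the quality thresholds are assumed to apply to all three trio members, and "absent in the parents" is modeled as both parents being called homozygous reference.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Call:
    gt: Tuple[Optional[int], Optional[int]]  # allele indices, 0 = reference, None = missing
    gq: int                                  # genotype quality
    dp: int                                  # read depth


def passes_quality(call: Call, min_gq: int = 20, min_dp: int = 10) -> bool:
    """GQ >= 20 and DP >= 10, with no missing alleles."""
    return None not in call.gt and call.gq >= min_gq and call.dp >= min_dp


def is_candidate_dnm(child: Call, father: Call, mother: Call) -> bool:
    """Child carries a non-reference allele that neither parent carries."""
    if not all(passes_quality(c) for c in (child, father, mother)):
        return False
    child_has_alt = any(a > 0 for a in child.gt)
    parents_hom_ref = all(a == 0 for p in (father, mother) for a in p.gt)
    return child_has_alt and parents_hom_ref


if __name__ == "__main__":
    child = Call(gt=(0, 1), gq=60, dp=32)
    father = Call(gt=(0, 0), gq=55, dp=28)
    mother = Call(gt=(0, 0), gq=48, dp=30)
    print(is_candidate_dnm(child, father, mother))
```

Requiring high-quality homozygous-reference calls in both parents is a design choice that guards against an inherited variant being mistaken for a DNM because a parental genotype was poorly covered.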

Following the identification of these coding DNMs, we filtered for those variants with minor allele frequency (MAF) ≤ 1% (0.01). We did this by comparing the identified DNMs to variants reported in the 1000 Genomes database (https://www.internationalgenome.org/), the Exome Variant Server database (https://evs.gs.washington.edu/EVS/) and the Genome Aggregation Database (https://gnomad.broadinstitute.org/). These public databases contain sequencing data from over 7000 African and African-American controls, including individuals from Ghana and Nigeria, from which the allele frequencies were obtained.
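A frequency filter across several reference databases might look like the sketch below. It assumes, as one plausible reading of the text, that a variant is kept only if its frequency is at or below the threshold in every database that reports it, and that a variant absent from a database (`None`) counts as rare there. The database keys and frequencies are illustrative.

```python
from typing import Dict, Optional


def is_rare(freqs: Dict[str, Optional[float]], max_maf: float = 0.01) -> bool:
    """True if the allele frequency is <= max_maf in every database that reports it.

    A value of None means the variant is not present in that database.
    """
    observed = [f for f in freqs.values() if f is not None]
    return all(f <= max_maf for f in observed)


if __name__ == "__main__":
    # Illustrative frequencies only (assumption, not study data).
    print(is_rare({"1000G": None, "EVS": 0.002, "gnomAD": 0.0005}))
    print(is_rare({"1000G": 0.05, "EVS": None, "gnomAD": 0.0005}))
```

Taking the strictest view across databases avoids retaining a variant that is rare in one resource only because that resource has few samples from the relevant population.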

We then identified those genes with DNMs with some evidence of involvement in human craniofacial development. This was achieved by mining the DECIPHER database (https://www.deciphergenomics.org/) to identify those genes with copy number variants (CNVs), indels and SNVs reported in individuals with craniofacial anomalies. We prioritized those genes recognized as associated with lip and palate anomalies or anomalies in other craniofacial structures. In a bid to identify the contribution of these DNMs to the risk of nsCL/P, we also mined the Mouse Genome Informatics (MGI) database. We focused on genes among our list with cleft phenotype in mouse knockouts.

Next, we predicted the functional consequences of these DNMs on protein structure and function using the bioinformatic tools Sorting Intolerant From Tolerant, SIFT (http://sift.jcvi.org/)⁵⁴, Polymorphism Phenotyping, PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/)⁵⁵ and Combined Annotation Dependent Depletion, CADD (https://cadd.gs.washington.edu/)⁵⁶. We identified DNMs predicted to be deleterious, damaging or among the most deleterious mutations in the human genome.

Furthermore, we investigated the effect of missense DNMs on protein structure and function. We used the bioinformatic tool Have (y)Our Protein Explained, HOPE (https://www3.cmbi.umcn.nl/hope/)⁵⁷ to predict the effects of the amino acid changes on protein structure and function. Additional computational methods were used to predict the structural effects of the DNM on one of the most frequently reported cleft candidate gene products. Starting from the predicted protein structure generated by AlphaFold2, we locally optimized the structure to relax its backbone torsions and performed sidechain optimization (i.e., sidechain repacking) to find the most favorable position for each sidechain and improve MolProbity scores⁵⁸. Both optimizations were done with the AMOEBA polarizable force field⁵⁹,⁶⁰. We then used the optimized protein structure to calculate the protein stability change due to the amino acid change. Folding free energy changes (i.e., protein stability changes) were measured using NAnoscale Molecular Dynamics (NAMD) by calculating the free energy change due to mutation in both the folded and unfolded states of the protein⁶¹. The protein stability change (∆∆G) is defined as ∆∆G = ∆G_folded - ∆G_unfolded. Generally, protein stability changes greater than 1 kcal/mol are more likely to cause disease. This analysis offered insight into the effect of the DNMs on protein functionality.
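The definition of ∆∆G above can be written out as a thermodynamic cycle. The state labels F (folded), U (unfolded), WT and mut are notation introduced here for illustration; the numerical value is the one reported for SHH p.Ser362Leu in the results.

```latex
% Free energy change of the wildtype-to-mutant substitution in each state
\Delta G_{\text{folded}}   = G_{F}^{\text{mut}} - G_{F}^{\text{WT}}, \qquad
\Delta G_{\text{unfolded}} = G_{U}^{\text{mut}} - G_{U}^{\text{WT}}

% Stability change, and its equivalent form as a difference of folding free energies
\Delta\Delta G
  = \Delta G_{\text{folded}} - \Delta G_{\text{unfolded}}
  = \left(G_{F}^{\text{mut}} - G_{U}^{\text{mut}}\right)
  - \left(G_{F}^{\text{WT}} - G_{U}^{\text{WT}}\right)

% Reported value for SHH p.Ser362Leu compared with the 1 kcal/mol guide
\Delta\Delta G_{\text{S362L}} = 4.984 \pm 0.221\ \text{kcal/mol} \;>\; 1\ \text{kcal/mol}
```

Because the folding free energy of a stable protein is negative, a positive ∆∆G means the mutant folds less favorably than the wildtype, which is why the positive value is read as destabilizing.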

Sanger-Sequencing Validation

To eliminate false positive DNMs, we conducted Sanger sequencing validation of selected high-impact DNMs discovered through WGS in our case-parent trios. Briefly, we designed primers around the DNMs by including 500 base pairs upstream and downstream of the mutation locus. The primers were designed using primer3 (https://primer3.ut.ee/) and optimized for amplification of the regions containing the DNMs. The optimized primers were used to amplify these loci in DNA samples using a DNA concentration of 4 ng/μl in a 10 μl polymerase chain reaction. A YRI HapMap sample was added to the plate as a negative control. Details of the primers and annealing temperatures are available from the Butali laboratory upon request. The PCR products were sent to Functional Biosciences, Inc., Madison, WI (https://functionalbio.com/) for sequencing. Sequence data were inspected to confirm the DNMs.
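The template region handed to a primer design tool can be computed as in the sketch below. It only illustrates the 500 bp flanking rule; the function name, the 1-based inclusive coordinate convention and the example positions are assumptions, and no primer3 call is made.

```python
from typing import Optional, Tuple


def flanking_window(pos: int, flank: int = 500,
                    chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) of the region spanning
    `flank` bases on either side of a variant at position `pos`."""
    start = max(1, pos - flank)
    end = pos + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


if __name__ == "__main__":
    # Hypothetical variant positions, for illustration only.
    print(flanking_window(155_803_500))
    print(flanking_window(200))
```

Clamping at position 1 and at the chromosome end keeps the window valid for variants near either end of a contig.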

Gene Set Enrichment Analysis (GSEA) and SysFACE based craniofacial gene expression analysis

We performed a gene set enrichment analysis to identify biological processes significantly enriched within our set of genes with DNMs, using the Database for Annotation, Visualization and Integrated Discovery (DAVID). The list of genes with DNMs was entered as a query in DAVID Bioinformatics Resources 6.8 (https://david.ncifcrf.gov/) and the analysis was run. Biological processes with at least nominal significance (p < 0.05) were selected. Among these, we identified the processes involved in lip and palatal development in which genes on our list participate.
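The idea behind this kind of over-representation test can be sketched with a plain one-sided hypergeometric test. This is an illustration of the principle, not DAVID's own computation (DAVID reports a more conservative modified Fisher exact statistic), and the counts in the example are made up.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): probability of seeing at least k annotated genes in a query
    list of n genes, drawn from a background of N genes of which K carry
    the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical counts: 4 of 150 query genes fall in a 60-gene term,
    # against a 20,000-gene background.
    p = hypergeom_enrichment_p(k=4, n=150, K=60, N=20_000)
    print(f"p = {p:.4g}")
```

A term is called nominally enriched when this probability falls below 0.05, without correction for the number of terms tested.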

To gain biological insight into the candidate genes among the DNM gene lists, we examined their expression in relevant craniofacial tissues using SysFACE (Systems tool for craniofacial expression-based gene discovery), as previously published¹⁶,⁶². Mouse craniofacial transcriptomics microarray data for maxilla, frontonasal and palate tissues at embryonic (E) and post-natal (P) stages were used to examine gene expression. Transcriptomics data from public databases such as FaceBase (https://www.facebase.org) and NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/) were meta-analyzed as previously described⁶². The following FaceBase datasets (FB00000352, FB00000353, FB00000107, FB00000254, FB00000264, FB00000468.01, FB00000474.01, FB00000477.01, FB00000905) and NCBI Gene Expression Omnibus (GEO) datasets (GSE7759, GSE55965, GSE22989, GSE31004, GSE11400) were considered in the analysis. Craniofacial tissue expression, presented in fluorescence intensity units, was used to generate heatmap representations.

Acknowledgements

We are grateful to all the families from Nigeria and Ghana who voluntarily participated in this study. We are also grateful to all the administrative and research staff, students, nurses and resident doctors who assisted with participant recruitment, consent, and data collection. We thank all members of the Butali Lab for their helpful comments and suggestions.

Author Contributions

W.A. and A.B. contributed to the conception and design, data acquisition, analysis, and interpretation, and drafted and critically revised the manuscript; P.M., J.B.H., L.J.J.G., M.A.E., W.L.A., A.A., E.Z., O.A., T.N., D.A., C.A., T.B., M.L., A.P., B.S.A., R.O.B., F.O.O., A.O.O., A.O., S.K., J.O., M.H., J.P., P.D., F.K.N.A., S.O-Y., D.K.S., P.A., G.P-R., A.A.O., R.A.G., T.H.B., M.T., M.L.M., M.J.S., S.A.L., A.A. and J.C.M. contributed to the conception, data acquisition, analysis and interpretation and critically revised the manuscript. All authors gave final approval and agreed to be accountable for all aspects of the work.

Mossey, P. A., Little, J., Munger, R. G., Dixon, M. J. & Shaw, W. C. Cleft lip and palate. Lancet, 374, 1773–1785 https://doi.org/doi:10.1016/s0140-6736(09)60695-4 (2009).
Smarius, B. et al. Accurate diagnosis of prenatal cleft lip/palate by understanding the embryology. World J Methodol, 7, 93–100 https://doi.org/doi:10.5662/wjm.v7.i3.93 (2017).
Rahimov, F., Jugessur, A. & Murray, J. C. Genetics of nonsyndromic orofacial clefts. Cleft Palate Craniofac J, 49, 73–91 https://doi.org/doi:10.1597/10-178 (2012).
Christensen, K., Juel, K., Herskind, A. M. & Murray, J. C. Long term follow up study of survival associated with cleft lip and palate at birth. BMJ, 328, 1405 https://doi.org/doi:10.1136/bmj.38106.559120.7C (2004).
Hunt, O., Burden, D., Hepper, P. & Johnston, C. The psychosocial effects of cleft lip and palate: a systematic review. Eur J Orthod, 27, 274–285 https://doi.org/doi:10.1093/ejo/cji004 (2005).
Wehby, G. L. & Cassell, C. H. The impact of orofacial clefts on quality of life and healthcare use and costs. Oral Dis, 16, 3–10 https://doi.org/doi:10.1111/j.1601-0825.2009.01588.x (2010).
Berk, N. W. & Marazita, M. L. The costs of cleft lip and palate: personal and societal implications. In Wyszynski, D. F. (ed) Cleft lip and palate: from origin to treatment (Oxford University Press, New York, 2002).
Yu, Y. et al. Genome-wide analyses of non-syndromic cleft lip with palate identify 14 novel loci and genetic heterogeneity. Nat Commun 8, 14364, doi:10.1038/ncomms14364 (2017).
Leslie, E. J. et al. Genome-wide meta-analyses of nonsyndromic orofacial clefts identify novel associations between FOXE1 and all orofacial clefts, and TP63 and cleft lip with or without cleft palate. Hum Genet, 136, 275–286 https://doi.org/doi:10.1007/s00439-016-1754-7 (2017).
Butali, A. et al. Genomic analyses in African populations identify novel risk loci for cleft palate. Hum Mol Genet 28, 1038–1051 https://doi.org/doi:10.1093/hmg/ddy402 (2019).
Howe, L. J. et al. Investigating the shared genetics of non-syndromic cleft lip/palate and facial morphology. PLoS Genet 14, e1007501 https://doi.org/doi:10.1371/journal.pgen.1007501 (2018).
Rojas-Martinez, A. et al. Genetic risk factors for nonsyndromic cleft lip with or without cleft palate in a Mesoamerican population: Evidence for IRF6 and variants at 8q24 and 10q25. Birth Defects Res A Clin Mol Teratol, 88, 535–537 https://doi.org/doi:10.1002/bdra.20689 (2010).
van Rooij, I. A. et al. Non-Syndromic Cleft Lip with or without Cleft Palate: Genome-Wide Association Study in Europeans Identifies a Suggestive Risk Locus at 16p12.1 and Supports SH3PXD2A as a Clefting Susceptibility Gene. Genes (Basel) 10, doi:10.3390/genes10121023 (2019).
Al Mahdi, H. B. et al. Identification of Causative Variants Contributing to Nonsyndromic Orofacial Clefts Using Whole-Exome Sequencing in a Saudi Family. Genet Test Mol Biomarkers, 24, 723–731 https://doi.org/doi:10.1089/gtmb.2019.0233 (2020).
Aylward, A. et al. Using Whole Exome Sequencing to Identify Candidate Genes With Rare Variants In Nonsyndromic Cleft Lip and Palate. Genet Epidemiol 40, 432–441 https://doi.org/doi:10.1002/gepi.21972 (2016).
Liu, H. et al. Exome sequencing provides additional evidence for the involvement of ARHGAP29 in Mendelian orofacial clefting and extends the phenotypic spectrum to isolated cleft palate. Birth Defects Res, 109, 27–37 https://doi.org/doi:10.1002/bdra.23596 (2017).
Mossey, P. A. & Modell, B. Epidemiology of oral clefts 2012: an international perspective. Front Oral Biol, 16, 1–18 https://doi.org/doi:10.1159/000337464 (2012).
Riley, B. M. et al. Impaired FGF signaling contributes to cleft lip and palate. Proc Natl Acad Sci U S A, 104, 4512–4517 https://doi.org/doi:10.1073/pnas.0607956104 (2007).
Leoyklang, P., Siriwan, P. & Shotelersuk, V. A mutation of the p63 gene in non-syndromic cleft lip. J Med Genet, 43, e28 https://doi.org/doi:10.1136/jmg.2005.036442 (2006).
Bishop, M. R. et al. Genome-wide Enrichment of De Novo Coding Mutations in Orofacial Cleft Trios. The American Journal of Human Genetics, 107, 124–136 https://doi.org/doi:10.1016/j.ajhg.2020.05.018 (2020).
Campbell, M. C. & Tishkoff, S. A. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet, 9, 403–433 https://doi.org/doi:10.1146/annurev.genom.9.081307.164258 (2008).
Bult, C. J., Blake, J. A., Smith, C. L., Kadin, J. A. & Richardson, J. E. Mouse Genome Database (MGD) 2019. Nucleic Acids Res, 47, D801–D806 https://doi.org/doi:10.1093/nar/gky1056 (2019).
Perler, F. B. Protein splicing of inteins and hedgehog autoproteolysis: structure, function, and evolution. Cell 92, 1–4 https://doi.org/doi:10.1016/s0092-8674(00)80892-2 (1998).
Sasai, N., Toriyama, M. & Kondo, T. Hedgehog Signal and Genetic Disorders. Frontiers in Genetics 10, https://doi.org/doi:10.3389/fgene.2019.01103 (2019).
Duan, J., Lupyan, D. & Wang, L. Improving the Accuracy of Protein Thermostability Predictions for Single Point Mutations. Biophys J 119, 115–127 https://doi.org/doi:10.1016/j.bpj.2020.05.020 (2020).
Teng, J. P. et al. The roles of ARHGAP10 in the proliferation, migration and invasion of lung cancer cells. Oncol Lett, 14, 4613–4618 https://doi.org/doi:10.3892/ol.2017.6729 (2017).
Kurosaka, H., Iulianella, A., Williams, T. & Trainor, P. A. Disrupting hedgehog and WNT signaling interactions promotes cleft lip pathogenesis. J Clin Invest, 124, 1660–1671 https://doi.org/doi:10.1172/jci72688 (2014).
Vuillaumier-Barrot, S. et al. Identification of mutations in TMEM5 and ISPD as a cause of severe cobblestone lissencephaly. Am J Hum Genet, 91, 1135–1143 https://doi.org/doi:10.1016/j.ajhg.2012.10.009 (2012).
Oates, E. C. et al. Congenital Titinopathy: Comprehensive characterization and pathogenic insights. Ann Neurol 83, 1105–1124 https://doi.org/doi:10.1002/ana.25241 (2018).
Chauveau, C. et al. Recessive TTN truncating mutations define novel forms of core myopathy with heart disease. Hum Mol Genet 23, 980–991 https://doi.org/doi:10.1093/hmg/ddt494 (2014).
May, S. R., Stewart, N. J., Chang, W. & Peterson, A. S. A Titin mutation defines roles for circulation in endothelial morphogenesis. Dev Biol, 270, 31–46 https://doi.org/doi:10.1016/j.ydbio.2004.02.006 (2004).
Billings, S. E. et al. The retinaldehyde reductase DHRS3 is essential for preventing the formation of excess retinoic acid during embryonic development. FASEB J 27, 4877–4889 https://doi.org/doi:10.1096/fj.13-227967 (2013).
Vieira, A. R. et al. Fine Mapping of 6q23.1 Identifies TULP4 as Contributing to Clefts. Cleft Palate Craniofac J, 52, 128–134 https://doi.org/doi:10.1597/13-023 (2015).
Gowans, L. J. et al. Association Studies and Direct DNA Sequencing Implicate Genetic Susceptibility Loci in the Etiology of Nonsyndromic Orofacial Clefts in Sub-Saharan African Populations. J Dent Res, 95, 1245–1256 https://doi.org/doi:10.1177/0022034516657003 (2016).
Tsutsui, K. et al. A novel p63 sterile alpha motif (SAM) domain mutation in a Japanese patient with ankyloblepharon, ectodermal defects and cleft lip and palate (AEC) syndrome without ankyloblepharon. Br J Dermatol, 149, 395–399 https://doi.org/doi:10.1046/j.1365-2133.2003.05423.x (2003).
Zheng, J. et al. Tooth defects of EEC and AEC syndrome caused by heterozygous TP63 mutations in three Chinese families and genotype-phenotype correlation analyses of TP63-related disorders. Mol Genet Genomic Med, 7, e704 https://doi.org/doi:10.1002/mgg3.704 (2019).
Marazita, M. L. et al. Genome-scan for loci involved in cleft lip with or without cleft palate in consanguineous families from Turkey. Am J Med Genet A 126A, 111–122, doi:10.1002/ajmg.a.20564 (2004).
Marazita, M. L. et al. Genome scan, fine-mapping, and candidate gene analysis of non-syndromic cleft lip with or without cleft palate reveals phenotype-specific differences in linkage and association results. Hum Hered, 68, 151–170 https://doi.org/doi:10.1159/000224636 (2009).
Roessler, E. et al. The mutational spectrum of holoprosencephaly-associated changes within the SHH gene in humans predicts loss-of-function through either key structural alterations of the ligand or its altered synthesis. Hum Mutat 30, E921–935 https://doi.org/doi:10.1002/humu.21090 (2009).
Bjornsson, H. T. et al. Histone deacetylase inhibition rescues structural and functional brain deficits in a mouse model of Kabuki syndrome. Sci Transl Med 6, 256135 https://doi.org/doi:10.1126/scitranslmed.3009278 (2014).
Shpargel, K. B., Mangini, C. L., Xie, G., Ge, K. & Magnuson, T. The KMT2D Kabuki syndrome histone methylase controls neural crest cell differentiation and facial morphology. Development 147, https://doi.org/doi:10.1242/dev.187997 (2020).
Wang, L. L. et al. Association between osteosarcoma and deleterious mutations in the RECQL4 gene in Rothmund-Thomson syndrome. J Natl Cancer Inst, 95, 669–674 https://doi.org/doi:10.1093/jnci/95.9.669 (2003).
Maciaszek, J. L. et al. Enrichment of heterozygous germline RECQL4 loss-of-function variants in pediatric osteosarcoma. Cold Spring Harb Mol Case Stud, 5, https://doi.org/doi:10.1101/mcs.a004218 (2019).
Mann, M. B. et al. Defective sister-chromatid cohesion, aneuploidy and cancer predisposition in a mouse model of type II Rothmund-Thomson syndrome. Hum Mol Genet 14, 813–825 https://doi.org/doi:10.1093/hmg/ddi075 (2005).
Lietman, C. D. et al. Connective tissue alterations in Fkbp10-/- mice. Hum Mol Genet 23, 4822–4831 https://doi.org/doi:10.1093/hmg/ddu197 (2014).
Rittenhouse, E. et al. Cartilage matrix deficiency (cmd): a new autosomal recessive lethal mutation in the mouse. J Embryol Exp Morphol, 43, 71–84 (1978).
Bueno, D. F. et al. Human stem cell cultures from cleft lip/palate patients show enrichment of transcripts involved in extracellular matrix modeling by comparison to controls. Stem Cell Rev Rep, 7, 446–457 https://doi.org/doi:10.1007/s12015-010-9197-3 (2011).
Deshpande, A. S. & Goudy, S. L. Cellular and molecular mechanisms of cleft palate development. Laryngoscope Investig Otolaryngol 4, 160–164 https://doi.org/doi:10.1002/lio2.214 (2019).
Risley, M., Garrod, D., Henkemeyer, M. & McLean, W. EphB2 and EphB3 forward signalling are required for palate development. Mechanisms of Development, 126, 230–239 https://doi.org/doi:10.1016/j.mod.2008.10.009 (2009).
Benson, M. D. & Serrano, M. J. Ephrin regulation of palate development. Front Physiol 3, 376 https://doi.org/doi:10.3389/fphys.2012.00376 (2012).
Smane-Filipova, L., Pilmane, M. & Akota, I. MMPs and TIMPs expression in facial tissue of children with cleft lip and palate. Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub, 160, 538–542 https://doi.org/doi:10.5507/bp.2016.055 (2016).
Li, H., Jones, K. L., Hooper, J. E. & Williams, T. The molecular anatomy of mammalian upper lip and primary palate fusion at single cell resolution. Development 146, https://doi.org/doi:10.1242/dev.174888 (2019).
Awotoye, W. et al. Genome-wide Gene-by-Sex Interaction Studies Identify Novel Nonsyndromic Orofacial Clefts Risk Locus. J Dent Res, 220345211046614, https://doi.org/doi:10.1177/00220345211046614 (2021).
Sim, N. L. et al. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40, W452–457 https://doi.org/doi:10.1093/nar/gks539 (2012).
Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet Chapter 7, Unit7.20, doi:10.1002/0471142905.hg0720s76 (2013).
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res, 47, D886–D894 https://doi.org/doi:10.1093/nar/gky1016 (2019).
Venselaar, H., Te Beek, T. A., Kuipers, R. K., Hekkelman, M. L. & Vriend, G. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces. BMC Bioinformatics, 11, 548 https://doi.org/doi:10.1186/1471-2105-11-548 (2010).
Tollefson, M. R. et al. Structural Insights into Hearing Loss Genetics from Polarizable Protein Repacking. Biophys J 117, 602–612 https://doi.org/doi:10.1016/j.bpj.2019.06.030 (2019).
Ponder, J. W. et al. Current Status of the AMOEBA Polarizable Force Field. The Journal of Physical Chemistry B, 114, 2549–2564 https://doi.org/doi:10.1021/jp910674d (2010).
Shi, Y. et al. Polarizable Atomic Multipole-Based AMOEBA Force Field for Proteins. Journal of Chemical Theory and Computation, 9, 4046–4063 https://doi.org/doi:10.1021/ct4003702 (2013).
Phillips, J. C. et al. Scalable molecular dynamics on CPU and GPU architectures with NAMD. J Chem Phys, 153, 044130 https://doi.org/doi:10.1063/5.0014475 (2020).
Cox, L. L. et al. Mutations in the Epithelial Cadherin-p120-Catenin Complex Cause Mendelian Non-Syndromic Cleft Lip with or without Cleft Palate. Am J Hum Genet, 102, 1143–1157 https://doi.org/doi:10.1016/j.ajhg.2018.04.009 (2018).

Table 1

Distribution of the case-parent trios based on the country of origin and cleft status.

Country	Cleft Lip	Cleft Lip and Palate	Total
Ghana	41	64	105
Nigeria	14	18	32
Total	55	82	137

Table 2

List of novel variants with evidence of involvement in craniofacial development and a role in the development of nsCL/P

SIFT and Polyphen2 SCORE Interpretation: Tolerated Deleterious

No competing interests reported.

Supplementarydata.docx


Editorial decision: Major revision
01 Feb, 2022
Reviews received at journal
30 Jan, 2022
Reviews received at journal
20 Dec, 2021
Reviewers agreed at journal
25 Nov, 2021
Reviewers agreed at journal
25 Nov, 2021
Reviewers invited by journal
22 Nov, 2021
Editor assigned by journal
22 Nov, 2021
Editor invited by journal
22 Nov, 2021
Submission checks completed at journal
22 Nov, 2021
First submitted to journal
09 Nov, 2021
