The ability of CRISPR-Cas12a to detect mutations
A literature search identified RNF213 as a susceptibility gene for MMD; two of its SNP loci, rs112735431 (R4810K) and rs148731719 (A4399T), are closely associated with the disease [5-7]. We designed crRNAs targeting these two point-mutation loci and verified their cleavage efficiency against wild-type and mutant target DNA (Figure 1). The fluorescence signal from the mutant target was twice that from the wild type, indicating that the crRNA-guided CRISPR-Cas12a system was constructed successfully and can be used to detect mutations at these loci in clinical samples.
Detection of RNF213 gene loci by CRISPR-Cas12a and Sanger sequencing
We collected samples from 34 patients clinically diagnosed with MMD and from 37 healthy controls at Liaocheng People's Hospital. A further 20 samples, acquired during physical examinations, were collected from Wuhan Medical Examination Center. DNA was extracted from all samples, and the RNF213 gene was tested by both CRISPR-Cas12a and Sanger sequencing (Table 2). The coincidence rates between the two methods for the rs112735431 and rs148731719 loci were 94% and 97%, respectively. For mutant samples, the coincidence between the CRISPR-Cas12a system and Sanger sequencing was 100%, indicating that the CRISPR-Cas12a detection method is accurate and has a higher sensitivity.
Table 2
Analysis of RNF213 Gene Mutation Results by CRISPR test and Sanger sequencing
| Locus | Genotype | CRISPR: Liaocheng case group (n=34) | CRISPR: Liaocheng control group (n=37) | CRISPR: Wuhan control group (n=20) | Sanger: Liaocheng case group (n=34) | Sanger: Liaocheng control group (n=37) | Sanger: Wuhan control group (n=20) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R4810K | Mutation | 10 (29.4%) | 0 (0%) | 0 (0%) | 8 (23.5%) | 0 (0%) | 0 (0%) |
| R4810K | Wild | 24 (70.6%) | 37 (100%) | 20 (100%) | 26 (76.4%) | 37 (100%) | 20 (100%) |
| A4399T | Mutation | 5 (14.7%) | 3 (8.1%) | 1 (5%) | 4 (11.8%) | 3 (8.1%) | 1 (5%) |
| A4399T | Wild | 29 (85.3%) | 34 (91.9%) | 19 (95%) | 30 (88.2%) | 34 (91.9%) | 19 (95%) |
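The reported coincidence rates can be cross-checked against the counts in Table 2. The sketch below is illustrative only: per-sample calls are not listed here, so it assumes that every Sanger-positive sample is also CRISPR-positive, which makes the number of discordant samples equal to the difference in positive counts. The names `TABLE2` and `concordance` are hypothetical helpers, not part of the study's pipeline.

```python
# Mutation-positive counts copied from Table 2.
# Keys are (locus, group); values are (crispr_positive, sanger_positive, n).
TABLE2 = {
    ("R4810K", "Liaocheng case"): (10, 8, 34),
    ("R4810K", "Liaocheng control"): (0, 0, 37),
    ("R4810K", "Wuhan control"): (0, 0, 20),
    ("A4399T", "Liaocheng case"): (5, 4, 34),
    ("A4399T", "Liaocheng control"): (3, 3, 37),
    ("A4399T", "Wuhan control"): (1, 1, 20),
}


def concordance(crispr_pos: int, sanger_pos: int, n: int) -> float:
    """Fraction of samples that receive the same call from both methods.

    ASSUMPTION: Sanger-positive samples are a subset of CRISPR-positive
    samples, so discordant samples = difference in positive counts.
    """
    discordant = abs(crispr_pos - sanger_pos)
    return (n - discordant) / n


for (locus, group), counts in TABLE2.items():
    print(f"{locus:7s} {group:18s} concordance = {concordance(*counts):.1%}")
```

Under that assumption, the case-group values round to 94% (R4810K) and 97% (A4399T), and both control groups agree completely.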
Analysis of the correlation between gene (RNF213, MMP3) mutations and MMD
Sanger sequencing of RNF213 revealed a C>T mutation at locus rs112735431 and a G>A mutation at locus rs148731719 (Figure 2a). T-tests comparing the case group with the healthy control group from the Liaocheng area gave P < 0.05 for the rs112735431 mutation, whereas rs148731719 showed no significant difference between the groups. This indicates that R4810K (rs112735431) is a significant RNF213 mutation locus for MMD. Comparisons between the healthy control samples from the Liaocheng and Wuhan areas were not significant (P > 0.05) (Table 3a).
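As an illustration of how such case-control comparisons can be made on small count tables, the sketch below implements a two-sided Fisher's exact test using only the standard library. This is a swapped-in technique: the text reports t-tests, and Fisher's exact test is shown here only as a common alternative for 2×2 tables of counts. The function name `fisher_exact_two_sided` is hypothetical; the counts are the Sanger results for the two Liaocheng groups.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (case, control); columns are mutant / wild type.
    """
    row1, col1, col2 = a + b, a + c, b + d
    total = comb(col1 + col2, row1)

    def weight(k: int) -> int:
        # Number of tables with k mutants in row 1 and the same margins.
        return comb(col1, k) * comb(col2, row1 - k)

    observed = weight(a)
    k_min = max(0, row1 - col2)
    k_max = min(row1, col1)
    weights = [weight(k) for k in range(k_min, k_max + 1)]
    # Sum the probabilities of all tables no more likely than the observed one.
    # Integer weights are compared exactly, avoiding floating-point ties.
    return sum(w for w in weights if w <= observed) / total


# Sanger counts: Liaocheng case group vs. Liaocheng control group.
p_r4810k = fisher_exact_two_sided(8, 26, 0, 37)   # R4810K: 8/34 vs. 0/37
p_a4399t = fisher_exact_two_sided(4, 30, 3, 34)   # A4399T: 4/34 vs. 3/37
print(f"R4810K p = {p_r4810k:.4g}, A4399T p = {p_a4399t:.4g}")
```

On these counts the test gives P < 0.05 for R4810K and P > 0.05 for A4399T, in line with the significance pattern reported above.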
We identified a base insertion mutation (rs3025058) in the MMP3 gene by Sanger sequencing (Figure 2b). In its homozygous form (6A/6A), this mutation was detected in 67.6% of the 34 patients with MMD in Shandong province and in 5.4% of the 37 controls, whereas the detection rate in the Wuhan control group (20 cases) was 85%; all P values were < 0.05, indicating statistical significance. The 1171 (6A/6A) mutation in the MMP3 gene is associated with the risk of MMD; furthermore, the risk associated with the 6A/6A genotype is higher than that associated with the 5A/6A genotype (Table 3b). The difference in MMP3 mutation results between the two regions (Shandong and Wuhan) suggests that MMD may be a regional disease; however, this needs to be verified in future research with a larger sample size.
Table 3
Analysis of RNF213 and MMP3 gene mutation
Sanger sequencing results

| Gene | Genotype | Liaocheng case group (n=34) | Liaocheng control group (n=37) | Wuhan control group (n=20) |
| --- | --- | --- | --- | --- |
| RNF213 | R4810K, mutation | 8 (23.5%) | 0 (0%) | 0 (0%) |
| RNF213 | R4810K, wild | 26 (76.4%) | 37 (100%) | 20 (100%) |
| RNF213 | R4810K, p value | <0.05 | | >0.05 |
| RNF213 | A4399T, mutation | 4 (11.8%) | 3 (8.1%) | 1 (5%) |
| RNF213 | A4399T, wild | 30 (88.2%) | 34 (91.9%) | 19 (95%) |
| RNF213 | A4399T, p value | >0.05 | | >0.05 |
| MMP3 | 6A6A | 23 (67.6%) | 2 (5.4%) | 17 (85%) |
| MMP3 | 5A6A | 11 (32.4%) | 35 (94.6%) | 1 (5%) |
| MMP3 | 5A5A | 0 (0%) | 0 (0%) | 2 (10%) |
| MMP3 | p value | <0.05 | | <0.05 |
| MMP3 | 6A allele frequency | 57 (83.8%) | 39 (52.7%) | 35 (87.5%) |
| MMP3 | 5A allele frequency | 11 (16.2%) | 35 (47.3%) | 5 (12.5%) |
| MMP3 | p value | <0.05 | | <0.05 |
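The allele rows of Table 3 follow from the genotype rows by simple counting: each 6A/6A individual contributes two 6A alleles, each 5A/6A individual one of each, and each 5A/5A individual two 5A alleles. A minimal sketch of that arithmetic is given below; `allele_counts` and `GENOTYPES` are hypothetical names introduced for illustration.

```python
def allele_counts(n_6a6a: int, n_5a6a: int, n_5a5a: int) -> dict:
    """Derive 6A and 5A allele counts and frequencies from genotype counts."""
    six_a = 2 * n_6a6a + n_5a6a
    five_a = 2 * n_5a5a + n_5a6a
    total = six_a + five_a  # two alleles per individual
    return {
        "6A": six_a,
        "5A": five_a,
        "6A_freq": six_a / total,
        "5A_freq": five_a / total,
    }


# Genotype counts (6A6A, 5A6A, 5A5A) copied from Table 3.
GENOTYPES = {
    "Liaocheng case": (23, 11, 0),
    "Liaocheng control": (2, 35, 0),
    "Wuhan control": (17, 1, 2),
}

for group, counts in GENOTYPES.items():
    res = allele_counts(*counts)
    print(f"{group:18s} 6A = {res['6A']} ({res['6A_freq']:.1%}), "
          f"5A = {res['5A']} ({res['5A_freq']:.1%})")
```

The resulting counts (57/11, 39/35 and 35/5) agree with the allele frequency rows of the table.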
WES
We drew pedigrees for the 12 samples and performed CT examinations on the patients; the results are shown in Figure 3. The average sequencing depth of the 12 samples exceeded 100×, and more than 99% of the target regions were covered at a depth above 10×. All exonic regions and untranslated regions (UTRs) of all samples were covered effectively (Table S2). The number of SNVs and Indels obtained from each sample after data analysis is shown in Table S3.
Screening for candidate pathological changes
All SNVs and Indels were annotated, and variants with a frequency higher than 1% in at least one of the 1000g_all, esp6500siv2_all, gnomAD_ALL, and gnomAD_EAS databases were removed. This step removed loci that vary commonly among individuals and retained rare mutations, which are the most likely to be pathogenic. Exonic or splicing (within 10 bp up- or downstream) variants were retained. Synonymous SNVs that were not predicted to affect splicing and that lay in poorly conserved regions were removed, as were small (< 10 bp) non-frameshift InDels in repeat regions. Mutation loci were then screened according to the scores predicted by SIFT, PolyPhen, MutationTaster, and CADD: to be retained, a locus had to be predicted as potentially harmful by at least half of these four tools. Splicing mutations had to lie no further than 2 bp (±1-2 bp) from the exonic region, and dbscSNV had to predict that the mutation would affect splicing. Table S4 provides data related to the screening process.
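The core of this screening logic can be sketched in a few lines. The example below is a simplified illustration, not the pipeline used in the study: the `Variant` record layout and its field names are hypothetical, each predictor is assumed to have been reduced to a harmful/not-harmful call upstream (CADD, for instance, outputs a score that would need a threshold), and the synonymous, InDel and dbscSNV rules are omitted.

```python
from dataclasses import dataclass, field

FREQ_DATABASES = ("1000g_all", "esp6500siv2_all", "gnomAD_ALL", "gnomAD_EAS")
PREDICTORS = ("SIFT", "PolyPhen", "MutationTaster", "CADD")
MAX_FREQ = 0.01  # the 1% population-frequency cut-off


@dataclass
class Variant:
    # Hypothetical, simplified annotation record.
    region: str                                         # "exonic", "splicing", "intronic", ...
    freqs: dict = field(default_factory=dict)           # database -> allele frequency
    harmful_calls: dict = field(default_factory=dict)   # predictor -> True if harmful


def is_rare(v: Variant) -> bool:
    # A variant absent from a database is treated as frequency 0 there.
    return all(v.freqs.get(db, 0.0) <= MAX_FREQ for db in FREQ_DATABASES)


def is_predicted_harmful(v: Variant) -> bool:
    # At least half of the four predictors must call the variant harmful.
    votes = sum(bool(v.harmful_calls.get(p, False)) for p in PREDICTORS)
    return 2 * votes >= len(PREDICTORS)


def keep(v: Variant) -> bool:
    return is_rare(v) and v.region in {"exonic", "splicing"} and is_predicted_harmful(v)


example = Variant("exonic", {"gnomAD_EAS": 0.002}, {"SIFT": True, "CADD": True})
print(keep(example))
```

Treating a missing database entry as frequency zero is a design choice of this sketch; a real pipeline would need to distinguish "not observed" from "not annotated".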
Screening of mutation-related genes
Candidate loci were screened further according to the process shown in Figure 4. In recessive inheritance, the parents have a normal phenotype but carry pathogenic loci; offspring who inherit the pathogenic allele from both parents are homozygous at the locus and present with disease. Recessive genetic diseases can arise from homozygous variants or from compound heterozygous variants, and we also took X-linkage into account when screening for recessive patterns. In recessive pattern screening, when a monogenic disease is inherited recessively within a family, a locus is kept as a candidate if it is homozygous mutant in the patients and heterozygous or non-mutant in the unaffected family members. In compound heterozygous pattern screening, under the same recessive assumption, loci that are not homozygous mutant in either patients or unaffected individuals are kept, provided that the gene carries at least two heterozygous mutation loci in the patient. In addition, the set of mutation loci in that gene in the patient must not be identical to, or a subset of, the set found in any unaffected individual. Using this strategy, we identified multiple recessive pathogenic genes; among these, the candidate mutation loci were located in the TTN gene (rs771533925, rs559712998 and rs72677250) (Table 4).
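The two screening rules described above can be expressed as set operations on per-individual genotype calls. The sketch below is a simplified illustration under stated assumptions: genotypes are encoded as `"hom"`, `"het"` or `"ref"` per locus (a missing locus means `"ref"`), a single patient is used for the compound heterozygous rule, X-linkage is ignored, and all names (`homozygous_candidates`, `compound_het_genes`, the locus and gene labels) are hypothetical placeholders rather than data from the study.

```python
from typing import Dict, List, Set

Genotypes = Dict[str, str]  # locus -> "hom" | "het" | "ref"


def homozygous_candidates(patients: List[Genotypes],
                          unaffected: List[Genotypes]) -> Set[str]:
    """Loci homozygous in every patient and in no unaffected relative.

    Requires at least one patient.
    """
    shared = set.intersection(
        *({locus for locus, gt in p.items() if gt == "hom"} for p in patients)
    )
    return {locus for locus in shared
            if all(u.get(locus, "ref") != "hom" for u in unaffected)}


def het_loci_by_gene(genotypes: Genotypes, gene_of: Dict[str, str]) -> Dict[str, Set[str]]:
    by_gene: Dict[str, Set[str]] = {}
    for locus, gt in genotypes.items():
        if gt == "het":
            by_gene.setdefault(gene_of[locus], set()).add(locus)
    return by_gene


def compound_het_genes(patient: Genotypes, unaffected: List[Genotypes],
                       gene_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Genes with >= 2 heterozygous loci in the patient whose locus set is
    neither equal to nor a subset of that of any unaffected individual."""
    unaffected_sets = [het_loci_by_gene(u, gene_of) for u in unaffected]
    result: Dict[str, Set[str]] = {}
    for gene, loci in het_loci_by_gene(patient, gene_of).items():
        # Discard loci that are homozygous mutant in an unaffected individual.
        loci = {l for l in loci if all(u.get(l, "ref") != "hom" for u in unaffected)}
        if len(loci) < 2:
            continue
        if any(loci <= sets.get(gene, set()) for sets in unaffected_sets):
            continue
        result[gene] = loci
    return result


gene_of = {"locus_A": "GENE1", "locus_B": "GENE1", "locus_C": "GENE2"}
patient = {"locus_A": "het", "locus_B": "het", "locus_C": "het"}
father, mother = {"locus_A": "het"}, {"locus_B": "het"}
print(compound_het_genes(patient, [father, mother], gene_of))
```

In the usage example, `GENE1` is retained because the patient carries two heterozygous loci that no single parent carries together, while `GENE2` is dropped because it has only one heterozygous locus.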
Global population frequencies for rs72677250, rs559712998 and rs771533925
Next, we analyzed the risk alleles (rs72677250, rs559712998 and rs771533925) using the Exome Aggregation Consortium (ExAC) database [43]. We identified significant differences in frequency across the global population. rs72677250 was most frequent in the South Asian population (0.00003269), rs559712998 in the East Asian population (0.002574), and rs771533925 in the East Asian population (0.00005568). The total frequency of rs559712998 was 0.000192, the highest of the three mutation sites (Table 5). Age analysis of the three loci within the global population showed that the rs72677250 mutation was predominant in subjects aged 50-55 years, the rs559712998 mutation in subjects aged 30-80 years, and the rs771533925 mutation in subjects aged 65-70 years (Figure 5a, b and c).
The deleterious effects of rs771533925, rs559712998 and rs72677250
The SIFT (http://provean.jcvi.org/index.php) [44], PROVEAN (http://provean.jcvi.org/index.php) [45], and PolyPhen (http://genetics.bwh.harvard.edu/pph2/) [46] algorithms were used to predict the effects of the amino acid substitutions on protein function (Table 6). rs771533925 was predicted to be potentially damaging by all three tools and rs559712998 was predicted to be tolerated by all three, whereas rs72677250 was predicted to be tolerated by SIFT but potentially harmful by PROVEAN and PolyPhen.
Table 6
Hazard prediction of rs771533925, rs559712998 and rs72677250 mutations
| ID | Gene | PROVEAN prediction | SIFT prediction | PolyPhen prediction |
| --- | --- | --- | --- | --- |
| rs771533925 | TTN | Deleterious | Damaging | possibly_damaging |
| rs559712998 | TTN | Neutral | Tolerated | benign |
| rs72677250 | TTN | Deleterious | Tolerated | possibly_damaging |
Notes:
PROVEAN (Protein Variation Effect Analyzer) is a tool that predicts whether a sequence variation affects protein function; SIFT (Sorting Intolerant From Tolerant) is a tool that predicts the effect of non-synonymous variations based on sequence homology; PolyPhen (Polymorphism Phenotyping) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.
Functional analysis of TTN
GO analysis was conducted using Cytoscape 3.8.2 software with the ClueGO [47] plugin. The database (GO Plasmodium falciparum) used for the GO analysis was acquired from the GO Consortium. GO analysis was conducted using a two-sided hypergeometric test with Bonferroni correction. The GO term levels were from five to ten. The minimum number of genes to form a cluster was set at one. GO analysis showed that the targets for TTN were involved in a range of important biological processes, including myosin thick filament assembly in the skeletal muscle, the positive regulation of protein transport, serine/threonine kinase activity, and cardiac muscle fiber development (Figure 5d).
Validation of candidate loci by CRISPR-Cas12a
The results obtained with the CRISPR-Cas12a system for the TTN mutation loci in the family samples (Figure 6) were consistent with those obtained by WES, verifying the presence of the mutations in the family samples. We then used the CRISPR-Cas12a system to test a total of 50 sporadic samples for gene mutations. No mutations were found at rs771533925, rs559712998 or rs72677250 of the TTN gene in the sporadic samples (Figure 7). We therefore conclude that mutations within the TTN locus may play an important role in the pedigree inheritance of MMD, and that this technology is suitable for identifying patients with familial inheritance and for assessing the genetic risk of MMD in large-scale screening strategies.