For this study, 5 mL blood samples were collected from 30 individuals belonging to three families (pedigrees) from Fars, Tehran, and Mazandaran provinces. One sample from each family was selected as a control and examined in the study. Patients with Restless Legs Syndrome were identified on the basis of their symptoms and family history, with confirmation by a neurologist. Before sampling, patients were examined to exclude non-hereditary factors such as iron deficiency, the most important non-genetic cause of the disease, as well as diabetes and other environmental factors. Patients were informed about the nature and stages of the research, and study-specific questionnaires were completed with the help of the participants. The questions were chosen according to the research design and covered symptoms, their timing and severity, family history, medications, and history of diseases such as diabetes, hemodialysis, Parkinson's disease, and mental disorders. Patients with Restless Legs Syndrome who were receiving drug treatment were also identified, and the drug used was recorded. The age, sex, geographical location, and family relationships of the patients were determined, and pedigrees were drawn. Based on the questionnaire results and on previous studies from other countries, the hereditary nature of the disease in these families was confirmed, after which blood samples were collected and transferred to Tehran for further processing.
Samples were placed in K3-EDTA human blood collection tubes and labeled 1 to 30 for ease of use.
Preparation of samples
At this stage, the blood samples were removed from the freezer and left at room temperature to thaw. The specimens were placed upright under a laminar flow hood, which served as the work area in the laboratory.
Extraction of genomic DNA from blood
After blood collection, DNA extraction began. Following the kit protocol, red blood cells were first lysed in several steps with centrifugation, and after complete lysis the resulting sample was used for DNA extraction.
Genome mapping and analysis to identify a specific genetic link
In this study, we investigated the molecular causes of Restless Legs Syndrome experimentally using next-generation sequencing (NGS). The DNA sample of one patient with severe Restless Legs Syndrome was selected for NGS. Analysis of the resulting data identified three final genes, TANC1, ATXN1, and MEIS1, and primers were designed for them. Of these three genes, only MEIS1 is an established Restless Legs Syndrome gene; the other two were tested for the first time, and new variants were identified.
After electrophoresis of the final PCR products, samples with suitable bands were selected and sent for sequencing. Finally, the sequences were analyzed and new variants contributing to the disease were identified.
Study on candidate genes
After the final genes were selected, primers were designed, PCR was performed on all samples, and the PCR products were sent for sequencing. The resulting sequences were then analyzed to study the candidate genes and their effect on the occurrence of the disease.
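As an illustration of the primer-design step, the sketch below screens a candidate primer by GC content and by an approximate melting temperature from the Wallace rule (Tm = 2(A+T) + 4(G+C) °C). The 40 to 60% GC window, the 52 to 64 °C Tm window, and the primer sequence are assumed rule-of-thumb values and hypothetical inputs, not the criteria used in this study; dedicated primer-design tools use more accurate nearest-neighbour thermodynamics.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_ok(primer: str, gc_range=(0.40, 0.60), tm_range=(52, 64)) -> bool:
    """True if the primer lies inside the assumed GC-content and Tm windows."""
    gc = gc_content(primer)
    tm = wallace_tm(primer)
    return gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
    print(gc_content(primer), wallace_tm(primer), primer_ok(primer))  # 0.5 60 True
```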
NGS
Because of their large volume, exome sequencing data require specialized software for analysis. The raw data produced by sequencing instruments are usually in FASTQ format, in which each read is a string of the four nucleotide bases (A, C, G, T) together with information on the quality of each base call. FASTQ reads have a given length, and their number depends on the sequencing depth (coverage); for example, at a coverage of 150, about 100 million reads are obtained. Exome data analysis is generally divided into five stages: quality control and correction of reads, alignment, variant calling, screening (filtering), and reporting of each variant.
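The four-line FASTQ layout described above can be read with a short parser. This is a minimal sketch that assumes one record per four lines (no line wrapping) and Phred+33 quality encoding, as used by current Illumina instruments; the read shown is hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream laid out as four lines per read:
    @name, sequence, '+' separator, quality string."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred33_scores(qual: str) -> List[int]:
    """Decode a quality string, assuming Phred+33 encoding (ASCII code minus 33)."""
    return [ord(ch) - 33 for ch in qual]


if __name__ == "__main__":
    data = io.StringIO("@read1\nACGT\n+\nII5#\n")  # hypothetical read
    for rec in read_fastq(data):
        print(rec.name, rec.seq, phred33_scores(rec.qual))  # read1 ACGT [40, 40, 20, 2]
```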
Exome data were analyzed using CLC Genomics Workbench and Biomedical Genomics Workbench software, which generally involves the following six steps:
A) Quality control and correction of reads
At this stage, the quality of each read and of each base was evaluated. After the quality scores were determined, the reads were corrected: any base with a score below 20 was removed. The quality score is calculated with the following equation:
$$Q = -10\log_{10}\left(\frac{p}{1-p}\right)$$
where p is the probability that the base call is incorrect.
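The sketch below evaluates the quality equation and, for comparison, the Phred form Q = -10 log10(p), which it approaches for small p. It then applies the Q < 20 rule as simple 3'-end trimming; reading "removal" as trimming from the read end is an assumption, and the tools used in the study may apply a different rule (for example, a sliding window).

```python
import math
from typing import List, Tuple


def quality_from_odds(p: float) -> float:
    """Q = -10 * log10(p / (1 - p)), where p is the base-call error probability."""
    return -10 * math.log10(p / (1 - p))


def phred_quality(p: float) -> float:
    """Phred form Q = -10 * log10(p); approaches the formula above as p -> 0."""
    return -10 * math.log10(p)


def trim_3prime(seq: str, quals: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Drop bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


if __name__ == "__main__":
    print(round(quality_from_odds(0.01), 2), round(phred_quality(0.01), 2))  # 19.96 20.0
    print(trim_3prime("ACGTA", [30, 30, 25, 15, 10]))  # ('ACG', [30, 30, 25])
```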
At this stage, other indicators of sequencing accuracy were also checked, such as GC content, read length, and duplication level. Adapters ligated to the reads during sequencing were also removed at this step.
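Adapter removal can be sketched as locating the adapter sequence in a read and cutting it together with everything downstream. The adapter string is an assumption (the common prefix of Illumina TruSeq adapters), only exact matches are handled, and the minimum partial overlap of 5 bases is a hypothetical parameter; production trimmers also tolerate mismatches.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed: common prefix of Illumina TruSeq adapters


def clip_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Cut the adapter (exact match only) and everything 3' of it.
    A partial adapter of at least min_overlap bases at the read end is also removed."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # The read may end partway into the adapter: test the longest prefixes first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    print(clip_adapter("ACGTACGTAGATCGGAAGAGCTTT"))  # ACGTACGT
    print(clip_adapter("ACGTACGTAGATCGG"))           # ACGTACGT
    print(clip_adapter("ACGTACGTACGT"))              # ACGTACGTACGT
```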
B) Alignment
After the corrections in the previous step, the reads that passed quality control were aligned to the human reference genome. Several algorithms can be used for this.
Some common algorithms and approaches for aligning exome data are:
- Burrows-Wheeler Transform (BWT)
- Smith-Waterman (SW)
- Dynamic programming
- Hash-based indexing
- Seed-and-extend approach
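Of the approaches listed, Smith-Waterman is the easiest to show directly. The sketch computes a local alignment score with a linear gap penalty; the scoring values (match +2, mismatch -1, gap -2) are assumed for illustration. Genome-scale aligners do not run this recurrence over the whole genome; they typically use it, or similar dynamic programming, to extend short seeds found through a BWT or hash index.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, Tuple[int, int]]:
    """Best local-alignment score of a vs b and the (i, j) cell where it ends.
    Uses the Smith-Waterman recurrence with a linear gap penalty."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "ACG"))  # (6, (5, 3))
```

The table costs O(len(a) x len(b)) time and memory, which is why it is applied only to short candidate regions.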
The human reference genome used in this study was hg19 (GRCh37).
C) Post-alignment processes
These processes included the following 2 steps:
1- Removal of duplicates: When reads are mapped to the reference genome, some regions receive an excessive number of identical reads. The high volume of these duplicate sequences causes problems in later stages, so some of them are removed based on their quality and degree of similarity.
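Duplicate removal can be sketched by grouping reads that start at the same reference position on the same strand and keeping only the read with the highest mean base quality. The `Read` structure and the quality-based choice of representative are assumptions for illustration; real duplicate-marking tools also account for soft clipping, use the 5' end of reverse-strand reads, and consider the mate of each read pair.

```python
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    chrom: str
    pos: int          # aligned start position on the reference (simplified)
    reverse: bool     # True if aligned to the reverse strand
    mean_qual: float  # mean base quality of the read


def find_duplicates(reads: List[Read]) -> List[str]:
    """Group reads by (chrom, pos, strand); keep the highest-quality read of each
    group and return the names of the others, which are treated as duplicates."""
    best: Dict[Tuple[str, int, bool], Read] = {}
    for r in reads:
        key = (r.chrom, r.pos, r.reverse)
        if key not in best or r.mean_qual > best[key].mean_qual:
            best[key] = r
    kept = {r.name for r in best.values()}
    return [r.name for r in reads if r.name not in kept]


if __name__ == "__main__":
    reads = [  # hypothetical alignments
        Read("r1", "chr1", 100, False, 30.0),
        Read("r2", "chr1", 100, False, 35.0),
        Read("r3", "chr1", 100, True, 30.0),
        Read("r4", "chr1", 200, False, 20.0),
    ]
    print(find_duplicates(reads))  # ['r1']
```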
2- Re-alignment: In the alignment stage, most bases are placed correctly, but some are misaligned. A mismatch indicates that our sequence differs from the reference genome sequence at that position and may indicate a SNP. In some cases, however, the mismatch is an artifact of poor alignment. Re-alignment uses algorithms to eliminate these spurious mismatches and the gaps introduced during alignment. The alignment performed at this stage is called local alignment.
D) Calling variants
At this stage, all bases that differ from the reference genome sequence are identified. These changes can take the form of single nucleotide polymorphisms, insertions, deletions, copy number variations, and structural variations.
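Small variants are reported in VCF as a REF/ALT allele pair, and their type can be read off the allele lengths. The sketch below is a simplified classifier that assumes VCF-style left-anchored indels; copy number and structural variants are detected by other methods (for example, read depth or split reads) and are not covered.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple REF/ALT pair as written in a VCF record."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"          # single nucleotide change
    if len(ref) == len(alt):
        return "MNV"          # several adjacent substituted bases
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"    # anchor base(s) followed by inserted sequence
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"     # anchor base(s) followed by deleted sequence
    return "complex"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "AT"), ("ATG", "A"), ("AC", "GT"), ("AT", "G")]:
        print(ref, alt, classify_variant(ref, alt))
```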
E) Screening
Human genomes carry many variants that do not cause disease. At this stage, such non-pathogenic variants can be filtered out using common variants catalogued by genome projects covering many individuals. These include the 1000 Genomes Project, the ExAC project, the NHLBI Exome Sequencing Project, dbSNP, the DiscoverEHR project, and local projects covering people from a specific region (such as Iranome). Another key concept in screening is the minor allele frequency (MAF): if the frequency of an allele in these projects is below a threshold, the allele is considered rare. Variants are also screened at this stage according to the inheritance pattern, by eliminating large repeats, and by excluding regions that are unlikely to be related to the disease.
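The minor allele frequency filter can be sketched as follows. The 1% cutoff, the database labels, and the variants are hypothetical; the appropriate threshold depends on the disease and the inheritance model. In this sketch a variant missing from every database is treated as rare.

```python
from typing import Dict, List, Optional


def is_rare(pop_af: Dict[str, Optional[float]], max_af: float = 0.01) -> bool:
    """Rare = allele frequency below max_af in every database that reports it.
    None means the variant was not observed in that database."""
    observed = [af for af in pop_af.values() if af is not None]
    return all(af < max_af for af in observed)  # also True if never observed


def filter_rare(variants: List[dict], max_af: float = 0.01) -> List[dict]:
    """Keep only variants whose population frequencies pass the MAF cutoff."""
    return [v for v in variants if is_rare(v["af"], max_af)]


if __name__ == "__main__":
    variants = [  # hypothetical variants and frequencies
        {"id": "v1", "af": {"1000G": 0.20, "ExAC": 0.15}},
        {"id": "v2", "af": {"1000G": 0.001, "ExAC": None}},
        {"id": "v3", "af": {}},
    ]
    print([v["id"] for v in filter_rare(variants)])  # ['v2', 'v3']
```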
F) Writing reports for each variant
The last step in exome data analysis is writing a report for each variant. By this point, screening has greatly reduced the number of variants compared with the starting set. The remaining variants are examined using databases with information on genes and alleles. These databases cover biological, molecular, and cellular pathways, and through them the relationship between a given variant and the disease can be investigated. They include dbSNP (rsIDs), the DDD (Deciphering Developmental Disorders) study, articles indexed in PubMed, the OMIM database, and the COSMIC database of somatic mutations in cancer.
In addition to these databases, the pathogenicity of a variant is predicted with various algorithms, the best known being SIFT, GWAVA, PolyPhen, and CADD. Each of these tools assigns the variant a score that reflects its likely pathogenicity.
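One simple way to combine predictor outputs is to count how many tools call a variant damaging. The sketch uses two commonly cited cutoffs, a SIFT score below 0.05 (predicted deleterious) and a CADD PHRED score of at least 20 (roughly the top 1% of deleterious substitutions); these thresholds and the voting scheme are assumptions for illustration, not the criteria used in this study.

```python
from typing import Optional, Tuple


def damaging_votes(sift: Optional[float] = None, cadd_phred: Optional[float] = None,
                   sift_cutoff: float = 0.05, cadd_cutoff: float = 20.0) -> Tuple[int, int]:
    """Return (tools calling the variant damaging, tools with a score available)."""
    votes = available = 0
    if sift is not None:
        available += 1
        votes += int(sift < sift_cutoff)         # low SIFT score = predicted deleterious
    if cadd_phred is not None:
        available += 1
        votes += int(cadd_phred >= cadd_cutoff)  # higher CADD PHRED = more deleterious
    return votes, available


if __name__ == "__main__":
    print(damaging_votes(sift=0.01, cadd_phred=25.0))  # (2, 2)
    print(damaging_votes(sift=0.30))                    # (0, 1)
```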
The WES test was performed at 150X coverage on the Illumina HiSeq 4000 platform, producing 30 million reads and 8.5 GB of data.
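Mean sequencing depth is commonly estimated as the total number of sequenced bases divided by the size of the target region (C = N·L/G, with N reads of length L over a target of G bases). The sketch shows the arithmetic; the numbers in the usage are hypothetical, and on-target depth is lower in practice because of duplicates and off-target reads.

```python
def mean_coverage(n_reads: int, read_length: int, target_bp: int) -> float:
    """Mean depth C = N * L / G: total sequenced bases over target size.
    Ignores duplicates and off-target reads, so real on-target depth is lower."""
    if target_bp <= 0:
        raise ValueError("target size must be positive")
    return n_reads * read_length / target_bp


if __name__ == "__main__":
    # Hypothetical run: 40 million 150-base reads over a 60 Mb target.
    print(mean_coverage(40_000_000, 150, 60_000_000))  # 100.0
```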
Data analysis
Sequences were aligned with software such as NextGenMap to produce SAM/BAM files.
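SAM records (and their binary BAM equivalent) have 11 mandatory tab-separated fields, and the FLAG field is a bit set that records, among other things, whether a read is unmapped, reverse-complemented, or marked as a duplicate. The sketch parses one SAM text line with the standard library only; the record shown is hypothetical, and only a subset of FLAG bits is decoded.

```python
from typing import Any, Dict

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# A few FLAG bits from the SAM specification.
FLAG_BITS = {0x1: "paired", 0x4: "unmapped", 0x10: "reverse",
             0x100: "secondary", 0x400: "duplicate", 0x800: "supplementary"}


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Split one SAM alignment line into its 11 mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM lines need at least 11 tab-separated fields")
    rec: Dict[str, Any] = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    rec["flags"] = {name for bit, name in FLAG_BITS.items() if rec["FLAG"] & bit}
    rec["tags"] = cols[11:]
    return rec


if __name__ == "__main__":
    line = "r1\t1040\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"  # hypothetical record
    rec = parse_sam_line(line)
    print(rec["RNAME"], rec["POS"], sorted(rec["flags"]))  # chr1 100 ['duplicate', 'reverse']
```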
Variants were called with the Genome Analysis Toolkit (GATK), and databases including OMIM (NCBI), HGMD, and 1000 Genomes were used.
VCF files were annotated using ClinVar and EmVar.
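Annotations such as those from ClinVar are stored in the semicolon-separated INFO column of a VCF record. The sketch parses the eight fixed VCF columns and splits INFO into a dictionary; the record is hypothetical, and CLNSIG is used only as an example of the kind of key found in ClinVar-annotated VCF files.

```python
from typing import Any, Dict, Union


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed VCF columns and split INFO into key/value pairs.
    INFO entries without '=' are flags and are stored as True."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for item in info.split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                info_dict[key] = value
            elif item:
                info_dict[item] = True
    return {"CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref,
            "ALT": alt.split(","), "QUAL": qual, "FILTER": flt, "INFO": info_dict}


if __name__ == "__main__":
    line = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30;CLNSIG=Benign;DB"  # hypothetical record
    rec = parse_vcf_line(line)
    print(rec["POS"], rec["ALT"], rec["INFO"])
```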
PolyPhen and SIFT were used to predict whether an amino acid substitution is likely to impair the function of the encoded protein, based on how conserved the substituted residue has been during evolution.
The quality of the sequencing data was first checked with FastQC version 0.11.3 (Andrews, 2010). Versatile software version 0.36 was used to filter the raw data and trim low-quality bases from both ends of the reads, and reads shorter than 36 bases were discarded.
Only high-quality reads (Phred score Q ≥ 30) were used for the bioinformatic analyses.
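The length and quality thresholds above (reads shorter than 36 bases discarded, Q ≥ 30 retained) can be expressed as a per-read filter. Applying Q ≥ 30 to the mean quality of each read is an assumption, since the text does not say whether the threshold applies per base or per read; Phred+33 encoding is also assumed.

```python
def passes_read_filters(qual: str, min_len: int = 36, min_mean_q: float = 30.0,
                        offset: int = 33) -> bool:
    """Keep a read if it is at least min_len bases long and its mean Phred
    quality (decoded with the given ASCII offset) is at least min_mean_q."""
    if not qual or len(qual) < min_len:
        return False
    scores = [ord(ch) - offset for ch in qual]
    return sum(scores) / len(scores) >= min_mean_q


if __name__ == "__main__":
    print(passes_read_filters("I" * 36))  # True  ('I' decodes to Q40)
    print(passes_read_filters("I" * 35))  # False (too short)
    print(passes_read_filters("+" * 40))  # False ('+' decodes to Q10)
```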
Sanger sequencing
After the data were analyzed and the candidate genes identified, all 30 samples were examined and sequenced, and the final results were obtained using the Ensembl, NCBI, MutationTaster, and VarSome websites and the FinchTV, CLC, DNASTAR, and IGV software.
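Comparing a Sanger read with the reference is usually done by inspecting chromatograms and alignments in tools such as FinchTV or IGV. As a simplified illustration, the sketch reports differences between two sequences that are already aligned to equal length, treating IUPAC two-base ambiguity codes (for example, R = A/G) as heterozygous calls and skipping N. The function and its inputs are hypothetical and do not reproduce the study's workflow.

```python
from typing import List, Tuple

# IUPAC two-base ambiguity codes, as base callers use for mixed (heterozygous) peaks.
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def compare_to_reference(ref: str, sample: str,
                         start: int = 1) -> List[Tuple[int, str, str, str]]:
    """List differences between two sequences already aligned to equal length.
    Returns (position, reference base, observed base(s), call type)."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for i, (r, s) in enumerate(zip(ref.upper(), sample.upper())):
        if s == r or s == "N":  # match, or no confident base call
            continue
        if s in IUPAC_HET:
            diffs.append((start + i, r, IUPAC_HET[s], "heterozygous"))
        else:
            diffs.append((start + i, r, s, "homozygous"))
    return diffs


if __name__ == "__main__":
    # Hypothetical reference segment starting at position 100 and a Sanger read.
    print(compare_to_reference("ACGTACGT", "ACRTACNT", start=100))
    # [(102, 'G', 'AG', 'heterozygous')]
```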