We adopted a three-step methodology to explore potential comorbidities between retinal and brain diseases. First, we performed a comprehensive literature search using NCBI/PubMed. Next, we used the DisGeNET discovery platform to investigate gene-disease and disease-disease associations. Finally, we evaluated clinical relevance through the ClinVar database and conducted gene variant analysis using Karyosoft tools.
DisGeNET Cytoscape App
The DisGeNET Cytoscape app (Piñero et al. 2021) is an open-source tool designed for Cytoscape 3.x. It allows users to visualize, query, and analyze networks representing gene-disease and variant-disease associations from DisGeNET. The SQLite version of the DisGeNET database that supports the app is available on the DisGeNET website, and a detailed tutorial on using the app can be found online. Once the app is installed and launched, the DisGeNET control panel offers two tabs that guide queries over the two network views: gene-disease and variant-disease networks. These networks are bipartite graphs with two types of vertices (genes or variants, and diseases), in which edges connect only vertices of different types (e.g., a gene with a disease). Multiple edges can connect the same two vertices, each representing a separate piece of evidence for the association, such as a different database source, publication, or DisGeNET association type.
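The bipartite, multi-edge structure described above can be illustrated with a minimal Python sketch. It is not the app's internal data model: the class name, method names, and the placeholder labels (GENE_A, DISEASE_X, SOURCE_1, TYPE_1) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class BipartiteMultigraph:
    """Two vertex sets (genes or variants, and diseases) with parallel edges allowed."""

    def __init__(self):
        # (gene_or_variant, disease) -> list of evidence records, one per edge
        self._edges = defaultdict(list)

    def add_edge(self, node, disease, source, association_type):
        """Add one edge; repeated calls for the same pair create parallel edges."""
        self._edges[(node, disease)].append(
            {"source": source, "association_type": association_type}
        )

    def edge_count(self, node, disease):
        """Number of parallel edges (pieces of evidence) between two vertices."""
        return len(self._edges.get((node, disease), []))

    def diseases_of(self, node):
        """Sorted list of diseases linked to a gene or variant."""
        return sorted({d for (n, d) in self._edges if n == node})


# Placeholder labels only; these are not real DisGeNET records.
graph = BipartiteMultigraph()
graph.add_edge("GENE_A", "DISEASE_X", source="SOURCE_1", association_type="TYPE_1")
graph.add_edge("GENE_A", "DISEASE_X", source="SOURCE_2", association_type="TYPE_2")
graph.add_edge("GENE_A", "DISEASE_Y", source="SOURCE_1", association_type="TYPE_1")

print(graph.edge_count("GENE_A", "DISEASE_X"))
print(graph.diseases_of("GENE_A"))
```

Keying the edge store by the (gene or variant, disease) pair keeps every edge between the two vertex sets, so the graph stays bipartite by construction while each pair can still carry several evidence records.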
DisGeNET is a comprehensive discovery platform that houses one of the largest publicly available collections of genes and variants associated with human diseases. It integrates data from expert-curated repositories, GWAS catalogs, animal models, and scientific literature. The current version, DisGeNET v7.0, includes 1,134,942 gene-disease associations (GDAs) involving 21,671 genes and 30,170 diseases, disorders, traits, and clinical or abnormal human phenotypes. Additionally, it contains 369,554 variant-disease associations (VDAs) between 194,515 variants and 14,155 diseases, traits, and phenotypes.
In this project, the Search and Browse functionalities of the web interface were used to study gene-disease and disease-disease associations by computing the number of genes and variants shared between retinal and brain diseases. Disease-disease associations (DDAs) were explored through the search panel by querying both diseases.
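The shared-gene computation can be sketched in Python as a set intersection. This is an illustration under stated assumptions, not the DisGeNET implementation: the function name and the gene labels (G1 to G5) are hypothetical, and the Jaccard index is included here only as one common way to normalize the overlap.

```python
def shared_associations(items_a, items_b):
    """Return the shared items and the Jaccard index of two association sets."""
    set_a, set_b = set(items_a), set(items_b)
    shared = set_a & set_b
    union = set_a | set_b
    # Jaccard index: size of the intersection divided by size of the union
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Placeholder gene labels; real input would be the gene lists of each disease.
retinal_disease_genes = ["G1", "G2", "G3", "G4"]
brain_disease_genes = ["G3", "G4", "G5"]

shared, jaccard = shared_associations(retinal_disease_genes, brain_disease_genes)
print(len(shared), shared, jaccard)
```

The same function applies unchanged to variant identifiers, so shared genes and shared variants can be counted with one helper.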
Curated data in DisGeNET comprise gene-disease associations (GDAs) provided by expert-curated resources. Curation covers the annotation, publication, and presentation of data so that they retain their value over time and remain available for reuse and preservation. Several key resources contribute to this curated data: UniProt/SwissProt offers curated information on protein sequences, structures, and functions; the Comparative Toxicogenomics Database (CTD) provides manually curated data on gene-disease relationships, focusing on the effects of environmental chemicals on humans; Orphanet serves as a reference portal for rare diseases and orphan drugs; the Clinical Genome Resource (ClinGen) aims to define the clinical relevance of genes and variants for precision medicine and research; the Genomics England PanelApp is a publicly available knowledge base for creating, storing, and querying virtual gene panels related to human disorders; the Cancer Genome Interpreter (CGI) identifies known oncogenic alterations; and PsyGeNET is a resource for exploring psychiatric diseases and their associated genes.
Gene Variant Analysis using NCBI
NCBI databases were accessed through the NCBI website. The gene name (ABCA4 and HUWE1) and/or accession number was entered in the search bar, the Nucleotide database was selected, and the relevant results were viewed in GenBank or FASTA format. For variant viewing, the gene name, chromosomal location, or variant ID was entered into the search box of the Variation Viewer. The results were then filtered and sorted by criteria such as clinical significance, molecular consequence, or population frequency. We selected human data from the Sequence Read Archive (SRA), which is maintained by the National Center for Biotechnology Information (NCBI) and stores high-throughput sequencing data from technologies such as Illumina, Ion Torrent, and PacBio [The Sequence Read Archive (SRA) (nih.gov)]. We restricted the selection to Illumina data, as Illumina is one of the most widely used sequencing technologies and is known for its high throughput and accuracy. We chose runs with a DNA source (the sequencing data originate from DNA samples) that were generated with a paired-end approach, in which both ends of each DNA fragment are sequenced, providing more information and improving the accuracy of sequence assembly. The downloaded file was analyzed using the Karyosoft suite.
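The run selection criteria above (Illumina platform, DNA source, paired-end layout) can be expressed as a simple filter over a run metadata table. The sketch below assumes a CSV export whose column names follow the SRA run-information convention (Run, Platform, LibrarySource, LibraryLayout); the run identifiers are placeholders rather than real accessions, and the function name is hypothetical.

```python
import csv
import io

# Placeholder metadata; real input would be a run table exported from SRA.
RUNINFO_CSV = """Run,Platform,LibrarySource,LibraryLayout
RUN_0001,ILLUMINA,GENOMIC,PAIRED
RUN_0002,ILLUMINA,TRANSCRIPTOMIC,PAIRED
RUN_0003,PACBIO_SMRT,GENOMIC,SINGLE
RUN_0004,ILLUMINA,GENOMIC,SINGLE
"""


def select_runs(csv_text):
    """Keep runs that are Illumina, DNA-derived, and paired-end."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Run"]
        for row in reader
        if row["Platform"] == "ILLUMINA"
        and row["LibrarySource"] == "GENOMIC"
        and row["LibraryLayout"] == "PAIRED"
    ]


selected_runs = select_runs(RUNINFO_CSV)
print(selected_runs)
```

Only the first placeholder run passes all three criteria; the others fail on source, platform, or layout respectively.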
Variant analysis in Human Database using NCBI and Karyosoft Tools
In this study, we used Karyosoft’s Variants, a web-based no-code platform for discovering SNPs and Indels. The platform employs an accelerated Genome Analysis Toolkit (GATK) workflow optimized for GPU-based cloud servers and can process up to 8 samples in parallel, significantly reducing analysis time. It automates both the discovery and the interpretation of SNPs and Indels.
We also used Karyosoft’s Variants Mining Studio, a web-based no-code platform that serves as a central repository for the millions of discovered SNPs and Indels. The SRR number, a unique identifier assigned to a specific sequencing run in the NCBI Sequence Read Archive (SRA), was obtained from the SRA database, with selection criteria focused on curated human data backed by clinical data. This identifier was used to upload the VCF files of SNPs and Indels into the Variants Mining Studio. The platform simplifies the management and analysis of genomic variants, such as SNPs and Indels, in genes associated with the comorbidity of retinal and brain diseases. It was used to identify variants by gene name, annotation effect, chromosome number, and zygosity (homozygous or heterozygous state).
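The kind of query described above (by gene name, annotation effect, chromosome, and zygosity) can be sketched in Python against plain VCF records. This is not the Variants Mining Studio implementation: the INFO keys GENE and EFFECT, the function names, and the example records are hypothetical, and real annotated VCF files may encode gene and effect differently. Zygosity is derived from the standard GT genotype field.

```python
import re


def zygosity(gt):
    """Classify a VCF GT value such as '0/1' or '1|1'."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "homozygous_ref" if alleles[0] == "0" else "homozygous_alt"


def parse_record(line):
    """Parse one single-sample VCF data line into a small dict."""
    f = line.rstrip("\n").split("\t")
    info = dict(item.split("=", 1) for item in f[7].split(";") if "=" in item)
    sample = dict(zip(f[8].split(":"), f[9].split(":")))
    return {
        "chrom": f[0],
        "pos": int(f[1]),
        "gene": info.get("GENE"),      # hypothetical INFO key
        "effect": info.get("EFFECT"),  # hypothetical INFO key
        "zygosity": zygosity(sample.get("GT", ".")),
    }


def filter_variants(records, gene=None, effect=None, chrom=None, state=None):
    """Keep records matching every criterion that is not None."""
    wanted = {"gene": gene, "effect": effect, "chrom": chrom, "zygosity": state}
    return [
        r for r in records
        if all(v is None or r[k] == v for k, v in wanted.items())
    ]


# Placeholder records; fields are joined with tabs as in a VCF file.
lines = [
    "\t".join(["chr1", "100", ".", "A", "G", ".", "PASS",
               "GENE=GENE_A;EFFECT=missense_variant", "GT", "0/1"]),
    "\t".join(["chr1", "200", ".", "C", "T", ".", "PASS",
               "GENE=GENE_A;EFFECT=synonymous_variant", "GT", "1/1"]),
    "\t".join(["chrX", "300", ".", "G", "A", ".", "PASS",
               "GENE=GENE_B;EFFECT=missense_variant", "GT", "1|1"]),
]
variants = [parse_record(line) for line in lines]

het_in_gene_a = filter_variants(variants, gene="GENE_A", state="heterozygous")
print([v["pos"] for v in het_in_gene_a])
```

Leaving a criterion as None skips it, so the same helper covers single-field queries (for example, all variants on one chromosome) as well as combined ones.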