Data Mining
Raw gene expression data from Down syndrome (DS) and normal samples were downloaded from the Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) of the National Center for Biotechnology Information (NCBI). For the analyses in the present study, we selected the human genes encoding the five high mobility group N (HMGN) proteins, as registered in the NCBI Entrez Gene database (https://www.ncbi.nlm.nih.gov/gene) (Table 1). For all calculations, we used the log2-transformed expression values of the publicly available DNA microarray experiment with GEO accession number GSE59630 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59630), deposited by Olmos-Serrano et al. (2016) [25].
According to the information in the GEO database, the selected microarray experiment included gene expression data for more than 17,000 probes from 58 post-mortem brain samples of DS patients (25 from females and 33 from males) and 58 euploid samples used as normal controls (25 from females and 33 from males). Samples were classified by sex, age, and brain region, including the hippocampus (HIP), cerebellar cortex (CBC), dorsolateral prefrontal cortex (DFC), orbital prefrontal cortex (OFC), ventrolateral prefrontal cortex (VFC), medial prefrontal cortex (MFC), primary somatosensory cortex (S1C), inferior parietal cortex (IPC), primary visual cortex (V1C), superior temporal cortex (STC), and inferior temporal cortex (ITC). For the present study, we analyzed not only the brain as a whole but also the OFC, MFC, HIP, and CBC regions, which are closely associated with the neurophenotype of Down syndrome.
Data preprocessing
The robust multiarray analysis (RMA) algorithm [30] in Affymetrix Power Tools (APT; http://www.affymetrix.com/) was applied to all raw data for background correction and normalization, and probes were then filtered to reduce false positives. The filtering criterion was that at least half of the samples had PLIER signal intensity values greater than 100 [27].
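The probe-filtering criterion above can be sketched in Python as follows. This is a minimal illustration, not the APT implementation: it assumes the PLIER signals are already available as linear-scale values grouped per probe, and the function name `filter_probes` and the toy probe IDs are hypothetical.

```python
from typing import Dict, List

def filter_probes(signals: Dict[str, List[float]], threshold: float = 100.0) -> List[str]:
    """Return probe IDs for which at least half of the samples exceed `threshold`.

    `signals` maps each (hypothetical) probe ID to its linear-scale PLIER
    signal values, one value per sample.
    """
    kept = []
    for probe_id, values in signals.items():
        if not values:
            continue  # no samples: nothing to evaluate
        n_above = sum(1 for v in values if v > threshold)
        # "At least half the samples" as an integer comparison (avoids float rounding).
        if 2 * n_above >= len(values):
            kept.append(probe_id)
    return kept

if __name__ == "__main__":
    toy = {
        "probe_A": [150.0, 120.0, 90.0, 80.0],  # 2 of 4 samples above 100: kept
        "probe_B": [50.0, 60.0, 110.0, 70.0],   # 1 of 4 samples above 100: dropped
    }
    print(filter_probes(toy))  # ['probe_A']
```

The strict "greater than 100" comparison follows the wording of the criterion; a value of exactly 100 does not count toward the half.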
Quantification of differential HMGN gene expression
The log2 raw intensity values of each experiment were used to calculate Z-scores [28]. Z-scores of the protein-coding genes analyzed were calculated according to Equation 1:
![](https://myfiles.space/user_files/69519_bce2c0439cd956a6/69519_custom_files/img1628682042.PNG)
Equation 1. Z-score formula
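Because Equation 1 is given only as an image, the sketch below assumes the common per-array form of the Z transformation: each gene's log2 intensity minus the mean of all genes in that array, divided by the standard deviation of all genes in that array. Using the sample standard deviation (n − 1 denominator) is also an assumption, and the function name `z_scores` is hypothetical.

```python
import statistics
from typing import Dict

def z_scores(sample: Dict[str, float]) -> Dict[str, float]:
    """Z-transform one array: Z_g = (x_g - mean of all genes) / SD of all genes.

    `sample` maps gene (or probe) IDs to log2 intensities for a single sample.
    The sample standard deviation (n - 1 denominator) is an assumed choice.
    """
    values = list(sample.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return {gene: (x - mu) / sd for gene, x in sample.items()}

if __name__ == "__main__":
    print(z_scores({"g1": 2.0, "g2": 4.0, "g3": 6.0}))  # {'g1': -1.0, 'g2': 0.0, 'g3': 1.0}
```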
All Z-score values were normalized on a linear scale ranging from −3.0 to +3.0 (two-tailed P value < 0.001). From the Z-score data, we calculated the mean values per gene and per brain structure in DS and euploid control samples. These means were used to calculate the Z-ratio (Equation 2), a measure of differential gene expression; genes with Z-ratio values above 1.96 were considered over-expressed [28].
![](https://myfiles.space/user_files/69519_bce2c0439cd956a6/69519_custom_files/img1628682048.PNG)
Equation 2. Z-ratio formula
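Equation 2 is likewise only available as an image. The following minimal sketch assumes the common form Z-ratio = (mean Z in DS − mean Z in controls) / SD of these differences across all genes in the comparison, and then applies the 1.96 cutoff stated above. The function names `z_ratios` and `over_expressed` and the toy values are hypothetical.

```python
import statistics
from typing import Dict, List

def z_ratios(z_ds: Dict[str, List[float]], z_ctrl: Dict[str, List[float]]) -> Dict[str, float]:
    """Z-ratio per gene, assuming the form
    (mean Z in DS - mean Z in controls) / SD of those differences across all genes."""
    genes = sorted(z_ds.keys() & z_ctrl.keys())  # genes present in both groups
    diffs = {g: statistics.mean(z_ds[g]) - statistics.mean(z_ctrl[g]) for g in genes}
    sd_diff = statistics.stdev(list(diffs.values()))
    return {g: d / sd_diff for g, d in diffs.items()}

def over_expressed(ratios: Dict[str, float], cutoff: float = 1.96) -> List[str]:
    """Genes whose Z-ratio exceeds the 1.96 cutoff used in the text."""
    return sorted(g for g, r in ratios.items() if r > cutoff)

if __name__ == "__main__":
    ds = {"A": [1.0, 1.0], "B": [0.0, 0.0], "C": [-1.0, -1.0]}
    ctrl = {"A": [0.0, 0.0], "B": [0.0, 0.0], "C": [0.0, 0.0]}
    print(z_ratios(ds, ctrl))                    # {'A': 1.0, 'B': 0.0, 'C': -1.0}
    print(over_expressed({"A": 2.5, "B": 0.3}))  # ['A']
```

Dividing by the spread of all differences is what makes the Z-ratio comparable across comparisons: a gene stands out only relative to how much the whole gene set shifts between groups.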
Gene-dosage imbalance quantification
To determine the gene-dosage imbalance of the HSA21 genes co-expressed with the five HMGN genes across the brain structures of DS samples, we first calculated the M values according to Equation 2.1 and then used them to calculate the dosage-imbalance ratio R (DS/Control ratio) according to Equation 2.2 [29].
Equation 2.1. M-value formula
![](https://myfiles.space/user_files/69519_bce2c0439cd956a6/69519_custom_files/img1628682058.PNG)
Equation 2.2. R (DS/Control ratio) formula
![](https://myfiles.space/user_files/69519_bce2c0439cd956a6/69519_custom_files/img1628682086.PNG)
R values from 0.80 to 1.30 were considered dosage-balanced (two copies per gene); R values from 1.4 to 1.7 (centered on the expected trisomic ratio of 1.5) indicated dosage imbalance due to triplication (three copies per gene); and R values greater than 1.8 indicated amplification (more than three copies).
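Since Equations 2.1 and 2.2 are only shown as images, the sketch below assumes that M is the difference between the mean log2 expression in DS and in controls (a log2 fold change) and that R = 2^M, so a trisomic gene would give R ≈ 1.5. It then applies the R thresholds stated above; values falling in the gaps between the stated ranges (or below 0.80) are reported as unclassified because the text does not assign them. Function names are hypothetical.

```python
import math
from statistics import mean
from typing import List

def dosage_ratio(log2_ds: List[float], log2_ctrl: List[float]) -> float:
    """R = 2**M, assuming M = mean log2(DS) - mean log2(control)."""
    m = mean(log2_ds) - mean(log2_ctrl)  # assumed M value (log2 fold change)
    return 2.0 ** m                      # assumed R: back-transformed DS/Control ratio

def classify_dosage(r: float) -> str:
    """Apply the R thresholds stated in the text; gaps between ranges stay unclassified."""
    if 0.80 <= r <= 1.30:
        return "balanced"      # two copies per gene
    if 1.4 <= r <= 1.7:
        return "triplicated"   # three copies per gene (expected R = 1.5)
    if r > 1.8:
        return "amplified"     # more than three copies
    return "unclassified"      # e.g. 1.30 < R < 1.4, 1.7 < R <= 1.8, or R < 0.80

if __name__ == "__main__":
    r = dosage_ratio([2.0 + math.log2(1.5)], [2.0])
    print(round(r, 3), classify_dosage(r))  # 1.5 triplicated
```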
Construction of the HMGN gene network using GeneMANIA
To build the gene interaction networks of the HMGN genes, we used the freely accessible platform GeneMANIA (http://www.genemania.org), a real-time multiple association network tool developed at the Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, which draws on a massive set of functional association data [30]. All calculations in the present study were performed with the updated 2018 version.
Protein-Protein interaction analysis
To computationally model the interactions between each HMGN protein and several histones of the nucleosome core particle and histone H1, we obtained data from BioGRID (Database of Protein, Chemical, and Genetic Interactions), a freely accessible database (https://thebiogrid.org/) [31]. BioGRID is an interaction repository with data compiled through comprehensive curation efforts. The version used (3.5) allows searches of 69,922 publications for 1,706,694 protein and genetic interactions from humans. All data are freely available through its search index and can be downloaded in standardized formats. All searches in the present study used data updated as of January 2019 [31].
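Once an interaction file has been downloaded, the HMGN–histone pairs can be extracted with a few lines of Python. This is a sketch under stated assumptions: it assumes a tab-separated export whose header contains the columns `Official Symbol Interactor A` and `Official Symbol Interactor B` (as in BioGRID's TAB2 layout), the function name `hmgn_histone_pairs` is hypothetical, and the toy rows are illustrative, not real BioGRID records.

```python
import csv
import io
from typing import List, Set, Tuple

# Assumed BioGRID TAB2 column headers for the interactor gene symbols.
SYMBOL_A = "Official Symbol Interactor A"
SYMBOL_B = "Official Symbol Interactor B"

def hmgn_histone_pairs(tab2_text: str, hmgns: Set[str], histones: Set[str]) -> List[Tuple[str, str]]:
    """Return unique (HMGN, histone) pairs from tab-separated BioGRID-style text."""
    reader = csv.DictReader(io.StringIO(tab2_text), delimiter="\t")
    pairs = set()
    for row in reader:
        a, b = row[SYMBOL_A], row[SYMBOL_B]
        # Interactions are treated as undirected, so check both orientations.
        if a in hmgns and b in histones:
            pairs.add((a, b))
        elif b in hmgns and a in histones:
            pairs.add((b, a))
    return sorted(pairs)

if __name__ == "__main__":
    toy = (
        "#BioGRID Interaction ID\tOfficial Symbol Interactor A\tOfficial Symbol Interactor B\n"
        "1\tHMGN1\tH3-3A\n"
        "2\tH1-0\tHMGN2\n"
        "3\tHMGN1\tTP53\n"
    )
    print(hmgn_histone_pairs(toy, {"HMGN1", "HMGN2"}, {"H3-3A", "H1-0"}))
    # [('HMGN1', 'H3-3A'), ('HMGN2', 'H1-0')]
```

Collecting pairs in a set removes duplicate records, since the same interaction is often reported by several publications.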
Statistical analysis
To compare mean Z-ratio values in the DS brain, we performed multivariate statistical analyses across the different cortical brain structures between DS patients and euploid controls. A two-tailed Z test was used to assess differences in HMGN differential expression. P-values were calculated with the web tool P-value from Z-score Calculator (https://www.socscistatistics.com/pvalues/normaldistribution.aspx). In all cases, a significance level (alpha) of 0.05 was used to test H0. To assess statistical differences in the mean log2 values between DS and control samples by sex, age, and hippocampal, cerebellar, and cortical structures, we applied a two-tailed paired-samples t-test with an alpha of 0.05.
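The two tests described above can be reproduced locally with the Python standard library, as a cross-check of the web calculator. The two-tailed p-value for a Z statistic is 2(1 − Φ(|z|)), which equals erfc(|z|/√2). For the paired t-test, the sketch computes only the t statistic and degrees of freedom; obtaining the p-value requires the t distribution (for example, `scipy.stats.ttest_rel` returns both). Function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple

def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def paired_t(x: List[float], y: List[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for matched DS/control values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    d = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(d) / math.sqrt(len(d))  # standard error of the mean difference
    return statistics.mean(d) / se, len(d) - 1

if __name__ == "__main__":
    print(round(two_tailed_p_from_z(1.96), 4))  # 0.05
    print(paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

The first call also shows why |Z| = 1.96 is the cutoff matching an alpha of 0.05 in a two-tailed test.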
Principal component analysis (PCA) and hierarchical clustering (heat map) were used as computational procedures to classify multiclass gene expression in brain structures of DS and control samples by sex and age. All analyses were run in SPSS version 25.0 (https://spss.softonic.com/) and Cytoscape 3.6 (https://cytoscape.org/release_notes_3_6_0.html).
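The PCA step can be illustrated with a short NumPy sketch; NumPy is an assumption here, since the study used SPSS and Cytoscape. The sketch centers each gene (column) of a samples × genes log2 expression matrix and obtains the components by singular value decomposition; the hierarchical clustering and heat map are not shown. The function name `pca_scores` and the toy matrix are hypothetical.

```python
import numpy as np

def pca_scores(X: np.ndarray, n_components: int = 2):
    """PCA via singular value decomposition.

    X is a samples x genes matrix (e.g. log2 expression). Returns the sample
    scores on the first `n_components` PCs and their explained-variance fractions.
    """
    Xc = X - X.mean(axis=0)                          # center each gene (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the PCs
    explained = S**2 / np.sum(S**2)                  # variance fraction per component
    return scores, explained[:n_components]

if __name__ == "__main__":
    X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # toy data with a single axis of variation
    scores, explained = pca_scores(X)
    print(scores.shape, np.round(explained, 6))
```

Plotting the first two score columns, colored by DS/control, sex, or age group, gives the kind of sample separation that PCA is used to inspect here.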