Gene expression profile and probe labeling
Three microarray datasets (GSE88837, GSE88940 and GSE109597) were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) for analysis. GSE88837, generated on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array), was used for gene expression analysis. In our study, subjects with BMI ≥ 30 were defined as obese (Jensen et al. 2014). A total of 30 subjects (15 obese and 15 healthy controls) were analyzed. We used the affy package in R (Gautier et al. 2004) to convert the CEL files into an expression matrix and the robust multi-array average (RMA) method to normalize it. Bioconductor packages in R (Gentleman et al. 2004) were used to map probes to genes. If a gene corresponded to several probes, we used their average expression value for further analysis. GSE88940, generated on the GPL13534 platform (Illumina HumanMethylation450 BeadChip), was used for DNA methylation analysis and consisted of 15 obese subjects and 15 lean controls. All data processing was done in GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/). Across these two data sets (GSE88837 and GSE88940), 20 samples came from the same individuals, and we analyzed only these 20 samples. GSE109597 was used as the validation data set, and it was analyzed with the same method as GSE88837.
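The probe-to-gene averaging and the restriction to subjects present in both data sets can be illustrated with a minimal Python sketch. It is not the R code used in the study: the function names, probe IDs, gene symbols and values below are hypothetical, and RMA normalization is assumed to have been applied already.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(probe_expr, probe_to_gene):
    """Average the expression of all probes that map to the same gene.

    probe_expr:    {probe_id: [value per sample, ...]}
    probe_to_gene: {probe_id: gene_symbol}; unmapped probes are skipped.
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        gene = probe_to_gene.get(probe_id)
        if gene:
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


def shared_subjects(expr_subjects, meth_subjects):
    """Subjects present in both data sets, kept in expression order."""
    meth = set(meth_subjects)
    return [s for s in expr_subjects if s in meth]


# Hypothetical toy data: two samples, four probes, one probe without a gene
probe_expr = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 6.0],
    "p3": [1.0, 1.5],
    "p4": [9.0, 9.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}

gene_expr = collapse_probes_to_genes(probe_expr, probe_to_gene)
paired = shared_subjects(["s1", "s2", "s3"], ["s3", "s1"])
```

Here GENE_A receives the per-sample mean of probes p1 and p2, and only subjects s1 and s3 would be carried forward.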
Analysis of differentially expressed and methylated genes (DEMGs)
We compared obese with control subjects to identify differentially expressed genes (DEGs) using the limma package in R (Miao et al. 2019). The threshold was set as |log2 fold change| ≥ 2 and P < 0.05. GEO2R was used to identify differentially methylated positions (DMPs) by comparing normal and obese subjects. DMPs located in gene regions were assigned to the corresponding genes, which were defined as differentially methylated genes (DMGs). The threshold was set as an absolute methylation change (|Δβ|) > 0.05 and P < 0.05. We then matched the DEGs with the DMGs, and only the genes present in both lists (DEMGs) were selected for further analysis.
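The thresholding and matching steps can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' pipeline: the input dictionaries stand in for the result tables of the expression and GEO2R analyses, and all identifiers and numbers are hypothetical.

```python
def select_degs(expr_stats, lfc_cutoff=2.0, p_cutoff=0.05):
    """expr_stats: {gene: (log2_fold_change, p_value)}"""
    return {
        gene
        for gene, (lfc, p) in expr_stats.items()
        if abs(lfc) >= lfc_cutoff and p < p_cutoff
    }


def select_dmgs(site_stats, site_to_gene, delta_cutoff=0.05, p_cutoff=0.05):
    """site_stats: {cpg_id: (methylation_change, p_value)}.

    Sites that are not located in a gene region are dropped.
    """
    dmgs = set()
    for site, (delta, p) in site_stats.items():
        gene = site_to_gene.get(site)
        if gene and abs(delta) > delta_cutoff and p < p_cutoff:
            dmgs.add(gene)
    return dmgs


# Hypothetical result tables
expr_stats = {
    "GENE_A": (2.4, 0.001),   # passes both cutoffs
    "GENE_B": (-2.1, 0.20),   # fails the P cutoff
    "GENE_C": (-3.0, 0.01),   # passes both cutoffs
}
site_stats = {
    "cg01": (0.08, 0.01),     # GENE_A, passes
    "cg02": (-0.02, 0.001),   # GENE_C, change too small
    "cg03": (0.30, 0.001),    # intergenic, dropped
}
site_to_gene = {"cg01": "GENE_A", "cg02": "GENE_C"}

degs = select_degs(expr_stats)
dmgs = select_dmgs(site_stats, site_to_gene)
demgs = degs & dmgs  # genes that are both differentially expressed and methylated
```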
Functional enrichment analysis
All functional enrichment analyses of the DEGs were performed with the clusterProfiler and DOSE packages in R (Yu et al. 2012). The complete functional enrichment analysis included Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, and Disease Ontology (DO) analyses. The threshold was set as adjusted P < 0.05 and false discovery rate (FDR) < 0.05.
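Over-representation analysis of this kind is commonly based on a hypergeometric test followed by multiple-testing correction. The sketch below shows that general idea in plain Python; it assumes a Benjamini-Hochberg correction and uses made-up term names and counts, and it does not reproduce the internals of the R packages.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical setting: 1000 background genes, 20 genes of interest
background, n_selected = 1000, 20
# term -> (genes annotated to the term, overlap with the selected genes)
terms = {"TERM_1": (50, 8), "TERM_2": (200, 5), "TERM_3": (10, 0)}

names = list(terms)
raw_p = [hypergeom_sf(terms[t][1], background, terms[t][0], n_selected) for t in names]
adj_p = benjamini_hochberg(raw_p)
significant = [t for t, q in zip(names, adj_p) if q < 0.05]
```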
Protein-protein interaction (PPI) network and module analysis
We used the STRING database (version 11.0) (Szklarczyk et al. 2019) to explore predicted and experimentally determined protein interactions. The database draws on several lines of evidence, including co-expression, experiments, text mining, co-occurrence, gene fusion, curated databases, and neighborhood. We used the combined score to assess the protein pair interactions in the database. We then mapped the DEMGs onto the PPI network to identify the key genes, with the cut-off set to a combined score > 0.9 (Miao et al. 2018). Node degree was used to assess the role of each protein node in the network. Using the Molecular Complex Detection (MCODE) plugin in Cytoscape (version 3.7.1), the most significant and main clustering modules were identified (Shannon et al. 2003; Bader et al. 2003). For further analysis, we set EASE ≤ 0.05 and count ≥ 2 as the cut-off values and MCODE score > 8 as the threshold.
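The score filter and degree ranking can be illustrated with a small Python sketch. It assumes the interactions are already available as tuples with scores rescaled to the 0 to 1 range (STRING download files usually store them as integers out of 1000); the protein names and scores are hypothetical, and MCODE clustering is not reproduced here.

```python
from collections import Counter


def high_confidence_edges(interactions, genes, min_score=0.9):
    """Keep edges between genes of interest whose score exceeds min_score.

    interactions: iterable of (protein_a, protein_b, score in [0, 1]).
    """
    genes = set(genes)
    edges = set()
    for a, b, score in interactions:
        if a != b and a in genes and b in genes and score > min_score:
            edges.add(tuple(sorted((a, b))))  # undirected, deduplicated
    return edges


def node_degrees(edges):
    """Count how many retained edges touch each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical interaction list
interactions = [
    ("A", "B", 0.95),
    ("B", "A", 0.95),  # same pair listed in the other direction
    ("B", "C", 0.92),
    ("A", "C", 0.70),  # below the cut-off
    ("C", "D", 0.99),  # D is not a gene of interest
]
edges = high_confidence_edges(interactions, genes={"A", "B", "C"})
degree = node_degrees(edges)
hub = degree.most_common(1)[0]
```

In this toy network B has the highest degree and would be treated as the candidate key gene.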
Validation of DEMGs
We used GraphPad Prism 8.0 (Miao et al. 2019) to draw scatterplots of methylation against gene expression and to assess the relationship between them. We then fitted a correlation equation and tested whether it was statistically significant. The DEMGs were validated in GSE109597, which contained 84 unrelated samples. After grouping the samples according to BMI (> 30 and < 30), the expression of the DEMGs in the two groups was compared with ggplot2 in R.
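The relationship between methylation and expression for a single gene can be summarized by a least-squares line and a Pearson correlation. The following Python sketch shows that calculation in general terms; it is not the Prism procedure itself, and the methylation and expression values are hypothetical.

```python
from math import sqrt


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson r and its t statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    # t statistic with n - 2 degrees of freedom; infinite for a perfect fit
    t = r * sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else r * float("inf")
    return slope, intercept, r, t


# Hypothetical values for one gene across four paired samples
methylation = [0.2, 0.4, 0.6, 0.8]   # beta values
expression = [8.0, 7.1, 5.9, 5.0]    # log2 expression

slope, intercept, r, t = linear_fit(methylation, expression)
```

A negative slope and a correlation close to -1, as in this toy example, would indicate that higher methylation accompanies lower expression. The t statistic would then be compared with a t distribution on n - 2 degrees of freedom to judge significance.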