The overall procedure of our method is illustrated in Fig. 1. Each step is described in the following subsections.
mRNA expression profiles of GBS patients and healthy controls
mRNA microarray analyses were performed on peripheral blood mononuclear cells (PBMCs) from 4 GBS patients and 4 age- and gender-matched healthy controls.
Patients who fulfilled the standard diagnostic criteria for GBS were recruited from Tianjin Medical University General Hospital [12]. Blood was sampled at the peak of the disease and before treatment with intravenous immunoglobulin (IVIG), plasma exchange or glucocorticoids. Informed consent was obtained at enrollment from all patients or their legally acceptable surrogates. The study was approved by the ethical review committees of Tianjin Medical University General Hospital. PBMCs were isolated from all GBS patients and healthy controls. The labeled cRNAs were hybridized onto the Human LncRNA Expression Microarray V3.0 (Arraystar, Rockville, MD), which is designed for the global profiling of human lncRNAs, mRNAs and protein-coding transcripts.
In total, an expression profile dataset of 21,620 probes across 8 samples was obtained. Signal intensities were log2-transformed and normalized, and the probes were mapped to 14,707 genes.
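A minimal sketch of this preprocessing, using only the Python standard library, is given below. The source names neither the normalization method nor the rule for deriving genes from probes; quantile normalization and averaging all probes of a gene are assumptions chosen for illustration, and `probe_to_gene` is a hypothetical annotation list.

```python
import math
from statistics import mean

def log2_transform(matrix):
    """Log2-transform raw intensities (rows = probes, columns = samples)."""
    return [[math.log2(v) for v in row] for row in matrix]

def quantile_normalize(matrix):
    """Assumed method: give every sample (column) the same value distribution,
    namely the rank-wise mean of the sorted columns. Ties are not averaged here."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_cols)]
    rank_means = [mean(col[r] for col in sorted_cols) for r in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        # Rows ordered by their value in sample j; the r-th smallest gets rank_means[r].
        order = sorted(range(n_rows), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            out[i][j] = rank_means[rank]
    return out

def collapse_to_genes(matrix, probe_to_gene):
    """Assumed rule: average all probes of the same gene; unannotated probes (None) are dropped."""
    groups = {}
    for row, gene in zip(matrix, probe_to_gene):
        if gene is not None:
            groups.setdefault(gene, []).append(row)
    return {g: [mean(col) for col in zip(*rows)] for g, rows in groups.items()}
```

Usage on a toy 3-probe, 2-sample matrix: `collapse_to_genes(quantile_normalize(log2_transform(raw)), ["A", "B", "B"])` returns one averaged row per gene.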
The mRMR Method
We employed the mRMR (Maximum Relevance Minimum Redundancy) method [5, 13–15] to rank all 14,707 genes by importance. In this procedure, each gene was treated as a feature. The Maximum Relevance criterion favors features that best discriminate GBS samples from controls, while the Minimum Redundancy criterion penalizes features whose information is redundant with that of features already selected. Briefly, to rank features by the mRMR criteria, two values were calculated for each feature: a relevance value A and a redundancy value B. The feature is then scored by A − B; the higher A − B, the higher the feature ranks. For details of the mRMR method, please refer to [5, 13–15].
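For reference, the incremental selection step can be written in the difference form used in the standard mRMR formulation, where both A and B are measured by mutual information. The notation below is ours and is a sketch of that formulation, not a transcription of the original paper: $c$ is the class label (GBS vs. control), $\Omega$ the set of all 14,707 genes, and $S_{m-1}$ the $m-1$ features already selected (for the first feature, $S_0 = \emptyset$ and $B = 0$).

```latex
A(f) = I(f;c), \qquad
B(f) = \frac{1}{|S_{m-1}|} \sum_{s \in S_{m-1}} I(f;s)

f_m = \arg\max_{f \in \Omega \setminus S_{m-1}} \bigl[ A(f) - B(f) \bigr]

I(x;y) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}
```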
The mRMR method produced two ordered lists, the MaxRel Table and the mRMR Table. In the MaxRel Table, all features were ranked by the Maximum Relevance criterion alone. In the mRMR Table, they were ranked by the mRMR criteria, so a feature with a smaller index had a better trade-off between maximum relevance and minimum redundancy and could therefore be more important. Both tables are provided in Supporting Information S1. In this study, we selected the top 5% of the mRMR Table, i.e., 735 features. These 735 genes were regarded as significant differentially expressed genes and were analyzed in the subsequent steps.
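The difference between the two tables, and the top-5% cut, can be illustrated with the sketch below. It assumes expression values have already been discretized (e.g., into low/normal/high states), since mutual information is computed here from counts; the discretization actually used is not stated in this section. Function names are hypothetical and the code is not the mRMR program cited in [5, 13–15].

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b))), written with counts
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def maxrel_table(features, labels):
    """MaxRel Table: rank features by relevance I(f; c) alone (stable for ties)."""
    rel = {f: mutual_information(v, labels) for f, v in features.items()}
    return sorted(rel, key=rel.get, reverse=True)

def mrmr_table(features, labels):
    """mRMR Table: incremental ranking by relevance minus mean redundancy (A - B)."""
    rel = {f: mutual_information(v, labels) for f, v in features.items()}
    red_sum = {f: 0.0 for f in features}  # running sum of I(f; s) over selected s
    remaining = sorted(features)          # sorted for deterministic tie-breaking
    ranked = []
    while remaining:
        k = len(ranked)
        best = max(remaining, key=lambda f: rel[f] - (red_sum[f] / k if k else 0.0))
        ranked.append(best)
        remaining.remove(best)
        for f in remaining:  # update redundancy against the newly selected feature
            red_sum[f] += mutual_information(features[f], features[best])
    return ranked

def top_fraction(ranked, fraction=0.05):
    """Keep the top fraction of a ranked list, rounding down."""
    return ranked[: int(len(ranked) * fraction)]
```

In a toy case where `g2` duplicates `g1`, the MaxRel Table places `g2` second, whereas the mRMR Table pushes it last because it adds no new information. With 14,707 genes, `top_fraction` keeps int(735.35) = 735 genes.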
Of the 735 significant genes, 271 were up-regulated and 464 were down-regulated, corresponding to 1,619 and 2,590 protein products, respectively. The up-regulated and down-regulated genes were analyzed separately in the subsequent steps. The numbers of genes and proteins at each step are summarized in Table 1.
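The split by direction and the expansion of genes into protein products might look like the sketch below. The source does not say how direction was assigned; using the sign of the mean log2 difference (patients minus controls) is an assumption, and `gene_to_ensp` stands in for a hypothetical gene-to-ENSP annotation table.

```python
from statistics import mean

def split_by_direction(expr, patient_cols, control_cols):
    """Assumed rule: up if mean(patients) - mean(controls) > 0 on log2 data,
    down if < 0; genes with exactly zero difference are left out."""
    up, down = [], []
    for gene, values in expr.items():
        lfc = mean(values[i] for i in patient_cols) - mean(values[i] for i in control_cols)
        if lfc > 0:
            up.append(gene)
        elif lfc < 0:
            down.append(gene)
    return up, down

def to_proteins(genes, gene_to_ensp):
    """Expand genes into their protein products; one gene may map to several ENSP IDs."""
    return sorted({p for g in genes for p in gene_to_ensp.get(g, [])})
```

Because a gene can encode several proteins, the protein counts (1,619 and 2,590) exceed the gene counts (271 and 464).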
Table 1
Number of genes or proteins in each step of our computational procedures.
| | Significant differentially expressed genes by mRMR: genes | Significant differentially expressed genes by mRMR: proteins (ENSP) | Shortest path proteins | Shortest path proteins with betweenness > threshold | Final GBS-related genes |
|---|---|---|---|---|---|
| Up-regulated | 271 | 1,619 | 858 | 20 | 20 |
| Down-regulated | 464 | 2,590 | 1,273 | 24 | 23 |
PPI Data from STRING
STRING (Search Tool for the Retrieval of Interacting Genes) [16] (http://string.embl.de/) is an online database that compiles both experimental and predicted protein-protein interactions, each with a confidence score quantifying its reliability. A weighted PPI network can be retrieved from STRING, in which proteins are represented as nodes and their interactions as edges weighted by confidence scores. Proteins connected by high-confidence interactions in such a network are more likely to share biological functions than non-interacting proteins [16–18], because interacting proteins may form a complex that performs a particular function or may participate in the same pathway.
In this study, we constructed a graph G from the PPI data in STRING (version 9.0), in which proteins were represented as nodes. However, instead of the confidence score s, each interaction edge was weighted with a value d derived from s, such that a higher confidence score yields a smaller d. Thus, d can be interpreted as the distance between two proteins: the smaller the distance, the higher the interaction confidence score and the more similar their functions are likely to be.
In this graph, we analyzed every pair of proteins encoded by the significant differentially expressed genes.
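A sketch of building the distance-weighted graph is shown below. Two assumptions are labeled in the code: the input follows the STRING links-file layout (`protein1 protein2 combined_score`, IDs prefixed with the NCBI taxonomy ID 9606, scores on a 0–1000 scale), and the transform is d = 1000 − s, a common choice for such scores that this section does not confirm.

```python
def build_distance_graph(lines, max_score=1000):
    """Build an undirected graph {protein: {neighbour: d}} from STRING-style link lines.
    Assumption: d = max_score - s, so higher confidence gives a shorter distance."""
    graph = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        p1, p2, score = line.split()
        if score == "combined_score":
            continue  # skip the header row
        # Drop the taxonomy prefix, e.g. "9606.ENSP00000269305" -> "ENSP00000269305"
        p1, p2 = p1.split(".", 1)[-1], p2.split(".", 1)[-1]
        d = max_score - int(score)
        graph.setdefault(p1, {})[p2] = d
        graph.setdefault(p2, {})[p1] = d
    return graph
```

Storing each edge in both directions makes the graph undirected, which matches the symmetric interactions in STRING.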
Shortest Path Tracing
The Dijkstra algorithm is commonly used to find the shortest path between two given proteins in graph G. In this study, it was run using the R package 'igraph'. For the up-regulated genes, a shortest path was traced in the graph from each of the 1,619 proteins to each of the other 1,618 proteins. For the down-regulated genes, a shortest path was traced from each of the 2,590 proteins to each of the other 2,589 proteins.
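The path tracing can be sketched with a standard heap-based Dijkstra in plain Python. This is an illustration of the algorithm, not the igraph implementation used in the study; `all_seed_paths` is a hypothetical helper. Because G is undirected, tracing each unordered pair of seed proteins once covers both directions. When several shortest paths tie, this sketch keeps only the first one found.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths on {node: {neighbour: weight}}, weights >= 0.
    Returns distances and predecessor links for path reconstruction."""
    dist, prev, done = {source: 0}, {}, set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def shortest_path(prev, source, target):
    """Rebuild source -> target from predecessor links; [] if unreachable."""
    if target != source and target not in prev:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def all_seed_paths(graph, seeds):
    """Trace one shortest path for every unordered pair of seed proteins."""
    paths = []
    for i, s in enumerate(seeds):
        _, prev = dijkstra(graph, s)
        for t in seeds[i + 1:]:
            p = shortest_path(prev, s, t)
            if p:
                paths.append(p)
    return paths
```

For example, with edges A–B (1), B–C (1) and A–C (4), the shortest path from A to C runs through B.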
We then collected all proteins lying on these shortest paths and ranked them by betweenness. For up-regulated genes, 858 shortest path proteins were retrieved, and for down-regulated genes, 1,273, as listed in Table 1. The 858 and 1,273 shortest path proteins were ranked by betweenness separately.
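This section does not define betweenness precisely. The sketch below assumes the definition often used in this kind of analysis: the number of traced shortest paths that pass through a protein as an intermediate node (endpoints not counted). The optional `exclude` argument, e.g. for leaving out the seed proteins themselves, is likewise an assumption.

```python
from collections import Counter

def path_betweenness(paths, exclude=()):
    """Count, for each protein, how many traced paths pass through it as an
    intermediate node. Proteins in `exclude` are not counted."""
    excluded = set(exclude)
    counts = Counter()
    for path in paths:
        for node in path[1:-1]:  # skip the two endpoints
            if node not in excluded:
                counts[node] += 1
    return counts
```

Calling `.most_common()` on the result yields the proteins ranked by betweenness, highest first.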
A betweenness threshold is needed to select the most significant proteins from each ranked list. Different thresholds yield different numbers of genes: the higher the threshold, the fewer genes are selected. In practice, the top 20 to 30 genes are generally a manageable number for further analysis or experimental validation. Moreover, the thresholds should differ between up-regulated and down-regulated genes, because the numbers of proteins used for path tracing and of the resulting shortest path proteins differ between the two groups.
In this study, for the shortest path proteins from the analysis of up-regulated genes, the top 20 proteins (20 genes) with betweenness > 1,400 were selected, while for down-regulated genes, the top 24 proteins (23 genes) with betweenness > 4,000 were selected. These 20 and 23 genes were regarded as the final GBS-related genes in this study and are listed in the two parts of Table 2.
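The thresholding and the collapse from proteins to genes (24 proteins giving 23 genes) can be sketched as follows. `select_genes` and the toy IDs in the usage note are hypothetical; keeping only the highest-ranked protein of each gene is an assumption about how duplicates were merged.

```python
def select_genes(betweenness, ensp_to_gene, threshold):
    """Keep proteins with betweenness > threshold, ranked high to low, then map
    them to gene symbols, keeping each gene once at its highest-ranked position."""
    ranked = sorted(
        (p for p, b in betweenness.items() if b > threshold),
        key=lambda p: betweenness[p],
        reverse=True,
    )
    genes = []
    for p in ranked:
        g = ensp_to_gene.get(p)
        if g is not None and g not in genes:
            genes.append(g)
    return ranked, genes
```

With four toy proteins scored 5,000, 4,500, 4,200 and 3,900 and a threshold of 4,000, three proteins pass; if two of them encode the same gene, only two genes remain, mirroring the 24-to-23 reduction above.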
Table 2. The 20 GBS-related genes identified from the PPI network in the analysis of up-regulated genes.
| ENSP | Gene | Betweenness |
|---|---|---|
| ENSP00000269305 | TP53 | 13,036 |
| ENSP00000344818 | UBC | 6,065 |
| ENSP00000344456 | CTNNB1 | 4,569 |
| ENSP00000275493 | EGFR | 3,294 |
| ENSP00000263253 | EP300 | 2,807 |
| ENSP00000326366 | PSEN1 | 2,456 |
| ENSP00000417281 | MDM2 | 2,445 |
| ENSP00000270202 | AKT1 | 2,350 |
| ENSP00000221494 | SF3A2 | 2,184 |
| ENSP00000264657 | STAT3 | 2,182 |
| ENSP00000339007 | GRB2 | 2,150 |
| ENSP00000324806 | GSK3B | 2,094 |
| ENSP00000284981 | APP | 2,046 |
| ENSP00000357879 | PSMD4 | 1,754 |
| ENSP00000350941 | SRC | 1,655 |
| ENSP00000356425 | UCHL5 | 1,614 |
| ENSP00000361626 | YBX1 | 1,574 |
| ENSP00000338018 | HIF1A | 1,444 |
| ENSP00000262613 | SLC9A3R1 | 1,438 |
| ENSP00000252486 | APOE | 1,410 |
The 23 GBS-related genes identified from the PPI network in the analysis of down-regulated genes.
| ENSP | Gene | Betweenness |
|---|---|---|
| ENSP00000269305 | TP53 | 36,055 |
| ENSP00000344818 | UBC | 15,309 |
| ENSP00000275493 | EGFR | 12,072 |
| ENSP00000344456 | CTNNB1 | 12,052 |
| ENSP00000270202 | AKT1 | 10,629 |
| ENSP00000339007 | GRB2 | 10,496 |
| ENSP00000221494 | SF3A2 | 9,484 |
| ENSP00000206249 | ESR1 | 8,165 |
| ENSP00000263253 | EP300 | 7,353 |
| ENSP00000264657 | STAT3 | 6,262 |
| ENSP00000350941 | SRC | 6,111 |
| ENSP00000417281 | MDM2 | 6,002 |
| ENSP00000362649 | HDAC1 | 6,001 |
| ENSP00000348461 | RAC1 | 5,995 |
| ENSP00000329357 | SP1 | 5,560 |
| ENSP00000361626 | YBX1 | 5,343 |
| ENSP00000264033 | CBL | 5,062 |
| ENSP00000337825 | LCK | 4,852 |
| ENSP00000314458 | CDC42 | 4,798 |
| ENSP00000304903 | CD2BP2 | 4,549 |
| ENSP00000358490 | CD2 | 4,549 |
| ENSP00000324806 | GSK3B | 4,281 |
| ENSP00000046794 | LCP2 | 4,043 |