Data resources
A gene expression dataset includes 47323 genes and 60 patients, 32 of who received standard treatment and 28 of whom received standard therapy with lithium, which was obtained as a result of Affymetrix analysis, which researchers can download from the GEO website under the name GSE45484. Of these patients, four people in the control group and nine people in the treatment group responded to the treatment. These data were first analyzed by Beach et al (B). This study is designed to classify patients based on genes whose expression in peripheral blood can be early markers for response to lithium treatment in patients with BD. Although changes in peripheral blood gene expression may not be directly related to mood symptoms, differences in treatment response at the biochemical level may mask some heterogeneity in clinical response to lithium (16).
Feature Selection
The negative aspect of data generation technologies is the inclusion of irrelevant variables (17, 18). These extraneous variables decrease the model's performance, increase the complexity of the model, and reduce the understanding of appropriate relationships. Therefore, removing irrelevant variables is very important. High-dimensional data sets are often prone to the "large number of variables, small sample size" problem. This problem is explicitly addressed in PLS regression. This method is a supervised method that was developed to solve the prediction problem in multivariate problems. This study used the PLS regression and VIP index to select essential variables (19). After choosing the necessary variables to determine the homogeneous and identical patterns in the selected genes, we used the biclustering method, and the representative of each cluster was discovered based on the principal component analysis method. The classification was done based on the Bayesian Dirichlet network model with uninformed priors using these representatives and the treatment response status of the patients. Finally, the accuracy of the classification was obtained using Roc curve analysis. The general steps of feature extraction and data classification in this study are as follows (Fig. 1):
• Gene expression data preprocessing; Examining the normality and dispersion of the data.
• Selection of a set of genes containing information; Removal of neutral and unchanged genes.
• Biclustering of data: discovering gene sets with the same patterns using the plaid algorithm.
• Cluster representative selection: using principal component analysis to select a representative.
• Classification of data: using the DBNmodel.
• Evaluation and validation: using Roc curve analysis.
Statistical Analysis
We used R3.6.2 software for data analysis, and Bioconductor, Biclust, bnlearn, ropls, pROC, and Bayesian Network software packages were used.
A Bayesian network is called a directed graph (DAG) \(\varpi\) whose parameters, in the form of conditional probability, convert this structure from a qualitative state to a quantitative state. The vertices of this graph are random variables, and its directed edges represent the dependence between the vertices. The parameters in the network determine the degree of this dependence. The Bayesian network helps to learn causal relationships and is intuitively understandable due to its graphical structure. A Bayesian network consists of two parts: Bayesian structure and Bayesian probability. The structure of Bayesian networks does not have a directed circuit and Xi can be arranged so that the ancestors of Xi are in the set {X1,X2,...,X (i−1)} and its descendants are in the set {X (i+1),...,Xn }, so according to the law of total probability, the joint probability function can be written as follows:
$$P\left({X}|\varpi \right)=\prod _{i=1}^{n}P\left({X}_{i}|{X}_{1},\dots ,{X}_{i-1}\right)=\prod _{i=1}^{n}P\left({X}_{i}| \prod {X}_{i}\text{a}\text{n}\text{c}\text{e}\text{s}\text{t}\text{o}\text{r}\text{s} \right)$$
Learning the Bayesian network for data is usually done based on the Bayesian approach, and by using a score-based structure, the goal is to find a network for \(\varpi\) that has the highest posterior distribution probability \(P\left({X}|\varpi \right)\) when D is a sample from X.
$$p\left(D|\varpi \right)=\prod _{i=1}^{n}P\left({X}_{i}| \prod {X}_{i}\text{a}\text{n}\text{c}\text{e}\text{s}\text{t}\text{o}\text{r}\text{s} \right)=\prod _{i=1}^{n}\left[\int P\left({X}_{i}|\prod {X}_{i}\text{a}\text{n}\text{c}\text{e}\text{s}\text{t}\text{o}\text{r}\text{s}, {\theta }\right)P\left(\theta |\prod {X}_{i}\text{a}\text{n}\text{c}\text{e}\text{s}\text{t}\text{o}\text{r}\text{s}\right)d\theta \right]$$
If the Dirichlet conjugate prior is used, the posterior distribution is also Dirichlet and we have
$$BD\left(\varpi ,\text{D};\text{a}\right)=\prod _{i=1}^{n}BD\left({X}_{i}|\prod {X}_{i}\text{a}\text{n}\text{c}\text{e}\text{s}\text{t}\text{o}\text{r}\text{s}; {\text{a}}_{i}\right)=\prod _{i=1}^{n}\prod _{j=1}^{{q}_{i}}\left[\frac{{\Gamma }\left({a}_{ij}\right)}{{\Gamma }({a}_{ij}+{n}_{ij})}\prod _{k=1}^{{r}_{i}}\frac{{\Gamma }({a}_{ijk}+{n}_{ijk})}{{\Gamma }\left({a}_{ijk}\right)}\right]$$
that
\({r}_{i}\) number of status of \({x}_{i}\)
\({q}_{i}\) is the number of possible values of \(\prod {X}_{i}\text{a}\text{n}\text{c}\text{e}\text{s}\text{t}\text{o}\text{r}\text{s}\) that are assumed to be 1 if Xi has no ancestors
and a is Dirichlet parameter(15, 20).