BRCA1 and BRCA2 are tumor suppressor genes involved in a considerable number of
biological processes that regulate the cell replication cycle. A mutation in one of
these two genes has a significant probability of causing cancer. We have set up within
the platform a machine learning algorithm based on random forests to predict
pathogenicity in colorectal cancer, melanoma, lung cancer, and glioma. However, this
algorithm has shown its limits when predicting on more complex genes such as BRCA1
and BRCA2. To help biologists classify tumors, we decided to develop a deep learning
algorithm.
The first question that arises when constructing a neural network is how many hidden
layers and neurons to use. While the number of inputs and outputs is determined by the
problem to be solved, the number of hidden layers and neurons is difficult to choose
because there is no pre-established rule. The number of hidden layers, and of neurons
in each layer, influences the predictive performance of the network. Several methods
exist for finding the optimal architecture, such as grid search or empirical equations. All
these techniques can be very time-consuming. In this paper, we present two packages
that we have developed, one based on a genetic algorithm (GA) and one on particle swarm
optimization (PSO), to optimize the architecture of the neural network for predicting
the pathogenicity of variants in the BRCA1 and BRCA2 genes. We compare the results
obtained by the two algorithms. We used datasets collected from our NGS analysis of
BRCA1 and BRCA2 genes to train deep learning models. The resulting collection
comprises 11,875 BRCA1 and BRCA2 variants (BRCA1: 2,632 benign and 2,660
pathogenic; BRCA2: 3,446 benign and 3,137 pathogenic). Our preliminary
results show that PSO found the best-performing architecture, in terms of the number of
hidden layers and neurons, compared with grid search and GA. The optimal architecture
found by the PSO algorithm comprises 6 hidden layers with 275 hidden nodes and achieves
an accuracy of 0.98, a precision of 0.99, a recall of 0.98, and a specificity of 0.99.
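
To illustrate how particle swarm optimization can search over the number of hidden layers and neurons, the following is a minimal Python sketch. It is not the package described above. The search bounds, the swarm hyperparameters (w, c1, c2), and the toy_fitness function are all assumptions made for illustration; in a real setting the fitness would be the validation score of a network trained with the candidate architecture.

```python
import random

# Hypothetical search bounds (not from the paper):
# 1-10 hidden layers and 8-512 hidden neurons.
BOUNDS = [(1, 10), (8, 512)]


def toy_fitness(layers, neurons):
    """Stand-in for validation accuracy; peaks at an arbitrary (4, 128)."""
    return 1.0 - ((layers - 4) / 10) ** 2 - ((neurons - 128) / 512) ** 2


def decode(position):
    """Round a continuous particle position to an integer architecture in bounds."""
    return tuple(
        int(min(max(round(x), lo), hi)) for x, (lo, hi) in zip(position, BOUNDS)
    )


def pso(fitness, n_particles=12, n_iter=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over BOUNDS with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[0.0] * len(BOUNDS) for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(*decode(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(BOUNDS):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fit = fitness(*decode(pos[i]))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest), gbest_fit


best_arch, best_score = pso(toy_fitness)
print(best_arch, best_score)
```

Positions are kept continuous and rounded only when the fitness is evaluated, which is one common way to apply PSO to integer hyperparameters; other encodings are possible.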