Machine learning techniques are becoming increasingly popular. They apply to a wide variety of fields, from image processing to natural language processing, including computer security and the video game industry [2]. One of the main purposes of applying machine learning techniques is prediction. Any machine learning process can be subdivided into the following steps: data preprocessing, model design and training, and model evaluation. The preprocessing phase often consists of choosing the most informative variables from the set of input vectors and normalizing them if necessary; this step is also called feature selection or feature engineering. The purpose of selecting informative variables is to eliminate nonessential variables, which the model may otherwise treat as noise, reducing its prediction accuracy. Several studies in recent decades have proposed either algorithms or criteria for the selection of informative variables.
In [6], the authors propose a technique for extracting informative variables based on an ontology of the domain under study. The first phase of this technique consists of a preliminary extraction, by the domain expert, of the variables he deems useful, with no restriction other than that the chosen variables must be objects or attributes of the ontology. The second phase, also described in [6], consists of aggregating and filtering the variables preselected during the first phase. Mathematically, the procedure is as follows: let Ω = {ω1, …, ωn} be the set of class labels, X = {X1, …, Xn} the set of variable identifiers, xis a value taken by Xi, ℵi the definition domain of Xi (xis ∈ ℵi), and Σ the set of learning vectors.
For a value xis of Xi (xis ∈ ℵi) and a class ωk, an aggregate ℵi(ωk) ⊆ ℵi is defined as follows:
xis ∈ ℵi(ωk) if and only if ∀ωv ∈ Ω, v ≠ k : p(ωk|xis) > p(ωv|xis) + Δ, (1)
where Δ is a positive real number defining the dominance threshold.
At the end of phase 2, the authors introduce the unitary predicates Bi(ωk), which take the value 'true' (1) if and only if xis ∈ ℵi(ωk). The final results of the second phase are therefore the aggregates ℵi(ωk) and the unitary predicates Bi(ωk), for i ∈ I(ωk), where I(ωk) is the subset of indices of the variables Xi that passed the test of inequality (1).
During phase 3, a probabilistic approach is introduced to determine cause-and-effect dependencies in the conjunction of the predicates Bi(ωk), i ∈ I(ωk), with ωj ∈ Ω = {ω1, …, ωn}.
Let {ℵi(ωj)}, i ∈ I(ωk), j = 1, …, m, be the set of aggregates, with p(ℵi(ωj)) = |ℵi(ωj)| / |ℵi|, where |·| denotes the cardinality of the corresponding set. It is then obvious that p(Bi(ωj)) = p(ℵi(ωj)).
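As an illustration, the aggregate construction of inequality (1) can be sketched in Python for a single categorical variable. The estimator of p(ωk|xis) from co-occurrence counts and the example data are our assumptions, not taken from [6]:

```python
from collections import Counter, defaultdict

def aggregates(samples, labels, delta=0.2):
    """Build the aggregates of inequality (1) for one categorical variable.
    samples: the values xis taken by the variable; labels: the class of each sample.
    Returns {class: set of values dominated by that class by margin delta}."""
    count_x = Counter(samples)                 # occurrences of each value xis
    count_xy = Counter(zip(samples, labels))   # co-occurrences (value, class)
    classes = set(labels)
    agg = defaultdict(set)
    for x in count_x:
        # empirical posterior p(ω | x) for every class
        post = {w: count_xy[(x, w)] / count_x[x] for w in classes}
        for wk in classes:
            # x enters the aggregate of ωk iff ωk dominates every other class by Δ
            if all(post[wk] > post[wv] + delta for wv in classes if wv != wk):
                agg[wk].add(x)
    return dict(agg)

xs = ["a", "a", "a", "b", "b", "c", "c", "c", "c"]
ys = ["w1", "w1", "w2", "w2", "w2", "w1", "w1", "w2", "w1"]
print(aggregates(xs, ys, delta=0.2))
```

Here the value "a" falls into the aggregate of w1 because p(w1|a) = 2/3 exceeds p(w2|a) + Δ = 1/3 + 0.2.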
In the following, three filters are introduced to select the predicates.
The authors of [7], in Chapter 6, "Dimensionality reduction", discuss the PCA (Principal Component Analysis [8]) algorithm, which is based on the analysis of linear dependence. The idea is to replace redundant variables with new ones that summarize the information contained in the original vector space. One of the first concepts to appear is the SVD (Singular Value Decomposition) [9]. Let X be a matrix of dimension n × d, n being the number of rows and d the dimension of the initial vector space to be reduced. Let x be a column vector (the transpose of one of the rows of X) and let v be one of the new vector entities (principal components) we are looking for. According to the SVD theorem, the matrix X can be decomposed as X = UΣVᵀ, U and V being orthogonal matrices such that UᵀU = I and VᵀV = I, and Σ a diagonal matrix containing the singular values of X.
The authors then develop the main stages of PCA [8]. The first step is to center the data in the initial vector space. The second step is the linear projection of the initial data vectors x onto the new vectors v. The third step is to maximize the variance of the projected coordinates. The equation of the coordinate vector after projection is z = Xv.
The objective function for the principal components is max_w wᵀXᵀXw, subject to wᵀw = 1. It turns out that the optimal w is the eigenvector of XᵀX associated with its largest eigenvalue.
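The three stages above can be sketched with NumPy; the toy data and the choice of keeping two components are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # toy data with correlated columns

# Step 1: center the data in the initial vector space
Xc = X - X.mean(axis=0)

# SVD theorem: Xc = U Σ Vᵀ, with UᵀU = I and VᵀV = I
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T  # columns of V are the principal directions v (eigenvectors of XᵀX)

# Step 2: linear projection z = Xv onto the first two principal components
Z = Xc @ V[:, :2]

# Step 3: the variance captured by each component, proportional to σ²
explained_variance = S**2 / (Xc.shape[0] - 1)
```

Since the singular values are returned in decreasing order, the first columns of V maximize the variance of the projected coordinates.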
In [10], the authors rely on Bayes' theorem,
P(C|F) = P(F|C) P(C) / P(F),
and on information theory [11], in particular the notion of self-information I(xi) = −log2 P(xi), to propose an algorithm intended to improve the forecasting accuracy of the NBC (Naive Bayesian Classifier) model [12].
The algorithm is divided into two stages: the first phase computes the weight of each variable, WFi = −log2 P(C|Fi); the second phase selects the variables whose weight WFi exceeds a threshold δ set by the user.
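A minimal sketch of the two phases follows. Since [10] does not spell out the estimator here, we assume P(C|Fi) is estimated as the average, over the samples, of the empirical posterior of each sample's class given its value of feature Fi; the example data are also assumed:

```python
import math
from collections import Counter

def feature_weights(rows, labels):
    """Phase 1: weight each variable by W_Fi = -log2 P(C|Fi).
    P(C|Fi) is estimated (our assumption) as the mean empirical posterior
    of each sample's own class given its value of feature i."""
    n = len(labels)
    weights = []
    for i in range(len(rows[0])):
        value_counts = Counter(r[i] for r in rows)
        pair_counts = Counter((r[i], c) for r, c in zip(rows, labels))
        p = sum(pair_counts[(r[i], c)] / value_counts[r[i]]
                for r, c in zip(rows, labels)) / n
        weights.append(-math.log2(p))
    return weights

def select_features(rows, labels, delta):
    """Phase 2: keep the indices whose weight exceeds the user threshold δ."""
    return [i for i, w in enumerate(feature_weights(rows, labels)) if w > delta]

# feature 0 determines the class exactly, so P(C|F0) = 1 and its weight is 0;
# feature 1 is noisier, so its weight is strictly positive
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "x")]
labels = ["w1", "w1", "w2", "w2"]
print(feature_weights(rows, labels))
```

The thresholding of phase 2 is then a one-line filter over these weights.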
A new method called LFE (Learning Feature Engineering) is proposed in [13]. At the heart of LFE [14] is a set of multilayer perceptron classifiers, each perceptron corresponding to one transformation. LFE takes a dataset as input and recommends a set of paradigms for reconstructing a subset of the informative variables of the initial dataset. Each paradigm is made up of a transformation and the ordered list of variables for which that transformation was most effective.
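A rough sketch of the idea (not the implementation of [13]): one binary MLP per candidate transformation, trained on a fixed-size representation of each feature and predicting whether the transformation would help. The quantile-sketch representation, the synthetic data, and all names below are our assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

def quantile_sketch(values, bins=20):
    # fixed-size representation of a numeric feature (a simplifying assumption)
    return np.quantile(values, np.linspace(0, 1, bins))

rng = np.random.default_rng(0)
# synthetic training set for the 'log' transformation's classifier:
# label 1 = skewed features (log would help), label 0 = roughly symmetric ones
skewed = [quantile_sketch(rng.lognormal(size=200)) for _ in range(50)]
symmetric = [quantile_sketch(rng.normal(size=200)) for _ in range(50)]
X = np.vstack(skewed + symmetric)
y = np.array([1] * 50 + [0] * 50)

# the per-transformation classifier: a small multilayer perceptron
log_recommender = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
).fit(X, y)

# recommend (or not) the 'log' transformation for an unseen feature
new_feature = rng.lognormal(size=200)
recommend_log = log_recommender.predict([quantile_sketch(new_feature)])[0]
```

In the full method, one such classifier exists per transformation, and the recommended paradigms are assembled from the classifiers' rankings.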
In the following, we propose a model based on the machine learning methods mentioned above to predict the number of daily COVID-19 cases in Morocco, using Spark ML [3], the library dedicated to distributed learning methods.