To evaluate the dataset, several traditional machine learning and neural network models were developed. The experimental settings and details of these models are explained in this section.
4.1 Machine Learning Classifiers
We assessed the effectiveness of seven traditional learning classifiers, namely Decision Tree (DT), Passive Aggressive (PA), Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), AdaBoost (AB) and Naive Bayes (NB) [22][23]. The details of these approaches are given below.
1) Decision Tree Classifier
It is a well-known machine learning classifier that follows a tree-like structure to perform detection. We used a decision tree algorithm trained on a labeled fake and real news dataset, where the features and their corresponding labels are used to build the tree. Decision tree classifiers have the advantage of being easy to interpret and explain, which can be important in applications such as fake news detection where transparency and accountability matter [24].
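The following is a minimal sketch of this setup, assuming TF-IDF features and scikit-learn's DecisionTreeClassifier; the toy articles, label convention (1 = fake, 0 = real) and hyper-parameters are illustrative assumptions rather than the paper's exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

# Toy labeled articles; 1 = fake, 0 = real (assumed label convention).
texts = ["breaking: miracle cure found overnight",
         "parliament passes new budget bill",
         "celebrity secretly replaced by clone",
         "city council approves road repairs"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()                  # turn raw text into numeric features
X = vectorizer.fit_transform(texts)

clf = DecisionTreeClassifier(random_state=42)   # tree built from features and labels
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["miracle budget clone"])))
```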
2) Naïve Bayes Classifier
It is a commonly used algorithm for text classification tasks. For misinformation identification, the Naive Bayes classifier works by analyzing the text of news articles and determining the probability that a given article is authentic or false based on the occurrence of certain words or phrases [25].
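A small sketch of this idea is given below, assuming word-count features and scikit-learn's MultinomialNB; the toy articles and label convention are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["shocking secret they hide from you",
               "official report released today",
               "you won't believe this one trick",
               "council approves annual budget"]
train_labels = [1, 0, 1, 0]   # 1 = fake, 0 = real (assumed convention)

vec = CountVectorizer()
X = vec.fit_transform(train_texts)            # word-occurrence counts as features
nb = MultinomialNB().fit(X, train_labels)

# Per-class probabilities for a new article, derived from word occurrences
print(nb.predict_proba(vec.transform(["shocking official trick"])))
```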
3) Logistic Regression
Logistic regression is a supervised learning algorithm used for binary classification problems. This classifier works by modeling the likelihood of the output variable (e.g., fake or real) given the input features. The output of the classifier is a probability score between 0 and 1, with scores near 0 suggesting a high likelihood of the news article being real and scores near 1 suggesting a high likelihood of the news article being fake.
4) Random Forest
A Random Forest [24] is a supervised learning classifier used for both classification and regression tasks. In simple terms, a Random Forest classifier works by creating multiple decision trees, each trained on a different random news sample, and then merging the outcomes of these trees to produce the final output. This technique increases the model's accuracy while reducing the effects of overfitting.
5) SVM Classifier
Finding a hyperplane that most effectively separates the data points into different groups is the basic goal of the Support Vector Machine (SVM). The hyperplane is selected so that the margin, which is the distance between the hyperplane and the closest data points, is maximized. By applying a kernel function to map the samples into a higher-dimensional space, SVM is capable of handling both linear and non-linear classification problems.
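As a brief illustration of the kernel choice (not the paper's exact configuration), the sketch below fits scikit-learn's SVC with a linear and an RBF kernel on synthetic data standing in for vectorized articles.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic features standing in for vectorized news articles
X, y = make_classification(n_samples=60, n_features=8, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)   # maximum-margin linear hyperplane
rbf_svm = SVC(kernel="rbf").fit(X, y)         # kernel maps samples to a higher-dimensional space

print(linear_svm.score(X, y), rbf_svm.score(X, y))
```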
6) Passive Aggressive Classifier
A Passive Aggressive (PA) classifier can be used for fake news identification by training the classifier on a labeled news article dataset, in which each article is categorized as either real or fake. The algorithm then learns to differentiate between real and fake news based on attributes such as the language used, the sources cited, and the structure of the article. During training, the algorithm uses a passive aggressive update strategy to adjust its weights and biases to better classify each article.
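A minimal sketch of this online update strategy is shown below, assuming hashed text features and scikit-learn's PassiveAggressiveClassifier; the batches and label convention are placeholders.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

vec = HashingVectorizer(n_features=2**10)
pa = PassiveAggressiveClassifier(random_state=42)

# Mini-batches of labeled articles; 1 = fake, 0 = real (assumed convention)
batches = [(["unverified claim goes viral", "minister opens new hospital"], [1, 0]),
           (["aliens endorse local candidate", "court publishes final ruling"], [1, 0])]

for texts, labels in batches:
    # partial_fit applies the passive-aggressive update on each batch:
    # no change for correct predictions, an aggressive weight correction on mistakes
    pa.partial_fit(vec.transform(texts), labels, classes=[0, 1])

print(pa.predict(vec.transform(["unverified viral claim"])))
```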
7) AdaBoost
AdaBoost is an ensemble learning algorithm that merges various base classifiers to build a meta-classifier. It gives higher weights to the examples that were misclassified by the previous weak classifiers, so it focuses on the examples that are difficult to classify.
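The short sketch below illustrates this boosting behaviour with scikit-learn's AdaBoostClassifier on synthetic data; the data and number of estimators are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic features standing in for vectorized articles
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The default weak learner is a one-level decision tree (a "decision stump");
# each boosting round up-weights the examples earlier stumps misclassified.
ab = AdaBoostClassifier(n_estimators=50, random_state=0)
ab.fit(X, y)
print(ab.score(X, y))
```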
4.2 Deep Learning Classifiers
A few neural networks, namely the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Bi-LSTM, are implemented in this research. The models are built using the Adam optimizer with a binary cross-entropy loss function and a learning rate of 0.001. The final output layer uses the sigmoid activation function. These neural networks are trained for 10 epochs with a batch size of 64.
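The following Keras sketch reflects this shared training configuration (Adam with learning rate 0.001, binary cross-entropy, sigmoid output, 10 epochs, batch size 64); the vocabulary size, sequence length, layer sizes, and placeholder data are assumptions made only to keep the example self-contained.

```python
import numpy as np
from tensorflow.keras import layers, models, optimizers

vocab_size, max_len = 5000, 100   # assumed vocabulary size and sequence length

model = models.Sequential([
    layers.Embedding(vocab_size, 64),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),       # sigmoid output layer
])
model.compile(optimizer=optimizers.Adam(learning_rate=0.001),   # Adam, lr = 0.001
              loss="binary_crossentropy",                       # binary cross-entropy loss
              metrics=["accuracy"])

X = np.random.randint(0, vocab_size, size=(256, max_len))   # placeholder token ids
y = np.random.randint(0, 2, size=(256,))                    # placeholder labels
model.fit(X, y, epochs=10, batch_size=64, verbose=0)        # 10 epochs, batch size 64
```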
1) RNN
A recurrent neural network (RNN) is a type of neural network designed to handle sequential data such as time series or natural language. Unlike feedforward neural networks, which process each input independently, RNNs maintain an internal state that allows them to process sequences of inputs and capture temporal dependencies.
Layers used in the RNN model are listed below; a sketch follows the list.
- Embedding Layer: Pass each token through an embedding layer to convert it into a dense vector representation. This can be a pre-trained embedding or an embedding layer that learns the embeddings during training.
- Recurrent Layer: Pass the sequence of embeddings through a recurrent layer (e.g., LSTM, GRU) that processes the sequence and captures the interrelationships between words.
- Attention Layer: Add an attention mechanism on top of the recurrent layer to highlight important parts of the input sequence. This can strengthen the model's ability to recognize the cues that distinguish false from authentic news.
- Dense Layers: Pass the final hidden state of the recurrent layer through one or more dense layers to make a prediction. The output can be a binary classification (fake or real) or a probability score indicating the likelihood of the input being fake news.
- Training: Train the neural network on a labeled news dataset, optimizing a suitable loss function such as binary cross-entropy.
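A hedged Keras sketch of this layer stack (embedding, recurrent layer, attention, dense output) is shown below; the use of GRU for the recurrent layer, the simple self-attention formulation, and all layer sizes are assumptions, as the paper does not fix these choices.

```python
from tensorflow.keras import layers, models

vocab_size, max_len = 5000, 100   # assumed sizes

inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, 64)(inputs)         # embedding layer: dense token vectors
x = layers.GRU(64, return_sequences=True)(x)         # recurrent layer over the sequence
x = layers.Attention()([x, x])                       # simple self-attention over hidden states
x = layers.GlobalAveragePooling1D()(x)               # pool the attended sequence
outputs = layers.Dense(1, activation="sigmoid")(x)   # dense layer: fake/real probability

rnn_model = models.Model(inputs, outputs)
rnn_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
rnn_model.summary()
```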
2) LSTM
The LSTM architecture was created to tackle the issue of vanishing gradients in traditional RNNs, which can make it difficult to train these networks on long sequences of data. The key innovation of the LSTM is the addition of memory cells, which can store information over long periods of time and selectively forget or remember information as needed.
Our strategy is to train an LSTM model on the developed dataset of news articles, both real and fake, and to use this model to classify new articles as real or fake. The LSTM model takes in the text of the article and produces a probability score indicating the likelihood that the article is authentic or not.
The dropout unit receives the output of the embedding layer and performs calculations as shown in Fig. 11. The sigmoid activation function was assigned to the main output layer. The model was trained for 10 epochs with a batch size of 64. This model's final (average) accuracy is 91.8%.
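A minimal sketch consistent with this description (embedding output fed to a dropout unit, then an LSTM, then a sigmoid output layer) is given below; the dropout rate and hidden size are assumptions.

```python
from tensorflow.keras import layers, models

lstm_model = models.Sequential([
    layers.Embedding(5000, 64),             # embedding layer
    layers.Dropout(0.2),                    # dropout applied to the embedding output
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),  # sigmoid output layer
])
lstm_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```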
3) BiLSTM
A Bi-LSTM model is a subtype of the recurrent neural network (RNN) that employs two LSTM layers, one of which analyses the input sequence in the forward direction and the other in the backward direction. The outputs of the two LSTM layers are concatenated to produce a final output that takes into account both the previous and future context of the input sequence.
This model has been found to be effective at detecting fake news due to its ability to capture both previous and future context. The Bi-LSTM unit receives the output of the embedding layer and performs calculations as shown in Fig. 12. The sigmoid activation function was assigned to the main output layer. The model was trained for 10 epochs with a batch size of 64. This model's final (average) accuracy is 89.9%.
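A short sketch of this variant follows: Keras' Bidirectional wrapper runs one LSTM forward and one backward over the sequence and concatenates their outputs. Layer sizes are assumptions.

```python
from tensorflow.keras import layers, models

bilstm_model = models.Sequential([
    layers.Embedding(5000, 64),
    layers.Bidirectional(layers.LSTM(64)),   # forward + backward LSTMs, outputs concatenated
    layers.Dense(1, activation="sigmoid"),
])
bilstm_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```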
4.3 Ensemble Learning Approach
To create the best possible detection mechanism, we construct an ensemble approach that integrates multiple learning classifiers or models. The resulting detection model is called a meta-model. When compared to the base learners alone, this meta-model performs more effectively. The primary types of ensemble methods are bagging, boosting, and stacking. For both the machine learning and deep learning classifiers, we have adopted the ensemble stacking approach.
Ensemble learning approaches can be effective for the detection of false news, as they combine the predictions of multiple classifiers to improve accuracy and reduce overfitting.
4.4 Stacking Approach
In this approach, a classifier known as the meta-classifier receives the final outcomes of the base classifiers as input and seeks to discover the best way to combine these results to produce an improved output.
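The sketch below illustrates this stacking idea with scikit-learn's StackingClassifier; the particular base learners, the logistic-regression meta-classifier, and the synthetic data are illustrative assumptions rather than the paper's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic features standing in for vectorized articles
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(),   # meta-classifier combining base outputs
)
stack.fit(X, y)
print(stack.score(X, y))
```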
1) Ensemble Machine Learning Approach
We trained seven traditional machine learning algorithms, namely Decision Tree (DT), Passive Aggressive (PA), Logistic Regression (LR), Support Vector Machine (SVM), AdaBoost (AB), Random Forest (RF) and Naive Bayes [22][23][25], on our developed news dataset and examined their performance as shown in Fig. 13. Each classifier was implemented using the scikit-learn Python library.
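In the spirit of this comparison, the loop below fits and scores the seven scikit-learn classifiers on synthetic data; the feature matrix and hyper-parameters are assumptions and do not reproduce the paper's exact settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "PA": PassiveAggressiveClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "AB": AdaBoostClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "NB": GaussianNB(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))   # per-classifier accuracy on held-out data
```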
2) Ensemble Deep Learning Approach
We combined three neural networks and trained them on the developed dataset as shown in Fig. 14.
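One way to stack the three networks, shown as a hedged sketch below, is to use their predicted probabilities on held-out articles as input features for a logistic-regression meta-classifier; the placeholder probabilities and labels stand in for real network outputs, and the exact combination used in the paper may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder probability outputs of the three trained networks on validation articles
rnn_probs = rng.random(200)
lstm_probs = rng.random(200)
bilstm_probs = rng.random(200)
y_val = rng.integers(0, 2, size=200)   # placeholder validation labels

# Stack the three networks' predictions as features for a meta-classifier
meta_features = np.column_stack([rnn_probs, lstm_probs, bilstm_probs])
meta_clf = LogisticRegression().fit(meta_features, y_val)
print(meta_clf.predict(meta_features[:5]))
```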