Artificial Neural Networks
The neurons in the human body served as the inspiration for the idea underlying the ANN. Through the synaptic connections at the end of the axon, these linked neurons transmit information from one cell body to another. The neural network system of the human body is extremely intricate and densely coupled, since each neuron has hundreds of connections (Graupe, 2007). Warren McCulloch and Walter Pitts created a condensed mathematical model of biological neurons in the 1940s and demonstrated that, in theory, such models are capable of computing any arithmetic or logical function (McCulloch & Pitts, 1943). They are usually regarded as pioneers of the field of neural networks, since their idea inspired further development and led to the perceptron algorithm of Rosenblatt (1957). Rosenblatt demonstrated that a perceptron can classify inputs provided the classes are linearly separable. Marvin Minsky and Seymour Papert's book "Perceptrons" identified a limitation of these single-layer perceptrons (Minsky and Papert, 1969): they demonstrated that single-layer perceptrons are incapable of handling the simple "exclusive or" (XOR) problem. The acceptance and usage of neural networks began to decline as a result of these criticisms by Minsky and Papert. Neural networks did not take off again until Hopfield's work and the development of new learning algorithms such as back-propagation. Since this resurgence, neural network research and applications have grown tremendously. As previously stated, artificial neural networks imitate the biological nervous system. They are made up of several interconnected elements working together. These neurons receive inputs from other sources, perform a generally nonlinear operation on them, and then output the results.
A typical artificial neural network (ANN) is made up of an input layer, one or more hidden layers, and an output layer. The leftmost layer is known as the input layer, and the neurons that make up this layer are known as input neurons. The middle layer(s) are known as hidden layers, while the rightmost layer is the output layer, containing the output neuron(s). Figure 1 shows the typical ANN architecture for porosity and permeability prediction.
It is often assumed that a large dataset will significantly affect the accuracy of a machine learning model. However, Bailly et al. (2022), in an experiment designed to understand the effect of large and small datasets on the performance of a model, found that dataset size does not significantly affect a machine learning model's performance.
The neural network used in this study is a multi-layer neural network model, which contains more than one layer of artificial neurons (nodes). The first step in constructing the model is to import all relevant packages. In our case, TensorFlow, Keras, scikit-learn, pandas, NumPy, and matplotlib were the packages needed, together with the Sequential model, Dense layers, and activation functions from Keras. The train_test_split function was also imported from scikit-learn to split the data into training and test datasets.
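The snippet below is a minimal sketch of these imports and a two-hidden-layer Sequential model; the feature count and layer sizes are illustrative assumptions, not the study's exact configuration.

```python
# Minimal sketch of the imports and model skeleton described above.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 5  # assumed number of well-log input curves

model = Sequential([
    Dense(50, activation="relu", input_shape=(n_features,)),  # hidden layer 1
    Dense(50, activation="sigmoid"),                          # hidden layer 2
    Dense(1),                                                 # porosity output
])
model.compile(optimizer="adam", loss="mse", metrics=["mae", "mse", "mape"])
```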
XGBoost
XGBoost is a machine learning algorithm known for its efficiency and accuracy in solving regression and classification problems. It has gained popularity for its ability to combine multiple weak learners (such as decision trees) into a robust, high-performing model, and it has been successfully applied in various fields, including finance, natural language processing, and image recognition. Recently, its application in the petroleum industry for predicting reservoir rock properties has shown promising results.
XGBoost's objective function is used to optimize the model during training and comprises two components: the loss function and the regularization term. The XGBoost update step is responsible for creating new trees to be added to the ensemble: it calculates the gradients and Hessians of the loss function (Eq. 1) and then constructs a new tree to correct the residuals of the previous ensemble (Wei et al., 2022).
$$\mathcal{L}(y_i, \hat{y}_i) = \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \gamma T + \frac{\lambda}{2} \sum_{j=1}^{T} w_j^2 \qquad \text{(Eq. 1)}$$

where $n$ is the number of samples, $T$ the number of leaves in the tree, $w_j$ the leaf weights, and $\gamma$ and $\lambda$ the regularization parameters.
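For the squared-error loss in Eq. 1, the gradients and Hessians consumed by the update step have simple closed forms. The sketch below shows that calculation (the math XGBoost implements internally, not the library's actual code; the regularization terms enter later, when leaf weights are solved for):

```python
import numpy as np

def squared_error_grad_hess(y_true, y_pred):
    """Gradient and Hessian of 0.5 * (y - y_hat)^2 w.r.t. the prediction.
    These are the per-sample quantities XGBoost uses to score candidate
    splits and set leaf weights; gamma*T and (lambda/2)*sum(w^2) from
    Eq. 1 are applied during tree construction, not here."""
    grad = y_pred - y_true       # first derivative of the loss
    hess = np.ones_like(y_pred)  # second derivative is constant 1
    return grad, hess
```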
XGBoost can be applied to various tasks related to reservoir rock properties, including predicting porosity, permeability, lithology, and fluid saturation. The application process involves the following steps:
Predicting with XGBoost
Data Collection and Preprocessing
Gather well-log data, core samples, and other relevant geological information from the reservoir. Preprocess the data to handle missing values, perform feature engineering, and create the target variables for regression tasks.
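A minimal preprocessing sketch, assuming a pandas DataFrame with hypothetical log mnemonics (GR, RES, RHOB, DT, NPHI), a core-porosity target column (Por_insitu), and an assumed file name:

```python
import pandas as pd

# Hypothetical column and file names; substitute the actual log mnemonics.
logs = ["GR", "RES", "RHOB", "DT", "NPHI"]
target = "Por_insitu"

df = pd.read_csv("well_logs.csv")
df = df.dropna(subset=logs + [target])  # drop rows with missing readings
# Alternative imputation: df[logs] = df[logs].fillna(df[logs].median())

X, y = df[logs], df[target]
```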
Model Training
Utilize the XGBoost algorithm to train the model on the prepared dataset. During training, XGBoost will create an ensemble of decision trees by minimizing the objective function. Adjust the complexity parameter and regularization term to control tree growth and avoid overfitting.
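A minimal training sketch using the xgboost scikit-learn wrapper; the hyperparameter values are illustrative rather than the study's tuned settings, and X/y follow the preprocessing sketch above:

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# max_depth, gamma, and reg_lambda map to the tree-growth and
# regularization controls in Eq. 1.
model = XGBRegressor(
    n_estimators=100,  # number of boosted trees
    max_depth=4,       # limits tree complexity
    gamma=0.1,         # minimum loss reduction to split (the gamma * T term)
    reg_lambda=1.0,    # L2 penalty on leaf weights (the lambda/2 term)
)
model.fit(X_train, y_train)
```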
Model Evaluation
Assess the model's performance using metrics such as mean absolute error, mean squared error, or R-squared. Employ cross-validation techniques to ensure the model's generalizability.
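A sketch of this evaluation step with scikit-learn's metrics and 10-fold cross-validation; variable names follow the earlier sketches:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 :", r2_score(y_test, y_pred))

# 10-fold cross-validation as a check on generalizability
cv_scores = cross_val_score(model, X, y, cv=10, scoring="r2")
print("CV R2: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```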
Reservoir Properties Prediction
Once the model is trained and evaluated, it can predict reservoir rock properties for new data points or locations without available measurements.
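Prediction at unmeasured locations then reduces to a single call; X_new below is a hypothetical matrix of preprocessed log readings from depths lacking core measurements:

```python
# X_new: preprocessed log readings at depths without core measurements
porosity_pred = model.predict(X_new)
```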
Porosity Prediction from ANN (Artificial Neural Networks)
The total data available for training and testing comprise 299 rows and 6 columns. The train_test_split function from sklearn enabled us to split the data into a 70% training set and a 30% test set (209 rows and 6 columns for the training dataset and 90 rows and 6 columns for the test dataset). The test dataset was further divided into a 50% validation dataset (45 rows and 6 columns) and a 50% test dataset (45 rows and 6 columns).
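A sketch of this two-stage split, assuming X holds the input curves and y the porosity target; the random_state value is an assumption for reproducibility:

```python
from sklearn.model_selection import train_test_split

# 70% train / 30% test, then split the 30% in half for validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42)
```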
A scaling step was performed to bring all inputs of our data to a common scale without distorting the differences between the ranges of the values. The transformation maps every feature to values between 0 and 1 (strictly speaking, min-max normalization rather than z-score standardization), a range the model handles readily (Fig. 2).
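Mapping features to the 0–1 range corresponds to scikit-learn's MinMaxScaler (a StandardScaler would instead give zero mean and unit variance). A minimal sketch, fitting on the training set only to avoid leakage:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                    # rescales each feature to [0, 1]
X_train_s = scaler.fit_transform(X_train)  # learn min/max from training data
X_val_s = scaler.transform(X_val)          # reuse the training statistics
X_test_s = scaler.transform(X_test)
```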
Training of the ANN began with two (2) hidden layers, using "relu" and "sigmoid" as activation functions and layer widths of (50, 50), which produced a total of 3,211 parameters. This was followed by three (3) hidden layers, with "relu" as the activation function for all three and layer widths of (32, 64, 128), producing 11,809 parameters. A network with five (5) hidden layers was also constructed, with "relu" activations and a constant width of 64 units, producing 19,009 parameters, followed by seven (7) hidden layers with "tanh" activations and a constant width of 64 units per layer, producing 27,329 parameters. Finally, a network with ten (10) hidden layers was constructed, with "tanh" activations in each layer and a width of 64 units, producing 39,809 parameters. A checkpoint callback was used in each model to save the best-performing weights at intervals during training. Each experiment was run for 1,000 epochs with a batch size of 32 to reduce training time, and the results show that the "tanh" model with 10 hidden layers is the best model for predicting porosity. The performance metrics (loss, MAE, MSE, MAPE) for each epoch show a significant decrease in error as the number of epochs increases. For the training data, the mean absolute error (MAE) was reduced to 1.87 and the mean squared error (MSE) to 6.56; the test dataset has an MAE of 4.08 and an MSE of 29.89, and the validation data has an MAE of 2.90 and an MSE of 13.93 (Fig. 3). The performance-metric plots for the training and validation datasets are shown in Fig. 4. This model was tested on a dataset that had never been exposed to it, and the results in Table 1 were extracted. Figure 5 shows the model-predicted porosity plotted alongside the core data, and the two show a good trend.
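A sketch of the best configuration described above (ten hidden layers, "tanh" activations, 64 units each, 1,000 epochs, batch size 32) with a checkpoint callback; the monitored quantity and file name are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint

model = Sequential(
    [Dense(64, activation="tanh", input_shape=(X_train_s.shape[1],))]
    + [Dense(64, activation="tanh") for _ in range(9)]  # 10 hidden layers total
    + [Dense(1)]                                        # porosity output
)
model.compile(optimizer="adam", loss="mse", metrics=["mae", "mse", "mape"])

# Save the weights of the best epoch seen so far (file name assumed).
checkpoint = ModelCheckpoint("best_model.keras", monitor="val_loss",
                             save_best_only=True)
model.fit(X_train_s, y_train, validation_data=(X_val_s, y_val),
          epochs=1000, batch_size=32, callbacks=[checkpoint])
```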
Table 1
ANN-predicted porosity vs core porosity data
| ANN_Pred | Por_insitu |
|----------|------------|
| 34.70487 | 36.06 |
| 34.616154 | 35.40 |
| 33.982235 | 34.83 |
| 33.095074 | 33.93 |
| 34.042873 | 35.06 |
| 33.80644 | 35.14 |
| 34.24234 | 34.21 |
| 32.945667 | 33.81 |
| 33.68787 | 33.55 |
| 34.491383 | 33.24 |
| 32.447056 | 33.44 |
| 26.60241 | 22.13 |
| 26.593842 | 20.68 |
| 27.017372 | 19.87 |
| 26.867367 | 30.36 |
| 26.699024 | 26.67 |
| 26.84349 | 28.82 |
XGBoost Porosity Prediction Report
In this XGBoost porosity prediction study, we applied the algorithm to well-log data, including the GR (gamma-ray), resistivity, density, sonic, and neutron logs, together with por_insitu, the core-measured porosity at the same depths, which served as the validation target.
Initial Prediction
The first XGBoost prediction provided an accuracy of 57.54%. Recognizing the need for improvement, we added performance metrics (RMSLE, MSE, MAE, and R² score), increased the number of iterations to 100, set the number of cross-validation folds to 10, and enabled verbose mode. These optimizations significantly enhanced the model's accuracy to 91.4154%, with an MAE of 0.5925, an MSE of 2.5518, and an RMSLE of 0.0566052.
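A sketch of this configuration (100 boosting iterations, the four metrics, 10-fold cross-validation, verbose output); any settings beyond those named in the text are assumptions:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_squared_log_error, r2_score)
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=100)  # 100 boosting iterations
# verbose=True prints the evaluation metric per boosting round
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=True)

y_pred = model.predict(X_test)
print("MAE  :", mean_absolute_error(y_test, y_pred))
print("MSE  :", mean_squared_error(y_test, y_pred))
print("RMSLE:", np.sqrt(mean_squared_log_error(y_test, y_pred)))
print("R2   :", r2_score(y_test, y_pred))

cv_scores = cross_val_score(model, X, y, cv=10, scoring="r2")  # 10-fold CV
```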
Hyperparameter Tuning
Further enhancing the model, we performed hyperparameter tuning, setting n_estimators to 20, max_depth to None, n_jobs to -1, max_samples to None, and random_state to 42. The outcome was remarkable, boosting the model's accuracy to an impressive 96.75%. The refined performance metrics were an MAE of 0.766158, an MSE of 1.047763, and an RMSLE of 0.038324 (Fig. 6).
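The parameter names listed follow scikit-learn's estimator API; the sketch below assumes the xgboost.XGBRegressor scikit-learn wrapper. Note that max_samples has no direct XGBRegressor counterpart (row subsampling is controlled by subsample), so it is noted in a comment rather than passed:

```python
from xgboost import XGBRegressor

# Tuned settings from the text; max_samples=None has no XGBRegressor
# equivalent, so row subsampling is left at the library default here.
tuned = XGBRegressor(
    n_estimators=20,  # fewer trees after tuning
    max_depth=None,   # None falls back to the library default depth
    n_jobs=-1,        # use all CPU cores
    random_state=42,  # reproducibility
)
tuned.fit(X_train, y_train)
```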
Our study demonstrates the successful application of XGBoost for porosity prediction, yielding an outstanding accuracy of 96.75%. Leveraging well-log data validated against core data, the model holds considerable promise for reservoir characterization and hydrocarbon exploration, and continued exploration of new features and advanced techniques leaves room for further improvement. The new model was tested on a dataset that had never been exposed to it, and the results in Table 2 were extracted. Figure 7 shows the model-predicted porosity plotted alongside the core data, and the two show a good trend.
Table 2
XGBoost-predicted porosity vs core porosity data
| Por_insitu | XGB_Pred |
|------------|----------|
| 36.06 | 33.989372 |
| 35.4 | 34.83329 |
| 34.83 | 34.83329 |
| 33.93 | 34.337337 |
| 35.06 | 34.83329 |
| 35.14 | 34.83329 |
| 34.21 | 34.337337 |
| 33.81 | 34.337337 |
| 33.55 | 34.413708 |
| 33.24 | 34.254223 |
| 33.44 | 33.64256 |
| 22.13 | 22.089237 |
| 20.68 | 20.84871 |
| 19.87 | 22.444637 |
| 30.36 | 27.994835 |
| 26.67 | 26.32377 |
| 28.82 | 27.431532 |
Figure 8 displays the performance of both models by plotting XGBoost's predicted porosity and the ANN's predicted porosity against the core data porosity. Ideally, the points in such a plot should deviate minimally from the mean line drawn through them. In this study, the XGBoost results stand out as the best, since its points align closely with the mean line, outperforming the ANN predictions.
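A sketch of such a cross-plot, using the first few rows of Tables 1 and 2 as example data and a 1:1 reference line; percent units on the axes are an assumption:

```python
import numpy as np
import matplotlib.pyplot as plt

# Values taken from Tables 1 and 2 (first five rows, for illustration).
core = np.array([36.06, 35.40, 34.83, 33.93, 35.06])
ann_pred = np.array([34.70487, 34.616154, 33.982235, 33.095074, 34.042873])
xgb_pred = np.array([33.989372, 34.83329, 34.83329, 34.337337, 34.83329])

plt.scatter(core, ann_pred, label="ANN", alpha=0.7)
plt.scatter(core, xgb_pred, label="XGBoost", alpha=0.7)
lims = [core.min() - 1, core.max() + 1]
plt.plot(lims, lims, "k--", label="1:1 line")  # ideal: prediction equals core
plt.xlabel("Core porosity (%)")   # percent units assumed
plt.ylabel("Predicted porosity (%)")
plt.legend()
plt.show()
```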