Artificial Neural Networks
The neurons in the human body served as the inspiration for the idea underlying the ANN. Through the synaptic connections at the end of the axon, these linked neurons transmit information from one cell body to another. The neural network system of the human body is extremely intricate and densely coupled, since each neuron has hundreds of connections (Graupe, 2007). Warren McCulloch and Walter Pitts created a condensed mathematical model of biological neurons in the 1940s and demonstrated that, in theory, such models are capable of computing any arithmetic or logical function (McCulloch & Pitts, 1943). They are usually regarded as pioneers of the field of neural networks, since their idea inspired further development and led to the perceptron algorithm of Rosenblatt (1957). Rosenblatt demonstrated that a perceptron can classify inputs provided the classes are linearly separable. Marvin Minsky and Seymour Papert's book "Perceptrons" identified a limitation of these single-layer perceptrons (Minsky and Papert, 1969): they demonstrated that single-layer perceptrons are incapable of handling the simple "exclusive or" (XOR) problem. The acceptance and usage of neural networks began to decline as a result of these criticisms by Minsky and Papert. Neural networks did not take off again until Hopfield's work and the development of new learning algorithms such as back-propagation. Since this resurgence, neural network research and applications have grown tremendously. As previously stated, artificial neural networks imitate the biological nervous system. They are made up of several interconnected elements working together. These neurons receive inputs from other sources, perform a generally nonlinear operation on them, and then output the results.
A typical artificial neural network (ANN) is made up of an input layer, one or more hidden layers, and an output layer. The leftmost layer is known as the input layer, and the neurons that make up this layer are known as input neurons. The middle layer(s) are known as hidden layers, while the rightmost layer is the output layer, containing the output neuron(s). Figure 1 shows the typical ANN architecture for porosity and permeability prediction.
It is often assumed that a large dataset will significantly affect the accuracy of a machine learning model. However, Bailly et al. (2022), in an experiment designed to understand the effect of large and small datasets on the performance of a model, found that dataset size does not significantly affect a machine learning model's performance.
The neural network used in this study is a multi-layer neural network model, which contains more than one layer of artificial neurons (nodes). The first step in constructing the model is to import all relevant packages. In our case, TensorFlow, Keras, scikit-learn, pandas, NumPy, and matplotlib were the packages needed, together with the Sequential model, Dense layers, and activation functions from Keras. The train_test_split function was also imported from scikit-learn to split the data into training and test datasets.
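The snippet below is a minimal sketch of these imports and a two-hidden-layer Sequential model; the feature count and layer sizes are illustrative assumptions, not the study's exact configuration.

```python
# Minimal sketch of the imports and model skeleton described above.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 5  # assumed number of well-log input curves

model = Sequential([
    Dense(50, activation="relu", input_shape=(n_features,)),  # hidden layer 1
    Dense(50, activation="sigmoid"),                          # hidden layer 2
    Dense(1),                                                 # porosity output
])
model.compile(optimizer="adam", loss="mse", metrics=["mae", "mse", "mape"])
```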
XGBoost
XGBoost is a machine learning algorithm known for its efficiency and accuracy in solving regression and classification problems. It has gained popularity for its ability to combine multiple weak learners (such as decision trees) into a robust, high-performing model, and it has been successfully applied in various fields, including finance, natural language processing, and image recognition. Recently, its application in the petroleum industry for predicting reservoir rock properties has shown promising results.
XGBoost's objective function is used to optimize the model during training and comprises two components: the loss function and the regularization term. The XGBoost update step is responsible for creating new trees to be added to the ensemble: it calculates the gradients and Hessians of the loss function (Eq. 1) and then constructs a new tree to correct the residuals of the previous ensemble (Wei et al., 2022).
$$\mathcal{L}(y_i, \hat{y}_i) = \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \gamma T + \frac{\lambda}{2} \sum_{j=1}^{T} w_j^2 \qquad \text{(Eq. 1)}$$

where $n$ is the number of samples, $T$ the number of leaves in the tree, $w_j$ the leaf weights, and $\gamma$ and $\lambda$ the regularization parameters.
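For the squared-error loss in Eq. 1, the gradients and Hessians consumed by the update step have simple closed forms. The sketch below shows that calculation (the math XGBoost implements internally, not the library's actual code; the regularization terms enter later, when leaf weights are solved for):

```python
import numpy as np

def squared_error_grad_hess(y_true, y_pred):
    """Gradient and Hessian of 0.5 * (y - y_hat)^2 w.r.t. the prediction.
    These are the per-sample quantities XGBoost uses to score candidate
    splits and set leaf weights; gamma*T and (lambda/2)*sum(w^2) from
    Eq. 1 are applied during tree construction, not here."""
    grad = y_pred - y_true       # first derivative of the loss
    hess = np.ones_like(y_pred)  # second derivative is constant 1
    return grad, hess
```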
XGBoost can be applied to various tasks related to reservoir rock properties, including predicting porosity, permeability, lithology, and fluid saturation. The application process involves the following steps:
Predicting with XGBoost
Data Collection and Preprocessing
Gather well-log data, core samples, and other relevant geological information from the reservoir. Preprocess the data to handle missing values, perform feature engineering, and create the target variables for regression tasks.
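A minimal preprocessing sketch, assuming a pandas DataFrame with hypothetical log mnemonics (GR, RES, RHOB, DT, NPHI), a core-porosity target column (Por_insitu), and an assumed file name:

```python
import pandas as pd

# Hypothetical column and file names; substitute the actual log mnemonics.
logs = ["GR", "RES", "RHOB", "DT", "NPHI"]
target = "Por_insitu"

df = pd.read_csv("well_logs.csv")
df = df.dropna(subset=logs + [target])  # drop rows with missing readings
# Alternative imputation: df[logs] = df[logs].fillna(df[logs].median())

X, y = df[logs], df[target]
```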
Model Training
Utilize the XGBoost algorithm to train the model on the prepared dataset. During training, XGBoost will create an ensemble of decision trees by minimizing the objective function. Adjust the complexity parameter and regularization term to control tree growth and avoid overfitting.
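A minimal training sketch using the xgboost scikit-learn wrapper; the hyperparameter values are illustrative rather than the study's tuned settings, and X/y follow the preprocessing sketch above:

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# max_depth, gamma, and reg_lambda map to the tree-growth and
# regularization controls in Eq. 1.
model = XGBRegressor(
    n_estimators=100,  # number of boosted trees
    max_depth=4,       # limits tree complexity
    gamma=0.1,         # minimum loss reduction to split (the gamma * T term)
    reg_lambda=1.0,    # L2 penalty on leaf weights (the lambda/2 term)
)
model.fit(X_train, y_train)
```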
Model Evaluation
Assess the model's performance using metrics such as mean absolute error, mean squared error, or R-squared. Employ cross-validation techniques to ensure the model's generalizability.
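A sketch of this evaluation step with scikit-learn's metrics and 10-fold cross-validation; variable names follow the earlier sketches:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 :", r2_score(y_test, y_pred))

# 10-fold cross-validation as a check on generalizability
cv_scores = cross_val_score(model, X, y, cv=10, scoring="r2")
print("CV R2: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```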
Reservoir Properties Prediction
Once the model is trained and evaluated, it can predict reservoir rock properties for new data points or locations without available measurements.
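Prediction at unmeasured locations then reduces to a single call; X_new below is a hypothetical matrix of preprocessed log readings from depths lacking core measurements:

```python
# X_new: preprocessed log readings at depths without core measurements
porosity_pred = model.predict(X_new)
```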
Porosity Prediction from ANN (Artificial Neural Networks)
The total data available for training and testing comprise 299 rows and 6 columns. The train_test_split function from sklearn enabled us to split the data into a 70% training set and a 30% test set (209 rows and 6 columns for the training dataset and 90 rows and 6 columns for the test dataset). The test dataset was further divided into a 50% validation dataset (45 rows and 6 columns) and a 50% test dataset (45 rows and 6 columns).
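A sketch of this two-stage split, assuming X holds the input curves and y the porosity target; the random_state value is an assumption for reproducibility:

```python
from sklearn.model_selection import train_test_split

# 70% train / 30% test, then split the 30% in half for validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42)
```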
A scaling step was performed to bring all inputs of our data to a common scale without distorting the differences between the ranges of the values. The transformation maps every feature to values between 0 and 1 (strictly speaking, min-max normalization rather than z-score standardization), a range the model handles readily (Fig. 2).
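Mapping features to the 0–1 range corresponds to scikit-learn's MinMaxScaler (a StandardScaler would instead give zero mean and unit variance). A minimal sketch, fitting on the training set only to avoid leakage:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                    # rescales each feature to [0, 1]
X_train_s = scaler.fit_transform(X_train)  # learn min/max from training data
X_val_s = scaler.transform(X_val)          # reuse the training statistics
X_test_s = scaler.transform(X_test)
```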
Training of the ANN began with two (2) hidden layers, using "relu" and "sigmoid" as activation functions and layer widths of (50, 50), which produced a total of 3,211 parameters. This was followed by three (3) hidden layers, with "relu" as the activation function for all three and layer widths of (32, 64, 128), producing 11,809 parameters. A network with five (5) hidden layers was also constructed, with "relu" activations and a constant width of 64 units, producing 19,009 parameters, followed by seven (7) hidden layers with "tanh" activations and a constant width of 64 units per layer, producing 27,329 parameters. Finally, a network with ten (10) hidden layers was constructed, with "tanh" activations in each layer and a width of 64 units, producing 39,809 parameters. A checkpoint callback was used in each model to save the best-performing weights at intervals during training. Each experiment was run for 1,000 epochs with a batch size of 32 to reduce training time, and the results show that the "tanh" model with 10 hidden layers is the best model for predicting porosity. The performance metrics (loss, MAE, MSE, MAPE) for each epoch show a significant decrease in error as the number of epochs increases. For the training data, the mean absolute error (MAE) was reduced to 1.87 and the mean squared error (MSE) to 6.56; the test dataset has an MAE of 4.08 and an MSE of 29.89, and the validation data has an MAE of 2.90 and an MSE of 13.93 (Fig. 3). The performance-metric plots for the training and validation datasets are shown in Fig. 4. This model was tested on a dataset that had never been exposed to it, and the results in Table 1 were extracted. Figure 5 shows the model-predicted porosity plotted alongside the core data, and the two show a good trend.
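A sketch of the best configuration described above (ten hidden layers, "tanh" activations, 64 units each, 1,000 epochs, batch size 32) with a checkpoint callback; the monitored quantity and file name are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ModelCheckpoint

model = Sequential(
    [Dense(64, activation="tanh", input_shape=(X_train_s.shape[1],))]
    + [Dense(64, activation="tanh") for _ in range(9)]  # 10 hidden layers total
    + [Dense(1)]                                        # porosity output
)
model.compile(optimizer="adam", loss="mse", metrics=["mae", "mse", "mape"])

# Save the weights of the best epoch seen so far (file name assumed).
checkpoint = ModelCheckpoint("best_model.keras", monitor="val_loss",
                             save_best_only=True)
model.fit(X_train_s, y_train, validation_data=(X_val_s, y_val),
          epochs=1000, batch_size=32, callbacks=[checkpoint])
```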
Table 1
ANN-predicted porosity vs core porosity data
| ANN_Pred | Por_insitu |
|----------|------------|
| 34.70487 | 36.06 |
| 34.616154 | 35.40 |
| 33.982235 | 34.83 |
| 33.095074 | 33.93 |
| 34.042873 | 35.06 |
| 33.80644 | 35.14 |
| 34.24234 | 34.21 |
| 32.945667 | 33.81 |
| 33.68787 | 33.55 |
| 34.491383 | 33.24 |
| 32.447056 | 33.44 |
| 26.60241 | 22.13 |
| 26.593842 | 20.68 |
| 27.017372 | 19.87 |
| 26.867367 | 30.36 |
| 26.699024 | 26.67 |
| 26.84349 | 28.82 |
XGBoost Porosity Prediction Report
In this XGBoost porosity prediction study, we applied the algorithm to well-log data, including the GR (gamma-ray), resistivity, density, sonic, and neutron logs, together with por_insitu, the core-measured porosity at the same depths, which served as the validation target.
Initial Prediction
The first XGBoost prediction provided an accuracy of 57.54%. Recognizing the need for improvement, we added performance metrics (RMSLE, MSE, MAE, and R² score), increased the number of iterations to 100, set the number of cross-validation folds to 10, and enabled verbose mode. These optimizations significantly enhanced the model's accuracy to 91.4154%, with an MAE of 0.5925, an MSE of 2.5518, and an RMSLE of 0.0566052.
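A sketch of this configuration (100 boosting iterations, the four metrics, 10-fold cross-validation, verbose output); any settings beyond those named in the text are assumptions:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_squared_log_error, r2_score)
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=100)  # 100 boosting iterations
# verbose=True prints the evaluation metric per boosting round
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=True)

y_pred = model.predict(X_test)
print("MAE  :", mean_absolute_error(y_test, y_pred))
print("MSE  :", mean_squared_error(y_test, y_pred))
print("RMSLE:", np.sqrt(mean_squared_log_error(y_test, y_pred)))
print("R2   :", r2_score(y_test, y_pred))

cv_scores = cross_val_score(model, X, y, cv=10, scoring="r2")  # 10-fold CV
```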
Hyperparameter Tuning
Further enhancing the model, we performed hyperparameter tuning, setting n_estimators to 20, max_depth to None, n_jobs to -1, max_samples to None, and random_state to 42. The outcome was remarkable, boosting the model's accuracy to an impressive 96.75%. The refined performance metrics were an MAE of 0.766158, an MSE of 1.047763, and an RMSLE of 0.038324 (Fig. 6).
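The parameter names listed follow scikit-learn's estimator API; the sketch below assumes the xgboost.XGBRegressor scikit-learn wrapper. Note that max_samples has no direct XGBRegressor counterpart (row subsampling is controlled by subsample), so it is noted in a comment rather than passed:

```python
from xgboost import XGBRegressor

# Tuned settings from the text; max_samples=None has no XGBRegressor
# equivalent, so row subsampling is left at the library default here.
tuned = XGBRegressor(
    n_estimators=20,  # fewer trees after tuning
    max_depth=None,   # None falls back to the library default depth
    n_jobs=-1,        # use all CPU cores
    random_state=42,  # reproducibility
)
tuned.fit(X_train, y_train)
```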
Our study demonstrates the successful application of XGBoost for porosity prediction, yielding an outstanding accuracy of 96.75%. Leveraging well-log data validated against core data, the model holds considerable promise for reservoir characterization and hydrocarbon exploration, and continued exploration of new features and advanced techniques leaves room for further improvement. The new model was tested on a dataset that had never been exposed to it, and the results in Table 2 were extracted. Figure 7 shows the model-predicted porosity plotted alongside the core data, and the two show a good trend.
Table 2
XGBoost-predicted porosity vs core porosity data
| Por_insitu | XGB_Pred |
|------------|----------|
| 36.06 | 33.989372 |
| 35.4 | 34.83329 |
| 34.83 | 34.83329 |
| 33.93 | 34.337337 |
| 35.06 | 34.83329 |
| 35.14 | 34.83329 |
| 34.21 | 34.337337 |
| 33.81 | 34.337337 |
| 33.55 | 34.413708 |
| 33.24 | 34.254223 |
| 33.44 | 33.64256 |
| 22.13 | 22.089237 |
| 20.68 | 20.84871 |
| 19.87 | 22.444637 |
| 30.36 | 27.994835 |
| 26.67 | 26.32377 |
| 28.82 | 27.431532 |
Figure 8 displays the performance of both models by plotting XGBoost's predicted porosity and the ANN's predicted porosity against the core data porosity. Ideally, the points in such a plot should deviate minimally from the mean line drawn through them. In this study, the XGBoost results stand out as the best, since its points align closely with the mean line, outperforming the ANN predictions.
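A sketch of such a cross-plot, using the first few rows of Tables 1 and 2 as example data and a 1:1 reference line; percent units on the axes are an assumption:

```python
import numpy as np
import matplotlib.pyplot as plt

# Values taken from Tables 1 and 2 (first five rows, for illustration).
core = np.array([36.06, 35.40, 34.83, 33.93, 35.06])
ann_pred = np.array([34.70487, 34.616154, 33.982235, 33.095074, 34.042873])
xgb_pred = np.array([33.989372, 34.83329, 34.83329, 34.337337, 34.83329])

plt.scatter(core, ann_pred, label="ANN", alpha=0.7)
plt.scatter(core, xgb_pred, label="XGBoost", alpha=0.7)
lims = [core.min() - 1, core.max() + 1]
plt.plot(lims, lims, "k--", label="1:1 line")  # ideal: prediction equals core
plt.xlabel("Core porosity (%)")   # percent units assumed
plt.ylabel("Predicted porosity (%)")
plt.legend()
plt.show()
```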