Machine learning (ML), a subfield of artificial intelligence, has become popular among researchers in recent times because of its ability to identify patterns between a set of inputs and an output with high precision and speed (Ali et al., 2022; Zhan and Kitchin, 2022). Although ML was primarily a tool of computer scientists a few decades ago, it is now used by researchers in many STEM fields (Lu et al., 2021; Sun et al., 2021; Tao et al., 2021; Walsh et al., 2021). Most importantly, researchers are trying to integrate ML to enhance the sustainability of certain industries. For instance, ML is being applied in the upstream oil and gas industry to reduce manpower and time (Tariq et al., 2021; Temizel et al., 2021). The first ML algorithm, known as "the perceptron", was invented by Rosenblatt (1958). Since then, ML has developed rapidly, and along the way several key ML algorithms were invented, such as the multilayer perceptron, support vector machines, k-nearest neighbor, decision trees, hybrid learning, ensemble learning and deep learning (Fix & Hodges, 1951; Rosenblatt, 1958; Schölkopf et al., 2013; Cover & Hart, 1967; Morgan & Sonquist, 1963; Dasarathy & Sheela, 1979; Psichogios & Ungar, 1992; Dechter, 1986). Algorithms such as the multilayer perceptron, support vector machines and decision trees can be categorised as traditional (classical) ML models: not only were they invented in the early stages of the ML development timeline, they also serve as building blocks of several modern approaches such as hybrid learning, ensemble learning and deep learning. The basic concept of an ML model is to take in a set of input data and develop a relationship with an output (El Naqa & Murphy, 2015; Mahesh, 2020). Interpreting how an ML model generates these outputs can be quite complex, and such a model is often referred to as a black box (Handelman et al., 2019; Hsu & Elmore, 2019). However, with the development of ML-related techniques and concepts, it is possible to describe an ML model more confidently by utilising various data produced during and after training.
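To make the input-output concept concrete, the snippet below is a minimal sketch in Python with scikit-learn of a model learning a relationship between a set of inputs and an output; the synthetic data, feature construction, and model choice are illustrative assumptions rather than anything used in this study.

```python
# Minimal sketch of the basic ML concept: a model that learns a
# relationship between a set of inputs (X) and an output (y).
# The data here are synthetic and purely illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(seed=0)
X = rng.uniform(0.0, 10.0, size=(200, 2))                       # two input features
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, 200)   # output

model = DecisionTreeRegressor(max_depth=4)
model.fit(X, y)                        # learn the input-output mapping
print(model.predict([[5.0, 1.0]]))     # query the fitted "black box"
```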
1.1 Ensemble machine learning algorithms
Ensemble learning is a branch of ML in which weaker ML algorithms are amalgamated to produce a high-performing model (Rincy & Gupta, 2020; Feng et al., 2021; Ganaie et al., 2022). Bagging (bootstrap aggregating) is a subset of ensemble learning that has been widely used in recent studies (Hong et al., 2020; Xu et al., 2020; Ngo, Beard & Chandra, 2022). The concept was first put forward by Breiman (1996) and has since been strengthened by the invention of multiple high-performing algorithms. Bagging ensembles are capable of addressing the issue of overfitting in traditional ML models (Ghojogh & Crowley, 2019; Mosavi et al., 2021). Overfitting occurs when the model performs well on the training set but gives unusually poor results when the test set is introduced; a model that overfits exhibits low bias and high variance. Figure 1 shows the architecture of a bagging-type ensemble. In bagging ensemble algorithms, the initial dataset is divided into several samples, and each sample is introduced to a base model, as illustrated in the sketch below.
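The following hedged sketch (Python/scikit-learn, synthetic data; `BaggingRegressor` stands in for the generic bagging architecture of Figure 1) contrasts a single unconstrained decision tree, which overfits, with a bagging ensemble of such trees, which reduces the variance:

```python
# An unconstrained decision tree fits the training set almost perfectly
# but generalises poorly (low bias, high variance); bagging many such
# trees on bootstrap samples narrows the train/test gap.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 300)      # noisy target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor().fit(X_tr, y_tr)
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                       random_state=0).fit(X_tr, y_tr)

print(f"tree: train R2={tree.score(X_tr, y_tr):.2f}, test R2={tree.score(X_te, y_te):.2f}")
print(f"bag : train R2={bag.score(X_tr, y_tr):.2f}, test R2={bag.score(X_te, y_te):.2f}")
```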
Generally, the base models in bagging ensembles are tree-based (i.e. they comprise decision trees). Predictions from each base model are averaged at the end to obtain the final prediction (Ganaie et al., 2022). Random forest and extra tree (extremely randomised trees) are two bagging-type ensemble learning algorithms capable of solving regression problems. The two algorithms share similar characteristics; the notable differences are that random forest uses bootstrap replicas, i.e. its samples are selected from the dataset with replacement, while extra tree draws samples without replacement, and that extra tree splits nodes at random rather than at the best split. Table 1 shows the main differences among the decision tree, random forest, and extra tree algorithms; a code sketch illustrating these differences follows the table.
Table 1
Comparison of the characteristics of a decision tree and bagging ensemble algorithms
Algorithm | Number of trees | Number of features considered at each split | Bootstrapping | Splitting procedure
Decision tree | One | All features | Not applicable | Best split
Random forest | Multiple | Random subset of features | Yes | Best split
Extra tree | Multiple | Random subset of features | No | Random split
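The configuration differences in Table 1 map almost directly onto scikit-learn parameters. The sketch below is illustrative only (synthetic data, arbitrary hyperparameters) and shows how the bootstrap flag and split rule distinguish the two algorithms:

```python
# Random forest bootstraps samples (with replacement) and searches for the
# best split within a random feature subset; extra trees uses the whole
# sample (bootstrap=False) and splits at random thresholds.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = X[:, 0] * 2.0 - X[:, 1] + rng.normal(0, 0.2, 500)

rfr = RandomForestRegressor(n_estimators=200, max_features="sqrt",
                            bootstrap=True, random_state=0)   # bootstrap replicas, best split
etr = ExtraTreesRegressor(n_estimators=200, max_features="sqrt",
                          bootstrap=False, random_state=0)    # no bootstrapping, random split
for name, model in [("random forest", rfr), ("extra trees", etr)]:
    print(name, f"mean CV R2 = {cross_val_score(model, X, y, cv=5).mean():.3f}")
```

In both cases the final prediction is the average over the individual trees' predictions, which is the variance-reducing step described above.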
1.2 Carbon capture and storage
The topic of sustainable development has become prominent in the decision making of several industries, such as oil and gas. Along with sustainability, achieving carbon net-zero has also come into the spotlight in recent times (Bergero et al., 2023; Dafnomilis et al., 2023; Xu et al., 2023). Carbon dioxide capture and storage (CCS) is a practice used worldwide to mitigate carbon dioxide emissions (Boot-Handford et al., 2014; Bahman et al., 2023). The oil and gas industry employs this method to capture carbon dioxide emitted from production plants, liquefy it, and inject it into the subsurface for long-term storage in suitable geological layers (Shirmohammadi et al., 2020; Wilberforce et al., 2021).
To capture CO2, chemical absorption methods are commonly used, typically with aqueous amine solutions. The CO2 is separated from the amine solution, dried, and pressurised into liquid form. This liquefied CO2 is then transported via pipelines to onshore and offshore storage sites (Gibbins & Chalmers, 2008; Bui et al., 2018). The CO2 is injected into subsurface layers with interconnected pores, which allow the liquid to move within the layer.
During the initial assessment of a CCS project, a parameter called carbon storage capacity is estimated (Ringrose, 2020; AlNajdi & Worden, 2023). This parameter indicates how much CO2 can be stored in the geological layer under consideration, and porosity plays a crucial role in its estimation. Traditional methods of estimating porosity, such as core analysis, are expensive, time-consuming, and can be affected by damaged core samples at certain depths (Erofeev et al., 2019; Agbadze et al., 2022). Given the vast amount of data generated during the initial assessments of CCS projects, integrating ML techniques to predict porosity is feasible, and the predicted porosity can then be used to estimate the carbon storage capacity.
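For context, one commonly used volumetric form of the storage capacity estimate is sketched below; the symbols are standard conventions assumed here for illustration, not notation taken from the cited works:

```latex
% Volumetric CO2 storage capacity estimate (illustrative form):
%   A        areal extent of the storage formation
%   h        net thickness of the formation
%   \phi     porosity (e.g. as predicted by the ML models in this study)
%   \rho     CO2 density at reservoir conditions
%   E        storage efficiency factor (fraction of pore volume usable)
\[
  M_{\mathrm{CO_2}} = A \, h \, \phi \, \rho_{\mathrm{CO_2}} \, E
\]
```

Because the estimate scales linearly with porosity, errors in predicted porosity propagate directly into the capacity estimate.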
Currently, there is no research focusing on the usability of bagging ensembles to predict porosity in CCS projects, or on how feasible these models are for characterising the CO2-storing layer. Furthermore, a comparison of bagging ensembles with traditional ML models for porosity prediction in CCS assessment can accelerate ML advancement and support future studies.
In this study, two regression-capable bagging ensemble ML models, random forest regression (RFR) and extra tree regression (ETR), were developed to predict the porosity of sandstone-dominated layers in the Darling Basin, Australia. The data were collected as part of a CCS assessment program. Predictions from these two bagging ensemble models are compared with those from four traditional models: multilayer perceptron (MLP), support vector regressor (SVR), k-nearest neighbors regressor (KNN), and decision tree regressor (DTR).
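A minimal sketch of such a comparison is given below, assuming scikit-learn implementations of all six models; the synthetic dataset stands in for the Darling Basin well-log data, and the default hyperparameters are placeholders rather than the settings tuned in this study:

```python
# Compare the two bagging ensembles (RFR, ETR) against the four
# traditional models (MLP, SVR, KNN, DTR) using cross-validated R2.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the well-log features and porosity target.
X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)

models = {
    "RFR": RandomForestRegressor(n_estimators=200, random_state=0),
    "ETR": ExtraTreesRegressor(n_estimators=200, random_state=0),
    # Scale-sensitive models are wrapped with a standard scaler.
    "MLP": make_pipeline(StandardScaler(), MLPRegressor(max_iter=2000, random_state=0)),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor()),
    "DTR": DecisionTreeRegressor(random_state=0),
}
for name, model in models.items():
    print(name, f"mean CV R2 = {cross_val_score(model, X, y, cv=5).mean():.3f}")
```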