Image recognition of carbonate fossils and abiotic particles based on deep convolutional neural network mode

doi:10.21203/rs.3.rs-4129309/v1


This work is licensed under a CC BY 4.0 License


Thin sections of carbonate rock allow precise and accurate identification of mineral characteristics, fossil types, pore structures, inorganic grain types, and cementation. From the information they provide, geologists can interpret the depositional environment, diagenesis, and reservoir characteristics of carbonate formations. To identify fossils in carbonate thin sections accurately, geologists must study paleontological morphology in depth and train under the microscope for long periods. Hundreds of thin sections sometimes need to be described, which consumes considerable manpower, resources, and money and limits the objectivity and efficiency of the work. Some studies have used machine learning to classify carbonate rock grains. However, they have relied on large numbers of samples and overly complex models, which raise experimental costs, and they recognize only a limited range of grain types, particularly rare fossil types. In this study, we implemented an algorithm based on deep convolutional neural networks (DCNNs) to automatically classify fossils and abiotic grains in thin-section photographs, achieving high recognition accuracy at low cost. We trained two classical DCNN architectures, VGG-16 and ResNet-18, on an original dataset (1,266 images) and an augmented dataset (6,330 images), each containing 11 grain types. On the original dataset, VGG-16 reached an accuracy of 79.8% and ResNet-18 an accuracy of 83.9%. On the augmented dataset, VGG-16 achieved 98.8% accuracy and ResNet-18 achieved 100%.
This study demonstrates that even small datasets can yield strong training results and high classification accuracy when data augmentation is used. Our findings could provide geologists with an easier and faster way to accomplish the complex and time-consuming task of identifying grains in thin sections.

Earth and environmental sciences/Solid earth sciences/Palaeontology

Earth and environmental sciences/Solid earth sciences/Petrology

Carbonate

Deep learning

Computer Vision

Geology

Image classification

• The deep convolutional neural network algorithm can automatically and efficiently classify carbonate micro-particles.

• This method can identify a large number of particle types.

• On the original dataset, the VGG-16 architecture achieved an accuracy of 79.8%, and the ResNet-18 architecture an accuracy of 83.9%.

• On the augmented dataset, the VGG-16 architecture achieved 98.8% accuracy, and the ResNet-18 architecture achieved 100% accuracy.

• When only small datasets are available for image classification, data augmentation can be used to obtain higher accuracy.

Analyzing and classifying thin sections of rocks under the microscope is a particularly important task in geological research. The light microscope is the primary tool of petrography, used to observe and characterize rocks: analyzing the geometry and structure of grains, identifying fossils, and examining the structural characteristics of minerals. Microscopic thin sections of carbonate rock reveal the types and abundance of grains, which provide important reference information for subsequent exploration. In traditional geological exploration, this identification is performed manually by petrology experts, who describe hundreds or even more thin sections. The process consumes much time and human effort and limits the objectivity and efficiency of the research. Cheng et al. (2018) noted that as new rock thin sections accumulate, the number of sections that the geological community must analyze and archive keeps growing. This work not only raises the economic cost of research but also drains researchers' energy with repetitive, burdensome tasks, diverting their focus from more creative problem-solving. Addressing this problem is therefore crucial for improving efficiency and allowing researchers to focus on more innovative questions.

In recent years, machine learning and deep learning have been applied in a growing number of fields. Machine learning comprises algorithms that learn features from large datasets in order to identify new samples or make predictions, and models trained with more data can learn more useful features. Applications include face detection (Li et al., 2015; Garcia et al., 2017; Lan et al., 2024), medical image analysis (Havaei et al., 2017), and healthcare (Gasparini et al., 2018). Machine learning has also made significant contributions to geology. Marmo et al. (2005) applied a multilayer perceptron to more than 1,000 thin sections of carbonate rocks from various marine environments, attaining accuracies of 93.3% and 93.5% on test sets of 268 and 215 samples, respectively. To address the common challenge of identifying basalt textures, Singh et al. (2010) used a multilayer perceptron and achieved a classification accuracy of 92.2% on 140 rock thin sections. Aiming to evaluate the structural properties of rocks, Budennyy et al. (2017) used a random forest classifier to distinguish sandstone, limestone, and dolomite, achieving up to 95% prediction accuracy for these three rock types. However, these methods still have drawbacks: the classification is not detailed enough, the models are overly complicated, and they rely on many complex mathematical methods. Several studies have shown that deep convolutional neural networks can automatically classify carbonate rock grains in thin-section photographs (Lima et al., 2019; Koeshidayatullah et al., 2020; Idgunji et al., 2021; Koeshidayatullah et al., 2022). Liu et al. (2020) used convolutional neural networks to classify 22 types of carbonate rock grains, achieving over 90% accuracy with all four of their models. However, that study used as many as 13,000 images. Yu et al. (2021) trained a ResNet model to classify 10 types of carbonate fossils, achieving an overall accuracy of 86%. Ho et al. (2023) employed the TaxonNet deep convolutional neural network architecture for multi-label classification of six types of carbonate biogenic grains, achieving an accuracy of over 90%. These methods have improved the efficiency of geological research to varying degrees and achieved relatively high precision. However, two shortcomings remain. (1) Previous studies identified a relatively narrow range of fossil and abiotic grain types and could efficiently recognize only a small number of them. Carbonate thin sections contain a wide variety of fossil and abiotic grains, so efficiently identifying as many grain types as possible is challenging. (2) Previous studies used very large numbers of samples, leading to high training costs. Obtaining samples in geology requires significant human, material, and financial resources, and large numbers of rare fossil and abiotic grain samples are particularly difficult to obtain compared with the abundant data available in other areas of deep learning. It is therefore worth exploring how to achieve higher accuracy with small sample sizes. Performing such difficult, fine-grained image classification with simple and efficient methods remains a major challenge.

Deep convolutional neural networks are a type of deep learning model (Krizhevsky et al., 2012; Lecun et al., 2015) that achieve higher accuracy than traditional neural networks (Garcia et al., 2017). A key property of CNNs is local connectivity with weight sharing: each neuron responds to a local region of its input, and the same kernel weights are reused across the whole image. This allows CNNs to automatically extract essential features from raw images for accurate classification, while greatly reducing the number of network parameters and speeding up training (Krizhevsky et al., 2012; Simonyan et al., 2015; Szegedy & Ioffe et al., 2015; Szegedy et al., 2016; Szegedy et al., 2017). In this study, we use deep convolutional neural networks to classify fossil and abiotic grain types in carbonate thin sections under small-sample conditions, aiming for more efficient and accurate classification. We classify carbonate rock grains into 11 types: V-shaped Crinoid, Irregular Crinoid, Round Crinoid, Ooid, Gastropod, Ostracod, Trilobita, Peloid, Coral, Foraminifer, and Algae, with a total of 1,266 images. The thin sections were prepared from closely spaced samples of the cored middle and upper parts of the Upper Ordovician Lianglitag Formation in well TZ72, Tazhong area, Tarim Basin. The DCNN at the core of this method exploits local receptive fields and weight sharing to extract, through successive deep layers, diverse features of the different grain types in carbonate thin sections. Nonlinear activation functions (e.g., ReLU) then enable the network to learn complex features. Finally, fully connected layers map the extracted features to the grain classes. In computer vision tasks, including image classification and object detection, DCNN accuracy has surpassed human accuracy on some benchmarks. Deep convolutional neural networks can outperform shallow neural networks on large datasets, and they can maintain high efficiency and accuracy in complex classification tasks.
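The parameter saving from local connectivity and weight sharing can be made concrete with a little arithmetic. The Python sketch below counts the parameters of one 3 × 3 convolutional layer with 64 filters and of a fully connected layer with 64 units, both applied to a 224 × 224 × 3 input. The layer sizes are chosen purely for illustration and are not taken from the models used in this study.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameters of a k x k convolution: one k*k*c_in kernel per output channel,
    shared across every position of the image."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dense_params(n_in, n_out, bias=True):
    """Parameters of a fully connected layer: every input connects to every output."""
    return n_in * n_out + (n_out if bias else 0)


# Illustrative sizes: a 224 x 224 RGB input mapped to 64 channels / 64 units.
H, W, C = 224, 224, 3
conv = conv_params(3, C, 64)
dense = dense_params(H * W * C, 64)
print(f"3x3 convolution, 64 filters: {conv:,} parameters")
print(f"fully connected, 64 units:  {dense:,} parameters")
```

The convolution needs 1,792 parameters regardless of image size, whereas the dense layer needs 9,633,856, because every input pixel is connected to every unit.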

The two main goals of this study are (1) to expand the range of fossil types identified in carbonate thin sections under the microscope, including corals, foraminifers, gastropods, and other rare types, and to provide a dataset of carbonate thin-section grain images that supports the further development of convolutional neural networks by other researchers; and (2) to achieve higher accuracy when training under small-sample conditions, establishing a framework and foundation for applying convolutional neural networks in geology. The feasibility of this method was verified through repeated experiments. Trained on the raw data, the VGG-16 and ResNet-18 architectures achieved highly competitive accuracies of 78.9% and 83.5%, respectively, at 100 epochs. For comparison, Ho et al. (2023) also used deep learning to classify carbonate rock grains but, training on their original dataset, achieved an accuracy of only 53%. We further expanded the dataset through data augmentation and trained VGG-16 and ResNet-18 on it, achieving accuracies of 98.8% and 100%, respectively, at 100 epochs. By comparison, Yu et al. (2021) used a ResNet architecture with data augmentation to classify carbonate fossils and achieved an accuracy of only 86%. These results are highly competitive with the latest methods for classifying carbonate fossils and abiotic grains.

All samples in this experiment were taken from cores collected during our previous geological fieldwork. The sampling points are distributed along the 170 m core from well TZ72, which covers the middle and upper parts of the Upper Ordovician Lianglitag Formation in the Tazhong area of the Tarim Basin. Samples were taken at intervals of approximately 2 m and were carefully chosen to avoid cracks, dissolution pores, and calcite veins. A total of 85 rock thin sections were prepared, and the various fossils and abiotic grains in them were photographed under the microscope at different scales (Figure 1).

We used CorelDRAW to delineate the typical grain areas in each thin section. All images were resized to 224 × 224 × 3 (height × width × RGB channels) while preserving important image features, and then fed into the deep learning models for training and prediction. The final database contains 1,266 images of 11 fossil and abiotic grain types: V-shaped Crinoid (3.6%), Irregular Crinoid (13.5%), Round Crinoid (1.8%), Ooid (9.1%), Gastropod (1.7%), Ostracod (5.6%), Trilobita (1.5%), Peloid (49.4%), Coral (2.0%), Foraminifer (4.8%), and Algae (7.0%). Table 1 lists the carbonate grain types and the number of images in each category. Example photographs of the various carbonate grain types in the dataset are shown in Figure 1. All images are in JPG format.
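Each original image yields five images after augmentation (the original plus four new images), so the per-class counts of the original dataset can be recovered by dividing the Table 1 totals by five. The sketch below lists these derived counts (our arithmetic, not a table supplied by the authors) and recomputes each class's share of the 1,266 images, together with the imbalance between the largest and smallest classes.

```python
# Original-dataset counts per class, derived as Table 1 totals divided by five.
original_counts = {
    "V-shaped Crinoid": 46,
    "Irregular Crinoid": 171,
    "Round Crinoid": 23,
    "Ooid": 115,
    "Gastropod": 21,
    "Ostracod": 71,
    "Trilobita": 19,
    "Peloid": 626,
    "Coral": 25,
    "Foraminifer": 61,
    "Algae": 88,
}


def class_percentages(counts):
    """Return each class's share of the dataset in percent, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total = sum(original_counts.values())
shares = class_percentages(original_counts)
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} {original_counts[name]:>4} images {pct:>5.1f}%")
print("total images:", total)

# Ratio between the most and least frequent classes.
imbalance = max(original_counts.values()) / min(original_counts.values())
print("imbalance ratio:", round(imbalance, 1))
```

Peloids outnumber trilobites by roughly 33 to 1, which is the class imbalance discussed in Section 5.1.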

3.1. Experimental setup

All analyses were run on a ROG STRIX G15CF desktop workstation with a 12th-generation Intel® Core™ i7-12700 processor (2.10 GHz; reported by the system as Intel64 Family 6 Model 151 Stepping 2, GenuineIntel) and … GB of onboard memory, running Windows 11 Professional. The Python and TensorFlow versions used were 3.9.18 and 2.9.1, respectively. We also used PyTorch (Paszke et al., 2019) and NumPy (Oliphant, 2006) for deep learning training. The code used in this study is available at https://github.com/Taoye1997/Classification-of-paleontological-images-based-on-cnn.

3.2. Data augmentation

Data augmentation is an image preprocessing technique that expands a dataset (Wong et al., 2016; Xu et al., 2016). When a CNN is trained on a small dataset, various preprocessing methods can transform each image into multiple images for training. Common methods include: (1) random horizontal and vertical flipping; (2) random rotation; (3) scaling; (4) random cropping; (5) pixel matrix transformation (e.g., subtracting the mean value); and (6) color space adjustment.

In this study, existing images were transformed several times to increase data diversity and improve the model's generalization. We applied four preprocessing steps: (1) resizing the images, (2) randomly flipping them horizontally and vertically, (3) adjusting their color space (e.g., brightness and contrast), and (4) transforming the pixel matrices. The process is illustrated in Figure 2. Augmentation expanded the original carbonate grain image classification dataset from 1,266 to 6,330 images, creating the carbonate grain micrograph dataset that supports the subsequent training of the deep learning models.
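The four preprocessing steps can be sketched in NumPy as follows. This is an illustrative reimplementation, not the authors' PyTorch code: the flip probability (0.5), the brightness and contrast ranges (0.8 to 1.2), nearest-neighbour resizing, and normalization with ImageNet channel statistics are all assumptions. To match the expansion from 1,266 to 6,330 images, each image is kept once (only resized and normalized) and four randomly augmented copies are added, so the dataset grows fivefold.

```python
import numpy as np

# ImageNet channel statistics: an assumed choice, common with pre-trained models.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(img, size=224):
    """Nearest-neighbour resize of an (H, W, 3) array to (size, size, 3)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def transform(img, rng=None):
    """Steps (1)-(4); random flips and colour jitter are applied only if rng is given."""
    x = resize_nearest(img).astype(np.float32) / 255.0            # (1) resize, scale to [0, 1]
    if rng is not None:
        if rng.random() < 0.5:
            x = x[:, ::-1]                                         # (2) horizontal flip
        if rng.random() < 0.5:
            x = x[::-1, :]                                         # (2) vertical flip
        x = np.clip(x * rng.uniform(0.8, 1.2), 0.0, 1.0)          # (3) brightness
        m = x.mean()
        x = np.clip((x - m) * rng.uniform(0.8, 1.2) + m, 0.0, 1.0)  # (3) contrast
    return (x - MEAN) / STD                                        # (4) normalize pixel matrix


def expand_dataset(images, n_aug=4, seed=0):
    """Return (source_index, array) pairs: each image once, plus n_aug augmented copies."""
    rng = np.random.default_rng(seed)
    out = []
    for i, img in enumerate(images):
        out.append((i, transform(img)))
        out.extend((i, transform(img, rng)) for _ in range(n_aug))
    return out
```

Each output keeps the index of its source photograph, which makes it possible to keep all copies of one photograph in the same data subset.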

The data were divided into three subsets: a training set (70%), a validation set (20%), and a test set (10%) (Table 1).

Table 1

Number of images in the training, validation and test sets for the 11 grain types (augmented dataset)
Order	Class	Training set (70%)	Validation set (20%)	Test set (10%)	Total
1	V-shaped Crinoid	161	46	23	230
2	Irregular Crinoid	599	171	85.5	855
3	Round Crinoid	81	23	11.5	115
4	Ooid	403	115	57.5	575
5	Gastropod	74	21	10.5	105
6	Ostracod	249	71	35.5	355
7	Trilobita	67	19	9.5	95
8	Peloid	2191	626	313	3130
9	Coral	88	25	12.5	125
10	Foraminifer	214	61	30.5	305
11	Algae	308	88	44	440
	Total	4431	1266	633	6330
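Table 1 gives the nominal 70/20/10 shares of each class total, which is why some test-set entries are fractional; an actual split assigns whole images. The sketch below uses a hypothetical helper, `split_by_source`, to perform a stratified split with integer arithmetic, with rounding remainders going to the test set. It splits the original photographs rather than the augmented images, so that all augmented copies of a photograph can follow their source into a single subset. This is a common precaution against leakage between training and validation data, offered as a design suggestion rather than a description of the authors' procedure.

```python
import random


def split_by_source(labels, percents=(70, 20, 10), seed=0):
    """Stratified split of original images into train/validation/test index lists.

    labels[i] is the class of original image i. Integer arithmetic gives whole
    images per subset; any remainder goes to the test set. Augmented copies of
    image i should then be placed in whichever subset contains i.
    """
    assert sum(percents) == 100
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_train = len(idxs) * percents[0] // 100
        n_val = len(idxs) * percents[1] // 100
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test


# Example with the derived original counts of three classes.
labels = ["Peloid"] * 626 + ["Gastropod"] * 21 + ["Trilobita"] * 19
train, val, test = split_by_source(labels)
print(len(train), len(val), len(test))  # 465 132 69
```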

3.3. Convolutional Neural Networks

Convolutional neural networks can be regarded as a fusion of feature extraction and pattern recognition. Feature extraction consists of multiple alternating convolutional and pooling layers (Figure 3a). Each convolutional layer applies kernels to the output of the preceding layer. An image is a matrix of pixel values, usually with three channels representing RGB. A convolution kernel acts like a sliding "window": at each position, its weights are multiplied element-wise by the overlapping pixel values and the products are summed, and the resulting values form a feature map (Figure 3b). The kernel moves across the entire image with a fixed step size, or stride (Figure 3b). Each feature map is usually followed by the ReLU function, a nonlinear activation function f(x) = max(0, x), where x is the input and f(x) is the output (McCulloch & Pitts, 1943; Nair & Hinton, 2010). Convolution operations can thus extract essential features from an image.
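The sliding-window operation and the ReLU that follows it can be written out directly. The NumPy sketch below handles a single channel with no padding and, like most deep learning libraries, computes cross-correlation (the kernel is not flipped). The example kernel is a hypothetical vertical-edge detector, not a learned weight from this study.

```python
import numpy as np


def conv2d(image, kernel, stride=1):
    """Valid (unpadded) 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)  # multiply element-wise, then sum
    return out


def relu(x):
    """Rectified linear unit, f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)


image = np.array([[0, 0, 1, 1]] * 4, dtype=float)   # dark left half, bright right half
kernel = np.array([[-1, 1], [-1, 1]], dtype=float)   # hypothetical edge-detecting kernel
print(relu(conv2d(image, kernel)))                   # every row is [0. 2. 0.]
```

The kernel responds only where brightness changes from left to right; flipping its sign produces negative responses that the ReLU then sets to zero.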

The pooling layer downsamples the feature maps produced by convolution, combining similar features into a single feature and eliminating redundant information (Figure 3c). After repeated convolution and downsampling, the resulting feature maps are flattened into one-dimensional vectors and passed through fully connected layers to the pattern recognition component, which performs the final recognition and classification (Figure 3d). The process from the input image to the final output is called forward propagation. A final softmax layer converts the network's outputs into a vector of predicted class probabilities.
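The pooling, flattening, fully connected and softmax stages can be chained in a few lines. In this NumPy sketch, pooling is 2 × 2 max pooling with stride 2, the fully connected weights are random placeholders, and the 11 outputs mirror the grain types in this study. A trained network would use learned weights and many feature maps.

```python
import numpy as np


def max_pool2d(x, size=2):
    """Non-overlapping max pooling (stride = size) over an (H, W) feature map."""
    h, w = x.shape
    x = x[: h - h % size, : w - w % size]  # drop rows/cols that do not fill a window
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def softmax(z):
    """Convert raw class scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the maximum for numerical stability
    return e / e.sum()


def classify(feature_map, weights, bias):
    """Pool -> flatten -> fully connected layer -> softmax."""
    pooled = max_pool2d(feature_map)
    flat = pooled.reshape(-1)  # one-dimensional feature vector
    return softmax(weights @ flat + bias)


rng = np.random.default_rng(0)
feature_map = rng.random((4, 4))    # stand-in for one convolutional feature map
weights = rng.normal(size=(11, 4))  # random placeholder for learned weights
probs = classify(feature_map, weights, np.zeros(11))
print(probs.round(3), probs.sum())
```

For example, `max_pool2d(np.arange(16.0).reshape(4, 4))` keeps the maxima 5, 7, 13 and 15 of its four 2 × 2 windows.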

DCNNs offer high precision and good repeatability and avoid observer bias when applied to image classification (Hsiang et al., 2019). Classical convolutional neural network models include VGG, GoogLeNet, ResNet, and Inception, among others, and their performance has been verified in research and practice in recent years (Simonyan & Zisserman, 2014; He et al., 2016; Szegedy et al., 2015; Szegedy et al., 2016).

3.4. Image classification

Two DCNN architectures were used in this study: VGG-16 (developed by the Visual Geometry Group at the University of Oxford; Simonyan & Zisserman, 2014) and ResNet-18 (He et al., 2016). Both architectures have been successfully applied to large-scale image classification tasks such as ImageNet (Russakovsky et al., 2015). We therefore used these existing architectures, initialized with pre-trained weights, to implement transfer learning.

VGG-16

The first experiment used a classical network, the Visual Geometry Group-16 network (VGG-16; Simonyan & Zisserman, 2014), as the DCNN backbone feature extractor. VGG-16 has a plain, sequential (chain-like) architecture, in which a deep model is built by repeating basic building blocks. It has 16 weight layers (13 convolutional and 3 fully connected), with small 3 × 3 convolution kernels (stride 1) and 2 × 2 max pooling layers (stride 2). Stacking small kernels requires fewer parameters than a single large kernel with the same receptive field, and each additional convolution adds further nonlinearity, making feature extraction more detailed. This is the basic design principle of VGG-16. VGG-16 has no cross-layer connections and improves performance only by progressively deepening the network; its convolutional feature maps retain the spatial (positional) information of the image. For all networks in this experiment, we used the rectified linear unit (ReLU) as the activation function (Figure 3A). Following common practice, we took DCNN parameters pre-trained on a larger dataset (ImageNet) and modified them as necessary to adapt them to our smaller dataset (Simonyan & Zisserman, 2014; Russakovsky et al., 2015). Transfer learning can achieve higher accuracy on smaller datasets while reducing the required training set size and shortening training time (Tan et al., 2018).
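The parameter argument for stacking small kernels is the one given by Simonyan & Zisserman (2014): two stacked 3 × 3 convolutions cover a 5 × 5 receptive field and three cover 7 × 7, with fewer weights than a single 5 × 5 or 7 × 7 kernel. The short Python sketch below counts the weights for an illustrative width of 64 input and output channels, ignoring biases.

```python
def stacked_3x3(n_layers, channels):
    """Receptive field and weight count of n stacked 3x3 stride-1 convolutions
    with `channels` input and output channels (biases ignored)."""
    receptive_field = 1 + 2 * n_layers  # each layer widens the field by 2 pixels
    weights = n_layers * 3 * 3 * channels * channels
    return receptive_field, weights


def single_kxk(k, channels):
    """Weight count of one k x k convolution with the same channel width."""
    return k * k * channels * channels


C = 64  # illustrative channel width
for n in (2, 3):
    rf, stacked = stacked_3x3(n, C)
    print(f"{n} x 3x3: {rf}x{rf} field, {stacked:,} weights | "
          f"one {rf}x{rf}: {single_kxk(rf, C):,} weights")
```

With 64 channels, three stacked 3 × 3 layers need 110,592 weights against 200,704 for one 7 × 7 layer (27C² versus 49C²), while adding two extra ReLU nonlinearities.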

ResNet-18

ResNet, short for Residual Network, consists of multiple convolutional blocks, each containing a cross-layer (shortcut) connection. The block input x is passed across the convolutional layers unchanged (identity mapping), while the convolutional layers learn a nonlinear residual function F(x); the two are added to give the block output F(x) + x (Figure 4). The depth of a ResNet model is determined by the number of convolutional blocks (He et al., 2016). Regardless of depth, ResNet networks share the following characteristics: (1) the network consists of five convolutional groups, each containing one or more basic convolutional operations; (2) each convolutional group includes one downsampling operation that halves the feature map size; (3) the first convolutional group contains a single convolution, identical across the five typical ResNet variants, with a 7 × 7 kernel and a stride of 2; and (4) the second to fifth convolutional groups each contain multiple identical residual units.
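The residual idea can be sketched in a few lines of NumPy. For brevity, two dense weight matrices stand in for the two 3 × 3 convolutions of a ResNet basic block and batch normalization is omitted. The sketch therefore illustrates only the shortcut arithmetic, y = ReLU(F(x) + x), not a full ResNet layer.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with residual branch F(x) = w2 @ ReLU(w1 @ x).

    The dense matrices w1 and w2 stand in for the two 3x3 convolutions of a
    ResNet basic block; batch normalization is omitted for brevity.
    """
    f = w2 @ relu(w1 @ x)  # residual mapping F(x)
    return relu(f + x)     # shortcut: add the unchanged input, then apply ReLU


x = np.array([1.0, 2.0, 3.0])
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))  # [1. 2. 3.]: identity when F(x) = 0
```

If the residual branch outputs zero, the block passes a non-negative input through unchanged. This is one reason why adding residual blocks need not degrade a deep network.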

ResNet-18 is a CNN architecture with 17 convolutional layers and one fully connected layer, for a total of 18 weight layers. It applies the ReLU activation function after its convolutions and uses 3 × 3 convolution kernels in its residual units. The network ends with a global average pooling layer, followed by a fully connected layer with a softmax activation function (Table 2).

Table 2

ResNet-18 network structure (He et al., 2016)
Layer name	Output size	ResNet-18 (18 layers)
Convolutional Layer 1	112×112	7×7, 64, stride 2
Convolutional Layer 2_x	56×56	3×3 max pool, stride 2
Convolutional Layer 2_x	56×56	[3×3, 64; 3×3, 64] × 2
Convolutional Layer 3_x	28×28	[3×3, 128; 3×3, 128] × 2
Convolutional Layer 4_x	14×14	[3×3, 256; 3×3, 256] × 2
Convolutional Layer 5_x	7×7	[3×3, 512; 3×3, 512] × 2
	1×1	Average pool, fully connected layer, softmax
Floating-point operations (FLOPs)		1.8×10⁹
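The output sizes in Table 2 follow from the standard size formula ⌊(n + 2p - k)/s⌋ + 1 applied to a 224 × 224 input. The sketch below assumes the padding used in common ResNet implementations (3 for the 7 × 7 convolution, 1 for the 3 × 3 max pooling and convolutions), and also tallies the 18 weight layers.

```python
def conv_out(n, k, s, p):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * p - k) // s + 1


# (name, kernel size, stride, padding) of the first layer of each stage.
stages = [
    ("conv1 (7x7, stride 2)", 7, 2, 3),
    ("max pool (3x3, stride 2)", 3, 2, 1),
    ("conv2_x (3x3, stride 1)", 3, 1, 1),
    ("conv3_x (3x3, stride 2)", 3, 2, 1),
    ("conv4_x (3x3, stride 2)", 3, 2, 1),
    ("conv5_x (3x3, stride 2)", 3, 2, 1),
]
size, sizes = 224, []
for name, k, s, p in stages:
    size = conv_out(size, k, s, p)
    sizes.append(size)
    print(f"{name:<26} -> {size}x{size}")

# Weight layers: conv1 + 4 groups x 2 basic blocks x 2 convolutions + 1 fc.
depth = 1 + 4 * 2 * 2 + 1
print("weight layers:", depth)
```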

We applied basic data augmentation in PyTorch, creating four new images from each original image. In the first experiment, we trained VGG-16 separately on the original and augmented datasets for 100 epochs. In the second experiment, we trained ResNet-18 separately on the original and augmented datasets, also for 100 epochs. In all experiments, the dataset (original or augmented) is the variable and the number of epochs is held fixed, so that the training results on the two datasets can be compared.

During training (the feature extraction phase), visualizations such as heat maps and feature maps were generated with the Keras and PyTorch libraries (Samek et al., 2016). These visualizations not only illustrate the performance of the network but also show the feature maps extracted by the convolutions of different blocks. Shallower convolutional layers detect low-level image features such as edges, shading, and contours, whereas deeper layers extract more abstract features. Visualizing the features extracted for different categories helps explain, and increases confidence in, the recognition process (Figure 5).

4.1. VGG-16

In the first experiment, we trained VGG-16 separately on the original and augmented datasets. With the original dataset, training accuracy rose rapidly over the first 20 epochs to nearly 90% and continued to increase with further epochs, reaching 98.5% after 100 epochs (Figure 6A). The training loss decreased steadily, reaching 0.08 after 100 epochs (Figure 6C). At first sight, this appears to be a good training outcome. However, as shown in Figure 6B, validation accuracy on the original dataset did not change significantly as the number of epochs increased, reaching only 78.9% after 100 epochs, and the validation loss trended upward after 40 epochs (Figure 6D). We hypothesize that this behavior results from the small validation set, in which some classes contain only a few images: with limited data, a comparatively complex model is susceptible to overfitting. When we trained the model on the augmented dataset, training accuracy improved rapidly, reaching 88.3% within the first 10 epochs, and then increased gradually with further epochs. After 50 epochs, the training accuracy on the original and augmented datasets was essentially the same (Figure 6A). The training loss decreased to 0.08 after 100 epochs (Figure 6C). Validation accuracy reached 98.8% after 100 epochs (Figure 6B), and the validation loss decreased to 0.02 (Figure 6D). With the dataset expanded to five times its original size, the overfitting described above did not occur.

4.2. ResNet-18

In the second experiment, we used the ResNet-18 architecture to train on the original and augmented datasets separately.

ResNet-18 trained notably better than VGG-16, as illustrated in Figure 7. We first trained ResNet-18 on the original dataset and observed that the curves converged after 18 epochs. Training accuracy reached 100% (Figure 7A), while validation accuracy was 83.5% (Figure 7B). The training loss converged gradually to 0 after 18 epochs (Figure 7C), whereas the validation loss converged to 0.7 (Figure 7D). This result is similar to that of VGG-16 on the original data: both exhibit overfitting. We ruled out data leakage and hypothesized that the overfitting was due to the small amount of data and the complexity of the model.

We then retrained the model on the augmented dataset. After 10 epochs, both training accuracy (Figure 7A) and validation accuracy (Figure 7B) reached 100%, and the training and validation losses both converged to 0 within 10 epochs (Figure 7C, D), with no sign of overfitting. For comparison, Liu et al. (2020) used a ResNet architecture to classify carbonate rock grains under the microscope and reached a training accuracy of 100% after 40 epochs.

5.1. Performance evaluation

Evaluation metrics provide a way to visualize, organize, and select classifiers based on their performance (Fawcett, 2006). They are built on the counts TP, FP, TN, and FN, with one label designated as the positive class, typically the class of interest (Table 3). Several common metrics are derived from this confusion matrix, including accuracy, precision, recall, and the F1 score (Sarkar et al., 2018).

Table 3

Evaluation metrics for classification tasks
	Predicted positive	Predicted negative
Actually positive	TP (true positive): predicted positive, actually positive	FN (false negative): predicted negative, actually positive
Actually negative	FP (false positive): predicted positive, actually negative	TN (true negative): predicted negative, actually negative

\(\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FP} + \text{TN} + \text{FN}} = \frac{\text{Number of correctly predicted samples}}{\text{Total number of samples}}\)

(1)

\(\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{\text{Number of positive samples with correct predictions}}{\text{Number of samples with positive predictions}}\), or, for the negative class,

\(\frac{\text{TN}}{\text{TN} + \text{FN}} = \frac{\text{Number of negative samples correctly predicted}}{\text{Number of samples with negative predictions}}\)

(2)

\(\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{\text{Number of positive samples with correct predictions}}{\text{Number of samples with positive true labels}}\)

(3)

F1-score (combined assessment of precision and recall) = \(\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\)

(4)

The accuracy, precision (P), recall (R), and F-measure (F1) of each experiment were computed with Equations (1) to (4) (Figures 8 and 9).
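Equations (1) to (4) extend to the 11-class case by treating each class in turn as the positive class (one-vs-rest). The NumPy sketch below builds a confusion matrix with rows as true classes and columns as predicted classes. It then derives overall accuracy (the trace divided by the total) and per-class precision, recall, and F1; class averages such as those plotted in Figure 9 can be taken with `.mean()`. The helper names are hypothetical, and assigning a score of 0 to classes with no predictions or no samples is a convention chosen for this sketch.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def metrics_from_confusion(cm):
    """Accuracy plus per-class precision, recall and F1 (one-vs-rest)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class but actually another
    fn = cm.sum(axis=1) - tp  # actually the class but predicted as another
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=2)
acc, precision, recall, f1 = metrics_from_confusion(cm)
print(cm.tolist(), acc)  # [[5, 1], [2, 2]] 0.7
print("macro F1:", f1.mean())
```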

The confusion matrices show that, despite the high generalization ability of deep convolutional neural networks, VGG-16 trained on the original data recognizes some categories poorly, particularly those with few samples (single-digit test sets): Gastropod, Round Crinoid, Irregular Crinoid, Trilobita, and V-shaped Crinoid. Recognition was better for Foraminifer, Ostracod, Algae, Coral, Ooid, and Peloid (Figure 8A). ResNet-18 recognizes Foraminifer, Ostracod, Algae, Ooid, and Peloid well but other grain types poorly (Figure 8B). In our 11-category image dataset, some images contain a large proportion of background and few discernible fossil features. A convolutional neural network may misclassify such images if it learns background noise instead of the actual fossil features. These issues can be addressed by collecting additional data or by methods such as data augmentation (Liu et al., 2020).

The F1-score, accuracy, precision, and recall, considered together with the confusion matrices (Figure 8), indicate that the VGG-16 architecture performs worse than the ResNet-18 architecture. Overfitting is also more pronounced in VGG-16. In theory, a deeper network extracts features more effectively, but it is also prone to vanishing or exploding gradients, which hinder convergence. Caution is therefore advised when using the VGG architecture to train on small datasets (Chatfield et al., 2014). Insufficient sample data can lead to overfitting. For example, the test set contains only two gastropod samples and two round crinoid samples, which poses a significant challenge for machine classification. The structure of the dataset also affects classification performance. Common grain types such as peloids dominate the dataset at 49.4%, whereas rare types such as trilobites make up only 1.56% of the entire dataset. Human classifiers tend to be accurate for common taxa and less accurate for rare types. Human accuracy also depends heavily on the individual, which introduces random and unpredictable variation. Although DCNNs are also affected by unbalanced data, the effect is relatively minor compared with human classifiers. To mitigate these problems in machine classification, we used data augmentation to alleviate overfitting, with effective results.
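The imbalance argument can be made concrete with a little arithmetic. The sketch below assumes that the 49.4% and 1.56% shares refer to the 1,266-image original dataset and converts them into approximate image counts. It also lists every recall value that a class with only a handful of test images can reach. Both helper names are hypothetical.

```python
from typing import List


def approx_class_count(total_images: int, fraction: float) -> int:
    """Approximate number of images in a class from its share of the dataset."""
    return round(total_images * fraction)


def attainable_recalls(n_test: int) -> List[float]:
    """Every recall value a classifier can reach on a class with n_test test images."""
    return [k / n_test for k in range(n_test + 1)]


if __name__ == "__main__":
    total = 1266  # original dataset size; assumed to be the base of the percentages
    peloids = approx_class_count(total, 0.494)
    trilobites = approx_class_count(total, 0.0156)
    print(peloids, trilobites, peloids / trilobites)
    print(attainable_recalls(2))  # e.g. the two gastropod test images
```

Under this assumption there are roughly 625 peloid images but only about 20 trilobite images, a ratio of about 31 to 1. With two test images in a class, a single error moves recall from 100% to 50%. Per-class scores for rare types are therefore very coarse.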

5.2. Improving accuracy with limited datasets

Data augmentation is one of the most widely used methods in computer vision for improving network generalization, and suitable augmentation methods improve both the generalization ability and the accuracy of a model (Liu and Song, 2020). We trained the VGG-16 and ResNet-18 architectures on the augmented dataset, computed the F1-score, accuracy, precision, and recall, and plotted the average classification performance of each model (Figure 9). In the confusion matrices of Figure 9, the diagonal elements are the correct predictions for each class and the off-diagonal elements are incorrect predictions.
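The augmented dataset (6,330 images) is exactly five times the original dataset (1,266 images). One way to obtain such a fivefold expansion is sketched below: each image is kept, and four label-preserving geometric variants are added. The specific transforms (left-right flip, top-bottom flip, 90° and 180° rotation) are an assumption, because the augmentation procedure itself is described in Section 3.2 and not here. They are a plausible choice for thin-section grains, whose identity does not depend on orientation under the microscope. Images are plain nested lists, so the sketch needs no imaging library, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[list]  # rows of pixels; a pixel may be an int (grey) or a tuple (RGB)


def identity(img: Image) -> Image:
    """Copy of the original image."""
    return [list(row) for row in img]


def hflip(img: Image) -> Image:
    """Mirror left-right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror top-bottom."""
    return [list(row) for row in img[::-1]]


def rot90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rot180(img: Image) -> Image:
    """Rotate 180 degrees."""
    return [row[::-1] for row in img[::-1]]


# Assumed transform set: the original plus four variants gives a fivefold expansion.
TRANSFORMS: List[Callable[[Image], Image]] = [identity, hflip, vflip, rot90, rot180]


def augment_dataset(samples: List[Tuple[Image, str]]) -> List[Tuple[Image, str]]:
    """Expand (image, label) pairs fivefold; geometric transforms leave labels unchanged."""
    return [(t(img), label) for img, label in samples for t in TRANSFORMS]
```

Applied to 1,266 labelled images, this yields 6,330 pairs. In a pipeline like this, augmentation is normally applied after the train/test split, so that flipped or rotated copies of a test image do not also appear in the training set.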

VGG-16 correctly classified most classes, with values above 90% for most categories and 100% for some. However, the trilobite category had a lower accuracy of 88.89% (Figure 9A). Comparison with Figure 8A shows that training on augmented data markedly improved the recognition of gastropods, trilobites, round crinoids, and V-shaped crinoids by VGG-16. The ResNet-18 architecture classified every class correctly, reaching 100% for all categories (Figure 9B). Compared with Figure 8B, ResNet-18 trained on augmented data markedly improved the classification of gastropods, irregular crinoids, round crinoids, trilobites, V-shaped crinoids, and corals. These results indicate that even with a small dataset, data augmentation can yield good training outcomes and higher classification accuracy.

6. Conclusions

This study proposes an automated method for classifying particles in carbonate thin-section micrographs with deep convolutional neural networks. The aim is to overcome the low efficiency, limited accuracy, and subjectivity of manual identification of carbonate micropaleontology. A key attribute of a DCNN is its combination of local receptive fields and weight sharing. This lets the network extract the features needed for classification directly from the original image while greatly reducing the number of network parameters, which speeds up training. We used a small dataset of 11 particle types and trained two classical DCNN models, VGG-16 and ResNet-18, separately on the original dataset (1,266 images) and the augmented dataset (6,330 images). On the original dataset, VGG-16 and ResNet-18 achieved accuracies of 79.8% and 83.9%, respectively. On the augmented dataset, they achieved 98.8% and 100%, respectively. For this image classification task, ResNet-18 outperforms VGG-16 in both training speed and classification accuracy. This study also investigated ways to improve network performance with limited datasets. With the same architecture, training on augmented data gives better results than training on the original data. Transfer learning is a powerful technique for applying complex DCNN architectures to comparatively small datasets. Overall, the results show that with data augmentation, even small datasets can yield good training outcomes and improved classification accuracy. These findings offer geologists a convenient alternative for the routine and labor-intensive task of identifying particles in thin sections.
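The parameter saving from "local perception and global sharing" (local receptive fields with shared weights) can be checked with a short count. The sketch compares a 3 × 3 convolution with 64 input and 64 output channels, like those in the early stages of both VGG-16 and ResNet-18, against a fully connected layer mapping the same feature map to an output of equal size. The 224 × 224 input resolution is an assumption (the usual ImageNet size for these architectures), and the helper names are hypothetical.

```python
def conv2d_params(kernel: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Weights in a k x k convolution; the same kernel is reused at every spatial position."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def dense_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights in a fully connected layer: one weight for every input-output pair."""
    return n_in * n_out + (n_out if bias else 0)


if __name__ == "__main__":
    h = w = 224  # assumed input resolution
    c = 64
    conv = conv2d_params(3, c, c)                # independent of h and w
    dense = dense_params(h * w * c, h * w * c)   # grows with the square of the image size
    print(conv, dense, dense // conv)
```

The convolution needs 36,928 parameters regardless of image size, whereas the fully connected equivalent needs on the order of 10^13. This gap is why weight sharing makes deep networks trainable on images at all.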

Author Contribution

Ye Tao designed the research direction and details, conducted the analysis, designed the program, and wrote the manuscript. Zhidong Bao provided financial support, reviewed and checked the manuscript, and proposed many revisions. Fukang Ma assisted with writing the code and software applications for this manuscript. Da Gao provided the original dataset required for this manuscript. Youbin He contributed to data processing. Fengxiang Wang provided the idea for this study and modified and improved the code.

Acknowledgments

The authors would like to acknowledge the School of Geosciences of China University of Petroleum (Beijing) for providing a good research platform. We thank the PetroChina Tarim Oilfield Company for data support. We thank the reviewers who helped us improve the quality of the manuscript.

Data Availability

Name of the code/library: Classification-of-paleontological-images-based-on-cnn
Contact: [email protected]
Requirements: None
Program language: Python
Software required: Python 3.9.18 (available for Windows, Linux, and macOS)
Program size: 23 KB
The source code is available for download at: https://github.com/Taoye1997/Classification-of-paleontological-images-based-on-cnn

References

Budennyy, S., Pachezhertsev, A., Bukharev, A., Erofeev, A., Mitrushkin, D., & Belozerov, B., 2017. Image processing and machine learning approaches for petrographic thin section analysis. In SPE Russian Petroleum Technology Conference (p. D023S014R005). SPE. https://doi.org/10.2118/187885-MS.
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A., 2014. Return of the devil in the details: delving deep into convolutional nets. arXiv preprint arXiv:1405.3531. https://doi.org/10.5244/c.28.6.
Cheng, G., Yue, Q., Qiang, X., 2018. Research on feasibility of convolution neural networks for rock thin sections image retrieval. In 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC) (pp. 2539-2542). IEEE. https://doi.org/10.1109/imcec.2018.8469642.
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 861-874. https://doi.org/10.1016/j.patrec.2005.10.010.
Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., & Garcia-Rodriguez, J., 2017. A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv: 1704.06857. http://arxiv.org/abs/1704.06857.
Gasparini, S., Campolo, M., Ieracitano, C., Mammone, N., Ferlazzo, E., Sueri, C., et al, 2018. Information theoretic-based interpretation of a deep neural network approach in diagnosing psychogenic non-epileptic seizures. Entropy, 20(2), 43. https://doi.org/10.3390/e20020043.
Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., ... & Larochelle, H., 2017. Brain tumor segmentation with deep neural networks. Medical image analysis, 35, 18-31. https://doi.org/10.1016/j.media.2016.05.004.
He, K., Zhang, X., Ren, S., & Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). https://doi.org/10.1109/cvpr.2016.90.
Ho, M., Idgunji, S., Payne, J. L., & Koeshidayatullah, A., 2023. Hierarchical multi-label taxonomic classification of carbonate skeletal grains with deep learning. Sedimentary Geology, 106298. https://doi.org/10.1016/j.sedgeo.2022.106298.
Idgunji, S., Ho, M., Payne, J.L., Lehrmann, D., Morsilli, M., Al-Ramadan, K. and Koeshidayatullah, A., 2021. Deep Neural Networks for Hierarchical Taxonomic Fossil Classification of Carbonate Skeletal grains. In EGU General Assembly Conference Abstracts pp. EGU21-16394. https://doi.org/10.5194/egusphere-egu21-16394
Ioffe, S., & Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (pp. 448-456). PMLR. http://arxiv.org/abs/1502.03167.
Koeshidayatullah, A., Morsilli, M., Lehrmann, D.J., Al-Ramadan, K., Payne, J.L., 2020. Fully automated carbonate petrography using deep convolutional neural networks. Marine and Petroleum Geology 122, 104687. https://doi.org/10.31223/osf.io/necbm.
Koeshidayatullah, A., Trower, E.J., Li, X., Mukerji, T., Lehrmann, D.J., Morsilli, M., Al-Ramadan, K., Payne, J.L., 2022. Quantitative evaluation of the roles of ocean chemistry and climate on ooid size across the Phanerozoic: global versus local controls. Sedimentology. https://doi.org/10.1111/sed.12998.
Krizhevsky, A., Sutskever, I., & Hinton, G. E., 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25. https://doi.org/10.1145/3065386.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436-444.
Lan, L., Wang, F., Li, S., Zheng, X., Wang, Z., & Liu, X., 2024. Efficient prompt tuning of large vision-language model for fine-grained ship classification. arXiv preprint arXiv:2403.08271.
Li, H., Lin, Z., Shen, X., Brandt, J., & Hua, G., 2015. A convolutional neural network cascade for face detection. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5325–5334). https://doi.org/10.1109/cvpr.2015.7299170.
Lima, P.R., Suriamin, F., Marfurt, K.J., Pranter, M.J., 2019. Convolutional neural networks as aid in core lithofacies classification. Interpretation 7 (3), SF27–SF40. https://doi.org/10.1190/int-2018-0245.1.
Liu, X., & Song, H., 2020. Automatic identification of fossils and abiotic grains during carbonate microfacies analysis using deep convolutional neural networks. Sedimentary Geology, 410, 105790. https://doi.org/10.1016/j.sedgeo.2020.105790.
Marmo, R., Amodio, S., Tagliaferri, R., Ferreri, V., & Longo, G., 2005. Textural identification of carbonate rocks by image processing and neural network: Methodology proposal and examples. Computers & Geosciences, 31(5), 649-659. https://doi.org/10.1016/j.cageo.2004.11.016.
McCulloch, W. S., & Pitts, W., 1943. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5, 115-133. https://doi.org/10.7551/mitpress/12274.003.0011.
Nair, V., & Hinton, G. E., 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807-814).
Oliphant, T.E., 2006. A guide to NumPy. Vol. 1. Trelgol Publishing, USA, p. 85.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., 2019. Pytorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32. http://arxiv.org/abs/1912.01703.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115, 211-252. https://doi.org/10.1007/s11263-015-0816-y.
Samek, W., Binder, A., Montavon, G., Lapuschkin, S., Müller, K.R., 2016. Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neur. Network. Learn. Syst. 28 (11), 2660–2673. https://doi.org/10.1109/tnnls.2016.2599820.
Sarkar, D., Bali, R., & Ghosh, T., 2018. Hands-On Transfer Learning with Python: Implement advanced deep learning and neural network models using TensorFlow and Keras. Packt Publishing Ltd.
Simonyan, K., & Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. http://arxiv.org/abs/1409.1556.
Singh, N., Singh, T. N., Tiwary, A., & Sarkar, K. M., 2010. Textural identification of basaltic rock mass using image processing and neural network. Computational Geosciences, 14, 301-310. https://doi.org/10.1007/s10596-009-9154-x.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A., 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 31, No. 1). https://doi.org/10.1609/aaai.v31i1.11231.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A., 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 31, No. 1). https://doi.org/10.1109/cvpr.2015.7298594.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818-2826). https://doi.org/10.1109/cvpr.2016.308.
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C., 2018. A Survey on Deep Transfer Learning. Artificial Neural Networks and Machine Learning-ICANN 2018. Springer International Publishing, Cham, pp. 270-279. https://doi.org/10.1007/978-3-030-01424-7_27.
Wong, S.C., Gatt, A., Stamatescu, V., McDonnell, M.D., 2016. Understanding Data Augmentation for Classification: When to Warp? 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, Australia, pp. 1-6. https://doi.org/10.1109/dicta.2016.7797091.
Xu, Y., Jia, R., Mou, L., Li, G., Chen, Y., Lu, Y., Jin, Z., 2016. Improved relation classification by deep recurrent neural networks with data augmentation. arXiv preprint arXiv:1601.03651. http://arxiv.org/abs/1601.03651.
Yu, X., Ye, K., Du, C., Gong, H., Ma, Z., 2021. Microscopic image recognition of carbonate rock biofossils based on convolutional neural network. Petroleum Experimental Geology, 43(5): 880-885, 895. https://doi.org/10.11781/sysydz202105880.

No competing interests reported.


Reviewers invited by journal
02 Apr, 2024
Editor assigned by journal
02 Apr, 2024
Editor invited by journal
30 Mar, 2024
Submission checks completed at journal
30 Mar, 2024
First submitted to journal
19 Mar, 2024


Status:

Version 1
