Ovarian cancer remains a formidable challenge in oncology, with metastatic spread being a critical determinant of patient prognosis and treatment strategy. Accurate and early detection of metastatic lesions in ovarian tissue is imperative for effective clinical decision-making. In recent years, deep learning (DL) algorithms, particularly 3D convolutional neural networks (CNNs), have performed strongly in medical image analysis. This research leverages the power of 3D CNNs to address the vital task of metastatic ovarian tumor (MOT) detection. By harnessing advanced computational methods and meticulously curated histopathological datasets, we aim to enhance the precision and reliability of MOT diagnosis. The proposed Enhanced 3D CNN is shown in Figure 1. A comprehensive account of our methods and materials is presented below: the dataset, the pre-processing procedures, the architecture of the Enhanced 3D CNN, the training methodology, and the evaluation metrics used to assess the performance of our model.
A. Dataset
The Ovarian Bevacizumab Response (OBR) dataset [19] comprises approximately 288 hematoxylin and eosin (H&E) stained whole slides from both benign and metastatic cases, accompanied by clinical details from 78 patients. The slides were sourced from the tissue bank of the Tri-Service General Hospital in Taipei, Taiwan, and were digitized using a Leica AT2 scanner equipped with a 20x objective lens. The ovarian cancer slides have an average dimension of 54342 x 41048 pixels and measure approximately 27.34 x 20.66 mm. The dataset is available at https://www.cancerimagingarchive.net/collection/ovarian-bevacizumab-response/
B. Data Augmentation
Data augmentation was instrumental in diversifying and expanding the MR image dataset for the critical task of detecting metastatic ovarian tumors. The augmentation process combined several techniques, including random cropping, resizing, and spatial deformation. Random cropping selected random regions of the original images, followed by resizing to ensure uniform dimensions. These techniques alone, however, were limited in capturing the complexity of real-world tumor appearance, especially for tumors with intricate morphologies and under varying imaging conditions. Recognizing this limitation, spatial deformation was added to the augmentation pipeline; it introduces realistic distortions that mimic the nuanced variability seen in clinical scans. Combining these techniques struck a balance between dataset consistency and replication of genuine tumor appearance variations. This approach ensured that our deep learning model learned not only from a diverse range of images but also from augmented data that closely resembled the clinical complexity of metastatic ovarian tumors, greatly enhancing its ability to detect tumors accurately and generalize to unseen data.
The number of images generated through data augmentation varies with several factors, including the chosen augmentation parameters and the extent of augmentation applied to each original image. With random cropping and resizing, approximately 5 augmented images were produced from each original MR image; spatial deformation likewise produced an average of 5 additional augmented images per original image. A sample of augmented images is shown in Figure 2. Starting from an original dataset of 632 MR images, the total number of augmented images is calculated as follows: random cropping and resizing yields 3160 augmented images (632 original images multiplied by 5 augmented images each), and spatial deformation yields another 3160, so combining both techniques gives 6320 augmented images in total. Table 2 summarizes the image samples used in this investigation; a minimal sketch of the augmentation pipeline follows the table.
Table 2
Image samples obtained from augmentation

| Augmentation Technique | Augmented Images per Original Image | Total Augmented Images |
| --- | --- | --- |
| Random Cropping and Resizing | 5 | 3160 |
| Spatial Deformation | 5 | 3160 |
| Combined (Both Techniques) | 10 (5 + 5) | 6320 |
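As a concrete illustration, the sketch below implements the two augmentation techniques on a single 3D volume with NumPy and SciPy. It is a minimal sketch, assuming each scan is already loaded as a 3D array; the crop fraction, deformation strength (alpha, sigma), and output shape are illustrative placeholders, not the exact settings used in this study.

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter, map_coordinates

def random_crop_resize(volume, crop_frac=0.8, out_shape=(128, 128, 128)):
    """Crop a random sub-volume, then resize it to a uniform shape."""
    crop = [max(1, int(s * crop_frac)) for s in volume.shape]
    start = [np.random.randint(0, s - c + 1) for s, c in zip(volume.shape, crop)]
    sub = volume[start[0]:start[0] + crop[0],
                 start[1]:start[1] + crop[1],
                 start[2]:start[2] + crop[2]]
    factors = [o / c for o, c in zip(out_shape, crop)]
    return zoom(sub, factors, order=1)

def spatial_deformation(volume, alpha=8.0, sigma=4.0):
    """Elastic-style deformation: smooth random displacement fields."""
    shape = volume.shape
    dz, dy, dx = [gaussian_filter(np.random.randn(*shape), sigma) * alpha
                  for _ in range(3)]
    z, y, x = np.meshgrid(*[np.arange(s) for s in shape], indexing='ij')
    return map_coordinates(volume, [z + dz, y + dy, x + dx],
                           order=1, mode='nearest')

# Five augmented copies per original image, per technique, as in Table 2.
volume = np.random.rand(160, 160, 160).astype(np.float32)  # stand-in scan
crops = [random_crop_resize(volume) for _ in range(5)]
deforms = [spatial_deformation(volume) for _ in range(5)]
```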
C. Enhanced 3D CNN
A specialized deep learning model known as a three-dimensional CNN (3D CNN) is well-suited to identifying metastatic ovarian tumors in MRI images. In contrast to 2D CNNs, which analyze two-dimensional images, 3D CNNs are tailored to volumetric MRI data, enabling them to capture spatial relationships in all three dimensions. These networks employ a sequence of convolutional layers to extract complex features, recognizing patterns and textures throughout MRI volumes. Data augmentation techniques are employed to enhance the model's adaptability, introducing variations in tumor attributes, including size, position, and orientation, within the training dataset. Through training and validation, the model learns and fine-tunes its parameters, while feature maps depict essential learned characteristics. Additionally, the model can perform segmentation, delineating tumor boundaries and regions of interest within the MRI volume. Ultimately, this model serves as a potent tool for early detection and improved patient outcomes in ovarian cancer, thanks to its capability to process 3D data and capture the spatial relationships vital for tumor identification and localization. The Enhanced 3D CNN architecture was integrated with residual connections. It includes two specialized residual blocks, one of which features an additional layer in its skip connection. The detailed architecture is presented in Table 3, followed by a minimal code sketch.
Table 3
Architecture Details of Enhanced 3D CNN

| 3D CNN Layer | Output Size |
| --- | --- |
| Input Layer | (128, 128, 128, 1) |
| Zero Padding | (132, 132, 132, 1) |
| 3D Convolution | (64, 64, 64, 64) |
| Batch Normalization | (64, 64, 64, 64) |
| Activation | (64, 64, 64, 64) |
| 3D Max Pooling | (32, 32, 32, 64) |
| Residual Layers 1 & 2 | (32, 32, 32, 256) |
| Residual Layers 3 & 4 | (16, 16, 16, 512) |
| Residual Layers 5 & 6 | (8, 8, 8, 1024) |
| Residual Layers 7 & 8 | (4, 4, 4, 2048) |
| 3D Average Pooling | (2, 2, 2, 2048) |
| Fully Connected Layer | 1 |
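The following Keras sketch mirrors the layer progression in Table 3. It is a minimal, assumption-laden rendering: the text does not give kernel sizes or strides, so these are chosen so that the output shapes match the table, and we assume a ResNet-style design in which the block with the "additional layer in its skip connection" is a projection block with a 1x1x1 convolution.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def residual_block(x, filters, project=False, strides=1):
    """3D residual block; project=True adds an extra conv layer in the skip."""
    shortcut = x
    y = layers.Conv3D(filters, 3, strides=strides, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv3D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    if project:  # the "additional layer" in the skip connection
        shortcut = layers.Conv3D(filters, 1, strides=strides)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    return layers.Activation('relu')(layers.Add()([y, shortcut]))

inputs = layers.Input(shape=(128, 128, 128, 1))
x = layers.ZeroPadding3D(padding=2)(inputs)               # (132, 132, 132, 1)
x = layers.Conv3D(64, 5, strides=2)(x)                    # (64, 64, 64, 64)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.MaxPooling3D(pool_size=2)(x)                   # (32, 32, 32, 64)
for filters in (256, 512, 1024, 2048):                    # residual layers 1-8
    x = residual_block(x, filters, project=True,
                       strides=1 if filters == 256 else 2)
    x = residual_block(x, filters)
x = layers.AveragePooling3D(pool_size=2)(x)               # (2, 2, 2, 2048)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation='sigmoid')(x)        # benign vs. metastatic
model = Model(inputs, outputs)
```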
The convolution operation is applied to a three-dimensional input volume (the MRI scan). The value at each position (a, b, c) in the output feature map is the sum of products of elements in the input volume (J) and the corresponding elements of the 3D convolution kernel (L), of size $n_1 \times n_2 \times n_3$. A stride (s) determines the step size, and padding (p) adds zeros around the input volume.
$$\left(J * L\right)\left(a,b,c\right)=\sum_{x=0}^{n_1-1}\sum_{y=0}^{n_2-1}\sum_{z=0}^{n_3-1} J\left(a \cdot s + x - p,\; b \cdot s + y - p,\; c \cdot s + z - p\right) L\left(x,y,z\right) \quad (1)$$
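For concreteness, a direct loop-based rendering of Eq. (1) is given below. It assumes a single-channel input volume J and kernel L, and is meant only to make the triple sum explicit; real frameworks use heavily optimized implementations.

```python
import numpy as np

def conv3d(J, L, s=1, p=0):
    """Valid 3D convolution with stride s and zero padding p, per Eq. (1)."""
    J = np.pad(J, p)
    n1, n2, n3 = L.shape
    out_dims = [(d - f) // s + 1 for d, f in zip(J.shape, L.shape)]
    out = np.zeros(out_dims)
    for a in range(out_dims[0]):
        for b in range(out_dims[1]):
            for c in range(out_dims[2]):
                patch = J[a*s:a*s+n1, b*s:b*s+n2, c*s:c*s+n3]
                out[a, b, c] = np.sum(patch * L)  # triple sum over x, y, z
    return out

J = np.random.rand(8, 8, 8)
L = np.random.rand(3, 3, 3)
print(conv3d(J, L).shape)  # (6, 6, 6)
```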
Following the convolution operation, a non-linear activation function such as the Rectified Linear Unit (ReLU) is applied element-wise. The ReLU function (RL) is represented as:

$$RL\left(x\right) = \max\left(0, x\right) \quad (2)$$
This equation substitutes any negative values in the feature maps with zeros, facilitating the network’s ability to discern intricate patterns. Pooling layers serve to decrease the spatial dimensions of feature maps while preserving vital information. Max Pooling (M) selects the maximum value within a specified pooling window, and its formula is expressed as:
$$M\left(a,b,c\right) = \max_{l,m,n}\, J\left(a \cdot s + l,\; b \cdot s + m,\; c \cdot s + n\right) \quad (3)$$
Here, s determines the stride, controlling how far the pooling window shifts in each dimension, and (a, b, c) denotes the position in the resulting pooled output. A fully connected layer flattens the outputs of the convolutional and pooling layers into a vector and maps it to the output scores. The operation of a dense layer is given by:

$$t = Wt \cdot I + m \quad (4)$$
I is the input vector, and Wt and m are the weight matrix and bias vector, respectively. In the final layer of the classification network, SoftMax activation transforms the network's raw scores into probabilities for each class. The SoftMax function (S) is defined as follows, where ti signifies the score for class i and CL represents the total number of classes:
$$S\left(t_i\right) = \frac{e^{t_i}}{\sum_{j=1}^{CL} e^{t_j}} \quad (5)$$
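The dense layer of Eq. (4) and the SoftMax of Eq. (5) amount to a few lines of NumPy; the sketch below uses the document's symbols (I, Wt, m, ti, CL) with illustrative values.

```python
import numpy as np

I = np.array([0.5, -1.2, 3.0])        # flattened feature vector
Wt = np.random.rand(2, 3)             # weight matrix (CL = 2 classes, 3 features)
m = np.array([0.1, -0.1])             # bias vector

t = Wt @ I + m                        # dense layer, Eq. (4): scores ti
S = np.exp(t) / np.sum(np.exp(t))     # SoftMax over CL classes, Eq. (5)
print(S, S.sum())                     # class probabilities summing to 1
```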
The generation of feature maps in this architecture relies primarily on 3D convolution layers, which extract critical image features from the volumetric MRI data. The Enhanced 3D CNN processes data in batches for higher classification accuracy: with batch processing, the convolution operation is performed on a batch of input volumes, and the result is computed for each volume in the batch. A bias term (m) is added to the output of the convolution operation for each filter; this bias is an additional learnable parameter that affects the final output.
$$\left(J * L + m\right)\left(a,b,c\right)=\sum_{x=0}^{n_1-1}\sum_{y=0}^{n_2-1}\sum_{z=0}^{n_3-1} J\left(a \cdot s + x - p,\; b \cdot s + y - p,\; c \cdot s + z - p\right) L\left(x,y,z\right) + m \quad (6)$$
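In TensorFlow, Eq. (6) corresponds to a batched 3D convolution followed by a per-filter bias addition; the shapes below are illustrative, not the model's actual layer settings.

```python
import tensorflow as tf

batch = tf.random.normal([4, 32, 32, 32, 1])   # (batch, depth, height, width, channels)
kernel = tf.random.normal([3, 3, 3, 1, 64])    # (Dk, Hk, Wk, in channels, filters)
m = tf.zeros([64])                             # bias term, one per filter

y = tf.nn.conv3d(batch, kernel, strides=[1, 1, 1, 1, 1], padding='SAME')
y = tf.nn.bias_add(y, m)                       # (J * L + m) for each volume in the batch
print(y.shape)                                 # (4, 32, 32, 32, 64)
```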
To determine the dimensions of the output feature map, the width (Ow), height (Oh), and depth (Od) of the output are calculated from the dimensions of the input volume (width W, height H, depth D), the kernel dimensions (width Wk, height Hk, depth Dk), the stride (S), and the padding (P). These formulas ensure that the output feature map size is appropriately adjusted:
$$Ow = \frac{W - Wk + 2 \cdot P}{S} + 1 \quad (7)$$

$$Oh = \frac{H - Hk + 2 \cdot P}{S} + 1 \quad (8)$$

$$Od = \frac{D - Dk + 2 \cdot P}{S} + 1 \quad (9)$$
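A quick numeric check of Eqs. (7) to (9), assuming integer (floor) division and the same stride S on all three axes; the example values reproduce the 64 x 64 x 64 convolution output of Table 3 under the kernel size assumed in the earlier sketch.

```python
def output_dims(W, H, D, Wk, Hk, Dk, S, P):
    """Output feature-map dimensions per Eqs. (7)-(9), with floor division."""
    Ow = (W - Wk + 2 * P) // S + 1
    Oh = (H - Hk + 2 * P) // S + 1
    Od = (D - Dk + 2 * P) // S + 1
    return Ow, Oh, Od

# A 132^3 zero-padded volume, assumed 5^3 kernel, stride 2, no extra padding
print(output_dims(132, 132, 132, 5, 5, 5, S=2, P=0))  # (64, 64, 64)
```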
In essence, the convolution operation in the Enhanced 3D CNN involves sliding a 3D kernel over the input volume, performing element-wise multiplications, accumulating the results, and adding a bias term where applicable. This process is repeated across the entire input volume to produce the final feature map.

The image collection was divided into three distinct subsets: 75 per cent of the dataset was used for training, 10 per cent for validation, and 15 per cent for testing. To enhance the training set, augmentation techniques such as random rotations and spatial deformation were applied. The validation set played a pivotal role in selecting model hyperparameters, including the learning rate, the architecture of the DL network, and the decay steps; this process allowed us to identify the best-performing 3D CNN model. The test subset was then used to evaluate the performance of the proposed model. The networks were trained for 100 epochs with a batch size of 4, using the Adam optimizer with a learning rate of 0.00001. Training concluded when the validation accuracy stabilized and ceased to increase. The proposed model was implemented in Python with TensorFlow 2.9.1 and the Keras framework 2.9.0. The DL experiments were conducted on a machine equipped with an Intel Core i7 CPU running at 3.60 GHz, 64 GB of RAM, and an NVIDIA RTX 3090 GPU.
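Finally, a compilation and training sketch matching the stated setup (Adam optimizer, learning rate 0.00001, 100 epochs, batch size 4, training stopped when validation accuracy plateaus). Here `model` is the network sketched after Table 3, and `train_ds` and `val_ds` are hypothetical tf.data pipelines holding the 75 and 10 per cent training and validation splits; the early-stopping patience is an illustrative choice.

```python
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Stop once validation accuracy plateaus; patience value is an assumption.
stop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy',
                                        patience=10,
                                        restore_best_weights=True)

# train_ds / val_ds: hypothetical tf.data.Dataset objects batched at size 4
history = model.fit(train_ds, validation_data=val_ds,
                    epochs=100, callbacks=[stop])
```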