3.1 EXPERIMENT ANALYSIS
3.1.1 CYCLEGAN-BASED MULTI-SEQUENCE DATA AMPLIFICATION
We use the image data of 374 patients for CycleGAN training, including 280 T1 MRI spatial sequences and 94 T2 MRI spatial sequences. The network is trained for 120 epochs, and the losses of the generator and the discriminator are shown in Figure 9. After about 90 epochs, the discriminator loss reaches its minimum and stabilizes.
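For reference, the objective driving these loss curves can be sketched as follows. This is a minimal tf.keras formulation under common CycleGAN assumptions (an adversarial term plus an L1 cycle-consistency term weighted by a hypothetical lambda_cyc), not the exact implementation used in our experiments.

```python
import tensorflow as tf

# Binary cross-entropy on raw discriminator scores (logits).
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(disc_fake_output, real, cycled, lambda_cyc=10.0):
    """Adversarial loss plus cycle-consistency loss for one generator."""
    # The generator wants the discriminator to label its outputs as real.
    adversarial = bce(tf.ones_like(disc_fake_output), disc_fake_output)
    # L1 penalty between the original image and its double conversion.
    cycle = tf.reduce_mean(tf.abs(real - cycled))
    return adversarial + lambda_cyc * cycle

def discriminator_loss(disc_real_output, disc_fake_output):
    """Real images should be scored as real, generated images as fake."""
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake_loss = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return 0.5 * (real_loss + fake_loss)
```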
We then use the labeled data of 152 patients (including 112 T1 MRI spatial sequences and 40 T2 MRI spatial sequences) and augment them with the trained CycleGAN model. As a result, each patient has a multi-sequence of 24 slices (12 T1 slices and 12 T2 slices). The result after 120 epochs of training is shown in Figure 10.
Figure 10 shows the original MR images in the two domains and the MR images reconstructed after two conversions by the domain converters. Visually, the difference between a real MR image and a transformed MR image is very small.
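The amplification step itself can be summarized by the sketch below (hypothetical helper and argument names; the trained domain converters are assumed to be Keras models exposing `predict`): real slices of one domain are translated into the other domain to complete the 24-slice multi-sequence, and the double conversion yields the reconstructed images compared in Figure 10.

```python
import numpy as np

def amplify_patient(t1_slices, t2_slices, g_t1_to_t2, g_t2_to_t1):
    """Build the per-patient multi-sequence (12 T1 + 12 T2 slices).

    t1_slices / t2_slices: arrays of shape (n, H, W, 1) for one patient.
    g_t1_to_t2 / g_t2_to_t1: trained domain converters (assumed Keras models).
    """
    fake_t2 = g_t1_to_t2.predict(t1_slices)        # synthetic T2 from real T1
    fake_t1 = g_t2_to_t1.predict(t2_slices)        # synthetic T1 from real T2

    # Double conversion used for the visual comparison in Figure 10.
    reconstructed_t1 = g_t2_to_t1.predict(fake_t2)
    reconstructed_t2 = g_t1_to_t2.predict(fake_t1)

    multi_t1 = np.concatenate([t1_slices, fake_t1], axis=0)
    multi_t2 = np.concatenate([t2_slices, fake_t2], axis=0)
    return multi_t1, multi_t2, reconstructed_t1, reconstructed_t2
```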
3.1.2 SEMI-SUPERVISED PITUITARY TUMOR TEXTURE IMAGE CLASSIFICATION BASED ON ADAPTIVELY OPTIMIZED FEATURE EXTRACTION
After being amplified by CycleGAN, the dataset is fed to the Auto-Encoder for unsupervised feature extraction; supervised learning is then conducted in the CRNN texture classification stage.
To ensure reliable comparisons, all models are trained for 100 steps in the feature extraction stage. The training process of the multi-sequence model is shown in Figure 11, and the curves of the single-modal baselines are similar.
It can be seen from the figure that after 100 training steps the loss curve reaches its lowest point of 0.01, and the feature extraction network is close to the optimal solution.
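For illustration, the unsupervised feature-extraction stage can be sketched as a convolutional Auto-Encoder trained with a reconstruction loss. The block below is a generic stand-in with assumed input size and layer widths, not the DenseNet+ResNet Auto-Encoder compared in Table 3.

```python
from tensorflow.keras import layers, models

def build_autoencoder(input_shape=(128, 128, 1)):
    """Generic convolutional Auto-Encoder; the encoder is reused as the
    feature extractor after unsupervised training."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    encoded = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(encoded)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid")(x)

    autoencoder = models.Model(inputs, outputs)
    encoder = models.Model(inputs, encoded)
    autoencoder.compile(optimizer="adam", loss="mse")   # reconstruction loss
    return autoencoder, encoder

# autoencoder, encoder = build_autoencoder()
# autoencoder.fit(slices, slices, epochs=100)           # input is its own target
```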
The experiment involves three model configurations, namely the multi-sequence model, the T1 domain model, and the T2 domain model. The multi-sequence (medical image classification) model is compared with the two single-modal baseline models:
(1) T1 domain model: we consider only the T1-domain MRI spatial sequences of all patients, including the T1 sequences generated by the domain converter from the other domain.
(2) T2 domain model: we consider only the T2-domain MRI spatial sequences of all patients, including the T2 sequences generated by the domain converter from the other domain.
(3) Multi-sequence model: we use the trained domain converters to construct MRI multi-sequences covering both the T1 and T2 domains, including the MRI spatial sequences generated by the domain converters.
In the texture classification stage, the neural network has many parameters but only a small number of training samples, which could cause over-fitting. To avoid this issue, we use Dropout and EarlyStopping during training. The Dropout ratio is set to 0.5, that is, every neural network unit in the model is temporarily dropped from the network with a probability of 50%. For EarlyStopping, the patience is set to 2 and the monitored quantity to 'val_loss': if 'val_loss' does not decrease relative to the previous epoch, training is stopped after 2 further epochs. A sketch of these settings is given below; the training processes of the T1 domain, T2 domain, and multi-sequence models are shown in Figures 12-14.
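The following is a minimal Keras sketch of these regularization settings. The classification head shown here is only a placeholder with hypothetical layer sizes, not the paper's CRNN; the Dropout rate of 0.5 and the EarlyStopping configuration match the values described above.

```python
from tensorflow.keras import layers, models, callbacks

def build_classifier(feature_dim, num_classes=2):
    """Placeholder recurrent classification head over 12 slice features."""
    model = models.Sequential([
        layers.Input(shape=(12, feature_dim)),      # 12 slices per sequence
        layers.GRU(64),                             # recurrent part of the head
        layers.Dropout(0.5),                        # 50% of units dropped during training
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Stop training 2 epochs after 'val_loss' stops decreasing.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=2)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])
```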
As can be seen from Figures 12-14, we perform 6 replicate experiments for the T1 domain, the T2 domain, and the multi-sequence model. In each experiment, we randomly divide the dataset into a training set (70%), a test set (15%), and a verification set (15%). We repeat this process 6 times and record the average and variance of the 6 classification accuracies; a sketch of this protocol is given below. Table 1 shows the detailed classification accuracy, and Table 2 shows the precision, recall, and F1-score of the classification.
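As an illustration of this evaluation protocol, the sketch below (assuming scikit-learn; `train_and_score` is a hypothetical callable standing in for model training and evaluation) performs the repeated 70/15/15 random splits and aggregates the resulting accuracies.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate_splits(X, y, train_and_score, n_repeats=6, seed=0):
    """Repeat random 70/15/15 splits and aggregate the test accuracies."""
    rng = np.random.RandomState(seed)
    accuracies = []
    for _ in range(n_repeats):
        # 70% training, 30% held out.
        X_tr, X_rest, y_tr, y_rest = train_test_split(
            X, y, test_size=0.30, random_state=rng.randint(1_000_000))
        # Split the held-out 30% evenly into verification and test sets.
        X_val, X_te, y_val, y_te = train_test_split(
            X_rest, y_rest, test_size=0.50, random_state=rng.randint(1_000_000))
        accuracies.append(train_and_score(X_tr, y_tr, X_val, y_val, X_te, y_te))
    return np.mean(accuracies), np.std(accuracies)   # mean and spread over repeats
```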
TABLE 1 PITUITARY TUMOR CLASSIFICATION ACCURACY
| Dataset      | Multi-sequence (%) | T1 domain (%) | T2 domain (%) |
|--------------|--------------------|---------------|---------------|
| Train        | 98.8±1.24          | 97.55±1.40    | 97.41±1.37    |
| Verification | 92.82±1.23         | 91.70±1.61    | 91.15±1.13    |
| Test         | 91.78±1.44         | 89.24±3.11    | 88.98±4.23    |
TABLE 2 PRECISION, RECALL AND F1-SCORE OF PITUITARY TUMOR CLASSIFICATION
| Model          | Precision (%) | Recall (%) | F1-score (%) |
|----------------|---------------|------------|--------------|
| T1 domain      | 86.81±3.67    | 93.33±5.96 | 89.80±2.64   |
| T2 domain      | 87.07±3.71    | 94.44±5.02 | 90.41±2.15   |
| Multi-sequence | 89.89±4.02    | 95.55±5.44 | 92.46±1.74   |
TABLE 3 COMPARISONS OF CLASSIFICATION RESULTS OF DIFFERENT METHODS
| Feature extraction | Texture classification | Accuracy (%) | Time (s) |
|--------------------|------------------------|--------------|----------|
| ——                 | VGG                    | 69           | 113      |
| ——                 | ResNet                 | 78.25        | 105      |
| ——                 | DenseNet               | 81.25        | 97       |
| ——                 | CRNN                   | 73.7         | 67       |
| ResNet+ResNet      | CRNN                   | 88.76        | 43       |
| DenseNet+DenseNet  | CRNN                   | 90.33        | 43       |
| DenseNet+ResNet    | CRNN                   | 91.78        | 42       |
| DenseNet+ResNet    | RNN                    | 89.12        | 42       |
As can be seen from the table above, our proposed DenseNet+ResNet+CRNN architecture significantly outperforms all the other methods in terms of running time and classification accuracy. Our method has the fastest convergence rate and thus the shortest running time. From the perspective of classification accuracy, adding an Auto-Encoder-based feature extractor before the CRNN considerably improves performance. In summary, the comparative experiment suggests that our CycleGAN-based classification model with adaptively optimized feature extraction has great potential to yield accurate texture classification results for pituitary tumors.
In order to verify the clinical and statistical significance of the experiment, we pair the proposed method with each of the other methods in Table 3 and apply the Wilcoxon signed-rank test to the paired samples. The detailed statistics are shown in Table 4.
TABLE 4 STATISTICS OF WILCOXON SIGNED RANK TEST BASED ON PAIRED SAMPLES
| Feature extraction | Texture classification | Z      | P     |
|--------------------|------------------------|--------|-------|
| ——                 | VGG                    | -2.201 | 0.028 |
| ——                 | ResNet                 | -2.201 | 0.028 |
| ——                 | DenseNet               | -2.201 | 0.028 |
| ——                 | CRNN                   | -2.201 | 0.028 |
| ResNet+ResNet      | CRNN                   | -2.201 | 0.028 |
| DenseNet+DenseNet  | CRNN                   | -2.023 | 0.043 |
| DenseNet+ResNet    | RNN                    | -2.201 | 0.028 |
| DenseNet+ResNet    | CRNN                   | ——     | ——    |
It can be seen from Table 4 that the P values obtained for all the compared models are less than 0.05, so the differences are statistically significant and the results are of clinical significance.
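For completeness, such a paired test can be run with SciPy as sketched below. The accuracy values in the snippet are placeholders, not the paper's measurements; SciPy reports the signed-rank statistic and the p-value, while the Z values in Table 4 correspond to the normal approximation of the same test.

```python
from scipy.stats import wilcoxon

# Placeholder per-split accuracies (NOT the paper's data): one value per
# repeated experiment for the proposed method and one baseline.
ours     = [0.918, 0.905, 0.926, 0.911, 0.920, 0.915]
baseline = [0.892, 0.881, 0.899, 0.885, 0.894, 0.887]

statistic, p_value = wilcoxon(ours, baseline)
print(f"signed-rank statistic = {statistic}, p = {p_value:.3f}")
# p < 0.05 indicates a statistically significant difference between the
# two paired sets of results.
```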
In order to show this comparison more clearly, we draw a forest plot, as shown in Figure 15.
As can be seen from the forest plot, our proposed method is more effective than the other methods.