Resizing the training and validation images is essential for several reasons. Machine learning models, particularly convolutional neural networks (CNNs), require inputs of identical dimensions for the architecture to function correctly, and resizing to a standard size such as 128x128 ensures this uniformity across the dataset, leading to more consistent and reliable training outcomes. It also avoids the bias that could be introduced if images of varying sizes were used. In addition, smaller, uniformly sized images reduce the computational load and memory usage during training, allowing the model to process images more quickly, which is particularly important when working with large datasets or complex models.
Standardizing image sizes also prevents inconsistencies that could affect the model's learning process and, for CNNs, ensures that relevant features are extracted consistently across all images. Furthermore, resizing to the input size expected by pretrained models (e.g., 224x224) ensures compatibility, allowing effective transfer learning that leverages pretrained weights. Overall, resizing is a crucial preprocessing step that optimizes the dataset for better model performance and efficient use of resources.
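A minimal sketch of this preprocessing step is given below, assuming a PyTorch/torchvision pipeline; the dataset path is a placeholder, and the 128x128 and 224x224 target sizes follow the discussion above.

```python
from torchvision import transforms
from torchvision.datasets import ImageFolder

# Resize every radiograph to a fixed 128x128 input before tensor conversion,
# so that all images entering the CNN share the same spatial dimensions.
train_transforms = transforms.Compose([
    transforms.Resize((128, 128)),   # uniform size expected by the custom network
    transforms.ToTensor(),           # scales pixel values to [0, 1]
])

# For transfer learning, match the input size expected by the pretrained
# backbone instead (e.g., 224x224 for many ImageNet-trained models).
pretrained_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# "data/train" is a placeholder path; the actual directory layout of the
# radiographs is not specified here.
train_dataset = ImageFolder("data/train", transform=train_transforms)
```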
Developing a network from scratch without annotations offers flexibility, deeper understanding, and optimization opportunities, enabling architectures tailored to specific tasks. It fosters skill development, allows for unsupervised or self-supervised learning, encourages innovative approaches, and provides full control over the data processing and training pipeline, making it well suited to research and experimentation despite requiring significant effort and expertise. Radwan et al. (34) proposed an automated unsupervised learning approach to explore additional CNN layers, but their results were hindered by the cervical vertebrae annotation tool used to trace the CVM stages, resulting in poor staging of CVM 4 and CVM 5.
ROS aims to balance the class distribution by randomly duplicating samples from the minority classes, producing a more evenly represented dataset. This adjustment is intended to improve the model's generalization and validation performance, thereby reducing the gap between training and validation accuracy and leading to more reliable classification across all classes. When performing ROS, however, classification performance on the majority class can sometimes be quite low, for several reasons. First, random oversampling can lead to overfitting on the minority classes, as the model may memorize the duplicated instances rather than learning generalizable patterns, resulting in poor performance on unseen data that particularly affects the majority class. Second, although oversampling balances the class distribution, it does not create new information, and the oversampled data may not represent the underlying class distributions well, making generalization difficult. Oversampling can also amplify noise present in the minority classes, since duplicating noisy instances makes the model more prone to misclassifying the majority class. Finally, by balancing the classes, the model may shift its focus towards the minority classes, improving their performance but potentially lowering performance on the majority class.
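As a rough illustration of how this step could be implemented, the sketch below uses RandomOverSampler from the imbalanced-learn library to duplicate minority-class image indices until the label distribution is balanced; the label counts are placeholders and do not reproduce the study's class sizes.

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler

# Toy label array standing in for the CVMS stage of each training image;
# the study's actual per-image labels are not reproduced here.
labels = np.array([1] * 40 + [2] * 30 + [3] * 25 + [4] * 20 + [5] * 60 + [6] * 55)
indices = np.arange(len(labels)).reshape(-1, 1)   # fit_resample expects a 2-D X

# ROS duplicates indices of under-represented stages until every class matches
# the size of the largest class; no new images are synthesized.
ros = RandomOverSampler(random_state=42)
resampled_indices, resampled_labels = ros.fit_resample(indices, labels)

# Building the training set from resampled_indices repeats minority-class
# images, yielding a balanced label distribution.
print(np.bincount(resampled_labels))
```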
Initially, the model exhibited a training accuracy of 100% but a substantially lower validation accuracy of 57%, a gap consistent with overfitting to an imbalanced training set. To address this, ROS was implemented. This technique increased the number of images in the under-represented classes to balance the dataset, enhancing its representation and improving model training. As a result, the final dataset consisted of 1,420 images for training and 356 for validation. The application of ROS led to a notable improvement in the model’s performance metrics, addressing the previously observed misclassifications, particularly in classes CVMS 2 through CVMS 6. The confusion matrix and performance metrics post-ROS demonstrated a more balanced and effective classification across all classes, reducing the number of misclassifications and improving overall model robustness.
Li et al. (35) assembled an impressive collection of 10,200 radiographs, roughly nine times more than any other study, and utilized YOLOv3 as the core detector for the regions of interest. However, they achieved an overall accuracy of only 70%, with particular difficulty identifying specific CVM stages such as CVS2 and CVS3, and suggested incorporating additional factors such as intervertebral disc space and dental age. In contrast, our custom network with ROS achieved superior accuracy, with an overall testing performance of 88%, although it had slight difficulty accurately identifying CVM stages 5 and 6. This difficulty arises because these stages are the majority classes in the dataset, and ROS, which balances the dataset by oversampling the minority classes, may not address the inherent complexity or characteristics of the majority classes. This can yield a model that is better at identifying minority classes but less effective at distinguishing between the more frequent majority classes.
The classification metrics provide a detailed overview of the model's performance across the different classes. For CVMS 1, the model achieved perfect precision and recall, indicating flawless identification of this class with no false positives or false negatives. CVMS 2 also performed exceptionally well, with a high precision of 95.8% and a recall of 98.6%, suggesting strong predictive capability and reliable classification. CVMS 3 demonstrated balanced performance with a precision and recall of 96.0%, reflecting an effective and consistent ability to identify this class.
However, CVMS 4 showed slightly lower precision at 85.7% and recall at 87.5%, pointing to some challenges in minimizing false positives and false negatives. The performance of CVMS 5 was notably weaker, with a precision of 77.0% and a recall of 71.2%, indicating difficulties in accurately classifying this class, possibly due to complex feature differentiation. CVMS 6 had moderate performance, with precision and recall of approximately 79.0% and 80.0%, respectively, suggesting reasonable effectiveness but room for improvement.
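To make the relationship between these per-class figures and the confusion matrix explicit, the short sketch below derives precision and recall for each class directly from a confusion matrix; the matrix shown is illustrative and does not reproduce the study's results.

```python
import numpy as np

def per_class_metrics(cm):
    """Return per-class precision and recall from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # TP / (TP + FP), computed per column
    recall = tp / cm.sum(axis=1)      # TP / (TP + FN), computed per row
    return precision, recall

# Illustrative two-class matrix only, not the study's actual confusion matrix.
cm = np.array([[50,  5],
               [10, 35]])
precision, recall = per_class_metrics(cm)
print(precision, recall)
```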
Misclassification occurred most notably in CVMS 6 due to the large age range among the subjects. This age disparity led to high morphological variation within the dataset. As individuals grow, their morphological features can change considerably, especially in the developmental stages covered by CVMS 6. These variations make it difficult for classification algorithms to categorize the subjects consistently and accurately, and the high degree of morphological diversity introduces complexity into pattern recognition and model training, resulting in inconsistent classification results.
To mitigate these issues, alternative techniques or combinations of techniques can be considered. Methods such as SMOTE (Synthetic Minority Over-sampling Technique) generate synthetic instances rather than duplicating existing ones, helping to create more diverse samples for the minority class. ADASYN (Adaptive Synthetic Sampling), a variant of SMOTE, focuses on generating synthetic samples for minority-class instances that are harder to classify. Ensemble methods, which combine the results of multiple models, can often reduce the bias towards any particular class.
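A brief sketch of how these alternatives could be applied with the imbalanced-learn library is given below; the feature matrix and labels are synthetic placeholders, and in practice the inputs would be flattened images or CNN-derived embeddings rather than random values.

```python
import numpy as np
from imblearn.over_sampling import SMOTE, ADASYN

# Placeholder feature matrix and labels standing in for flattened images or
# CNN embeddings; the study's actual features are not reproduced here.
rng = np.random.default_rng(0)
X_features = rng.normal(size=(200, 64))
y = np.array([0] * 150 + [1] * 50)   # imbalanced two-class toy labels

# SMOTE interpolates between neighbouring minority samples instead of
# duplicating them, producing new synthetic points.
X_smote, y_smote = SMOTE(random_state=42).fit_resample(X_features, y)

# ADASYN biases generation towards minority samples that are harder to learn.
X_adasyn, y_adasyn = ADASYN(random_state=42).fit_resample(X_features, y)

print(np.bincount(y_smote), np.bincount(y_adasyn))
```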
The overall accuracy of the model is 88.2%, reflecting a generally strong performance. The macro and weighted averages further confirm that the model is performing well across different classes, with the macro average indicating balanced performance across all classes and the weighted average accounting for class imbalances. The classification report highlights that while the model excels in several classes, attention should be given to improving the classification of CVMS 5 to enhance overall robustness.
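For reference, the overall accuracy together with the macro and weighted averages corresponds to the output of a standard classification report; the sketch below shows how these figures are computed with scikit-learn on illustrative labels rather than the study's actual predictions.

```python
from sklearn.metrics import accuracy_score, classification_report

# Illustrative true and predicted CVMS stage labels; not the study's outputs.
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
y_pred = [1, 1, 2, 2, 3, 3, 4, 4, 5, 4, 6, 5]

# The macro average weights every class equally, while the weighted average
# scales each class's score by its support, reflecting class imbalance.
print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=3))
```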