In this study, we followed the measurement method proposed by Maeda et al. [27], which automatically measures the CA on an input X-ray image of patients with AIS. To enhance the precision of CA measurements for both AIS and ASD, we developed three AI algorithms using the same learning method with three different sets of teaching data: AIS/ASD-trained AI, AIS-trained AI, and ASD-trained AI. Our proposed algorithm consists of three stages. In the first stage, a region of interest (ROI) that includes the whole spine, with 12 thoracic and 5 lumbar vertebrae, is detected from the X-ray image. In the second stage, the four corners of each vertebra are detected as feature points for the 17 vertebral bodies from T1 to L5 in the ROI. In the final stage, the CAs of the major and minor curves are measured using the detected feature points. Figure 1 shows a schematic diagram of the CA measurement algorithm. Because all data used in this study were obtained through secondary use of previously acquired clinical data, informed consent was waived by the Keio University School of Medicine Ethics Committee and handled on an opt-out basis. All procedures performed in this study were in accordance with the ethical standards of the national research committee, and all experimental protocols were approved by the Keio University School of Medicine and the Keio University School of Medicine Ethics Committee (approval No. 20200300).
Teaching data
In this study, we used 1,612 full-length X-ray images of the whole spine of patients with AIS or ASD, who underwent surgery between 2009 and 2020, as teaching data. Each X-ray image contained the whole spine, including 12 thoracic vertebrae and 5 lumbar vertebrae, for subsequent teaching and segmentation. The inclusion criterion was a diagnosis of AIS or ASD. Patients who had other neurological disorders or congenital vertebral anomalies, or who had undergone previous spine surgery, were excluded. Our teaching data included 1,029 images acquired from 492 patients with AIS, with the following distribution by posture: 466 standing, 165 supine, 181 right bending, 182 left bending, 6 traction, and 29 wearing a brace. In addition, 583 images were acquired from 295 patients with ASD, with the following distribution by posture: 214 standing, 79 supine, 125 right bending, 125 left bending, 37 traction, and 3 wearing a brace. Images of patients with AIS and ASD were used to train the AIS- and ASD-trained AIs, respectively. The AIS/ASD-trained AI was trained on all the teaching data.
The measurement results of our proposed AI algorithm were compared with those of the manual method using ZedView (LEXI Co., Ltd., Tokyo, Japan) by four spine surgery experts who had at least four years of experience with spine deformity treatment. The evaluated curves included the proximal thoracic, main thoracic, and thoracolumbar/lumbar curves, which were classified in the order of CA magnitude as major, minor 1, and minor 2 curves.
ROI detection
The purpose of the ROI detection step was to identify the region of spinal deformity on each X-ray image. The ROI was specified by the XY coordinates of the upper-left and lower-right corners of a rectangle enclosing the region of spinal deformity. Transfer learning was performed based on a pretrained ResNet34 model [28]. The input size of the network was 512 × 512 × 3, and a 512 × 512 grayscale image was used as the input. Transfer learning was performed by replacing the output layer of ResNet34 with a four-channel fully connected layer. The network was trained to output four real values ranging from 0 to 1, representing the XY coordinates of the upper-left and lower-right corners of the ROI in the thoracolumbar region.
During training, we resized each image to 512 × 512 and scaled its intensity values so that they ranged from 0 to 1. Subsequently, random black-and-white inversion and random cropping of the input images were applied as data augmentation.
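These preprocessing and augmentation steps can be sketched as below; the inversion probability and crop ratio are our assumptions for illustration, as the study does not state them.

```python
import numpy as np

def preprocess(img):
    """Scale intensities so that the minimum maps to 0 and the maximum to 1."""
    img = img.astype(np.float64)
    return (img - img.min()) / (img.max() - img.min())

def augment(img, rng):
    """Random black-and-white inversion and random cropping."""
    if rng.random() < 0.5:                  # inversion probability is an assumption
        img = 1.0 - img
    h, w = img.shape
    ch, cw = int(0.9 * h), int(0.9 * w)     # crop ratio is an assumption
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    return img[y:y + ch, x:x + cw]          # resized back to 512 x 512 in practice

rng = np.random.default_rng(0)
img = preprocess(np.arange(512 * 512, dtype=np.float64).reshape(512, 512))
crop = augment(img, rng)
```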
We used the mean squared error as the loss and metric functions for learning, Adam (learning rate, 1.25e-4) as the optimizer, and ExponentialLR (decay rate, 0.96) as the scheduler. The number of learning epochs was set to 30. Our proposed method incorporates ROI identification of the thoracolumbar region into the AI algorithm for practical use in clinical practice.
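This training configuration maps directly onto standard PyTorch components; the sketch below substitutes a stand-in linear model for the ROI network and random tensors for the teaching data.

```python
import torch

model = torch.nn.Linear(16, 4)        # stand-in for the ROI-detection network
criterion = torch.nn.MSELoss()        # mean squared error as loss (and metric)
optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)

inputs, targets = torch.rand(8, 16), torch.rand(8, 4)
for epoch in range(30):               # 30 learning epochs
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()                  # multiply the learning rate by 0.96 each epoch
```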
Vertebra detection
The purpose of the vertebra detection stage was to detect the four corner points of each vertebral body within the ROI. Initially, more than 17 candidate points were detected for each corner region of the vertebral body (upper left, upper right, lower left, and lower right). We then grouped the feature points by determining the vertebral body to which each belonged and took the 17 groups with the highest scores as the feature points of the 17 vertebral bodies.
The network output was used to estimate the centre position of the vertebral body from each feature point, and each feature point was assigned to a vertebral body by grouping feature points from different regions whose estimated centre positions were close to each other. Because the relative vector from a feature point to the centre of its own vertebral body differs considerably from the vector to the centre of any other vertebral body, a detected point is unlikely to be assigned to the wrong vertebral body.
In addition, because points in the same region of different vertebrae are separated by some distance, and points in different regions are unlikely to be confused, detecting the reference points of each region simultaneously avoided duplicate reference points.
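The grouping step can be illustrated with a minimal sketch: each corner point plus its predicted vertebral centre offset yields an estimated centre, and corners from different regions whose estimated centres nearly coincide are grouped as one vertebra. The function names, the greedy assignment, and the 10-pixel tolerance are ours for illustration.

```python
import numpy as np

def group_corners(corners, offsets, tol=10.0):
    """Assign corner feature points to vertebrae via their estimated centres.

    corners, offsets: dicts mapping each region ('ul', 'ur', 'll', 'lr')
    to an (N, 2) array of corner positions / predicted centre offsets.
    Returns a list of groups, each holding a centre and {region: corner index}.
    """
    groups = []  # each entry: {'centre': (2,) array, 'members': {region: idx}}
    for region in corners:
        est = np.asarray(corners[region], float) + np.asarray(offsets[region], float)
        for i, c in enumerate(est):
            for g in groups:
                # Join an existing group if its centre is close and the
                # region slot is still free; otherwise start a new group.
                if region not in g['members'] and np.linalg.norm(c - g['centre']) < tol:
                    g['members'][region] = i
                    break
            else:
                groups.append({'centre': c, 'members': {region: i}})
    return groups

# Two synthetic vertebrae centred at (50, 50) and (50, 100).
corners = {'ul': [[40, 45], [40, 95]], 'ur': [[60, 45], [60, 95]],
           'll': [[40, 55], [40, 105]], 'lr': [[60, 55], [60, 105]]}
offsets = {'ul': [[10, 5], [10, 5]], 'ur': [[-10, 5], [-10, 5]],
           'll': [[10, -5], [10, -5]], 'lr': [[-10, -5], [-10, -5]]}
groups = group_corners(corners, offsets)
```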
This method can directly detect the positions of the reference points. In contrast, SpineNet [29] has the disadvantage of detecting a point that is clearly not a vertebra when it fails to detect the centre point of a vertebra. Because the proposed method directly detects the reference points of each region, it is unlikely to detect a point that is clearly not a vertebra. Its disadvantage, however, is that some points may fail to be detected and go missing, requiring postprocessing of the network output.
Learning
Network architecture
For feature extraction, we used Conv1–Conv5 of the pretrained ResNet34 model [28] as the base model. The input size was 1,024 × 512 × 3, and a 1,024 × 512 grayscale image was used as the input. For each input, the network simultaneously output three types of features: a heat map (four channels, one per vertebral corner) identifying the locations of the feature points, a centre offset (two channels), and a vertebral centre offset (four channels) for the four vertebral corners.
The loss and metric functions were defined as the sum of the loss functions of the feature point heat map, centre offset, and vertebral centre offset. For the loss function of the feature point heat map, we followed the variant of focal loss [30] described in the SpineNet method [29]. We used the L1 loss as the loss function for the centre offset and vertebral centre offset. Adam (learning rate, 1.25e-4) was used as the optimizer, and ExponentialLR (decay rate, 0.96) was used as the scheduler. The number of learning epochs was set to 50.
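A common formulation of this kind of combined loss can be sketched as below, using the CornerNet/CenterNet form of the heat-map focal-loss variant; the exact parameters used in the study follow [29, 30], and the function names here are ours.

```python
import numpy as np

def heatmap_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
    """Focal-loss variant for keypoint heat maps (CornerNet/CenterNet form)."""
    pos = gt == 1                                    # pixels at true feature points
    pos_loss = ((1 - pred[pos]) ** alpha) * np.log(pred[pos] + eps)
    neg_loss = ((1 - gt[~pos]) ** beta) * (pred[~pos] ** alpha) * np.log(1 - pred[~pos] + eps)
    n_pos = max(pos.sum(), 1)
    return -(pos_loss.sum() + neg_loss.sum()) / n_pos

def total_loss(pred_hm, gt_hm, pred_co, gt_co, pred_vo, gt_vo):
    """Sum of the heat-map loss and L1 losses of the two offset fields."""
    l1 = lambda a, b: np.abs(a - b).mean()
    return heatmap_focal_loss(pred_hm, gt_hm) + l1(pred_co, gt_co) + l1(pred_vo, gt_vo)
```

A perfect heat-map prediction drives the focal term to zero, while uncertain predictions (e.g. a uniform 0.5 map) are penalised at every pixel.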
Feature point heat map
For each of the four corners of the vertebrae (upper left, upper right, lower left, and lower right), we prepared images with non-zero values only around the positions of the 17 feature points corresponding to the 17 vertebrae. They were defined using a Gaussian disk centred on the correct positions of the feature points. The parameters and calculation method of the Gaussian disk are the same as those of SpineNet [29].
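The ground-truth channel for one corner region can be generated as below, with an unnormalised Gaussian peak per vertebra; the `sigma` value here is a placeholder, since the actual parameters follow SpineNet [29].

```python
import numpy as np

def gaussian_disk(shape, centre, sigma):
    """Heat-map channel with an unnormalised Gaussian peak at `centre` (y, x)."""
    ys, xs = np.ogrid[:shape[0], :shape[1]]
    cy, cx = centre
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def corner_heatmap(shape, centres, sigma=2.0):
    """Combine one Gaussian disk per feature point via a pixel-wise maximum."""
    hm = np.zeros(shape)
    for c in centres:
        np.maximum(hm, gaussian_disk(shape, c, sigma), out=hm)
    return hm

# Two feature points of the same corner region on a small grid.
hm = corner_heatmap((64, 32), [(10, 12), (40, 12)], sigma=2.0)
```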
Centre offset and vertebral centre offset
The centre offset compensated for the loss of positional precision caused by the reduced resolution of the output image, which was adopted to lower the computational cost and stabilize learning. It was defined as a vector field representing the gap between the true position and the position obtained when the image was reduced to a lower resolution. The vertebral centre offset was used to estimate the centre position of the vertebral body from the feature points at its four corners and to group the feature points. It was defined as a vector pointing from each feature point to the relative position of the centre of the vertebra.
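Both offsets can be expressed in a few lines; the output stride of 4 below is an assumption for illustration and depends on the network's downsampling factor.

```python
import numpy as np

def centre_offset(point, stride=4):
    """Sub-pixel remainder lost when a full-resolution point is mapped
    onto the low-resolution output grid (stride is an assumed factor)."""
    p = np.asarray(point, dtype=np.float64) / stride
    return p - np.floor(p)

def vertebral_centre_offset(corner, centre):
    """Vector pointing from a corner feature point to its vertebra's centre."""
    return np.asarray(centre, dtype=np.float64) - np.asarray(corner, dtype=np.float64)
```

At inference, the full-resolution position would be recovered as `(grid_position + centre_offset) * stride`, and the estimated vertebral centre as `corner + vertebral_centre_offset`.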
CA measurement
For each vertebra, the inclination was calculated from the points at its four corners, and the vertebrae with locally maximal and minimal inclination values were identified. Among adjacent maximum and minimum vertebrae, those with tilt differences <10° were removed, and T1 and L5 were added to the list of maximum and minimum vertebrae. From the top, pairs of adjacent maximum and minimum vertebrae were extracted and regarded as curves, and the difference in inclination between the two vertebrae was used as the CA value of each curve. A maximum of two thoracic curves and one lumbar curve were assigned, in descending order of CA value. Examples of the AI measurements are shown in Figure 2.
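A simplified sketch of this procedure is given below; it is our own reformulation, in which the list endpoints stand in for T1 and L5 and pairs below the 10° threshold are simply dropped rather than merged.

```python
import numpy as np

def vertebra_tilt(corners):
    """Mean tilt (degrees) of a vertebra from its four corner points.

    corners: (4, 2) array ordered [UL, UR, LL, LR] in (x, y) coordinates.
    """
    (ulx, uly), (urx, ury), (llx, lly), (lrx, lry) = corners
    upper = np.degrees(np.arctan2(ury - uly, urx - ulx))  # upper endplate tilt
    lower = np.degrees(np.arctan2(lry - lly, lrx - llx))  # lower endplate tilt
    return (upper + lower) / 2

def cobb_angles(tilts, min_diff=10.0):
    """End vertebrae are local tilt extrema (plus the endpoints);
    the CA of each curve is the tilt difference of an adjacent pair."""
    n = len(tilts)
    idx = [0] + [i for i in range(1, n - 1)
                 if (tilts[i] - tilts[i - 1]) * (tilts[i + 1] - tilts[i]) < 0] + [n - 1]
    angles = []
    for a, b in zip(idx[:-1], idx[1:]):
        ca = abs(tilts[a] - tilts[b])
        if ca >= min_diff:                      # drop pairs below the 10-degree threshold
            angles.append((a, b, ca))
    return angles
```

For example, the tilt sequence `[0, 15, 25, 10, -5]` yields one curve between vertebrae 0 and 2 (CA 25°) and one between 2 and 4 (CA 30°).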
Verification and analysis
Validation was performed using plain X-ray images of 248 cases (155 AIS and 93 ASD) to evaluate inter-observer reliability. These images were entirely new and collected separately from the teaching data for validation purposes. We compared the average CA measured by the four spine experts with that measured by each AI algorithm. We calculated the average values of the measured CA, the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the intraclass correlation coefficient (ICC) (2,1). The ICC (2,1) according to the two-way random-effects model was used to analyse reliability, with ICCs <0.70, 0.70–0.79, 0.80–0.89, and 0.90–0.99 considered poor, fair, good, and excellent, respectively [31]. The 95% confidence intervals (CIs) were also calculated. We evaluated all cases as well as the AIS-only and ASD-only groups. In addition, we evaluated subgroups by curve order, posture (standing, supine, and lateral bending), angle magnitude (10°–30°, 30°–50°, and >50°), and curve location (proximal thoracic, lower thoracic, and lumbar spine).
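ICC(2,1) under the two-way model can be computed from the case-by-rater matrix as below; this is the standard Shrout–Fleiss formulation, shown for illustration rather than as the study's actual statistics code.

```python
import numpy as np

def icc_2_1(Y):
    """ICC(2,1): two-way model, single measurement (Shrout & Fleiss).

    Y: (n cases, k raters) matrix of measurements.
    """
    n, k = Y.shape
    mean = Y.mean()
    row_means = Y.mean(axis=1)                       # per-case means
    col_means = Y.mean(axis=0)                       # per-rater means
    ss_total = ((Y - mean) ** 2).sum()
    ss_rows = k * ((row_means - mean) ** 2).sum()    # between-case variation
    ss_cols = n * ((col_means - mean) ** 2).sum()    # between-rater variation
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                          # mean square, rows
    msc = ss_cols / (k - 1)                          # mean square, columns
    mse = ss_err / ((n - 1) * (k - 1))               # mean square, error
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Perfect agreement between raters yields an ICC of 1; disagreement lowers the value toward (or below) zero.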