In this study, we followed the measurement method proposed by Maeda et al. [27], which automatically measures the CA on an input X-ray image of patients with AIS. To enhance the precision of CA measurements for both AIS and ASD, we developed three AI algorithms using the same learning method with three different sets of teaching data: AIS/ASD-trained AI, AIS-trained AI, and ASD-trained AI. Our proposed algorithm consists of three stages. In the first stage, a region of interest (ROI) that includes the whole spine, with 12 thoracic and 5 lumbar vertebrae, is detected from the X-ray image. In the second stage, the four corners of each vertebra are detected as feature points for the 17 vertebral bodies from T1 to L5 in the ROI. In the final stage, the CAs of the major and minor curves are measured using the detected feature points. Figure 1 shows a schematic diagram of the CA measurement algorithm. Because all data used in this study were obtained through secondary use of previously acquired clinical data, informed consent was waived by the Keio University School of Medicine Ethics Committee and handled on an opt-out basis. All procedures performed in this study were in accordance with the ethical standards of the national research committee, and all experimental protocols were approved by the Keio University School of Medicine and the Keio University School of Medicine Ethics Committee (approval No. 20200300).
Teaching data
In this study, we used 1,612 full-length X-ray images of the whole spine of patients with AIS or ASD, who underwent surgery between 2009 and 2020, as teaching data. Each X-ray image contained the whole spine, including 12 thoracic vertebrae and 5 lumbar vertebrae, for subsequent teaching and segmentation. The inclusion criterion was a diagnosis of AIS or ASD. Patients who had other neurological disorders or congenital vertebral anomalies, or who had undergone previous spine surgery, were excluded. Our teaching data included 1,029 images acquired from 492 patients with AIS, with the following distribution by posture: 466 standing, 165 supine, 181 right bending, 182 left bending, 6 traction, and 29 wearing a brace. In addition, 583 images were acquired from 295 patients with ASD, with the following distribution by posture: 214 standing, 79 supine, 125 right bending, 125 left bending, 37 traction, and 3 wearing a brace. Images of patients with AIS and ASD were used to train the AIS- and ASD-trained AIs, respectively. The AIS/ASD-trained AI was trained on all the teaching data.
The measurement results of our proposed AI algorithm were compared with those of the manual method using ZedView (LEXI Co., Ltd., Tokyo, Japan) by four spine surgery experts who had at least four years of experience with spine deformity treatment. The evaluated curves included the proximal thoracic, main thoracic, and thoracolumbar/lumbar curves, which were classified in the order of CA magnitude as major, minor 1, and minor 2 curves.
ROI detection
The purpose of the ROI detection step was to identify the region of spinal deformity on each X-ray image. The ROI was specified by the XY coordinates of the upper-left and lower-right corners of a rectangle enclosing the region of spinal deformity. Transfer learning was performed based on a pretrained ResNet34 model [28]. The input size of the network was 512 × 512 × 3, and a 512 × 512 grayscale image was used as the input. Transfer learning was performed by replacing the output layer of ResNet34 with a four-channel fully connected layer. The network was trained to output four real values ranging from 0 to 1, representing the XY coordinates of the upper-left and lower-right corners of the ROI in the thoracolumbar region.
During training, we resized each image to 512 × 512 and scaled its intensity values so that they ranged from 0 to 1. Subsequently, random black-and-white inversion and random cropping of the input images were applied as data augmentation.
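These preprocessing and augmentation steps can be sketched as below; the inversion probability and crop ratio are our assumptions for illustration, as the study does not state them.

```python
import numpy as np

def preprocess(img):
    """Scale intensities so that the minimum maps to 0 and the maximum to 1."""
    img = img.astype(np.float64)
    return (img - img.min()) / (img.max() - img.min())

def augment(img, rng):
    """Random black-and-white inversion and random cropping."""
    if rng.random() < 0.5:                  # inversion probability is an assumption
        img = 1.0 - img
    h, w = img.shape
    ch, cw = int(0.9 * h), int(0.9 * w)     # crop ratio is an assumption
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    return img[y:y + ch, x:x + cw]          # resized back to 512 x 512 in practice

rng = np.random.default_rng(0)
img = preprocess(np.arange(512 * 512, dtype=np.float64).reshape(512, 512))
crop = augment(img, rng)
```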
We used the mean squared error as the loss and metric functions for learning, Adam (learning rate, 1.25e-4) as the optimizer, and ExponentialLR (decay rate, 0.96) as the scheduler. The number of learning epochs was set to 30. Our proposed method incorporates ROI identification of the thoracolumbar region into the AI algorithm for practical use in clinical practice.
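This training configuration maps directly onto standard PyTorch components; the sketch below substitutes a stand-in linear model for the ROI network and random tensors for the teaching data.

```python
import torch

model = torch.nn.Linear(16, 4)        # stand-in for the ROI-detection network
criterion = torch.nn.MSELoss()        # mean squared error as loss (and metric)
optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)

inputs, targets = torch.rand(8, 16), torch.rand(8, 4)
for epoch in range(30):               # 30 learning epochs
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()                  # multiply the learning rate by 0.96 each epoch
```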
Vertebra detection
The purpose of the vertebra detection stage was to detect the four corner points of each vertebral body within the ROI. Initially, more than 17 candidate points were detected for each corner region of the vertebral body (upper left, upper right, lower left, and lower right). We then grouped the feature points by determining the vertebral body to which each belonged and took the 17 groups with the highest scores as the feature points of the 17 vertebral bodies.
The network output was used to estimate the centre position of the vertebral body from each feature point, and each feature point was assigned to a vertebral body by grouping feature points from different regions whose estimated centre positions were close to each other. Because the relative vector from a feature point to the centre of its own vertebral body differs considerably from the vector to the centre of any other vertebral body, a detected point is unlikely to be assigned to the wrong vertebral body.
In addition, because points in the same region of different vertebrae are separated by some distance, and points in different regions are unlikely to be confused, detecting the reference points of each region simultaneously avoided duplicate reference points.
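The grouping step can be illustrated with a minimal sketch: each corner point plus its predicted vertebral centre offset yields an estimated centre, and corners from different regions whose estimated centres nearly coincide are grouped as one vertebra. The function names, the greedy assignment, and the 10-pixel tolerance are ours for illustration.

```python
import numpy as np

def group_corners(corners, offsets, tol=10.0):
    """Assign corner feature points to vertebrae via their estimated centres.

    corners, offsets: dicts mapping each region ('ul', 'ur', 'll', 'lr')
    to an (N, 2) array of corner positions / predicted centre offsets.
    Returns a list of groups, each holding a centre and {region: corner index}.
    """
    groups = []  # each entry: {'centre': (2,) array, 'members': {region: idx}}
    for region in corners:
        est = np.asarray(corners[region], float) + np.asarray(offsets[region], float)
        for i, c in enumerate(est):
            for g in groups:
                # Join an existing group if its centre is close and the
                # region slot is still free; otherwise start a new group.
                if region not in g['members'] and np.linalg.norm(c - g['centre']) < tol:
                    g['members'][region] = i
                    break
            else:
                groups.append({'centre': c, 'members': {region: i}})
    return groups

# Two synthetic vertebrae centred at (50, 50) and (50, 100).
corners = {'ul': [[40, 45], [40, 95]], 'ur': [[60, 45], [60, 95]],
           'll': [[40, 55], [40, 105]], 'lr': [[60, 55], [60, 105]]}
offsets = {'ul': [[10, 5], [10, 5]], 'ur': [[-10, 5], [-10, 5]],
           'll': [[10, -5], [10, -5]], 'lr': [[-10, -5], [-10, -5]]}
groups = group_corners(corners, offsets)
```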
This method can directly detect the positions of the reference points. In contrast, SpineNet [29] has the disadvantage of detecting a point that is clearly not a vertebra when it fails to detect the centre point of a vertebra. Because the proposed method directly detects the reference points of each region, it is unlikely to detect a point that is clearly not a vertebra. Its disadvantage, however, is that some points may fail to be detected and go missing, requiring postprocessing of the network output.
Learning
Network architecture
For feature extraction, we used Conv1–Conv5 of the pretrained ResNet34 model [28] as the base model. The input size was 1,024 × 512 × 3, and a 1,024 × 512 grayscale image was used as the input. For each input, the network simultaneously output three types of features: a heat map (four channels, one per vertebral corner) identifying the locations of the feature points, a centre offset (two channels), and a vertebral centre offset (four channels) for the four vertebral corners.
The loss and metric functions were defined as the sum of the loss functions of the feature point heat map, centre offset, and vertebral centre offset. For the loss function of the feature point heat map, we followed the variant of focal loss [30] described in the SpineNet method [29]. We used the L1 loss as the loss function for the centre offset and vertebral centre offset. Adam (learning rate, 1.25e-4) was used as the optimizer, and ExponentialLR (decay rate, 0.96) was used as the scheduler. The number of learning epochs was set to 50.
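A common formulation of this kind of combined loss can be sketched as below, using the CornerNet/CenterNet form of the heat-map focal-loss variant; the exact parameters used in the study follow [29, 30], and the function names here are ours.

```python
import numpy as np

def heatmap_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
    """Focal-loss variant for keypoint heat maps (CornerNet/CenterNet form)."""
    pos = gt == 1                                    # pixels at true feature points
    pos_loss = ((1 - pred[pos]) ** alpha) * np.log(pred[pos] + eps)
    neg_loss = ((1 - gt[~pos]) ** beta) * (pred[~pos] ** alpha) * np.log(1 - pred[~pos] + eps)
    n_pos = max(pos.sum(), 1)
    return -(pos_loss.sum() + neg_loss.sum()) / n_pos

def total_loss(pred_hm, gt_hm, pred_co, gt_co, pred_vo, gt_vo):
    """Sum of the heat-map loss and L1 losses of the two offset fields."""
    l1 = lambda a, b: np.abs(a - b).mean()
    return heatmap_focal_loss(pred_hm, gt_hm) + l1(pred_co, gt_co) + l1(pred_vo, gt_vo)
```

A perfect heat-map prediction drives the focal term to zero, while uncertain predictions (e.g. a uniform 0.5 map) are penalised at every pixel.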
Feature point heat map
For each of the four corners of the vertebrae (upper left, upper right, lower left, and lower right), we prepared images with non-zero values only around the positions of the 17 feature points corresponding to the 17 vertebrae. They were defined using a Gaussian disk centred on the correct positions of the feature points. The parameters and calculation method of the Gaussian disk are the same as those of SpineNet [29].
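The ground-truth channel for one corner region can be generated as below, with an unnormalised Gaussian peak per vertebra; the `sigma` value here is a placeholder, since the actual parameters follow SpineNet [29].

```python
import numpy as np

def gaussian_disk(shape, centre, sigma):
    """Heat-map channel with an unnormalised Gaussian peak at `centre` (y, x)."""
    ys, xs = np.ogrid[:shape[0], :shape[1]]
    cy, cx = centre
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def corner_heatmap(shape, centres, sigma=2.0):
    """Combine one Gaussian disk per feature point via a pixel-wise maximum."""
    hm = np.zeros(shape)
    for c in centres:
        np.maximum(hm, gaussian_disk(shape, c, sigma), out=hm)
    return hm

# Two feature points of the same corner region on a small grid.
hm = corner_heatmap((64, 32), [(10, 12), (40, 12)], sigma=2.0)
```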
Centre offset and vertebral centre offset
The centre offset compensated for the loss of positional precision caused by the reduced resolution of the output image, which was adopted to lower the computational cost and stabilize learning. It was defined as a vector field representing the gap between the true position and the position obtained when the image was reduced to a lower resolution. The vertebral centre offset was used to estimate the centre position of the vertebral body from the feature points at its four corners and to group the feature points. It was defined as a vector pointing from each feature point to the relative position of the centre of the vertebra.
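Both offsets can be expressed in a few lines; the output stride of 4 below is an assumption for illustration and depends on the network's downsampling factor.

```python
import numpy as np

def centre_offset(point, stride=4):
    """Sub-pixel remainder lost when a full-resolution point is mapped
    onto the low-resolution output grid (stride is an assumed factor)."""
    p = np.asarray(point, dtype=np.float64) / stride
    return p - np.floor(p)

def vertebral_centre_offset(corner, centre):
    """Vector pointing from a corner feature point to its vertebra's centre."""
    return np.asarray(centre, dtype=np.float64) - np.asarray(corner, dtype=np.float64)
```

At inference, the full-resolution position would be recovered as `(grid_position + centre_offset) * stride`, and the estimated vertebral centre as `corner + vertebral_centre_offset`.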
CA measurement
For each vertebra, the inclination was calculated from the points at its four corners, and the vertebrae with locally maximal and minimal inclination values were identified. Among adjacent maximum and minimum vertebrae, those with tilt differences <10° were removed, and T1 and L5 were added to the list of maximum and minimum vertebrae. From the top, pairs of adjacent maximum and minimum vertebrae were extracted and regarded as curves, and the difference in inclination between the two vertebrae was used as the CA value of each curve. A maximum of two thoracic curves and one lumbar curve were assigned, in descending order of CA value. Examples of the AI measurements are shown in Figure 2.
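A simplified sketch of this procedure is given below; it is our own reformulation, in which the list endpoints stand in for T1 and L5 and pairs below the 10° threshold are simply dropped rather than merged.

```python
import numpy as np

def vertebra_tilt(corners):
    """Mean tilt (degrees) of a vertebra from its four corner points.

    corners: (4, 2) array ordered [UL, UR, LL, LR] in (x, y) coordinates.
    """
    (ulx, uly), (urx, ury), (llx, lly), (lrx, lry) = corners
    upper = np.degrees(np.arctan2(ury - uly, urx - ulx))  # upper endplate tilt
    lower = np.degrees(np.arctan2(lry - lly, lrx - llx))  # lower endplate tilt
    return (upper + lower) / 2

def cobb_angles(tilts, min_diff=10.0):
    """End vertebrae are local tilt extrema (plus the endpoints);
    the CA of each curve is the tilt difference of an adjacent pair."""
    n = len(tilts)
    idx = [0] + [i for i in range(1, n - 1)
                 if (tilts[i] - tilts[i - 1]) * (tilts[i + 1] - tilts[i]) < 0] + [n - 1]
    angles = []
    for a, b in zip(idx[:-1], idx[1:]):
        ca = abs(tilts[a] - tilts[b])
        if ca >= min_diff:                      # drop pairs below the 10-degree threshold
            angles.append((a, b, ca))
    return angles
```

For example, the tilt sequence `[0, 15, 25, 10, -5]` yields one curve between vertebrae 0 and 2 (CA 25°) and one between 2 and 4 (CA 30°).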
Verification and analysis
Validation was performed using plain X-ray images of 248 cases (155 AIS and 93 ASD) to evaluate inter-observer reliability. These images were entirely new and collected separately from the teaching data for validation purposes. We compared the average CA measured by the four spine experts with that measured by each AI algorithm. We calculated the average values of the measured CA, the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the intraclass correlation coefficient (ICC) (2,1). The ICC (2,1) according to the two-way random-effects model was used to analyse reliability, with ICCs <0.70, 0.70–0.79, 0.80–0.89, and 0.90–0.99 considered poor, fair, good, and excellent, respectively [31]. The 95% confidence intervals (CIs) were also calculated. We evaluated all cases as well as the AIS-only and ASD-only groups. In addition, we evaluated subgroups by curve order, posture (standing, supine, and lateral bending), angle magnitude (10°–30°, 30°–50°, and >50°), and curve location (proximal thoracic, lower thoracic, and lumbar spine).
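ICC(2,1) under the two-way model can be computed from the case-by-rater matrix as below; this is the standard Shrout–Fleiss formulation, shown for illustration rather than as the study's actual statistics code.

```python
import numpy as np

def icc_2_1(Y):
    """ICC(2,1): two-way model, single measurement (Shrout & Fleiss).

    Y: (n cases, k raters) matrix of measurements.
    """
    n, k = Y.shape
    mean = Y.mean()
    row_means = Y.mean(axis=1)                       # per-case means
    col_means = Y.mean(axis=0)                       # per-rater means
    ss_total = ((Y - mean) ** 2).sum()
    ss_rows = k * ((row_means - mean) ** 2).sum()    # between-case variation
    ss_cols = n * ((col_means - mean) ** 2).sum()    # between-rater variation
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                          # mean square, rows
    msc = ss_cols / (k - 1)                          # mean square, columns
    mse = ss_err / ((n - 1) * (k - 1))               # mean square, error
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Perfect agreement between raters yields an ICC of 1; disagreement lowers the value toward (or below) zero.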