Development and performance verification of AI-based software for quantitative diagnosis of human vertebral fractures

doi:10.21203/rs.3.rs-4001485/v1

This work is licensed under a CC BY 4.0 License

Version 1

Quantitative morphometry (QM) is crucial for accurately diagnosing and following up vertebral fractures. Although the semiquantitative technique by Genant is prevalent, its accuracy and reproducibility are low. This study combines an AI model that identifies the position of each vertebral body in thoracic and lumbar lateral X-ray images with another AI model that determines the vertebral body height ratios required for QM, yielding software for automatic evaluation. The training data set comprised 3,082 vertebrae annotated by an orthopedic specialist. Correlation and concordance were then evaluated against the specialist in the validation set and against external evaluators in the test set. The software required approximately 6 s to read one image. For the validation set, Spearman’s correlation coefficient (r_s) was 0.605, 0.721, and 0.798 for C/A, C/P, and A/P, respectively. Bland–Altman analysis indicated that the percentage within the limits of agreement (LOA) was 96.8%, 95.7%, and 94.9%, respectively, and that this percentage decreased as vertebral body compression increased. For the test set, r_s ranged over 0.519–0.589, 0.558–0.647, and 0.735–0.770, respectively, and the correlations between the external evaluators were similar. Additionally, the percentages within the LOA were almost all above 95%. The proposed software is expected to improve the diagnosis of vertebral fractures and osteoporosis, enabling appropriate treatment.

Osteoporosis

radiology

quantitation of bone

fracture prevention

fracture risk assessment

Vertebral fractures mainly occur in the elderly and patients with osteoporosis, seriously impairing their health, resulting in increased pain, limited activity, a chain of fractures, and an increased mortality rate (five-year survival rate: <30%).[1] Additionally, the worldwide prevalence of osteoporosis is 19.7%,[2] and the risk of vertebral fractures increases owing to lifestyle-related diseases such as diabetes, chronic kidney disease, chronic obstructive pulmonary disease, and dyslipidemia, in addition to steroid administration.[3] Furthermore, two-thirds of vertebral body fractures are asymptomatic,[4, 5] with the frequency and importance of diagnosing these fractures expected to steadily increase in the future.

Currently, there is no gold standard for radiological diagnosis of vertebral fractures; instead, comprehensive diagnoses are performed by combining examination findings and other technologies such as computed tomography (CT) and magnetic resonance imaging (MRI). However, quantitative morphometry (QM) of these fractures is necessary for evaluating the degree of crushing, observing changes over time, sharing information among medical professionals, and diagnosing minor fractures. QM has been used since the 1960s[6–9] and includes the following steps.[10–13] In the lateral image of the vertebral body, six locations (measurement points) are annotated at the upper and lower edges of the anterior, middle, and posterior of each vertebral body. The anterior edge height (A), central height (C), and posterior edge height (P) are measured, and C/A, C/P, and A/P are calculated as the vertebral body height ratios. Although this method allows for objective and precise evaluations and is useful for follow-ups, it is not widely applied in clinical practice because the process is complicated and time-consuming. Therefore, in 1993, Genant et al. proposed a semiquantitative (SQ) method,[14] wherein the degrees of reduction in the vertebral body height and cross-sectional area are visually categorized into four levels without measurements. Owing to its simplicity, this method is widely used in clinical practice and research. However, it has limitations such as dependence on the subjectivity of evaluators,[15] overlooking of minor fractures, standardization difficulties, overestimation,[16] unsuitability for follow-ups, and the need for training to achieve stable evaluations.[17]

In recent years, AI-based vertebral fracture evaluation has been attracting attention, and existing software can qualitatively evaluate the presence or absence of vertebral fractures in the lumbar spine using lateral X-ray images.[18] Additionally, a quantitative evaluation system that automatically detects the height of vertebral bodies and intervertebral discs using MRI images has been reported.[19] Recently, Suri and colleagues developed a system using deep learning methods to automatically measure vertebral body heights from radiography, CT, and MRI images, but the analysis targets the vertebrae from T10 to L5 and does not include the mid-thoracic vertebrae above T10.[20] Vertebral fractures also occur in the mid-thoracic vertebrae,[21] and are particularly common in this location in respiratory diseases such as COPD[22, 23] and, recently, in COVID-19 infections.[24] Therefore, a system that allows for a simple, more comprehensive evaluation of vertebrae that may be clinically fractured is desirable. To address this need, we developed software to quantitatively evaluate vertebral body fractures using thoracic and lumbar X-ray images through QM. We built a two-stage deep learning-based AI algorithm that automatically recognizes vertebral bodies and measurement points and calculates the vertebral body height ratios of C/A, C/P, and A/P. Finally, the performance of this software was verified. We hope that the proposed software will be widely adopted in clinical settings.

Data collection

We collected data from patients who visited The Jikei University Hospital from January 2018 to October 2020 and had frontal and lateral X-ray images of the thoracic and lumbar spine, resulting in 709 images (354 of the thoracic spine and 355 of the lumbar spine) from a total of 355 cases. We excluded patients who had undergone spinal fusion or had scoliosis with a Cobb angle of ≥ 15°. One orthopedic specialist (training data creator) annotated measurement points in each of the 4,064 vertebral bodies, from the fourth thoracic to the fifth lumbar vertebra, that were legible in 503 of the images. The points were annotated according to the method proposed by Genant et al.,[14] and if the left and right vertebral body edges were misaligned, the point was placed at the center. The SQ grade (SQG0–SQG3) for each vertebral body was also determined.[14] Of these, 3,082 vertebral bodies were randomly extracted and used as training data, and the remaining 982 were used for validation. Next, from the remaining 206 of the 709 images, 1,753 vertebral bodies that could be observed by all external evaluators (two spine surgeons and one radiologist) were extracted, and the measurement points were annotated by all external evaluators and the proposed software. These 1,753 vertebral bodies formed the test set (Fig. 1). The study was conducted according to the Declaration of Helsinki and approved by the Ethical Committee for Clinical Research at The Jikei University School of Medicine (31–0078(9577)). Each patient provided written informed consent before participating in this study.
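The data partition described above can be checked with a minimal sketch. The function name `split_vertebrae`, the integer identifiers, and the fixed seed are illustrative assumptions; the source states only that 3,082 of the 4,064 annotated vertebral bodies were randomly extracted for training.

```python
import random

def split_vertebrae(vertebra_ids, n_train, seed=0):
    """Randomly split vertebra identifiers into training and validation lists.

    The seed is an assumption added for reproducibility; it is not from the source.
    """
    rng = random.Random(seed)
    shuffled = list(vertebra_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 4,064 annotated vertebral bodies: 3,082 for training, the rest for validation.
train_ids, val_ids = split_vertebrae(range(4064), n_train=3082)
print(len(train_ids), len(val_ids))  # 3082 982

# Image counts reported in the text: 503 annotated images + 206 test images.
print(503 + 206)  # 709
```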

Software development

To automatically determine the measurement points, we developed software equipped with two AI models, mask region convolutional neural network (Mask R-CNN) and EfficientNet-based transfer learning + head section, that identify the position of each vertebral body from lateral X-ray images of the thoracic and lumbar spine and determine the measurement points for the vertebral bodies based on the position information, respectively (Fig. 2).

Mask R-CNN is a convolutional neural network (CNN) that provides a framework for instance segmentation to efficiently detect target objects in images while simultaneously generating segmentation masks for each instance with high accuracy.[25] In the proposed software, Mask R-CNN was used to generate segmentation images of each vertebral body in the lateral X-ray images of the thoracic and lumbar spine, and the center position of each vertebral body was detected by determining the center of gravity coordinates. Subsequently, to improve the position detection performance, we post-processed the detected vertebral body centers to exclude, as false detections, those located at positions that deviated significantly from the anatomical morphology of the spinal column. The image was then cropped to 224 × 224 pixels around each remaining center so that the vertebral body lay at the center of the cropped image. These cropped images were used as inputs for the subsequent AI model.
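The center-of-gravity and cropping steps can be illustrated with a minimal sketch. It assumes each segmentation mask is a binary grid given as a list of rows; `mask_centroid` and `crop_box` are hypothetical helper names, and the handling of windows that extend beyond the image border (for example by padding) is not described in the source and is left out.

```python
def mask_centroid(mask):
    """Return the (row, col) center of gravity of a binary mask (list of rows)."""
    total = row_sum = col_sum = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                total += 1
                row_sum += r
                col_sum += c
    if total == 0:
        raise ValueError("mask contains no foreground pixels")
    return row_sum / total, col_sum / total

def crop_box(center, size=224):
    """Return (top, left, bottom, right) of a size x size window centered on `center`.

    The box may extend outside the image; border handling is not covered here.
    """
    half = size // 2
    top = round(center[0]) - half
    left = round(center[1]) - half
    return top, left, top + size, left + size

# Toy mask with a 3 x 3 foreground block in rows 1-3 and columns 2-4.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(mask_centroid(mask))        # (2.0, 3.0)
print(crop_box((300.0, 400.0)))   # (188, 288, 412, 512)
```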

The second-stage AI model was an EfficientNet-based transfer learning model. EfficientNet is a deep learning model that achieves high image recognition accuracy by jointly scaling up the width, depth, and resolution of a CNN model using a fixed ratio.[26] Models scaled up from B0 to B7 can be selected, and the proposed software employs EfficientNetB2 as the base model. However, the head section, which provides the final output, was changed to directly output the coordinates of the six measurement points on the vertebral body image. Furthermore, we adopted a transfer learning method in which the pretrained feature extraction part of EfficientNet is re-trained (fine-tuned) for the main task of determining measurement points.[24] Additionally, a program was created to calculate the Euclidean distance between the upper and lower points of the anterior (A), central (C), and posterior (P) edges using the coordinates of the six measurement points output by the second-stage AI model, and to obtain the vertebral body height ratios from these distances.
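The height-ratio calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the dictionary keys (`A_top`, `A_bottom`, and so on) and the example coordinates are hypothetical, and points are taken as (x, y) pixel coordinates.

```python
import math

def height_ratios(points):
    """Compute C/A, C/P, and A/P from six (x, y) measurement points.

    `points` maps hypothetical keys 'A_top', 'A_bottom', 'C_top', 'C_bottom',
    'P_top', 'P_bottom' to coordinates; each height is the Euclidean distance
    between the upper and lower point of the same edge.
    """
    a = math.dist(points["A_top"], points["A_bottom"])  # anterior height
    c = math.dist(points["C_top"], points["C_bottom"])  # central height
    p = math.dist(points["P_top"], points["P_bottom"])  # posterior height
    return {"C/A": c / a, "C/P": c / p, "A/P": a / p}

# Made-up example: A = 18, C = 17, P = 20 pixels.
example = {
    "A_top": (0, 0), "A_bottom": (0, 18),
    "C_top": (15, 1), "C_bottom": (15, 18),
    "P_top": (30, 0), "P_bottom": (30, 20),
}
ratios = height_ratios(example)
print({k: round(v, 3) for k, v in ratios.items()})
# {'C/A': 0.944, 'C/P': 0.85, 'A/P': 0.9}
```

Because only ratios are reported, the result does not depend on the pixel spacing of the image, provided the spacing is the same in both directions.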

In addition, to achieve high measurement accuracy for diverse clinical images (differences in subject and positioning), both AI models were trained using an image augmentation method that increases training data variation by randomly applying preprocessing (edge enhancement, noise addition, resizing, contrast conversion, rotation, etc.) to the training data.
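Two of the listed operations, contrast conversion and noise addition, can be sketched in a few lines. This is a simplified illustration only: the function name `augment`, the parameter ranges, and the representation of an image as rows of intensities in [0, 1] are assumptions, and geometric operations such as rotation or resizing (which would also require transforming the measurement-point coordinates) are omitted.

```python
import random

def augment(image, rng, noise_sd=0.02, contrast_range=(0.8, 1.2)):
    """Return a randomly perturbed copy of `image` (rows of floats in [0, 1]).

    Parameter values are illustrative assumptions, not taken from the source.
    """
    gain = rng.uniform(*contrast_range)  # one random contrast factor per image
    augmented = []
    for row in image:
        new_row = []
        for value in row:
            value = 0.5 + gain * (value - 0.5)   # contrast change around mid-gray
            value += rng.gauss(0.0, noise_sd)    # additive Gaussian noise
            new_row.append(min(1.0, max(0.0, value)))  # clip to valid range
        augmented.append(new_row)
    return augmented

rng = random.Random(0)
image = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
augmented = augment(image, rng)
print(len(augmented), len(augmented[0]))  # 2 3
```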

Performance evaluation

First, we evaluated the time required for measurements. A total of 20 images (10 each of the thoracic and lumbar spine) were randomly extracted from the validation data set, and the image size, number of detected vertebral bodies, and time required for analysis were recorded, after which the correlations between these parameters were calculated. The analysis was run on a standard office notebook without a dedicated GPU (11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40 GHz, 8 GB RAM, Windows 10).

Next, we verified the accuracy of the vertebral body height ratios calculated using the software. First, we compared the correlation and consistency between the vertebral body height ratios calculated by the software and the training data creator using the validation set. Next, using the test set, we compared the correlation and consistency between the vertebral body height ratios calculated by the software and two spine surgeons (SS1 and SS2) and one radiologist (R), who all had over 10 years of clinical experience.

Statistical analysis

The mean absolute error (MAE) between each vertebral body height ratio and those between measurers is expressed as mean ± standard deviation (SD), unless otherwise specified. Additionally, data distribution normality was evaluated through the Shapiro–Wilk test. Correlations between parameters were determined through Pearson or Spearman correlation tests based on the normality evaluation results, and Bland–Altman analysis was employed to evaluate the consistencies of the calculated ratios. P < 0.05 was considered statistically significant. Statistical analysis was performed using SAS Studio ver. 3.81 (SAS Institute Inc., Cary, NC, USA).
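The statistics named above can be sketched with the Python standard library. This is not the SAS code used in the study: the function names and sample values are illustrative, and the limits of agreement are assumed to follow the usual Bland–Altman definition (mean difference ± 1.96 SD of the differences), which the text does not spell out.

```python
import math
from statistics import mean, stdev

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def mean_absolute_error(x, y):
    return mean(abs(a - b) for a, b in zip(x, y))

def bland_altman(x, y):
    """Return (bias, lower LOA, upper LOA, fraction of differences within the LOA)."""
    diffs = [a - b for a, b in zip(x, y)]
    bias, sd = mean(diffs), stdev(diffs)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return bias, lower, upper, within

# Made-up height ratios from two measurers.
software = [1.00, 0.90, 0.80, 0.95, 1.05]
rater = [1.02, 0.88, 0.83, 0.95, 1.00]
r_s = spearman(software, rater)
mae = mean_absolute_error(software, rater)
bias, lower, upper, within = bland_altman(software, rater)
```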

Measurement times of the proposed software

Table 1 shows the time required for analyzing thoracic and lumbar spine images. The average analysis time was 6.44 ± 1.66 s for the thoracic spine and 5.28 ± 1.13 s for the lumbar spine; although slightly more time was required for the thoracic spine, no statistically significant difference was observed. Additionally, the Shapiro–Wilk test indicated that none of the parameters was normally distributed. The correlation evaluation (Spearman's correlation coefficient (r_s)) of the measurement time against the image size and the number of detected vertebral bodies showed that, for the thoracic spine, r_s was 0.596 (p = 0.069) for the image size and 0.090 (p = 0.804) for the number of detected vertebral bodies, and for the lumbar spine, r_s was 0.815 (p = 0.004) for the image size and 0.651 (p = 0.041) for the number of detected vertebral bodies, indicating that the measurement times for both the thoracic and lumbar spine tend to be correlated with image size.

Verification using the validation set

For the validation set, the vertebral body height ratios obtained using the software were C/A = 1.019 ± 0.043, C/P = 0.902 ± 0.054, and A/P = 0.888 ± 0.074. The ratios obtained by the training data creator were C/A = 1.025 ± 0.070, C/P = 0.917 ± 0.069, and A/P = 0.900 ± 0.094. The Shapiro–Wilk test confirmed that none of the ratios was normally distributed. Additionally, the numbers of vertebral bodies classified into each SQ grade by the training data creator were SQG0: 906 (92.3%), SQG1: 36 (3.7%), SQG2: 35 (3.6%), and SQG3: 5 (0.5%).

In the correlation analysis, Spearman's correlation coefficient (r_s) was 0.605 for C/A, 0.721 for C/P, and 0.798 for A/P (all p < 0.0001). The slopes and confidence intervals of the regression equation were calculated using scatter plots indicating the vertebral body height ratios calculated via AI on the X-axis and those calculated by the training data creator on the Y-axis, which showed C/A = 0.960 (95% CI, 0.877–1.042), C/P = 0.996 (95% CI, 0.945–1.048), and A/P = 1.105 (95% CI, 1.065–1.145) (Fig. 3).

Additionally, the concordance was examined through Bland–Altman analysis, which showed that the percentage of vertebral bodies included in the limits of agreement (LOA) was 96.8% for C/A, 95.7% for C/P, and 94.9% for A/P (Fig. 4). Furthermore, a post hoc subgroup analysis was conducted to check whether the concordance changed according to the SQ grade. The results presented in Table 2 show that although 95% or more of the ratios for SQG0 were included within the LOA, this percentage decreased as the SQ grade increased, and the MAE between the ratios calculated by the training data creator and the AI model increased accordingly.

Furthermore, extracting the vertebral bodies for which the discrepancy between the vertebral body height ratio calculated by the software and that calculated by the training data creator was ≥0.2 yielded a total of eight vertebral bodies (0.8%): SQG0: 3, SQG1: 0, SQG2: 1, and SQG3: 4. In these vertebral bodies, the measurement points were incorrectly placed on adjacent vertebral bodies, vertebral arches, the diaphragm, or the posterior airway wall (Fig. 5).
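The extraction of large discrepancies can be sketched as a simple filter. The function name `flag_discrepant` and the sample values are illustrative, and treating a vertebral body as discrepant when any of the three ratios differs by at least 0.2 is an assumption; the text does not state whether the criterion applies per ratio or per vertebral body.

```python
def flag_discrepant(software, reference, threshold=0.2):
    """Return indices of vertebral bodies where any height ratio differs by >= threshold.

    Each element of `software` and `reference` is a dict with keys 'C/A', 'C/P', 'A/P'.
    """
    flagged = []
    for index, (s, r) in enumerate(zip(software, reference)):
        if any(abs(s[key] - r[key]) >= threshold for key in ("C/A", "C/P", "A/P")):
            flagged.append(index)
    return flagged

# Made-up ratios for two vertebral bodies; the second one disagrees strongly.
software_ratios = [
    {"C/A": 1.00, "C/P": 0.90, "A/P": 0.90},
    {"C/A": 1.00, "C/P": 0.65, "A/P": 0.65},
]
reference_ratios = [
    {"C/A": 1.02, "C/P": 0.92, "A/P": 0.90},
    {"C/A": 1.00, "C/P": 0.90, "A/P": 0.90},
]
flagged = flag_discrepant(software_ratios, reference_ratios)
print(flagged)  # [1]

# Share reported in the text: 8 of 982 validation vertebral bodies.
print(round(100 * 8 / 982, 1))  # 0.8
```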

Verification using the test set

Fig. 6 shows the distribution of each vertebral body height ratio calculated by the software and the external evaluators, together with the results of the correlation and agreement evaluation between the evaluators. The vertebral body height ratios were C/A = 1.008 ± 0.040, C/P = 0.915 ± 0.051, and A/P = 0.910 ± 0.073 by the software; C/A = 0.986 ± 0.070, C/P = 0.903 ± 0.071, and A/P = 0.920 ± 0.093 by SS1; C/A = 1.033 ± 0.061, C/P = 0.931 ± 0.063, and A/P = 0.905 ± 0.084 by SS2; and C/A = 0.993 ± 0.056, C/P = 0.905 ± 0.065, and A/P = 0.915 ± 0.090 by R. The Shapiro–Wilk test confirmed that none of the calculated vertebral body height ratios was normally distributed. The r_s between the software and the external evaluators for C/A, C/P, and A/P were 0.519–0.589, 0.558–0.647, and 0.735–0.770 (all p < 0.0001), respectively. The correlations between the external evaluators were of similar magnitude. Bland–Altman analysis showed that the percentage of vertebral bodies included in the LOA was 94.8% for the A/P comparison between the software and SS2 and exceeded 95% for all other comparisons. The concordance between the external evaluators was 93.8–94.8% for C/A, 94.5–94.8% for C/P, and 94.1–95.7% for A/P, which was slightly lower than that between the software and each external evaluator. The MAE between the software and the external evaluators was comparable to the MAE observed between the external evaluators, as shown in Table 3.

The proposed software was designed for use in clinical settings. Therefore, we prioritized shortening the measurement time and improving efficiency. The results showed that the time required for analyzing one image using the proposed software was approximately 6 s. After automatic analysis, the software involves a process wherein the user manually checks the measurement points for each vertebral body and makes the necessary corrections. Depending on the proficiency level of the user, most images were analyzed in less than 1 min. Based on this result, we believe the proposed software is suitable for use in clinical settings.

The correlation analysis showed that, among the vertebral body height ratios, C/A had the lowest correlation coefficient between the AI models and each rater in both the validation and test sets. This study focused on vertebral body height ratios and did not evaluate the position of the measurement point itself because we assumed that the measurement point at the vertebral body center varied the most between examiners. Setting this point is easier when the left and right endplates overlap and are clearly visible in the vertebral body. However, if the vertebral body is tilted and rotated owing to the limb position or scoliosis, the endplate on the side farther from the cassette appears thinner, making it more difficult to set the measurement point at the center of the vertebral body. Furthermore, when developing the software, preliminary observations showed that increasing the number of vertebral bodies used for learning improved the correlation, in the validation set, between the vertebral body height ratios calculated by the software and those of the training data creator. However, the correlation coefficient of each vertebral body height ratio barely increased beyond 2,000 annotated vertebral bodies, suggesting that a further increase in the number of learned vertebral bodies may not necessarily improve software accuracy. Discrepancies of 0.2 or more between the AI- and human-based calculations of vertebral body height ratios occurred in 0.8% and 2.4% of vertebral bodies in the validation and test sets, respectively. For vertebral bodies whose accurate lateral images cannot be captured owing to scoliosis or lateral bending, the endplates and vertebral arches of adjacent vertebral bodies may be misrecognized owing to the difficulty of determining the edges of vertebral bodies. Vertebral bodies with a deviation of 0.2 or more are relatively easy to recognize through observations after automatic analyses. Moreover, based on the low deviation rate, we believe their impact on the detection efficiency is limited.

QM involves a cumbersome process that requires significant time to manually set measurement points: more than 10 min are required for each X-ray image, and approximately 20 min for images including both the thoracic and lumbar spine. Thus far, software that semi-automatically evaluates vertebral body height using a statistical decomposition method has been reported.[27–29] However, such software requires manual detection of vertebral bodies. Additionally, it requires approximately 35 + 10 s to interpret each vertebral body, resulting in approximately 7 min for each case. In contrast, the proposed software performs fully automatic detection of vertebral bodies and requires approximately 6 s to interpret an image for one case, potentially offering significant advantages in daily clinical practice. Suri and colleagues' system is capable of analyzing the vertebral body heights across the entire spine from CT, MRI, and radiography within 2 s.[20] However, their analysis targets the vertebrae from T10 to L5 for radiography. Vertebral fractures can occur not only in the lower thoracic spine but also in the entire thoracic spine below T4, and they are not clinically rare.[21] While our software may take slightly longer than their system in terms of measurement time, we believe it offers value by providing a more comprehensive evaluation of vertebral body compression from radiography, the most widely used modality globally, for both the thoracic and lumbar spine. Additionally, while they have validated the performance of their system in comparison with radiologists, a unique aspect of our study is that we have also conducted this comparison among external evaluators with significant clinical experience in the test set. It is noteworthy that a certain degree of discrepancy was observed between the vertebral height ratios of the software and those of the external evaluators, and that a similar level of discrepancy was observed among the external evaluators. Addressing this might require efforts such as promoting a unified approach to evaluating vertebral heights across various vertebrae, potentially advocated by international bodies. Attempts have been made to reduce measurement discrepancies among evaluators through tutorial-based training programs.[17] In our study, both the training data creator and the external evaluators referred to Genant et al.'s evaluation method;[14] however, they did not receive comprehensive training. Had they undergone such training, consistency might have improved. However, as the results of that study suggest, a certain degree of divergence persists even after tutorials.[17] Therefore, it is important to acknowledge that achieving complete agreement among evaluators is realistically impossible, which is a limitation that must be recognized not only in the use of this software but also more broadly in evaluations using QM. Additionally, software that uses a deep CNN to detect fractures from thoracolumbar vertebral body images has also been developed,[18] with its fracture detection performance confirmed to be comparable to that of orthopedic specialists. Besides aiming to detect fractures rather than quantitatively evaluate them, it differs from the proposed software primarily in that old cases (more than one month after injury) and those with SQ grade 1 are excluded from its learning data. In contrast, the proposed software may falsely recognize old or deformed vertebral bodies as vertebral fractures. However, for diagnosing osteoporosis, the presence of vertebral fractures rather than the onset time is important; we believe that this method is significant as it has the potential to improve the diagnostic rate of osteoporosis.

Although detection of severe vertebral fractures is easy, diagnosing minor vertebral fractures requires training,[17, 30] and such fractures are often overlooked in routine clinical practice. The proposed software automates the process of QM, thereby enabling faster and more efficient evaluations, and is expected to improve the diagnostic rate of minor vertebral body fractures. In particular, as two-thirds of vertebral fractures are asymptomatic[4, 5] and crushing is often mild,[5] the proposed software is expected to contribute toward improving the overall diagnostic rate of vertebral fractures and osteoporosis. Additionally, owing to its considerably short analysis time, the proposed software may be suitable for use in mass osteoporosis screenings and clinical research, which can help provide early treatment and fracture prevention, making it possible to maximize the effects of therapeutic drugs.

Currently, the mainstream evaluation method is the SQ method, and the difference between it and the QM performed by the proposed software lies in the complexity of evaluation. SQ only assigns a grade to each vertebral body, whereas QM requires setting six measurement points on each vertebral body and calculating the vertebral body height ratios. The proposed software makes high-throughput testing easier and may be more useful in daily clinical practice. In addition, QM allows subdividing and optimizing the vertebral body height ratio threshold for diagnosing fractures based on age, gender, and race. The reference range for this threshold is set based on research results for various races and ages.[10, 12, 13, 31] However, in recent years, several studies have reported that vertebral body morphology differs based on race,[32–34] gender,[33] and age.[34] Additionally, several studies have shown slight vertebral body crushing even in relatively young people;[21, 35] we believe that it is necessary to understand the reference range of vertebral body morphology across a wide range of generations, genders, and races, rather than solely focusing on the elderly population. To apply these research results in clinical practice, rapid QM is necessary, which is enabled by the proposed software.

However, several recent studies have stated that qualitative evaluation methods that emphasize the presence or absence of endplate damage are more useful for diagnosing new vertebral body fractures and assessing subsequent fracture risk. Additionally, qualitative assessment methods have been reported to be strongly associated with low bone mineral density[15, 36] and with the development of vertebral[15] and non-vertebral osteoporotic fractures.[15, 37] In addition, it has been noted that QM may increase the false-positive rate by misrecognizing vertebral body deformities that are not fragility fractures (e.g., Schmorl's nodes and Scheuermann's disease) as fractures.[31, 38] When evaluating endplate damage through QM, the deformation of the vertebral body owing to endplate damage is mainly reflected in the height of the central vertebral body;[14] however, the degree of deformation may be underestimated by using the midpoint. Conversely, even if endplate damage occurs, the morphology of the endplate becomes smooth over time because of remodeling, and it may no longer be recognized as endplate damage through qualitative evaluation methods.[39] With the proposed software, after the vertebral body height ratio is automatically calculated by AI, each vertebral body is enlarged and displayed, and the measurement point position can be manually confirmed and corrected. At this point, it is possible to improve the diagnostic accuracy of vertebral body fractures by combining QM with various qualitative evaluations, such as evaluation of endplate damage and exclusion of vertebral body deformity.[40]

This study had several limitations. The first was the problem of ground truths in the training data. As it may be difficult to establish the ground truth for a measurement point using only a simple X-ray image, it would be ideal if CT and MRI scans were performed simultaneously. However, deep learning for this software required more than 3,000 vertebral bodies, and it was impossible to simultaneously capture plain X-ray, CT, and MRI images. In this study, the training data were collected based on the measurement point creation method proposed by Genant et al.,[14] and were verified by an external expert to address issues of subjectivity and accuracy. The correlation and consistency of the measurement results with the external verifiers are stated in the results, and the correlation and consistency between the external verifiers were similar. Each external evaluator was an expert with sufficient clinical experience, and a certain degree of ambiguity in setting measurement points using only lateral X-ray images is considered inevitable. Second, the degree of collapse of the vertebral bodies in the training data was not uniform. Although training on the same number of vertebral bodies for each degree of crushing is considered ideal, in this study, vertebral bodies with SQG0 accounted for more than 90% of the training data. In fact, in the validation set, as vertebral body collapse progressed, the consistency between the ratios calculated by the training data creator and the AI models decreased. As the training data were randomly selected from clinical images over a certain period, discrepancies in the proportions between SQ grades were inevitable. If the number of crushed vertebral bodies, such as those classified as SQG2 and SQG3, can be increased, it may be possible to improve the measurement accuracy for crushed vertebral bodies. Nevertheless, considering the aforementioned preliminary study on improving measurement accuracy by increasing the number of learning vertebral bodies, this possibility may be limited. Third, there is a concern about the inaccuracy of measurement points in scoliosis cases. In scoliosis, the vertebral body is deviated not only laterally but also rotationally, making it difficult to set the posterior measurement points in addition to the central measurement point. Additionally, scoliosis cases with a Cobb angle of 15° or more were excluded from the training and validation/test data. Therefore, it was not possible to evaluate the performance of the proposed software for cases of scoliosis. Developing a technology that can simulate 3D evaluations of the vertebral body and set measurement points more accurately by combining frontal X-ray, CT, and MRI images may allow for obtaining better-quality training data.

In this study, we developed AI-based software that allows quantitative evaluations of vertebral body fractures through QM. It analyzes lateral X-ray images of the thoracic and lumbar spine, automatically detects six measurement points on each vertebral body in approximately 6 s per image, and calculates the vertebral body height ratios. Its precision was equivalent to that of external experts, and it can significantly shorten the time required to diagnose vertebral body fractures, allowing for timely and more appropriate treatment.

Competing interests

S.A.: Patent application pending; Y.T.: Employee—Shimadzu Corporation; K.N.: Employee—Shimadzu Corporation. Patent application pending; and M.S.: Patent application pending. All other authors have no competing interest to declare.

Author Contribution

S.A.: Conceptualization, Data Collection, Validation, Formal Analysis, Investigation, Visualization, Data Curation, Writing Original Draft; A.S.: Data Collection; D.A.: Data Collection; T.F.: Data Collection; Y.T.: Project Administration, Data Curation, Funding Acquisition; K.N.: Software Development, Draft Review and Editing; and M.S.: Conceptualization, Draft Review and Editing, Supervision.

Acknowledgements

The authors would like to thank the external evaluators from The Jikei University School of Medicine for their independent assessment of the software's performance; it should be noted that this evaluation was carried out without any financial transaction between the contributors. The authors would also like to thank Joe Ledsam for his advice on the statistical analysis of the data.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Table 1. Time required for measurements: (A) Images of the thoracic spine. (B) Images of the lumbar spine.

(A)

No.	Image size (pixels)	Detected vertebrae (n)	Measurement time (s)
1	3520×4280	10	10.13
2	2373×2836	12	6.43
3	2373×2836	11	6.26
4	2373×2836	10	5.98
5	2373×2836	11	5.43
6	1926×3408	11	8.72
7	2373×2836	11	5.48
8	2373×2836	11	5.70
9	1713×2426	11	5.18
10	1713×2426	10	5.13

(B)

No.	Image size (pixels)	Detected vertebrae (n)	Measurement time (s)
1	2540×3600	8	7.65
2	1713×2426	7	4.78
3	1713×2426	10	5.21
4	1713×2426	9	4.98
5	1713×2426	9	4.80
6	1849×2909	9	7.03
7	1713×2426	8	4.75
8	1713×2426	8	4.65
9	1713×2426	8	4.78
10	1693×2033	6	4.13
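
The "approximately 6 s" per image can be checked directly against Table 1. The short Python sketch below averages the measurement times listed above; the variable names are illustrative, and the values are copied from the two sub-tables.

```python
from statistics import mean

# Measurement times in seconds, copied from Table 1.
thoracic_s = [10.13, 6.43, 6.26, 5.98, 5.43, 8.72, 5.48, 5.70, 5.18, 5.13]
lumbar_s = [7.65, 4.78, 5.21, 4.98, 4.80, 7.03, 4.75, 4.65, 4.78, 4.13]

thoracic_mean = mean(thoracic_s)
lumbar_mean = mean(lumbar_s)
overall_mean = mean(thoracic_s + lumbar_s)

print(f"thoracic: {thoracic_mean:.2f} s")
print(f"lumbar:   {lumbar_mean:.2f} s")
print(f"overall:  {overall_mean:.2f} s")
```

The means come out at roughly 6.4 s for thoracic images, 5.3 s for lumbar images, and 5.9 s overall. The slowest cases in each sub-table are the images with the largest pixel dimensions.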

Table 2. Bland–Altman analysis of the validation set. The MAE is expressed as mean ± SD. For each vertebral body height ratio, the MAE tended to increase and the proportion within the LOA decreased as the SQ grade increased. SQG = semiquantitative grade; MAE = mean absolute error; LOA = limits of agreement; SD = standard deviation.

	SQG	n	MAE	Within LOA (n)	Within LOA (%)
C/A	0	906	0.033 ± 0.033	892	98.5
	1	36	0.051 ± 0.040	34	94.4
	2	35	0.084 ± 0.050	25	71.4
	3	5	0.363 ± 0.130	0	0.0
C/P	0	906	0.033 ± 0.031	877	96.8
	1	36	0.031 ± 0.027	33	91.7
	2	35	0.044 ± 0.046	27	77.1
	3	5	0.067 ± 0.049	3	60.0
A/P	0	906	0.035 ± 0.030	874	96.5
	1	36	0.045 ± 0.033	31	86.1
	2	35	0.056 ± 0.033	26	74.3
	3	5	0.175 ± 0.081	1	20.0
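
The overall percentages within the LOA reported for the validation set (96.8%, 95.7%, and 94.9%) can be recovered by pooling the per-grade counts in Table 2. The Python sketch below does this and also includes a generic Bland–Altman helper. The helper uses the conventional mean difference ± 1.96 SD; that multiplier and all function names are assumptions for illustration, not details confirmed by this study.

```python
from statistics import mean, stdev

# (n, number within LOA) for SQ grades 0 to 3, copied from Table 2.
TABLE2 = {
    "C/A": [(906, 892), (36, 34), (35, 25), (5, 0)],
    "C/P": [(906, 877), (36, 33), (35, 27), (5, 3)],
    "A/P": [(906, 874), (36, 31), (35, 26), (5, 1)],
}


def pooled_percent_within(rows):
    """Percentage of vertebrae within the LOA across all SQ grades."""
    total = sum(n for n, _ in rows)
    within = sum(w for _, w in rows)
    return 100.0 * within / total


def limits_of_agreement(reference, measured, z=1.96):
    """Bland-Altman limits: mean difference +/- z * SD of the differences.
    z = 1.96 is the conventional choice and is an assumption here."""
    diffs = [m - r for r, m in zip(reference, measured)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias - spread, bias + spread


for ratio, rows in TABLE2.items():
    print(ratio, round(pooled_percent_within(rows), 1))
```

Grade 0 vertebrae make up 906 of the 982 cases, so the pooled figures mostly reflect unfractured vertebrae; the per-grade rows show where agreement falls off.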

Table 3. MAE between the software and the external evaluators for each vertebral body height ratio in the test set. The MAE is expressed as mean ± SD. The MAE between the software and the external evaluators was comparable to the MAE observed between the external evaluators. MAE = mean absolute error; SD = standard deviation; SS1 = spine surgeon 1; SS2 = spine surgeon 2; R = radiologist.

	C/A	C/P	A/P
AI vs SS1	0.0458 ± 0.0448	0.0415 ± 0.0394	0.0435 ± 0.0392
AI vs SS2	0.0445 ± 0.0376	0.0378 ± 0.0322	0.0390 ± 0.0317
AI vs R	0.0351 ± 0.0345	0.0326 ± 0.0297	0.0373 ± 0.0339
SS1 vs SS2	0.0604 ± 0.0466	0.0491 ± 0.0408	0.0415 ± 0.0380
SS1 vs R	0.0412 ± 0.0375	0.0386 ± 0.0369	0.0385 ± 0.0350
SS2 vs R	0.0510 ± 0.0363	0.0419 ± 0.0333	0.0384 ± 0.0335
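
One simple way to read Table 3 is to average the three software-versus-evaluator MAEs and the three evaluator-versus-evaluator MAEs for each ratio. The Python sketch below does so using the mean values from the table. It is a descriptive summary only, not a statistical test, and the names `TABLE3` and `summarize` are illustrative.

```python
from statistics import mean

RATIOS = ("C/A", "C/P", "A/P")

# Mean MAE per comparison, copied from Table 3 (SD omitted).
TABLE3 = {
    "AI vs SS1": (0.0458, 0.0415, 0.0435),
    "AI vs SS2": (0.0445, 0.0378, 0.0390),
    "AI vs R": (0.0351, 0.0326, 0.0373),
    "SS1 vs SS2": (0.0604, 0.0491, 0.0415),
    "SS1 vs R": (0.0412, 0.0386, 0.0385),
    "SS2 vs R": (0.0510, 0.0419, 0.0384),
}


def summarize(table):
    """Return {ratio: (mean AI-vs-human MAE, mean human-vs-human MAE)}."""
    out = {}
    for i, ratio in enumerate(RATIOS):
        ai = mean(v[i] for k, v in table.items() if k.startswith("AI"))
        human = mean(v[i] for k, v in table.items() if not k.startswith("AI"))
        out[ratio] = (ai, human)
    return out


for ratio, (ai, human) in summarize(TABLE3).items():
    print(f"{ratio}: AI vs human {ai:.4f}, human vs human {human:.4f}")
```

On these figures, the averaged software-versus-evaluator MAE is lower than the averaged inter-evaluator MAE for C/A (about 0.042 versus 0.051) and C/P (about 0.037 versus 0.043), and almost the same for A/P (about 0.040 versus 0.039).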
