Clinical study on ultrasonic artificial intelligence-assisted diagnosis of developmental hip dysplasia in children

doi:10.21203/rs.3.rs-2832274/v1

This work is licensed under a CC BY 4.0 License

Version 1

Background: Developmental hip dysplasia (DDH) is a common pediatric disease. For patients younger than 6 months of age, ultrasound is more suitable for screening and assessing hip development. At present, there is an urgent need for a reproducible and reliable ultrasound screening method for DDH diagnosis.

Purpose: To construct and verify an artificial intelligence-assisted deep learning system for ultrasound diagnosis of developmental hip dysplasia in children.

Materials and Methods: A total of 2021 standard sections were selected from January 2019 to January 2021. All standard sections were annotated to a unified standard through the image transmedia data annotation and audit system. Of these, 1753 images were randomly selected to train the deep learning system, and the remaining 268 were used to test it.

Results: A total of 268 patients were tested. Compared with clinicians’ measurements, the AUC for diagnosing hip joint maturity was 0.941 (sensitivity 90.5%, specificity 97.8%), while the AUC for Graf classification was 0.685 (sensitivity 45.3%, specificity 91.7%). According to the Bland–Altman method, the 95% limits of agreement were -6.426° to 4.811° for the α angle (bias = -0.8075, P < 0.001) and -5.545° to 6.507° for the β angle (bias = 0.4812, P = 0.057). Seven key-point coordinates measured by AI were statistically different from the clinician values.

Conclusions: The artificial intelligence system could quickly and accurately measure the Graf-related indices of standard hip joint ultrasound images.

Biological sciences/Computational biology and bioinformatics

Health sciences/Medical research

Developmental dysplasia of the hip (DDH) is a common pediatric disease with a global incidence ranging from 0.006% in Africa to 7.61% in the Americas [1, 2]. For patients younger than 6 months, ultrasound is more suitable for screening and evaluating hip development because secondary ossification centers are formed at 4–6 months of age [1, 3, 4]. The Graf method [5], proposed in 1980, is well accepted by pediatric orthopedic surgeons because it can classify the severity of DDH using full-spectrum ultrasound images [6], and the joint angles measured using this method have lower variability than those of other approaches. Currently, hip ultrasound screening in most hospitals is performed and evaluated by experienced clinicians, often with low accuracy and reproducibility in the diagnosis of DDH. Studies have reported low diagnostic accuracy (55.0–89.4%) and low consistency (17.7–80.8%) of DDH ultrasound screening under complete manual guidance [7–10]. The standard deviations of the α and β angles measured by clinicians ranged from approximately ±3° to ±7.1° and from ±5.9° to ±10.1°, respectively [20–24]. The accuracy of this clinical screening is closely related to the experience and training of clinicians [11, 12]. In addition, Quader et al. found that ultrasound diagnostic indicators showed temporal variability, declaring “an urgent need for a highly reproducible and reliable DDH ultrasound screening diagnostic method.”

Based on traditional indicators, research groups have proposed new artificial intelligence (AI)-assisted DDH diagnosis methods. Hareendranathan et al. designed a rounding index to analyze DDH [13]; however, this method relies heavily on manual labeling and is not fully automated. Quader et al. further proposed an automatic bone boundary detection system for DDH analysis [14]. At the same time, Sezer et al. proposed a “split” two-step method for the analysis of hip ultrasound images [15]. To build a more rapid and accurate automatic analytic and diagnostic system, we used a convolutional neural network based on clinician-labeled ultrasound images to achieve fast inference of DDH ultrasound image features. Finally, we demonstrated that this method can perform fast image recognition and diagnosis with high accuracy.

1. Ultrasound images

This retrospective study was approved by the Ethics Committee of Anhui Provincial Children's Hospital (approval no. 20190021), and all methods were carried out in accordance with its guidelines and regulations. The Ethics Committee of Anhui Provincial Children's Hospital waived the requirement for informed consent because completely anonymised data were used. A total of 2021 standardized ultrasound images (Philips Color Ultrasonic Diagnostic System EPIQ5, 7–12 MHz) were recorded in the outpatient department of the hospital from January 2019 to January 2021 (Fig. 1). Each recorded ultrasound clip ran from the start of the operator's examination to the end of the ultrasound diagnosis, and the image used for diagnosis was saved at the same time. The hip joint was placed in a physiologically neutral position because positioning influences the β angle [16]. The ultrasound probe was placed on the coronal plane and rotated backward by about 10–15 degrees. In total, 2021 ultrasound image clips were generated; the process is shown in Fig. 1.

2. Obtaining cross-sections of ultrasound images

The Graf screening method can only be performed on standard cross-sections of the ultrasound images. The specific screening process is shown in Fig. 1. The selected sections were then submitted to the labeling team for diagnostic image labeling.

3. Ground-truth image annotation by the annotation group

Six key points were labeled on standard hip joint ultrasound images according to Graf theory. The baseline, bone parietal line, and cartilage apical line were plotted (Fig. 2). The baseline started at the rectus femoris tendon and ran along the iliac crest border. The bone parietal line connected the lowest point of the ilium with the osseous apex of the acetabulum. The cartilage apical line connected the turning point of the cartilage roof from concave to convex with the center of the labrum. Graf classification was then determined using the α and β angles (α, between the bone parietal line and the baseline; β, between the cartilage apical line and the baseline). A total of 268 standard hip ultrasound images were randomly selected as the test set and labeled. The annotation process is also shown in Fig. 1. Finally, the data in the test set could be divided into six groups according to Graf classification and two groups according to joint maturity (Table 1).
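
The angle definitions above can be illustrated with a minimal sketch. It assumes each of the three lines is available as a pair of (x, y) pixel coordinates; the function names, dictionary keys, and example coordinates are hypothetical and are not taken from the study's software.

```python
import math

def line_angle_deg(p1, p2, q1, q2):
    """Return the angle (0-90 degrees) between line p1-p2 and line q1-q2."""
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    cross = ux * vy - uy * vx   # proportional to sin(theta)
    dot = ux * vx + uy * vy     # proportional to cos(theta)
    # abs() makes the result independent of the direction each line was drawn in.
    return math.degrees(math.atan2(abs(cross), abs(dot)))

def graf_angles(lines):
    """Compute (alpha, beta) from the three annotated lines.

    alpha: bone parietal line vs. baseline
    beta:  cartilage apical line vs. baseline
    """
    base = lines["baseline"]
    alpha = line_angle_deg(*base, *lines["bone_parietal_line"])
    beta = line_angle_deg(*base, *lines["cartilage_apical_line"])
    return alpha, beta

# Hypothetical pixel coordinates, for illustration only.
example = {
    "baseline": ((100.0, 20.0), (100.0, 120.0)),
    "bone_parietal_line": ((100.0, 120.0), (186.6, 170.0)),
    "cartilage_apical_line": ((100.0, 120.0), (181.9, 62.6)),
}
alpha, beta = graf_angles(example)
print(f"alpha = {alpha:.1f} deg, beta = {beta:.1f} deg")
```

Taking the absolute value of the cross and dot products treats each line as undirected, so the result does not depend on which endpoint was annotated first. This is a simplification that holds only while both angles stay below 90°.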

Table 1

Basic information
Characteristic	Entire set (n = 2021)	Training set (n = 1753)	Testing set (n = 268)
Sex
Female	1700	1476	224
Male	321	277	44
Age
0–60 days	795	682	113
61–120 days	880	776	104
121–180 days	346	295	51
Graf
Non-dislocation	1779	1552	227
IA	1442	1241	201
IB	335	311	24
Dislocation	242	201	41
II	184	149	35
Stable IIC	32	29	3
Unstable IIC, D	3	3	0
IIIA/B, IV	25	20	5

4. Network framework

As shown in Fig. 3, we applied a deep learning method for the automatic diagnosis of ultrasound images. The localization of each landmark was converted to the regression of a heat map centered at the landmark. For an input image, our network first used a deep-learning backbone, ResNet-50, to obtain a rich feature representation of spatial information. We then used deconvolutional layers to decode the feature map and generate target heat maps that indicate the positions of the landmarks. Finally, the system completed the diagnosis from the positions of the landmarks.

The heat map regression network is optimized with a mean square error (MSE) loss. To train it, we generated the target heat map from the manually labeled landmark. The landmark position has the maximum pixel value, and the pixel value at every other position decreases with its distance from the landmark:

$$H\left(x,y\right)=\exp\left\{-\frac{{\left(x-{x}^{\ast }\right)}^{2}+{\left(y-{y}^{\ast }\right)}^{2}}{2{\sigma }^{2}}\right\},$$

where ${x}^{\ast },{y}^{\ast }$ are the coordinates of the landmark and $H\left(x,y\right)$ is the target heat map. $\sigma$ is a hyper-parameter that sets the spread (standard deviation) of the Gaussian; we use $\sigma =3$ pixels in our work.

After training, the heat map regression network predicts the likely position of each landmark by generating a heat map. The landmark is located at the position with the maximum pixel value. Finally, the α and β angles and the disease assessment are determined from the hip’s morphology according to the landmarks.
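
The target heat map, the MSE loss, and the maximum-value decoding step can be sketched in plain Python as follows. This is only an illustration of the technique, assuming a single landmark and a small image; it is not the study's implementation, which uses a ResNet-50 backbone with deconvolutional layers, and all function names are hypothetical.

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=3.0):
    """Target heat map: value 1.0 at the landmark (cx, cy), decaying with distance."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def mse(pred, target):
    """Mean square error between two heat maps of the same size."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count

def decode_landmark(heatmap):
    """Return (x, y) of the pixel with the maximum value."""
    best_xy, best_val = (0, 0), float("-inf")
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best_xy, best_val = (x, y), value
    return best_xy

# Hypothetical 32 x 24 image with a landmark at (10, 7); sigma = 3 pixels as in the text.
target = gaussian_heatmap(32, 24, cx=10, cy=7, sigma=3.0)
print(decode_landmark(target))
```

In training, `mse` would be evaluated between the network's predicted heat map and `target`; at inference, `decode_landmark` is applied to the predicted heat map of each landmark.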

5. System test

The authors conducted receiver operating characteristic (ROC) tests on the consistency of the relevant parameters between the AI system and the clinicians on the test set. Subsequently, the ROC curves of the two groups were compared according to clinical diagnosis of maturity (the “mature” group included IA and IB, “immature” included IIA/B, stable IIC, unstable IIC, D, IIIA/B, and IV). The α and β angles measured by the AI system were compared with those measured by clinicians.
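
The grouping into "mature" and "immature" hips can be sketched as a simple mapping. The maturity grouping follows the text above. The α/β thresholds are the commonly cited Graf cut-offs and are an assumption here, since this paper does not list the values it used. The sketch also does not separate IIA from IIB (which depends on age), stable from unstable IIC (which requires a dynamic examination), or III from IV.

```python
def graf_type(alpha, beta):
    """Assign a coarse Graf type from the alpha and beta angles (degrees).

    Thresholds are commonly cited Graf cut-offs (an assumption, not taken
    from this study). Alpha is checked first; beta only refines the subtype.
    """
    if alpha >= 60:
        return "IA" if beta <= 55 else "IB"
    if alpha >= 50:
        return "IIA/B"
    if alpha >= 43:
        return "IIC" if beta <= 77 else "D"
    return "III/IV"

# Grouping described in the text: only IA and IB count as mature.
MATURE_TYPES = {"IA", "IB"}

def maturity_group(graf):
    """Map a Graf type to the two-class maturity label."""
    return "mature" if graf in MATURE_TYPES else "immature"

for a, b in [(64.0, 50.0), (62.0, 60.0), (55.0, 60.0), (45.0, 80.0), (40.0, 85.0)]:
    g = graf_type(a, b)
    print(a, b, g, maturity_group(g))
```

Checking α before β means that α decides the main type whenever the two angles point to different types.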

6. Statistical analyses

Data were analyzed using SPSS 22.0 (IBM Corp., Armonk, NY, USA) and GraphPad Prism 5 (GraphPad Inc., San Diego, CA, USA). ROC curves were used to evaluate the diagnostic performance of the AI system in determining hip joint maturity. Bland–Altman scatter plots were then used to evaluate consistency between the AI system and the clinician-measured acetabular index. Differences were considered statistically significant at P < 0.05.
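
For readers unfamiliar with the Bland–Altman method, the bias and 95% limits of agreement (LOA) can be computed as in the sketch below. It assumes paired measurements, differences taken as AI minus clinician, and limits of bias ± 1.96 standard deviations of the differences. The study itself used SPSS and GraphPad Prism, and the sample values here are hypothetical.

```python
import statistics

def bland_altman(ai_values, clinician_values):
    """Return (bias, lower LOA, upper LOA) for paired measurements."""
    if len(ai_values) != len(clinician_values):
        raise ValueError("measurements must be paired")
    diffs = [a - c for a, c in zip(ai_values, clinician_values)]
    bias = statistics.mean(diffs)      # mean difference
    sd = statistics.stdev(diffs)       # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical alpha angles (degrees) for four hips, for illustration only.
ai = [60.0, 62.0, 58.0, 65.0]
clinician = [61.0, 62.0, 60.0, 64.0]
bias, lower, upper = bland_altman(ai, clinician)
print(f"bias = {bias:.3f}, 95% LOA = {lower:.3f} to {upper:.3f}")
```

By construction the limits are symmetric about the bias. The reported α limits (-6.426° and 4.811°) average to the reported bias of -0.8075, which is consistent with this definition.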

We trained and optimized an AI neural network model using 1753 standard hip ultrasound images. The complete set included 2021 patients (321 boys; mean age: 74.89 d), the training set included 1753 patients (277 boys; mean age: 73.96 d), and the test set included 268 patients (44 boys; mean age: 80.49 d). The consistency between the original coordinates (x and y) of the six key points marked by the AI system and the clinical true values was assessed by the Bland–Altman method, as shown in Table 2. According to the data, x2, x4, x6, y1, y2, y4, and y6 measured by AI were statistically different from the clinician values, as shown in Fig. 4. The Bland–Altman method was also used to compare the α and β angles generated by clinician and AI labeling (α: 95% LOA -6.426° to 4.811°, bias = -0.8075, P < 0.0001; β: 95% LOA -5.545° to 6.507°, bias = 0.4812, P = 0.0570). According to the data, the α angles measured by AI were statistically different from the clinician-measured angles. A Bland–Altman diagram of the α and β angles is shown in Fig. 5.

Table 2

Data distribution of each annotation point.
	Point 1(x,y)		Point 2(x,y)		Point 3(x,y)		Point 4(x,y)		Point 5(x,y)		Point 6(x,y)
	x1	y1	x2	y2	x3	y3	x4	y4	x5	y5	x6	y6
Bias	-0.53	-1.05	1.37	-0.78	-0.19	-0.97	-1.48	1.55	-0.58	-1.02	-1.84	-2.11
95% LOA lower limit	-12.03	-5.26	-4.37	-4.73	-5.17	-6.21	-7.24	-4.82	-6.59	-9.43	-5.68	-6.22
95% LOA upper limit	13.33	3.26	6.81	3.09	5.85	4.89	3.91	8.17	7.54	8.58	2.40	2.36
p	0.318	< 0.001	< 0.001	< 0.001	0.233	0.022	< 0.001	< 0.001	0.190	0.356	< 0.001	< 0.001

The team selected specific Graf classifications and measured the α and β angles. The clinician-measured results and the AI system's diagnoses of ultrasound clips of hip joints with different degrees of dysplasia are shown below (Table 3). Because the children screened by the ultrasound Graf method are generally young and have high growth potential, dividing hip joints into an "immature" group and a "mature" group is more practical for children's health than using the detailed Graf classification.

Table 3

Comparison of clinician and AI system diagnoses.
		True value					
		1a	1b	2a/b	2c/d	3	Total
AI	1a	166	11	5	0	0	182
	1b	33	11	0	0	0	44
	2a/b	2	2	26	0	0	30
	2c/d	0	0	2	3	0	5
	3	0	0	2	0	5	7
Total		201	24	35	3	5	268

The “mature” group includes Graf types IA and IB, while the “immature” group includes types IIA/B, stable IIC, unstable IIC, D, IIIA/B, and IV. The correlations between clinician labeling and AI are shown in Table 2. There were some inconsistencies between the methods. The distribution of results is shown in Table 3. In the test set of 268 hip joints, the area under the ROC curve was 0.941, the sensitivity (true positive rate, TPR) was 90.5%, and the specificity (true negative rate, TNR) was 97.8% (Fig. 6).

We further evaluated the performance of the AI system in interpreting Graf classifications with the clinical values as a reference. The criteria were as follows: relative to the true value, a milder interpretation by the deep learning system was counted as negative, and a more severe interpretation was counted as positive (Table 4).

Table 4

Comparison between diagnoses of clinicians and AI system.
	Clinician's diagnosis		
AI's diagnosis	Mature	Immature	Total
Mature	221	5	226
Immature	4	38	42
Total	225	43	268
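
As a worked example, the counts in Table 4 can be turned into rates as sketched below. The sketch assumes that "immature" is the positive class and that the clinician's diagnosis is the reference; the paper does not state how its reported sensitivity and specificity were derived, so the comparison in the note that follows is an observation rather than a description of the authors' calculation.

```python
def two_by_two_rates(tp, fp, fn, tn):
    """Rates from a 2 x 2 table, with the clinician's diagnosis as reference."""
    return {
        "sensitivity": tp / (tp + fn),   # detected among clinician-immature
        "specificity": tn / (tn + fp),   # cleared among clinician-mature
        "ppv": tp / (tp + fp),           # clinician-immature among AI-immature
        "npv": tn / (tn + fn),           # clinician-mature among AI-mature
    }

# Counts read from Table 4 (positive class = immature).
rates = two_by_two_rates(
    tp=38,   # AI immature, clinician immature
    fp=4,    # AI immature, clinician mature
    fn=5,    # AI mature, clinician immature
    tn=221,  # AI mature, clinician mature
)
for name, value in rates.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts, the row-wise proportions 38/42 and 221/226 round to 90.5% and 97.8%, the same values reported for sensitivity and specificity, while the column-wise proportions 38/43 and 221/225 round to 88.4% and 98.2%.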

The ROC curve for the Graf subdivisions interpreted by the deep learning system was drawn (Fig. 6). The results showed that the area under the ROC curve was 0.685, the sensitivity was 45.3%, and the specificity was 91.7%, all of which were significantly lower than those for the AI system in determining hip joint maturity.

Among the different methods for ultrasound diagnosis of the neonatal hip joint, the α and β angles of the Graf method had the highest repeatability, so the Graf method was used as the standard in this study [19]. Table 5 shows that, compared with clinicians, the AI system measured a smaller mean α angle and a larger mean β angle. However, the standard deviations of the two angles measured by AI were significantly smaller than those of clinicians. AI measurements were therefore more stable than manual measurements and more likely to expose the immature hip joint. The standard deviations of the α and β angles measured by clinicians ranged from approximately ±3° to ±7.1° and from ±5.9° to ±10.1°, respectively. The error of AI measurement in this study is within this range. Therefore, it can be considered that the AI system can detect immature hip joints more stably and sensitively.

Table 5

Mean value and standard deviation of α and β angle measured by doctors and AI.
	α angle		β angle
	Mean value	Standard deviation	Mean value	Standard deviation
AI	63.6387	5.91354	53.9823	4.06810
Doc	64.4462	6.87366	53.5012	5.51409

The six points labeled by the AI system were more clustered toward the center of the acetabulum than those labeled by clinicians (Fig. 4). This suggests that the AI system was more conservative in its labeling, which may lead to greater stability. At the same time, the AI-labeled points were more consistent with the ideal situation in Graf theory; that is, the iliac margin of the baseline (point 2), the acetabular vertex of the bone parietal line (point 6), and the center of the labrum of the cartilaginous apical line (point 3) coincide at three points. For the α and β angles, we expected the β angle, which is based on soft tissue that is difficult to identify in ultrasound images, to show the larger statistical gap, as in other studies [16]. However, only the α angle showed a statistical difference between the methods. For AI-based ultrasound image recognition, accuracy and repeatability depend heavily on high contrast between the image and the background and on image stability. The α angle is based on bone annotation points. Although bone is clearer and more obvious to the naked eye, the bony area from which a point can be selected is much larger than the soft tissue area available for the β angle labeling points, so the gap from the manually labeled true value is larger. For subdivision judgment, the α angle was more valuable than the β angle, which is consistent with our strategy of favoring the α angle when the two angles conflict. This further suggests that drawing accurate α angles is more important in clinical calculations.

Our AI system could be used to identify and diagnose immature hip joints more sensitively than manual labeling and diagnosis. However, the sensitivity of Graf classification is not as good as that of simple identification of hip joint maturity. Although its sensitivity is low, its high specificity can ensure that the screening results will not cause excessive treatment for children diagnosed with immature hip joints. The hip joint in children has great growth potential, so this highly specific detection method helps protect the normal growth of children.

The study has some limitations. First, the standard cross-section of ultrasound images can also be selected by an AI system; however, owing to limited time and resources, this study focused only on the diagnostic ability of the AI system on standard sections. Second, the accuracy and resolution of ultrasound are lower than those of other modalities, and manual annotation is highly dependent on operator proficiency and the specific anatomical structures. Third, after selection based on exclusion criteria, only 268 standard sections were used in this study, and the data should be expanded to obtain a more stable AI system.

This study demonstrates the feasibility of using a deep learning system to perform preliminary screening of standard hip ultrasound images in children with immature hips. The preliminary success of the AI-aided DDH diagnosis in this study provides a solid foundation for the development of more rapid, objective, and accurate technology.

Author contributions

Si-Cheng Zhang, Hai-Long Ma, Xi-Wei Sun, Qing-Jie Wu: Project design, implementation and thesis writing

Hai-Long Ma, Xi-Wei Sun, Qing-Jie Wu: Collect, analyze and interpret data

Jun Sun, Si-Cheng Zhang: Writing instruction and paper revision

Jun Sun, Si-Cheng Zhang: Provide administrative, material and surgical technical support

Si-Cheng Zhang, Jing-Yuan Xu: Design and test of computer algorithms

Competing interests

All authors declare no competing interest.

Data availability

The datasets analyzed in this study are available from the corresponding author upon reasonable request.

Clinical practice guideline: early detection of developmental dysplasia of the hip. Committee on Quality Improvement, Subcommittee on Developmental Dysplasia of the Hip. American Academy of Pediatrics. Pediatrics, 2000. 105(4 Pt 1): p. 896–905.
Loder, R.T. and E.N. Skopelja, The epidemiology and demographics of hip dysplasia. ISRN Orthop, 2011. 2011: p. 238607.
Mulpuri, K., et al., Detection and Nonoperative Management of Pediatric Developmental Dysplasia of the Hip in Infants up to Six Months of Age. J Am Acad Orthop Surg, 2015. 23(3): p. 202–5.
AIUM-ACR-SPR-SRU Practice Parameter for the Performance of an Ultrasound Examination for Detection and Assessment of Developmental Dysplasia of the Hip. J Ultrasound Med, 2018. 37(11): p. E1-E5.
Graf, R., The diagnosis of congenital hip-joint dislocation by the ultrasonic Combound treatment. Arch Orthop Trauma Surg, 1980. 97(2): p. 117–33.
Bruno de Castro Paixao Jacobino, A.M.D.G., A Adriano Ferreira da Silva, B and Claudio Campi de Castrob, C, Using the Graf method of ultrasound examination to classify hip dysplasia in neonates. Autopsy Case Reports, 2012. 2(2): p. 5–10.
Tan, S.H.S., et al., The earliest timing of ultrasound in screening for developmental dysplasia of the hips. Ultrasonography, 2019. 38(4): p. 321–326.
Quader, N., et al., A Systematic Review and Meta-analysis on the Reproducibility of Ultrasound-based Metrics for Assessing Developmental Dysplasia of the Hip. J Pediatr Orthop, 2018. 38(6): p. e305-e311.
Pillai, A., et al., Diagnostic accuracy of static graf technique of ultrasound evaluation of infant hips for developmental dysplasia. Arch Orthop Trauma Surg, 2011. 131(1): p. 53–8.
Woolacott, N.F., et al., Ultrasonography in screening for developmental dysplasia of the hip in newborns: systematic review. BMJ, 2005. 330(7505): p. 1413.
Simon, E.A., et al., Inter-observer agreement of ultrasonographic measurement of alpha and beta angles and the final type classification based on the Graf method. Swiss Med Wkly, 2004. 134(45–46): p. 671–7.
Ahmet Sinan Sari, O.K., Is experience alone sufficient to diagnose developmental dysplasia of the hip without the bony roof (alpha angle) and the cartilage roof (beta angle) measurements?: A diagnostic accuracy study. Medicine (Baltimore), 2020 Apr. 99(14).
Hareendranathan, A.R., et al., Toward automated classification of acetabular shape in ultrasound for diagnosis of DDH: Contour alpha angle and the rounding index. Comput Methods Programs Biomed, 2016. 129: p. 89–98.
Quader, N., et al., Automatic Evaluation of Scan Adequacy and Dysplasia Metrics in 2-D Ultrasound Images of the Neonatal Hip. Ultrasound Med Biol, 2017. 43(6): p. 1252–1262.
Sezer HB, Sezer A. Automatic segmentation and classification of neonatal hips according to Graf's sonographic method: A computer-aided diagnosis system. Appl Soft Comput, 2019: 82. DOI: 10.1016/j.asoc.2019.105516.
Xu YL, Chen BC. Ultrasound assessment of normal hip in different position. Chin J Pediatr Surg, 2010, 31(3): 187-190. DOI: 10.3760/cma.j.issn.0253-3006.2010.03.009.
Graf, R., Fundamentals of sonographic diagnosis of infant hip dysplasia. J Pediatr Orthop, 1984. 4(6): p. 735–40.
American Institute of Ultrasound in Medicine. AIUM practice guideline for the performance of an ultrasound examination for detection and assessment of developmental dysplasia of the hip. J Ultrasound Med, 2013, 32(7): 1307-1317. DOI: 10.7863/ultra.32.7.1307.
Peterlein, C.D., et al., Reproducibility of different screening classifications in ultrasonography of the newborn hip. BMC Pediatr, 2010. 10: p. 98.
Omeroğlu H, Biçimoğlu A, Koparal S, Seber S. Assessment of variations in the measurement of hip ultrasonography by the Graf method in developmental dysplasia of the hip. J Pediatr Orthop B. 2001 Apr;10(2):89–95. PMID: 11360786.
Simon EA, Saur F, Buerge M, Glaab R, Roos M, Kohler G. Inter-observer agreement of ultrasonographic measurement of alpha and beta angles and the final type classification based on the Graf method. Swiss Med Wkly. 2004 Nov 13;134(45–46):671–7. PMID: 15611889.
Jaremko JL, Mabee M, Swami VG, Jamieson L, Chow K, Thompson RB. Potential for change in US diagnosis of hip dysplasia solely caused by changes in probe orientation: patterns of alpha-angle variation revealed by using three-dimensional US. Radiology. 2014 Dec;273(3):870–8. doi: 10.1148/radiol.14140451. Epub 2014 Jun 25. PMID: 24964047.
Lee SW, Ye HU, Lee KJ, Jang WY, Lee JH, Hwang SM, Heo YR. Accuracy of New Deep Learning Model-Based Segmentation and Key-Point Multi-Detection Method for Ultrasonographic Developmental Dysplasia of the Hip (DDH) Screening. Diagnostics (Basel). 2021 Jun 28;11(7):1174. doi: 10.3390/diagnostics11071174. PMID: 34203428; PMCID: PMC8303134.
Hu X, Wang L, Yang X, Zhou X, Xue W, Cao Y, Liu S, Huang Y, Guo S, Shang N, Ni D, Gu N. Joint Landmark and Structure Learning for Automatic Evaluation of Developmental Dysplasia of the Hip. IEEE J Biomed Health Inform. 2022 Jan;26(1):345–358. doi: 10.1109/JBHI.2021.3087494. Epub 2022 Jan 17. PMID: 34101608.
