A robust artificial intelligence method detects almost non-reactive Non-stress pattern: What we expect?

doi:10.21203/rs.3.rs-3314240/v1

This work is licensed under a CC BY 4.0 License

Version 1

Objective

To compare the interpretation of prenatal non-stress test (NST) patterns by obstetricians and by an artificial intelligence (AI) system, and to determine the degree of agreement of the AI system with obstetricians.

Methods

One thousand prenatal NST recordings of 20 to 30 minutes were interpreted by an AI system and visually by five obstetricians to explore the agreement and accuracy of the AI system. Weighted kappa was used to assess the reliability of AI interpretation of prenatal NST patterns.

Results

A total of 964 cases were included in this study. Agreement among the five obstetricians for antepartum FHR patterns was moderate (kappa, 0.48). The AI system recognized NST patterns similarly to obstetricians, with a moderate kappa coefficient of agreement of 0.42. In the subset of cases on which all obstetricians agreed, agreement between AI and obstetricians was substantial (kappa, 0.75). AI identified most non-reactive NST patterns, with a high sensitivity of 91.67%. Concordant identification was observed in 71.76% of preterm cases and 66.05% of term cases.

Conclusion

Compared with the visual interpretation of obstetricians, AI performed well in interpreting antepartum FHR monitoring, regardless of gestational age. Further, AI showed a competitive ability to identify non-reactive NST patterns and the potential to avoid unnecessary clinical intervention.

Biological sciences/Computational biology and bioinformatics

Biological sciences/Computational biology and bioinformatics/Data processing

Biological sciences/Computational biology and bioinformatics/Machine learning

antepartum

non-stress test

fetal heart rate

visual interpretation

artificial intelligence

Introduction

The prenatal non-stress test (NST) is an assessment tool used to evaluate fetal well-being from 32 weeks of gestation to term. NST is widely used to record fetal heart rate (FHR) continuously before the onset of labor. Close FHR monitoring is critical for assessing fetal well-being and for determining necessary clinical interventions in high-risk pregnancies, such as those complicated by fetal growth restriction. In 1997, the National Institute of Child Health and Human Development developed the first guideline for FHR waveform patterns^[1]. Guidelines for NST patterns have since been proposed^[2–4] that categorize them into three tiers (reactive, atypical and non-reactive). The prenatal NST tracing is typically printed on paper and visually analyzed by obstetricians to detect abnormalities, including low variability, bradycardia, tachycardia, decelerations and sinusoidal patterns^[5]. Accurate interpretation of NST patterns has been shown to improve neonatal outcomes and avoid unnecessary cesarean sections. The main limitation of FHR monitoring interpretation is unacceptably high inter- and intra-observer variation, which hampers clinical intervention^[6–8].

The ability to process large electronic datasets has placed artificial intelligence (AI) in a position to alter the landscape of clinical practice, with AI showing capabilities that extend beyond human cognition^[9]. To overcome the present limitations of NST, AI-based interpretation has been implemented^[10]. Signal processing and pattern recognition techniques paved the way for AI to overcome inconsistency in NST interpretation^[11]. However, AI has limitations of its own. Previous studies demonstrated that AI interpretation of NST was useful only for typical features of FHR patterns, such as accelerations and decelerations^[9]. Balayla^[12] reported that AI showed moderate agreement with experts in intrapartum FHR interpretation but did not improve neonatal outcomes. Although AI models for interpreting intrapartum FHR patterns have been developed in previous studies, they could neither represent gold-standard interpretation nor extract sufficient information from raw FHR patterns^[13]. Recently, a new approach to FHR interpretation based on a Gaussian mixture model has been proposed, which processes the FHR pattern with an interactive local linear regression (ILLR) method. We also found that an AI system based on a deep Gaussian mixture model improved interpretation accuracy by at least 7%.
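The details of the deep Gaussian mixture model and the ILLR method are not given here. As a generic illustration of the underlying idea only, the sketch below fits a plain two-component, one-dimensional Gaussian mixture by expectation-maximization (EM). It is not the authors' model: the synthetic data, the framing of the components as "baseline" and "acceleration" heart-rate values, and the values 140 and 165 bpm are all hypothetical.

```python
import math
import random


def fit_gmm_1d(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture to the values in x with EM.

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(x)
    xs = sorted(x)
    # Initialise the means at the lower and upper quartiles, both variances
    # at the overall variance, and equal mixing weights.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    overall_mean = sum(x) / n
    overall_var = sum((v - overall_mean) ** 2 for v in x) / n
    var = [overall_var, overall_var]
    w = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        ll = 0.0
        for v in x:
            dens = [
                w[k] * math.exp(-((v - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk
        # Stop once the log-likelihood no longer improves meaningfully.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, var


def synthetic_fhr(seed=0):
    """Hypothetical data: 600 'baseline' values near 140 bpm, 400 'acceleration' values near 165 bpm."""
    rng = random.Random(seed)
    return [rng.gauss(140, 5) for _ in range(600)] + [rng.gauss(165, 5) for _ in range(400)]


weights, means, variances = fit_gmm_1d(synthetic_fhr())
for k in sorted(range(2), key=lambda j: means[j]):
    print(f"weight={weights[k]:.2f} mean={means[k]:.1f} bpm sd={math.sqrt(variances[k]):.1f}")
```

A real FHR model would work on time-ordered signal features rather than pooled samples; this sketch only shows how a mixture assigns soft component memberships.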

The objectives of the current study were to determine the agreement in NST interpretation between five obstetricians and an AI system, to assess how well the AI system recognizes non-reactive NST patterns compared with obstetricians, and to investigate the accuracy of AI and its agreement with obstetricians in order to assess the reliability of AI for interpreting NST patterns during the antepartum period.

Materials and methods

This study analyzed 1000 NST records from a remote fetal heart rate monitoring data system, collected at four tertiary care hospitals between June 1, 2020, and December 31, 2021. Records from pregnant women below 32 weeks of gestation and recordings shorter than 16 minutes were excluded. Based on the original questionnaire, cases were randomly selected in the proportions 60% normal, 20% suspicious and 20% abnormal. The study was conducted in accordance with the principles of the Declaration of Helsinki. Ethical approval was obtained from the Ethics Committee of Sun Yat-sen University (No. 21–413).
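The 60/20/20 selection described above amounts to stratified random sampling. A minimal sketch, assuming hypothetical pools of record IDs that have already been labeled normal, suspicious or abnormal; the pool names, ID format and pool sizes are illustrative, since the real sizes are not reported.

```python
import random


def stratified_sample(pools, proportions, total, seed=None):
    """Draw `total` records without replacement, allocating them across strata.

    pools: dict mapping stratum label -> list of record IDs in that stratum.
    proportions: dict mapping stratum label -> fraction of the sample (must sum to 1).
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    rng = random.Random(seed)
    sample = {}
    for label, share in proportions.items():
        k = round(total * share)
        if k > len(pools[label]):
            raise ValueError(f"stratum {label!r} has {len(pools[label])} records; {k} needed")
        # Simple random sample without replacement within the stratum.
        sample[label] = rng.sample(pools[label], k)
    return sample


# Hypothetical pools of record IDs.
pools = {
    "normal": [f"N{i:05d}" for i in range(8000)],
    "suspicious": [f"S{i:05d}" for i in range(1500)],
    "abnormal": [f"A{i:05d}" for i in range(500)],
}
selected = stratified_sample(pools, {"normal": 0.6, "suspicious": 0.2, "abnormal": 0.2}, 1000, seed=2021)
print({label: len(ids) for label, ids in selected.items()})
# {'normal': 600, 'suspicious': 200, 'abnormal': 200}
```

Fixing the seed makes the draw reproducible, which is useful when the selected case list must be shared with several raters.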

All cases were interpreted by five obstetricians in Shandong and five obstetricians in Guangdong, respectively, and the AI system also interpreted all 1000 cases. When the obstetricians' interpretations were inconsistent, an expert panel conducted an authoritative analysis and calibration to determine the reference result. The reference results were used as the standard for evaluating the AI early-warning system. All obstetricians were blinded to patients' clinical information. We then investigated the accuracy and agreement of the AI model using the qualified NST records described above. The AI system is based on an interpretable deep Gaussian mixture model and can automatically identify multiple features. Agreement between AI and obstetricians across the three categories was analyzed using the weighted kappa coefficient. Finally, the AI system was tested in two clinical settings: a preterm set (< 37 weeks of gestation) and a term set (≥ 37 weeks of gestation).
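The reference-standard workflow above can be sketched as follows: a case whose ratings are unanimous takes that label directly (these cases later form group A), and any other case is sent for expert-panel adjudication (group B). The data layout and the `adjudicate` callback are hypothetical; `majority_panel` is only a placeholder for the expert panel's judgment, not a description of it.

```python
from collections import Counter

CATEGORIES = ("reactive", "atypical", "non-reactive")


def build_reference(ratings, adjudicate):
    """Split cases into unanimous (group A) and adjudicated (group B) sets.

    ratings: dict mapping case ID -> list of labels, one per obstetrician.
    adjudicate: function(case_id, labels) -> final label decided by the expert panel.
    Returns (reference labels, group A case IDs, group B case IDs).
    """
    reference, group_a, group_b = {}, [], []
    for case_id, labels in ratings.items():
        if any(label not in CATEGORIES for label in labels):
            raise ValueError(f"unknown label in case {case_id}")
        if len(set(labels)) == 1:
            # Exact agreement among all raters: no adjudication needed.
            reference[case_id] = labels[0]
            group_a.append(case_id)
        else:
            reference[case_id] = adjudicate(case_id, labels)
            group_b.append(case_id)
    return reference, group_a, group_b


def majority_panel(case_id, labels):
    """Hypothetical stand-in for the expert panel: most common label (ties -> first seen)."""
    return Counter(labels).most_common(1)[0][0]


ratings = {
    "c1": ["reactive"] * 5,
    "c2": ["reactive", "reactive", "atypical", "reactive", "atypical"],
    "c3": ["atypical", "non-reactive", "non-reactive", "atypical", "non-reactive"],
}
reference, group_a, group_b = build_reference(ratings, majority_panel)
print(group_a, group_b, reference)
```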

Statistics

Statistical analysis was performed with R 4.2.2. Agreement between observers was evaluated using the proportion of agreement (Pa) with 95% confidence intervals (95% CI). Reliability was evaluated with the kappa statistic, which measures agreement beyond that expected by chance. Because kappa adjusts Pa for the agreement expected by chance, the distribution of ratings across classes influences the result. Predefined agreement criteria were used: 0.00 ≤ κ ≤ 0.20 indicated slight agreement; 0.21 ≤ κ ≤ 0.40, fair agreement; 0.41 ≤ κ ≤ 0.60, moderate agreement; 0.61 ≤ κ ≤ 0.80, substantial agreement; and 0.81 ≤ κ ≤ 1.00, excellent agreement^[14]. AI performance for each NST class and the association between AI and obstetricians were expressed as standard performance metrics: sensitivity, specificity, positive predictive value, negative predictive value, true-positive rate and false-positive rate.
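The text reports weighted kappa but does not state the weighting scheme. A minimal sketch of Cohen's weighted kappa for two raters (for example, AI versus the reference result) over the three ordered NST categories, assuming linear disagreement weights by default; quadratic weights are a common alternative and are offered as an option. The label function follows the cut-points above, using Landis and Koch's^[14] term "slight" for 0.00–0.20. The confusion matrix in the example is hypothetical.

```python
def weighted_kappa(confusion, scheme="linear"):
    """Cohen's weighted kappa for a k x k confusion matrix of two raters.

    confusion[i][j] = number of cases rater 1 put in category i and rater 2 in j,
    with categories in ordinal order (reactive, atypical, non-reactive).
    """
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    k = len(confusion)
    n = sum(map(sum, confusion))
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant categories.
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d ** 2

    observed = sum(weight(i, j) * confusion[i][j] for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row_tot[i] * col_tot[j]
                   for i in range(k) for j in range(k)) / (n * n)
    return 1.0 - observed / expected


def agreement_label(kappa):
    """Map a kappa value to the verbal agreement scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "excellent"


hypothetical = [[10, 2, 0], [3, 5, 2], [0, 1, 7]]  # rows: reference, columns: AI
kw = weighted_kappa(hypothetical)
print(round(kw, 3), agreement_label(kw))  # 0.701 substantial
```

Linear weights penalize a reactive/non-reactive disagreement twice as much as a reactive/atypical one, which suits the ordinal three-tier scale.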

Results

Based on exact agreement among the five obstetricians, 485 cases were consistent and were assigned to group A, while 479 cases required further analysis by experts and were assigned to group B. As shown in Table 1, the obstetricians identified a mean of 627 reactive, 36 non-reactive and 301 atypical NST patterns. Regarding inter-rater reliability, a kappa of 0.48 [0.45–0.51] indicated moderate agreement among the five obstetricians. In addition, 36 cases were unreadable by obstetricians. Hence, a total of 964 NST records were included in the further analysis.

Table 1

Exact agreement of the obstetricians' visual interpretation in each of the 3 categories
Category	Number of records	Pa	95% CI	Kappa
Normal (reactive)	627	0.85	0.84–0.87	–
Atypical	301	0.51	0.48–0.54	–
Abnormal (non-reactive)	36	0.48	0.42–0.54	–
Overall	964	0.72	0.70–0.74	0.48
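The text does not say which multi-rater statistic produced the kappa of 0.48 among the five obstetricians. Fleiss' kappa is one common choice when every case is rated by the same number of raters (it treats the categories as nominal, unlike the weighted kappa used for AI). The sketch below assumes that statistic and uses hypothetical rating counts.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for several raters.

    counts[i][j] = number of raters who assigned case i to category j;
    every row must sum to the same number of raters.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each case must be rated by the same number of raters")
    n_cat = len(counts[0])
    # Per-case agreement: share of rater pairs that agree on case i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_cases
    # Overall category proportions give the agreement expected by chance.
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters) for j in range(n_cat)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 4 cases, 5 raters, columns = reactive, atypical, non-reactive.
counts = [[5, 0, 0], [3, 2, 0], [0, 4, 1], [1, 1, 3]]
print(round(fleiss_kappa(counts), 3))  # 0.331
```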

Figure 1 shows the agreement between the AI system and the obstetricians in detail. The AI system identified 575 reactive, 188 non-reactive and 201 atypical NST patterns. The accuracy of AI was 0.69 [0.67–0.72], with moderate agreement (kappa, 0.42) between the raters and AI (Table 2). For reactive NST patterns, AI achieved an accuracy of 83.08%, with sensitivity and specificity of 82.78% and 83.38%, respectively. Compared with reactive NST patterns, the AI system showed higher accuracy (87.48%) and sensitivity (91.67%) for non-reactive NST patterns. AI had a high positive predictive value (90.26%) for reactive NST patterns and an excellent negative predictive value (99.61%) for non-reactive NST patterns. However, AI accuracy was low (61.89%) for atypical NST patterns.

Table 2

Agreement between the five obstetricians' visual interpretation and AI analysis
Category	Sensitivity (%)	Specificity (%)	PPV (%)	NPV (%)	Accuracy (%)	Kappa
Normal (reactive NST)	82.78	83.38	90.26	72.24	83.08	–
Atypical (atypical NST)	37.21	86.58	55.72	75.23	61.89	–
Abnormal (non-reactive NST)	91.67	82.30	17.55	99.61	87.48	–
Overall	–	–	–	–	68.88	0.42
PPV: positive predictive value; NPV: negative predictive value.
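Each row of Table 2 treats one category as "positive" and the other two as "negative" (one-vs-rest). A minimal sketch computing those metrics from a 3 × 3 confusion matrix (rows = reference result, columns = AI), using a hypothetical matrix. The per-class accuracies reported for the reactive and atypical rows equal the mean of sensitivity and specificity, for example (82.78 + 83.38)/2 = 83.08, i.e. balanced accuracy; the sketch therefore returns both balanced accuracy and the plain proportion correct, and which one applies to the whole table is an assumption.

```python
def _ratio(a, b):
    """Safe division: NaN when the denominator is zero."""
    return a / b if b else float("nan")


def one_vs_rest(confusion, labels):
    """Per-class sensitivity, specificity, PPV, NPV and accuracy measures.

    confusion[i][j]: number of cases with reference label i that the AI labelled j.
    """
    k = len(labels)
    n = sum(map(sum, confusion))
    results = {}
    for c, name in enumerate(labels):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # reference c, AI said otherwise
        fp = sum(confusion[i][c] for i in range(k)) - tp     # AI said c, reference otherwise
        tn = n - tp - fn - fp
        sens = _ratio(tp, tp + fn)
        spec = _ratio(tn, tn + fp)
        results[name] = {
            "sensitivity": sens,
            "specificity": spec,
            "ppv": _ratio(tp, tp + fp),
            "npv": _ratio(tn, tn + fn),
            "false_positive_rate": 1 - spec,
            "balanced_accuracy": (sens + spec) / 2,
            "proportion_correct": (tp + tn) / n,
        }
    return results


LABELS = ("reactive", "atypical", "non-reactive")
hypothetical = [[10, 2, 0], [3, 5, 2], [0, 1, 7]]
for name, metrics in one_vs_rest(hypothetical, LABELS).items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```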

In group A (Fig. 2), AI interpretation agreed well with the visual interpretation of the five obstetricians. The proportions of exact agreement between AI and the majority opinion are shown in Table 3. In the consistent group A, the proportion of agreement was high (94.02%) and the kappa score indicated substantial agreement (0.75). However, Fig. 3 shows that the accuracy of the AI system was low, with slight agreement (kappa, 0.15), in the inconsistent group B.

Table 3

Proportions of agreement and kappa scores for exact agreement with AI
Set	Number	Proportion of agreement (%)	95% CI (%)	Kappa
All cases	964	68.89	65.85–71.79	0.42
Group A	485	94.02	91.53–95.96	0.75
Group B	479	43.42	38.93–48.0	0.15
Preterm group	478	71.76	67.49–75.75	0.42
Term group	486	66.05	61.65–70.25	0.42
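Table 3 reports each proportion of agreement with a 95% CI but does not name the interval method. One common exact method is Clopper-Pearson, which is the interval R's `binom.test` reports; the dependency-free sketch below finds each bound by bisection on binomial tail probabilities. The count of 664 agreements out of 964 in the example is a back-calculation from the reported 68.88%, not a count stated in the text, and the sketch is not claimed to reproduce the published intervals exactly.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), accumulated from log-probabilities."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect(f, target, increasing, iters=60):
    """Find p in [0, 1] with f(p) == target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Keep the half-interval that still contains the root.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for the binomial proportion k / n."""
    # Lower bound solves P(X >= k | p) = alpha / 2, which increases with p.
    lower = 0.0 if k == 0 else _bisect(lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= k | p) = alpha / 2, which decreases with p.
    upper = 1.0 if k == n else _bisect(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


lo, hi = clopper_pearson(664, 964)  # 664 is an assumed, back-calculated count
print(f"Pa = {664 / 964:.4f}, 95% CI {lo:.4f} to {hi:.4f}")
```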

There were 478 NST records from the preterm group (32 to 36+6 weeks of gestation) and 486 from the term group (≥ 37 weeks of gestation). AI agreed exactly with the majority opinion in 71.76% (95% CI, 67.49–75.75%) of NST records from preterm cases (Table 3) and in 66.05% of records from term cases. Regardless of gestational age, agreement between the AI system and obstetricians was moderate, with a kappa of 0.42 in both groups.

Discussion

The non-stress test (NST) is a key tool for fetal monitoring during the antepartum period. Because visual interpretation of FHR patterns can be delayed and obstetricians often disagree, AI interpretation of FHR patterns using standard criteria is important. To our knowledge, this is the first study to explore the agreement between AI and obstetricians for prenatal NST patterns. Agreement among the five obstetricians, who applied a strict rule, was moderate. For reactive NST patterns, AI interpretation agreed well with the obstetricians' interpretation. For non-reactive NST patterns, AI missed few cases (sensitivity, 91.67%; negative predictive value, 99.61%).

In the present study, agreement among the obstetricians in NST pattern interpretation was moderate (kappa, 0.48), which is much higher than previously reported. In contrast, Uccella^[15] showed that the reliability of obstetricians' FHR interpretation was poor. Various other approaches to measuring clinician agreement in interpretation have also shown poor levels of agreement. Devane^[6] reported agreement among midwives of 0.46 for accelerations and 0.11–0.55 for decelerations. However, Fontenla^[16] demonstrated that inter- and intra-observer variability in visual interpretation was high, owing to the extensive clinical experience of obstetric experts. Similarly, we found that the obstetricians agreed exactly on most reactive NST patterns (Pa, 0.85).

We showed that the AI system identified nearly all non-reactive NST patterns (sensitivity, 91.67%) and agreed well with the obstetricians. For non-reactive NST patterns, the agreement of AI with the five obstetricians was high, which is clinically important. In a meta-analysis, Balayla^[12] demonstrated that inter-rater reliability between obstetricians and AI systems was moderate for intrapartum FHR interpretation. We found that some atypical NST patterns were misrecognized by the AI system, and atypical NST patterns are also a limitation of this study. Previously, Blackwell^[17] demonstrated that agreement on atypical patterns was poor owing to inconsistent distinction between minimal and absent variability. We speculate that the AI system's definition of absent variability was too stringent^[18]. When the AI detected a non-reactive NST pattern, the pregnant woman was asked to repeat the NST or visit the clinic. Reactive NST patterns were easier to interpret, owing to the exponentially accelerated knowledge acquisition of AI^[19].

Owing to delayed interpretation of NST patterns and discrepancies between obstetricians, analysis of prenatal NST patterns with an AI system is important^[17]. In our preliminary study, the proportion of exact agreement between AI and obstetricians was higher for preterm pregnancies (71.76%) than for term pregnancies (66.05%), although kappa was 0.42 in both groups. Unsurprisingly, for term pregnancies the kappa score was similar to that of visual interpretation among obstetricians. The concordance analysis of non-reactive NST patterns showed high agreement between AI and obstetricians, together with a high negative predictive value. AI models could help obstetricians assess fetal well-being quickly during the antepartum period, both relieving time constraints on obstetricians and enabling high-quality fetal care. In this context, high-risk pregnant women may require NST twice or more each day to ensure sufficient observation, which makes interpreting the numerous results a major challenge for obstetricians. Further, with advances in computer processing speed, AI can interpret NST patterns in real time, prompting re-evaluation and subsequent intervention. In 2018, AI was tested in cardiotocography analysis and performed at a level similar to that of obstetricians while detecting errors^[20]. AI also has inherent fallibility because its algorithm rules are set by clinicians, which can lead to misinterpretation^[11]. Further, the absence of a fixed pattern for atypical NST might be another explanation for the misrecognition of atypical NST patterns.

The AI system has great potential owing to its high sensitivity for non-reactive NST patterns. Consistent with the present study, Liu^[21] analyzed 3239 FHR patterns by AI and obstetricians and found moderate agreement for AI (kappa, 0.525). However, that report found a higher false-positive rate (0.632) than our study. Notably, false-positive predictions from AI may prompt aggressive clinical intervention to prevent prenatal compromise, a problem known as the treatment paradox. AI is unlikely to overcome limitations of the FHR pattern itself, such as atypical NST pattern acquisition, for which visual interpretation is better suited. Poor AI accuracy for atypical NST patterns is acceptable provided that non-reactive NST patterns are not missed. AI would be most appropriate for studying large numbers of atypical NST patterns from hundreds of thousands of patients. One solution to the poor interpretation of atypical NST patterns is to develop inherently interpretable models.

Given the variation among obstetricians and the similar agreement achieved by AI, AI has a competitive ability to identify NST patterns and potential for evaluating fetal well-being^[22]. In addition, an AI warning system could notify pregnant women with atypical or non-reactive NST patterns to seek further care in the clinic. Real-time operation and high accuracy are important in NST pattern reading and fetal well-being evaluation^[13, 23]. Hence, we believe that AI could be used in outpatient home monitoring and could improve surveillance of high-risk pregnancies. Although the agreement of AI was shown retrospectively, demonstrating its benefits in real-world practice would require prospective clinical studies focusing on obstetric outcomes compared with visual interpretation by obstetricians.

A strength of this study is the large real-world dataset used to investigate AI interpretation of prenatal NST patterns. In addition, the study enrolled not only pregnant women without pathology but also high-risk pregnancies. One limitation of this study is the high false-positive rate of the AI system for atypical NST patterns, which might lead to unnecessary intervention. Indeed, patients would repeat the NST if the AI system gave an early warning. Another limitation is that the accuracy and agreement of the AI system were tested only in the antepartum period; although the results were encouraging, a larger study is needed to replicate and validate them. The study was also limited to pregnant women from 32 weeks of gestation to term; future investigations should include pregnancies from 28 weeks onward.

Conclusions

In antepartum FHR monitoring, AI is changing the traditional interpretation model of obstetricians. AI identified nearly all non-reactive NST patterns confirmed by obstetricians' visual interpretation and showed high accuracy and agreement for reactive NST patterns. The AI system could be considered an additional antepartum safety net; that is, AI could serve as a support tool for decision-making by obstetric providers. Researchers should improve AI's ability to interpret atypical NST patterns as it is adopted into future use.

Acknowledgments

Not applicable.

Authorship confirmation/contribution statement

Caixia Zhu and Zhuyu Li: Review and editing (equal). Caixia Zhu: Writing original draft. Zhuyu Li: Formal analysis. Xietong Wang: Investigation and data curation. Bin Xu and Xiaohui Guo: Software (equal). Jingwan Huang and Bin Liu: Project administration. Hongyan Li: Methodology. Yan Kong: Data curation. Xiaobo Yang: Methodology. Jingyu Du: Validation and data curation. Zilian Wang and Haitian Chen: Conceptualization and supervision (equal). All authors read and approved the final manuscript.

Author(s’) disclosure

The authors declare that they have no competing interests.

Funding statement

This work was supported by the Guangdong Provincial Natural Science Foundation (No. 2021A1515010411) and a China Medical Board Grant (No. 21-413).

The statement on informed consent to participate

Written informed consent was obtained from participants for the use of their clinical details.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

[1] SANDMIRE H F, DEMOTT R K. Electronic fetal heart rate monitoring: research guidelines for interpretation [J]. Am J Obstet Gynecol, 1998, 179(1): 276-7.
[2] SANTO S, AYRES-DE-CAMPOS D, COSTA-SANTOS C, et al. Agreement and accuracy using the FIGO, ACOG and NICE cardiotocography interpretation guidelines [J]. Acta Obstet Gynecol Scand, 2017, 96(2): 166-75.
[3] LISTON R, SAWCHUCK D, YOUNG D. Fetal health surveillance: antepartum and intrapartum consensus guideline [J]. J Obstet Gynaecol Can, 2007, 29(9 Suppl 4): S3-56.
[4] MACONES G A, HANKINS G D, SPONG C Y, et al. The 2008 National Institute of Child Health and Human Development workshop report on electronic fetal monitoring: update on definitions, interpretation, and research guidelines [J]. J Obstet Gynecol Neonatal Nurs, 2008, 37(5): 510-5.
[5] EVANS M I, BRITT D W, EVANS S M, et al. Changing Perspectives of Electronic Fetal Monitoring [J]. Reprod Sci, 2022, 29(6): 1874-94.
[6] DEVANE D, LALOR J. Midwives' visual interpretation of intrapartum cardiotocographs: intra- and inter-observer agreement [J]. J Adv Nurs, 2005, 52(2): 133-41.
[7] CHAUHAN S P, KLAUSER C K, WOODRING T C, et al. Intrapartum nonreassuring fetal heart rate tracing and prediction of adverse outcomes: interobserver variability [J]. Am J Obstet Gynecol, 2008, 199(6): 623.e1-5.
[8] FIGUERAS F, ALBELA S, BONINO S, et al. Visual analysis of antepartum fetal heart rate tracings: inter- and intra-observer agreement and impact of knowledge of neonatal outcome [J]. J Perinat Med, 2005, 33(3): 241-5.
[9] IFTIKHAR P, KUIJPERS M V, KHAYYAT A, et al. Artificial Intelligence: A New Paradigm in Obstetrics and Gynecology Research and Clinical Practice [J]. Cureus, 2020, 12(2): e7124.
[10] GALAZIOS G, TRIPSIANIS G, TSIKOURAS P, et al. Fetal distress evaluation using and analyzing the variables of antepartum computerized cardiotocography [J]. Arch Gynecol Obstet, 2010, 281(2): 229-33.
[11] SHI X, YAMAMOTO K, OHTSUKI T, et al. Non-invasive Fetal ECG Signal Quality Assessment based on Unsupervised Learning Approach [J]. Annu Int Conf IEEE Eng Med Biol Soc, 2022, 2022: 1296-9.
[12] BALAYLA J, SHREM G. Use of artificial intelligence (AI) in the interpretation of intrapartum fetal heart rate (FHR) tracings: a systematic review and meta-analysis [J]. Arch Gynecol Obstet, 2019, 300(1): 7-14.
[13] CHEN Y, WILKINS M D, BARAHONA J, et al. Toward Automated Analysis of Fetal Phonocardiograms: Comparing Heartbeat Detection from Fetal Doppler and Digital Stethoscope Signals [J]. Annu Int Conf IEEE Eng Med Biol Soc, 2021, 2021: 975-9.
[14] LANDIS J R, KOCH G G. The measurement of observer agreement for categorical data [J]. Biometrics, 1977, 33(1): 159-74.
[15] UCCELLA S, CROMI A, COLOMBO G F, et al. Interobserver reliability to interpret intrapartum electronic fetal heart rate monitoring: Does a standardized algorithm improve agreement among clinicians? [J]. J Obstet Gynaecol, 2015, 35(3): 241-5.
[16] FONTENLA-ROMERO O, ALONSO-BETANZOS A, GUIJARRO-BERDIÑAS B. Adaptive pattern recognition in the analysis of cardiotocographic records [J]. IEEE Trans Neural Netw, 2001, 12(5): 1188-95.
[17] BLACKWELL S C, GROBMAN W A, ANTONIEWICZ L, et al. Interobserver and intraobserver reliability of the NICHD 3-Tier Fetal Heart Rate Interpretation System [J]. Am J Obstet Gynecol, 2011, 205(4): 378.e1-5.
[18] BOUDET S, HOUZÉ DE L'AULNOIT A, PEYRODIE L, et al. Use of Deep Learning to Detect the Maternal Heart Rate and False Signals on Fetal Heart Rate Recordings [J]. Biosensors (Basel), 2022, 12(9).
[19] DAS S, OBAIDULLAH S M, MAHMUD M, et al. A machine learning pipeline to classify foetal heart rate deceleration with optimal feature set [J]. Sci Rep, 2023, 13(1): 2495.
[20] CÖMERT Z, KOCAMAZ A F, SUBHA V. Prognostic model based on image-based time-frequency features and genetic algorithm for fetal hypoxia assessment [J]. Comput Biol Med, 2018, 99: 85-97.
[21] LIU L C, TSAI Y H, CHOU Y C, et al. Concordance analysis of intrapartum cardiotocography between physicians and artificial intelligence-based technique using modified one-dimensional fully convolutional networks [J]. J Chin Med Assoc, 2021, 84(2): 158-64.
[22] AL-YOUSIF S, NAJM I A, TALAB H S, et al. Intrapartum cardiotocography trace pattern pre-processing, features extraction and fetal health condition diagnoses based on RCOG guideline [J]. PeerJ Comput Sci, 2022, 8: e1050.
[23] DUVAL A, NOGUEIRA D, DISSLER N, et al. A hybrid artificial intelligence model leverages multi-centric clinical data to improve fetal heart rate pregnancy prediction across time-lapse systems [J]. Hum Reprod, 2023, 38(4): 596-608.

No competing interests reported.
