The non-stress test (NST) is a key tool for fetal monitoring during the antepartum period. Because visual interpretation of fetal heart rate (FHR) patterns is often delayed and differs between obstetricians, AI interpretation of FHR patterns against standard criteria is important. This is the first study to explore the agreement between AI and obstetricians for prenatal NST patterns. Five obstetricians who applied strict criteria showed moderate agreement on prenatal NST patterns. AI yielded excellent agreement for reactive NST patterns, consistent with the obstetricians’ interpretation. With respect to non-reactive NST patterns, no case was misidentified by the AI system.
In the present study, we identified moderate inter-obstetrician agreement for NST pattern interpretation, with a kappa of 0.48, which was much higher than previously reported. In contrast, Uceclla [15] showed that the reliability of obstetricians’ interpretation of the FHR was poor. In addition, various approaches used to determine clinician agreement in interpretation have shown poor levels of agreement. Devena [6] reported that agreement among obstetricians was 0.46 for accelerations and 0.11–0.55 for decelerations. However, Fontenla [16] demonstrated that the inter- and intra-observer variability in visual interpretation was high owing to the rich clinical experience of obstetric experts. Similarly, we found that all obstetricians correctly identified the majority of reactive NST patterns (Pa, 0.85).
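To make the agreement statistic concrete, the following is a minimal sketch of how a multi-rater kappa can be computed from category counts. It assumes Fleiss' kappa, which is a common choice when several raters classify the same tracings; the text does not state which kappa variant produced the value of 0.48. The function name, the three categories, and the toy counts are illustrative assumptions and are not study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned tracing i to category j.

    Assumption: this is one common multi-rater kappa; the study's exact
    statistic is not specified in the text.
    """
    if not counts:
        raise ValueError("need at least one rated tracing")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every tracing needs the same number (>= 2) of ratings")

    n_subjects = len(counts)
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each tracing (proportion of agreeing rater pairs).
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance

    if p_e == 1.0:
        # Every rating is in a single category; kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 4 tracings, 5 raters,
# categories ordered as (reactive, atypical, non-reactive).
toy_counts = [
    [5, 0, 0],
    [4, 1, 0],
    [0, 2, 3],
    [0, 0, 5],
]
print(round(fleiss_kappa(toy_counts), 3))
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why a high raw proportion of agreement on the dominant reactive category can coexist with only a moderate kappa.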
We showed that the AI system identified all non-reactive NST patterns and was highly consistent with the obstetricians. For non-reactive NST patterns, the agreement between AI and the five obstetricians was high, which is an important finding for clinical practice. In a meta-analysis, Balayla [12] demonstrated that the inter-rater reliability between obstetricians and AI systems was moderate for the interpretation of intrapartum FHR. We found that some atypical NST patterns were misrecognized or misread by the AI system, and the handling of atypical NST patterns is also a limitation of this study. Previously, Blackwell [17] demonstrated that agreement on atypical NST patterns was poor owing to a lack of consistency in distinguishing minimal from absent variability. We speculate that the AI system’s definition of absent variability was too stringent [18]. When a non-reactive NST pattern was detected by AI, the pregnant woman requested a repeat NST or visited the clinic. Reactive NST patterns were easier to interpret, owing to the exponentially accelerating knowledge acquisition inherent to AI [19].
Owing to the delayed interpretation of NST patterns and the discrepancy between obstetricians, analysis of prenatal NST patterns with an AI system is important [17]. In our preliminary study, a higher kappa value for NST pattern recognition between AI and obstetricians was detected among NST patterns from term pregnancies. In addition, the AI system showed excellent agreement for NST patterns from preterm pregnancies, better than that for term pregnancies. Unsurprisingly, the kappa score was similar to that of visual interpretation by obstetricians for term pregnancies. The concordance analysis of non-reactive NST patterns showed high agreement for AI, which we interpret as comparable to inter-obstetrician agreement and as evidence of positive predictive ability. An AI model could help obstetricians assess fetal well-being quickly during the antepartum period, which not only relieves time constraints on obstetricians but also enables high-quality care for the fetus. In this context, high-risk pregnant women require NST two or more times each day to ensure sufficient observation, and interpreting so many results is a major challenge for obstetricians. Further, with advances in computer processing speed, AI can provide real-time interpretation of NST patterns to prompt re-evaluation and subsequent intervention. In 2018, AI was tested in cardiotocography analysis, showing that its performance was similar to that of obstetricians and that it could detect errors [20]. AI has an inherent weakness: the algorithm is defined by clinicians, which can lead to incorrect interpretation [11]. The absence of a fixed definition of atypical NST patterns might be another explanation.
The AI system approach shows great potential owing to its high sensitivity for non-reactive NST patterns. In accordance with the present study, Liu [21] analyzed 3239 FHR patterns with AI and obstetricians and found moderate agreement for AI (kappa, 0.525). However, that report found a higher false-positive rate (0.632) than our study. Notably, false-positive predictions from AI may lead to aggressive clinical intervention intended to prevent prenatal compromise, which is known as the treatment paradox. AI is unlikely to overcome limitations of the FHR pattern itself, such as the acquisition of atypical NST patterns, for which visual interpretation is better suited. Poor accuracy of AI in interpreting atypical NST patterns is acceptable provided that no non-reactive NST pattern is missed. AI would be most appropriate for studying large numbers of atypical NST patterns from hundreds of thousands of patients. One solution to poor interpretation of atypical NST patterns is to develop inherently interpretable models.
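As an illustration of the two quantities discussed above, the sketch below computes sensitivity and false-positive rate for one target class against a reference reading. It assumes the conventional definitions, sensitivity = TP / (TP + FN) and false-positive rate = FP / (FP + TN); the text does not specify how the cited false-positive rate of 0.632 was defined. The function name, labels, and readings are hypothetical and are not study data.

```python
def sensitivity_and_fpr(reference, predicted, positive="non-reactive"):
    """Return (sensitivity, false-positive rate) for one target class.

    `reference` holds the reference labels (for example, an obstetrician
    consensus) and `predicted` holds the AI labels. Both the labels and
    the choice of reference standard are assumptions for illustration.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    tp = fn = fp = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1   # target pattern correctly flagged
            else:
                fn += 1   # target pattern missed
        else:
            if pred == positive:
                fp += 1   # false alarm
            else:
                tn += 1   # correctly not flagged

    # A rate is undefined when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, fpr


# Hypothetical readings of six tracings.
reference = ["non-reactive", "non-reactive", "reactive",
             "reactive", "reactive", "reactive"]
predicted = ["non-reactive", "non-reactive", "non-reactive",
             "reactive", "reactive", "reactive"]
sens, fpr = sensitivity_and_fpr(reference, predicted)
print(sens, fpr)  # 1.0 0.25
```

In this toy case no non-reactive tracing is missed, but one reactive tracing is flagged, which mirrors the trade-off described above: complete sensitivity at the cost of some false alarms that would trigger a repeat NST.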
Considering the variation among obstetricians and the similar agreement achieved by AI, AI shows a competitive ability to identify NST patterns and potential for evaluating fetal well-being [22]. In addition, an AI warning system could notify pregnant women of atypical or non-reactive NST patterns so that they seek further care in the clinic. Real-time interpretation and high accuracy are important in NST pattern reading and fetal well-being evaluation [13, 23]. Hence, we believe that AI could be used in outpatient care for home monitoring and could improve surveillance of high-risk pregnancies. Although the agreement of AI has been shown retrospectively, establishing its benefits in a real-world setting would require prospective clinical studies that focus on obstetric outcomes compared with visual interpretation by obstetricians.
The strength of this study is the large dataset, collected in a real-world setting, that was used to investigate AI interpretation of prenatal NST patterns. In addition, the present study enrolled not only pregnant women without pathology but also high-risk pregnancies. One limitation of this study is that the false-positive rate of the AI system for atypical NST patterns was high, which might lead to unnecessary intervention. Indeed, patients would repeat the NST if the AI system gave an early warning. Another limitation is that the accuracy and agreement of the AI system were tested only in the antepartum period; although the system proved successful, a larger study is needed to replicate and validate these results. The current study was also limited to pregnant women presenting from 32 weeks to term, so future investigation including pregnancies from 28 weeks onward is needed.