This study aimed to determine the effect of segmentation guidance on AI performance in predicting clinical pregnancy from embryo images. AI predictive performance improved when the model was guided by ICM and TE segmentation. Although the tendency of AI to overlook the appropriate areas of embryo images has previously been pointed out as a problem, there have been no reported attempts to solve it. To the best of our knowledge, this is the first study to verify improvements in AI predictive performance using ICM and TE segmentation.
Segmentation technology is commonly employed in various medical imaging domains, and its use in the computer-aided diagnosis of medical images is increasingly recognized as valuable. This approach allows large volumes of medical data to be used effectively while mitigating the risk of misdiagnosis resulting from subjective visual observation 22–24. Research in other medical imaging areas has indicated that AI trained on datasets simplified through segmentation achieves higher diagnostic performance because it focuses on the regions of interest 17,25,26. Although the operating mechanism of deep learning algorithms remains elusive, it is widely accepted that, under the same conditions, the less complex the data, the higher the learning efficiency that AI can achieve 27.
The ICM and TE morphologies have long been used worldwide to evaluate and screen embryos. Since these structures play important roles in determining the fate of cells during embryonic differentiation, many studies have investigated the correlation between their morphology and pregnancy rates 11,13. Several studies have shown that the ICM and TE are also strongly related to the live birth rate, and the ICM is known as a major factor that can predict euploid embryos 14,28. Although we expected that training on segmented images would lead the AI model to learn and accurately represent the morphology of the ICM and TE, the results were not as good as anticipated. In particular, when explaining the inferences of the deep learning model using techniques such as Grad-CAM, we noticed that the model often failed to focus on the ICM or TE. However, the enhanced ICM and TE images proposed in this study increase the emphasis that the deep learning model places on the morphology of these embryonic structures. This approach provides a strong basis for creating an AI system that can predict clinical pregnancy by incorporating relevant clinical domain knowledge.
One of the major strengths of this study is its high-quality dataset. The key to developing an AI model is to train it with a sufficient volume of good data. This study used nationally curated data that were well refined and labeled from embryo images collected from multiple institutions and were endorsed by a third-party inspector. Since the dataset was created as part of a government-funded project, quality control was adequately performed, and the original dataset is scheduled to be made public in the future, ensuring external reliability.
We additionally investigated the incorrectly predicted cases, particularly in the four categories shown in Table 3. Two laboratory directors closely examined the original and enhanced ICM and TE images. For clear images, both the original model and the enhanced ICM and TE model correctly predicted the actual outcome. For less-clear images, the segmentation model outperformed the original model. For inherently messy images, both models failed to predict pregnancies. Of the 75 messy images, 25 showed irregular contours of the zona pellucida, embedded sperm in the zona, or cytoplasmic darkness, and the AI may have misinterpreted these features as fragmentation. The use of well-focused, clear images helps ensure fair performance of the AI model. Furthermore, image reconstruction using ICM and TE segmentation was validated and may be helpful for less-clear images to a certain extent. However, for cases where the original model outperformed the segmentation model, no specific pattern was found in Grad-CAM; potential explanations include variables such as clinical or genetic information, which may help overcome the limitations of morphological assessment.
The Hosmer–Lemeshow test, conducted to assess calibration, yielded a significant result (p-value < 0.001), indicating poor calibration, with a Brier score of 0.241. To address this issue, we applied isotonic regression and obtained a well-calibrated model with a p-value of 0.265 and a Brier score of 0.178. A calibration plot is shown in Fig. 3. However, even after applying isotonic regression, the true positive rate decreased in the range of high predicted probabilities (0.7–0.9). We confirmed that this was due to incorrectly predicted cases: the AI model, having been trained successfully on ICM and TE images, predicted good-quality embryos, but pregnancy did not occur owing to various complex factors. It is important to note that calibration can be more difficult with small sample sizes because there may be insufficient information to accurately estimate the mapping between the predicted scores and probabilities. In such cases, it may be necessary to collect additional data or employ alternative methods to enhance calibration. Therefore, in future research, we aim to collect a global dataset that includes both domestic and foreign data and to develop a more accurate and robust model.
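To make the recalibration step concrete, the following is a minimal sketch of the Brier score and of isotonic regression fitted with the pool-adjacent-violators algorithm. It is an illustration only, not the implementation used in this study: the function names and the toy scores are assumptions, the Hosmer–Lemeshow test is not reproduced, and in practice the mapping would be fitted on held-out data rather than on the data being evaluated.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def isotonic_fit(scores: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Return calibrated probabilities (in input order) that are
    non-decreasing in the raw score, via pool-adjacent-violators."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])

    # Pass 1: pool tied scores so identical scores share one calibrated value.
    groups: List[List[int]] = []  # each entry is [sum of labels, count]
    prev = None
    for i in order:
        if groups and scores[i] == prev:
            groups[-1][0] += labels[i]
            groups[-1][1] += 1
        else:
            groups.append([labels[i], 1])
        prev = scores[i]

    # Pass 2: merge adjacent blocks while the earlier block has a higher mean.
    blocks: List[List[int]] = []
    for total, count in groups:
        blocks.append([total, count])
        # Compare the two means by cross-multiplication to avoid division.
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            t, c = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c

    # Assign each block's mean outcome to every sample in that block.
    calibrated = [0.0] * len(scores)
    pos = 0
    for total, count in blocks:
        for _ in range(count):
            calibrated[order[pos]] = total / count
            pos += 1
    return calibrated


# Toy data (hypothetical): raw model scores and clinical pregnancy outcomes.
raw_scores = [0.1, 0.2, 0.3, 0.4]
outcomes = [0, 1, 0, 1]
calibrated_scores = isotonic_fit(raw_scores, outcomes)
brier_before = brier_score(raw_scores, outcomes)
brier_after = brier_score(calibrated_scores, outcomes)
```

Because isotonic regression only enforces a monotone mapping from score to probability, it cannot correct cases in which a high score is assigned to an embryo that does not result in pregnancy for reasons not visible in the image.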
To generalize our results, an international study is required, as our study was conducted in a population with the same racial background. In addition, because our dataset consisted of images from multiple institutions, the color and sharpness of the images varied slightly. Although this allows for better validation performance than training on data from a single institution, it is still difficult to guarantee the same performance when unfamiliar, heterogeneous images are processed. Furthermore, to automate our method, we must develop an algorithm that automatically segments the ICM and TE areas. In future research, we plan to develop an ICM and TE segmentation model that automatically segments these areas from the original embryo image, creates an enhanced ICM and TE image, and feeds it as input to the AI model.
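The planned pipeline (segment, enhance, then predict) can be sketched for the enhancement step as follows. This is a hypothetical illustration: the attenuation rule, the `background_gain` parameter, and the function name `enhance_regions` are assumptions introduced here and do not describe the enhancement procedure used in this study. The masks are assumed to come from a separate segmentation model.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel intensities in [0, 1]
Mask = List[List[bool]]    # True where a pixel belongs to the structure


def enhance_regions(
    image: Image, icm: Mask, te: Mask, background_gain: float = 0.3
) -> Image:
    """Keep ICM and TE pixels unchanged and attenuate all other pixels.

    The attenuation rule and background_gain are illustrative assumptions,
    not the method reported in the paper.
    """
    return [
        [
            pixel if (icm[r][c] or te[r][c]) else pixel * background_gain
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]


# Toy 2x2 example (hypothetical values).
embryo = [[1.0, 0.5], [0.2, 0.8]]
icm_mask = [[True, False], [False, False]]
te_mask = [[False, False], [False, True]]
enhanced = enhance_regions(embryo, icm_mask, te_mask, background_gain=0.5)
```

In an automated system, the output of such a step would be passed to the prediction model in place of the original image.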
In conclusion, this study demonstrates that the predictive performance of AI improves when enhanced ICM and TE images are used. AI can provide objective information to support the assessments of embryologists; however, it often analyzes irrelevant parts of images, leading to incorrect results, particularly when the image is out of focus. Our research findings may help generalize the AI model for application to embryo images with varying focus. Further research can serve as the first step towards transparent AI models for embryo assessment.