Study population
In this study, we retrospectively enrolled consecutive patients who underwent transthoracic echocardiography (TTE) from December 2019 to December 2020. The Institutional Review Board at Yonsei University Severance Cardiovascular Hospital approved the study protocol. Inclusion required that all echocardiographic images visually correspond to standard views. We excluded patients with a diagnosis of heart failure, coronary artery disease, or valvular heart disease; patients with known pregnancy; studies with uninterpretable image quality or images acquired at non-standard scan angles; and patients with otherwise abnormal echocardiograms (Table 1). All patients were scanned in the left lateral decubitus position using grayscale second-harmonic 2D imaging, with image contrast, frequency, depth, and sector size adjusted for an adequate frame rate and optimal LV border visualization. Every patient received a complete quantification report, which was validated by cardiologists. Echocardiographic images were acquired on standard ultrasound equipment: Vivid 9 (GE Healthcare, Horten, Norway; n = 259), EPIQ 7C (Philips Healthcare, Andover, MA, USA; n = 170), Acuson SC2000 (Siemens, Mountain View, CA, USA; n = 42), and Artida (Canon Medical Systems, Tokyo, Japan; n = 29).
Ground-truth generation
Anonymized echocardiographic digital images were analyzed and annotated in a core laboratory, where five expert sonographers, each with more than 5 years of echocardiographic experience, manually contoured cardiac structures according to the recommendations of the American Society of Echocardiography [10]. For each patient, we selected a set of B-mode images comprising the apical two-chamber (A2C) view, apical four-chamber (A4C) view, and parasternal short-axis (PSAX) view at the papillary muscle (PM) level, each including at least one cardiac cycle. Within each multi-frame video, manual annotation of the chambers at end-diastole (ED) and end-systole (ES) was performed using a commercial annotation tool (OsiriX, Pixmeo, Switzerland). The left ventricle (LV) and left atrium (LA) were delineated when visible in the A2C, A4C, and PSAX views. In the A4C view, we additionally delineated the right ventricle (RV) and right atrium (RA), yielding all four chambers (Figure 1). During LV endocardial wall segmentation, trabeculations and papillary muscles were excluded in the PSAX view at the PM level.
Inter- and intra-observer reproducibility analysis
To validate the manually created ground-truth annotations, we performed both inter- and intra-observer variability analyses. Two sonographers, each with more than 5 years of echocardiography experience and more than 5,000 echocardiographic examinations, were chosen, and the variability of clinical indices was assessed on randomly selected echocardiographic images of the A2C, A4C, and PSAX views in 100 patients.
One sonographer manually annotated the LV, LA, RV, and RA contours on the three views at end-diastole (ED) and end-systole (ES) twice, one month apart, with the cases randomly shuffled to prevent the observer from being influenced by previous measurements (intra-observer variability). The other sonographer annotated the same images independently (inter-observer variability).
Dataset splitting
A total of 500 echocardiograms were allocated for developing the automated deep learning methods. Each echocardiogram consisted of dozens of consecutive still frames. All information that could identify individual patients was removed. Echocardiographic images were extracted from anonymized DICOM files, and the unorganized videos were grouped according to their views. The entire dataset of 500 patients was divided into training (80%, n = 400), validation (10%, n = 50), and test (10%, n = 50) sets. Five-fold cross-validation was employed to analyze generalization performance: the dataset was divided into five groups; in each round, four subsets were used for training and validation of the network, and the remaining subset was used to evaluate the model.
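The patient-level five-fold partition described above can be sketched as follows (function and variable names are illustrative, not taken from the study's code):

```python
import random

def five_fold_split(patient_ids, seed=42):
    """Partition patient IDs into five disjoint folds for cross-validation.

    Splitting is done at the patient level so that frames from one
    echocardiogram never appear in both training and test sets.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::5] for i in range(5)]

# In each cross-validation round, four folds form the training/validation
# pool and the held-out fold is used for testing.
folds = five_fold_split(range(500))
for k in range(5):
    test_ids = folds[k]
    train_val_ids = [pid for j in range(5) if j != k for pid in folds[j]]
```

Splitting by patient rather than by frame avoids leakage of near-identical frames from one cardiac cycle across the train/test boundary.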
Deep Learning-Based Algorithms
To automatically segment cardiac structures in echocardiography, we employed three deep learning models based on U-net, an architecture developed for biomedical image segmentation that has demonstrated high performance on organ segmentation [11]. The U-net consists of a fully convolutional contracting encoder path (the backbone) and a symmetric expanding decoder path for segmentation [11]. We constructed three deep learning models on this encoder-decoder architecture, replacing the U-net backbone with residual and dense blocks, which have shown robust performance in image classification, while retaining the U-net decoder path [12, 13]. The flow chart of the deep learning methods is shown in Figure 2.
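The symmetric encoder-decoder layout can be illustrated by tracing feature-map sizes through the network; this is a schematic sketch of the architecture's shape, not the study's implementation:

```python
def unet_feature_sizes(input_size=512, depth=4):
    """Trace spatial feature-map sizes through a U-net of a given depth.

    The encoder halves the spatial resolution at each level; the decoder
    mirrors it, upsampling and concatenating the matching encoder feature
    map at each level (the skip connection).
    """
    encoder = [input_size // 2 ** d for d in range(depth + 1)]
    decoder = encoder[-2::-1]  # upsampling path mirrors the encoder
    skips = list(zip(encoder[-2::-1], decoder))  # matched-resolution pairs
    return encoder, decoder, skips

# With 512 x 512 inputs and four downsampling stages:
enc, dec, skips = unet_feature_sizes(512, 4)
# enc: [512, 256, 128, 64, 32]; dec: [64, 128, 256, 512]
```

The matched-resolution pairs are what allow the decoder to recover fine border detail lost during downsampling; swapping residual or dense blocks into the encoder changes the backbone without altering this symmetry.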
Training strategy
Given input images and the corresponding annotated masks for each view, data augmentation was performed with up-down flips, left-right flips, and rotation. A pixel-wise cross-entropy loss function was employed to penalize segmentation errors, and an Adam optimizer with a learning rate of 0.0001 was used to update the network parameters. We trained our models from scratch without any pretrained weights for initialization. The training dataset was randomly shuffled, and each network was trained for 200 epochs with a mini-batch size of 5. All input images were resized to 512 × 512 pixels due to GPU memory limitations, and the intensity values were normalized to the range [0, 1].
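The augmentation and normalization steps can be sketched on plain nested-list images (a minimal illustration; the actual pipeline would operate on full-resolution image arrays, and these helper names are hypothetical):

```python
def flip_up_down(img):
    # Reverse row order (vertical flip).
    return img[::-1]

def flip_left_right(img):
    # Reverse each row (horizontal flip).
    return [row[::-1] for row in img]

def rotate_90(img):
    # Rotate 90 degrees clockwise.
    return [list(row) for row in zip(*img[::-1])]

def normalize(img):
    # Scale pixel intensities into [0, 1], as done before training.
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1
    return [[(v - lo) / span for v in row] for row in img]
```

The same flip or rotation must be applied to the annotated mask as to the image, so that pixel-wise labels stay aligned with the augmented input.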
Performance analysis of deep-learning models
The Dice similarity coefficient (DSC), intersection over union (IOU), and clinical indices such as volume, mass, and ejection fraction (EF) were used to compare the performance of the deep learning methods. Both the DSC and IOU quantify the pixel-wise similarity between the model-predicted segmentation mask and the ground truth, ranging from 0 (no overlap) to 1 (identical). The segmentation results of each deep learning model were used to compute clinical indices for chamber quantification and ejection fraction according to standard guidelines. Left and right ventricular volumes were calculated by the area-length method from the long-axis length and chamber areas in the A2C and A4C views [10]. Left ventricular mass (LVM) was calculated as the myocardial volume, derived from the delineated endocardial and epicardial borders, multiplied by the specific gravity of myocardial tissue (assuming a tissue density of 1.05 g/ml). These annotations, established and verified by board-certified cardiologists, were used as the ground truth for the deep learning models.
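The overlap metrics and volume formulas above reduce to a few lines. The volume function uses the standard biplane area-length form, V = 8·A_A2C·A_A4C / (3πL); the function names are illustrative, and inputs are assumed to be in consistent units (cm², cm, ml):

```python
import math

def dice(pred, truth):
    # DSC = 2|A ∩ B| / (|A| + |B|) over flat binary pixel masks.
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

def iou(pred, truth):
    # IOU = |A ∩ B| / |A ∪ B|.
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0

def biplane_area_length_volume(area_a2c, area_a4c, length):
    # Biplane area-length method: V = 8 * A_2C * A_4C / (3 * pi * L).
    return 8 * area_a2c * area_a4c / (3 * math.pi * length)

def lv_mass(epi_volume_ml, endo_volume_ml, density=1.05):
    # LVM = myocardial shell volume (epicardial minus endocardial)
    # multiplied by the assumed tissue density of 1.05 g/ml.
    return (epi_volume_ml - endo_volume_ml) * density
```

EF then follows directly as (EDV − ESV) / EDV from the ED and ES volumes computed this way.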
Statistical analysis
Normally distributed continuous variables were presented as mean ± SD, and non-normally distributed variables as median with interquartile range (IQR). Agreement between ground truth and prediction results was assessed by the paired t-test and the Pearson correlation coefficient using two-sided p values; a p value < 0.05 was considered significant. Bland–Altman plots with 95% limits of agreement were constructed. Inter- and intra-observer reproducibility were assessed by the intraclass correlation coefficient (ICC) for absolute agreement of single measures between two observers. All statistical analyses were performed using commercially available software (MedCalc, version 18.9, MedCalc Software, Mariakerke, Belgium).
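For illustration, the Pearson correlation and the Bland–Altman bias with 95% limits of agreement reduce to a few lines (a sketch with illustrative names, not the MedCalc computation):

```python
from statistics import mean, stdev

def pearson_r(x, y):
    # Pearson correlation coefficient between paired measurements.
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def bland_altman(x, y):
    # Bias (mean difference) and 95% limits of agreement (bias +/- 1.96 SD).
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, (bias - spread, bias + spread)
```

Bland–Altman analysis complements correlation: two methods can correlate strongly yet disagree systematically, which the bias and limits of agreement expose.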