Patient population and data preparation
We evaluated the data of patients who underwent LT between January 2017 and December 2019 at the First Affiliated Hospital, Sun Yat-sen University. Only LTs using grafts donated after brain death were included. Pediatric LTs, multiple organ transplants, retransplants, and split or living donor LTs were excluded. Of 386 patients eligible for EAD assessment, 1 with unclear data and 3 who died intraoperatively were excluded, as were 31 of the remaining patients because of missing image data. A total of 351 patients were therefore included in the analysis. They were divided into two data sets according to the year of LT: the training set comprised all eligible LT recipients from 2017 and 2019, and the testing set comprised those from 2018. The selection process is shown in Fig 1.
In our transplant center, images of the post-reperfusion liver are routinely captured with smartphones before abdominal closure for later analysis; examples are shown in Fig 2. Donor and recipient data were extracted from the electronic medical record system by experienced research assistants. This study was conducted in accordance with the Declaration of Helsinki and was approved by the institutional review board of Sun Yat-sen University. The requirement for written informed consent was waived because of the study's retrospective and observational nature. No organs from executed prisoners were used.
Definitions
Following the criteria proposed by Olthoff et al. [1], EAD was defined as the presence of one or more of the following: (1) total bilirubin ≥ 171 μmol/L (10 mg/dL) on postoperative day (POD) 7; (2) international normalized ratio (INR) ≥ 1.6 on POD 7; (3) serum alanine aminotransferase (ALT) or aspartate aminotransferase (AST) > 2000 IU/L within the first 7 days. The balance of risk (BAR) score, used as a predictor of patient survival after LT, was calculated with the online calculator at https://www.assessurgery.com/bar-score/bar-score-calculator/.
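Because the EAD label drives every model below, the three Olthoff criteria can be expressed as a small rule. The sketch below is illustrative only; the class and field names (`PostopLabs`, `is_ead`, `mg_dl_to_umol_l`) are hypothetical, and it assumes bilirubin is recorded in μmol/L (1 mg/dL ≈ 17.1 μmol/L).

```python
from dataclasses import dataclass

# Conversion factor for total bilirubin: 1 mg/dL is approximately 17.1 umol/L.
BILIRUBIN_UMOL_PER_MG_DL = 17.1


def mg_dl_to_umol_l(value_mg_dl: float) -> float:
    """Convert a total bilirubin value from mg/dL to umol/L."""
    return value_mg_dl * BILIRUBIN_UMOL_PER_MG_DL


@dataclass
class PostopLabs:
    bilirubin_pod7_umol_l: float  # total bilirubin on POD 7, umol/L
    inr_pod7: float               # international normalized ratio on POD 7
    peak_alt_iu_l: float          # highest ALT within the first 7 days, IU/L
    peak_ast_iu_l: float          # highest AST within the first 7 days, IU/L


def is_ead(labs: PostopLabs) -> bool:
    """True if at least one Olthoff criterion for early allograft dysfunction is met."""
    high_bilirubin = labs.bilirubin_pod7_umol_l >= 171.0  # i.e. >= 10 mg/dL
    high_inr = labs.inr_pod7 >= 1.6
    high_aminotransferase = max(labs.peak_alt_iu_l, labs.peak_ast_iu_l) > 2000.0
    return high_bilirubin or high_inr or high_aminotransferase
```

Note that the aminotransferase threshold is strict (> 2000 IU/L) while the bilirubin and INR thresholds are inclusive, matching the definition above.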
The following outcomes within 3 months after LT were assessed. Post-LT hydrothorax was diagnosed by ultrasound or thoracentesis. Hepatic artery thrombosis was defined as disruption of blood flow to the allograft through the hepatic artery, confirmed by angiography, Doppler ultrasound, or surgical exploration. Pulmonary infection was diagnosed when pathogens listed by Singh [12] were isolated from pleural fluid or respiratory secretions (bronchoalveolar lavage fluid or sputum). Pulmonary edema was evaluated from chest radiographs and arterial oxygenation according to established methods [13, 14]. Intra-abdominal abscess was defined as a fluid collection on CT or ultrasonography accompanied either by organisms detected in the fluid or by systemic or local signs of infection with other sources excluded. A bleeding event was defined as either (1) surgical bleeding requiring reoperation or (2) anatomical bleeding requiring transfusion but not surgical intervention.
Image data preprocessing
For each RGB liver image, the first step was manual segmentation to separate the liver tissue from the background: the liver contour was delineated by hand, and all pixels outside it were set to black. The second step differed between the two TA approaches. The pipelines of both methods are depicted in Fig 3.
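The background-removal step amounts to zeroing every pixel outside a binary liver mask. The sketch below assumes the manual contour has already been rasterized into a boolean mask of the same height and width as the image; how the contours were drawn and rasterized is not specified here, and `apply_liver_mask` is a hypothetical name.

```python
import numpy as np


def apply_liver_mask(rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Set every pixel outside the delineated liver region to black (0, 0, 0).

    rgb:  (H, W, 3) image array
    mask: (H, W) array, truthy inside the liver contour
    """
    if rgb.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    out = rgb.copy()                 # leave the original image untouched
    out[~mask.astype(bool)] = 0      # boolean index selects all 3 channels per pixel
    return out
```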
For the classical machine learning approach, each image was resized to 1000 × 1000 pixels and divided evenly into 25 non-overlapping patches of 200 × 200 pixels. A patch was considered valid only if hepatic tissue occupied at least 90% of its area. The minimum number of valid patches per image was set at 5, and to give every patient the same number of patches, 5 valid patches were randomly selected for each patient. Consequently, the data set for the classical machine learning approach comprised 1,145 patches from the training set and 610 patches from the testing set.
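The tiling and filtering rule can be sketched as follows. This assumes the input is the already masked and resized 1000 × 1000 image and that any non-black pixel counts as liver tissue (a simplification: checking the segmentation mask itself would be more robust to dark pixels inside the liver). The function names are hypothetical.

```python
import numpy as np

PATCH = 200                # patch side length in pixels
GRID = 5                   # 1000 / 200 = 5 patches per side, 25 in total
MIN_LIVER_FRACTION = 0.9   # a patch is valid if >= 90% of it is liver
PATCHES_PER_PATIENT = 5


def valid_patches(rgb: np.ndarray) -> list:
    """Split a 1000 x 1000 masked image into 25 tiles and keep the valid ones."""
    if rgb.shape[:2] != (PATCH * GRID, PATCH * GRID):
        raise ValueError("expected a 1000 x 1000 image")
    keep = []
    for r in range(GRID):
        for c in range(GRID):
            tile = rgb[r * PATCH:(r + 1) * PATCH, c * PATCH:(c + 1) * PATCH]
            # Background was set to black, so a non-black pixel is treated as liver.
            liver_fraction = np.any(tile > 0, axis=-1).mean()
            if liver_fraction >= MIN_LIVER_FRACTION:
                keep.append(tile)
    return keep


def sample_patches(rgb: np.ndarray, rng: np.random.Generator) -> list:
    """Randomly pick 5 distinct valid patches for one patient."""
    patches = valid_patches(rgb)
    if len(patches) < PATCHES_PER_PATIENT:
        raise ValueError("fewer than 5 valid patches in this image")
    idx = rng.choice(len(patches), size=PATCHES_PER_PATIENT, replace=False)
    return [patches[i] for i in idx]
```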
For the deep learning approach, each image was resized to 512 × 512 pixels, and data augmentation was applied to balance the EAD and non-EAD classes: the ImageDataGenerator class of Keras (Python 3.7) was used to double the number of EAD images in the training set (details are shown in Supplementary Table 1).
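The idea of doubling the minority class can be illustrated without Keras. The sketch below is a stand-in, not the study's configuration: the actual ImageDataGenerator settings are in Supplementary Table 1, whereas here each EAD image simply receives one copy with a random flip and 90° rotation. It assumes square images (as with 512 × 512 inputs), and all names are hypothetical.

```python
import numpy as np


def random_flip_rotate(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random horizontal/vertical flip and a random multiple of 90 degrees."""
    if rng.random() < 0.5:
        img = img[:, ::-1]           # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]           # vertical flip
    # Rotation keeps the shape only for square images.
    return np.rot90(img, k=int(rng.integers(0, 4)), axes=(0, 1))


def double_minority(images: np.ndarray, labels: np.ndarray, minority: int = 1, seed: int = 0):
    """Append one augmented copy of every minority-class image to the training set."""
    rng = np.random.default_rng(seed)
    extra = [random_flip_rotate(img, rng) for img, y in zip(images, labels) if y == minority]
    if not extra:
        return images, labels
    new_images = np.concatenate([images, np.stack(extra)])
    new_labels = np.concatenate([labels, np.full(len(extra), minority)])
    return new_images, new_labels
```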
In addition, sample-wise normalization provided by Keras was applied to all images before feature extraction in both TA approaches.
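Sample-wise normalization standardizes each image using only its own statistics. The sketch below is our understanding of what Keras' `samplewise_center` and `samplewise_std_normalization` options do (subtract the image mean, then divide by the image standard deviation plus a small epsilon); the function name and the epsilon value are assumptions.

```python
import numpy as np


def samplewise_normalize(img: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Center an image on its own mean and scale it by its own standard deviation."""
    x = img.astype(np.float32)
    x -= x.mean()          # statistics come from this image alone, not the data set
    x /= x.std() + eps     # eps guards against division by zero for flat images
    return x
```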
Classical machine learning model
Uniform rotation-invariant local binary patterns (LBPs) were computed for each RGB channel, and LBP histograms were built to obtain a multi-scale description of texture. Feature descriptors derived from gray-level co-occurrence matrices (GLCMs), namely contrast, correlation, energy, and homogeneity, were also calculated, together with intensity features (the mean and standard deviation of each RGB channel). These descriptors were computed with scikit-image; the technical details are shown in Supplementary Table 2. Concatenating all descriptors yielded 195 features per patch.
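The GLCM and intensity descriptors can be sketched in plain NumPy to show what each quantity measures. In scikit-image, the corresponding routines are `graycomatrix`/`graycoprops` (GLCM) and `local_binary_pattern(..., method="uniform")` (LBP, omitted here). The offsets, the 8 gray levels, and the assumption of 8-bit channel values are all illustrative choices, so this sketch yields 42 features rather than the study's 195.

```python
import numpy as np


def glcm(gray: np.ndarray, dr: int = 0, dc: int = 1, levels: int = 8) -> np.ndarray:
    """Symmetric, normalized co-occurrence matrix for one non-negative offset (dr, dc)."""
    q = (gray.astype(np.float64) * levels / 256).astype(int).clip(0, levels - 1)
    h, w = q.shape
    a = q[:h - dr, :w - dc]          # reference pixels
    b = q[dr:, dc:]                  # neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    m = m + m.T                      # count each pair in both directions
    return m / m.sum()


def glcm_props(p: np.ndarray) -> np.ndarray:
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    energy = np.sqrt(np.sum(p ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    # A constant patch has zero variance; report correlation 1 in that case.
    if sd_i * sd_j > 0:
        correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        correlation = 1.0
    return np.array([contrast, correlation, energy, homogeneity])


def intensity_features(rgb: np.ndarray) -> np.ndarray:
    """Mean and standard deviation of each RGB channel."""
    x = rgb.reshape(-1, 3).astype(np.float64)
    return np.concatenate([x.mean(axis=0), x.std(axis=0)])


def patch_features(rgb: np.ndarray) -> np.ndarray:
    """Concatenate intensity and per-channel GLCM features for one patch."""
    feats = [intensity_features(rgb)]
    for ch in range(3):
        for dr, dc in [(0, 1), (1, 0), (1, 1)]:   # hypothetical offsets
            feats.append(glcm_props(glcm(rgb[..., ch], dr, dc)))
    return np.concatenate(feats)
```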
The synthetic minority over-sampling technique (SMOTE) was used to balance the classes in the training set. SMOTE synthesizes new minority-class samples by linear interpolation between a minority sample and one of its nearest minority-class neighbors, which here yielded a training set with a 1:1 ratio of EAD to non-EAD patches. A supervised support vector machine (SVM) was then trained on this balanced set to build the predictive model. The SVM hyperparameters were optimized by grid search with repeated random-split cross-validation: in each of 20 repetitions, one-third of the training data was randomly held out for validation, and the hyperparameter combination with the highest mean validation accuracy across the 20 repetitions was selected as optimal.
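The balancing and tuning steps can be sketched as below: a basic SMOTE written in NumPy, followed by scikit-learn's `GridSearchCV` with `ShuffleSplit(n_splits=20, test_size=1/3)`, which reproduces the "20 repetitions, one-third held out" scheme and selects the combination with the highest mean validation accuracy. The RBF kernel and the parameter grid are hypothetical, since the searched values are not reported here. The SMOTE sketch needs at least two minority samples.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.svm import SVC


def smote(x_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Basic SMOTE: interpolate between minority samples and their nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(x_min[:, None, :] - x_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    k = min(k, len(x_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    base = rng.integers(0, len(x_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # random position on the connecting segment
    return x_min[base] + gap * (x_min[partner] - x_min[base])


def balance_1to1(x: np.ndarray, y: np.ndarray, minority: int = 1, seed: int = 0):
    """Append synthetic minority samples until both classes have equal counts."""
    x_min = x[y == minority]
    n_new = int(np.sum(y != minority) - len(x_min))
    x_syn = smote(x_min, n_new, seed=seed)
    return np.vstack([x, x_syn]), np.concatenate([y, np.full(n_new, minority)])


def tune_svm(x: np.ndarray, y: np.ndarray, seed: int = 0) -> GridSearchCV:
    """Grid search where each of 20 repetitions holds out a random third for validation."""
    grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}  # hypothetical grid
    cv = ShuffleSplit(n_splits=20, test_size=1 / 3, random_state=seed)
    search = GridSearchCV(SVC(kernel="rbf"), grid, scoring="accuracy", cv=cv)
    return search.fit(x, y)  # best_params_ has the highest mean validation accuracy
```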
Convolutional neural network approach
Inspired by Google's Inception network, this study proposed a predictive model based on a CNN architecture, named CNN model 1, to classify EAD and non-EAD patients. The preprocessing described in the “Image data preprocessing” section ensured that all images in the training and testing sets had the same input dimensions. Global and local features were extracted from the images by a max-pooling layer and by 2D convolutional layers with a 1 × 1 kernel. These features were further convolved and then concatenated as the input to a 2D global average pooling layer. A final dense layer with softmax activation produced the prediction. The detailed architecture of CNN model 1 is presented in Supplementary Figure 1.
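To make the data flow concrete, the sketch below runs a forward pass through a hypothetical two-branch, Inception-style block in NumPy: one branch applies a 1 × 1 convolution followed by a 3 × 3 convolution, the other applies 3 × 3 max pooling followed by a 1 × 1 convolution; the branch outputs are concatenated, globally average-pooled, and passed through a dense layer with softmax. The branch layout, channel counts, and random weights are assumptions for illustration; the real architecture is in Supplementary Figure 1 and was trained in Keras. Inputs must be floating point.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def conv2d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stride-1, zero-padded 2D convolution (cross-correlation, as in CNN layers).

    x: (H, W, C_in); w: (k, k, C_in, C_out); b: (C_out,)
    """
    p = w.shape[0] // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    win = sliding_window_view(xp, w.shape[:2], axis=(0, 1))  # (H, W, C_in, k, k)
    return np.einsum("hwcij,ijco->hwo", win, w) + b


def maxpool_same(x: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k max pooling with stride 1 and padding that keeps the spatial size."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    return sliding_window_view(xp, (k, k), axis=(0, 1)).max(axis=(-2, -1))


def init_params(c_in: int = 3, n_classes: int = 2, seed: int = 0) -> dict:
    """Random weights for the hypothetical block (channel counts are made up)."""
    rng = np.random.default_rng(seed)

    def conv(k, ci, co):
        return rng.normal(0, 0.1, (k, k, ci, co)), np.zeros(co)

    return {
        "b1_1x1": conv(1, c_in, 8),    # branch 1: 1x1 convolution ...
        "b1_3x3": conv(3, 8, 16),      # ... then a 3x3 convolution
        "b2_1x1": conv(1, c_in, 16),   # branch 2: max pooling, then 1x1 convolution
        "dense": (rng.normal(0, 0.1, (32, n_classes)), np.zeros(n_classes)),
    }


def forward(img: np.ndarray, params: dict) -> np.ndarray:
    """Class probabilities for one (H, W, C) float image."""
    a = relu(conv2d_same(img, *params["b1_1x1"]))
    a = relu(conv2d_same(a, *params["b1_3x3"]))
    m = relu(conv2d_same(maxpool_same(img), *params["b2_1x1"]))
    feats = np.concatenate([a, m], axis=-1)   # (H, W, 16 + 16)
    pooled = feats.mean(axis=(0, 1))          # global average pooling -> (32,)
    w, b = params["dense"]
    logits = pooled @ w + b
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()
```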
Model evaluation
Receiver operating characteristic (ROC) curves were plotted, and the area under the ROC curve (AUC) was used to quantify the discriminative ability of the proposed models. The model with the higher AUC on the testing set was selected as the basis for a further predictive model with clinical information as additional input. Sensitivity and specificity were computed from confusion matrices. Because each of these reflects only one aspect of model performance, the F1 score, the harmonic mean of precision and sensitivity, was also calculated:
F1 = 2 × precision × sensitivity / (precision + sensitivity), where precision = TP / (TP + FP), sensitivity = TP / (TP + FN), and TP, FP, and FN denote the numbers of true positives, false positives, and false negatives.
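The evaluation metrics can be computed directly from labels and predicted probabilities. The sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen EAD case scores higher than a randomly chosen non-EAD case, with ties counted as one half) and derives sensitivity, specificity, precision, and F1 from the confusion-matrix counts. The 0.5 decision threshold is an assumption, and the sketch does not guard against empty classes or zero denominators.

```python
import numpy as np


def auc_rank(y_true, score) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(score, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def confusion_metrics(y_true, score, threshold: float = 0.5) -> dict:
    """Sensitivity, specificity, precision and F1 at a probability threshold."""
    y = np.asarray(y_true).astype(bool)
    p = np.asarray(score, dtype=float) > threshold
    tp, tn = np.sum(y & p), np.sum(~y & ~p)
    fp, fn = np.sum(~y & p), np.sum(y & ~p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": float(sensitivity), "specificity": float(specificity),
            "precision": float(precision), "f1": float(f1)}
```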
Predictive model combined with clinical information
A predictive model combining the RGB image with clinical information was then developed. To reduce training cost and avoid overfitting, only clinical variables that were significant in the multivariate analysis were included. CNN model 2, which takes these clinical variables as additional input, was designed to make full use of both the clinical variables and the features extracted from the post-reperfusion RGB liver images (Supplementary Figure 2). It was trained on the training image set with the data augmentation described in the “Image data preprocessing” section. Missing values of clinical variables were imputed with the variable mean. The layers transferred from CNN model 1 were frozen (non-trainable) during training.
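The clinical branch needs a complete numeric vector per patient. The sketch below shows mean imputation and the concatenation of image features with clinical variables into one input vector. Two points are assumptions, not details from the text: that the means are estimated on the training set and reused for the testing set, and that fusion is a simple concatenation (the actual design is in Supplementary Figure 2). The image features are taken to be precomputed by the frozen layers of CNN model 1, and all names are hypothetical.

```python
import numpy as np


def fit_mean_imputer(train_clinical: np.ndarray) -> np.ndarray:
    """Column means of the training clinical matrix, ignoring missing (NaN) values."""
    return np.nanmean(train_clinical, axis=0)


def impute(clinical: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Replace each NaN with the mean of its column, without modifying the input."""
    filled = clinical.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = means[cols]
    return filled


def fused_input(image_features: np.ndarray, clinical: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Concatenate frozen image features (N, F) with imputed clinical variables (N, V)."""
    return np.concatenate([image_features, impute(clinical, means)], axis=1)
```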
Score based on prediction probability from CNN
For simplicity, each liver image was assigned an image score based on the prediction probability from CNN model 1: a probability from 0 to 0.3 (inclusive) scored 1, above 0.3 up to 0.5 (inclusive) scored 2, and above 0.5 scored 3. We compared the following outcomes among patients with different image scores: postoperative transfusion volumes (red blood cells, plasma, and platelets), postoperative complications (hydrothorax, hepatic artery thrombosis, pulmonary infection, pulmonary edema, intra-abdominal abscess, and bleeding events), time to resumption of eating, and lengths of ICU and hospital stay. As a reference, the same outcomes were also compared between the EAD and non-EAD groups.
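The score mapping has closed upper bounds (0.3 and 0.5 fall into the lower category), which `numpy.digitize` with `right=True` expresses directly. This is a minimal sketch; the function name is hypothetical, and the input is assumed to be the EAD probability output by CNN model 1.

```python
import numpy as np


def image_score(prob):
    """Map CNN model 1 EAD probabilities to image scores 1, 2 or 3."""
    # right=True places the boundary values 0.3 and 0.5 into the lower bin:
    # [0, 0.3] -> 1, (0.3, 0.5] -> 2, (0.5, 1] -> 3
    return np.digitize(prob, [0.3, 0.5], right=True) + 1
```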
Statistical analysis
Quantitative variables were described as mean ± SD or median (IQR), and qualitative variables as frequencies and percentages. Groups were compared with the chi-square test for qualitative variables and with Student's t-test or a rank-based nonparametric test for quantitative variables. Multivariate analysis was performed with stepwise logistic regression (Wald method) to select the clinical variables for CNN model 2; variables with P < 0.05 were retained. Outcomes were compared among image scores, and between the EAD and non-EAD groups, on the total data set (training set plus testing set) without imputation of missing values. Statistical analyses were performed with SPSS Statistics for Windows (Version 24.0, IBM Corporation). A P-value < 0.05 was considered statistically significant.
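The analyses were run in SPSS, but the group comparisons across image scores can be sketched with SciPy for readers who want to reproduce them in Python: a chi-square test on a score-by-complication contingency table and a Kruskal-Wallis test (one rank-based option for more than two groups) for a skewed quantity such as ICU stay. The stepwise Wald logistic regression is not reproduced here, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def complication_test(image_score: np.ndarray, complication: np.ndarray):
    """Chi-square test of a yes/no complication across image-score groups."""
    groups = np.unique(image_score)
    table = np.array([[np.sum((image_score == g) & complication),
                       np.sum((image_score == g) & ~complication)] for g in groups])
    chi2, p, dof, _expected = stats.chi2_contingency(table)
    return chi2, p, dof


def stay_test(image_score: np.ndarray, days: np.ndarray):
    """Kruskal-Wallis rank test of a skewed quantity (e.g. ICU days) across groups."""
    samples = [days[image_score == g] for g in np.unique(image_score)]
    return stats.kruskal(*samples)
```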