Predicting Early Allograft Dysfunction after Liver Transplantation from Post-Reperfusion Donor Liver Image

doi:10.21203/rs.3.rs-223716/v1


Research


This work is licensed under a CC BY 4.0 License

Version 1


BACKGROUND:

To explore the relationship between early allograft dysfunction (EAD) and post-reperfusion liver appearance, and to develop image-based models which predict EAD and short-term mortality.

METHODS:

A total of 351 liver transplant recipients were enrolled and divided into a training set and a testing set. Post-reperfusion images of the donor livers and clinical information were collected. All the images were preprocessed. Support vector machine (SVM) and convolutional neural network (CNN) models based on the texture analysis of post-reperfusion liver RGB images were constructed to predict EAD. Then, the model with the better performance was selected to construct a further predictive model with additional inputs of clinical information. In addition, a score, namely the image score, was assigned to each liver image based on the prediction probability from the CNN model. Finally, outcomes were compared among the different image scores.

RESULTS:

Of the 351 enrolled recipients, 229 were in the training set and 122 in the testing set. The CNN model achieved an AUC of 0.709 in the testing set, outperforming the SVM model, which had an AUC of 0.661. The further predictive model was based on the framework of the CNN model and obtained an AUC of 0.727. Moreover, a larger image score was found to be associated with more postoperative infusion, more postoperative complications, and longer ICU and hospital stays.

CONCLUSION:

The post-reperfusion appearance of donor liver was associated with the occurrence of EAD. Moreover, it was feasible to predict EAD and patient outcomes through the texture analysis of post-reperfusion liver RGB images.

Translational Medicine

postoperative day 90 death

texture analysis

support vector machines

BAR score

smartphone image

Currently, liver transplantation (LT) is the optimal approach for the treatment of patients diagnosed with end-stage liver diseases. As a type of primary graft dysfunction after LT, early allograft dysfunction (EAD) represents a condition in which the graft shows varying degrees of liver damage but still exhibits sufficient function to support life. The occurrence of EAD has been attributed to donor characteristics, recipient factors, and intraoperative risk factors. For example, the use of marginal donor livers to expand the donor pool and alleviate the shortage of donated organs has been regarded as a predictor of EAD. Meanwhile, reports have shown that EAD is associated with increased susceptibility to sepsis, prolonged stay in the intensive care unit (ICU), and increased recipient morbidity and mortality ^[1–3]. Therefore, a quick postoperative assessment of allograft function and an accurate prediction of EAD will benefit LT patients by allowing advance preparation for adverse events.

In this context, considerable research has addressed the development of a reliable, practical, and cost-effective method to help transplant doctors distinguish patients at high risk of EAD from those at low risk. For instance, a method that combines point shear wave elastography and a sonographic grading system, which achieved an area under the receiver operating characteristic curve of 0.935, has been proposed ^[4]. Moreover, a predictive model based on donor information only has been constructed, with a concordance index of 0.622 ^[5]. However, despite the strength of texture analysis (TA) in inferring diagnostic and prognostic information, no attempts have been made to exploit TA of liver images in the prediction of EAD.

Notably, TA has transformed the subjective visual evaluation of liver texture into a quantitative and objective method ^[6]. Moccia et al. ^[7] performed TA of liver RGB images using machine learning algorithms, and an artificial intelligence tool for the evaluation of graft hepatic steatosis was eventually developed ^[6]. With the ubiquity of smartphones, TA of RGB images is widely used for the diagnosis of diseases or the classification of tissues in other medical fields, including the diagnosis of skin cancer ^[8]. RGB optical images acquired with smartphones in the operating room are important and readily available information carriers during LT, and no additional equipment is required. We speculated that liver appearance is associated with EAD, because the post-reperfusion appearance of the donor liver might partly reflect allograft quality and ischemia-reperfusion injury. This study therefore explored the relationship between EAD and post-reperfusion liver appearance using TA. Further, we developed image-based models that predict EAD and short-term mortality.

There are different strategies for texture analysis, whose primary function is to measure and describe differences in the pixel levels of different images. Conventional methods, including gray level co-occurrence matrices and the histogram of local binary patterns, combined with machine learning approaches [e.g. support vector machines (SVM), random forests], have shown encouraging performance in the image recognition of lesions ^[9, 10]. Since 2012, deep learning approaches have demonstrated sustained improvements in medical image recognition, and their performance has now surpassed that of the traditional methods ^[11]. Deep learning methods, particularly the convolutional neural network (CNN), automatically extract features from images, pass them through successive layers, and improve prediction results through backpropagation. This paper compares the performance of these approaches for texture analysis and selects the preferred method for further construction of a predictive model.

Patient population and data preparation

The data of patients who underwent LT between January 2017 and December 2019 at the First Affiliated Hospital, Sun Yat-sen University were evaluated. Only LTs from donation after brain death were included in this study. Pediatric LTs, multiple organ transplants, retransplants, and split or living donor LTs were all excluded. Of 386 patients eligible for EAD assessment, 1 with unclear data and 3 with intraoperative death were not considered. Among the remaining patients, 31 were excluded due to missing image data. Eventually, a total of 351 patients remained for analysis. They were divided into two data sets based on the year of LT: the training set comprised all eligible LT recipients from 2017 and 2019, while the testing set contained the LT recipients from 2018. The flow chart of selection is depicted in Fig 1.
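The year-based split described above can be sketched as follows. This is a minimal illustration only: the `Recipient` record and its field names are hypothetical and are not taken from the study's data system.

```python
from dataclasses import dataclass

# Years assigned to each set, as stated in the text.
TRAIN_YEARS = {2017, 2019}
TEST_YEARS = {2018}


@dataclass
class Recipient:
    patient_id: str        # hypothetical identifier
    transplant_year: int   # calendar year of the liver transplantation


def split_by_year(recipients):
    """Return (training, testing) lists according to the year of LT."""
    training = [r for r in recipients if r.transplant_year in TRAIN_YEARS]
    testing = [r for r in recipients if r.transplant_year in TEST_YEARS]
    return training, testing


# Toy cohort with made-up identifiers.
cohort = [Recipient("A", 2017), Recipient("B", 2018), Recipient("C", 2019)]
train_set, test_set = split_by_year(cohort)
print([r.patient_id for r in train_set], [r.patient_id for r in test_set])
```

Splitting by calendar year rather than at random keeps the testing set temporally separate from the training data.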

In our transplant center, images of the post-reperfusion liver were routinely captured using smartphones before abdominal closure for further analysis; examples are shown in Fig 2. The data of donors and recipients were extracted by experienced research assistants from the electronic medical record system. This study was performed following the Declaration of Helsinki and was approved by the institutional review board at Sun Yat-sen University. Moreover, the need for written informed consent was waived because of the retrospective and observational nature of the study. No organs from executed prisoners were used.

Definitions

Based on the criteria suggested by Olthoff et al. ^[1], EAD is identified if one or more of the following abnormalities occur: (1) total bilirubin on postoperative day (POD) 7 ≥ 171 μmol/L (10 mg/dL); (2) international normalized ratio on POD 7 ≥ 1.6; and (3) serum alanine aminotransferase or aspartate aminotransferase level within the first 7 days > 2000 IU/L. The balance of risk (BAR) score, applied as a predictor of LT patient survival, was calculated with the calculator available at https://www.assessurgery.com/bar-score/bar-score-calculator/.
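The three criteria can be written as a small rule, shown here as a sketch. The function and parameter names are illustrative, and the bilirubin threshold is treated as 171 μmol/L (about 10 mg/dL), which is an assumption about the unit.

```python
def has_ead(bilirubin_pod7_umol_l, inr_pod7, peak_alt_iu_l, peak_ast_iu_l):
    """Return True if at least one EAD criterion is met.

    bilirubin_pod7_umol_l: total bilirubin on POD 7 (assumed unit: umol/L)
    inr_pod7:              international normalized ratio on POD 7
    peak_alt_iu_l:         highest ALT within the first 7 days (IU/L)
    peak_ast_iu_l:         highest AST within the first 7 days (IU/L)
    """
    high_bilirubin = bilirubin_pod7_umol_l >= 171.0
    high_inr = inr_pod7 >= 1.6
    # Either transaminase above 2000 IU/L is sufficient.
    high_transaminase = max(peak_alt_iu_l, peak_ast_iu_l) > 2000.0
    return high_bilirubin or high_inr or high_transaminase


print(has_ead(50.0, 1.1, 300.0, 400.0))   # no criterion met
print(has_ead(50.0, 1.1, 2500.0, 400.0))  # transaminase criterion met
```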

Regarding the outcomes of patients within 3 months, post-LT hydrothorax was assessed based on ultrasound or thoracentesis results. Hepatic artery thrombosis was defined as the disruption of blood flow to the allograft through the hepatic artery, as confirmed by angiography, Doppler ultrasound, or surgical exploration. Pulmonary infection was diagnosed when the pathogens listed by Singh et al. ^[12] were isolated and detected in the pleural fluid or respiratory secretions (bronchoalveolar lavage or sputum). Chest roentgenograms and arterial oxygenation were used to evaluate pulmonary edema based on established methods ^[13, 14]. Intra-abdominal abscess was characterized by the presence of a fluid collection on CT imaging or ultrasonography, coupled with the detection of organisms in the fluid or with systemic or local signs of infection after exclusion of other sources. A bleeding event was defined as the occurrence of one of the following: (1) surgical bleeding requiring reoperation; or (2) anatomical bleeding (requiring transfusion but not surgical intervention).

Image data preprocessing

For each liver RGB image, the first step was manual segmentation, which separated the liver tissue from the background. All the liver contours were drawn, and the image space outside the marked hepatic tissue was filled with black color. The second step varied according to the TA approach. The pipelines of two different methods are depicted in Fig 3.

For the classical machine learning approach, each image was resized to 1000×1000 pixels and then divided evenly into 25 non-overlapping patches (each 200×200 pixels). Only patches in which hepatic tissue occupied at least 90% of the patch area were considered valid for further analysis. The minimum number of valid patches obtained from any image was 5. To ensure that the number of patches for each patient was the same, 5 valid patches per patient were randomly selected. Consequently, the dataset for the classical machine learning approach comprised 1,145 patches from the training set and 610 patches from the testing set.
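The patch selection can be sketched with NumPy as below. This is an illustration rather than the study's code: it assumes a boolean segmentation mask (True for liver pixels) is available for the resized image, and the function names and random seed are hypothetical.

```python
import numpy as np


def valid_patches(mask, grid=5, min_liver_fraction=0.9):
    """Return (row, col) grid positions of patches that are mostly liver.

    mask: 2-D boolean array, True where the pixel belongs to liver tissue.
          Its height and width are assumed to be divisible by `grid`.
    """
    height, width = mask.shape
    ph, pw = height // grid, width // grid
    positions = []
    for row in range(grid):
        for col in range(grid):
            patch = mask[row * ph:(row + 1) * ph, col * pw:(col + 1) * pw]
            # The mean of a boolean patch is the fraction of liver pixels.
            if patch.mean() >= min_liver_fraction:
                positions.append((row, col))
    return positions


def sample_patches(positions, n=5, seed=0):
    """Randomly pick n valid patch positions without replacement."""
    if len(positions) < n:
        raise ValueError("not enough valid patches in this image")
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(positions), size=n, replace=False)
    return [positions[i] for i in sorted(chosen)]


# Toy mask: a 600x600 liver region centred in a 1000x1000 image.
mask = np.zeros((1000, 1000), dtype=bool)
mask[200:800, 200:800] = True
positions = valid_patches(mask)
print(len(positions), sample_patches(positions))
```

With 5 patches per patient, 229 training patients give 1,145 patches and 122 testing patients give 610, which matches the counts reported above.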

For the deep learning approach, each image was resized to 512×512 pixels. Data augmentation was applied to balance the numbers of EAD and non-EAD images. The ImageDataGenerator function in the Keras module for Python (version 3.7) was used to double the number of EAD images in the training set (details shown in Supplementary Table 1).

In addition, sample-wise normalization, a function provided by Keras module, was performed to all images before the extraction of features, regardless of the TA approach.
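As an illustration of sample-wise normalization, the sketch below centres each image on its own mean and scales it by its own standard deviation. It assumes that the step corresponds to the sample-wise centering and standard-deviation scaling options of the Keras ImageDataGenerator; the NumPy function is a stand-in, not the study's code.

```python
import numpy as np


def samplewise_normalize(image, eps=1e-6):
    """Normalize one image using only its own statistics.

    image: array of shape (height, width, channels).
    eps:   small constant that avoids division by zero for flat images.
    """
    x = image.astype(np.float64)
    x = x - x.mean()           # centre the image on zero
    return x / (x.std() + eps)  # scale to approximately unit variance


rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
normalized = samplewise_normalize(photo)
print(normalized.mean(), normalized.std())
```

Because each image is normalized with its own statistics, global brightness and contrast differences between photographs are partly removed.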

Classical machine learning model

The uniform rotation-invariant local binary patterns were computed for each RGB channel of the image, and the histogram of local binary patterns was created to obtain a multi-scale and accurate description of the texture. In addition, feature descriptors based on gray level co-occurrence matrices, including contrast, correlation, energy, and homogeneity, were calculated. Intensity-based features, namely the image mean and standard deviation for each RGB channel, were also computed. The feature descriptors mentioned above were obtained using scikit-image. The technical details are shown in Supplementary Table 2. All the descriptors were concatenated, and consequently a data set with 195 features for each patch was obtained.
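To make the co-occurrence and intensity descriptors concrete, the following NumPy-only sketch computes contrast, correlation, energy, and homogeneity for a single pixel offset, plus the per-channel mean and standard deviation. It is an illustration, not the study's scikit-image pipeline: the number of gray levels, the offset, and the function names are assumptions, and it does not reproduce the 195-feature vector.

```python
import numpy as np


def glcm_features(gray, levels=8, offset=(0, 1)):
    """Texture descriptors from a symmetric, normalized co-occurrence matrix.

    gray:   2-D uint8 array (one image channel).
    levels: number of gray levels after quantization (assumed value).
    offset: non-negative (row, column) displacement between pixel pairs.
    """
    q = (np.asarray(gray).astype(np.int64) * levels) // 256
    dr, dc = offset
    h, w = q.shape
    first = q[:h - dr, :w - dc]   # reference pixels
    second = q[dr:, dc:]          # neighbours at the given offset
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (first.ravel(), second.ravel()), 1.0)
    glcm = glcm + glcm.T          # count each pair in both directions
    p = glcm / glcm.sum()
    i, j = np.indices((levels, levels))
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    if sd_i * sd_j == 0:
        correlation = 1.0         # convention for a perfectly flat patch
    else:
        correlation = (((i - mu_i) * (j - mu_j)) * p).sum() / (sd_i * sd_j)
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "correlation": float(correlation),
        "energy": float(np.sqrt((p ** 2).sum())),
        "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),
    }


def intensity_features(rgb):
    """Mean and standard deviation of each RGB channel (6 values)."""
    return np.concatenate([rgb.mean(axis=(0, 1)), rgb.std(axis=(0, 1))])


# Checkerboard of gray values 0 and 255.
checker = ((np.indices((4, 4)).sum(axis=0) % 2) * 255).astype(np.uint8)
print(glcm_features(checker))
```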

The synthetic minority over-sampling technique (SMOTE) was used to balance the numbers of the different groups in the training set. Through SMOTE, samples in the minority group were synthesized by linear interpolation. The data were thereby equalized, yielding a training set in which the ratio of patches from EAD and non-EAD patients was 1:1. Based on the equalized training set, a supervised machine learning approach, SVM, was used to build the predictive model. The optimal hyperparameters of the SVM were determined via grid search: in each round of cross-validation, 1/3 of the data in the training set were randomly selected for validation, and the hyperparameter combination with the highest mean accuracy on the validation data over 20 rounds of cross-validation was regarded as optimal.
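The linear interpolation at the core of SMOTE can be sketched as follows. This is a simplified NumPy illustration under stated assumptions (Euclidean nearest neighbours, a hypothetical `k` and seed), not the implementation used in the study: each synthetic sample lies on the segment between a minority sample and one of its nearest minority neighbours.

```python
import numpy as np


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    minority: array of shape (n_samples, n_features), n_samples >= 2.
    k:        number of nearest minority neighbours considered per sample.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_new, minority.shape[1]))
    for row in range(n_new):
        i = rng.integers(n)                    # pick a minority sample
        nb = neighbours[i, rng.integers(k)]    # pick one of its neighbours
        lam = rng.random()                     # interpolation weight in [0, 1)
        synthetic[row] = minority[i] + lam * (minority[nb] - minority[i])
    return synthetic


points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(smote_like_oversample(points, n_new=3, k=2))
```

To reach a 1:1 ratio, `n_new` would be set to the difference between the majority and minority patch counts.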

Convolution neural network approach

Inspired by the Google Inception-Net, this study proposed a predictive model based on the CNN architecture, named CNN model 1, to classify EAD and non-EAD patients. The process in the “Image Data Preprocessing” section ensured that the input for each image in the training and testing sets had the same dimensionality. The global and local features were extracted from the images by a max-pooling layer and 2D convolutional layers with a 1×1 kernel size. Subsequently, these extracted features were convolved and then concatenated as the input of the 2D global average pooling layer. The final dense layer, followed by a softmax activation function, yielded the prediction result. The details of the architecture of CNN model 1 are presented in Supplementary Figure 1.

Model evaluation

The receiver operating characteristic curve was plotted, and the area under the receiver operating characteristic curve (AUC) was adopted to quantitatively assess the discrimination capability of the proposed models. The model with the higher AUC on the testing set was selected to construct a further predictive model with additional inputs of clinical information. In addition, confusion matrices were used to compute the sensitivity and specificity. Since sensitivity and specificity each reflect only one side of model performance, the F1 score was calculated according to the following formula:
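The formula itself is not reproduced in this text. As a hedged reading, the F1 values reported in the Results are consistent with the harmonic mean of sensitivity and specificity, F1 = 2 × sensitivity × specificity / (sensitivity + specificity), rather than with the conventional F1 score based on precision and recall. The sketch below checks this assumption against the reported values; the function names are illustrative.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity from the four confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)


def harmonic_mean(a, b):
    """Harmonic mean of two rates (assumed form of the reported F1 score)."""
    return 2.0 * a * b / (a + b)


# (sensitivity, specificity, reported F1) as given in the Results section.
reported = [
    (0.67, 0.42, 0.51),  # SVM, testing set
    (0.64, 0.60, 0.62),  # SVM, training set
    (0.49, 0.50, 0.49),  # CNN model 1, testing set
    (0.56, 0.65, 0.60),  # CNN model 1, training set
    (0.54, 0.50, 0.52),  # CNN model 2, testing set
    (0.66, 0.66, 0.66),  # CNN model 2, training set
]
for sens, spec, f1 in reported:
    print(sens, spec, f1, round(harmonic_mean(sens, spec), 3))
```

All six reported values agree with this reading to within rounding of the published percentages.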

Predictive model combined with clinical information

A predictive model based on the combination of RGB images and clinical information was then proposed. To reduce the training cost and to avoid overfitting, only clinical variables that were significant in the multivariate analysis were included in the model. The architecture of CNN model 2, with clinical variable input, was designed to make full use of both the clinical variables and the features from post-reperfusion liver RGB images (Supplementary Figure 2). The new predictive model was trained on the image training set with the data augmentation described in the “Image Data Preprocessing” section. Missing clinical variable values were replaced with the mean value of the variable. The part of the architecture transferred from CNN model 1 was frozen (not trainable) during training.
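Mean-value filling of missing clinical variables can be sketched as below. The text does not state which data the means were computed from; taking them from the training set only is an assumption made here, and the column meanings and values are made up for illustration.

```python
import numpy as np


def impute_with_training_mean(train, test):
    """Replace NaN entries with the per-variable mean of the training set.

    train, test: arrays of shape (n_patients, n_variables), NaN = missing.
    Returns the filled training array, the filled testing array, and the means.
    """
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    means = np.nanmean(train, axis=0)  # column means, ignoring NaN
    train_filled = np.where(np.isnan(train), means, train)
    test_filled = np.where(np.isnan(test), means, test)
    return train_filled, test_filled, means


# Hypothetical columns: donor age (years), surgery time (min), cold ischemia time (min).
train = np.array([[50.0, 400.0, np.nan],
                  [60.0, np.nan, 300.0],
                  [np.nan, 500.0, 500.0]])
test = np.array([[np.nan, 480.0, np.nan]])
train_filled, test_filled, means = impute_with_training_mean(train, test)
print(means)
print(test_filled)
```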

Score based on prediction probability from CNN

For simplicity, a score named the image score was assigned to each liver image according to the prediction probability from CNN model 1: a prediction probability of 0 to 0.3 (including 0.3) was scored as 1, a probability of 0.3 to 0.5 (including 0.5) was scored as 2, and a probability above 0.5 was scored as 3. We compared the outcomes of patients with different image scores, including the dose of postoperative infusion (red blood cells, plasma, and platelets), postoperative complications (hydrothorax, hepatic artery thrombosis, pulmonary infection, pulmonary edema, intra-abdominal abscess, and bleeding events), the time to resume eating, and the lengths of ICU and hospital stay. As a reference, the outcomes of the EAD and non-EAD groups were also compared.
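The thresholds above translate directly into a small mapping, sketched here with an illustrative function name:

```python
def image_score(probability):
    """Map the predicted EAD probability from CNN model 1 to an image score.

    [0, 0.3]   -> 1
    (0.3, 0.5] -> 2
    (0.5, 1]   -> 3
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability <= 0.3:
        return 1
    if probability <= 0.5:
        return 2
    return 3


for p in (0.10, 0.30, 0.45, 0.50, 0.80):
    print(p, image_score(p))
```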

Statistical analysis

Quantitative variables were described by mean ± SD or median (IQR), while frequencies and percentages were used to describe qualitative variables. Comparisons between groups were made with the Chi-square test for qualitative variables and with the Student t-test or rank test for quantitative variables. Additionally, multivariate analysis was performed with stepwise logistic regression using the Wald method to select clinical variables for CNN model 2. Variables were retained if P < 0.05. The comparisons of outcomes among different image scores, and between the EAD and non-EAD groups, were both performed on the total data set (training set + testing set) without interpolation. SPSS Statistics for Windows (Version 24.0, IBM Corporation) was used to perform the statistical analyses. A P-value < 0.05 was considered statistically significant.

Patient characteristics

A total of 351 consecutive patients [257 males, 94 females; mean age = 51.2 years ± 10.8 (SD)] were enrolled in the study, of whom 108 suffered from EAD and 59 died within POD 90. Of the 351 patients, 229 were in the training set and 122 in the testing set. The number of EAD patients in the training and testing sets was 67 (29.3%) and 41 (33.6%), respectively, and the number of POD 90 deaths was 37 (16.2%) and 22 (18.0%), respectively. Supplementary Table 3 shows the comparison of the donor and recipient information, operative characteristics, and postoperative outcomes between the training and testing sets. The table reveals differences in the antibody status of the donor, indication for transplant, preoperative blood tests, anhepatic time, etc. Moreover, Supplementary Table 4 reveals differences between the EAD and non-EAD groups in terms of the age, gender, and BMI of the donor, as well as the surgery time, cold ischemia time, POD 90 death, etc.

The performance of predictive model for EAD

Fig. 4 shows the performance of the SVM predictive model on the training set and testing set, with AUCs of 0.670 and 0.661, respectively. Calculated from the confusion matrix in Fig. 5, the sensitivity, specificity, and F1 score of the SVM model were 67%, 42%, and 51%, respectively, on the testing set and 64%, 60%, and 62%, respectively, on the training set. CNN model 1 achieved an AUC, sensitivity, specificity, and F1 score of 0.709, 49%, 50%, and 49%, respectively, on the testing set. For the training set, the values were 0.710, 56%, 65%, and 60%, respectively. Due to its higher AUC, CNN model 1 was selected to construct the further predictive model, i.e., CNN model 2. Only 3 clinical variables, i.e., donor age, surgery time, and cold ischemia time, were included in CNN model 2. CNN model 2 yielded an AUC of 0.727, a sensitivity of 54%, a specificity of 50%, and an F1 score of 52% on the testing set, while an AUC of 0.78, a sensitivity of 66%, a specificity of 66%, and an F1 score of 66% were obtained on the training set.

The relationship between image score and outcomes

The distribution of the image score was as follows: patients with an image score of 1 made up 35.0% of the total, those with an image score of 2 made up 30.8%, and those with an image score of 3 made up 34.2%. Examples of donor liver images with different image scores are presented in Supplementary Figure 3. The differences in patient outcomes among image scores are shown in Fig. 6: the dose of postoperative infusion (red blood cells, plasma, and platelets), postoperative complications (bleeding event, hydrothorax, pulmonary infection), the length of ICU stay, and the length of hospital stay were found to differ. As a reference, the differences between the EAD and non-EAD groups are shown in Supplementary Figure 4.

Texture analysis is one of the research fields that has been profoundly impacted by deep learning (especially CNN). This progress might improve the decisions made by clinicians and advance the diagnosis and treatment of various diseases. To our knowledge, this is the first attempt to explore TA of the post-reperfusion appearance of the donor liver for predicting EAD and short-term survival in patients, and it achieved preliminary success. Although it has yet to reach the stage of practical application, this preliminary work provides additional insight into EAD prediction. Once completed and applied in clinical practice, such novel predictive models will help transplant doctors predict the possibility of EAD and adverse outcomes immediately after LT surgery is completed.

In this study, we extracted features for prediction from post-reperfusion liver RGB images. These images were captured using smartphone cameras, which are easy to use and widely available. Nonetheless, the quality of the liver RGB images varied because of differences in illumination conditions, camera distance, and smartphone type. Thus, to partly control the effects of these differences, each image was normalized. In addition, the training of prediction models was frequently disrupted by the class-imbalance problem, a common phenomenon in modeling, particularly in the field of multiclass modeling. Reports claim that the class-imbalance problem limits the practicality of machine learning models since it causes the model to predict the majority class and ignore the minority class ^[15]. Therefore, to alleviate the negative impact of class imbalance, SMOTE and data augmentation were applied to the classical machine learning and convolutional neural network approaches, respectively.

As a result, CNN model 1 outperformed the SVM model, which again supports the effectiveness of the deep learning approach for texture analysis of medical images. The AUC of CNN model 1 in the training set and the testing set (0.710 and 0.709, respectively) indicated a relationship between EAD and post-reperfusion donor liver appearance, thereby supporting the feasibility of predicting EAD from post-reperfusion liver RGB images. Based on the images and clinical information, CNN model 2 achieved an AUC of 0.727 in the testing set, illustrating relatively good discrimination. Although the sensitivity (54%) and specificity (50%) were suboptimal, there is significant potential for improvement with the expansion and standardization of the training dataset.

To further explore the feasibility of using post-reperfusion donor liver appearance to predict LT patient outcomes (postoperative complications, length of ICU stay, etc.), we built a scoring system based on the EAD prediction probability from CNN model 1. This procedure allowed us to avoid cumbersome multiple modeling and greatly simplified the process. The results (Fig 6) showed differences in patient outcomes (the dose of postoperative infusion, postoperative complications, the length of ICU stay, and the duration of hospital stay) among image scores, which suggests a promising prospect for applying TA of post-reperfusion liver images to outcome prediction.

Our study had limitations worth mentioning. First, the study was single-centered and the models were not validated in other centers, which might limit the generalizability of the prediction models. Secondly, the sample size was small (351), limiting the improvement of model performance. Thirdly, the liver images were not captured according to a uniform standard, which caused heterogeneity in image quality and consequently had adverse impacts on model performance. Lastly, the proposed SVM model predicts whether a patch came from a patient with EAD; therefore, different prediction results might be obtained for patches from the same patient. Although previous research used the SVM-SIL method to resolve this problem ^[7], this study did not utilize this method. Despite these limitations, we are confident that our research has merit, considering that it revealed the relationship between the post-reperfusion appearance of the donor liver and EAD and supported the feasibility of applying post-reperfusion liver RGB images to EAD and outcome prediction.

AUC: area under the receiver operating characteristic curve

BAR: balance of risk

CNN: convolutional neural network

EAD: early allograft dysfunction

ICU: intensive care unit

LT: liver transplantation

POD: postoperative day

SMOTE: synthetic minority over-sampling technique

SVM: support vector machines

TA: texture analysis

Ethics approval and consent to participate

This study was performed following the Declaration of Helsinki and was approved by the institutional review board at Sun Yat-sen University. The need for written informed consent was waived because of its retrospective and observational nature.

Consent for publication

Not applicable.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was funded by the Guangzhou Science and Technology Planning Project (201802020027) provided by Guangzhou Science and Technology Innovation Committee.

Authors' contributions

Maodong Ye and Fangchong Li designed the model and the computational framework. Yi Jie and Huadi Chen collected and analyzed the data. Weijie Su and Dehua Chen carried out the implementation. Weixin Luo performed the calculations. Maodong Ye, Fangchong Li and Xiaogang Li wrote the manuscript with input from all authors. Dongping Wang was in charge of overall direction and planning.

Acknowledgments

Not applicable

1. Olthoff, K.M., et al., Validation of a current definition of early allograft dysfunction in liver transplant recipients and analysis of risk factors. Liver Transpl, 2010. 16(8): p. 943-9.
2. Croome, K.P., R. Hernandez-Alejandro, and N. Chandok, Early allograft dysfunction is associated with excess resource utilization after liver transplantation. Transplant Proc, 2013. 45(1): p. 259-64.
3. Croome, K.P., et al., Evaluation of the updated definition of early allograft dysfunction in donation after brain death and donation after cardiac death liver allografts. Hepatobiliary Pancreat Dis Int, 2012. 11(4): p. 372-6.
4. Liu, W.Y., et al., Combination of liver graft sonographic grading and point shear wave elastography to reduce early allograft dysfunction after liver transplantation. Eur Radiol, 2020.
5. Hoyer, D.P., et al., Donor information based prediction of early allograft dysfunction and outcome in liver transplantation. Liver Int, 2015. 35(1): p. 156-63.
6. Cesaretti, M., et al., Use of artificial intelligence as innovative method for liver graft macrosteatosis assessment. Liver Transpl, 2020.
7. Moccia, S., et al., Computer-assisted liver graft steatosis assessment via learning-based texture analysis. Int J Comput Assist Radiol Surg, 2018. 13(9): p. 1357-1367.
8. Esteva, A., et al., Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017. 542(7639): p. 115-118.
9. Misawa, M., et al., Accuracy of computer-aided diagnosis based on narrow-band imaging endocytoscopy for diagnosing colorectal lesions: comparison with experts. Int J Comput Assist Radiol Surg, 2017. 12(5): p. 757-766.
10. Moccia, S., et al., Uncertainty-Aware Organ Classification for Surgical Data Science Applications in Laparoscopy. IEEE Trans Biomed Eng, 2018. 65(11): p. 2649-2659.
11. Voulodimos, A., et al., Deep Learning for Computer Vision: A Brief Review. Comput Intell Neurosci, 2018. 2018: p. 7068349.
12. Singh, N., et al., Pulmonary infections in liver transplant recipients receiving tacrolimus. Changing pattern of microbial etiologies. Transplantation, 1996. 61(3): p. 396-401.
13. Thomason, J.W., et al., Appraising pulmonary edema using supine chest roentgenograms in ventilated patients. Am J Respir Crit Care Med, 1998. 157(5 Pt 1): p. 1600-8.
14. Bernard, G.R., et al., Report of the American-European consensus conference on ARDS: definitions, mechanisms, relevant outcomes and clinical trial coordination. The Consensus Committee. Intensive Care Med, 1994. 20(3): p. 225-32.
15. Kim, K.H. and S.Y. Sohn, Hybrid neural network with cost-sensitive support vector machine for class-imbalanced multimodal data. Neural Netw, 2020. 130: p. 176-184.
