Study design
This retrospective study used data from our hospital's database collected between January 2022 and January 2024. The database was originally established for medical education and was later repurposed as a machine learning database.
Data collection comprised two parts: 1) Kawasaki disease group: clinical symptom images and laboratory examination indices were collected from patients during outpatient visits or hospitalization when Kawasaki disease was first suspected, and these data were assigned to the Kawasaki disease group once the diagnosis was confirmed. 2) Healthy control group: clinical symptom images and laboratory examination indices were collected from healthy children during routine health check-ups at pediatric health clinics, and these data were assigned to the healthy control group once the children's health status was confirmed.
According to the diagnostic guidelines for Kawasaki disease, clinical symptoms are the primary diagnostic criteria. In this study, the palms and conjunctivae were chosen as research subjects because changes at these sites are stable and specific enough to help physicians diagnose Kawasaki disease. A further reason is that our hospital's database contains too few images of the tongue and of the typical rash of Kawasaki disease to support this research; therefore, only palm and conjunctival images were used. According to the Kawasaki disease diagnosis and treatment guidelines, laboratory examination indicators are the second diagnostic criterion after clinical symptoms. Typical laboratory indicators, such as C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR), have significant auxiliary diagnostic value, and multiple studies have suggested that various other laboratory indicators can also assist in diagnosing Kawasaki disease. Therefore, this study took all laboratory examination results obtained during the diagnosis and differential diagnosis of Kawasaki disease in our hospital as candidate indicators.
The cohort consisted of 620 children (310 with Kawasaki disease and 310 healthy children undergoing physical examination). Each child's data comprised 2 clinical symptom images (1 conjunctival image and 1 palm image) and 26 laboratory assay indicators. The dataset therefore contained 1240 images (620 from children with Kawasaki disease and 620 from healthy children) and 16120 laboratory indicator values (8060 from each group).
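The image and indicator totals follow directly from the per-child composition. A minimal Python check of that arithmetic, using only the counts stated above (the function name `dataset_totals` is illustrative, not part of the study's code):

```python
# Per-child composition of the dataset, as stated in the text
N_KD = 310                 # children with Kawasaki disease
N_HEALTHY = 310            # healthy control children
IMAGES_PER_CHILD = 2       # one conjunctival image + one palm image
INDICATORS_PER_CHILD = 26  # laboratory assay indicators


def dataset_totals(n_kd, n_healthy, images_per_child, indicators_per_child):
    """Return per-group and overall image and indicator counts."""
    totals = {}
    for name, n in (("kawasaki", n_kd), ("healthy", n_healthy)):
        totals[name] = {"images": n * images_per_child,
                        "indicators": n * indicators_per_child}
    n_all = n_kd + n_healthy
    totals["all"] = {"children": n_all,
                     "images": n_all * images_per_child,
                     "indicators": n_all * indicators_per_child}
    return totals


totals = dataset_totals(N_KD, N_HEALTHY, IMAGES_PER_CHILD, INDICATORS_PER_CHILD)
print(totals["all"])  # {'children': 620, 'images': 1240, 'indicators': 16120}
```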
Inclusion criteria for the Kawasaki disease group: 1) confirmed diagnosis of Kawasaki disease; 2) Kawasaki disease as the primary diagnosis; 3) complete laboratory examination data; 4) medical consultation less than 5 days after symptom onset. Exclusion criteria: 1) conditions that may affect image quality, such as trauma or congenital heart disease; 2) poor-quality or blurry images; 3) conditions that may affect laboratory results, such as pneumonia or other infections; 4) missing laboratory examination indicators.
Inclusion criteria for the healthy control group: 1) no significant abnormalities on pediatric health examination; 2) normal growth and development; 3) complete laboratory examination data. Exclusion criteria: 1) history of inherited metabolic diseases or conditions affecting facial appearance; 2) history of hand injuries or other conditions affecting image quality; 3) missing laboratory examination indicators.
Children in the Kawasaki disease group were matched by sex and age with children from an existing healthy children database.
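The matching algorithm itself is not specified. As a hedged illustration only, assuming greedy 1:1 nearest-age matching within the same sex and without replacement, the procedure could look like the sketch below; the record layout, the function `match_controls`, and the optional `max_age_gap` caliper are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (child_id, sex, age_in_months)
Record = Tuple[str, str, float]


def match_controls(cases: List[Record], pool: List[Record],
                   max_age_gap: Optional[float] = None) -> Dict[str, str]:
    """Greedy 1:1 matching: each case gets the unused same-sex control
    with the smallest age difference. Returns {case_id: control_id};
    cases without an eligible control are left unmatched."""
    used = set()
    matches: Dict[str, str] = {}
    for case_id, sex, age in cases:
        best_id, best_gap = None, None
        for ctrl_id, ctrl_sex, ctrl_age in pool:
            if ctrl_id in used or ctrl_sex != sex:
                continue
            gap = abs(ctrl_age - age)
            if max_age_gap is not None and gap > max_age_gap:
                continue
            if best_gap is None or gap < best_gap:
                best_id, best_gap = ctrl_id, gap
        if best_id is not None:
            used.add(best_id)  # controls are used at most once
            matches[case_id] = best_id
    return matches


cases = [("k1", "M", 24), ("k2", "F", 36)]
pool = [("h1", "M", 30), ("h2", "M", 25), ("h3", "F", 40), ("h4", "F", 12)]
print(match_controls(cases, pool))                 # {'k1': 'h2', 'k2': 'h3'}
print(match_controls(cases, pool, max_age_gap=2))  # {'k1': 'h2'}
```

Greedy matching depends on the order in which cases are processed; an optimal assignment (for example, minimizing the total age difference) would give the same sex balance but possibly closer age pairs.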
In addition, data from 50 children were collected from a peer hospital to serve as an external validation group. This group was used to validate the stability of the multimodal model and to conduct a double-blind human-machine controlled trial.
The Ethics Review Committee of Qingdao Women's and Children's Hospital approved this study and confirmed that all methods complied with relevant guidelines and regulations. Before data collection, the guardians of the children with Kawasaki disease and of the healthy children signed informed consent for data use. All data are rigorously protected: palm images contain palm prints and palm veins (18, 19), and conjunctival images contain iris information (20, 21); both are biometric personal data that cannot be de-identified and therefore must not be disclosed. Figure 1 depicts the flowchart of this study.
Study patients, examination, and image acquisition
Augmenting the conjunctival and palm image data can improve the generalization of the multimodal model. This study mainly used three types of augmentation: basic image transformations, color and brightness adjustments, and added blur and noise. Basic image transformations included horizontal flipping, vertical flipping, random rotation, and scaling. Color and brightness adjustments randomly altered color channel values, brightness, and contrast to simulate images captured under different lighting conditions, helping the model adapt to real-world variation. Gaussian blur simulated focusing errors during image capture, and random noise simulated image sensor noise. All images were downsampled to 512 × 512 pixels and saved in JPG format.
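The augmentation steps above can be sketched in plain NumPy for an (H, W, 3) image with values in [0, 1]. This is a minimal illustration, not the study's pipeline: the probabilities and ranges (±15° rotation, up to 1.2× zoom, ±10% channel gain, ±20% brightness and contrast, noise standard deviation 0.02) are hypothetical, and nearest-neighbour sampling is used for simplicity where a real pipeline would typically interpolate.

```python
import numpy as np


def rotate_nearest(img, angle_deg):
    """Rotate about the centre with nearest-neighbour sampling; pixels that
    map outside the source image are filled with 0."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: locate the source pixel for every output pixel.
    src_x = np.rint(np.cos(theta) * (xx - cx) + np.sin(theta) * (yy - cy) + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * (xx - cx) + np.cos(theta) * (yy - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (used here for both zoom and downsampling)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]


def random_zoom(img, scale):
    """Zoom in by `scale` >= 1: crop the centre, then resize back."""
    h, w = img.shape[:2]
    ch, cw = max(1, round(h / scale)), max(1, round(w / scale))
    top, left = (h - ch) // 2, (w - cw) // 2
    return resize_nearest(img[top:top + ch, left:left + cw], h, w)


def color_jitter(img, rng, channel=0.1, brightness=0.2, contrast=0.2):
    """Random per-channel gain, brightness scaling and contrast change."""
    out = img * rng.uniform(1 - channel, 1 + channel, size=(1, 1, img.shape[2]))
    out = out * rng.uniform(1 - brightness, 1 + brightness)
    mean = out.mean()
    out = (out - mean) * rng.uniform(1 - contrast, 1 + contrast) + mean
    return np.clip(out, 0.0, 1.0)


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding (simulates defocus)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = img
    for axis in (0, 1):
        pad = [(radius, radius) if a == axis else (0, 0) for a in range(out.ndim)]
        padded = np.pad(out, pad, mode="reflect")
        n = out.shape[axis]
        out = sum(k[i] * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i in range(len(k)))
    return out


def add_noise(img, rng, std=0.02):
    """Additive Gaussian noise (simulates sensor noise)."""
    return np.clip(img + rng.normal(0.0, std, size=img.shape), 0.0, 1.0)


def augment(img, rng, out_size=512):
    """Apply the augmentation chain, then resize to out_size x out_size."""
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1]     # vertical flip
    img = rotate_nearest(img, rng.uniform(-15, 15))
    img = random_zoom(img, rng.uniform(1.0, 1.2))
    img = color_jitter(img, rng)
    if rng.random() < 0.3:
        img = gaussian_blur(img, sigma=rng.uniform(0.5, 1.0))
    img = add_noise(img, rng)
    return resize_nearest(img, out_size, out_size)
```

In practice, downsampling to 512 × 512 would usually be done once before augmentation, so the random transforms run on the smaller image; the final resize is kept here only so that every output has the stated size.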
Blood laboratory tests include numerous parameters. To reduce interference from irrelevant variables, statistical screening was performed, and 26 indicators were retained for subsequent model training. The selected indicators include hematological parameters (neutrophil count, platelet count, lymphocyte count, neutrophil percentage, neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, white blood cell count, and hemoglobin) and biochemical parameters (lactate dehydrogenase, aspartate aminotransferase, total bilirubin, alanine aminotransferase, globulin, albumin, glutamate dehydrogenase, potassium ion, sodium ion, and C-reactive protein).
The dataset was randomly divided in an 8:2 ratio into a training set (N = 496) and a validation set (N = 124).
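A minimal sketch of a reproducible 8:2 split, assuming the split is made at the child level (so a child's two images and laboratory values never fall into different sets) and, hypothetically, reusing the random seed of 1024 reported for model training. The function name `split_children` is illustrative.

```python
import random
from typing import List, Tuple


def split_children(n_children: int, train_frac: float = 0.8,
                   seed: int = 1024) -> Tuple[List[int], List[int]]:
    """Shuffle child indices with a fixed seed and split them into
    training and validation index lists."""
    indices = list(range(n_children))
    random.Random(seed).shuffle(indices)
    n_train = round(n_children * train_frac)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_children(620)
print(len(train_idx), len(val_idx))  # 496 124
```

A purely random split does not guarantee an exact 248/62 class balance in each subset; a stratified split per group would, if that were desired.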
Model development
The multimodal model in this study is based on late fusion of a residual neural network (ResNet) for images and a one-dimensional convolutional neural network (CNN) for laboratory indicators, so that a single model classifies both data types together.
Model architecture
The multimodal model consists of three main components: a ResNet image-processing module initialized through transfer learning, a one-dimensional CNN module for laboratory assay indicators, and a late-fusion fully connected layer. ResNet addresses the optimization difficulties of training deep neural networks by introducing residual blocks. The key feature of the model is that image features and one-dimensional laboratory features are fused late, at the fully connected layer, that is, at the decision-making stage. Features from each modality are first processed independently by their own network and then merged at the fully connected layer. This strategy preserves the independence of each modality and may reduce the risk of overfitting.
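The late-fusion forward pass can be sketched in NumPy. This is an illustration under stated assumptions, not the trained model: the image branch is represented by a random stand-in for the 2048-dimensional globally average-pooled feature vector that a standard ResNet50 produces, and the laboratory branch is assumed to be a single 1-D convolution (16 filters of kernel size 3, both hypothetical) followed by ReLU and global average pooling. All weights are random; the point is only how the two feature vectors meet at the fully connected layer.

```python
import numpy as np


def conv1d(x, weight, bias):
    """'Valid' 1-D cross-correlation, as in CNN layers.
    x: (L,), weight: (C, K), bias: (C,) -> output (C, L - K + 1)."""
    k = weight.shape[1]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    return (windows @ weight.T).T + bias[:, None]


def lab_branch(labs, weight, bias):
    """1-D CNN branch: convolution -> ReLU -> global average pooling."""
    return np.maximum(conv1d(labs, weight, bias), 0.0).mean(axis=1)


def late_fusion_probs(image_feat, lab_feat, fc_weight, fc_bias):
    """Concatenate per-modality features, apply one fully connected layer,
    then a softmax over the two classes."""
    fused = np.concatenate([image_feat, lab_feat])
    logits = fc_weight @ fused + fc_bias
    logits = logits - logits.max()  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


rng = np.random.default_rng(0)
image_feat = rng.standard_normal(2048)  # stand-in for ResNet50 pooled features
labs = rng.standard_normal(26)          # 26 standardized laboratory indicators
conv_w, conv_b = rng.standard_normal((16, 3)) * 0.1, np.zeros(16)
lab_feat = lab_branch(labs, conv_w, conv_b)                  # shape (16,)
fc_w, fc_b = rng.standard_normal((2, 2048 + 16)) * 0.01, np.zeros(2)
probs = late_fusion_probs(image_feat, lab_feat, fc_w, fc_b)  # two class probabilities
```

Because each branch reduces its input to a fixed-length vector before concatenation, neither modality's internal layers see the other's data, which is the independence property described above.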
To reduce training time, we adopted transfer learning: a ResNet model pre-trained on the ImageNet dataset (22) served as the baseline for initial training, and the multimodal model was then trained by partially unfreezing its layers. Training used the Adam optimizer with hyperparameters selected by grid search. Because the image branch is a 50-layer residual network and the model processes both image and laboratory indicator data, it is named the ResNet50-clinical model.
To assess the performance of the multimodal model, five single-modal residual networks (ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152) and five traditional machine learning models (support vector machine [SVM], random forest, decision tree, XGBoost, and LightGBM) were trained and evaluated for comparison.
Model optimization
The batch size was set to 32, the learning rate to 1 × 10⁻⁶, and the random seed to 1024. The model was trained for 100 epochs, and the weights that yielded the lowest loss on the validation set were selected as the optimal model.
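The model-selection rule amounts to keeping the epoch with the lowest validation loss. A minimal sketch of that rule using the stated hyperparameters; in an actual PyTorch training loop the retained object would be the model's state dict at that epoch, and the loss values below are made up purely to illustrate the rule.

```python
from typing import Optional, Sequence, Tuple

# Hyperparameters as reported in the text
CONFIG = {"batch_size": 32, "learning_rate": 1e-6, "seed": 1024, "epochs": 100}


def select_best_epoch(val_losses: Sequence[float],
                      max_epochs: int = CONFIG["epochs"]) -> Tuple[Optional[int], float]:
    """Return (1-based epoch, loss) of the lowest validation loss within
    the first `max_epochs` epochs. Ties keep the earliest epoch."""
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss


# Illustrative (made-up) validation losses over five epochs
print(select_best_epoch([0.92, 0.71, 0.75, 0.60, 0.63]))  # (4, 0.6)
```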
Model building, training, and validation were performed using PyTorch (2.2.0) (23) on a computer equipped with an AMD EPYC 7532 processor (32 cores, 64 threads, 2.4–3.3 GHz) and four NVIDIA RTX 4090 GPUs (each with 24 GB of GDDR6X VRAM and 16,384 CUDA cores).
Model validation
This study integrates gradient-weighted class activation mapping (Grad-CAM) (24–26) into the multimodal model for attention analysis, which improves the interpretability of the model and physicians' confidence in it. Global average pooling is applied to the last convolutional layer of the ResNet module to generate class activation maps. The trained weights associated with each output of the global average pooling layer indicate the importance of the corresponding feature map from the last convolutional layer. These weights are applied to the feature maps to produce saliency maps, which are superimposed on the palm and conjunctival images to visualize the regions the multimodal model prioritizes when distinguishing between classes.
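The weighting step can be written compactly. The NumPy sketch below implements the Grad-CAM weighting: each channel weight is the spatial mean of the class-score gradient, the weighted feature maps are summed, and the result is passed through ReLU. For a head that is linear in the globally average-pooled features, as described above, the gradient is constant within each channel and proportional to the trained fully connected weight, so Grad-CAM reduces to the original class activation map up to a scale factor; the example checks this numerically. The activations and weights are random stand-ins, the channel count is reduced from the 2048 of ResNet50 for brevity, the 16 × 16 map size assumes a 512 × 512 input (overall stride 32), and the pixel-repetition upsampling and red-channel overlay are illustrative choices.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: (K, H, W) from the last convolutional layer,
    where gradients = d(class score) / d(feature_maps).
    Returns an (H, W) map scaled to [0, 1]."""
    alphas = gradients.mean(axis=(1, 2))  # (K,) channel importance
    cam = np.maximum(np.tensordot(alphas, feature_maps, axes=1), 0.0)  # ReLU(sum_k a_k A_k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def overlay(image, cam, alpha=0.4):
    """Upsample the map by pixel repetition (image size must be a multiple of
    the map size) and blend it into the red channel of an (H, W, 3) image."""
    sy, sx = image.shape[0] // cam.shape[0], image.shape[1] // cam.shape[1]
    heat = np.repeat(np.repeat(cam, sy, axis=0), sx, axis=1)
    red = np.zeros_like(image)
    red[..., 0] = heat
    return (1 - alpha) * image + alpha * red


rng = np.random.default_rng(0)
K, H, W = 8, 16, 16        # ResNet50 on a 512x512 input gives K=2048, H=W=16
A = rng.random((K, H, W))  # stand-in activations of the last conv layer
w = rng.random(K)          # stand-in FC weights for the target class
# Score = sum_k w_k * mean(A_k), so d(score)/dA_k[i, j] = w_k / (H * W) everywhere.
grads = np.broadcast_to(w[:, None, None] / (H * W), A.shape)
cam = grad_cam(A, grads)
classic_cam = np.maximum(np.tensordot(w, A, axes=1), 0.0)
classic_cam /= classic_cam.max()
vis = overlay(rng.random((512, 512, 3)), cam)  # (512, 512, 3) visualization
```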
To assess the generalization performance of the model, we conducted a double-blind human-machine controlled experiment using the external validation group. Children in the external validation group were diagnosed with Kawasaki disease by a professional diagnostic and treatment team following the standard diagnostic process, and each was matched by age and sex with a child from the healthy children database. The resulting 50 children with Kawasaki disease and 50 matched healthy children formed the final external validation group. Their palm and conjunctival images and corresponding laboratory examination data were evaluated independently by the multimodal model and by three pediatricians with senior professional titles who had not participated in the children's diagnosis or treatment and were blinded to the diagnoses (double-blind trial). The pediatricians could access only the laboratory examination indicators and the palm and conjunctival images, and made their judgments on these limited data so that the human-machine comparison was valid. Although the lack of other auxiliary diagnostic information may noticeably lower their accuracy compared with routine practice, it does not compromise the human-machine comparison, because the physicians and the model received identical inputs; the trial therefore remains a valid test of the multimodal model's reliability.
Statistical analysis
The proposed model was compared with the baseline methods in terms of accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F1 score.
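For reference, the five metrics can be computed as in the following minimal sketch, which treats label 1 (Kawasaki disease) as the positive class. The 0.5 decision threshold and the example labels and scores are illustrative assumptions, and AUC is computed via its rank interpretation (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics with label 1 (Kawasaki disease) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative),
    with ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]   # illustrative model outputs
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 threshold is an assumption
```

The pairwise AUC loop is quadratic in the number of cases, which is negligible for a validation set of 124 children.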