Clinical Study and Image Acquisition
Twenty-two head and neck cancer patients scheduled to receive radiation therapy at the German Oncology Center (GOC) in Limassol, Cyprus, participated in this proof-of-concept trial. The trial received bioethics approval from the Cyprus National Bioethics Committee. Patients with disabilities, pregnant women, those who had recently undergone radiation therapy in the same area, and patients with autoimmune diseases were excluded from the study. All participants were over 16 years of age: one patient was 30-40 years old, two were 40-50, six were 50-60, seven were 60-70, three were 70-80, and another three were 80-90. The mean patient age was 62.3 years, with a standard deviation of 12.5 years. All participants provided written informed consent prior to the study. After informed consent was obtained, the irradiated side of each subject's neck was imaged with OCT (Fig. 1). Six images were acquired at 1 cm intervals, covering the region from the mandibular angle to the clavicle. Additionally, a photograph of the same area was captured with a digital camera. Imaging was repeated prior to every radiation therapy session, twice per week, until the conclusion of the therapy, resulting in a dataset of 1487 images. During each visit, the patient's ARD grade at each of the imaging sites was determined and recorded by a senior oncologist. The imaging was performed with a swept-source OCT system (Santec IVS300) with a center wavelength of 1300 nm, an axial resolution of 12 μm in tissue, and an A-scan rate of 40 kHz. All experimental and analysis procedures were performed in accordance with the relevant guidelines and regulations.
Image Processing and Feature Extraction
An automated algorithm was developed to segment the OCT images of the skin by finding the top surface of each image and subsequently isolating a segment containing the epidermis (Fig. 2A, between the green lines, and Fig. 2B). The algorithm used automatic thresholding with Otsu's method and morphological processing of the binary image to determine the borders of the tissue. Subsequently, several features were extracted either from individual neighborhoods of the image (Fig. 2B, purple squares) or from strips at various depths (Fig. 2C, yellow rectangles). The details of each feature are explained below. The value of each feature at each image location was used to create both a pseudo-color image (where each pixel was assigned a color based on the value of the feature) and a colorized image, where the value of the feature was overlaid as color on the intensity image (i.e., as the hue in a hue-saturation-value (HSV) image; for example, Fig. 2D and Fig. 3).
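The surface-finding step can be sketched as follows. This is a minimal numpy-only illustration, not the study's implementation: Otsu's threshold is computed directly from the histogram, intensities are assumed normalized to [0, 1], and a simple median smoothing of the detected surface stands in for the morphological processing described in the text.

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Otsu's threshold for a grayscale image with values in [0, 1]."""
    hist, edges = np.histogram(img, bins=nbins, range=(0.0, 1.0))
    hist = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                    # class probability below threshold
    w1 = 1.0 - w0
    mu0 = np.cumsum(hist * centers)         # cumulative class mean
    mu_t = mu0[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        var_between = (mu_t * w0 - mu0) ** 2 / (w0 * w1)
    var_between[~np.isfinite(var_between)] = 0.0
    return centers[np.argmax(var_between)]

def segment_epidermis(bscan, depth_px):
    """Locate the tissue surface in each A-line and cut a band of
    depth_px pixels below it (the epidermis-containing segment)."""
    mask = bscan > otsu_threshold(bscan)
    surface = mask.argmax(axis=0)           # first bright pixel per column
    # median-smooth the surface to suppress speckle-induced jumps
    k = 5
    pad = np.pad(surface, k // 2, mode="edge")
    surface = np.array([np.median(pad[i:i + k])
                        for i in range(len(surface))]).astype(int)
    band = np.stack([bscan[s:s + depth_px, j]
                     for j, s in enumerate(surface)], axis=1)
    return surface, band
```

The returned band corresponds to the region between the green lines in Fig. 2A, from which the neighborhoods and depth strips are then taken.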
For each neighborhood in the image, the following features were extracted:
- First-order intensity statistics: The mean, standard deviation, variance, skewness, kurtosis, median, mode, minimum, and maximum of the intensity were calculated from each neighborhood of each segmented portion of the images. These features provide valuable textural information. For the purposes of feature-based machine learning, the statistics of each feature over the entire image were also calculated, resulting in 90 features per image.
- Gray Level Co-occurrence Matrix (GLCM): The GLCM is used to effectively measure perceptual texture qualities of an image. Four essential characteristics, the correlation, contrast, homogeneity, and energy, can be estimated using this approach. Correlation gauges how closely the intensities of pixel pairs in an image are related to one another: a high correlation suggests that the values of the two pixels are strongly linearly related, while a low correlation suggests that they vary independently. Contrast quantifies the intensity differences between neighboring pixels; a low contrast implies a small change between neighboring pixel values, whereas a high contrast suggests a significant difference. Homogeneity measures the degree of similarity between adjacent pixels: when homogeneity is high, the values of adjacent pixels are comparable, and when it is low, they differ. Finally, energy quantifies the overall level of uniformity in the neighborhood; a high energy indicates a more uniform distribution of gray levels. Each GLCM feature was extracted at four directions (0, 45, 90, and 135 degrees) and for three different offsets (1, 3, and 5 pixels) from the center of the neighborhood. When the statistics of each were calculated over the entire image, the result was 600 features for feature-based machine learning.
- Fractal Dimension (FD): The fractal dimension is an effective measure of the complexity and irregularity of image structures. It is especially useful in medical imaging due to its capacity to identify the increased irregularities often associated with disease. The box-counting approach was used in this work to calculate the statistics of the FD distribution over all the neighborhoods of each image19. For feature-based machine learning, the statistics over the entire image resulted in an additional 90 features.
- Novel feature – Group Velocity Dispersion (GVD): Dispersion, an indicator of the wavelength dependence of the index of refraction, has recently been investigated as a useful biomarker of disease. Since changes in tissue dispersion result from compositional/biochemical alterations, this metric can be an invaluable complement to the micro-structural information offered by the features above. By examining the speckle patterns in OCT images, the resolution degradation brought about by GVD as a function of depth can be measured in situ and in vivo21. As with the previous features, the calculation of the statistics over the entire image resulted in 10 additional features.
- Novel feature – Scatterer Size (SS): In addition to the intensity information, which is the source of the micro-structural images, the OCT interferograms contain spectral information that, most often, remains unused. However, according to the Mie theory of light scattering, oscillatory patterns in the spectrum are related to the size of the scatterer. A novel metric, the bandwidth of the correlation of the derivative (COD) of the OCT spectrum, has been formulated to estimate the mean size of the scatterers, in this case the nuclei of the cells, in vivo22. Calculating the statistics of the COD and SS over the entire image resulted in another 20 features for the feature-based machine learning.
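To make the neighborhood-based texture features concrete, the following is a minimal numpy sketch of three of the feature families above: first-order statistics, GLCM characteristics, and box-counting fractal dimension. It is illustrative only: the study computed the GLCM at four directions and three offsets, while this sketch shows a single horizontal offset, and the GVD and scatterer-size metrics (which require the raw interferograms) are omitted.

```python
import numpy as np

def first_order_stats(patch):
    """Mean, std, variance, skewness, kurtosis, median, min, max of a neighborhood."""
    m, s = patch.mean(), patch.std()
    z = (patch - m) / (s + 1e-12)
    return {"mean": m, "std": s, "var": s ** 2,
            "skew": (z ** 3).mean(), "kurt": (z ** 4).mean(),
            "median": np.median(patch), "min": patch.min(), "max": patch.max()}

def glcm_features(patch, levels=8, d=1):
    """Contrast, homogeneity, energy, and correlation from a symmetric,
    normalized GLCM at a horizontal offset of d pixels."""
    q = np.minimum((patch * levels).astype(int), levels - 1)
    a, b = q[:, :-d].ravel(), q[:, d:].ravel()
    P = np.zeros((levels, levels))
    np.add.at(P, (a, b), 1)
    P = P + P.T                               # make the matrix symmetric
    P /= P.sum()
    i, j = np.indices((levels, levels))
    mu_i, mu_j = (i * P).sum(), (j * P).sum()
    si = np.sqrt(((i - mu_i) ** 2 * P).sum())
    sj = np.sqrt(((j - mu_j) ** 2 * P).sum())
    return {"contrast": ((i - j) ** 2 * P).sum(),
            "homogeneity": (P / (1 + np.abs(i - j))).sum(),
            "energy": (P ** 2).sum(),
            "correlation": ((i - mu_i) * (j - mu_j) * P).sum() / (si * sj + 1e-12)}

def box_counting_fd(mask):
    """Box-counting fractal dimension of a square, non-empty binary mask:
    slope of log(box count) vs. log(1/box size)."""
    n = mask.shape[0]
    sizes, counts = [], []
    s = n
    while s >= 2:
        view = mask[:n - n % s, :n - n % s].reshape(n // s, s, n // s, s)
        counts.append(view.any(axis=(1, 3)).sum())
        sizes.append(s)
        s //= 2
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)),
                          np.log(np.array(counts)), 1)
    return slope
```

For a filled square region the box-counting slope approaches 2, while rougher, more irregular structures fall between 1 and 2, which is what makes the FD sensitive to disease-related irregularity.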
Feature-based Classification
Given the large number of features (810 in total) and the fact that many of those attributes are correlated, the process of feature-based machine learning classification began with feature selection. First, the features were ranked in order of significance using a combination of hypothesis testing (t-test) and the Maximum Relevance - Minimum Redundancy (MRMR) algorithm. The final number of features, taken in decreasing order of significance, was selected by optimizing the classification accuracy, as was the depth of segmentation. The feature vectors were normalized prior to classification. In addition, to address the issue of an imbalanced dataset, the Synthetic Minority Oversampling Technique (SMOTE) was used to increase the minority class by 20% in order to prevent overfitting to the majority class. SMOTE is a data augmentation approach that selects instances that are close together in the feature space and creates new samples at points between the existing ones. Various classifiers were evaluated to determine the most appropriate for the specific purpose of early ARD classification. The classifiers tested included Linear Discriminant Analysis (LDA), with a pseudolinear discriminant type, Support Vector Machine (SVM), with a linear kernel, k-Nearest Neighbor (k-NN), with 11 nearest neighbors, Naïve-Bayes (NB), and Decision Tree (DT) classifiers. The classifiers were compared in terms of accuracy in a leave-one-patient-out (LOPO) cross-validation scheme to ensure unbiased results.
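The evaluation loop described above can be sketched as follows, using scikit-learn's LeaveOneGroupOut with patient IDs as groups. This is a simplified stand-in, not the study's code: features are ranked by t-test only (MRMR is omitted), and a minimal SMOTE-style interpolation between nearest minority neighbors replaces the full SMOTE implementation. Note that selection, normalization, and oversampling are all fitted inside each fold, on the training patients only, to keep the held-out patient unbiased.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import StandardScaler

def rank_features_ttest(X, y):
    """Rank features by two-sample t-test p-value (most significant first)."""
    p = np.array([ttest_ind(X[y == 0, k], X[y == 1, k]).pvalue
                  for k in range(X.shape[1])])
    return np.argsort(p)

def smote_like(X_min, n_new, rng):
    """Minimal SMOTE-style oversampling: interpolate between a minority
    sample and its nearest minority neighbor."""
    idx = rng.integers(0, len(X_min), n_new)
    out = []
    for i in idx:
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf
        j = d.argmin()
        out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(out)

def lopo_accuracy(X, y, groups, clf, n_feats=5):
    """Leave-one-patient-out accuracy with per-fold feature selection,
    normalization, and ~20% minority oversampling."""
    correct = total = 0
    rng = np.random.default_rng(0)
    for tr, te in LeaveOneGroupOut().split(X, y, groups):
        keep = rank_features_ttest(X[tr], y[tr])[:n_feats]
        scaler = StandardScaler().fit(X[tr][:, keep])
        Xtr, ytr = scaler.transform(X[tr][:, keep]), y[tr]
        minority = int(ytr.mean() < 0.5)      # assumes binary labels {0, 1}
        Xm = Xtr[ytr == minority]
        n_new = max(1, int(0.2 * len(Xm)))
        Xtr = np.vstack([Xtr, smote_like(Xm, n_new, rng)])
        ytr = np.concatenate([ytr, np.full(n_new, minority)])
        pred = clf.fit(Xtr, ytr).predict(scaler.transform(X[te][:, keep]))
        correct += (pred == y[te]).sum()
        total += len(te)
    return correct / total
```

Each candidate classifier (LDA, linear SVM, k-NN with 11 neighbors, NB, DT) would be passed in as `clf` and compared on the returned accuracy.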
Classification using multi-feature deep learning with late fusion
A deep learning methodology was also applied to differentiate between normal skin and early stages of ARD. The images were segmented at a 0.375 mm depth, since that was empirically shown to provide the most accurate results. The segmented image sections were processed to extract the features from each 21x21 pixel (0.21 x 0.063 mm) neighborhood of the 1487 images. For the 80 most significant features, a pseudo-color and an overlay image were created (a total of 237,920 images from the 22 patients). As mentioned above, pseudo-color and HSV images were created for each feature (Fig. 3). These images were rescaled to 227 x 227 x 3 pixels for more efficient deep learning. Data augmentation was implemented by applying a set of augmentation operations to the original images during training, generating augmented versions of the images as they pass through the network. The augmentation techniques included rotation as well as translation in the x and y directions for each image. A pre-trained ResNet101 neural network was utilized, with the last layers modified to incorporate a fully connected layer for two-class classification, followed by a softmax activation and, finally, the classification layer. The Adam optimizer was used with a batch size of 128. The learning rate was set to 0.001 and the network was trained for 35 epochs. The dataset of images corresponding to each feature was classified using a separate network. A new feature vector for each image was created by combining the results of the classification of each of the different feature datasets. Finally, a conventional classifier was used to combine these results into a final classification of each image (Fig. 4). Each image was classified using leave-one-patient-out (LOPO) cross-validation.
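The late-fusion step can be sketched as follows. The per-feature networks' softmax scores for each image are concatenated into a single vector, which a conventional classifier then maps to the final label. The text does not specify which conventional classifier performs the fusion, so logistic regression stands in here as an assumed choice; `per_feature_scores` is a hypothetical array of network outputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def late_fusion(per_feature_scores, y, train_idx, test_idx):
    """Late fusion of per-feature network outputs.

    per_feature_scores: array of shape (n_images, n_feature_networks,
    n_classes) holding each network's softmax scores. The scores are
    flattened into one feature vector per image and combined by a
    conventional classifier trained on the training images only."""
    X = per_feature_scores.reshape(len(per_feature_scores), -1)
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    return clf.predict(X[test_idx])
```

In the LOPO scheme, `train_idx` would hold the images of 21 patients and `test_idx` the images of the held-out patient, so the fusion classifier never sees the test patient during training.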