Patients
We collected hematoxylin-eosin-stained pathology slides from 178 patients who underwent lung or bronchus biopsies at Gyeongsang National University Hospital, Jinju, Korea, in 2012, together with the pathology slides of 12 patients diagnosed with large cell neuroendocrine carcinoma on biopsy at Gyeongsang National University Hospital from 2012 to 2018. Of these patients, 18, 19, and 18 were diagnosed with adenocarcinoma, squamous cell carcinoma, and small cell carcinoma, respectively, while the remaining slides (n=123) all came from non-tumor cases. Each diagnosis was histopathologically confirmed by two experienced pathologists. This study was approved by the Institutional Review Board of Gyeongsang National University Hospital with a waiver for informed consent (2021-04-016), and all methods were performed in accordance with the relevant guidelines and regulations.
Whole slide image (WSI) dataset
A total of 190 WSIs were acquired from the 190 pathology slides with an Aperio AT2 slide scanner (Leica Biosystems Division of Leica Microsystems Inc., IL, USA) at 400× magnification. Two experienced pathologists annotated the cancer regions on the WSIs with Aperio ImageScope v12.4.3 (Leica Biosystems Division of Leica Microsystems Inc., IL, USA). Tumor areas were extracted from the annotated WSIs, and the extracted areas were used to generate non-overlapping patches of 256 × 256 pixels at 20× magnification using DeepPATH, which is based on the OpenSlide library in Python (Fig. 1a).7 In this process, 10235 patches containing either one of the four lung cancer subtypes or a negative case were generated from the original 190 WSIs, as shown in Table 1.
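Patch extraction itself was handled by DeepPATH; as a rough illustration of the underlying OpenSlide tiling step, the sketch below reads non-overlapping 256 × 256 pixel tiles from one rectangular region of a WSI. The region coordinates, pyramid level, and function name are illustrative assumptions and not part of the DeepPATH pipeline.

```python
# Illustrative sketch only: non-overlapping 256 x 256 tiling of one annotated
# region of a WSI with OpenSlide. DeepPATH performs this tiling (plus background
# filtering) internally; the region and level used here are assumed values.
import openslide

def extract_patches(wsi_path, region, level=1, patch_size=256):
    """Yield non-overlapping RGB patches from a (x, y, width, height) region
    given in the coordinates of the chosen pyramid level."""
    slide = openslide.OpenSlide(wsi_path)
    x0, y0, width, height = region
    scale = int(slide.level_downsamples[level])
    for y in range(y0, y0 + height - patch_size + 1, patch_size):
        for x in range(x0, x0 + width - patch_size + 1, patch_size):
            # read_region expects level-0 coordinates for the top-left corner
            tile = slide.read_region((x * scale, y * scale), level,
                                     (patch_size, patch_size))
            yield tile.convert("RGB")
```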
Patch Dataset
The dataset consisted of slides from patients in one of five classes: 18 adenocarcinoma (ADC), 12 large cell neuroendocrine carcinoma (LCNEC), 19 squamous cell carcinoma (SCC), and 18 small cell carcinoma WSIs of lung cancer subtypes, as well as 123 non-tumor WSIs. The WSIs were used to generate 1759 ADC patches, 1061 LCNEC patches, 1314 SCC patches, 1711 small cell carcinoma patches, and 4390 non-tumor patches, as shown in Fig. 1b and Table 1. Patches were removed if the percentage of background in the patch was above 25% according to the DeepPATH program.7 The input data were randomly divided into a training set, a validation set, and a testing set using the split-folders library in Python version 3.8.3 (https://pypi.org/project/split-folders/); we then generated three different datasets, A, B, and C, from the original dataset in the same way. Out of 10235 patches, 7366 were used to construct the training sets (72%), 816 were used for the validation sets (8%), and 2053 were used in the test sets (20%).
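A minimal sketch of this splitting step is shown below, assuming the filtered patches are stored in one subfolder per class (e.g. patches/ADC, patches/LCNEC, and so on); the folder names and random seed are illustrative, not values reported in the study.

```python
# Random 72/8/20 split of the per-class patch folders into train/val/test sets
# with split-folders; the input/output folder names and seed are assumed values.
import splitfolders

splitfolders.ratio("patches", output="dataset_A",
                   seed=42, ratio=(0.72, 0.08, 0.20))
```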
Convolutional Neural Networks
A deep neural network (DNN) is a supervised classifier that contains multiple layers between the input and output layers.16 A convolutional neural network (CNN, or ConvNet) is a specialized kind of DNN that is known to perform particularly well when analyzing images.17 We constructed a convolutional neural network model for the multinomial classification of lung cancer biopsies, with the possible outputs being the four lung cancer types or the negative case. Our CNN was built on the Keras Sequential API (https://keras.io/), written in Python and running on TensorFlow (https://www.tensorflow.org/).18 CNN models take tensors of a certain shape as input; for image-analysis CNNs, this shape is dictated by the height, width, and color channels of the input images. Our model takes inputs with dimensions of 244 × 244 × 3 and consists of four convolution blocks, each with a max pooling layer. The 1st and 2nd hidden layers of the model have 16 and 32 filters, respectively, with a kernel size of (2, 2) and use a rectified linear unit (ReLU) as their activation function. The 3rd and 4th hidden layers have 64 filters with a kernel size of (2, 2) and also use ReLU activation functions, as shown in Table 2. The fully connected dense layer of the model has 5 units and uses a softmax activation function. A batch size of 200 and 100 epochs were determined to be the optimum values for the model when considering both time and computational costs. When compiling the model, Nadam was chosen as the optimizer and categorical cross-entropy was selected as the loss function.
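A minimal sketch of this architecture in the Keras Sequential API is given below; the flattening step before the dense layer and the exact placement of the pooling layers are assumptions, as they are not fully specified in the text.

```python
# Sketch of the described CNN (not the exact training code): four convolution
# blocks with 16/32/64/64 filters, (2, 2) kernels, ReLU activations, max pooling,
# and a 5-unit softmax output. The Flatten layer is an assumed detail.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(244, 244, 3)),
    layers.Conv2D(16, (2, 2), activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, (2, 2), activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (2, 2), activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (2, 2), activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(5, activation="softmax"),  # ADC, LCNEC, SCC, small cell, non-tumor
])

model.compile(optimizer="nadam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_data, validation_data=val_data, epochs=100, batch_size=200)
```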
Transfer Learning with Pre-Trained ConvNets
We evaluated four AI models that used transfer learning to implement state-of-the-art pre-trained convolutional neural networks. Transfer learning is a subfield of machine learning and artificial intelligence in which the learned weights of an already trained model are reused to solve a different problem instead of training a model from scratch; this approach saves time and computational costs.19 Transfer learning for computer vision problems is normally executed by applying pre-trained ConvNet architectures (e.g., VGG, ResNet, Xception) that were trained on large benchmark datasets (e.g., ImageNet14) to a particular problem. Pre-trained ConvNets allow us to build AI models for image classification with relatively high accuracy and diagnostic performance even if the target dataset is small or if the people tackling the problem lack the expertise to train a CNN from scratch. Pre-trained image classification networks are trained on a subset of the ImageNet database used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC).20 Four pre-trained convolutional neural networks were used in this study: Xception,10 ResNet152,11 VGG19,12 and NASNetLarge.13 Xception, which was built and trained by Google, is a novel deep convolutional neural network inspired by Inception;10 it slightly outperformed InceptionV3 (GoogLeNet)21 on the ImageNet database. ResNet152 is a newer version of ResNet (Residual Network), a convolutional neural network built and trained by Microsoft.11 VGG19, which has a depth of 19 layers, was established by the University of Oxford in 2014.12 Lastly, NASNetLarge is a state-of-the-art neural image classification model built and trained by Google in 2018.13 The pre-trained models are freely accessible through Keras Applications (https://keras.io/api/applications/), a deep learning library. After the pre-trained models were chosen, we repurposed the knowledge that had already been learned (the layers, features, weights, and biases) by fine-tuning the models to generate the correct outputs for our problem. A batch size of 20 and 10 epochs were determined to be the optimum values for the pre-trained CNNs in consideration of time and computational costs. When compiling each model, Nadam was chosen as the optimizer and categorical cross-entropy was selected as the loss function.
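A hedged sketch of this transfer-learning setup is shown below for Xception; the frozen base, global-pooling head, and input size are assumptions rather than details given in the text, and the same pattern applies to the other three backbones.

```python
# Illustrative transfer-learning sketch with a Keras Applications backbone.
# The ImageNet-pretrained Xception base is reused and a new 5-class softmax
# head is trained; the pooling head and input size are assumed details.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.Xception(weights="imagenet",
                                      include_top=False,
                                      input_shape=(256, 256, 3))
base.trainable = False  # keep the learned ImageNet weights; unfreeze to fine-tune

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),
])

model.compile(optimizer="nadam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_data, validation_data=val_data, epochs=10, batch_size=20)
```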
Statistical analysis
To evaluate the classification performance of the AI models, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, precision, recall, accuracy, and F1-score were utilized. These metrics were derived from the following counts, computed for each class against the rest:
True positive (TP): the number of cases where the class was correctly identified versus the rest of the classes
False positive (FP): the number of cases where the class was incorrectly identified versus the rest of the classes
True negative (TN): the number of cases correctly identified as healthy or another cancer type
False negative (FN): the number of cases incorrectly identified as healthy or another cancer type
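A minimal sketch of how these per-class, one-vs-rest counts can be obtained from a multiclass confusion matrix is shown below; the label arrays are illustrative placeholders, not data from the study.

```python
# Illustrative one-vs-rest counts from a 5-class confusion matrix; the label
# arrays below are placeholders, not results from the study.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])  # ground-truth class indices
y_pred = np.array([0, 1, 2, 3, 3, 0, 2, 2, 3, 4])  # predicted class indices

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3, 4])
tp = np.diag(cm)                 # class correctly identified
fp = cm.sum(axis=0) - tp         # other classes predicted as this class
fn = cm.sum(axis=1) - tp         # this class predicted as another class
tn = cm.sum() - (tp + fp + fn)   # everything else
```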
All statistical analyses were performed using the scikit-learn library (https://scikit-learn.org/) in Python version 3.8.3 (https://www.python.org/).
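A hedged sketch of the metric computation with scikit-learn is given below; the probability and label arrays are placeholders standing in for the test-set labels and the softmax outputs of a trained model.

```python
# Illustrative computation of accuracy, precision, recall, F1-score, and the
# one-vs-rest AUC of the ROC with scikit-learn; the arrays are placeholder data.
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score

class_names = ["ADC", "LCNEC", "SCC", "Small cell", "Non-tumor"]

y_true = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])            # test-set class indices
y_prob = np.random.default_rng(0).dirichlet(np.ones(5), 10)  # stand-in softmax outputs
y_pred = np.argmax(y_prob, axis=1)

# Per-class precision, recall, F1-score, and overall accuracy
print(classification_report(y_true, y_pred, labels=[0, 1, 2, 3, 4],
                            target_names=class_names, zero_division=0))

# Macro-averaged one-vs-rest AUC of the ROC curves
print("AUC:", roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"))
```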