Patient specimens, digital image acquisition and image pre-processing
The study is based on retrospective collection of 695 routine renal biopsies performed and tested at the National Center of Pathology (Vilnius, Lithuania) from 2016 to 2021. The cohort was balanced by a final pathology diagnosis that contained: 100 cases of IgA nephropathy, 99 cases of membranoproliferative glomerulonephritis, 100 cases of crescent glomerulonephritis, 100 cases of membranous nephropathy, 96 cases of minimal change disease, 35 cases of secondary focal segmental glomerulosclerosis and 92 biopsies of endocapillary glomerulonephritis. Formalin-fixed paraffin embedded routine renal biopsies were cut at 3µm-thick sections and stained by modified Picrosirius Red stain. Digital whole slide images (WSI) were recorded using a ScanScope XT Slide Scanner (Leica Aperio Technologies, Vista, CA, USA) under 20x objective magnification and 0.5-µm resolution) and subsequently subjected to digital image analysis by using HALO™ software (version 3.5.3577.140 and HALO AI 3.5.3577; Indica Labs, Corrales, New Mexico, United States) for glomerular segmentation. Based on manual annotations, HALO AI DenseNet classifier was trained to recognize and segment glomeruli containing all types of injury patterns. HALO classifier prediction masks were used to create a collection of glomeruli cropped from original biopsy WSI into 1024 × 1024 pixel-sized images. A total of 27,156 glomerular images were extracted and pre-processed by replacing surrounding renal cortex tissue with a black background.
Ethics declarations
All tissue samples originated from the Lithuanian National Center of Pathology and the study was performed under permission of Vilnius Regional Biomedical Research Ethics Committee No. 2019/6-1148-637. Informed consent was waived by Vilnius Regional Biomedical Research Ethics Committee and all methods were performed in accordance with the relevant guidelines and regulations.
Defining glomerular injury patterns and datasets for classification
All extracted glomerular images were reviewed and categorized into nine main injury patterns: mesangioproliferative, endocapillary, membranoproliferative GN, membranous, crescentic, segmental glomerulosclerosis, hypertrophy, global glomerulosclerosis, and normal glomeruli (Fig. 1). Glomeruli that represented mixed or ambiguous patterns of injury were not included in the training set. The second step of preselecting glomerular images for CNN-based approach was carried out as a consensus of two nephropathologists who blindly reviewed the preselected glomerular images and classified the glomerular injury patterns as pure (uniform) as possible. Images of cropped glomeruli representing pure patterns were randomly assigned to testing and training sets.
9 classes of glomerular histological patterns were used: a. Crescentic, b. Endocapillary, c. Mesangioproliferative, d. Membranoproliferative, e. Segmental sclerosis (FSGS), f. Membranous, g. Hypertrophy, h. Normal glomeruli, i. Global sclerosis.
We doubled the number of glomeruli images in a training set by rotational augmentation – each original image of a cropped glomerulus was rotated by individually selecting random rotation angle in 90° steps (one of 90°, 180°, 270°). The complete composition of both training and testing sets is given in the Table 1.
Table 1
Composition of training and testing data sets.
Glomerular injury pattern | Testing set | Training set | Total original glomeruli |
original | original | augmented |
Crescentic | 29 | 81 | 81 | 110 |
Endocapillary | 37 | 81 | 81 | 118 |
FSGS | 54 | 81 | 81 | 135 |
Hypertrophy | 33 | 81 | 81 | 114 |
Membranoproliferative | 46 | 81 | 81 | 127 |
Membranous | 35 | 81 | 81 | 116 |
Mesangioproliferative | 42 | 81 | 81 | 123 |
Normal | 96 | 81 | 81 | 177 |
Sclerosed | 45 | 81 | 81 | 126 |
Total | 417 | 1458 | 1146 |
Multi-class classification of glomerular injury patterns by a single artificial neural network-based classifier
We utilized the training set develop classifiers for the 9 patterns of glomerular injury. First, we built a single nine-class classifier based on an ImageNet-pretrained Xception neural network architecture. We reconfigured Xception model to accept input images of 1024 × 1024 pixel size. For each instance of glomerulus, the final classification layer of the model was set to generate nine-class probability output via the softmax activation function. The classifier was trained with a balanced dataset by feeding the model with four image batches. The training was guided by stochastic gradient descent (SGD) optimizer that minimizes categorical cross entropy loss.
One-vs-Rest classification of glomerular injury patterns by multiple binary classifiers
Secondly, we performed a binary classification of glomerular injury patterns in a one-vs-rest setting by splitting the nine-class dataset into nine binary classification problems. For this task, we built nine distinct binary Xception-based classifiers, each aimed to discriminate a particular glomerular injury type from remaining classes. For each binary classifier, we again reconfigured the original Xception architecture for 1024 × 1024 pixel-sized input, but the final classification layer was set to generate single-class probability output via the sigmoid activation function. One-vs-rest classifiers were trained by SGD optimizer to minimize the binary cross entropy loss. Binarization of training labels introduces class imbalance; therefore, we balanced the training set by sampling all the glomeruli from the positive (target) class and proportionally subsampling remaining classes to collect equivalent number of glomeruli to represent negative class. During inference for a particular glomerulus, each binary classifier predicts a class membership probability score. The argmax of these scores defines the overall predicted class of glomerulus injury pattern.
Visualizing classifier attention maps with Grad-CAM
To interpret and explain the classification criteria used by classifiers during inference, we visualized discriminative regions of glomeruli images relevant to distinct injury patterns. For this task, we employed gradient-weighted class activation mapping technique that captures spatial information that is preserved through convolutional layers of the trained classifier. [38] The localization heatmap is calculated as a weighted sum of feature maps in the final convolutional layer of the classifier and up-sampled to match the size of the original image. For display, the up-sampled Grad-CAM localization heatmaps are overlayed on top of glomerular images.
Spatially guided multiclass classification of glomerular injury patterns
Grad-CAM is a post hoc neural network attention method, which means that it does not participate in the classifier training. Localization heatmaps are not learned specifically and are not influenced by any particular model training parameters. Moreover, in Grad-CAM, up-sampling is achieved in a single step going from tiny final convolutional layer-sized localization heatmap up to an input image-sized final heatmap by an integer factor that typically is of the order of dozens, meaning that the final heatmap is coarse and noisy.
Proposed neural network architecture
Therefore, to improve classifier focus on essential parts of the image and increase attention localization heatmap granularity, we attempted building a trainable attention mechanism. For this task, we employed the U-Net-like encoder-decoder structure. We again used Xception architecture as the base model; however, we modified it with U-Net style decoder and skip connections. We designed the network with three output layers. The final convolutional layer of the Xception architecture on the encoder branch is connected to an aggregation block consisting of a global average pooling operation and an intermediate densely connected layer (\({int}_{1}\)). This block feeds the first (auxiliary) densely connected output layer (\({clf}_{aux}\)) that has a softmax activation function to produce nine-class probability output. A similar block (global average pooling followed by dense intermediate (\({int}_{2}\))) is added at the end of the decoder branch. The second (main) dense classification layer (\({clf}_{main}\)) receives concatenated output of intermediate dense layers (\({int}_{1}\)) and (\({int}_{2}\)) to produce another nine-class probability output via softmax activation function. In parallel to the (\({clf}_{main}\)) branch, the single neuron, 1 × 1 2D convolutional layer acts as the third output layer (\(loc\)) that produces a localization heatmap exactly matching input image size. Therefore, main classification branch (\({clf}_{main}\)) is conditioned to depend upon both localization (\(loc\)) and auxiliary classification (\({clf}_{aux}\)) branches. The detailed schema of the proposed neural network architecture is given in Fig. 2.
During the training, network learns to assign features of the glomeruli images to the corresponding ground truth class labels of distinct glomerular injury patterns (\({clf}_{aux}\) and \({clf}_{main}\)) and a functional mapping between pixels in glomeruli images and the pixels in corresponding ground truth localization heatmaps (\(loc\)).
The model is configured to accept 1024 × 1024 pixel-sized glomeruli images and is trained by an optimizer independently minimizing three weighted loss functions – categorical cross entropy loss functions for \({clf}_{aux}\) and \({clf}_{main}\) classification outputs, and a binary cross-entropy for localization heatmap output (\(loc\)). Weighting loss functions enables prioritization of network tasks, e.g., main classification over auxiliary classification (\({w}_{{clf}_{main}}>>{w}_{{clf}_{aux}}\)), localization slightly over main classification (\({w}_{{clf}_{main}}>{w}_{loc}\)). We trained the model with following constraints: \({w}_{{clf}_{main}}+{w}_{{clf}_{aux}}+{w}_{loc}=1\) and \({{w}_{loc}>w}_{{clf}_{main}}\gg {w}_{{clf}_{aux}}\) .
Ground-truth localization heatmaps
First, to produce ground truth localization heatmaps, nephropathologists were asked to highlight hotspots of segmental glomerular injury patterns by placing a simple free-form annotation (as a small as single pixel) in a copy of the glomeruli training set. Multiple hotspots were allowed. Diffuse glomerular injury patterns and normal glomeruli were automatically annotated by a single pixel annotation at a center of the glomerulus mask.
These annotations were then transformed into heatmaps where every pixel in a heatmap gets a value through a non-linear distance-based function:
$${h}_{x,y}=\frac{1}{1+{e}^{-({c}_{1}\cdot {d}_{x,y}+{c}_{2})}}$$
where \({h}_{x,y}\) is a value of a pixel at \(x,y\) position in an image plane, \({d}_{x,y}\) is Euclidean distance from that pixel to the nearest pixel in an annotation, \({c}_{1}\) and \({c}_{2}\) are preselected constants. Briefly, pixels closer to the annotation get values closer to 1.0, and pixels further from the annotation get values closer to 0.0. All pixels outside the glomerulus contour get 0.0 values. Example localization heatmaps are shown in Fig. 3.
Left (\(-7\),\(6)\), middle (\(-9\), \(6),\) and right \((-11\), \(6)\). Transformation coefficients for use in this paper (\({c}_{1}=-9\), \({c}_{2}=6)\) were selected by visual appreciation of resulting ground truth localization heatmaps.
Metrics
Accuracy was used to monitor all classifier models during the training phase (see Supplementary Fig. 1 for model training metrics). During inference, classifier performance was compared by the area under the ROC curve metrics (AUC), and multi-class confusion matrices were used to get deeper insights of classification errors and biases. We compare classifier experiments in a multiclass classification setting, therefore mean classification accuracy over cross-validation folds reported per class in Table 2 in Results section is calculated as true positive rate. The amount of variance in the classifier performance is reported by the standard deviation of accuracies (coefficients of variance are employed to identify values exceeding average). The generalized classification accuracy is calculated over the diagonal of an aggregated multiclass confusion matrix. The quality of predicted localization heat maps was evaluated by an intersection over union metrics between predicted and ground truth localization heat maps at a threshold value. Furthermore, injury pattern area quantification was measured by a percentage of predicted intraglomerular pattern heatmap.
Implementation
All image data manipulation steps (preprocessing of glomeruli images, generation of augmented images and ground truth heatmaps, as well as figures in a manuscript) were done in Python 3.8.10 using ‘scikit image’ v.0.19.1, ‘matplotlib’ v.3.5.1, ‘numpy’ v.1.20.0 and ‘scipy’ v. 1.7.3 libraries. The classifiers were built, trained, and evaluated in Python 3.8.10 using ‘tensorflow’ v.2.7.0 and ‘scikit-learn’ v.1.0.2 on a high-performance graphical processing unit (Nvidia GeForce RTX 3090).