Data Collection
Permission to carry out this study, with waiver of consent and in compliance with HIPAA, was obtained from the UCSF Institutional Review Board (approval 18-24659), and all research was carried out in accordance with relevant guidelines and regulations. This was a retrospective study, for which the requirement for informed consent was waived by the UCSF Institutional Review Board because of its data source and methods. The data used for this study are not publicly available because they contain sensitive medical information, but they are available from the corresponding author on reasonable request. CAL measurements were aggregated across patients over a three-and-a-half-year period from July 2016 through January 2020. Cases were matched with radiographic images (bitewing and periapical radiographs) acquired within 6 months prior to periodontal therapy. To maintain ground truth data integrity, cases in which the charted periodontal values and diagnosis did not match the therapy provided were excluded, as were negative CAL values and values of 6 mm and above. The purposive sampling criteria and selected cases were reviewed by three experienced academic practicing clinicians (one periodontist, GL, with 11 years of experience, and two general dentists, RV, with 22 years of experience, and JMW, with 38 years of experience) to verify the ground truth data.
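The exclusion criteria above reduce to a simple filter over the charted measurements. A minimal pandas sketch is given below; the column names ('cal_mm', 'chart_matches_therapy', 'days_radiograph_before_therapy') are illustrative assumptions rather than fields from the actual dataset.

import pandas as pd


def filter_cal_cases(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only measurements satisfying the inclusion criteria described above.
    Column names are illustrative assumptions, not fields from the real dataset."""
    keep = (
        (df["cal_mm"] >= 0) & (df["cal_mm"] < 6)                 # drop negative and >= 6 mm CAL values
        & df["chart_matches_therapy"]                            # charting/diagnosis must match therapy
        & df["days_radiograph_before_therapy"].between(0, 183)   # radiograph within ~6 months prior
    )
    return df.loc[keep]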
Generative Adversarial Inpainting with Partial Convolutions
The inpainting network comprises two generator and three discriminator CNNs. The Partial Convolutional Encoder-Decoder generator focuses the network on the missing regions of the image and fills in the missing anatomy, while the second, Refine Encoder-Decoder generator encourages overall realism of the image and helps refine the predictions of the Partial Convolutional Encoder-Decoder generator. The discriminators utilized a pre-trained VGG network and one PatchGAN dynamic discriminator. Figure 1 depicts the information flow of the inpainting process as well as the various network components.
The Partial Convolutional Encoder-Decoder generator comprised 12 partial convolutional blocks, each consisting of a 3x3 partial convolutional layer paired with leaky rectified linear unit (LeakyReLU) activation and synchronized instance normalization. The encoder stage used six partial convolutional blocks with stride 2, and the decoder stage used convolutional kernels with 2x upsampling layers. As part of the partial convolutional apparatus, the outputs of each multi-scale level in the encoding stage were concatenated with the upsampled outputs of the corresponding decoding level.
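A minimal PyTorch sketch of one encoder block is given below. The partial convolution follows the standard formulation (convolve only valid pixels, rescale by window coverage, and update the mask); the channel widths, LeakyReLU slope, and the use of plain rather than synchronized instance normalization are assumptions not specified above.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PartialConvBlock(nn.Module):
    """One encoder block: 3x3 partial convolution -> instance norm -> LeakyReLU.
    The convolution is evaluated only over valid (unmasked) pixels, rescaled by
    the fraction of valid pixels in each window, and the mask is updated so that
    any window containing at least one valid pixel becomes valid at the next scale."""

    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
        # Fixed all-ones 3x3 kernel that counts valid pixels per window (mask is 1-channel).
        self.register_buffer("ones", torch.ones(1, 1, 3, 3))
        self.norm = nn.InstanceNorm2d(out_ch)        # paper uses synchronized instance norm
        self.act = nn.LeakyReLU(0.2, inplace=True)   # negative slope is an assumption

    def forward(self, x, mask):
        # x: (N, C, H, W) features; mask: (N, 1, H, W) with 1 = known, 0 = missing.
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones, stride=self.conv.stride, padding=1)
        out = self.conv(x * mask)
        bias = self.conv.bias.view(1, -1, 1, 1)
        keep = (valid > 0).float()
        # Rescale by window coverage; windows with no valid pixels are zeroed out.
        out = ((out - bias) * (9.0 / valid.clamp(min=1.0)) + bias) * keep
        return self.act(self.norm(out)), keep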
The Refine Encoder-Decoder generator consisted of two 7x7 convolutional blocks and seven residual blocks. Each 7x7 convolutional block comprised a 7x7, stride-2 convolutional kernel, synchronized instance normalization, a 3x3 convolutional kernel, synchronized instance normalization, and ReLU activation. Each residual block contained a 3x3 convolutional kernel with instance normalization, followed by a second 3x3 convolutional kernel with instance normalization, and a residual connection that additively combined the block's input with its output.
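The residual block can be sketched as follows; the channel width and the placement of a ReLU between the two convolutions are assumptions, since only the kernel sizes, normalization, and skip connection are specified above.

import torch.nn as nn


class RefineResidualBlock(nn.Module):
    """Residual block of the Refine Encoder-Decoder: 3x3 conv -> instance norm ->
    3x3 conv -> instance norm, with the block input added back to its output."""

    def __init__(self, channels=256):                # channel width is an assumption
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),                   # activation placement assumed
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        # Additive residual connection combines the block input with its output.
        return x + self.body(x)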
The VGG discriminator consisted of five 4x4 convolutional blocks with instance normalization and a final average-pooling layer; the second, third, and fourth blocks were paired with LeakyReLU activation. The PatchGAN discriminator consisted of four 4x4 convolutional kernels paired with instance normalization.
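A sketch of the PatchGAN discriminator under these constraints is shown below; the channel widths, strides, LeakyReLU activations, and the final one-channel projection are assumptions following the common pix2pix-style PatchGAN layout rather than details stated above.

import torch.nn as nn


class PatchGANDiscriminator(nn.Module):
    """Four 4x4 convolutional blocks with instance normalization, followed by a
    one-channel convolution producing per-patch real/fake logits."""

    def __init__(self, in_ch=1, base=64):            # grayscale radiographs assumed
        super().__init__()
        layers, ch_in, ch = [], in_ch, base
        for _ in range(4):
            layers += [nn.Conv2d(ch_in, ch, 4, stride=2, padding=1),
                       nn.InstanceNorm2d(ch),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch_in, ch = ch, ch * 2
        # Final one-channel projection to per-patch logits (assumed, not stated above).
        layers.append(nn.Conv2d(ch_in, 1, 4, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)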
Training and Hypothesis Testing
The model was trained using five different loss functions. The generator loss and L1 loss enforce spatial congruence between the input and output network signals. The style loss compares features from shallow and deep layers to encourage stylistic consistency of the inpainted region. The VGG losses use deep network features to enforce general realism of the predictions. The PatchGAN discriminator loss helps preserve local texture features in the inpainted region. Figure 2 shows the generator, L1, style, VGG, and discriminator loss functions.
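A hedged sketch of how these terms can be combined is given below; the exact formulations are defined in Figure 2, so the loss weights and the binary cross-entropy form of the adversarial terms here are placeholders.

import torch
import torch.nn.functional as F


def gram_matrix(feat):
    """Gram matrix of a VGG feature map, used by the style loss."""
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def generator_losses(pred, target, feats_pred, feats_target, d_fake,
                     w_l1=1.0, w_vgg=1.0, w_style=1.0, w_adv=1.0):
    """Combine the L1, VGG (perceptual), style, and adversarial generator terms.
    feats_* are lists of shallow and deep VGG feature maps; all weights are
    placeholders, not values reported in the paper."""
    l1 = F.l1_loss(pred, target)
    vgg = sum(F.l1_loss(fp, ft) for fp, ft in zip(feats_pred, feats_target))
    style = sum(F.l1_loss(gram_matrix(fp), gram_matrix(ft))
                for fp, ft in zip(feats_pred, feats_target))
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return w_l1 * l1 + w_vgg * vgg + w_style * style + w_adv * adv


def discriminator_loss(d_real, d_fake):
    """PatchGAN discriminator term: push real patches toward 1, fakes toward 0."""
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
            F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))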
The model was trained on 80,326 images and validated on 12,901 images. An additional 10,687 images, corresponding to 40,077 ground truth CAL measurements, were withheld from the data science team and used only once to avoid multiple hypothesis testing. For CAL prediction, the ground truth radiographic images were provided to blinded data scientists without any accompanying clinical data.
During inferencing, the discriminators were discarded and only the generators were used. The resulting inpainted image from the generators was fed into a CAL prediction algorithm. The two CAL prediction algorithms used in this study were based on two open-source algorithms, Deep Lab (https://github.com/tensorflow/models/tree/master/research/deeplab) and DETR (https://github.com/facebookresearch/detr). We refer to Deep Lab as Method 1 and DETR as Method 2. The accuracy of the inpainted algorithm for Method 1 (Inpaint 1) was compared with that of the non-inpainted algorithm for Method 1 (Non-Inpaint 1), and the accuracy of the inpainted algorithm for Method 2 (Inpaint 2) was compared with that of the non-inpainted algorithm for Method 2 (Non-Inpaint 2).
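At inference time the pipeline therefore reduces to the sketch below; the function and argument names are illustrative placeholders and are not taken from the released Deep Lab or DETR code.

import torch


@torch.no_grad()
def inpaint_then_predict(image, mask, pconv_gen, refine_gen, cal_model):
    """Discard the discriminators, run the two generators, and pass the
    inpainted radiograph to the downstream CAL predictor (Method 1: Deep Lab,
    Method 2: DETR). Names and interfaces are illustrative placeholders."""
    coarse = pconv_gen(image * mask, mask)             # fill in missing anatomy
    composed = image * mask + coarse * (1.0 - mask)    # keep the known pixels
    refined = refine_gen(composed)                     # refine for overall realism
    return cal_model(refined)                          # per-site CAL predictions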
The purpose of this study was to demonstrate the effect of inpainting on various open-source CAL prediction algorithms. We could not replicate all foreseeable CAL prediction algorithms, so only common open-source algorithms were used.
To determine the effect inpainting had on the resulting CAL accuracy, the mean absolute error (MAE) between the ground truth and predicted values was determined, together with comparator p-values from the Kruskal-Wallis test. Additionally, pairwise differences in accuracy between the methods were assessed using Dunn's pairwise comparison, applied as a post hoc test and adjusted for the familywise error rate. Prediction accuracy among tooth types (first molar, second molar, and premolar) was evaluated using the Kruskal-Wallis test. All hypothesis testing assumed a standard significance level of 0.05.
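A sketch of this analysis is given below using scipy and scikit-posthocs; the evaluation-table column names and the Holm adjustment for the familywise error rate are assumptions, as the statistical software and correction method are not specified above.

import pandas as pd
from scipy.stats import kruskal
import scikit_posthocs as sp


def compare_cal_methods(df: pd.DataFrame):
    """Per-site absolute error per method, Kruskal-Wallis omnibus test, and
    Dunn's post hoc pairwise comparisons with a familywise correction.
    Column names ('method', 'ground_truth', 'prediction') and the Holm
    adjustment are assumptions about the evaluation table."""
    df = df.assign(abs_err=(df["prediction"] - df["ground_truth"]).abs())
    mae = df.groupby("method")["abs_err"].mean()                  # MAE per method
    groups = [g.values for _, g in df.groupby("method")["abs_err"]]
    _, p_omnibus = kruskal(*groups)                               # alpha = 0.05
    pairwise = sp.posthoc_dunn(df, val_col="abs_err",
                               group_col="method", p_adjust="holm")
    return mae, p_omnibus, pairwise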