In the present study, we use adult granulosa cell tumor (AGCT) of the ovary and uterine leiomyosarcoma as two application scenarios to test our super-resolution process. AGCT is a low-grade malignant neoplasm with a significant propensity for late recurrence and metastasis9. Histologically, granulosa cell tumors (GCTs) are divided into adult and juvenile types, the former accounting for 95% of all GCTs9-10. Although clinical manifestations such as estrogen stimulation and related hormone levels play an auxiliary role in the diagnosis of this tumor, the final diagnosis still depends on traditional histopathological examination under light microscopy. Microscopically, the tumor cells are arranged in trabecular, island-like, pseudo-adenoid, vesicular or solid lamellar patterns. Because these variable histological forms are often mixed within the same tumor, the diagnosis of AGCT can be difficult. Numerous studies have shown that Call-Exner bodies and coffee-bean-like (longitudinally grooved) nuclei are the characteristic changes of typical AGCT and are important clues for diagnosis10. Thus, to make the diagnosis more accurate, pathologists need to see the details and features of the nuclei in 40× magnification images.
Uterine leiomyosarcoma is the most common uterine sarcoma, with high malignancy and poor prognosis. According to the histopathological diagnostic criteria developed by Bell et al.11, the diagnosis is made when any of the following is present: (1) moderate to diffuse atypia of tumor cells with obvious necrosis; (2) moderate to diffuse atypia of tumor cells without obvious necrosis but with mitotic figures ≥ 10/10 HPF; (3) mild atypia of tumor cells with obvious necrosis and mitotic figures ≥ 10/10 HPF. The histological diagnosis of uterine leiomyosarcoma therefore requires careful assessment of three factors: coagulative necrosis, cellular atypia, and the mitotic index. The mitotic index requires the pathologist to count at least 4 sets of 10 high-power fields in areas of active mitotic activity. Therefore, clear, high-resolution 40× magnification images are a basic prerequisite for pathologists to distinguish mitotic figures from apoptotic cells, degenerated nuclei and nuclear debris.
We selected 45 cases of ovarian adult granulosa cell tumor and 32 cases of uterine leiomyosarcoma diagnosed at the Department of Pathology, West China Second University Hospital, Sichuan University. None of the patients had a history of other cancers or had received radiotherapy before surgery. Specimens were fixed in 4% neutral formaldehyde, conventionally paraffin-embedded, cut into 4 µm sections, HE-stained, and examined under light microscopy. All HE slides were reviewed by two senior pathologists. In total, 100 HE slides from the 45 ovarian AGCT cases and 100 HE slides from the 32 uterine leiomyosarcoma cases were included. All 200 HE slides were digitized into whole-slide images (WSI) with a Hamamatsu NanoZoomer 2.0-HT digital slide scanner, scanned at both 20× and 40× objective magnification.
With the development of AI, deep learning with deep convolutional neural networks (CNNs)12 has been shown to be a powerful approach for advancing biomedical image analysis13-14. A review of recent progress in single image super-resolution (SISR) shows that, over the past two years, deep learning methods have clearly surpassed traditional ones in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Our goal is a fixed 2× upscaling from 20× to 40×, so a single-scale model suffices. Weighing accuracy against computational cost, we finally adopted the Enhanced Deep Super-Resolution network (EDSR), the winning scheme of the New Trends in Image Restoration and Enhancement (NTIRE) 2017 Challenge on Single Image Super-Resolution.
EDSR is a generative network trained on paired samples. The general procedure is as follows: first, a low-resolution (LR) image is obtained by down-sampling a high-resolution (HR) image with interpolation; the LR image serves as the input of a convolutional neural network (CNN) and the original image as its target output. A large number of such paired samples are then used to train the network, establishing an end-to-end mapping between LR images and their corresponding HR images. Finally, this learned mapping is used to create a super-resolution (SR) image from an LR input.
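The paired-sample construction described above can be sketched as follows. This is a minimal illustration assuming PyTorch; the function name `make_lr_hr_pair` and the use of `F.interpolate` for the interpolation step are our own choices, not taken from the EDSR code.

```python
import torch
import torch.nn.functional as F

def make_lr_hr_pair(hr_patch, scale=2):
    """Bilinearly down-sample an HR patch to create its LR counterpart.

    hr_patch: float tensor of shape (N, 3, H, W).
    Returns the (LR input, HR target) training pair.
    """
    lr_patch = F.interpolate(hr_patch, scale_factor=1.0 / scale,
                             mode="bilinear", align_corners=False)
    return lr_patch, hr_patch

# A 512x512 RGB training patch becomes a (256x256, 512x512) pair for 2x SR.
hr = torch.rand(1, 3, 512, 512)
lr, hr = make_lr_hr_pair(hr, scale=2)
```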
The authors of EDSR construct the model baseline from residual blocks whose structure is similar to that of SRResNet15. However, EDSR has no Rectified Linear Unit (ReLU) activation layers outside the residual blocks20. Moreover, the baseline model omits residual scaling layers and uses only 64 feature maps per convolution layer.
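An EDSR-baseline-style residual block can be sketched as below, assuming PyTorch. This reflects the description above (64 feature maps, no residual scaling, no activation after the skip addition); it is an illustrative reconstruction, not the authors' released code.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """EDSR-baseline-style residual block: conv -> ReLU -> conv with a
    skip connection; no batch normalization, no residual scaling, and
    no activation applied after the addition."""

    def __init__(self, n_feats=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Identity skip connection; spatial size is preserved by padding.
        return x + self.body(x)

block = ResBlock(n_feats=64)
y = block(torch.rand(1, 64, 32, 32))
```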
Super-resolution involves an up-sampling operation on the image resolution. The Super-Resolution Convolutional Neural Network (SRCNN)16 applied convolution layers to a pre-upscaled LR image. This is inefficient because all convolutional layers must operate in the high-resolution feature space, requiring far more computation than in the low-resolution space. To accelerate processing without loss of accuracy, the Fast Super-Resolution Convolutional Neural Network (FSRCNN)17 placed a parametric deconvolution layer at the end of the SR network, so that all convolution layers operate in the LR feature space. Another efficient, non-parametric alternative is pixel shuffling18 (a.k.a. sub-pixel convolution), which is also believed to introduce fewer checkerboard artifacts than the deconvolution layer. We likewise used pixel shuffling as the up-sampling operation.
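A minimal sketch of a pixel-shuffling up-sampling tail, assuming PyTorch: a convolution expands the channel count by the square of the scale factor, and `nn.PixelShuffle` rearranges those channels into a larger spatial grid. The layer sizes here are illustrative.

```python
import torch
import torch.nn as nn

scale = 2  # 20x -> 40x corresponds to a fixed 2x upscaling
tail = nn.Sequential(
    # Expand 64 feature maps to 64 * scale^2 = 256 channels ...
    nn.Conv2d(64, 64 * scale ** 2, kernel_size=3, padding=1),
    # ... then rearrange (C*r^2, H, W) -> (C, H*r, W*r) with no parameters.
    nn.PixelShuffle(scale),
)

x = torch.rand(1, 64, 128, 128)  # LR feature map
y = tail(x)                      # spatially 2x larger feature map
```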
We used the adult granulosa cell tumor images and the leiomyosarcoma images as our datasets (Fig. 1a-b). From each training whole-slide image (WSI) we randomly extracted a large number of 512 × 512 patches as the HR images; validation and testing patches were drawn from the validation and testing WSIs, respectively.
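Random patch extraction can be sketched as follows, assuming the WSI region is already loaded as a NumPy array; the function name and the seeded generator are our own illustrative choices.

```python
import numpy as np

def random_patches(wsi, n_patches, size=512, rng=None):
    """Randomly crop size x size HR patches from a WSI array of shape (H, W, 3)."""
    if rng is None:
        rng = np.random.default_rng(0)  # seeded for reproducibility
    h, w = wsi.shape[:2]
    patches = []
    for _ in range(n_patches):
        # Pick a top-left corner so the crop stays inside the image.
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        patches.append(wsi[y:y + size, x:x + size])
    return patches

# Toy example on a small synthetic "WSI" region.
wsi = np.zeros((1024, 1024, 3), dtype=np.uint8)
patches = random_patches(wsi, n_patches=4)
```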
Owing to limited computational resources, we used only the baseline EDSR model; the specific training parameters are listed in Table 1. During training, we use 256 × 256 RGB input patches from the LR image together with the corresponding HR patches: each 512 × 512 RGB patch from the HR image and its bilinearly down-sampled version form a training output-input pair. Following the default settings, we pre-process all images by subtracting the mean RGB value. We train the network with L1 loss rather than L2, since in our experience L1 loss provides better convergence. Output-input pairs from both the granulosa cell tumor and the leiomyosarcoma data were used. After several training sessions, the final training loss curve is shown in Fig. 1c and the PSNR curve on the validation dataset in Fig. 1d. The final training run was initialized with the best model obtained previously as a pre-trained model.
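One training step with mean-RGB subtraction and L1 loss can be sketched as below, assuming PyTorch. The mean RGB values and the tiny stand-in model (upsample + conv, used only so the snippet runs end to end) are placeholders, not the paper's actual statistics or the EDSR network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder dataset-wide mean RGB (hypothetical values); subtracted from
# both inputs and targets, matching the pre-processing described above.
MEAN_RGB = torch.tensor([0.78, 0.62, 0.70]).view(1, 3, 1, 1)

def train_step(model, optimizer, lr_patch, hr_patch):
    """One optimization step with L1 loss on mean-subtracted patches."""
    sr = model(lr_patch - MEAN_RGB)
    loss = F.l1_loss(sr, hr_patch - MEAN_RGB)  # L1 rather than L2 loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical stand-in for the SR network, just to demonstrate the step.
model = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    nn.Conv2d(3, 3, kernel_size=3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = train_step(model, opt,
                  torch.rand(1, 3, 64, 64),    # LR patch
                  torch.rand(1, 3, 128, 128))  # HR patch
```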