a. Datasets
In this study, two types of datasets, from animal and human subjects, were acquired using two separate custom-designed OCT systems, one for each subject type.
i. Animal Dataset
For animal data acquisition, OCT and OCTA data were collected from mouse eyes. Briefly, the custom mouse SD-OCT system used a near-infrared (NIR) superluminescent diode (SLD; λ = 810 nm; Δλ = 100 nm; Superlum, Carrigtwohill, County Cork, Ireland) as the light source. A line CCD camera with 2048 pixels (AViiVA EM4; e2v Technologies, Chelmsford, United Kingdom) recorded the OCT spectra in the custom-built OCT spectrometer. The frame rate of the camera was set to 50 kHz. The axial and lateral resolutions were theoretically estimated as 2.9 and 11 µm, respectively. The illumination power at the mouse cornea was ~1 mW.
For image acquisition in wild-type mice (strain: C57BL/6J), anesthesia was induced by intraperitoneal injection of a mixture of ketamine (100 mg/kg body weight) and xylazine (5 mg/kg body weight), and a drop of 1% tropicamide ophthalmic solution (Akorn, Lake Forest, Illinois) was applied to the eye to be imaged. Next, a cover glass (12-545-80; Microscope Cover Glass, Fisherbrand, Waltham, Massachusetts) with a drop of eye gel (Severe; GenTeal, Novartis, Basel, Switzerland) was placed on the eye. After the mouse was completely anesthetized, its head was fixed with a bite bar and ear bar in an animal holder that provided five degrees of freedom (i.e., x, y, z, pitch, and roll). Volumetric raster scans were acquired from the various retinal quadrants, e.g., dorsal and ventral. Four repeated B-scans were collected at each slow-scan position for OCTA construction; thus, each OCT volume consisted of 4 × 600 × 600 A-scans and covered a FOV of 1.2 × 1.2 mm². All animal experiments were approved by the local animal care and biosafety office and performed following the protocols approved by the Animal Care Committee (ACC) at the University of Illinois at Chicago (ACC Number: 19–044). This study followed the Association for Research in Vision and Ophthalmology Statement for the Use of Animals in Ophthalmic and Vision Research. For this dataset, we collected a total of 24 OCT volumes from 6 mice (one eye per mouse and four retinal quadrants per eye, i.e., dorsal, nasal, ventral, and temporal). Volumes with inhomogeneous brightness or other imaging artifacts were excluded. The final dataset therefore comprised 16 volumes: 9 for training, 1 for validation, and 6 for testing.
ii. Human Dataset
Briefly, the custom human SD-OCT system used the same NIR SLD (λ = 810 nm; Δλ = 100 nm; Superlum, Carrigtwohill, County Cork, Ireland). A line CCD camera with 2048 pixels (AViiVA EM4; e2v Technologies, Chelmsford, United Kingdom) recorded the OCT spectra in the custom-built OCT spectrometer. The frame rate of the camera was set to 70 kHz. The axial and lateral resolutions were theoretically estimated as 1 and 10 µm, respectively. The illumination power at the human cornea was ~600 µW. For image acquisition in human subjects, no anesthetic agent was used. A custom chin rest was employed to reduce head movements, a fixation target with a dim red light was used to minimize voluntary eye movements, and a pupil camera helped the photographer localize the retina. Volumetric raster scans were acquired from the macula. Four repeated B-scans were collected at each slow-scan position for OCTA construction; thus, each OCT volume consisted of 4 × 300 × 300 A-scans and covered a FOV of 3 × 3 mm². All human experiments were approved by the Institutional Review Board of the University of Illinois at Chicago and were conducted in accordance with the ethical standards stated in the Declaration of Helsinki. For this dataset, we imaged a total of 16 eyes from 8 healthy subjects. Datasets with inhomogeneous brightness or severe motion artifacts were excluded, leaving 10 eyes in the dataset. Since each eye contains 4 OCT volumes, all of the repeated volumes were used for training as a form of data augmentation. Therefore, 16 volumes were used for training, 1 volume for validation, and 5 volumes for testing.
iii. OCTA construction
OCT scan pre-processing starts with registration of the OCT volume. Frame registration was performed with the discrete Fourier transform (DFT) registration method [28]. Since the OCT volume contains multiple repeated scans, the first step is intra-frame registration, in which each repeated scan is registered to the first scan. This process is repeated for all scans. Next, inter-frame registration is performed to register the scans within the volume to one another. After OCT volume pre-processing, OCTA images were constructed using intensity-based speckle variance (SV) processing for both mouse and human OCT volumes with the following equation [29]:
$$S{V}_{jk}=\frac{1}{N}\sum_{i=1}^{N}{\left[{I}_{ijk}\left(x,z\right)-\frac{1}{N}\sum_{i=1}^{N}{I}_{ijk}\left(x,z\right)\right]}^{2}=\frac{1}{N}\sum_{i=1}^{N}{\left[{I}_{ijk}-{\left({I}_{mean}\right)}_{jk}\right]}^{2}\qquad (13)$$
where \(i\), \(j\), and \(k\) are the indices of the frame, lateral position, and depth pixel of the OCT B-scan, respectively; \(N\) is the number of frames used in the calculation; and \({\left({I}_{mean}\right)}_{jk}\) is the average of the \(N\) frames at pixel \((j, k)\).
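As an illustration only, the registration and speckle variance steps can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the implementation used in this study: all function names are hypothetical, each B-scan is assumed to be a 2D intensity array, and the shift estimate is restricted to whole-pixel circular shifts found at the peak of the DFT-based cross-correlation (the method in [28] may differ, for example in how finely the shift is resolved).

```python
import numpy as np

def estimate_shift(reference, moving):
    """Integer (row, col) shift to apply to `moving` so it aligns with `reference`."""
    # The inverse DFT of the cross-power spectrum is the circular cross-correlation.
    cross_power = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    correlation = np.abs(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peak positions past the midpoint are negative shifts (DFT wrap-around).
    shift = [p if p <= s // 2 else p - s for p, s in zip(peak, correlation.shape)]
    return tuple(int(v) for v in shift)

def register_to_first(frames):
    """Register every repeated B-scan to the first one (assumption: circular shifts)."""
    reference = np.asarray(frames[0], dtype=np.float64)
    registered = [reference]
    for frame in frames[1:]:
        frame = np.asarray(frame, dtype=np.float64)
        d_row, d_col = estimate_shift(reference, frame)
        registered.append(np.roll(frame, shift=(d_row, d_col), axis=(0, 1)))
    return np.stack(registered)

def speckle_variance(frames):
    """Eq. 13: per-pixel variance over the frame axis; input shape (N, rows, cols)."""
    frames = np.asarray(frames, dtype=np.float64)
    mean_frame = frames.mean(axis=0)                   # (I_mean)_jk
    return ((frames - mean_frame) ** 2).mean(axis=0)   # (1/N) * sum_i [...]^2
```

With `frames` holding the N repeated B-scans at one slow-scan position, `speckle_variance(register_to_first(frames))` would give one OCTA B-scan in this sketch.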
iv. Enface Processing
For enface images, the superficial vascular plexus (SVP) is bounded by the NFL-GCL and IPL-INL boundaries, and the deep vascular plexus (DVP) is bounded by the IPL-INL and OPL-ONL boundaries. For the mouse dataset, retinal flattening was performed by realigning each A-line. Since the mouse retina was smooth within each retinal quadrant, e.g., dorsal, the boundaries of the SVP and DVP were manually segmented. For the human dataset, retinal layer segmentation was performed using the Iowa Reference Algorithms (Retinal Image Analysis Lab, Iowa Institute for Biomedical Imaging, Iowa City, IA) to determine the boundaries of the SVP and DVP [30–32]. The enface images were generated using average intensity projection. Image processing was performed in MATLAB R2021b (MathWorks, Natick, Massachusetts) with image processing packages in ImageJ [33].
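The average intensity projection between two layer boundaries can be sketched as follows. This is an illustrative Python/NumPy sketch rather than the MATLAB code used in the study. It assumes the boundaries are already available as integer depth indices for every A-scan, and the function and argument names are hypothetical.

```python
import numpy as np

def enface_projection(volume, top, bottom):
    """Average intensity projection of a slab.

    volume: array of shape (slow, fast, depth).
    top, bottom: integer depth indices of shape (slow, fast); the slab at each
    A-scan covers top <= z < bottom (e.g., NFL-GCL to IPL-INL for the SVP).
    """
    volume = np.asarray(volume, dtype=np.float64)
    top = np.asarray(top)
    bottom = np.asarray(bottom)
    depth = np.arange(volume.shape[2])
    # Boolean mask selecting the depth pixels inside the slab for each A-scan.
    mask = (depth >= top[..., None]) & (depth < bottom[..., None])
    counts = np.maximum(mask.sum(axis=2), 1)  # guard against empty slabs
    return (volume * mask).sum(axis=2) / counts
```

Calling the function twice, once with the SVP boundaries and once with the DVP boundaries, would give the two enface images.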
b. Results
i. TVC and SVC ablation study
In this study, we perform an ablation study to evaluate the effect of the number of B-scans used for OCTA construction with temporal vascular connectivity (TVC), i.e., repeated B-scans, and with SVC, i.e., adjacent neighboring B-scans. Both TVC and SVC use Eq. 13; however, the number of frames \(N\) for TVC is the number of repeated B-scans, whereas for SVC it is the number of adjacent neighbors.
This procedure helps determine the optimal number of neighboring B-scans to use as input to SVC-Net. Since each volume contains four repeated B-scans, for qualitative comparison we show OCTA images generated by SV with 2N, 3N, and 4N using TVC and SVC in Fig. 3 and Fig. 4 for the animal and human datasets, respectively. For TVC, the noise level decreases as the number of repeated B-scans increases (Fig. 3A-C and Fig. 4A-C). SVC shows a different trend: SVC-3N has improved vascular detail compared with SVC-2N, but SVC-4N has less vascular detail than SVC-3N because of increased blur, which is visible in the representative enface image for the human dataset (Fig. 4F). This type of artifact is commonly described in OCTA as ‘vessel doubling’ and is attributed to poor registration. In the case of SVC-4N, since the vessels are not completely duplicated, we refer to this artifact as pseudo vessel doubling.
For the quantitative comparison, since TVC-4N has the best qualitative performance, it serves as the ground truth. We compute the MS-SSIM and PSNR for the TVC 2N and 3N and the SVC 2N, 3N, and 4N enface images. For the animal dataset, the results confirm our qualitative observation that increasing the number of B-scans improves performance, as TVC-3N has the best overall MS-SSIM and PSNR (Fig. 5A-B). For SVC on the animal dataset, the MS-SSIM decreases from SVC-2N to SVC-4N, whereas the PSNR is highest for SVC-3N. For the human dataset, we observe trends similar to those in the mouse dataset, with TVC performing better on both metrics when more B-scans are used (Fig. 5C-D). For SVC, both the MS-SSIM and PSNR are best for SVC-3N, which outperforms SVC-2N and SVC-4N. Therefore, based on the qualitative observations and the quantitative analysis, SVC-Net uses a 3N input, i.e., three adjacent neighboring B-scans.
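The difference between the two frame selections can be made concrete with a short Python/NumPy sketch. It is illustrative only and rests on assumptions that are not stated in the text: the volume is stored with shape (slow position, repeat, lateral, depth), SVC draws its neighbors from a single repeat, and the SVC window covers N consecutive slow-scan positions without edge padding. The function names are hypothetical.

```python
import numpy as np

def sv(frames):
    """Eq. 13 as a population variance (1/N normalization) over the frame axis."""
    return np.asarray(frames, dtype=np.float64).var(axis=0)

def tvc_octa(volume, n):
    """TVC: N = first n repeated B-scans at the same slow-scan position."""
    return np.stack([sv(volume[y, :n]) for y in range(volume.shape[0])])

def svc_octa(volume, n, repeat=0):
    """SVC: N = n adjacent B-scans (one repeat each) at neighboring positions."""
    single = volume[:, repeat]                 # shape (slow, lateral, depth)
    n_windows = single.shape[0] - n + 1        # assumption: no edge padding
    return np.stack([sv(single[y:y + n]) for y in range(n_windows)])
```

For a volume with four repeats, `tvc_octa(volume, 4)` would correspond to TVC-4N and `svc_octa(volume, 3)` to SVC-3N under these assumptions.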
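For reference, the PSNR between a ground-truth enface image and a test image can be sketched as below. This is a minimal Python/NumPy sketch that assumes 8-bit images with a peak value of 255. The function name is hypothetical, and MS-SSIM is omitted because it requires a multi-scale windowed implementation.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; `max_value` is the assumed peak intensity."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

A higher PSNR means the test image deviates less, in the mean-squared sense, from the image taken as ground truth.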
ii. Comparison on Animal Dataset
On the animal dataset, we compared the effects of SVC and of the loss function on OCTA construction using SVC-Net. To assess the effect of SVC, the 1N and 3N models trained with the SSIM loss were compared. The first qualitative comparison concerns the initial output of SVC-Net, the OCTA B-scan. The qualitative differences between the B-scans of the 1N and 3N models are illustrated in Fig. 6. For the large vessels, both the 1N and 3N models generate the vessel signal with minimal loss in structure, as denoted by the red arrows in Fig. 6C-E. However, in the 1N model, the smaller capillary-level vessels were not predicted with intensity values similar to the ground truth (Fig. 6F2). In contrast, the 3N model predicted the smaller capillaries in greater detail (Fig. 6F3).
The primary use of OCTA is to observe enface projections of the retinal vascular layers; therefore, we perform both qualitative and quantitative analyses on the enface projections of the SVP and DVP. Examples of the retinal layer segmentation and flattening used to generate the enface projections are illustrated in Fig. 6A-B. To assess the effect of SVC, we qualitatively compare the 1N and 3N models with the ground truth enface images of the SVP and DVP in Fig. 7. Consistent with the B-scans, the large vessels of the SVP were constructed by both the 1N and 3N models. However, a large vessel that continued into a smaller vessel was poorly constructed by the 1N model, as outlined by the green box in Fig. 7A2, whereas the 3N model constructed the same vessel properly, as outlined by the yellow box in Fig. 7A3. This shows that SVC helped to preserve the details of smaller vessels. For the DVP, the 1N model predicted some capillary structures; however, because of its poor contrast, the result appears noisier than that of the 3N model, which produced capillaries in finer detail.
In this study, we also compare the effects of the loss functions, i.e., MSE and SSIM, on OCTA construction with the 1N and 3N models. Example output enface images from a mouse eye for each model are illustrated in Fig. 8. Comparing the 1N models trained with MSE and SSIM, we observe a discernible increase in noise when the model is trained with the MSE loss function (Fig. 8A2). By comparison, both 3N models, trained with MSE and with SSIM, have lower levels of noise, and the 3N model trained with SSIM has the best overall contrast (Fig. 8A5, B5). In the DVP, the 1N model trained with SSIM (Fig. 8B3) has better capillary-level structure than the 1N model trained with MSE (Fig. 8B2), and employing SVC improves the contrast of the capillaries (Fig. 8B4-B5).
The evaluation metrics, MS-SSIM and PSNR, were computed on both the SVP and DVP enface images to quantitatively compare the four models, as summarized in Fig. 9. For the MS-SSIM on the mouse dataset, the 1N model trained with MSE had the lowest performance, followed by the 1N model trained with the SSIM loss function, which showed a slight improvement (Fig. 9A). Introducing SVC significantly improved the similarity between the ground truth and predicted enface images: the 3N model trained with MSE had significantly better results than both 1N models. With the SSIM loss function, the 3N model had the best performance.
For the PSNR, the SVP showed only slight differences between the loss functions for the same input type (single or neighboring inputs) (Fig. 9B). This may be due to the large vessels in the SVP, which the CNN predicted consistently regardless of the loss function and which had the maximum pixel intensity, e.g., 255. In the DVP, however, we observe distinguishable PSNR improvements from both the loss function and the input type. This may be due to the abundance of smaller capillary-level structures, where any increase in noise in the predicted image is reflected in the PSNR value. The use of SVC and the SSIM loss function reduced the noise level and improved the PSNR. For the mouse dataset, the 3N model trained with the SSIM loss function had the best overall performance.
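To make the distinction between the two loss functions concrete, the following Python/NumPy sketch computes an MSE loss and a simplified SSIM loss for a pair of images. It is an assumption-laden illustration, not the training code of SVC-Net: the SSIM here is computed from whole-image statistics rather than local windows, the stabilizing constants follow the commonly used values (0.01 and 0.03 times the data range, squared), images are assumed to be scaled to [0, 1], and the function names are hypothetical.

```python
import numpy as np

def mse_loss(target, prediction):
    """Mean squared error: penalizes per-pixel intensity differences."""
    target = np.asarray(target, dtype=np.float64)
    prediction = np.asarray(prediction, dtype=np.float64)
    return float(np.mean((target - prediction) ** 2))

def global_ssim(target, prediction, data_range=1.0):
    """Single-window SSIM computed from whole-image mean, variance, and covariance."""
    target = np.asarray(target, dtype=np.float64)
    prediction = np.asarray(prediction, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_t, mu_p = target.mean(), prediction.mean()
    var_t, var_p = target.var(), prediction.var()
    cov = ((target - mu_t) * (prediction - mu_p)).mean()
    numerator = (2 * mu_t * mu_p + c1) * (2 * cov + c2)
    denominator = (mu_t ** 2 + mu_p ** 2 + c1) * (var_t + var_p + c2)
    return float(numerator / denominator)

def ssim_loss(target, prediction, data_range=1.0):
    """SSIM loss: zero for identical images, larger as structure diverges."""
    return 1.0 - global_ssim(target, prediction, data_range)
```

Unlike the MSE, the SSIM term combines luminance, contrast, and structure, so it responds to changes in local contrast and structure as well as to raw intensity error.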
iii. Comparison on Human Dataset
On the human dataset, we also compared the effects of SVC and of the loss function on OCTA construction using SVC-Net. For qualitative comparison of the initial output of SVC-Net, the B-scans, we compare the 1N and 3N models trained with the SSIM loss (Fig. 10). Both the 1N and 3N models can predict the large and small vessels. However, the 1N model has some areas with poor predictions, as denoted by the yellow arrows in Fig. 10E1-E3, where it is missing the bright vessel signal that is present in the 3N output and the ground truth. Compared with the 1N model, the 3N model predicts more of the vascular structure present in the ground truth (Fig. 10E3).
For qualitative and quantitative comparison, we segment the retina into the SVP and DVP. An example of the retinal layer segmentation is illustrated in Fig. 10A. To demonstrate the effect of SVC, we compare the enface SVP and DVP projections of the 1N and 3N models in Fig. 11. For the human dataset, both the 1N and 3N models can predict the large and small vessel structures in the SVP. However, the enlarged images of the SVP in Fig. 11 show that the 1N model has poor contrast for the smaller vessels. In the DVP, the contrast of the 3N model is better than that of the 1N model, as demonstrated in Fig. 11B2-B3. This agrees with the observation in the B-scan outputs that the 1N model has some mispredictions, and it suggests that SVC improves the model’s ability to predict the finer structural details of capillary-level vessels.
Next, we compare the effects of the loss functions on model performance in a human eye, as illustrated in Fig. 12. For the 1N models, the model trained with the MSE loss function has higher noise and poorer contrast than the model trained with the SSIM loss function. Changing the loss function improved the 1N model’s ability to produce capillary-level structures with higher contrast (Fig. 12B2-B3). Similar observations hold for the 3N models. In the SVP, the model trained with the MSE loss function predicts some of the capillary-level structures but with relatively more noise (Fig. 12A4), whereas the model trained with the SSIM loss function has a lower noise level (Fig. 12A5). In the DVP, the capillary-level structures of the MSE-trained model appear more dilated, which could be due to the lower contrast between the fine vessels (Fig. 12B4). The SSIM-trained model produces vessel structures with higher contrast (Fig. 12B5).
We measured the MS-SSIM and PSNR for the 1N and 3N models trained with the different loss functions, for both vascular layers, SVP and DVP (Fig. 13). For the MS-SSIM in both the SVP and DVP, there is a discernible trend of increasing performance, with the 1N model trained with MSE performing worst and the 3N model trained with SSIM performing best. The PSNR shows a similar increasing trend. However, for the 3N models, there were no discernible differences between the loss functions in the SVP. This may be because the large vessels in the SVP already saturate a substantial portion of the pixels, resulting in similar PSNR values. In the DVP, where the vessel structures are finer, we observe noticeable PSNR improvements. In the human dataset, the 3N model trained with SSIM had the best performance.