SVC-Net: A spatially vascular connectivity network for deep learning construction of microcapillary angiography from single-scan-volumetric OCT

doi:10.21203/rs.3.rs-2387074/v1


This work is licensed under a CC BY 4.0 License

Journal Publication

published 09 Feb, 2024

Read the published version in Communications Engineering →

Version 1


As one modality extension of optical coherence tomography (OCT), OCT angiography (OCTA) provides unparalleled capability for depth-resolved visualization of retinal vasculature at microcapillary-level resolution. For OCTA image construction, repeated OCT scans from one location are required for temporally vascular connectivity (TVC), i.e., OCT signal variance among sequential images, to identify blood vessels with active blood flow. The requirement for multi-scan-volumetric OCT reduces OCTA imaging speed, which makes the acquisition more susceptible to eye movements and limits the image field-of-view. In principle, the blood flow should also affect the spatially vascular connectivity (SVC), i.e., the reflectance brightness profile along the vessel direction, in a single-scan-volumetric OCT. In other words, the SVC in single-scan-volumetric OCT might be equivalent to the TVC in multi-scan-volumetric OCT for high-fidelity OCTA construction. In this article, we report an SVC network (SVC-Net) for deep learning OCTA construction from single-scan-volumetric OCT. The effect of SVC for deep learning OCTA was evaluated by SVC-based speckle variance calculation, revealing that three adjacent B-scans give the best performance. We further compared the effects of feeding three adjacent B-scans and a single B-scan into SVC-Net. The structural-similarity index measure (SSIM) loss function was selected to optimize deep learning contrast enhancement of microstructures, such as microcapillaries, in OCT. This was verified by comparative analysis of the SVC-Net performance with SSIM and mean-squared-error (MSE) loss functions. The combination of SVC involvement and the SSIM loss function enabled microcapillary-resolution OCTA construction from single-scan-volumetric OCT. The performance of the SVC-Net was verified with OCT datasets from both the superficial and deep vascular plexus in mouse and human eyes.

Health sciences/Medical research/Biomarkers/Diagnostic markers

Health sciences/Health care/Medical imaging/Three-dimensional imaging

Health sciences/Health care/Medical imaging/Tomography/Computed tomography

optical coherence tomography

optical coherence tomography angiography

deep learning

spatial connectivity

vascular connectivity

Optical coherence tomography (OCT) enables non-invasive visualization of individual retinal layers with micrometer-level resolution. As one modality extension of OCT, OCT angiography (OCTA) provides unparalleled capability for depth-resolved visualization of retinal vasculature at microcapillary-level resolution. OCTA is label-free and thus completely non-invasive, in contrast to traditional fluorescein angiography. In principle, OCTA can be obtained from existing OCT systems with the addition of unique scan protocols and data processing algorithms [1]. However, the fundamental similarity between all OCTA instruments is that repeated OCT scans from one location are required for correlation analysis of sequential images to identify regions with active blood flow. The requirement for multi-scan-volumetric OCT reduces OCTA imaging speed, which makes the acquisition more susceptible to eye movements and limits the image field-of-view. The prolonged image acquisition time may also increase the potential effect of motion artifacts, such as blinking and microsaccades [2].

In recent years, deep learning, a subset of machine learning and artificial intelligence (AI), has been making strides in ophthalmic research. The principle behind deep learning is that the algorithm learns directly from the training data and can objectively perform the required task. An example application is deep learning for AI screening of retinopathies. Current screening procedures require clinicians to manually examine retinal photographs. This can lead to inter- and intra-rater variability; the same clinician could classify the same image differently on different days. Furthermore, manually screening retinal photographs is a time-consuming process. The deployment of AI algorithms could alleviate these problems. Recent studies in deep learning OCTA have primarily focused on the classification of eye diseases such as diabetic retinopathy (DR) [3–5], age-related macular degeneration (AMD) [6–8], and glaucoma [9–11]. Other applications include improving the image quality of OCTA [12, 13] and artery-vein segmentation [14–16]. Recently, deep learning has also been explored for OCTA construction [17–20]. While deep learning algorithms can readily detect large blood vessel branches in OCT, it is technically challenging to identify microcapillaries reliably.

In traditional OCTA, repeated OCT scans from one location are required for temporally vascular connectivity (TVC) processing to map retinal vasculature at microcapillary resolution. In principle, the blood flow should affect the reflectance profile, i.e., the brightness connectivity along the vessel direction, in a single-scan-volumetric OCT. The purpose of this study is to demonstrate deep learning-based OCTA construction using spatial vascular connectivity in single-scan-volumetric OCT. We present the Spatial-Vascular-Connectivity Network (SVC-Net), a fully convolutional network (FCN) with an encoder-decoder architecture. The deep learning pipeline and our proposed methodology using OCT SVC are illustrated in Fig. 1. The study verifies the feasibility of a deep learning approach using a dataset composed of single-scan-volumetric OCTs from animal and human eyes. This study also highlights the importance of loss functions for improving deep learning performance. This approach can improve clinical implementation of OCTA by reducing acquisition time, i.e., alleviating the requirement for multiple repetitions; the time saved can alternatively be used to improve transverse image resolution or to increase the field-of-view.

In this section, we provide an overview of the relevant literature on OCTA construction algorithms, and recent deep learning applications of representation learning for OCTA images.

a. OCTA Construction Algorithms

The principle behind OCTA is the use of the variation in the OCT signal caused by moving scatterers in the vessels, e.g., red blood cells, as a source of contrast to image blood flow. When imaging at the same location over time, the OCT signal backscattered from static structural tissue remains constant, whereas the OCT signal backscattered from the blood vessels changes. OCTA algorithms are therefore based on the components of the OCT signal, i.e., the phase and amplitude. The reported OCTA construction algorithms can be classified into three categories, namely, phase-, intensity-, and complex-signal-based algorithms [1, 21]. Phase-based algorithms, such as phase variance OCT (pvOCT) [22], utilize the phase variance between sequential B-scans. The advantage of phase-based algorithms is that they have higher sensitivity for detecting capillary flow. However, phase noise arising from bulk tissue motion or the OCT laser source is prevalent. On the other hand, intensity-based algorithms leverage the amplitude or intensity of the signal. An example is speckle variance OCT (svOCT), which uses the variance of the speckle signal to detect blood flow. An advantage of svOCT is that it is unaffected by phase noise. However, it is still sensitive to sample motion, e.g., breathing, heartbeat, and involuntary bulk tissue motion. Another algorithm is split-spectrum amplitude-decorrelation angiography (SSADA) [23], which can overcome this bulk tissue motion noise by splitting the signal into sub-interferograms with smaller bandwidths; however, this method lowers the axial resolution. Complex-signal-based OCTA algorithms combine both phase and intensity information to enable flow imaging. An example is optical microangiography (OMAG) [24]. The advantage of OMAG is that it uses the entire spectrum without loss of axial resolution. However, OMAG requires more than two repetitive scans to generate the angiogram. While not an inherent disadvantage, repetitive scans increase the image acquisition time, which can increase noise, e.g., from eye movement. OMAG also involves phase and is thus sensitive to phase noise [25].

b. OCTA Deep Learning

Recent studies that have explored deep learning for OCTA construction have leveraged repeated scans to improve the signal-to-noise ratio (SNR). Liu et al. evaluated the performance of a deep learning algorithm trained on 4 consecutive B-scans against the SSPAGA algorithm, which requires 48 consecutive B-scans [17]. In a follow-up study by the same group, Jiang et al. demonstrated that, in addition to consecutive scans, the inclusion of SVC can improve the SNR [18]. In that study, they utilized three adjacent locations with four consecutive scans at each location to further improve the SNR, suggesting that SVC may be useful for OCTA construction.

Few studies have explored the use of single-scan-volumetric OCT for OCTA construction. Lee et al. employed deep learning for flow map construction using modified U-shaped convolutional neural networks (CNNs). In their deep learning pipeline, a single OCT cross-sectional image (B-scan) was the input and the corresponding OCTA B-scan was the ground truth [19]. The focus of their study was to demonstrate that deep learning-based OCTA can reduce the acquisition time in order to image a larger field-of-view (FOV) in human eyes. They succeeded in visualizing the larger blood vessels; however, for smaller vessels, i.e., the capillaries, the contrast was poor. Li et al. explored the use of generative adversarial networks (GANs) to generate OCTA [20]. In their study, the model was optimized for an input OCT volume of three adjacent OCT B-scans and an output of three OCTA images. Their study demonstrated that SVC may help the model infer the vascular structures using GANs compared to CNNs. While they did report good quantitative results, the performance was validated only on animal models, i.e., rat retina, and the qualitative results did not reveal capillary-level vessels.

We present a novel framework, SVC-Net, for the construction of OCTA for capillary-level visualization using a single-scan-volumetric OCT. In principle, the blood flow should affect the reflectance brightness profile along the vessel direction. In other words, the spatial intensity variance along a vessel in a single-scan-volumetric OCT can be equivalent to the temporal intensity variance at the same vessel location in the sequential images of the multi-scan-volumetric OCT used for conventional OCTA construction. As illustrated in Fig. 1, the input into SVC-Net comprises neighboring OCT B-scans, and the output of SVC-Net is a single OCTA B-scan. To determine the optimal number of adjacent neighboring OCT B-scans as input into SVC-Net, we perform an ablation study using SV calculation with varying numbers of adjacent neighbors, i.e., two, three and four neighbors, referred to as 2N, 3N and 4N, respectively. For a comparative study, we test the performance of the SVC-Net with the optimal number of adjacent B-scans against a single OCT B-scan input, referred to as 1N, to verify the effect of SVC on deep learning OCTA construction. We also test the effect of different loss functions on the optimization and performance of the model. Our hypothesis is that the unique combination of SVC and the SSIM loss function will provide robust OCTA construction, enabling deep learning visualization of microcapillaries in single-scan-volumetric OCT.
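The neighboring-B-scan input described above can be sketched as a sliding window along the slow-scan axis. This is a minimal illustration, not the authors' code: the function name `stack_adjacent_bscans`, the `(num_bscans, depth, width)` array layout, and the edge handling (repeating the first and last B-scan) are all assumptions.

```python
import numpy as np

def stack_adjacent_bscans(volume, n_neighbors=3):
    """Group each B-scan with its adjacent neighbors along the slow-scan axis.

    volume: single-scan OCT volume, shape (num_bscans, depth, width).
    Returns an array of shape (num_bscans, depth, width, n_neighbors),
    i.e. one (m x n x k) network input per slow-scan position.
    """
    half = n_neighbors // 2
    # Assumption: repeat the edge B-scans so every position has a full window.
    padded = np.pad(volume, ((half, half), (0, 0), (0, 0)), mode="edge")
    windows = [padded[i:i + n_neighbors] for i in range(volume.shape[0])]
    # (num_bscans, k, depth, width) -> (num_bscans, depth, width, k)
    return np.stack(windows).transpose(0, 2, 3, 1)
```

With `n_neighbors=3`, channel 1 of each input is the B-scan at the target position and channels 0 and 2 are its two neighbors; `n_neighbors=1` reproduces the 1N case.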

a. Deep Learning Architecture

Figure 2 shows a schematic illustration of the SVC-Net architecture. The input shape of the SVC-Net is $(m\times n\times k)$, where $m$, $n$, and $k$ are the row, column, and depth dimensions of the input image. An example of SVC-Net for 3N, i.e., three adjacent B-scans, is shown in Fig. 2A. SVC-Net follows an encoder-decoder architecture; the encoder is based on the EfficientNetB0 design [26] and performs the feature extraction from the OCT input images. The encoder network utilizes depth-wise convolutions, which result in fewer parameters and thus reduce the tendency of the CNN to over-fit. The decoder network was custom designed to reconstruct the output OCTA image; it employs upsampling and 2D convolution operations. SVC-Net is composed of different blocks, a block being defined as a series of operation modules acting on a particular image resolution. In SVC-Net, the blocks are composed of 6 different operation modules. The operations in each module are illustrated in Fig. 2B. SVC-Net employs skip connections between the encoder and decoder, which can help to improve the performance of the decoder by passing image features at different image resolutions from the encoder. Studies have demonstrated that the use of skip connections can help avoid the vanishing gradient problem that is prevalent in deep learning training. The final layer of the decoder utilizes a linear activation function, as the output is a grayscale image; the model therefore performs a pixel-wise regression task.
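To make the parameter saving from depth-wise convolutions concrete, the following sketch counts the weights of a standard convolution and of a depth-wise convolution followed by a 1×1 point-wise convolution. The layer sizes are hypothetical and are not taken from SVC-Net; biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    # Every output channel has its own kernel x kernel filter per input channel.
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)         # 18432 weights
separable = depthwise_separable_params(3, 32, 64)  # 2336 weights
```

For this hypothetical layer the separable form needs roughly one eighth of the weights of the standard convolution.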

b. Loss Functions

The loss layer of a neural network compares the output of the network with the ground truth. In this paper, we evaluate the effect of two loss functions, mean-squared error (MSE) and structural similarity index measure (SSIM), on the performance of the model for OCTA construction. Therefore, in this section we define the MSE and SSIM loss functions.

i. Mean-Squared Error Loss

The most commonly used loss function for regression tasks is the MSE, which measures the mean of the squared pixel-wise differences between the ground truth image and the predicted output image of the network. An advantage of the MSE is that, viewed as a function of the predicted pixel values, it is convex, with a single global minimum and no local minima. The MSE is easy to compute; however, it is sensitive to outliers. The formulation of the MSE is as follows:

$$L_{MSE}=MSE=\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(X_{ij}-Y_{ij}\right)^{2} \tag{1}$$

where $({X}_{ij}-{Y}_{ij})$ denotes the pixel-wise error between the predicted image, $X$, and the ground truth, $Y$, and $M$ and $N$ denote the number of rows and columns in the images [27].
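As a minimal NumPy sketch of Eq. (1), written for illustration rather than as the training implementation (the function name `mse_loss` is an assumption):

```python
import numpy as np

def mse_loss(pred, target):
    """Mean of the squared pixel-wise differences between two images (Eq. 1)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return np.mean((pred - target) ** 2)

# A single pixel that is off by 3 dominates three pixels that are off by 1,
# which illustrates the sensitivity to outliers.
example = mse_loss([[0, 0], [0, 0]], [[1, 1], [1, 3]])  # (1 + 1 + 1 + 9) / 4
```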

ii. Structural Similarity Loss

The SSIM is a perceptually motivated function, inspired by the human visual system (HVS), that compares three distinct aspects of an image: luminance, contrast, and structure.

For two image patches, $x$ and $y$, being compared, the luminance parameter can be determined by:

$$l(x,y)=\frac{2\mu_{x}\mu_{y}+C_{1}}{\mu_{x}^{2}+\mu_{y}^{2}+C_{1}} \tag{2}$$

The contrast parameter can be determined by:

$$c(x,y)=\frac{2\sigma_{x}\sigma_{y}+C_{2}}{\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}} \tag{3}$$

The structural parameter is defined by:

$$s(x,y)=\frac{\sigma_{xy}+C_{3}}{\sigma_{x}\sigma_{y}+C_{3}} \tag{4}$$

where ${\mu }_{x}$ and ${\mu }_{y}$ represent the means, ${\sigma }_{x}$ and ${\sigma }_{y}$ represent the standard deviations (so that ${\sigma }_{x}^{2}$ and ${\sigma }_{y}^{2}$ are the variances), and ${\sigma }_{xy}$ represents the covariance of $x$ and $y$. ${C}_{1}$, ${C}_{2}$, and ${C}_{3}$ are small regularization constants that avoid instability in image regions where the local mean or standard deviation is close to zero, and are determined as follows:

$$C_{1}=\left(0.01\cdot L\right)^{2} \tag{5}$$

$$C_{2}=\left(0.03\cdot L\right)^{2} \tag{6}$$

$$C_{3}=\frac{C_{2}}{2} \tag{7}$$

Where $L$ is the dynamic range of the image, e.g., for data type uint8, $L=255$.

Therefore, combining the different parameters together, the SSIM can be defined as:

$$SSIM(x,y)=\left[l(x,y)\right]^{\alpha}\cdot\left[c(x,y)\right]^{\beta}\cdot\left[s(x,y)\right]^{\gamma} \tag{8}$$

where $x$ and $y$ are the predicted and ground truth images, respectively. The exponents $\alpha$, $\beta$, and $\gamma$ define the relative importance of the three components. For the purpose of this study, $\alpha =\beta =\gamma =1$. With these exponents and ${C}_{3}={C}_{2}/2$, the SSIM can also be written as:

$$SSIM(x,y)=\frac{\left(2\mu_{x}\mu_{y}+C_{1}\right)\left(2\sigma_{xy}+C_{2}\right)}{\left(\mu_{x}^{2}+\mu_{y}^{2}+C_{1}\right)\left(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}\right)} \tag{9}$$
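As a short check of how Eq. (9) follows from Eq. (8), the contrast and structure terms can be combined using $\alpha=\beta=\gamma=1$ and $C_{3}=C_{2}/2$ (the second factor is expanded by multiplying its numerator and denominator by 2):

$$c(x,y)\,s(x,y)=\frac{2\sigma_{x}\sigma_{y}+C_{2}}{\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}}\cdot\frac{\sigma_{xy}+C_{2}/2}{\sigma_{x}\sigma_{y}+C_{2}/2}=\frac{2\sigma_{x}\sigma_{y}+C_{2}}{\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}}\cdot\frac{2\sigma_{xy}+C_{2}}{2\sigma_{x}\sigma_{y}+C_{2}}=\frac{2\sigma_{xy}+C_{2}}{\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}}$$

Multiplying this result by the luminance term $l(x,y)$ of Eq. (2) gives Eq. (9).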

To formulate SSIM as a loss function, we define it as follows:

$$L_{SSIM}=1-SSIM(x,y) \tag{10}$$
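The SSIM loss of Eqs. (9) and (10) can be sketched in NumPy as follows. This is a simplified illustration under a stated assumption: the statistics are computed once over the whole patch, whereas practical SSIM implementations usually compute them in local sliding windows and average the result. The function names are hypothetical.

```python
import numpy as np

def ssim_global(x, y, dynamic_range=255.0):
    """SSIM of Eq. (9), using statistics of the whole patch (assumption)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * dynamic_range) ** 2  # Eq. (5)
    c2 = (0.03 * dynamic_range) ** 2  # Eq. (6)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure

def ssim_loss(x, y, dynamic_range=255.0):
    """Eq. (10): the loss is zero when prediction and ground truth match."""
    return 1.0 - ssim_global(x, y, dynamic_range)
```

Because the loss depends on means, variances, and covariance rather than on individual pixel errors, it rewards preserving local structure such as thin vessels, which is the motivation given in the text for preferring it over the MSE.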

c. Evaluation Metrics

In this study, to evaluate the performance of each model, we will measure the peak signal to noise ratio (PSNR) and the multiscale structural similarity index measure (MS-SSIM).

i. Peak Signal to Noise Ratio

The PSNR is defined as the ratio, on a logarithmic (decibel) scale, between the square of the maximum intensity value and the mean-squared error. The PSNR can be determined as:

$$PSNR=10\cdot \log_{10}\left(\frac{Max_{I}^{2}}{MSE}\right) \tag{11}$$

where $Ma{x}_{I}$ is the maximum possible pixel intensity, e.g., 255 for a uint8 image, and $MSE$ is formulated in (1).
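A minimal NumPy sketch of Eq. (11), assuming 8-bit images by default; the function name and the choice to return infinity for identical images are illustrative assumptions:

```python
import numpy as np

def psnr(pred, target, max_intensity=255.0):
    """Peak signal-to-noise ratio in decibels (Eq. 11)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_intensity ** 2 / mse)
```

For example, a uniform error of 25.5 gray levels on a uint8 scale (one tenth of the dynamic range) corresponds to a PSNR of 20 dB.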

ii. Multiscale Structural Similarity Index Measure

The MS-SSIM is an extension of SSIM, which compares three visual perception parameters, namely the luminance, contrast, and structure, over localized regions. The MS-SSIM evaluates SSIM at multiple scales, i.e., at different image resolutions. To obtain the MS-SSIM, an iterative approach is employed in which the reference and output images are low-pass filtered and downsampled by a factor of two, $M-1$ times, giving $M$ scales in total. The contrast and structural parameters are calculated at each scale, while the luminance parameter is computed only at the $M$-th (coarsest) scale. The formulation of MS-SSIM is as follows:

$$\text{MS-SSIM}(x,y)=\left[l_{M}(x,y)\right]^{\alpha_{M}}\cdot\prod_{j=1}^{M}\left[c_{j}(x,y)\right]^{\beta_{j}}\left[s_{j}(x,y)\right]^{\gamma_{j}} \tag{12}$$
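The structure of Eq. (12) can be illustrated with the simplified NumPy sketch below. Several assumptions are made that the text does not specify: five scales with the exponents commonly used for MS-SSIM ($\beta_j=\gamma_j$, with $\alpha_M$ equal to the last exponent), whole-image statistics instead of local windows, 2×2 averaging as the low-pass filter and downsampling step, and clipping of negative contrast-structure terms. It is not the evaluation code used in the study.

```python
import numpy as np

# Assumed per-scale exponents (values commonly used for five-scale MS-SSIM).
WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)

def downsample2(img):
    """Low-pass filter and downsample by two via 2x2 block averaging."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ms_ssim_global(x, y, dynamic_range=255.0, weights=WEIGHTS):
    """Simplified Eq. (12); inputs should be at least 16 x 16 for five scales."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    result = 1.0
    for scale, w in enumerate(weights):
        mu_x, mu_y = x.mean(), y.mean()
        cov_xy = np.mean((x - mu_x) * (y - mu_y))
        # Combined contrast * structure term (valid for C3 = C2 / 2).
        cs = (2 * cov_xy + c2) / (x.var() + y.var() + c2)
        # Clip negatives so the fractional power stays real (assumption).
        result *= max(cs, 0.0) ** w
        if scale == len(weights) - 1:
            # Luminance enters only at the coarsest scale M.
            lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
            result *= lum ** w
        else:
            x, y = downsample2(x), downsample2(y)
    return result
```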

d. Implementation Details

The training procedure utilized the Adam optimizer with a learning rate of 0.0001 and a batch size of 32; the loss function used to train the model was either the MSE or the SSIM. Transfer learning was employed for the encoder using weights pretrained on the ImageNet dataset, and the decoder was initialized with random weights. Data augmentation, in the form of horizontal flips, zoom, and vertical and horizontal shifts, was used to prevent overfitting. The deep learning implementation ran on a workstation with the Ubuntu operating system (v20.04) and three NVIDIA Quadro 6000 graphics processing units (GPUs). All deep learning implementations were performed using the Keras API with the TensorFlow backend (v2.9.1).

a. Datasets

In this study, two dataset types, i.e., from animal and human subjects, were obtained from custom designed OCT systems. Two separate custom OCT systems were used for the imaging of animal and human subjects.

i. Animal Dataset

For animal data acquisition, this study collected OCT and OCTA data from mouse eyes. Briefly, the custom mouse SD-OCT system utilized a near-infrared (NIR) superluminescent diode (SLD; λ = 810 nm; Δλ = 100 nm; Superlum, Carrigtwohill, County Cork, Ireland) light source. A line CCD camera with 2048 pixels (AViiVA EM4; e2v Technologies, Chelmsford, United Kingdom) was used for recording OCT spectra in the custom-built OCT spectrometer. The frame rate of the camera was set to 50 kHz. The axial and lateral resolutions were theoretically estimated as 2.9 and 11 µm, respectively. The power illuminated at the mouse cornea was ~ 1 mW.

For the image acquisition of wild type mice (strain: C57BL/6J), anesthesia was induced by intraperitoneal injection of a mixture of ketamine (100 mg / kg body weight) and xylazine (5 mg / kg body weight), and a drop of 1% tropicamide ophthalmic solution (Akorn, Lake Forest, Illinois) was applied to the imaging eye. Next, a cover glass (12-545-80; Microscope cover glass, Fisherbrand, Waltham, Massachusetts) with a drop of eye gel (Severe; GenTeal, Novartis, Basel, Switzerland) was placed on the imaging eye. After the mouse was completely anesthetized, the head was fixed by a bite bar and ear bar in the animal holder, which provided five degrees of freedom (i.e., x, y, z, pitch, and roll). Volumetric raster scans were acquired from the various retinal quadrants, e.g., dorsal, ventral. Four repeated B-scans at each slow-scan position were collected for OCTA construction; thus, each OCT volume consisted of 4 × 600 × 600 A-scans and covered a FOV of 1.2 × 1.2 mm². All animal experiments were approved by the local animal care and biosafety office and performed following the protocols approved by the Animal Care Committee (ACC) at the University of Illinois at Chicago (ACC Number: 19–044). This study followed the Association for Research in Vision and Ophthalmology Statement for the Use of Animals in Ophthalmic and Vision Research. For this dataset, we collected a total of 24 OCT datasets from 6 mice, corresponding to one eye per mouse and four retinal quadrants per eye, i.e., dorsal, nasal, ventral, and temporal. Datasets that had inhomogeneous brightness or other imaging artifacts were excluded. The final dataset therefore comprised 16 volumes: 9 volumes for training, 1 volume for validation, and 6 volumes for testing.

ii. Human Dataset

Briefly, the custom human SD-OCT system utilized the same NIR SLD (λ = 810 nm; Δλ = 100 nm; Superlum, Carrigtwohill, County Cork, Ireland). A line CCD camera with 2048 pixels (AViiVA EM4; e2v Technologies, Chelmsford, United Kingdom) was used for recording OCT spectra in the custom-built OCT spectrometer. The frame rate of the camera was set to 70 kHz. The axial and lateral resolutions were theoretically estimated at 1 and 10 µm, respectively. The illumination power on the human cornea was ~ 600 µW. For the image acquisition of human subjects, no anesthetic agent was used. A custom chin rest was employed to reduce head movements, a fixation target with a dim red light was used to minimize voluntary eye movements, and a pupil camera was used to aid in retinal localization by the photographer. Volumetric raster scans were acquired from the macula. Four repeated B-scans at each slow-scan position were collected for OCTA construction; thus, each OCT volume consisted of 4 × 300 × 300 A-scans and covered a FOV of 3 × 3 mm². All human experiments were approved by the Institutional Review Board of the University of Illinois at Chicago and were in accordance with the ethical standards stated in the Declaration of Helsinki. For this dataset, we collected a total of 16 eyes from 8 healthy subjects. Datasets with inhomogeneous brightness or severe motion artifacts were excluded, so that 10 eyes comprised the dataset for this study. Since each eye contains 4 OCT volumes, all of the repeated volumes were used for training as a form of data augmentation for the training dataset. Therefore, 16 volumes were used for training, 1 volume for validation and 5 volumes for testing.

iii. OCTA construction

The OCT scan pre-processing starts with registration of the OCT volume. The method employed for frame registration was the discrete Fourier transform (DFT) registration method [28]. Since the OCT volume contains multiple repeated scans, the first step is to perform intra-frame registration, where each repetitive scan is registered to the first scan. This process is repeated for all scans. Next, inter-frame registration is performed to register each of the scans within the volume. After the OCT volume pre-processing, OCTA images were constructed by applying intensity-based speckle variance (SV) processing to both mouse and human OCT volumes with the following equation [29]:
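The idea behind DFT-based frame registration can be sketched with a cross-correlation computed in the Fourier domain. The example below is a simplification and not the cited method [28] itself: it recovers integer-pixel shifts only, assumes circular (wrap-around) translation, and uses hypothetical function names.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Integer (row, col) shift that aligns `moving` to `reference` via np.roll."""
    cross_power = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    shift = [p if p <= s // 2 else p - s for p, s in zip(peak, correlation.shape)]
    return tuple(int(v) for v in shift)

def register(reference, moving):
    """Apply the estimated shift so that `moving` overlays `reference`."""
    return np.roll(moving, estimate_shift(reference, moving), axis=(0, 1))
```

In the intra-frame step described above, `reference` would be the first repeated B-scan at a slow-scan position and `moving` each subsequent repeat.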

$$SV_{jk}=\frac{1}{N}\sum_{i=1}^{N}\left[I_{ijk}(x,z)-\frac{1}{N}\sum_{i'=1}^{N}I_{i'jk}(x,z)\right]^{2}=\frac{1}{N}\sum_{i=1}^{N}\left[I_{ijk}-\left(I_{mean}\right)_{jk}\right]^{2} \tag{13}$$

where $i$, $j$ and $k$ are the indices of the frame, lateral position, and depth pixel of the OCT B-scan, respectively. $N$ is the number of frames used in the calculation. ${\left({I}_{mean}\right)}_{jk}$ is the average of the $N$ frames at pixel $(j,k)$.
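Eq. (13) is the per-pixel variance across $N$ frames, and the same computation serves both cases discussed in this paper: across repeated B-scans at one position (TVC) or across adjacent B-scans in a single-scan volume (SVC). The sketch below assumes a `(frames, depth, width)` array layout and hypothetical function names; the SVC variant simply drops positions near the volume edge that lack a full window.

```python
import numpy as np

def speckle_variance(frames):
    """Eq. (13): per-pixel variance over the first axis of (N, depth, width)."""
    frames = np.asarray(frames, dtype=np.float64)
    mean_frame = frames.mean(axis=0)
    return np.mean((frames - mean_frame) ** 2, axis=0)

def svc_speckle_variance(volume, n_neighbors=3):
    """SV over windows of adjacent B-scans in a single-scan volume.

    volume: (num_bscans, depth, width).
    Returns (num_bscans - n_neighbors + 1, depth, width).
    """
    last = volume.shape[0] - n_neighbors + 1
    return np.stack(
        [speckle_variance(volume[i:i + n_neighbors]) for i in range(last)]
    )
```

For TVC, `speckle_variance` would be applied to the stack of repeated B-scans acquired at each slow-scan position, with $N$ equal to the number of repeats.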

iv. Enface Processing

For enface images, the superficial vascular plexus (SVP) is demarcated by the NFL-GCL and the IPL-INL boundaries, and the deep vascular plexus (DVP) is demarcated by the IPL-INL and the OPL-ONL boundaries. For the mouse dataset, retinal flattening was performed by realigning each A-line. Since the retinal quadrants, e.g., dorsal, of the mouse retina were smooth, the boundaries of the SVP and DVP were manually segmented. For the human dataset, retinal layer segmentation was performed using the Iowa Reference Algorithms (Retinal Image Analysis Lab, Iowa Institute for Biomedical Imaging, Iowa City, IA) to determine the boundaries for the SVP and DVP [30–32]. The enface images were generated using average intensity projection. Image processing was performed in MATLAB R2021b (MathWorks, Natick, Massachusetts) with image processing packages in ImageJ [33].
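The average intensity projection between two segmented boundaries can be sketched as follows. The array layout, the boundary convention (top index inclusive, bottom index exclusive), and the function name are assumptions made for illustration; the study itself performed this step in MATLAB and ImageJ.

```python
import numpy as np

def enface_projection(volume, top, bottom):
    """Average intensity projection of a slab between two boundaries.

    volume: OCTA volume, shape (num_bscans, depth, width).
    top, bottom: integer depth indices per A-line, shape (num_bscans, width).
    Returns the enface image, shape (num_bscans, width).
    """
    depth = np.arange(volume.shape[1])[None, :, None]
    mask = (depth >= top[:, None, :]) & (depth < bottom[:, None, :])
    # Guard against empty slabs to avoid division by zero.
    counts = np.maximum(mask.sum(axis=1), 1)
    return (volume * mask).sum(axis=1) / counts
```

For the SVP, `top` and `bottom` would hold the NFL-GCL and IPL-INL boundary positions; for the DVP, the IPL-INL and OPL-ONL positions.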

b. Results

i. TVC and SVC ablation study

In this study, we perform an ablation study to evaluate the effect of the number of B-scans used for OCTA construction using the temporal vascular connectivity (TVC), i.e., repeated B-scans, and the SVC, i.e., adjacent neighboring B-scans. Both TVC and SVC use Eq. 13; however, the number of frames $N$ represents the number of repeated B-scans for TVC, whereas it represents the number of adjacent neighbors for SVC.

This procedure helps to determine the optimal number of neighboring B-scans to be used as input into SVC-Net. Since each volume contains four repeated B-scans, for qualitative comparison we show OCTA generated by SV using 2N, 3N and 4N with TVC and SVC in Fig. 3 and Fig. 4, for the animal and human datasets, respectively. For TVC, the noise level decreases as the number of repeated B-scans increases, as illustrated by Fig. 3A-C and Fig. 4A-C. SVC shows a different trend: SVC-3N has improved vascular detail compared to SVC-2N, whereas SVC-4N has less vascular detail than SVC-3N due to an increased blur effect, which is visible in the representative enface image for the human dataset, Fig. 4F. This type of artifact is commonly described in OCTA as ‘vessel doubling’ due to poor registration. In the case of SVC-4N, since the vessels are not completely duplicated, we refer to this artifact as pseudo vessel doubling.

For the quantitative comparison, since the TVC-4N has the best qualitative performance, it serves as the ground truth. We quantify the MS-SSIM and the PSNR for the TVC 2N and 3N, and the SVC 2N, 3N and 4N enface images. For the animal dataset, we confirm our qualitative observation that increasing the number of B-scans improves performance, as the TVC-3N has the best overall MS-SSIM and PSNR (Fig. 5A-B). For the SVC on the animal dataset, in contrast, the MS-SSIM shows a decreasing trend from SVC-2N to SVC-4N, while for the PSNR the optimal number of B-scans was SVC-3N. For the human dataset, we observe similar trends to the mouse dataset, with the TVC improving on both metrics when more B-scans are used (Fig. 5C-D). For the SVC, the optimal number of B-scans for both the MS-SSIM and PSNR was SVC-3N, which performs better than SVC-2N and SVC-4N. Therefore, based on qualitative observation and quantitative analysis, we use a 3N input for SVC-Net, i.e., three adjacent neighboring B-scans.

ii. Comparison on Animal Dataset

On the animal dataset, we compared the effects of SVC and the loss function for OCTA construction using SVC-Net. To compare the effects of SVC, the 1N and 3N models trained with the SSIM loss were compared. The first qualitative comparison is on the initial output of SVC-Net, the OCTA B-scan. The qualitative differences in the B-scans for the 1N and 3N models are illustrated in Fig. 6. We observe that both the 1N and 3N models are able to generate the large vessels with minimal loss in structure, as denoted by the red arrows in Fig. 6C-E. However, in the 1N model, the smaller capillary-level vessels were not predicted with similar intensity values (Fig. 6F2). In contrast, the 3N model was able to predict the smaller capillaries in greater detail (Fig. 6F3).

The primary usage of OCTA is to observe the enface projections of the retinal vascular layers; therefore, we perform both qualitative and quantitative analyses of the enface projections of the SVP and DVP. Examples of the retinal layer segmentation and flattening used to generate the enface projections are illustrated in Fig. 6A-B. To compare the effects of the SVC, we qualitatively compare the 1N and 3N models to the ground truth enface of the SVP and DVP in Fig. 7. Consistent with the B-scans, the large vessels of the SVP were constructed in both the 1N and 3N models. However, a large vessel that tapered into a smaller vessel was poorly constructed in the 1N model, as marked by the green box in Fig. 7A2, whereas in the 3N model the same vessel was constructed properly, as illustrated by the yellow box in Fig. 7A3. This shows that SVC helped to preserve the details of smaller vessels. For the DVP, the 1N model was able to predict some capillary structures; however, due to the poor contrast, it has a noisier appearance than the 3N model, which produced more finely detailed capillaries.

In this study, we also compare the effects of the different loss functions, i.e., MSE and SSIM, for OCTA construction on the 1N and 3N models. Example output enface images from a mouse eye for each model are illustrated in Fig. 8. Comparing the 1N models trained with MSE and SSIM, we observe a discernible increase in noise when the model is trained with the MSE loss function (Fig. 8A2). Comparing the 3N models trained with MSE and SSIM, both models have lower levels of noise, with the 3N model trained with SSIM having the best overall contrast (Fig. 8A5, B5). In the DVP, the 1N model trained with SSIM (Fig. 8B3) has better capillary-level structure than the 1N model trained with the MSE (Fig. 8B2), and when SVC is employed, the contrast of the capillaries improves (Fig. 8B4-B5).

The evaluation metrics, MS-SSIM and PSNR, were quantified on both the SVP and DVP enfaces to quantitatively compare the performances of the four models summarized in Fig. 9. For the MS-SSIM metric, it can be observed in the mouse dataset that the 1N model trained with MSE had the lowest performance, followed by the 1N model trained with the SSIM loss function, which had a slight improvement (Fig. 9A). The introduction of the SVC significantly improved the similarity between ground truth and predicted enface images. The 3N model trained with MSE had significantly better results than both 1N models. With the modification of the loss function, the 3N model trained with SSIM had the best performance.

For the PSNR measurement, we observed that in the SVP there were only slight differences between the loss functions for the same input type (single or neighboring inputs) (Fig. 9B). This may be due to the presence of large vessels in the SVP, which the CNN predicted consistently at the maximum pixel intensity, e.g., 255, regardless of the loss function. In the DVP, however, we observed distinguishable PSNR improvements from both the loss function and the input type. This may be due to the abundance of smaller capillary-level structures, where an increase in noise in the predicted image is reflected in the PSNR value. The use of SVC and the SSIM loss function reduced the noise level and improved the PSNR. For the mouse dataset, the 3N model trained with the SSIM loss function had the best overall performance.
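To make the link between noise level and PSNR concrete, the sketch below computes the PSNR with its standard definition, $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, for 8-bit images ($\mathrm{MAX}=255$). The helper name `psnr` and the synthetic test images are assumptions for illustration and are not taken from the study's evaluation code or data.

```python
import numpy as np


def psnr(reference, prediction, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    prediction = np.asarray(prediction, dtype=np.float64)
    mse = np.mean((reference - prediction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Synthetic enface image (not real data): one bright band on a dark background.
rng = np.random.default_rng(0)
truth = np.zeros((64, 64))
truth[30:34, :] = 255.0

# Two hypothetical predictions that differ only in their noise level.
low_noise = np.clip(truth + rng.normal(0, 5, truth.shape), 0, 255)
high_noise = np.clip(truth + rng.normal(0, 20, truth.shape), 0, 255)

print("low-noise PSNR (dB): ", psnr(truth, low_noise))
print("high-noise PSNR (dB):", psnr(truth, high_noise))
```

Since the MSE sits in the denominator, added background noise lowers the PSNR directly, whereas pixels that are saturated in both the prediction and the ground truth contribute no error at all.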

iii. Comparison on Human Dataset

On the human dataset, we also compared the effects of the SVC and the loss function on OCTA construction using SVC-Net. For a qualitative comparison of the initial output of SVC-Net, the B-scans, we compared the 1N and 3N models trained with the SSIM loss function (Fig. 10). Both the 1N and 3N models can predict the large and small vessels. However, the 1N model has some areas of poor prediction, denoted by the yellow arrows in Fig. 10E1-E3, where it misses the bright vessel signal that is present in the 3N output and the ground truth. When compared against the ground truth, the 3N model recovers more of the vascular structure than the 1N model (Fig. 10E3).

For qualitative and quantitative comparison, we segmented the retinal layers into the SVP and DVP. An example of the retinal layer segmentation is illustrated in Fig. 10A. To demonstrate the effects of SVC, we compare the enface SVP and DVP projections of the 1N and 3N models in Fig. 11. For the human dataset, both the 1N and 3N models can predict the large and small vessel structures in the SVP. However, the enlarged images of the SVP in Fig. 11 show that the 1N model has poor contrast for the smaller vessels. In the DVP, the contrast of the 3N model is better than that of the 1N model, as demonstrated in Fig. 11B2-B3. This agrees with the observations in the B-scan outputs, where the 1N model has some mispredictions, and suggests that the SVC improves the model’s ability to predict the finer structural details of capillary-level vessels.

Next, we compare the effects of the loss functions on the model’s performance in a human eye, as illustrated in Fig. 12. For the 1N models, the model trained with the MSE loss function has higher levels of noise and poorer contrast than the model trained with the SSIM loss function. Changing the loss function improved the 1N model’s ability to produce capillary-level structures with higher contrast (Fig. 12B2-B3). Similar observations can be made for the 3N models. In the SVP, the model trained with the MSE loss function predicts some of the capillary-level structures, but with relatively more noise (Fig. 12A4), whereas the model trained with the SSIM loss function has a reduced noise level (Fig. 12A5). In the DVP, the capillary-level structures appear more dilated, which could be due to the lower contrast between the fine vessels (Fig. 12B4). The SSIM-trained model is able to produce vessel structures with higher contrast (Fig. 12B5).

We measured the MS-SSIM and PSNR for the different models, 1N and 3N, trained with the different loss functions, for each vascular layer, SVP and DVP (Fig. 13). For the MS-SSIM in both the SVP and DVP, there is a discernible trend of increasing performance, with the 1N model trained with MSE performing worst and the 3N model trained with SSIM performing best. For the PSNR, we observe a similar trend of increasing performance. However, for the 3N models there were no discernible differences between the loss functions in the SVP. This may be because the large vessels present in the SVP already saturate a substantial portion of the pixels, resulting in similar PSNR values. In the DVP, where the vessel structures are finer, we observe noticeable PSNR improvements. In the human dataset, the 3N model trained with SSIM had the best performance.

In this study, we reported a fully automated FCN, SVC-Net, for OCTA construction that leverages spatial vascular connectivity in OCT to predict vascular structure from a single OCT volume. We quantitatively determined the optimal number of adjacent B-scans for the input into SVC-Net, and the differences in the number of B-scans used for SV calculation between TVC and SVC. We conclude that three adjacent B-scans, 3N, is the optimal input into SVC-Net. We quantitatively evaluated the effects of SVC by comparing the performance of two different inputs: a single OCT B-scan, 1N, and three adjacent OCT B-scans, 3N. We demonstrate that the 3N model has superior performance compared to the 1N model. In addition, we compared the effects of different loss functions, i.e., the MSE and SSIM loss functions, on the model’s performance. Our study demonstrates that the SSIM loss function has superior performance over the MSE loss function. Our proposed method has been trained and tested on both animal and human OCT datasets. The ability to generate OCTA from single OCT volumes can increase the speed of image acquisition by alleviating the need for multiple repetitions, reduce eye movement, and potentially increase the FOV.

a. Spatially Vascular Connectivity

Conventional OCTA construction requires multiple OCT acquisitions at the same imaging location, which limits the imaging speed and FOV. In this study, we performed an ablation study to compare the effects of using different numbers of adjacent B-scans in the SV calculation for OCTA construction. Quantitatively, 2N and 4N had more noise than 3N. Qualitatively, in particular for the human dataset, 4N resulted in a pseudo vessel-doubling artifact due to the larger area used for the SV calculation. Therefore, 3N had the optimal performance and was chosen as the input into SVC-Net.
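The SV calculation over adjacent B-scans can be sketched as a sliding-window variance along the slow-scan axis of a single volume. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `spatial_speckle_variance`, the (B-scan, depth, width) array layout, and the use of the population variance (division by $N$) are choices made here for clarity.

```python
import numpy as np


def spatial_speckle_variance(volume, n_adjacent=3):
    """Speckle variance across n_adjacent neighboring B-scans of one volume.

    volume : array of shape (num_bscans, depth, width)
    Returns an array of shape (num_bscans - n_adjacent + 1, depth, width),
    one SV B-scan per window position.
    """
    volume = np.asarray(volume, dtype=np.float64)
    if volume.ndim != 3:
        raise ValueError("volume must have shape (num_bscans, depth, width)")
    if not 2 <= n_adjacent <= volume.shape[0]:
        raise ValueError("n_adjacent must be between 2 and num_bscans")

    # Windows along the B-scan axis; the window dimension is appended last,
    # giving shape (num_bscans - n_adjacent + 1, depth, width, n_adjacent).
    windows = np.lib.stride_tricks.sliding_window_view(
        volume, n_adjacent, axis=0
    )
    # Per-pixel variance of the length-n_adjacent vector (the "3N" vector).
    return windows.var(axis=-1)


# Toy volume (not real data): static tissue everywhere except one pixel
# whose intensity fluctuates between neighboring B-scans.
toy = np.ones((5, 4, 6))
toy[:, 2, 3] = [1, 5, 1, 5, 1]
sv_3n = spatial_speckle_variance(toy, n_adjacent=3)
print(sv_3n.shape)
```

Increasing `n_adjacent` widens the region along the slow axis that contributes to each output pixel, which is in line with the observation above that 4N uses a larger area for the SV calculation.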

This observation has theoretical support, in that adjacent B-scans differ from one another both spatially and temporally. They therefore carry information that can be used to estimate areas of hemodynamic change, i.e., vascular tissue. The SV calculation determines the OCTA signal from a vector of pixel values, e.g., a vector of length 3 for 3N. In principle, an FCN can instead leverage a localized region. For example, with a standard convolutional filter of size $3\times 3\times N$, the first layer of SVC-Net operates on a localized region of $3\times 3\times 3$ when given the 3N input. In addition, as information is carried through the FCN, global information is also used in the decision-making process. Therefore, the FCN can better predict vascular tissue than the SV method because of the larger number of pixels it can leverage. The performance of SVC-Net using the 3N model reveals improved vascular connectivity compared to the SV-3N method, supporting the hypothesis that the FCN is able to leverage a larger number of pixels for vessel prediction. On the human dataset, we note that the smaller vessels in the SVP have less contrast than those in the DVP in the deep learning models. This may be due to the strong signal from the NFL, which may diminish the signal from the smaller vessels in the SVP. In the DVP, the vessel structures predicted by the different models, i.e., 1N and 3N, are similar because the DVP is bounded by two hypo-reflective layers, namely the INL and ONL. Therefore, the signal contributing to vascular prediction can be clearly determined by the CNN.
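One way to present three adjacent B-scans to a 2D convolutional network is to stack them as input channels, so that a $3\times 3$ filter in the first layer sees a $3\times 3\times 3$ neighborhood. The sketch below builds such an input with NumPy. The function name `stack_adjacent_bscans`, the channels-last layout, and the edge replication at the first and last B-scans are assumptions for illustration; the source does not state how volume boundaries are handled.

```python
import numpy as np


def stack_adjacent_bscans(volume, n_adjacent=3):
    """Stack each B-scan with its neighbors along a trailing channel axis.

    volume : array of shape (num_bscans, depth, width)
    Returns an array of shape (num_bscans, depth, width, n_adjacent) in which
    the middle channel is the B-scan itself.
    """
    volume = np.asarray(volume)
    if n_adjacent < 1 or n_adjacent % 2 != 1:
        raise ValueError("n_adjacent must be a positive odd number")

    half = n_adjacent // 2
    # Assumption: replicate the first and last B-scans so that every
    # position has a full set of neighbors.
    padded = np.pad(volume, ((half, half), (0, 0), (0, 0)), mode="edge")

    num_bscans = volume.shape[0]
    shifted = [padded[i:i + num_bscans] for i in range(n_adjacent)]
    return np.stack(shifted, axis=-1)


# Toy volume (not real data) whose pixel values equal the B-scan index.
toy = np.arange(5, dtype=np.float64)[:, None, None] * np.ones((5, 2, 3))
inputs_3n = stack_adjacent_bscans(toy, n_adjacent=3)
print(inputs_3n.shape)        # one 3-channel input per B-scan
print(inputs_3n[2, 0, 0])     # neighbors of the middle B-scan at one pixel
```

With `n_adjacent=1` the same helper reproduces the single B-scan (1N) input, which makes the two input types easy to compare within one data pipeline.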

A limited number of studies have explored deep learning methods to alleviate the need for repeated scans. Lee et al. demonstrated OCTA construction from a single OCT B-scan input in a human dataset using a similar U-Net type model [19]. Using an input B-scan to output B-scan strategy, they could primarily predict the large blood vessels, whereas the smaller capillary-sized vessels had poor contrast and higher levels of noise. The results in this study for a single OCT B-scan input are consistent with those presented by Lee et al. This could primarily be due to the large vessels having better contrast than the smaller vessels in the OCT B-scan. Li et al. demonstrated an input volume to output volume strategy in animal models using a GAN [20], where the input is three adjacent OCT B-scans and the output is three adjacent OCTA B-scans. While their results did not demonstrate capillary-level vessel structures, they do demonstrate that the use of SVC can help the deep learning model to achieve higher performance metrics. The results in our study for SVC demonstrate capillary-level vessels in both animal and human datasets. Overall, our methodology differs from the two aforementioned studies in that it follows an input volume to output B-scan strategy. The connectivity between the adjacent B-scans can provide the information required to accurately predict vessels of varied sizes.

b. Loss Function on CNN Optimization

In deep learning, there are many different hyperparameters that can be optimized for improved performance. Many studies focus on the network architecture design, e.g., the depth or width of the network, or develop different operations, e.g., atrous convolutions, depth-wise convolutions, etc. While all of these hyperparameters play a role in the model’s performance, one of the most fundamental hyperparameters of a CNN is the loss function. The choice of the loss function ultimately drives the CNN’s ability to learn its intended task [34]. In this study, we performed an ablation study to compare two loss functions, the MSE and SSIM. When compared to Lee et al. [19], our results using a single input optimized with the MSE loss function on the animal and human datasets likewise show that mainly large vessels are predicted, while the smaller vessels have poorer contrast. When we instead optimize the model using the SSIM loss function, we observe a lower level of noise and improved vascular prediction for both the animal and human datasets. These differences are also quantifiable: the PSNR improves for both the SVP and DVP between the MSE and SSIM models.

Traditionally, the MSE has been used for image reconstruction tasks. The MSE compares the ground truth and the CNN-predicted image at the individual pixel level. Because the MSE is a quadratic function with a single global minimum, it can be easy to optimize. However, in many cases an individual pixel is related to its surrounding pixels, which limits how well the MSE can optimize the deep learning model. On the other hand, the SSIM as a loss function evaluates three different components, the luminance, contrast, and structure, over a localized patch in the image. SSIM has been extensively used as an image quality metric. It therefore stands to reason that applying the SSIM as a loss function in image construction tasks can better optimize the deep learning model. When we combine SVC and the SSIM loss function, we achieve the best performing model. The model is trained to use the localized connectivity of the input, i.e., the adjacent B-scans, and is further optimized to achieve localized structural similarity in the predicted OCTA B-scan.
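The difference between a per-pixel loss and a patch-based loss can be illustrated with a small NumPy example. The sketch below is a simplification and not the training loss used in this study: it evaluates SSIM on non-overlapping uniform patches rather than the sliding Gaussian window of the standard formulation, it is not differentiable as written, and the names `mse_loss` and `patch_ssim_loss` and the patch size are assumptions. The constants $K_1=0.01$ and $K_2=0.03$ follow the conventional SSIM definition.

```python
import numpy as np


def mse_loss(truth, pred):
    """Mean squared error over individual pixels."""
    truth = np.asarray(truth, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    return float(np.mean((truth - pred) ** 2))


def patch_ssim_loss(truth, pred, patch=8, data_range=1.0):
    """1 - mean SSIM over non-overlapping patch x patch blocks (simplified)."""
    truth = np.asarray(truth, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    # Crop so that the image divides evenly into patches.
    h = truth.shape[0] - truth.shape[0] % patch
    w = truth.shape[1] - truth.shape[1] % patch

    def to_patches(img):
        blocks = img[:h, :w].reshape(h // patch, patch, w // patch, patch)
        return blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

    x, y = to_patches(truth), to_patches(pred)
    mu_x, mu_y = x.mean(axis=1), y.mean(axis=1)
    var_x, var_y = x.var(axis=1), y.var(axis=1)
    cov = ((x - mu_x[:, None]) * (y - mu_y[:, None])).mean(axis=1)

    # Luminance term times the combined contrast-structure term.
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
    return float(1.0 - ssim.mean())


# Toy striped image (not real data) and two hypothetical predictions whose
# per-pixel error has the same magnitude (0.1) everywhere.
truth = np.tile([0.2, 0.8], (16, 8))           # 16 x 16 stripes
brightness_shift = truth + 0.1                 # uniform offset
contrast_loss = 0.5 + (2.0 / 3.0) * (truth - 0.5)  # stripes pulled to the mean

for name, pred in [("shift", brightness_shift), ("contrast", contrast_loss)]:
    print(name, mse_loss(truth, pred), patch_ssim_loss(truth, pred))
```

In this toy case the two predictions have the same MSE, yet the patch-based SSIM loss is larger for the prediction that loses contrast, since the MSE sees only independent pixel errors while SSIM compares local statistics.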

c. Limitations

We have proposed a novel approach for OCTA construction using a single OCT volume for capillary-level visualization. However, this study has some limitations. For each dataset type (animal or human), the study is limited to a single OCT device. To demonstrate the generalizability of this method, validation on different devices should be performed. In addition, as an initial study, the dataset is limited to healthy eyes, in particular for human subjects. In future work, this method will need to be evaluated in different eye conditions and disease states, since these may affect the connectivity of the vasculature and their effect therefore needs to be further elucidated. Another point to consider is that this method was primarily validated for OCTA constructed using the SV method; future studies should consider validating it with other OCTA construction algorithms, e.g., OMAG and SSADA.

The SVC-Net for deep learning construction of microcapillary resolution OCTA from single-scan-volumetric OCT has been developed and validated. A comparative study shows that the SVC in single-scan-volumetric OCT provides information equivalent to the TVC in multi-scan-volumetric OCT for robust OCTA construction. The SSIM loss function provides superior performance, compared to the MSE loss function, in optimizing deep learning visualization of microstructures, such as microcapillaries, in single-scan-volumetric OCT. The combination of SVC involvement and the SSIM loss function enabled robust OCTA construction from single-scan-volumetric OCT. By using single-scan-volumetric OCT for rapid OCTA construction, the SVC-Net holds great promise to increase the imaging speed, and thus to enable rapid wide-field OCTA and dynamic monitoring of vascular changes to advance clinical management of eye diseases.

Choi, W.J., Imaging motion: a comprehensive review of optical coherence tomography angiography. Advanced Imaging and Bio Techniques for Convergence Science, 2021: p. 343-365.
De Carlo, T.E., et al., A review of optical coherence tomography angiography (OCTA). International journal of retina and vitreous, 2015. 1(1): p. 1-15.
Le, D., et al., Transfer learning for automated OCTA detection of diabetic retinopathy. Translational Vision Science & Technology, 2020. 9(2): p. 35-35.
Heisler, M., et al., Ensemble deep learning for diabetic retinopathy detection using optical coherence tomography angiography. Translational Vision Science & Technology, 2020. 9(2): p. 20-20.
Zang, P., et al., A Diabetic Retinopathy Classification Framework Based on Deep-Learning Analysis of OCT Angiography. Translational vision science & technology, 2022. 11(7): p. 10-10.
Motozawa, N., et al., Optical coherence tomography-based deep-learning models for classifying normal and age-related macular degeneration and exudative and non-exudative age-related macular degeneration changes. Ophthalmology and therapy, 2019. 8(4): p. 527-539.
Thakoor, K., et al. Hybrid 3d-2d deep learning for detection of neovascular age-related macular degeneration using optical coherence tomography B-scans and angiography volumes. in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI). 2021. IEEE.
Thakoor, K.A., et al., A multimodal deep learning system to distinguish late stages of AMD and to compare expert vs. AI ocular biomarkers. Scientific reports, 2022. 12(1): p. 1-11.
Bowd, C., et al., Deep Learning Image Analysis of Optical Coherence Tomography Angiography Measured Vessel Density Improves Classification of Healthy and Glaucoma Eyes. American Journal of Ophthalmology, 2022. 236: p. 298-308.
Bowd, C., et al., Deep-learning enface image classifier analysis of optical coherence tomography angiography images improves classification of healthy and glaucoma eyes. Investigative Ophthalmology & Visual Science, 2021. 62(8): p. 1024-1024.
Schottenhamml, J., et al., Glaucoma classification in 3 x 3 mm en face macular scans using deep learning in a different plexus. Biomedical Optics Express, 2021. 12(12): p. 7434-7444.
Gao, M., et al., Reconstruction of high-resolution 6× 6-mm OCT angiograms using deep learning. Biomedical Optics Express, 2020. 11(7): p. 3585-3600.
Gao, M., et al., An Open-Source Deep Learning Network for Reconstruction of High-Resolution OCT Angiograms of Retinal Intermediate and Deep Capillary Plexuses. Translational Vision Science & Technology, 2021. 10(13): p. 13-13.
Alam, M., et al., AV-Net: deep learning for fully automated artery-vein classification in optical coherence tomography angiography. Biomedical optics express, 2020. 11(9): p. 5249-5257.
Gao, M., et al., A Deep Learning Network for Classifying Arteries and Veins in Montaged Widefield OCT Angiograms. Ophthalmology Science, 2022. 2(2): p. 100149.
Abtahi, M., et al., MF-AV-Net: an open-source deep learning network with multimodal fusion options for artery-vein segmentation in OCT angiography. Biomedical Optics Express, 2022. 13(9): p. 4870-4888.
Liu, X., et al., A deep learning based pipeline for optical coherence tomography angiography. Journal of Biophotonics, 2019. 12(10): p. e201900008.
Jiang, Z., et al., Weakly supervised deep learning-based optical coherence tomography angiography. IEEE Transactions on Medical Imaging, 2020. 40(2): p. 688-698.
Lee, C.S., et al., Generating retinal flow maps from structural optical coherence tomography with artificial intelligence. Scientific reports, 2019. 9(1): p. 1-11.
Li, P.L., et al. Deep learning algorithm for generating optical coherence tomography angiography (OCTA) maps of the retinal vasculature. in Applications of Machine Learning 2020. 2020. SPIE.
Zhang, A., et al., Methods and algorithms for optical coherence tomography-based angiography: a review and comparison. Journal of biomedical optics, 2015. 20(10): p. 100901.
Schwartz, D.M., et al., Phase-variance optical coherence tomography: a technique for noninvasive angiography. Ophthalmology, 2014. 121(1): p. 180-187.
Jia, Y., et al., Split-spectrum amplitude-decorrelation angiography with optical coherence tomography. Optics express, 2012. 20(4): p. 4710-4725.
An, L., J. Qin, and R.K. Wang, Ultrahigh sensitive optical microangiography for in vivo imaging of microcirculations within human skin tissue beds. Optics express, 2010. 18(8): p. 8220-8228.
Onishi, A.C. and A.A. Fawzi, An overview of optical coherence tomography angiography and the posterior pole. Therapeutic advances in ophthalmology, 2019. 11: p. 2515841419840249.
Tan, M. and Q. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. in International conference on machine learning. 2019. PMLR.
Ahmed, S., et al., ADC-Net: An Open-Source Deep Learning Network for Automated Dispersion Compensation in Optical Coherence Tomography. Frontiers in Medicine, 2022. 9.
Guizar, M., Efficient subpixel image registration by cross-correlation. MATLAB Central File Exchange, 2020.
Son, T., et al., Optical coherence tomography angiography of stimulus evoked hemodynamic responses in individual retinal layers. Biomedical optics express, 2016. 7(8): p. 3151-3162.
Abràmoff, M.D., M.K. Garvin, and M. Sonka, Retinal imaging and image analysis. IEEE reviews in biomedical engineering, 2010. 3: p. 169-208.
Li, K., et al., Optimal surface segmentation in volumetric images-a graph-theoretic approach. IEEE transactions on pattern analysis and machine intelligence, 2005. 28(1): p. 119-134.
Garvin, M.K., et al., Automated 3-D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images. IEEE transactions on medical imaging, 2009. 28(9): p. 1436-1447.
Schneider, C.A., W.S. Rasband, and K.W. Eliceiri, NIH Image to ImageJ: 25 years of image analysis. Nature methods, 2012. 9(7): p. 671-675.
Zhao, H., et al., Loss functions for image restoration with neural networks. IEEE Transactions on computational imaging, 2016. 3(1): p. 47-57.

There is NO Competing Interest.
