In recent years, deep learning has been the fastest-growing machine learning method. An artificial neural network takes the biological neural network as its model and designs corresponding algorithms to simulate some of the intelligent activities of the human brain. Because it can analyze and reason over large amounts of data, it can perform various high-level tasks and has brought enormous convenience to human life, such as image analysis [1, 2, 3], face recognition [4], autonomous driving [5], and language translation [6].
With the growing demand for data processing capacity and computing resources, the development of electronic chips is approaching its physical limits, and Moore's law of electronic computing is slowing down [7]. Consequently, traditional electronic neural networks are also limited. To address this problem, Lin et al. proposed the Diffractive Deep Neural Network (D2NN) [8], in which the neural network is physically formed by multiple layers of diffractive surfaces that work in collaboration to optically perform the classification of handwritten digits and image reconstruction as a physical auto-encoder. They also showed that the D2NN benefits from depth, with performance improving as the number of layers increases, and that nonlinear materials can be used to realize optical nonlinearity [9].
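For readers less familiar with D2NNs, the sketch below shows how a single diffractive layer is commonly modeled in simulation: free-space propagation between planes (here via the angular spectrum method) followed by a learnable phase mask, one phase value per diffractive "neuron". This is a minimal PyTorch-style illustration under our own assumptions; the grid size, pixel pitch, wavelength, and layer spacing are placeholder values, not the parameters used in [8].

```python
import math
import torch
import torch.nn as nn

class DiffractiveLayer(nn.Module):
    """One diffractive modulation layer: angular-spectrum free-space propagation
    followed by a learnable phase mask (one phase value per optical 'neuron').
    Grid size, pixel pitch, wavelength and spacing are illustrative only."""
    def __init__(self, n=200, pixel_size=4e-4, wavelength=7.5e-4, distance=0.03):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(n, n))          # learnable phase of each neuron
        fx = torch.fft.fftfreq(n, d=pixel_size)
        fxx, fyy = torch.meshgrid(fx, fx, indexing="ij")
        arg = 1.0 / wavelength ** 2 - fxx ** 2 - fyy ** 2
        propagating = (arg > 0)                               # keep only propagating waves
        # Angular-spectrum transfer function; evanescent components are discarded.
        h = torch.exp(2j * math.pi * distance * torch.sqrt(arg.clamp(min=0.0)))
        self.register_buffer("h", h * propagating.to(h.dtype))

    def forward(self, field):
        # field: complex tensor whose last two dims span the (n, n) optical plane
        field = torch.fft.ifft2(torch.fft.fft2(field) * self.h)   # propagate to this layer
        return field * torch.exp(1j * self.phase)                 # learnable phase modulation
```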
However, the D2NN structure can only operate in real space, which limits its performance on higher-level tasks and leaves it with fewer advantages over traditional electronic neural networks. A D2NN in Fourier space was therefore proposed, which can preserve spatial information and facilitate image-to-image mapping tasks. Although it shows advantages in cell segmentation and handwritten digit recognition, its structure is overly complex: because of the multiple 2f systems, the network needs at least 10 layers to achieve the best performance, which means there are 10 lenses in the network and the spacing between adjacent layers is only 1 mm. By placing the cumbersome diffractive modulation layers in both Fourier space and real space, a ten-layer nonlinear hybrid D2NN configuration reaches a classification accuracy of 98.1%. However, as the number of layers increases, the saliency-detection performance drops and physical implementation becomes less feasible [10].
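The essential difference of the Fourier-space variant is that the learnable modulation is applied at the Fourier plane of a lens system rather than at a free-space plane; in simulation this amounts to modulating the spectrum of the field, roughly as sketched below. This is a simplification under our own assumptions (an ideal lens pair treated as a forward and inverse Fourier transform, with a phase-only mask), not the exact optical configuration of [10].

```python
import torch
import torch.nn as nn

class FourierPlanePhaseLayer(nn.Module):
    """Learnable phase modulation applied at the Fourier plane of an idealized lens pair."""
    def __init__(self, n=200):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(n, n))

    def forward(self, field):
        spectrum = torch.fft.fftshift(torch.fft.fft2(field))      # first lens: field -> spectrum
        spectrum = spectrum * torch.exp(1j * self.phase)          # modulate in Fourier space
        return torch.fft.ifft2(torch.fft.ifftshift(spectrum))     # second lens: back to image plane
```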
In recent research, many approaches have been proposed to restructure the network [11, 12, 13, 14, 15, 16]. For example, the Res-D2NN framework [17], inspired by ResNet with its residual connections [18], aims to solve the problems of gradient vanishing and gradient explosion through learnable optical shortcut paths. For a shallow neural network, however, the gradient-vanishing issue is not that severe and the contribution of the optical shortcut paths is minor: in the classification task, the accuracy improvement with optical shortcut paths is only 0.5% for the 5-layer and 0.2% for the 10-layer networks, versus 1.1% and 2.4% for the 15-layer and 20-layer networks, respectively, so the number of layers must be increased to obtain a large improvement. As a result, hardware complexity and optical information loss grow because more than 20 layers are needed. In addition, a larger number of optical shortcuts imposes stricter requirements on the physical environment of the model, widening the gap between the actual and simulated performance. Therefore, to avoid the drawbacks of the residual part in actual deployment, the residual block can be designed as part of the electronic neural network [19, 20]. Zhou et al. proposed an in situ optical backpropagation training method [21, 22] to address the error between simulation and actual deployment: the phase of the neurons in each layer is modulated by cascaded phase-only SLMs, the output of each layer is directed through the optical path to the sensor of the corresponding layer, and the output amplitude and phase are calculated by the four-step phase-shift method. The disadvantage of this scheme is that actual deployment requires a large number of SLMs and sensors, which is expensive and can hardly be applied on a large scale in real life. Therefore, the optimal solution may be to let an electronic neural network handle the fine feature extraction, while low-cost diffractive plates fabricated by lithography serve as a light-speed, parallel pre-processing unit.
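As background on the read-out step mentioned above, four-step phase-shifting interferometry records four intensity frames $I_{0}$, $I_{\pi/2}$, $I_{\pi}$, $I_{3\pi/2}$ with the reference beam shifted by $0$, $\pi/2$, $\pi$, $3\pi/2$, and recovers the phase $\varphi$ and (up to a constant) the amplitude $A$ of the object field. In one standard sign convention (the exact form used in [21, 22] may differ),

$$\varphi = \arctan\frac{I_{3\pi/2}-I_{\pi/2}}{I_{0}-I_{\pi}}, \qquad A \propto \sqrt{\left(I_{3\pi/2}-I_{\pi/2}\right)^{2}+\left(I_{0}-I_{\pi}\right)^{2}}.$$

Because four intensity frames must be captured per layer, every layer needs its own sensor, which underlies the cost argument above.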
In this paper, we design hybrid networks in which an all-optical diffractive network is cascaded with a U-net, and demonstrate their application to gray-scale imaging as a lens and to speckle reconstruction using the MNIST and ImageNet datasets. We name them Hybrid Optical Diffractive Neural Networks (HODNNs). The U-net comes in two types, a convolutional block-based U-net and a residual block-based U-net, each pre-cascaded with a multilayer all-optical network. For the training process, we first train the all-optical neural network, normalize the phase information through a Tanh layer, and then feed the result into the U-net for further feature extraction, with Softmax as the final output layer. Compared with the previous D2NN and Fourier-domain D2NN (F-D2NN), HODNNs not only retain the low power consumption, light-speed processing, and high throughput of light but also gain strong feature-extraction ability from the convolutional block-based U-net. They show advantages both in speckle reconstruction and when the network is trained as a lens to enlarge or reduce the physical size of the image. The overall performance for speckle reconstruction and auto-encoding achieves lower complexity and higher precision than the D2NN: for the speckle reconstruction of handwritten digits, the NPCC reaches −0.999, the SSIM reaches 0.984, and the PSNR reaches 30.439 dB. Moreover, there is no need to perform a Fourier transform on the input image. In other words, HODNNs simplify the whole network model, making it more portable, with lower hardware complexity and higher processing efficiency.
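To make the pipeline concrete, the following is a minimal sketch of the hybrid arrangement described above, reusing the DiffractiveLayer sketch from earlier: a cascade of diffractive layers acts as the optical front-end, its output is normalized with Tanh, and a small electronic U-net performs the fine feature extraction. The number of optical layers, the tiny two-level U-net, and the way the optical output is read out are illustrative assumptions rather than the exact HODNN configuration; task-specific output heads (e.g., the Softmax layer mentioned above) are omitted.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Placeholder two-level U-net with convolutional blocks and one skip connection."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.mid = nn.Sequential(nn.MaxPool2d(2),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                 nn.Upsample(scale_factor=2, mode="nearest"))
        self.dec = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, x):
        e = self.enc(x)
        return self.dec(torch.cat([e, self.mid(e)], dim=1))   # skip connection

class HODNNSketch(nn.Module):
    """Hybrid pipeline: all-optical diffractive front-end -> Tanh -> electronic U-net."""
    def __init__(self, num_optical_layers=5, n=200):
        super().__init__()
        self.optical = nn.ModuleList(DiffractiveLayer(n=n) for _ in range(num_optical_layers))
        self.unet = TinyUNet()

    def forward(self, img):                         # img: (batch, 1, n, n) real-valued
        field = img.to(torch.complex64)             # encode the image as an optical field
        for layer in self.optical:
            field = layer(field)                    # light-speed, parallel pre-processing
        x = torch.tanh(torch.angle(field))          # Tanh-normalized phase information (assumed read-out)
        return self.unet(x)                         # fine feature extraction in electronics

# Example: process a 200x200 gray-scale input
out = HODNNSketch()(torch.rand(1, 1, 200, 200))     # -> (1, 1, 200, 200)
```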