Self-supervised dynamic learning for long-term high-fidelity image transmission through unstabilized diffusive media

doi:10.21203/rs.3.rs-2858332/v1

Download PDF

Article

Self-supervised dynamic learning for long-term high-fidelity image transmission through unstabilized diffusive media

https://doi.org/10.21203/rs.3.rs-2858332/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 19 Feb, 2024

Read the published version in Nature Communications →

Version 1

posted

You are reading this latest preprint version

Multimode fiber (MMF) which supports parallel transmission of spatially distributed information is a promising platform for remote imaging and capacity-enhanced optical communication. However, the variability of the scattering MMF channel poses a challenge for achieving long-term accurate transmission over long distances, of which static optical propagation modeling with calibrated transmission matrix or data-driven learning will inevitably degenerate. In this paper, we present a self-supervised dynamic learning approach that achieves long-term, high-fidelity transmission of arbitrary optical fields through unstabilized MMFs. Multiple networks carrying both long- and short-term memory of the propagation model variations are adaptively updated and ensembled to achieve robust image recovery. We demonstrate >99.9% accuracy in the transmission of 1024 spatial degree-of-freedom over 1km-length MMFs lasting over 1000 seconds. The long-term high-fidelity capability enables compressive encoded transfer of high-resolution video with orders of throughput enhancement, offering insights for artificial intelligence promoted diffusive spatial transmission in practical applications.

Physical sciences/Optics and photonics/Optical techniques/Imaging and sensing

Physical sciences/Optics and photonics/Applied optics/Fibre optics and optical communications

Propagation of optical field for long distance is the essential requirement in remote imaging and optical communication applications. Optical fibers, especially the single-mode fiber, have been elaborately deployed in internet connection, however, they only allow for the transmission of a single Gaussian fundamental beam. Multimode-fiber (MMF), on the other hand, permits hundreds to several thousand spatial modes to pass through. This makes them a promising platform for direct image transmission and mode-division multiplexing transmission to boost the channel capacity^[1-5], showing potential in applications such as compact endoscopy in bio-imaging^[6-11] high-rate optical communication^[12], and quantum key teleportation^[13-15].

However, the inevitable spatial mode dispersion in the MMFs induces a complex optical field mixing of the input image as it propagates through the fiber, resulting in intense scrambling of the output intensity distribution. Image restoration through diffuse media entails precise characterization of the physical system. Methods have been developed that measure the MMF’s forward process in the form of a complex transmission matrix (TM), and unscramble the output image via back-projection^[3-4,^16-17]. However, the calibration of TM with huge number of elements is time-consuming and complicated in implementation. Artificial intelligent approaches have been proposed to model the forward process^[2,19] or directly estimate the decoding process^[20-24] for image inference from the scrambled output. Notwithstanding the notable progress in modeling scattering process, it is foreseen that the instability of the MMF propagation induced by system drifting and environmental disturbance will accumulate and severely deviate from the characterized model after long time, especially for long-distance transmission. As a consequence, most reported literatures have experimented on short MMFs (<10 m)^[2-4,24] and were impeded from practical long-term high-fidelity image transmission. Therefore, developing a technique to support long-term accurate spatial information transmission over long-distance for remote video transfer remains a challenge.

One method to tackle the variation of scattering is to train a general network using large sets of data describing diverse conditions^[22-25]. However, this requires a long-duration training data collection and will hinder its flexible application in practice. Moreover, the generalized model often achieves poorer performance at a certain timepoint than using a ‘specific’ network trained on data acquired within small time durations. Another inspiring approach is to use the mixture-of-expert framework, in which multiple expert networks each responsible for one sub-problem are fused together to achieve better generalization^[25-28]. This architecture has been successfully applied for image denoising^[29-31] and phase retrieval^[32] to improve reconstruction accuracy and robustness to noise. However, the implementations have so far been confined to static contexts.

In this work, we develop a dynamic learning framework that can adaptively handle the time-varying optical propagation in long MMFs. The proposed network, termed multi-scale memory dynamic-learning network (MMDN), leverages the multi-expert framework to separately model the long- and short-term dynamics of the unstable MMF channel with multiple networks and combine the multi-scale memory by adaptive weighted ensemble. The network parameters are dynamically updated over time in a self-supervised manner, by learning from currently predicted pseudo-labels to synthesize the optimal inverse transmission model for subsequent image inference. MMDN achieves adaptive and accurate tracing of the variations on the optical propagation model in MMF, enabling parallel transmission of 1024 spatial degrees of freedom through 1km-length fibers with >99.9% accuracy for over 1000s-duration. The high-fidelity performance enables efficient transmission of high-resolution video with several orders throughput enhancement by using compressive encoding, showing the feasibility of long-term high-fidelity spatial transmission. The proposed dynamic memory framework opens up a new paradigm for demixing through unstabilized diffuse media.

Multi-scale memory dynamic-learning network

We experimented on an amplitude-to-amplitude optical setup (see Fig. 1 (a) and Supplementary Fig. 1). The input pattern coupled into the MMF is transformed into a complex distorted field at the distal end due to spatial mode dispersion. We calibrated and analyzed the variations of diverse 100m and 1km-length MMF transmission channels and observed slowly changing system drifting (see Supplementary Fig. 2). Deep learning approaches have been reported to solve this nonlinear transformation problem to retrieve the input image from output speckle, however, one generalized neural network will fail to precisely model the gradually varying MMF channel over a long time. In this work, we proposed a dynamic learning framework that adaptively tracks the optical propagation characteristics to achieve long-term image transmission with high accuracy through long MMFs. Instead of training a fixed and generalized neural network, we designed a compact network that generalizes well on short-term transmission. The MMDN model will be dynamically updated by online training on the unlabeled streaming data using predicted input images as labels, hence working in a self-supervised architecture. According to the calibration results in Supplementary Fig. 2, the MMF transmission channels could sustain nearly unchanged within a 10-second duration, which ensures the feasibility of the self-supervised dynamic updating strategy.

Besides slow-varying system drift, we also witnessed intense jitter effects that will cause severe MMF channel deviations (illustrated in Supplementary Fig. 3). This physical prior knowledge inspires us to develop an ensemble framework of subnetworks which carry multi-scale memory of the system variations for the MMDN block. As shown in Fig. 1 (c), the output speckle image is fed into three networks, where S1 and S2 indicate two transmission models with different short-term memories, and L3 indicates one model with long-term memory. A grading module was additionally designed for confidence evaluation of expert subnetworks, allowing adaptive memory fusing to produce an optimized prediction that has good adaptability to both slow drifting and abrupt jitter effect. For continuous data transmission, the multi-scale models are self-supervised updated at a short interval (i.e., 5 seconds). For S1 and S2 that aim to carry short-term information, we introduced the forgetting mechanism by alternatively wiping its memory of current model and rebuilding it with only the latest predicted data at specific intervals alternatively, as depicted in Fig. 1 (b); while the L3 will always keep the memory of previous transmission characterizations during the update process. The proposed MMDN is supposed to show better performance in long-term data recovery through scattering MMFs, attributed to the enriched perspectives provided by the adaptively updated multi-timescale models.

Long-term spatial transmission through unstabilized MMFs

We evaluated the performance of the proposed MMDN for long-term spatial transmission through diffuse media by experimenting on 100m- and 1km-length MMFs placed on the optical table without vibration isolation. To verify the ability to transmit arbitrary spatial information, we showed reconstruction of random binary patterns with spatial degree-of-freedom of 16×16 and 32×32, as opposed to the specified images such as the MNIST dataset. We additionally reshaped the common grid pixel assignment to accommodate the round-shape property of the MMF entry, and proved slightly enhanced performance (see Supplementary Fig. 5). The output speckle patterns were captured by a camera at 100 fps. At the preparation stage of data transmission, a sequence of known patterns is transmitted, and the MMDN is statically trained on the initial instances. Once finished, the MMDN is ready for dynamic update to support continuous image transmission.

In Fig. 2, we show the transmission results for over 1000 seconds on four different settings. The static neural network (StaticNN) trained on the first 500-second data and tested on the following 500-second is taken as the baseline method. The MMDN was dynamically updated at each 5-second interval. Comparing the reconstructed accuracy averaged on each 5-second batch, we observed an obvious degradation after 200s using StaticNN, especially for 1km fibers; while MMDN achieved ~ 100% accuracy throughout the 1000s period. Although the increase of fiber distance makes its scattering channel vary more intensely, the dynamic learning ability of proposed MMDN allows it to precisely model the dynamics and adapt to the current state of the physical process. We also illustrated representative recovered spatial profiles and the averaged spatial distribution of transmission accuracy. For 1km transmission, MMDN enables < 0.1% error rate across the spatial position and the average accuracy of 99.97%, which is two orders improved than using StaticNN. For 100m transmission cases, MMDN even reduce the averaged error rate to 2e-6 throughout the transmission period. As a result, we demonstrated that MMDN can permit long-term spatial information transmission with high fidelity.

Multi-scale memory ensemble improves instability robustness

The dynamic learning strategy for MMF dynamics modeling intuitively raises a critical demand for accurate inference on current batch, so as to provide reliable training sets for the subsequent update. However, two main factors exist that could hinder its naïve implementation from practical usage. First of all, the inevitable inference error for previous batch induces a bias to the model update, which will accumulate and largely deteriorate the network effectiveness after a long period. Besides, the abrupt disturbances happen frequently for unstabilized long MMFs, resulting in significant transmission characteristics distinguishment even between adjacent batches. MMDN incorporates the physical prior inspired multi-scale memory mixture architecture, providing a solution for precise and robust modeling of system dynamics.

We next investigated the benefits of long and short-term memory ensemble network design. We chose two highly variable MMF channels including a 100m-length gradient-indexed MMF of 200 ${\mu }$m diameter (GI-200) and a 1km-length step-indexed MMF of 200 ${\mu }$m diameter (SI-200) for spatial transmission of 16×16 and 32×32-pixel random binary patterns. We analyzed the contributions of different subnetworks in MMDN by evaluating their intermediate prediction accuracy. For GI-200 case, as shown in Fig. 3(a), we presented the average accuracy over a 1000s duration using S1, S2, L3 and the confidence-based ensemble network, and observed that the multi-scale model improved the accuracy by around 10-fold compared to the short-term models and the long-term model. Comparing the time-varying transmission error rate shown in Fig. 3(b), the long-term model shows degraded performance especially at the 40-55s period, while the short-term models provide ‘wiser’ predictions and contribute to producing a good ensemble model. Similar comparisons in averaged transmission accuracy and error rate at each timestep are also presented for the SI-200 case (see Fig. 3 (c)(d)). In this experiment, the long-term model works better than short-term models for most of the time, as the magnified time clip within 40-45s, and the multi-scale model outperforms all other single-scale models throughout the period. The above results verified that using a mixture of expert networks with different memory scales will enrich perspectives of the dynamic scattering channel by learning from the same data, and overcome diverse variations in MMFs to achieve spatial transmission with high accuracy over long periods.

Scalability and generalization of MMDN

Having validated long-term high-fidelity transmission of generalized patterns, we also noted that the MMDN is well applicable to transmit specific types of natural images. In Fig. 4 (a,b), we show a set of recovered hand-written digits and fashion symbols transmitted through a 1km-long fiber. The input images of 28×28 pixels are upsampled to be 32×32 and binarized by the greyscale threshold of 0.5. The initial MMDN model trained on 20,000 paired data is then dynamically updated to support continuous image transmission of over 1000s. To evaluate the quality of reconstructed images, we quantitively compared the SSIM and accuracy obtained using StaticNN and proposed MMDN. We observed nearly errorless transmission of both MNIST and fashion-MNIST datasets with MMDN during the whole period, and the SSIM is mostly above 0.99. Whereas, the static model will degrade in transmission accuracy and image similarity. This indicates the feasibility of image transmission in unstabilized scattering MMF using MMDN, laying the foundation for direct natural image transmission for remote imaging applications such as biological endoscopy.

One big concern of learning-based image reconstruction approach is whether the network is able to transfer to other unseen categories of images. To demonstrate the generalized performance of proposed network, we used the MMDN dynamically trained on Latin alphabet images for inference on the digit dataset (see Fig. 4(c)) and measured the transmission accuracy (see Fig. 4(d)). Examples of reconstructed input patterns right after the switch of image category as well as 10s and 150s after the switch are presented in Fig. 4(e), respectively. We observed an average accuracy of 99.44% and good visual fidelity with a high SSIM of 0.978 even without fine-tuning the network (i.e., T = 0). The accuracy was further elevated up to 99.68% and the SSIM up to 0.989 after 2 batches (i.e., T = 10s) attributed to the dynamic update ability of MMDN, and the high accuracy could be well retained during the following 150s. This experiment demonstrates that the proposed MMDN has good generalization performance for image transmission.

Long-term high-throughput video transmission

We further propose a practical high-throughput video transmission protocol via unstabilized long MMFs, as shown in Fig. 5 (a). High-resolution video streams are compressively encoded into bit streams and spatially arranged to form a sequence of spatial-multiplexing-coded patterns. The spatial-multiplexed MMF setup can be used to support up to thousands of independent communication channels transmitting in parallel, and the proposed MMDN addresses the encoded spatial pattern recovery from recorded speckle images with nearly 100% accuracy to guarantee high-quality video decoding.

We experimented on the 1km transmission of a 6-minute full-color video of 480$\times$480-pixel resolution and 20 fps. The video was encoded in the form of H.264 with a compression ratio of 1.39‰ and then converted into 32×32-pixel spatially multiplexed patterns. The proof-of-concept MMF setup with 150 fps acquisition rate guarantees adequate bandwidth for real-time transmission of encoded video. We evaluated the transmission accuracy of the spatial-multiplexed encoded signals in Fig. 5(b) and the visual fidelity of decoded video frames using SSIM metric in Fig. 5(c). When using the static network for image recovery, we observed that a slight signal accuracy deterioration of less than 2% starting at 100s would cause a significant drop in decoded video quality, leading to the SSIM of recovered video frames drastically dropping below 0.4. Noticeably, even when the transmission performance can be restored to good accuracy afterward, such as the 160-180s period as highlighted with the asterisk in Fig. 4 (b), the video decoding corruption still remains irreparable. By comparison, our proposed MMDN ensures high accuracy spatial transmission over long-term, hence can be incorporated into the highly-compressed encoded video transmission protocol to achieve high-quality image recovery with the SSIMs all above 0.995 within the 6-minute duration. Example recovered frames within 93-99s period are illustrated in Fig. 5(d) to visualize the decoding corruption caused by the slight transmission error using static network. The results validated the superiority in decoding accuracy using MMDN for encoded video transmission, which offered over 700-fold efficiency enhancement compared to the plain coded video transmission scheme (see Supplementary Fig. 6).

Spatial reconstruction from diffusive MMF transmission is a challenging task, and has so far been limited to short-distance fibers and faced the tradeoff between high accuracy and long-time generalization. In the work, we described the MMDN, a multi-scale dynamic learning approach that can achieve >99.9% accuracy of long-time image transmission through unstabilized 1km MMF channels. The MMDN technique is developed based on the physically informed model of the diffusive MMF system that fully describes the long-term drifting and short-term abruption effects, therefore enabling good accommodation with the dynamics of the system characteristics. We compared MMDN with previous approaches in terms of time duration, reconstruction quality, and transmission distance, etc., and demonstrated a comprehensive improvement of our approach (see Supplementary Fig. 7). In particular, the high-fidelity long-term transmission performance of MMDN will allow for efficient video transmission in a highly compressed scheme. The superior decoding capability will also facilitate practical applications such as minimally invasive endoscopic imaging, spatial-multiplexed optical communication, and optical security taking diffusive channel as the decoding medium.

Experimental set-up

The optical setup for image transmission through the MMFs is described in Supplementary Fig. 1. The continuous-wave laser beam at wavelength 561nm is expanded by a pair of lenses (${f}_{1}$ = 10 mm, ${f}_{2}$ = 100mm) and projected onto the SLM (V-7001) for binary amplitude modulation. The spatial modulated optical field is coupled by an objective lens (Nikon, 20X, 0.25NA) into the input facet of the MMF. At the output of the MMF, the speckle field at the distal end is magnified by another objective (Nikon, 20X, 0.25NA) and imaged by the CMOS (MER2-230-167U3M).

Stability analysis of MMF channel

We adopted three metrics to evaluate the transmission channel stability over a large variety of MMFs, as summarized in Supplementary Fig. 2. The first metric measures the SSIM of a sequence of output speckle patterns to the reference pattern captured at $T=0$ while transmission a fixed image. The SSIM calculation can be expressed as

$$\begin{array}{c}SSIM=\frac{\left(2*{\mu }_{x}*{\mu }_{y}+{C}_{1}\right)*\left(2*{\sigma }_{xy}+{C}_{2}\right)}{\left({\mu }_{x}^{2}+{\mu }_{y}^{2}+{C}_{1}\right)*\left({\sigma }_{x}^{2}+{\sigma }_{y}^{2}+{C}_{2}\right)}. \#\left(1.\right)\end{array}$$

Here, $x$ and $y$ are the two images being compared, ${\mu }_{x}$, ${\mu }_{y}$ and ${\sigma }_{x}^{2}$, ${\sigma }_{y}^{2}$ are the mean values and variations of images $x$ and $y$, and ${\sigma }_{xy}$ is the covariance between two images. The ${C}_{1}$ and ${C}_{2}$ are two constants used to prevent the denominator from becoming zero or the result becoming infinite.

The second metric measures the Helinger distance of the speckle energy distribution of two batches. The speckle energy distribution of each batch is obtained by summing up all speckle images in that batch, and one batch consists of 500 output speckle images captured within 5 seconds. For two discrete energy distributions $P=({p}_{1},{p}_{2}\dots ,{p}_{n})$ and $Q=({q}_{1},{q}_{2}\dots ,{q}_{n})$, $n$ is the total number of pixels in a single speckle image, the Hellinger distance can be calculated as

$$\begin{array}{c}{H}^{2}\left(P,Q\right)=\frac{1}{2}\sum _{i=0}^{n}{\left(\sqrt{{q}_{i}}-\sqrt{{p}_{i}}\right)}^{2}.\#\left(2.\right)\end{array}$$

The third metric computes the pixel-wise accuracy of recovered image at each time point, by using an approximate reconstruction algorithm termed real-value inverse transmission matrix (RVITM)^[3]. The test image is transmitted right after the approximal transmission matrix is calibrated. The accuracy for each recovered image is the percentage of mis-distinguished pixels to the full number of pixels.

Data preparation

Our research employed arbitrary random binary images, as well as natural scene images including the Fashion-MNIST dataset, Latin letters from the E-MNIST dataset, and handwritten digitals from the MNIST dataset. The input images tested in the experiments are of 16×16-pixel and 32×32-pixel resolution. According to the maximum number of spatial modes that is allowed to transmit through the MMF, it is theoretically possible to input images of larger pixel resolution. We used a binary SLM as the amplitude modulator to encode the spatial information, and the output speckles are captured by the monochromatic CMOS. To reduce computational complexity, we down-sampled the speckle images to 100×100-pixel and 150×150-pixel for reconstructing input patterns of 16×16-pixel and 32×32-pixel, respectively.

Neural network architecture

We designed an ensemble network consisting of three sub-networks with the same structure but updated differently over time. Each subnetwork is a convolutional neural network consisting of three convolutional layers and one fully-connected layer. Each convolution layer is followed by Dropout, Batch normalization, and ReLu activation. A sigmoid activation function is applied to the output layer. We employed cross-entropy as the loss function, and trained the networks with the AdaDelta optimizer at a learning rate of 0.1. The network training was executed on a workstation equipped with NVidia RTX3090 GPU.

Confidence-based ensemble algorithm

Each subnetwork of the ensemble network calculates the average confidence level of one predicted instance as:

$$\begin{array}{c}c=\frac{1}{N}\sum _{i=1}^{N}\left(\left|{P}_{i} -0.5\right|+0.5\right) ,\#\left(3.\right)\end{array}$$

where ${P}_{i}$ represents the predicted value of pixel $i$ from the subnetwork, which is binarized to be either 0 or 1, and $N$ is the spatial resolution of the input patterns. The confidence-based weight of each subnetwork is defined as:

$$\begin{array}{c}{w}_{k}=\text{exp}\left(\frac{1-{c}_{k}}{\left(1-{c}_{1}\right)+\left(1-{c}_{2}\right)+\left(1-{c}_{3}\right)}\right).\#\left(4.\right)\end{array}$$

The final ensemble prediction is the weighted summation of predictions from three subnetworks:

$$\begin{array}{c}P=\frac{{w}_{1}*{p}_{1}+{w}_{2}*{p}_{2}+{w}_{3}*{p}_{3}}{{w}_{1}+{w}_{2}+{w}_{3}}. \#\left(5.\right)\end{array}$$

Acknowledgement

This work was sponsored by the National Natural Science Foundation of China (62231018), and Shanghai Science and Technology funding (2021SHZDZX0103).

Author Contribution

Z.L. conceived of the project; W.Z. developed and implemented the reconstruction algorithm; Z.L., S.Z. and W.Z. build the optical setup; W.Z. and Z.L. designed the experiments and analyzed the experimental data; W.Z. and S.Z. collected the data; N.C. and Q.D. provided mentoring support; Z.L. and W.Z. wrote the manuscript with input from all authors.

Competing interests

The authors declare no competing interests.

Data availability

All data are available from the corresponding author upon reasonable request.

Code availability

The code is available from the corresponding author upon reasonable request.

Richardson, David J., John M. Fini, and Lynn E. Nelson. "Space-division multiplexing in optical fibres." Nature photonics 7.5 (2013): 354–362.
Caramazza, Piergiorgio, et al. "Transmission of natural scene images through a multimode fibre." Nature communications 10.1 (2019): 2029.
Zhao, Tianrui, et al. "Seeing through multimode fibers with real-valued intensity transmission matrices." Optics Express 28.14 (2020): 20978–20991.
Fan, Weiru, et al. “High-Fidelity Image Reconstruction through Multimode Fiber via Polarization‐Enhanced Parametric Speckle Imaging.” Laser & Photonics Reviews 15.5 (2021): 2000376.
Li, Shuhui, et al. "Compressively sampling the optical transmission matrix of a multimode fibre." Light: science & applications 10.1 (2021): 88.
Bianchi, Silvio, and Roberto Di Leonardo. "A multi-mode fiber probe for holographic micromanipulation and microscopy." Lab on a Chip 12.3 (2012): 635–639.
Turtaev, Sergey, et al. "High-fidelity multimode fibre-based endoscopy for deep brain in vivo imaging." Light: Science & Applications 7.1 (2018): 92.
Choi, Youngwoon, et al. "Scanner-free and wide-field endoscopic imaging by using a single multimode optical fiber." Physical review letters 109.20 (2012): 203901.
Plöschner, Martin, Tomáš Tyc, and Tomáš Čižmár. "Seeing through chaos in multimode fibres." Nature Photonics 9.8 (2015): 529–535.
Turtaev, Sergey, et al. "High-fidelity multimode fibre-based endoscopy for deep brain in vivo imaging." Light: Science & Applications 7.1 (2018): 92.
Vasquez-Lopez, Sebastian A., et al. "Subcellular spatial resolution achieved for deep-brain imaging in vivo using a minimally invasive multimode fiber." Light: science & applications 7.1 (2018): 110.
Richardson, David J., John M. Fini, and Lynn E. Nelson. "Space-division multiplexing in optical fibres." Nature photonics 7.5 (2013): 354–362.
Li, Wei, et al. "High-rate quantum key distribution exceeding 110 Mb s–1." Nature Photonics (2023): 1–6.
Ding, Yunhong, et al. "High-dimensional quantum key distribution based on multicore fiber using silicon photonic integrated circuits." npj Quantum Information 3.1 (2017): 25.
Zhou, Yiyu, et al. "High-fidelity spatial mode transmission through a 1-km-long multimode fiber via vectorial time reversal." Nature communications 12.1 (2021): 1866.
Popoff, S. M., et al. "Controlling light through optical disordered media: transmission matrix approach." New Journal of Physics 13.12 (2011): 123021.
Čižmár, Tomáš, and Kishan Dholakia. "Exploiting multimode waveguides for pure fibre-based imaging." Nature communications 3.1 (2012): 1027.
Tahir, Waleed, Hao Wang, and Lei Tian. "Adaptive 3D descattering with a dynamic synthesis network." Light: Science & Applications 11.1 (2022): 42.
Rahmani, Babak, et al. "Actor neural networks for the robust control of partially measured nonlinear systems showcased for image propagation through diffuse media." Nature Machine Intelligence 2.7 (2020): 403–410.
Fan, Pengfei, et al. "Learning enabled continuous transmission of spatially distributed information through multimode fibers." Laser & Photonics Reviews 15.4 (2021): 2000348.
Borhani, Navid, et al. "Learning to see through multimode fibers." Optica 5.8 (2018): 960–966.
Resisi, Shachar, Sebastien M. Popoff, and Yaron Bromberg. "Image transmission through a dynamically perturbed multimode fiber by deep learning." Laser & Photonics Reviews 15.10 (2021): 2000553.
Li, Yunzhe, Yujia Xue, and Lei Tian. "Deep speckle correlation: a deep learning approach toward scalable imaging through scattering media." Optica 5.10 (2018): 1181–1190.
Rahmani, Babak, et al. "Multimode optical fiber transmission with a deep learning network." Light: science & applications 7.1 (2018): 69.
Tahir, Waleed, Hao Wang, and Lei Tian. "Adaptive 3D descattering with a dynamic synthesis network." Light: Science & Applications 11.1 (2022): 42.
Gupta, Harshit, et al. "CNN-based projected gradient descent for consistent CT image reconstruction." IEEE transactions on medical imaging 37.6 (2018): 1440–1453.
Li, Fengqiang, et al. “Compressive ghost imaging through scattering media with deep learning.” Optics Express 28.12 (2020): 17395–17408.
Sun, Yu, Zhihao Xia, and Ulugbek S. Kamilov. "Efficient and accurate inversion of multiple scattering with deep learning." Optics express 26.11 (2018): 14678–14688.
Weigert, Martin, et al. "Content-aware image restoration: pushing the limits of fluorescence microscopy." Nature methods 15.12 (2018): 1090–1097.
Agostinelli, Forest, Michael R. Anderson, and Honglak Lee. "Adaptive multi-column deep neural networks with application to robust image denoising." Advances in neural information processing systems 26 (2013).
Choi, Joon Hee, Omar A. Elgendy, and Stanley H. Chan. "Optimal combination of image denoisers." IEEE Transactions on Image Processing 28.8 (2019): 4016–4031.
Deng, Mo, et al. "Learning to synthesize: robust phase retrieval at low photon counts." Light: Science & Applications 9.1 (2020): 36.

There is NO Competing Interest.

SupplementaryMovie1.mp4
Demo of continuous transmission of generalized patterns via unstabilized long MMFs
SupplementaryMovie2.mp4
Demo of continuous transmission of hand-written letters and fashion images via 1km-length MMF
SupplementaryMovie3.mp4
Demo of high-throughput encoded video transmission via unstabilized 1km-length MMF
MMDNSupplementary.docx

Download PDF

Journal Publication

published 19 Feb, 2024

Read the published version in Nature Communications →

Version 1

posted

You are reading this latest preprint version

Self-supervised dynamic learning for long-term high-fidelity image transmission through unstabilized diffusive media

Status:

Journal Publication

Version 1

Abstract

Figures

Introduction

Results

Multi-scale memory dynamic-learning network

Long-term spatial transmission through unstabilized MMFs

Multi-scale memory ensemble improves instability robustness

Scalability and generalization of MMDN

Long-term high-throughput video transmission

Discussion

Methods

Experimental set-up

Stability analysis of MMF channel

Data preparation

Neural network architecture

Confidence-based ensemble algorithm

Declarations

References

Additional Declarations

Supplementary Files

Status:

Journal Publication

Version 1