4.1 Imaging setup
All experimental data used in this work were captured on our MFLI system; detailed information can be found in [25]. Briefly, the system uses a large-format intensified charge-coupled device (ICCD) camera (Picostar HR, LaVision GmbH, Germany) for wide-field detection over an 8 × 6 cm² field of view, in combination with a digital micro-mirror device (DMD; DLi 4110, Texas Instruments, TX, USA) for wide-field illumination. As the excitation source, we used a tunable Ti:Sapphire laser (Mai Tai HP, Spectra-Physics, CA, USA), which delivers 100 fs pulses at 80 MHz. A gate width of 300 ps and a gate delay of 40 ps were used to capture time-resolved fluorescence decays (for both in vivo and in vitro experiments), with a total of 176 time points, referred to as the number of gates (G = 176). An emission filter centered at 740 nm (FF01-740/13-25, Semrock, Rochester, NY, USA) was used to capture the TPSFs, with the laser set to a 700 nm wavelength, and Alexa Fluor 700 dye was used to obtain the time-resolved fluorescence signals.
4.2 Generation of training data and classical fluorescence lifetime processing
Fluorescence lifetime decays follow exponential kinetics. Depending on the number of fluorophore components present in the sample, the decay can be described by a sum of exponential functions. Most FLI experiments involve up to two components; hence, a bi-exponential model is typically used. The two-component, or bi-exponential, model also includes the mono-exponential cases (where the fractional amplitude AR is one or zero). Mathematically, the TPSF is the convolution of the IRF with the fluorescence decay associated with the lifetime parameters, as shown in Eq. 2, where the lifetimes are denoted τ1 and τ2, and AR is the amplitude fraction.
$$TPSF(t) = IRF(t) * \left(A_R\, e^{-t/\tau_1} + (1 - A_R)\, e^{-t/\tau_2}\right) \quad (2)$$
The in-silico data used for training and validating the proposed model were generated using Eq. 2. Initially, time-resolved fluorescence lifetime images with dimensions of 28×28 pixels were generated using the MNIST dataset. Fluorescence decays were generated for a range of lifetime values commonly encountered in NIR applications: 0.2 ns to 0.8 ns for τ1 (short-lifetime component) and 0.8 ns to 1.5 ns for τ2 (long-lifetime component). The range of AR (fraction amplitude) was set from 0 to 100%, with both bounds corresponding to mono-exponential cases. To ensure that the simulated data accurately represent experimental conditions, pixel-wise IRFs were used. To capture experimental IRFs, a sheet of white diffusing paper was placed on the imaging table and illuminated using the DMD at an excitation wavelength of 700 nm; the reflected light was captured through a neutral density (ND) filter. Each TPSF was then generated by convolving a randomly selected IRF from this dataset with a simulated fluorescence decay profile. To approximate the noise characteristics of real-world measurements, system-derived noise (including read-out noise, dark noise, etc., as explained in [25]) was incorporated into the simulated TPSFs, ensuring that the simulated data closely match the noise dynamics observed in the actual system.
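The simulation pipeline above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the Gaussian IRF shape, its timing parameters, and the simple shot-noise model are assumptions for demonstration, while the 40 ps gate spacing and 176 gates follow the acquisition settings described in Sect. 4.1.

```python
import math
import random

def simulate_tpsf(tau1=0.5, tau2=1.2, a_r=0.6, n_gates=176, dt=0.04,
                  irf_center=1.0, irf_sigma=0.1, peak_counts=1000.0, seed=0):
    """Sketch of Eq. 2: convolve an (assumed Gaussian) IRF with a
    bi-exponential decay, then add approximate shot noise.
    Times are in ns; dt = 0.04 ns matches the 40 ps gate delay."""
    rng = random.Random(seed)
    t = [i * dt for i in range(n_gates)]
    irf = [math.exp(-0.5 * ((ti - irf_center) / irf_sigma) ** 2) for ti in t]
    decay = [a_r * math.exp(-ti / tau1) + (1 - a_r) * math.exp(-ti / tau2)
             for ti in t]
    # Discrete causal convolution, truncated to the gate window
    tpsf = [sum(irf[j] * decay[i - j] for j in range(i + 1))
            for i in range(n_gates)]
    peak = max(tpsf)
    tpsf = [v / peak * peak_counts for v in tpsf]
    # Shot-noise approximation: Gaussian with variance ~ counts
    noisy = [max(0.0, v + rng.gauss(0.0, math.sqrt(max(v, 1.0))))
             for v in tpsf]
    return t, noisy
```

In the actual training set, the IRF is not synthetic but drawn pixel-wise from the measured IRF dataset, and the noise model follows the system characterization in [25].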
To evaluate and compare the model’s performance in the absence of experimental ground truth, we used the NLSF method, which is commonly employed to estimate the FLI parameters described in Eq. 2. Traditionally, FLI parameters are estimated from experimental data through iterative fitting methods such as NLSF with the Levenberg-Marquardt algorithm [26], or through center-of-mass (CMM) analysis [27]. For our NLSF analysis, we used the AlliGator software [28], which allows adjustments and constraints on fitting parameters, including short and long lifetimes, fraction amplitudes, and offsets, depending on experimental conditions. We selected between single- and double-exponential decay models according to the complexity of our datasets. AlliGator also provides an option for offset correction when there is a mismatch between the TPSF and IRF; we evaluated the importance of this correction in our NLSF analysis by comparing results with and without it in our phantom experiments. Additionally, we benchmarked our results against those obtained using FLI-Net to provide a comprehensive evaluation of our approach in the context of established methodologies.
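Of the classical estimators mentioned above, CMM is the simplest: for an ideal background-free mono-exponential decay, the mean photon arrival time equals the lifetime. A minimal sketch (illustrative only; for a measured TPSF, the IRF centroid must additionally be subtracted, and truncation of the decay window biases the estimate slightly low):

```python
import math

def cmm_lifetime(t, counts):
    """First-moment (center-of-mass) lifetime estimate of a decay."""
    total = sum(counts)
    return sum(ti * ci for ti, ci in zip(t, counts)) / total

# Mono-exponential decay with tau = 1.0 ns, sampled at 40 ps over 176 gates
dt, tau = 0.04, 1.0
t = [i * dt for i in range(176)]
counts = [math.exp(-ti / tau) for ti in t]
est = cmm_lifetime(t, counts)  # close to, but slightly below, 1.0 ns
```

The small negative bias comes from discretization and from truncating the decay at about seven lifetimes, which is why iterative fitting (NLSF) remains the reference method for bi-exponential data.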
4.3 Deep learning network architecture
MFliNet was designed to process data using a transformer encoder-decoder architecture following Vaswani et al. [23], as illustrated in Fig. 3. The model’s design includes two inputs (IRF and TPSF) and three outputs: the short lifetime, long lifetime, and fraction amplitude. The architecture of MFliNet allows the network to focus on different parts of the input sequences, thanks to the Transformer’s attention mechanism.
In MFliNet, the attention mechanism is implemented using multi-head self-attention layers. Each layer has 16 attention heads, allowing the model to process various aspects of the input data simultaneously. In the self-attention mechanism, three vectors, Query (Q), Key (K), and Value (V), are generated from the input data. These vectors are created through linear transformations, which means that the same input tensor is multiplied by different learned weight matrices to produce the Q, K, and V vectors:
$$Q = XW_Q, \quad K = XW_K, \quad V = XW_V \quad (3)$$
where X is the input sequence, and WQ, WK, and WV are the learned weight matrices for the query, key, and value projections, respectively. Thus, although Q, K, and V originate from the same data, they are transformed into different representations that the model uses to compute attention. The self-attention process begins with the dot product of the query vectors with the key vectors, resulting in a score matrix:

$$S = \frac{QK^{T}}{\sqrt{d_k}} \quad (4)$$
where dk is the dimension of the key vectors. This score matrix indicates the relevance of each time step in the sequence relative to the current time step. The scores are scaled by the square root of dk to maintain stable gradients. After scaling, the scores are passed through a softmax function to produce attention weights, which are used to compute a weighted sum of the value vectors:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (5)$$
These weights determine where the model focuses within the input, producing a context vector that highlights key segments of the sequence. This approach enables the model to capture dependencies by recognizing relationships between elements in the sequence: the self-attention mechanism compares each part of the sequence with every other part, allowing the model to identify and leverage patterns spanning the entire sequence that traditional models might overlook. The context vectors from all attention heads are then concatenated and linearly transformed to produce the final output of the multi-head self-attention layer:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \quad (6)$$
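Eqs. 3–6 can be made concrete with a small, dependency-free sketch of scaled dot-product attention (illustrative only: MFliNet’s layers operate on learned projections of the IRF and TPSF sequences, and the per-head outputs are subsequently concatenated and projected as in Eq. 6):

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    """(n x k) @ (k x m) matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def attention(Q, K, V):
    """Scaled dot-product attention (Eq. 5): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    KT = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, KT)]
    weights = [softmax(row) for row in scores]  # rows sum to 1
    return matmul(weights, V)
```

When a query aligns strongly with one key, the softmax weight for that position approaches 1 and the output reduces to the corresponding value vector, which is exactly the "focus" behavior described above.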
After the tuning process, two identical stacks of transformer encoder layers yielded the best results for our application. Each encoder layer contains a multi-head self-attention mechanism followed by a fully connected feed-forward network (FFN). The FFN, composed of linear transformations and activation functions, further processes the attention outputs, enhancing the model’s capacity to learn complex patterns:
$$\mathrm{FFN}(x) = \mathrm{ReLU}(xW_1 + b_1)\,W_2 + b_2 \quad (7)$$
Residual connections and layer normalization are incorporated into each encoder layer to stabilize training and improve gradient flow. Layer normalization standardizes the outputs, reducing training time and enhancing model performance. In each encoder layer and in the embedding layers, the outputs are kept at the same dimension as the inputs. This consistency, as stated in [23], is crucial for the effective use of residual connections and layer normalization, both essential for the network’s stability and performance. Furthermore, to enhance the model’s ability to discern global and complex relationships across the input sequence, self-attention layers are placed after the encoder layers for both inputs. This structure allows the model to focus on variations in the IRF offset and encode the relationship between these variations and the FLI parameters.
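The add-and-norm operation used in each encoder layer, LayerNorm(x + Sublayer(x)), can be sketched per feature vector as follows (a minimal illustration with an assumed epsilon, not the framework implementation, which also learns scale and shift parameters):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and (near-)unit variance."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def add_and_norm(x, sublayer_out):
    """Residual connection followed by layer normalization:
    LayerNorm(x + Sublayer(x)). The skip path lets gradients bypass
    the sublayer, which is what stabilizes deep stacks."""
    return layer_norm([a + b for a, b in zip(x, sublayer_out)])
```

Because the residual addition requires matching shapes, the encoder keeps every layer’s output at the input dimension, as noted above.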
The outputs from the two paths are concatenated and passed through a convolutional neural network (CNN) residual block to integrate and refine the extracted features. While the self-attention and encoder layers are adept at capturing long-range dependencies within the data, CNNs are better suited to capturing local patterns and short-term temporal dependencies [29, 30]. The CNN residual block is introduced after the attention layers to address this complementary need; it is used to model fine-scale variations in the time-domain data, such as subtle fluctuations in the captured signals. The block contains multiple convolutional layers, each followed by batch normalization, which normalizes the layer outputs, stabilizes the learning process, and improves convergence speed. Skip connections help address the vanishing-gradient problem, allowing gradients to flow more easily through the network and further improving training stability.
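A 1D convolutional residual block of this kind can be sketched as follows (batch normalization is omitted for brevity, and the kernel values are placeholders; in the actual network the filters are learned during training):

```python
def conv1d(x, kernel, bias=0.0):
    """Same-padded 1D convolution over a signal, with an odd-length kernel."""
    k = len(kernel)
    pad = k // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j] for j in range(k)) + bias
            for i in range(len(x))]

def residual_conv_block(x, kernel):
    """Conv -> ReLU -> add skip connection, capturing local temporal
    patterns while the identity path preserves gradient flow."""
    y = [max(0.0, v) for v in conv1d(x, kernel)]
    return [a + b for a, b in zip(x, y)]
```

With the identity kernel [0, 1, 0] and a non-negative input, the block simply doubles the signal, which makes the skip path easy to verify.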
The network then splits into three branches to compute the short lifetime, long lifetime, and fraction amplitude, allowing each output to be analyzed in a focused manner. Each branch passes through additional self-attention layers with layer normalization and transformer decoder layers. By doing so, the model can better filter out noise in the inputs, improving the extraction and prediction of the lifetime parameters. The transformer decoder in MFliNet incorporates both self-attention and cross-attention layers. While self-attention captures in-sequence importance, the cross-attention layer adjusts the weights to align the processed outputs with the decoder’s current state, effectively comparing different sequences and mapping them to the correct FLI parameters. This helps transform and integrate the previously learned characteristics of the input signals; the attention mechanism here follows the same formula as Eq. 5, with the weights adapted accordingly. The final layers in each pathway are CNN layers with a 1×1 kernel and L2 regularization, which produce the final output parameters.
The model was trained on an NVIDIA RTX 3090 GPU. The training data consisted of 2000 samples (2000 28×28 images, yielding 1,568,000 generated TPSFs and corresponding IRFs), of which 10% were used for validation. After hyperparameter tuning, the RMSProp optimizer was chosen, following the previous study, with an adaptive learning rate starting at 0.001. Mean squared error (MSE) was used as the loss function for each output branch. This architecture and training process ensured that MFliNet could accurately model and predict the desired outputs from the FLI data.
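For reference, RMSProp adapts the step size per weight by maintaining a running average of squared gradients and dividing each step by its root. A single-update sketch with the stated initial learning rate of 0.001 (the decay factor rho = 0.9 is an assumed default, not a value reported in the text):

```python
import math

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSProp update for a flat list of weights.
    cache holds the running average of squared gradients."""
    new_cache = [rho * c + (1 - rho) * g * g for c, g in zip(cache, grad)]
    new_w = [wi - lr * g / (math.sqrt(c) + eps)
             for wi, g, c in zip(w, grad, new_cache)]
    return new_w, new_cache

def mse(pred, target):
    """Mean squared error, the per-branch training loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

Dividing by the running root-mean-square gradient gives larger steps for weights with consistently small gradients and smaller steps for noisy ones, which is the "adaptive learning rate" behavior mentioned above.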
4.4 Phantom preparation and the in vivo experiment
For experimental validation, we designed a step-ladder phantom to introduce variations in sample-detector distance, as depicted in Fig. 4(a). A 3D-printable case was designed to accommodate five discrete containers arranged at various heights: ground level, 5 mm, 10 mm, 15 mm, and 20 mm. Each container measured 40×40×10 mm to accommodate tissue-mimicking phantoms. The phantoms were made with agar constituting 1% of the total volume (80 cm³). To prepare the phantoms, agar was first dissolved in distilled water, heated until fully dissolved, and then allowed to cool slightly before further processing. The optical properties of the phantoms (absorption coefficient µa of 0.005 mm⁻¹ and reduced scattering coefficient of 1 mm⁻¹) were controlled through the addition of India ink and intralipid solutions, providing absorption and scattering contrast, respectively. In each ladder step, a specific area was designated for the placement of a cuboidal fluorescence embedding with dimensions of 5×5×40 mm. This embedding consisted of Alexa Fluor 700 dye dissolved in phosphate-buffered saline at a concentration of 20 µM. The embeddings were placed at a depth of 1 mm from the surface of each phantom.
For in vivo MFLI imaging experiments, we imaged HER2+ breast tumor xenografts (HCC1954) in athymic nude mice. The cell line was sourced from ATCC (Manassas, VA, USA) and maintained in RPMI 1640 media enriched with 10% fetal bovine serum (ATCC) and 50 units/mL/50 µg/mL penicillin/streptomycin from ThermoFisher Scientific (Waltham, MA, USA). We initiated tumor xenografts by subcutaneously injecting 5 × 10⁶ HCC1954 cells, suspended in PBS and mixed in a 1:1 ratio with Cultrex BME (R&D Systems Inc, Minneapolis, MN, USA), into the inguinal mammary fat pads of 4-week-old female athymic nude mice (CrTac: NCR-Foxn1nu, Taconic Biosciences, Rensselaer, NY, USA). Tumors were monitored daily for 4 weeks. The mouse was administered retro-orbital injections of AF700 conjugated with MDT-TZM (MDT-TZM-AF700) at 20 µg and AF750 conjugated with MDT-TZM (MDT-TZM-AF750) at 40 µg, a 2:1 acceptor-to-donor ratio, delivered as staggered injections: the donor was injected 2 hours before the acceptor, both via the retro-orbital route. MFLI imaging was conducted 24 hours post-injection using the MFLI imaging setup. Throughout the imaging process, the mouse was anesthetized with isoflurane, and its body temperature was maintained with a Rodent Warmer X2 (Stoelting, IL, USA). All animal procedures were conducted with the approval of the Institutional Animal Care and Use Committee (IACUC) at both Rensselaer Polytechnic Institute and Albany Medical College. The animal facilities of both institutions are accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC) International.
To examine the offset variation across different heights on the phantom and in live intact animals, we plotted the pixel-wise IRFs for comparison, as shown in Fig. 4. For each specified height of the step-ladder phantom, we plotted the average IRF in Fig. 4(a). We also illustrate the IRFs of various anatomical regions in live intact animals, including the liver, urinary bladder (UB), and tumors, in Fig. 4(b). Lastly, in Fig. 4(c), we examined the variability of the IRF within a single tumor. This plot contains IRFs at three points within the tumor: the top (an IRF from the upper region of the tumor), the middle (an IRF from the central area in terms of height), and the bottom (an IRF from the lowest part of the tumor).