Principle of the data acquisition system
This section describes how the reflection spectrum of a target sample is measured under sunlight; a simplified schematic is shown in Fig. 1. When light illuminates the target sample, the measurement is described by the following transformation:
$$\int L\left(\lambda \right)S\left(\lambda ,X,Y\right)T\left(\lambda \right)D\left(\lambda \right)d\lambda =I\left(X,Y\right)$$
1
which converts the scene into encoded input data that can be fed into the algorithm model. In this formula, \(L\left(\lambda \right)\) represents the daylight spectral information, \(S\left(\lambda ,X,Y\right)\) is the detection target's reflectance spectrum, where \(\left(X,Y\right)\) are the two-dimensional position coordinates of the single-point spectrum, \(D\left(\lambda \right)\) is the detection camera's response function, \(T\left(\lambda \right)\) is the designed encoding matrix, and \(I\left(X,Y\right)\) is the encoded intensity at the corresponding position. In this conversion process, the designed encoding matrix is first placed in front of the camera, which measures the spectral information of the light reflected by the sample \(S\left(\lambda ,X,Y\right)\); the spectral intensity entering the camera is therefore \(L\left(\lambda \right)S\left(\lambda ,X,Y\right)\). The encoded intensity \(I\left(X,Y\right)\) is then obtained through the encoding matrix \(T\left(\lambda \right)\) and the camera's response function \(D\left(\lambda \right)\). The designed encoding matrix can be fabricated as a chip that fits directly onto the camera, making spectral measurement more convenient.
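As an illustration, a minimal numerical sketch of Eq. (1) for a single pixel is given below. The wavelength grid (8–12 um, 0.02 um steps) follows the paper; the spectra used here are placeholders, not the actual measured quantities.

```python
import numpy as np

# Discretized evaluation of Eq. (1) at one pixel (X, Y).
wavelengths = np.arange(8.0, 12.0, 0.02)   # 200 wavelength samples (um)
L = np.ones_like(wavelengths)              # daylight spectrum L(lambda), placeholder
S = np.random.rand(wavelengths.size)       # sample reflectance S(lambda, X, Y), placeholder
T = np.random.rand(wavelengths.size)       # one encoding (filter) function T(lambda)
D = np.ones_like(wavelengths)              # camera response D(lambda), placeholder

# Encoded intensity I(X, Y): numerical integral of the product over wavelength.
I_xy = np.trapz(L * S * T * D, wavelengths)
```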
Filter function design and construction of data set
The data construction in this study is divided into two parts. To build an accurate spectral reconstruction model, suitable input and output data need to be collected. The first part of the input data is a complete library of filter functions simulated by FDTD; an expectation value of 0.85 was set as the threshold condition for an initial screening, after which the cross-correlation function was applied:
$$\rho \left(X,Y\right)=\frac{Cov(X,Y)}{\sqrt{D\left(X\right)D\left(Y\right)}}=\frac{E\left(XY\right)-E\left(X\right)E\left(Y\right)}{\sqrt{E\left({X}^{2}\right)-{E}^{2}\left(X\right)}\sqrt{E\left({Y}^{2}\right)-{E}^{2}\left(Y\right)}}$$
2
\(X,Y\) denote different curves, \(Cov(X,Y)\) is the covariance of the \(X,Y\) curves, and the result \(\rho\) is a scalar in the range \(\rho \in \left[-1,1\right]\). When \(\rho >0\), the curves are positively correlated, and the closer \(\rho\) is to 1, the stronger the correlation; when \(\rho <0\), the curves are negatively correlated, and the closer \(\rho\) is to -1, the stronger the correlation; the closer the correlation coefficient is to 0, the weaker the correlation, and it equals 0 when the two variables are uncorrelated [33]. Using this formula, the correlation between each pair of the initially screened filter functions was calculated. Finally, the 100 filter functions whose correlation coefficients have absolute values closest to 1 were selected and arranged into a 10×10 coding array.
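A brief sketch of this screening step is shown below. The array `filters` stands in for the FDTD-simulated filter library after the initial threshold screen (its size is illustrative); `np.corrcoef` computes the same covariance-based coefficient as Eq. (2).

```python
import numpy as np

# Pairwise correlation coefficients between candidate filter curves (Eq. 2).
filters = np.random.rand(500, 200)          # 500 candidate curves, 200 wavelength points
rho = np.corrcoef(filters)                  # rho[i, j]: correlation between curves i and j
abs_rho = np.abs(rho - np.eye(len(rho)))    # drop the trivial self-correlation of 1

# The pairwise |rho| values can then be ranked to pick the 100 filters
# arranged into the 10x10 coding array.
```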
In the second part of the experiment, spectral simulations were first performed using advanced basis functions to generate a series of spectral curves. These simulated curves were produced by adding varying degrees of Gaussian noise to the actual spectral data, at noise levels of 5%, 10%, 15%, and 20%. In total, 20,000 such simulated spectra were generated to ensure broad coverage of noise levels and spectral features.
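A minimal sketch of this simulation step is given below, assuming Gaussian basis functions (the exact basis used in the paper is not specified here); the band, channel count, and noise levels follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.arange(8.0, 12.0, 0.02)    # 200 spectral channels

def simulate_spectrum(noise_level):
    # Random superposition of Gaussian basis functions plus Gaussian noise.
    centers = rng.uniform(8.0, 12.0, size=5)
    widths = rng.uniform(0.1, 0.8, size=5)
    amps = rng.uniform(0.2, 1.0, size=5)
    clean = sum(a * np.exp(-0.5 * ((wavelengths - c) / w) ** 2)
                for a, c, w in zip(amps, centers, widths))
    noisy = clean + noise_level * clean.max() * rng.standard_normal(wavelengths.size)
    return noisy

# Spectra generated at the four stated noise levels (counts here are illustrative).
spectra = [simulate_spectrum(nl) for nl in (0.05, 0.10, 0.15, 0.20) for _ in range(5)]
```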
Next, these simulated spectra were encoded into input data using the pre-constructed filter-function coding matrix. The following formula converts a simulated spectrum into encoded data:
$${I}_{i}={\int }_{{\lambda }_{1}}^{{\lambda }_{2}}{T}_{i}\left(\lambda \right)\left(f\left(\lambda \right)+{e}_{i}\right)d\lambda \approx \sum _{j=1}^{M}{T}_{i}\left({\lambda }_{j}\right)\left(f\left({\lambda }_{j}\right)+{e}_{i}\right),\quad i=1,\cdots ,N$$
3
The input encoded data were obtained in this manner, where \({T}_{i}\left(\lambda \right)\) is the encoding matrix, \(f\left(\lambda \right)\) is the spectral data, and \({e}_{i}\) is the noise level. The integral can be decomposed into a sum over discrete points, where M is the number of sampled wavelengths within the measurement band and N is the number of encodings. This step was designed to simulate the encoding and reconstruction of spectral data in real applications. Through this process, a total of 100,000 simulated data were obtained, covering both the training and validation sets.
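In discretized form, Eq. (3) reduces to a weighted sum of the noisy spectrum for each filter, as sketched below. The array sizes follow the text (N = 100 encodings, M = 200 wavelength samples); the spectrum and noise values are placeholders.

```python
import numpy as np

N, M = 100, 200
T = np.random.rand(N, M)                    # encoding matrix, row i = T_i(lambda_j)
f = np.random.rand(M)                       # simulated spectrum f(lambda_j)
e = 0.05 * np.random.randn(N)               # noise term e_i for each measurement

# I_i = sum_j T_i(lambda_j) * (f(lambda_j) + e_i), i = 1..N
I = np.array([T[i] @ (f + e[i]) for i in range(N)])
```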
To ensure the accuracy and robustness of the model, these 100,000 simulated data were divided into training and validation sets in a ratio of 8:2. It is worth noting that the selection of the validation set was done randomly to ensure extensive testing of different spectral data.
Finally, an additional 1000 simulated spectral data that were not in the training and validation sets were transformed to serve as an independent test set to evaluate the accuracy and performance of the model on unseen data. This step is important to test the performance of our model in real-world applications.
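A short sketch of the 8:2 split described above is given below; `X` are the encoded inputs and `y` the corresponding spectra, and the arrays here are placeholders with the sizes stated in the text.

```python
import numpy as np

X = np.random.rand(100_000, 100)            # encoded inputs (placeholder)
y = np.random.rand(100_000, 200)            # target spectra (placeholder)

idx = np.random.permutation(len(X))         # random assignment to the validation set
n_train = int(0.8 * len(idx))
X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_val, y_val = X[idx[n_train:]], y[idx[n_train:]]
# A further 1000 independently simulated and encoded spectra form the test set.
```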
The framework of the neural network model
The neural network architecture for spectral reconstruction can be expressed compactly as
\(FC\left(100\right)\to LR\to Self-Attention\left(100\right)\to FC\left(120\right)\to LR\to FC\left(150\right)\to LR\to FC\left(200\right)\to LR\). Each digit represents the number of units in the corresponding layer. LR denotes the ReLU activation function and FC denotes a fully connected layer; the 100 input units correspond to the 100 random spectral filters, and 200 denotes the number of reconstructed spectral channels (8–12 um, 0.02 um steps). In this model, a self-attention mechanism and residual connections are used during training. To extract more relevant features, the input data are first passed through a self-attention operation, and residual connections are then incorporated in each neural layer to limit parameter explosion and overfitting during training. Denoising in conventional compressed-sensing algorithms, by contrast, relies heavily on prior knowledge, and parameters are typically updated manually throughout the iterative process to offset the bias introduced by noise; this is effective but does not yield convincing results when the noise level varies. Regularization parameters are therefore introduced into the training model to improve its robustness [31, 32].
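A minimal PyTorch sketch of this architecture is shown below. The linear projections used to match dimensions across residual connections, and the treatment of each 100-dimensional encoded vector as a single-token sequence for the attention step, are assumptions made for illustration; layer sizes and the ReLU activation follow the text.

```python
import torch
import torch.nn as nn

class ResidualFC(nn.Module):
    """Fully connected block with a residual connection (projection if sizes differ)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.act = nn.ReLU()
        self.proj = nn.Linear(in_dim, out_dim) if in_dim != out_dim else nn.Identity()

    def forward(self, x):
        return self.act(self.fc(x)) + self.proj(x)

class SpectralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc_in = ResidualFC(100, 100)
        self.attn = nn.MultiheadAttention(embed_dim=100, num_heads=1, batch_first=True)
        self.fc1 = ResidualFC(100, 120)
        self.fc2 = ResidualFC(120, 150)
        self.fc_out = ResidualFC(150, 200)    # 200 reconstructed channels (8-12 um, 0.02 um step)

    def forward(self, x):                     # x: (batch, 100) encoded intensities
        h = self.fc_in(x)
        a, _ = self.attn(h.unsqueeze(1), h.unsqueeze(1), h.unsqueeze(1))
        h = h + a.squeeze(1)                  # residual around the attention block
        return self.fc_out(self.fc2(self.fc1(h)))
```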
Self-Attention is computed in the following steps:
1. Generate Query, Key and Value:
First, the query vector (Q), key vector (K), and value vector (V) are generated by a linear transformation of the input data (usually using a weight matrix).
Generating these vectors usually involves the multiplication of weight matrices to map the input data into a low-dimensional representation space for subsequent computation.
2. Compute the attention scores:
Next, the attention scores between each query (Q) and each key (K) are computed, usually using the dot-product method. For a single query vector, the dot product is computed with all key vectors, and a scaling factor is then applied to control the range of the scores. The formula is as follows:
$$Attention\left(Q,K\right)=Softmax\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)$$
4
where Q is the query vector, K is the key vector, and \({d}_{k}\) is the dimension of the key vector. The scores are converted to a probability distribution by applying the Softmax function.
3. Weighted Summation:
The computed attention scores are used to perform a weighted summation over the value vectors (V), generating the final self-attention representation:
$$Self-Attention\left(Q,K,V\right)=Attention\left(Q,K\right)*V$$
5
This step takes a weighted sum of the value vectors based on the attention scores to produce the self-attention representation. Finally, the reconstructed spectral data are derived from the self-attention values in the subsequent fully connected layers. A residual connection is added at each fully connected layer to prevent the gradient from vanishing during training. The result is shown in Fig. 2B.
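The three steps above correspond to the short sketch below, which projects the input into Q, K, and V, computes the scaled dot-product scores of Eq. (4), and takes the weighted sum of Eq. (5); the projection sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) weight matrices
    Q, K, V = x @ w_q, x @ w_k, x @ w_v               # step 1: linear projections
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5     # step 2: scaled dot products (Eq. 4)
    weights = F.softmax(scores, dim=-1)               # convert scores to probabilities
    return weights @ V                                # step 3: weighted sum of values (Eq. 5)

x = torch.randn(4, 1, 100)                            # e.g. one token of 100 encoded values
w = [torch.randn(100, 100) for _ in range(3)]
out = self_attention(x, *w)                           # shape (4, 1, 100)
```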
Training and validation
In our model training process, we chose the mean square error (MSE) as the loss function, a loss commonly used in regression problems. The MSE measures the average squared difference between the model predictions and the actual observations. By minimizing the MSE, the model can better fit the spectral data and ensure that the resulting reconstruction is as close as possible to the real spectrum.
To improve the stability and speed of training, batch normalization was applied. Batch normalization normalizes the inputs of each layer in the neural network, keeping the data distribution stable and reducing exploding or vanishing gradients. This is very useful when training deep neural networks and helps the model converge faster. The Adam optimization algorithm was chosen to tune the model parameters; it combines the momentum method with an adaptive learning rate and usually converges faster.
In addition, an appropriate learning rate was set for model training to ensure that the weights were updated properly and to avoid the model falling into local minima or diverging. Meanwhile, to avoid overfitting, two regularization methods were used, namely L1 regularization and dropout. L1 regularization drives the model toward sparsity and reduces the risk of overfitting by penalizing the model's weight parameters. Dropout, in turn, randomly discards some neurons during training to prevent the model from relying too heavily on the training data and to improve generalization performance.
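The training choices described above are summarized in the sketch below: MSE loss, Adam optimizer, and an L1 penalty added to the loss, with dropout and batch normalization assumed to sit inside the fully connected blocks. The learning rate, L1 weight, and layer sizes here are illustrative, not the values used in the paper.

```python
import torch
import torch.nn as nn

# Toy model standing in for the reconstruction network.
model = nn.Sequential(nn.Linear(100, 200), nn.BatchNorm1d(200), nn.ReLU(),
                      nn.Dropout(0.2), nn.Linear(200, 200))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_weight = 1e-5                               # strength of the L1 penalty (illustrative)

def train_step(x, y):
    optimizer.zero_grad()
    pred = model(x)
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = criterion(pred, y) + l1_weight * l1_penalty
    loss.backward()
    optimizer.step()
    return loss.item()
```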
To verify the performance and practical value of the model, we used actual spectral data measured by an optical spectrometer. These data have real physical significance and reflect the nature of actual spectral signals, ensuring that the model can accurately reconstruct spectra encountered in real environments and demonstrating its practical usability and effectiveness. This experimental design helps generalize the research results to practical applications and provides solutions to practical problems in related fields.
Spectral reconstruction performance indicators
Two related metrics were used to evaluate the reconstruction. The first is the R2 similarity function:
$${R}^{2}=1-\frac{\sum _{i=1}^{n}{\left({y}_{i}-{Y}_{i}\right)}^{2}}{\sum _{i=1}^{n}{\left({Y}_{i}-\stackrel{-}{Y}\right)}^{2}}$$
6
where \({y}_{i}\) is the reconstructed spectrum's intensity value, \({Y}_{i}\) is the simulated spectrum's intensity value, and \(\stackrel{-}{Y}\) is the simulated spectrum's average intensity value; as well as MSE (mean square error function):
$$MSE=\frac{1}{n}\sum _{i=1}^{n}{\left({y}_{i}-{Y}_{i}\right)}^{2}$$
7
where \({y}_{i}\) is the intensity value of the reconstructed spectrum and \({Y}_{i}\) is the intensity value of the simulated spectrum.
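Both metrics follow directly from Eqs. (6) and (7), as in the short sketch below, where `y` is the reconstructed spectrum and `Y` the reference (simulated or measured) spectrum.

```python
import numpy as np

def r_squared(y, Y):
    # Eq. (6): coefficient of determination between reconstruction and reference.
    return 1.0 - np.sum((y - Y) ** 2) / np.sum((Y - Y.mean()) ** 2)

def mse(y, Y):
    # Eq. (7): mean squared error between reconstruction and reference.
    return np.mean((y - Y) ** 2)
```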