1. LPM curves of FNDs enabling 3D anticounterfeiting
Diamond offers promising material properties for fabricating PUF labels, including ultrahigh photostability, long-term stability, and tolerance to physical stress. Specifically, both the Raman signal of diamond and the fluorescence of NV centers are emitted continuously without blinking or bleaching31, 32, 38, which provides the basis for reproducible optical readout. In addition, its ultrahigh hardness35 and chemical inertness34 make diamond-based labels tolerant to physical stress and long-term storage, respectively. However, despite the great potential of existing diamond-based PUF labels38, 39, 48, the much-desired high-dimensional (> 2D) encoding has not yet been achieved. Here we propose a method to realize a 3D encoded diamond-based PUF label, based on the linear polarization modulation (LPM)47 curves of FNDs with random orientations.
Fabricated via an electrostatic adsorption approach (see Methods for details), our PUF label consists of FNDs with both high density and satisfactory dispersion on a cover glass (see the SEM image in Fig. 2b). Owing to these two characteristics, the fluorescence images of FND PUF labels contain 200–450 bright spots (BSs) over a 30 µm × 30 µm area, most of which are close to the diffraction-limited size; a typical example is given in Fig. 2a. Regarding the 3D anticounterfeiting information of FND PUF labels, the large number of BSs provides the basis for the distinguishability of encoded images (see Supplementary Notes 1 for detailed analysis), and the diffraction-limited size of the BSs is essential for sensitive optical readout (see below for details).
In the fluorescence images of our FND PUF labels, diffraction-limited BSs with LPM curves provide the foundation for obtaining three-dimensional anticounterfeiting information. Specifically, based on the polarization-selective optical excitation of NV centers30, an LPM curve describes the relationship between the polarization direction of the linearly polarized excitation laser (\(\beta\)) and the fluorescence intensity of an FND (\({I}_{\beta }\)) (Fig. 1a). In the actual measurement, \(\beta\) is changed at a constant speed by rotating a half-wave plate with an electric rotation stage, and wide-field fluorescence images of a PUF label are taken in steps of 6° in \(\beta\) (see Methods for more details). To accurately extract the fluorescence signal of a diffraction-limited BS (around 10 pixels × 10 pixels in size), \({I}_{\beta }\) is calculated as the total signal of a 13 pixels × 13 pixels region centered on the identified position. Experimental results show that the LPM curves of the identified diffraction-limited BSs are well fitted (solid lines in Fig. 2c) by Eq. 147, with a coefficient of determination usually larger than 0.85.
$${I}_{\beta }={A}_{1}-{A}_{2}\cos^{2}\left(\alpha -\beta \right) \qquad \left(1\right)$$
where \({A}_{1}, {A}_{2}, \alpha\) are fitting parameters (\({A}_{1}>0, {A}_{2}>0\)) and \({I}_{\beta }\), \(\beta\) are input data. Examples of six fitted LPM curves are shown in Fig. 2c. Note that these LPM curves show different LPM contrast values (\(\frac{{A}_{2}}{{A}_{1}}\)) and LPM phases (the fitted \(\alpha\)), corresponding to different orientations of the FNDs. Therefore, from the fluorescence images taken at different \(\beta\) values, it is possible to obtain 3D encoded information comprising LPM contrast values, LPM phases, and the positions of the diffraction-limited BSs.
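For illustration, the fit of Eq. 1 can be reproduced in a few lines of Python. The following minimal sketch uses synthetic Poisson-noise data with hypothetical amplitude and phase values in place of the measured counts of a bright spot:

```python
# Minimal sketch of fitting Eq. 1 to an LPM curve; the amplitudes, phase,
# and noise model below are hypothetical stand-ins for the measured
# 13x13-pixel signal of one bright spot.
import numpy as np
from scipy.optimize import curve_fit

def lpm_model(beta, A1, A2, alpha):
    # Eq. 1: I_beta = A1 - A2 * cos^2(alpha - beta), angles in radians
    return A1 - A2 * np.cos(alpha - beta) ** 2

beta = np.deg2rad(np.arange(0, 180, 6))            # 6-degree steps, as in the measurement
rng = np.random.default_rng(0)
intensity = rng.poisson(lpm_model(beta, 1000.0, 300.0, 0.7)).astype(float)

p0 = [intensity.max(), np.ptp(intensity), 0.5]     # rough initial guess
(A1, A2, alpha), _ = curve_fit(lpm_model, beta, intensity, p0=p0,
                               bounds=([0, 0, 0], [np.inf, np.inf, np.pi]))

contrast = A2 / A1                                 # LPM contrast value
residuals = intensity - lpm_model(beta, A1, A2, alpha)
r2 = 1 - np.sum(residuals**2) / np.sum((intensity - np.mean(intensity))**2)
```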
A sensitive optical readout of the above 3D anticounterfeiting information requires a sufficiently high LPM contrast value. We define a contrast value as sufficiently high if it exceeds 15%, i.e., 10 times the fluorescence-intensity error (around 1.5%) in long-term detection (Figure S1). Experimental results show that diffraction-limited BSs usually have LPM contrast values larger than 15%, whereas the contrast values of larger BSs often fall below 15% (Figure S2). The crucial point for obtaining sufficiently high contrast values is therefore a high probability of finding diffraction-limited BSs. Our PUF label satisfies this requirement well (see the description of Fig. 2a above), which yields a high fraction of BSs with sufficiently high contrast values. A typical example is given in Fig. 2d: 215 of 306 identified BSs have LPM contrast values larger than 15%.
2. Anticounterfeiting performance of 3D encoded images
Based on the above-mentioned 3D anticounterfeiting information, we then propose a 3D encoding scheme to obtain digitized images. Using the classical and widely used authentication method of point-by-point comparison5, the anticounterfeiting performance of the digitized images was tested for distinguishability, reproducibility, and long-term stability.
Digitized results were obtained from the optical images of FND PUF labels at different \(\beta\) values with a pixel resolution of 32 × 32 (see Methods for details). To effectively encode the information of the two dimensions, LPM contrast value and LPM phase, a feasible method is to use the relative change of \({I}_{\beta }\) at different \(\beta\). To capture this relative change, we convert the photon number of each image pixel to a contrast value via Eq. 2:
$${contrast}_{\beta =n}=\frac{{counts}_{\beta =n}-{counts}_{\beta =0}}{{counts}_{\beta =0}} \qquad \left(2\right)$$
where \({contrast}_{\beta =n}\) and \({counts}_{\beta =n}\) denote the contrast value and photon number of an image pixel, respectively, at \(\beta =n\). In the digitization process, 9 contrast levels are set according to Table S2, which is designed based on the precision and range of the contrast values. An example of the digitized images is given in Fig. 3a: there are three encoding dimensions, namely contrast level, polarization angle, and pixel position. Calculated via the formula inserted in Fig. 3a5, 11, the encoding capacity of our PUF label reaches \({9}^{10\times 1024}\) (\(\approx {10}^{9771}\)), far exceeding the commonly suggested minimum encoding capacity of \({10}^{300}\)5. In addition, the readout time for these digitized images is just 7.5 s. Therefore, we achieve a sufficiently large encoding capacity as the basis for unbreakable encryption within a relatively short readout time.
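As an illustration of this digitization step, the sketch below applies Eq. 2 to a stack of 32 × 32 readout images and assigns one of 9 levels to each pixel; since Table S2 is not reproduced here, the bin edges are hypothetical placeholders:

```python
# Minimal sketch of the digitization step: Eq. 2 followed by 9-level
# quantization. The bin edges are hypothetical; the actual levels are
# defined in Table S2.
import numpy as np

def digitize_readout(counts):
    # counts: (n_angles, 32, 32) photon numbers; index 0 is the beta = 0 image
    ref = counts[0].astype(float)                  # assumed nonzero reference counts
    contrast = (counts[1:] - ref) / ref            # Eq. 2, per pixel and per angle
    edges = np.linspace(-0.4, 0.4, num=8)          # 8 hypothetical edges -> 9 levels
    return np.digitize(contrast, edges)            # integer levels 0..8

# Usage with synthetic data: 10 polarization angles, 32 x 32 pixels
counts = np.random.default_rng(1).poisson(500, size=(10, 32, 32))
levels = digitize_readout(counts)                  # shape (9, 32, 32)
```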
To demonstrate the feasibility of the above encoding method in distinguishing different PUF labels, we applied the widely used authentication method of point-by-point comparison5. Specifically, when two digitized images are compared pixel by pixel, the fraction of identical pixels is recorded as the similarity index11. If there is a clear gap between the similarity indexes of the same label and those of different labels, a threshold value within the gap can be chosen to correctly distinguish different PUF labels. In our authentication process, the similarity indexes were calculated between two groups of digitized images of 100 PUF labels. The results are shown in Fig. 3b: the similarity indexes of the same labels are always higher than 76%, and those of different labels are always lower than 69%. Therefore, we can choose a threshold value of 75% (within the 69–76% gap) to successfully distinguish all 100 PUF labels.
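Under this scheme, the similarity index of the point-by-point comparison reduces to the fraction of matching pixels, as in the following minimal sketch:

```python
# Minimal sketch of the point-by-point comparison: the similarity index
# is the fraction of identical digitized values across all angles/pixels.
import numpy as np

def similarity_index(levels_a, levels_b):
    # levels_a, levels_b: integer level maps of equal shape, e.g. (9, 32, 32)
    return float(np.mean(levels_a == levels_b))

# Authentication decision with the 75% threshold chosen in the text
def is_same_label(levels_a, levels_b, threshold=0.75):
    return similarity_index(levels_a, levels_b) > threshold
```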
Moreover, based on the above threshold value, we evaluated the reproducibility and long-term stability of the digitized readout results of our PUF labels. First, for reproducibility, we calculated the similarity indexes among 10 groups of digitized images of the same PUF label. The authentication results for 20 PUF labels all show satisfactory reproducibility (Fig. 3c): the similarity indexes are always significantly higher than the threshold value of 75% (see Figure S3 for specific examples of the reproducibility of one label). Then, long-term stability was tested with repeated readouts of one PUF label over a period of around 150 days. For each readout date, similarity indexes were calculated between the digitized readout results of that date and those of the first day. The authentication results confirm satisfactory long-term stability: the similarity indexes remain stable at around 84%, much higher than the threshold value of 75% (Fig. 3d).
3. Deep metric learning for authenticating noise-affected digitized images
For the practical authentication of anticounterfeiting labels, the system's tolerance to common noise sources is crucial. Specifically, in the practical authentication process, manufacturers provide images taken under ideal laboratory conditions, whereas end users may capture images in environments with various additional noise sources. Even if the optical signal from a PUF label is stable, these noise sources affect the readout process, which may prevent authentication algorithms such as point-by-point comparison from working42.
To this end, we simulated the actual authentication process and evaluated the noise resilience of the point-by-point comparison method with FND PUF labels. In this evaluation, images of PUF labels taken under two different optical readout conditions are used for authentication based on the similarity index, as shown in the left panel of Fig. 4a. Specifically, our experiment mirrors the image capture process of both the manufacturer and the end user: optical readouts of 50 PUF labels were conducted under ideal laboratory conditions and under laboratory conditions contaminated with three kinds of noise sources, respectively (refer to Supplementary Notes 3 for details). These noise sources, background light, sample drift, and defocus, are all common in the readout process. Digitized images from the two optical readout conditions show evident differences, with a typical example in the right panel of Fig. 4a. As shown in Fig. 4b, the experimental results reveal an overlap between the similarity-index distributions of intra-label and inter-label digitized images, which implies that no appropriate threshold can distinguish the PUF labels. Negative results have also been reported in noise resilience evaluations of other PUF labels based on point-by-point comparison42 and were attributed to a widespread limitation of the method, namely its sensitivity to certain noise sources. This highlights the need for a more robust authentication algorithm.
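For intuition, these three noise sources can be mimicked on a raw fluorescence image before digitization, as in the sketch below; the magnitudes are illustrative assumptions, not the values used in Supplementary Notes 3:

```python
# Hedged sketch of the three noise sources applied to a raw fluorescence
# image before digitization; all magnitudes are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def add_readout_noise(image, rng):
    noisy = image + rng.uniform(20.0, 80.0)                      # background light: offset
    noisy = shift(noisy, shift=rng.uniform(-2.0, 2.0, size=2))   # sample drift: translation
    noisy = gaussian_filter(noisy, sigma=rng.uniform(0.5, 2.0))  # defocus: Gaussian blur
    return noisy
```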
To develop a more robust, noise-tolerant algorithm, our key inspiration was that, with the naked eye, it is hard to match digitized images of a PUF label in the presence of noise (such as in Fig. 4a), yet we can readily recognize natural images even when they are mixed with noise49. The reason is that most noise in daily life affects information at the pixel level, whereas we have seen many natural images and have been "trained" to recognize them from high-level semantic information50. Thus, to solve the challenge posed by the noise resilience evaluation, the critical factor is to show our machine as many PUF labels as possible at an early stage (i.e., prior information) and to teach it to differentiate PUF labels based on patch-level information.
Given the profound capacity of deep learning51–53 to learn prior information and the ability of convolutional neural networks to extract deep patch-level features from images, we propose to exploit these features for our authentication system. In particular, we adopt deep metric learning44 for anticounterfeiting authentication. Metric learning excels at identifying essential differences between data instances, making it well suited for authentication tasks. By learning a distance metric that reflects intrinsic similarities and dissimilarities among instances, metric learning has demonstrated effectiveness in practical applications such as robust face verification in the wild54, variation-tolerant face recognition46, 55, and person re-identification56.
The core concept of metric learning, as shown in Fig. 5a, is to enable the accurate clustering of images based on their content, even when they are subjected to noise or distortions. In the original image space depicted on the left side of Fig. 5a, an image \({I}_{X}\) moves away from its original position when affected by noise, resulting in the image \({I}_{{X}^{{\prime }}}\). Consequently, if the distance is measured by point-by-point comparison, the noisy image \({I}_{{X}^{{\prime }}}\) may become closer to another image \({I}_{Y}\) than to \({I}_{X}\), which could cause a wrong match. To tackle this issue, our metric learning framework uses a neural network trained to extract noise-resistant features, allowing images with similar content to be accurately grouped together, as depicted on the right side of Fig. 5a. In other words, the high-level information of a digitized image is extracted and represented like a deep key in the metric space. In this way, we can compare the similarity of digitized images by calculating a similarity score (see Methods) in the metric space, instead of a point-by-point comparison in the original image space.
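As a sketch of this comparison, two digitized readouts can be mapped to feature vectors by the trained network and scored in the metric space. Cosine similarity is used below as one common choice of similarity score; it is an assumption here rather than the exact definition given in Methods:

```python
# Minimal sketch of authentication in the metric space; `encoder` stands
# for the trained network, and cosine similarity is an assumed choice of
# similarity score.
import torch
import torch.nn.functional as F

@torch.no_grad()
def similarity_score(encoder, image_a, image_b):
    # image_a, image_b: (1, C, H, W) tensors of digitized readouts
    f_a = F.normalize(encoder(image_a), dim=1)     # deep key of readout A
    f_b = F.normalize(encoder(image_b), dim=1)     # deep key of readout B
    return (f_a * f_b).sum().item()                # cosine similarity in [-1, 1]
```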
To achieve this goal, the key innovation of metric learning lies in its training strategy. As shown in Fig. 5b, a deep neural network is trained to bring features belonging to the same PUF label (e.g., \({F}_{X}\) and \({F}_{{X}^{{\prime }}}\)) closer together while pushing features from different PUF labels (e.g., \({F}_{Y}\) and \({F}_{{X}^{{\prime }}}\)) further apart. Before training, features of the same label might be far apart in the metric space (or too close to those of a different label), which is contrary to the desired outcome. In such cases, the loss function penalizes the network, forcing it to adjust its parameters in the appropriate direction via back-propagation. After sufficient iterations, the network learns to extract the key information from images, resulting in a metric space where features of the same label are close together and features of different labels are farther apart. Compared with similarity indexes based on point-by-point comparison, neural networks typically extract patch-level structural features, which are more resilient to noise.
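This training strategy can be sketched as one optimization step with a triplet loss; the margin value and the use of squared Euclidean distances between normalized features are illustrative assumptions:

```python
# Minimal sketch of one training step implementing the pull-together /
# push-apart objective; margin and distance choices are assumptions.
import torch
import torch.nn.functional as F

def triplet_step(encoder, optimizer, anchor, positive, negative, margin=0.5):
    # anchor/positive: readouts of the same PUF label; negative: a different label
    f_a = F.normalize(encoder(anchor), dim=1)
    f_p = F.normalize(encoder(positive), dim=1)
    f_n = F.normalize(encoder(negative), dim=1)
    d_pos = (f_a - f_p).pow(2).sum(dim=1)          # distance within the same label
    d_neg = (f_a - f_n).pow(2).sum(dim=1)          # distance across different labels
    loss = F.relu(d_pos - d_neg + margin).mean()   # penalize bad feature geometry
    optimizer.zero_grad()
    loss.backward()                                # adjust parameters via back-propagation
    optimizer.step()
    return loss.item()
```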
We demonstrate the robustness of our method against a variety of noise sources commonly encountered during the readout process. Notably, during training we assume no prior knowledge of the noise that may be present in practical use and train only on PUF data acquired under ideal conditions, which avoids introducing biased evaluation results. For validation, we use the same 50 pairs of PUF images employed in Fig. 4b. As depicted in Fig. 5c,d, even under these challenging conditions, our method distinguishes between different pairs and identical pairs of PUF labels with 100% precision. There is a significant gap (approximately 27–38%) between the similarity-score distributions of intra-pair and inter-pair PUF labels. By contrast, authentication with point-by-point comparison tends toward confusion (see Fig. 4b,c). These results demonstrate that our algorithm recognizes PUF labels better than the point-by-point comparison method in the presence of the investigated noise sources. Additionally, we provide validation results of our method under ideal conditions in Figure S4, which show even more distinct decision boundaries. This further proves the robustness and reliability of our metric learning-based approach for accurate PUF label authentication, whether under ideal conditions or in real-life noise environments.
4. Characteristics of the metric learning method relative to classification methods
Some previous studies9, 12, 16, 43 have explored AI methods for authenticating random-pattern images, mainly based on classification. Here, we analyze the differences between our metric learning-based authentication method and classification-based methods in the training and testing stages.
First, our method is more data-efficient during the training phase. Learning a classifier requires abundant data for each category, since it aims to predict a probability for each category given an input; if a category lacks sufficient data, the model is prone to overfitting the training data and performs poorly during testing57. Metric learning, on the other hand, focuses on learning a similarity measurement that evaluates whether a given pair of readout results is similar, i.e., it predicts a similarity score for a pair of readout results. This task can be accomplished with pairs or triplets of samples (a reference sample, a positive sample from the same category, and a negative sample from a different category) and therefore does not require a large amount of data per category57. A striking example of this data efficiency is shown in Fig. 6a: for each PUF label, robust recognition with a classification method requires many readout results (tens to thousands) collected under various conditions57, whereas our method requires only two readout results, as illustrated by the sketch below. In sum, our method needs much less training data, saving the considerable time that would otherwise be spent on repeated readouts of PUF labels.
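To make this point concrete, the sketch below builds training triplets when only two readout results per label are available; the pairing scheme is an illustrative assumption:

```python
# Minimal sketch of building triplets from just two readouts per label,
# illustrating the data efficiency of metric learning.
import random

def build_triplets(readouts):
    # readouts: dict mapping label id -> [readout_1, readout_2]
    labels = list(readouts)
    triplets = []
    for label in labels:
        anchor, positive = readouts[label]         # two readouts of the same label
        other = random.choice([l for l in labels if l != label])
        negative = readouts[other][0]              # any readout of a different label
        triplets.append((anchor, positive, negative))
    return triplets
```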
Second, our method can effectively authenticate novel PUF labels unseen during training (Fig. 5c,d), a capability that classification-based methods lack9, 12, 43. As shown in Fig. 6b, classification methods fix the number of categories (e.g., ten) during the training phase: the model predicts the probabilities of ten categories and uses the highest one to determine the class of a PUF label. Consequently, if a provider manufactures an 11th PUF label, the network will still predict probabilities for only ten categories. Under these circumstances, the method would either predict all ten probabilities to be low, thereby deeming the label false, or assign one high probability, leading to an incorrect classification. By contrast, our method compares whether two readout results of PUF labels are similar, and the learned similarity metric can be applied to novel PUF labels. Consequently, even if only digitized images of certain PUF labels (such as PUF1-PUF10) are used during training, we can still compare two new readout results (for instance, from PUF11 and PUF12). Our approach achieves 100% authentication performance on novel PUF labels, as shown in Fig. 5c,d and Figure S4; we emphasize that novel PUF labels are used in all our experimental evaluations. Therefore, our method is well suited to real-world business situations where new PUF labels are continually created, whereas the classification technique's need for retraining makes it inconvenient in such conditions.