Improved Human Identification by Multi-biometric Image Sensor Integration With a Deep Learning Approach

doi:10.21203/rs.3.rs-4002983/v1

Research Article

This work is licensed under a CC BY 4.0 License

Abstract

Biometric identification technology has become a common part of daily life owing to the global demand for information security and to security legislation. Because it can overcome several fundamental drawbacks of unimodal biometric systems, multimodal biometric technology has attracted attention and grown in popularity. This research presents a novel multimodal biometric person identification system for iris and facial biometrics based on VGG19 with a softmax classifier (VGG19-SC). The system's architecture is built on VGG19-SC, which extracts features from images and classifies them. The system was created by combining two VGG19-SC models, one for the iris and one for the face. The well-known pre-trained VGG-19 model was used to construct them. Several methods, including image augmentation and dropout, were used to prevent overfitting. The VGG19-SC models were fused using feature-level and score-level fusion methods to investigate the effects of these fusion methods on recognition performance. The results demonstrated that three biometric traits were more effective than two traits or one trait in biometric identification systems. The findings also demonstrated that the suggested method surpassed other cutting-edge approaches, obtaining an accuracy of 99.39% in a multi-biometric verification system.

Keywords: Deep Learning, biometric systems, face, iris, VGG19-SC

1. Introduction

Adopting efficient user recognition systems is necessary for limiting access to contemporary technical resources, and it is now more important than ever given the recent acceleration in their emergence. The most powerful option at this time is biometric recognition technology. Biometrics is the science of using behavioral or physical characteristics, such as the voice and signature, to establish a person's identity by partially or completely automated methods. Because biometric data is unique and cannot be duplicated, lost, or stolen, it has several benefits over conventional recognition techniques like passwords [1]. The sensor, feature extraction, matching, and decision-making modules make up the core of a biometric identification system. Biometric identification systems can be unimodal or multimodal. A unimodal system identifies a person by a single biometric characteristic. Unimodal systems have limitations even though they are dependable and have demonstrated advantages over previously employed conventional methods. These include problems with similarity within and across classes, noise within sensed data, non-universality, susceptibility to spoofing attempts, and non-uniformity [2]. Multimodal biometric systems use several characteristics to identify people. They are frequently used in real-world settings because they can solve issues with unimodal biometric systems. The multiple traits can be fused using the information available in any one of the biometric system's modules. Compared with unimodal systems, multimodal biometric systems offer benefits that make them a particularly attractive secure identification method [3]. Several biometrics researchers have employed machine learning (ML) methods for recognition. ML algorithms require the raw biometric data to be structured and its features extracted first. They also need several preprocessing steps before feature extraction. In addition, different biometric types, and different data sets of the same biometric, may work better with specific extraction methodologies.

Additionally, conventional ML algorithms cannot deal with changes to biometric images such as zooming and rotation. Deep learning has recently had a significant impact on biometric systems and produced excellent results. Deep learning algorithms have provided solutions to several shortcomings of conventional ML algorithms, notably those that stem from feature extraction techniques [4]. Although multi-biometric image sensor integration can improve the precision and dependability of human recognition systems, the technology has some drawbacks. Implementing a multi-biometric imaging system can be expensive: in addition to the infrastructure required for processing and storing various biometric data types, it entails purchasing and maintaining a variety of biometric sensors or devices. Hardware, software, and maintenance expenses may be high. A system's complexity and resource needs may grow as it incorporates more biometric features. Scalability may become a problem as the number of biometric modalities increases, which may slow down processing [5]. This study offers a novel multimodal biometric person identification system for iris and face biometrics based on VGG19 with a softmax classifier (VGG19-SC).

The rest of the article is organized as follows: an overview of related works is given in Section 2, a more thorough explanation of the methodology is given in Section 3, and the experimental data sets and simulation results are presented and discussed in Section 4. Section 5 concludes the study and offers suggestions for further research.

2. Related work

The paper [6] aimed to provide a more suitable base from which to work by employing a more thorough method to establish a complete understanding of fusion techniques for wearable sensors. The paper [7] developed a reliable Multimodal Biometric System (MBS) using a Deep Learning Convolutional Neural Network (DLCNN) technique. The paper [8] used K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) classifiers; image classification and face re-identification are two examples of machine learning applications that have been greatly enhanced by metric learning. The paper [9] proposed three different models to improve heterogeneous (cross-sensor) iris identification based on an ensemble of convolutional and residual blocks. The paper [10] suggested a unique, robust, and trustworthy identification method based on multimodal biometrics, which combines fingerprint, ECG, and facial image data using deep learning. This method is especially effective for identification and gender identification. The paper [11] proposed an improved convolution deep learning model (ICDLM) to control the patient attributes in the hospital. A new hybrid approach for edge detection, segmentation, feature extraction, and classification, based on a convolutional neural network (CNN) and Hamming Distance (HD), was introduced in [12]. The paper [13] provided a type-2 fuzzy blended enhanced D-S evidence (DLF) combination rule for multisensory data fusion that eliminates paradoxes of the Dempster-Shafer (D-S) combination rule. The paper [14] developed an ensemble multi-class multimodal classification framework using hybrid multi-class multimodal segmentation, an ensemble-based feature selection ranking measure, and hybrid segmentation on various biometric variables. The paper [15] deployed a multimodal Presentation Attack Detection (PAD) approach, based on score-level fusion and a challenge-response scenario, against photo and video attacks in face recognition systems.

3. Proposed work

This paper proposes a VGG19-SC-based face and iris biometric system. Figure 1 depicts the overall design of the proposed approach. The first two images captured are the iris and face of a user. The individuals' offline signatures, iris scans, and fingerprints are taken as a first step. The user's identity is then determined using the multimodal framework, which comprises two already-implemented models for the iris and face. VGG19 is a deep convolutional neural network architecture that has contributed to solving computer vision problems. The VGG19 image classification model, unveiled in 2014 by the Visual Geometry Group at the University of Oxford, is renowned for its efficiency and simplicity. It is deeper and more complicated than its predecessor, VGG16, and has 19 layers in total, including 16 convolutional layers and 3 fully connected layers. VGG19 achieves high accuracy in tasks like object detection and localization through its architecture of stacked convolutional layers, which let it learn complicated visual features from input images.

The present work builds on the unimodal iris and face identification methods of earlier efforts while implementing the suggested multimodal biometric architecture using the traits iris, face, and signature. A separate unimodal offline-signature model has also been created. The three biometric solutions are combined to provide the proposed multimodal solution. Following past work on unimodal solutions, the accuracy of these models is assessed before they are incorporated into the multimodal solution.

3.1 Dataset

Using a digital camera, face and iris biometric data will be collected from participants in the selected experimental organization. The photos will be 640 by 480 pixels and will cover the faces and irises of 190 people, with three samples per person. Images of both traits will be resized to 128 by 128 pixels without otherwise altering them. Each photograph should have a brightly colored backdrop with uniform lighting conditions. There will be 570 photos per modality in the dataset. 60% will be used to train the system, while 40% will be used for authentication. The dataset partition is based on the random-sample cross-validation approach.
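The dataset arithmetic above can be checked with a short sketch. It is illustrative only: the helper name `split_indices`, the fixed seed, and the plain random shuffle are assumptions, since the text does not specify how the random partition is drawn or whether it is stratified by subject.

```python
import random

def split_indices(n_samples, train_fraction=0.6, seed=0):
    """Randomly partition sample indices into a training and a held-out list."""
    indices = list(range(n_samples))
    # Seeded shuffle so the (hypothetical) split is reproducible.
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

n_subjects, samples_per_subject = 190, 3
n_images = n_subjects * samples_per_subject  # 570 images per modality

train_idx, held_out_idx = split_indices(n_images)
print(n_images, len(train_idx), len(held_out_idx))  # 570 342 228
```

Under these assumptions, the 60/40 partition gives 342 training images and 228 held-out images per modality.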

3.2 Pre-processing

To increase the contrast of a low-contrast image, its pixel values must be distributed more evenly over the range of admissible values. Histogram equalization is a fully automated, parameter-free, and easy-to-use method for doing so. It first transforms the probability density function (captured by the normalized histogram) into a cumulative distribution function (CDF) by computing the histogram's running sum:

$$\bar{c}[p] = \sum_{k=0}^{p} \bar{h}[k], \quad p = 0, 1, \ldots, 255 \tag{1}$$

where $p$ denotes the gray level, $\bar{c}$ is the running sum of the histogram, and $\bar{h}$ is the normalized histogram, stored as floating-point values.

When computing the running sum, it is best to initialize the first element of the array as $\bar{c}[0] = \bar{h}[0]$ and then update $\bar{c}[p] = \bar{c}[p-1] + \bar{h}[p]$ for each subsequent gray level. Once the CDF has been calculated, a pixel with gray level $p$ is changed to $p' = \mathrm{round}(255 \cdot \bar{c}[p])$. Because the integral of a probability density function (PDF) is always 1, the CDF always evaluates to 1 at the greatest gray level, that is, $\bar{c}[255] = 1$. Consequently, the output falls within the specified range of 0 to 255.
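The running-sum procedure can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function name `equalize_histogram` and the eight-pixel sample are hypothetical, and the image is treated as a flat list of integer gray levels.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels - 1]."""
    n = len(pixels)
    # Normalized histogram h[k]: fraction of pixels at gray level k.
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    hist = [c / n for c in counts]
    # Running sum: c[0] = h[0], then c[p] = c[p - 1] + h[p].
    cdf = [0.0] * levels
    cdf[0] = hist[0]
    for p in range(1, levels):
        cdf[p] = cdf[p - 1] + hist[p]
    # Map each gray level p to round((levels - 1) * c[p]).
    lookup = [round((levels - 1) * c) for c in cdf]
    return [lookup[p] for p in pixels]

# Hypothetical low-contrast sample: all values crowded into 100..103.
low_contrast = [100, 100, 101, 101, 102, 102, 103, 103]
print(equalize_histogram(low_contrast))
# [64, 64, 128, 128, 191, 191, 255, 255]
```

The four crowded gray levels are spread across the 0 to 255 range, and the brightest input level maps to 255 because the CDF reaches 1 there.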

3.3 Feature extraction of VGG19-SC

Convolutional neural networks (CNNs, or ConvNets) are deep neural networks most commonly used to analyze visual images. Neural networks are usually thought of in terms of matrix multiplication, but a ConvNet does not operate purely in this way. It makes use of a special technique called convolution. In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other.

Technically, each input image in a deep learning model is passed through a series of convolutional layers for feature extraction using kernels, pooling layers for size reduction, and fully connected layers for vector transformation, and is then classified using the softmax classifier. Forward propagation and backpropagation are the two forms of propagation used when training the VGG19 network. In forward propagation, the network receives the input data, with the filters and the other parameters initialized randomly. The data is then passed through the network, which calculates the loss value using these random parameters. The network then uses an optimization technique to reduce the loss. Backpropagation is used throughout this process to update the network's weights and parameters so that the loss value is reduced as necessary. This prepares the parameters for the next forward-propagation iteration. The biggest difficulty with the CNN model is setting the hyperparameters to produce the desired results. Since the hyperparameters comprise all of the training variables for the model's structure or training procedure, hyperparameter tuning requires determining the ideal value of each. The hyperparameters include L1 and L2 regularization, dropout value, learning rate, number of epochs, batch normalization, and batch size.

The VGG-19 is a pre-trained network chosen for its ability to recognize the human iris and face and because it makes for a simple network design. It is currently strongly recommended for use in deep network techniques. The VGG-19 has input dimensions of 224x224x3. Figure 2 shows the VGG-19, which has 16 convolutional layers, 5 pooling layers, and 3 fully connected layers. The first convolutional layer has 64 filters, each of size 3x3, and the resulting feature map has dimensions 224x224x64. VGG-19 uses the Rectified Linear Unit (ReLU), a non-linear activation function applied to the output of each convolutional layer. ReLU replaces negative values with 0:

$$F(X) = \max(0, X) \tag{2}$$

where $X$ is the convolutional layer's output.
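The feature-map sizes and the ReLU rule can be illustrated with a small sketch. The padding of 1 and stride of 1 for the 3x3 convolutions, and the 2x2 pooling with stride 2, are the usual VGG settings and are assumed here; the text above does not state them. All function names are hypothetical.

```python
def relu(x):
    """Rectified Linear Unit for a single value: max(0, x)."""
    return max(0.0, x)

def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Output size along one spatial dimension of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, window=2, stride=2):
    """Output size along one spatial dimension of a pooling layer."""
    return (size - window) // stride + 1

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5]])  # [0.0, 0.0, 0.0, 1.5]

# A 3x3 convolution with padding 1 keeps the 224x224 spatial size,
# so 64 filters give a 224x224x64 feature map.
size = conv_output_size(224)
print(size)  # 224

# Five pooling layers each halve the spatial size.
for _ in range(5):
    size = pool_output_size(size)
print(size)  # 7
```

Under these assumed settings, the spatial size shrinks from 224 to 7 after the five pooling layers, while the convolutions themselves leave it unchanged.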

3.4 Fusion Methods

3.4.1 Feature-stage Fusion

Fusion at the feature level involves combining the features that represent several traits. The features extracted from the individual traits were combined to produce a new feature vector that represents the user. In the training phase of this fusion strategy, the model learned from the fused features. The outputs of the second fully connected layers of the iris and face CNNs are combined. That is, the feature vectors from the second fully connected layer of the two CNN models are concatenated into a single vector, as depicted in Fig. 3 (feature-level fusion) and defined by:

$$a = a_r \,\|\, a_f \tag{3}$$

where $a_r$ is the feature vector extracted from the facial image and $a_f$ is the feature vector extracted from the iris image.

The resulting vector $a$ is then fed to the softmax classifier, which classifies the image based on a similarity score and identifies the individual.
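A minimal sketch of the concatenation step follows. The function name `fuse_features` and the short example vectors are hypothetical; real feature vectors from a fully connected layer would be far longer (4096 values each in the standard VGG-19, if that layer width was kept).

```python
def fuse_features(face_features, iris_features):
    """Feature-level fusion: concatenate two feature vectors into one."""
    return list(face_features) + list(iris_features)

# Hypothetical, very short feature vectors for illustration only.
a_r = [0.12, 0.80, 0.05]  # face features
a_f = [0.33, 0.47]        # iris features

a = fuse_features(a_r, a_f)
print(len(a), a)  # 5 [0.12, 0.8, 0.05, 0.33, 0.47]
```

The fused vector's length is the sum of the two input lengths, so the classifier placed after the fusion step must accept that combined dimension.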

3.4.2 Score Stage of Fusion

For score-level fusion, the outputs of the second fully connected layer for the iris and the face are passed to each CNN's softmax function, which yields the matching scores. The process comprises two steps. First, the score values generated by each CNN are normalized; then the matching scores of the VGG-19 networks are combined using a score combination rule. Finally, the model outputs the identity of the person with the highest fused score. Two score fusion rules were used: the arithmetic mean rule and the product rule. Under the arithmetic mean rule, the scores of all traits are summed and the sum is divided by the number of traits to give the final score, as shown in Fig. 4 (score-level fusion).

The arithmetic mean rule is computed as:

$$G = \frac{1}{i}\sum_{d=1}^{i} G_d \tag{4}$$

In Eq. (4), $G_d$ is the score vector of trait $d$, and $i$ is the total number of traits.

Under the product rule, the fused score is obtained by multiplying the scores of the two traits:

$$G = \prod_{d=1}^{i} G_d \tag{5}$$

In Eq. (5), $G_d$ is the matching score vector of trait $d$, and $i$ is the number of traits.
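The two fusion rules can be sketched as follows. The score vectors are hypothetical softmax outputs over three enrolled identities, and the function names are illustrative. The normalization step is left out because the text does not say which normalization is used, and softmax outputs already lie between 0 and 1.

```python
import math

def mean_rule(score_vectors):
    """Arithmetic mean rule: average the per-class scores over all traits."""
    n_traits = len(score_vectors)
    return [sum(scores) / n_traits for scores in zip(*score_vectors)]

def product_rule(score_vectors):
    """Product rule: multiply the per-class scores over all traits."""
    return [math.prod(scores) for scores in zip(*score_vectors)]

def identify(fused_scores):
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)

# Hypothetical softmax scores for three identities.
face_scores = [0.7, 0.2, 0.1]
iris_scores = [0.4, 0.5, 0.1]

fused_mean = mean_rule([face_scores, iris_scores])
fused_product = product_rule([face_scores, iris_scores])
print(identify(fused_mean), identify(fused_product))  # 0 0
```

In this example the iris network alone would pick identity 1, but both fusion rules pick identity 0 because the face network's confidence outweighs it.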

3.5 Soft-max classifier (SC)

The softmax function, also known as soft argmax, is used for multi-class classification. It takes as input a vector of $R$ real values, where $R$ is the number of classes. It normalizes the input into a probability distribution whose values sum to 1. This suits neural networks with many classes, since every output value lies between zero and one. The softmax function gives the probability of each possible class, and the target class is the one with the highest probability. The softmax equation applies the exponential function to each component of the input vector and divides by the sum of all the exponentials to normalize the values:

$$\sigma(\vec{y})_j = \frac{e^{y_j}}{\sum_{i=1}^{R} e^{y_i}} \tag{6}$$

where $j$ indexes the $j$th class, $y_j$ is the $j$th component of the input, and $\vec{y}$ is the input vector.

The output of the softmax function can be expressed as the vector

$$\text{Softmax result} = [S_1, S_2, \ldots, S_R] \tag{7}$$

where $S_j$ is the probability that the input belongs to class $j$.
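A small sketch of the softmax computation, assuming a hypothetical three-class input vector. Subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result; it is not mentioned in the text.

```python
import math

def softmax(values):
    """Softmax of a list of real values; the outputs sum to 1."""
    shift = max(values)  # stability shift, cancels out in the ratio
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network outputs for three classes.
scores = softmax([2.0, 1.0, 0.1])
predicted_class = max(range(len(scores)), key=scores.__getitem__)

print([round(s, 3) for s in scores])  # [0.659, 0.242, 0.099]
print(predicted_class)                # 0
```

The class with the largest input value receives the largest probability, so taking the highest softmax output selects the same class as taking the highest raw value.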

The first four blocks of the pre-trained VGG-19 were frozen when building our CNN models, since the filters in the early layers scan the images for low-level features such as points and lines. Only the top layers, whose filters look for high-level features, are trained.

4. Result and discussion

The quality of the suggested VGG19-SC strategy is examined in depth by comparing and assessing the results. To show that the proposed approach is effective, its efficiency and accuracy are compared with those of contemporary methods, namely the Convolutional Neural Network (CNN) and Multimodal Biometric Feature Extraction (MBFE). Accuracy, precision, recall, and F1-score are reported for the proposed approach.

Accuracy indicates the proportion of correctly predicted multi-biometric input images among the total number of image samples. A high accuracy rate indicates high-quality output for the supplied image samples:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \tag{8}$$

Figure 5 and Table 1 show the accuracy of the proposed system alongside that of the existing systems. The proposed system achieves 98% accuracy, whereas CNN attains 85% and MBFE 70%. This demonstrates that the suggested approach is more successful than the existing ones.

Table 1

Numerical Outcomes of Accuracy
Methods	Accuracy (%)
MBFE [16]	70
CNN [17]	85
VGG19-SC [Proposed]	98

Precision is the ratio of true positive samples to all samples predicted as positive, that is, the sum of the true positives and the false positives. TP stands for True Positive and FP stands for False Positive.

$$\text{Precision} = \frac{TP}{TP+FP} \tag{9}$$

Figure 6 and Table 2 show the precision of the proposed system alongside that of the existing systems. CNN has a precision of 67% and MBFE 74%, whereas the suggested system has a precision of 88%. This demonstrates that the proposed strategy is more successful than the existing ones.

Table 2

Numerical Outcomes of Precision
Methods	Precision (%)
MBFE [16]	74
CNN [17]	67
VGG19-SC [Proposed]	88

Recall is the ratio of true positive samples to the sum of the true positive and false negative samples. TP stands for True Positive and FN stands for False Negative.

$$\text{Recall} = \frac{TP}{TP+FN} \tag{10}$$

Figure 7 and Table 3 show the recall of the proposed system alongside that of the existing systems. The proposed method achieves 97% recall compared to 65% for CNN and 50% for MBFE. This demonstrates that the recommended strategy is more successful than the existing ones.

Table 3

Numerical Outcomes of Recall
Methods	Recall (%)
MBFE [16]	50
CNN [17]	65
VGG19-SC [Proposed]	97

The F1-score is the harmonic mean of precision and recall.

$$\text{F1-score} = 2\times \frac{\text{Recall}\times \text{Precision}}{\text{Recall}+\text{Precision}} \tag{11}$$

Figure 8 and Table 4 show the F1-score of the proposed system alongside that of the existing systems. The recommended method achieves a 92% F1-score compared to 55% for CNN and 66% for MBFE. This demonstrates that the proposed strategy is more successful than the existing ones.

Table 4

Numerical Outcomes of F1-Score
Methods	F1-score (%)
MBFE [16]	66
CNN [17]	55
VGG19-SC [Proposed]	92
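The four measures used in this section can be computed from confusion counts as in the sketch below. The counts are hypothetical and are not taken from the experiments; the function names are illustrative, and accuracy is written in its binary form, (TP + TN) over all predictions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f1

def f1_from(precision, recall):
    """F1-score computed directly from precision and recall."""
    return 2 * recall * precision / (recall + precision)

# Hypothetical confusion counts, for illustration only.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# 0.925 0.9 0.947 0.923

# Precision (88%) and recall (97%) reported for VGG19-SC in Tables 2 and 3.
print(round(f1_from(0.88, 0.97), 2))  # 0.92
```

Plugging the reported precision and recall of VGG19-SC into the F1 formula gives about 0.92, in line with the 92% listed in Table 4, assuming the reported figures are aggregate values.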

5. Conclusion

In this study, a multimodal biometric model for user identification was constructed using the VGG19-SC method. The user was recognized from features of their iris and face using feature-level fusion and two separate score-level fusion approaches. The study investigated how to create a multimodal biometric model with these two traits. As previously stated, the face is one of the two traits of the multimodal biometric identification system. Two VGG19-SCs were employed in the proposed model, one to identify each trait. The accuracy of the score-level fusion technique was generally higher than that of the feature-level fusion method (99.30%). The experimental outcomes demonstrated the VGG19-SC's outstanding performance. They also showed that identification methods using two biometric traits, instead of relying on just one, may yield superior outcomes. End-user acceptance and adoption of multi-biometric imaging systems might be a drawback. In future studies, the researchers intend to develop models appropriate for each trait in place of the pre-trained model.

Declarations

Data availability

Not applicable.

Code availability

Not applicable.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflict of interest

The authors declare no competing interests.

Acknowledgments

There is no funding

References

[1] Rane, M., Latne, T. and Bhadade, U., 2020. Biometric recognition using fusion. In ICDSMLA 2019: Proceedings of the 1st International Conference on Data Science, Machine Learning and Applications (pp. 1320-1329). Springer Singapore.
[2] Alghamdi, M., 2023. Machine Learning Methods for Human Identification from Dorsal Hand Images (Doctoral dissertation, Lancaster University).
[3] Jadhav, S.B., Deshmukh, N.K. and Humbe, V.T., 2022. HDL-PI: hybrid DeepLearning technique for person identification using multimodal fingerprint, iris, and face biometric features. Multimedia Tools and Applications, pp.1-26.
[4] Safavipour, M.H., Doostari, M.A. and Sadjedi, H., 2023. Deep Hybrid Multimodal Biometric Recognition System Based on Features-Level Deep Fusion of Five Biometric Traits. Computational Intelligence and Neuroscience, 2023.
[5] Winston, J.J., Hemanth, D.J., Angelopoulou, A. and Kapetanios, E., 2022. Hybrid deep convolutional neural models for iris image recognition. Multimedia Tools and Applications, pp.1-23.
[6] Qiu, S., Zhao, H., Jiang, N., Wang, Z., Liu, L., An, Y., Zhao, H., Miao, X., Liu, R. and Fortino, G., 2022. Multi-sensor information fusion based on machine learning for real applications in human activity recognition: State-of-the-art and research challenges. Information Fusion, 80, pp.241-265.
[7] Gona, A. and Subramoniam, M., 2022. Convolutional neural network with improved feature ranking for the robust multimodal biometric system. Computers and Electrical Engineering, 101, p.108096.
[8] Omara, I., Hagag, A., Chaib, S., Ma, G., Abd El-Samie, F.E. and Song, E., 2020. A hybrid model combining learning distance metric and DAG support vector machine for multimodal biometric recognition. IEEE Access, 9, pp.4784-4796.
[9] Freire-Obregón, D., Rosales-Santana, K., Marín-Reyes, P.A., Penate-Sanchez, A., Lorenzo-Navarro, J. and Castrillón-Santana, M., 2021. Improving user verification in human-robot interaction from audio or image inputs through sample quality assessment. Pattern Recognition Letters, 149, pp.179-184.
[10] Al Alkeem, E., Yeun, C.Y., Yun, J., Yoo, P.D., Chae, M., Rahman, A. and Asyhari, A.T., 2021. Robust deep identification using ECG and multimodal biometrics for industrial internet of things. Ad Hoc Networks, 121, p.102581.
[11] Balaji, S. and Rahamathunnisa, U., 2023. Multimodal Biometrics Authentication in Healthcare Using Improved Convolution Deep Learning Model. International Journal on Artificial Intelligence Tools, 32(03), p.2340013.
[12] Farouk, R.H., Mohsen, H. and El-Latif, Y.M.A., 2022. A Proposed Biometric Technique for Improving Iris Recognition. International Journal of Computational Intelligence Systems, 15(1), p.79.
[13] Ghosh, M., Dey, A. and Kahali, S., 2022. Type-2 fuzzy blended improved DS evidence theory-based decision fusion for face recognition. Applied Soft Computing, 125, p.109179.
[14] SaiTeja, C. and Seventline, J.B., 2023. A hybrid learning framework for multimodal facial prediction and recognition using improvised non-linear SVM classifier. AIP Advances, 13(2).
[15] El-Rahiem, B.A., Amin, M., Sedik, A., Samie, F.E.A.E. and Iliyasu, A.M., 2021. An efficient multi-biometric cancellable biometric scheme based on deep fusion and deep dreams. Journal of Ambient Intelligence and Humanized Computing, pp.1-13.
[16] Vasavi, J. and Abirami, M.S., 2023. Novel Multimodal Biometric Feature Extraction for Precise Human Identification. Intelligent Automation & Soft Computing, 36(2).
[17] Alay, N. and Al-Baity, H.H., 2020. Deep learning approach for multimodal biometric recognition system based on a fusion of iris, face, and finger vein traits. Sensors, 20(19), p.5523.

Editorial decision: Minor revisions
03 Apr, 2024
Reviewers agreed at journal
06 Mar, 2024
Reviewers invited by journal
06 Mar, 2024
Editor invited by journal
29 Feb, 2024
First submitted to journal
28 Feb, 2024
