This paper proposes a VGG19-SC-based face and iris biometric system. Figure 1 depicts the overall design of the proposed approach. The first two images captured are the iris and face of a user. The individuals' offline signatures, iris scans, and fingerprints are taken as a first step. The user's identity is then determined using the multimodal framework, which comprises two previously implemented models for the iris and face. VGG19 is a deep convolutional neural network architecture that has contributed to solving computer vision problems. The VGG19 image classification model, introduced in 2014 by the Visual Geometry Group at the University of Oxford, is renowned for its efficiency and simplicity. It is deeper than its predecessor, VGG16, and has 19 weight layers in total: 16 convolutional layers and 3 fully connected layers. VGG19's architecture of stacked convolutional layers lets it learn complex visual features from input images, which yields high accuracy in tasks such as object detection and localization.
The current evaluation reproduces the unimodal iris and face identification methods of earlier work while implementing the proposed multimodal biometric architecture using the iris, face, and signature traits. A separate unimodal offline signature biometric has also been developed. The three biometric solutions are combined to provide the proposed multimodal solution. Following past work on unimodal solutions, the accuracy of these models is assessed before they are incorporated into the multimodal solution.
3.1 Dataset
Face and iris biometric data will be collected with a digital camera from participants at the selected experimental organization. The photos will measure 640 × 480 pixels and will cover the faces and irises of 190 people, with three separate samples per person. Images of both biometric traits will be resized to 128 × 128 pixels without otherwise altering them. Each photograph should have a bright, uniformly colored backdrop and uniform lighting. The dataset will therefore contain 570 photos per modality. 60% will be used to train the system, while 40% will be used for authentication. The dataset partition is based on the random-sampling cross-validation approach.
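As an illustration of the partition described above, the following minimal sketch splits the 570 images of one modality into 60% training and 40% authentication subsets by random sampling. The function name, the fixed seed, and the `(subject, sample)` identifiers are assumptions made for illustration and are not part of the original protocol.

```python
import random


def split_dataset(num_subjects=190, samples_per_subject=3,
                  train_fraction=0.6, seed=0):
    """Randomly split (subject, sample) identifiers into two subsets."""
    # One identifier per image: 190 subjects x 3 samples = 570 images.
    items = [(subject, sample)
             for subject in range(num_subjects)
             for sample in range(samples_per_subject)]
    rng = random.Random(seed)  # fixed seed (assumed) for repeatability
    rng.shuffle(items)
    n_train = round(train_fraction * len(items))
    return items[:n_train], items[n_train:]


train, test = split_dataset()
print(len(train), len(test))  # 342 228
```

Note that an image-level random split of this kind does not guarantee that every subject contributes a sample to the training subset; a per-subject split would be needed to enforce that.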
3.2 Pre-processing
To increase the contrast of a low-contrast image, its pixel values must be distributed more evenly over the range of permissible values. Histogram equalization is a method for doing so that is fully automated, easy to use, and parameter-free. It first transforms the probability density function (given by the normalized histogram) into a cumulative distribution function (CDF) by computing the histogram's running sum:
$$\bar{c}[p] = \sum_{k=0}^{p} \bar{h}[k], \quad p = 0, 1, \ldots, 255 \tag{1}$$
where \(p\) is the gray level, \(\bar{c}\) is the running sum of the histogram, and \(\bar{h}\) is the normalized histogram, stored as floating-point numbers.
When computing the running sum, it is best to initialize the first element of the array with \(\bar{c}[0] = \bar{h}[0]\) and then update \(\bar{c}[p] = \bar{c}[p-1] + \bar{h}[p]\) for each subsequent gray level. Once the CDF has been calculated, a pixel with gray level \(p\) is mapped to \(p' = \mathrm{round}(255 \cdot \bar{c}[p])\). Because the integral of a probability density function (PDF) is always 1, the CDF always evaluates to 1 at the largest gray level, that is, \(\bar{c}[255] = 1\). Consequently, the output falls within the required range of 0 to 255.
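The mapping in Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration that assumes an 8-bit grayscale input; the function name and the synthetic low-contrast test image are hypothetical.

```python
import numpy as np


def equalize_histogram(image):
    """Equalize an 8-bit grayscale image with the CDF mapping of Eq. (1)."""
    image = np.asarray(image, dtype=np.uint8)
    # Normalized histogram: fraction of pixels at each gray level 0..255.
    hist = np.bincount(image.ravel(), minlength=256) / image.size
    # Running sum of the histogram, i.e. the CDF.
    cdf = np.cumsum(hist)
    # Lookup table: new gray level = round(255 * cdf[old gray level]).
    lut = np.clip(np.round(255.0 * cdf), 0, 255).astype(np.uint8)
    return lut[image]


# Synthetic low-contrast image (assumed): gray levels only in 100..119.
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 120, size=(128, 128))
equalized = equalize_histogram(low_contrast)
print(low_contrast.min(), low_contrast.max())
print(equalized.min(), equalized.max())
```

Because the CDF reaches 1 at the brightest gray level present in the image, that level is always mapped to 255, which stretches the narrow input range across the available output range.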
3.3 Feature extraction of VGG19-SC
Convolutional neural networks (ConvNets) are deep neural networks most commonly used to analyze visual images. Neural networks are usually thought of in terms of matrix multiplication, but a ConvNet does not operate purely in this way: it relies on a special operation called convolution. In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other.
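To make the convolution operation concrete, the sketch below computes a discrete 2D convolution without padding ("valid" mode) using explicit loops. The function name, the toy image, and the difference kernel are assumptions chosen for illustration. Note that convolutional layers in most deep learning libraries omit the kernel flip and therefore compute a cross-correlation; since the kernels are learned, the distinction does not matter in practice.

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Discrete 2D convolution with no padding ('valid' output size)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # True convolution flips the kernel along both axes.
    flipped = kernel[::-1, ::-1]
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out


# Toy image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1]] * 4
kernel = [[1, -1]]  # horizontal difference kernel (assumed example)
edges = convolve2d_valid(image, kernel)
print(edges)
```

The output is non-zero only at the position of the edge, which is the sense in which a kernel "extracts" a feature from the image.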
Technically, in a deep learning model each input image passes through a series of layers: convolutional layers that extract features using kernels, pooling layers that reduce the spatial size, fully connected layers that transform the result into a vector, and a softmax classifier that performs the classification. Training the VGG19 network involves two kinds of propagation, forward and backward. In forward propagation, the network receives the data, sets up its filters, and initializes its parameters randomly. The data is then passed through the network, which calculates the loss value using these random parameters. The network then uses an optimization technique to reduce the loss. Backpropagation is used throughout this process to adjust the network's weights and parameters so that the loss value decreases as required. This prepares the parameters for the next iteration of forward propagation. The biggest difficulty with the CNN model is setting the hyperparameters to produce the desired results. Because the hyperparameters comprise all of the training variables that govern the model's structure and training rules, hyperparameter tuning means finding their ideal values. The hyperparameters include L1 and L2 regularization, the dropout rate, the learning rate, the number of epochs, batch normalization, and the batch size.
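The interplay of forward propagation, loss calculation, and backpropagation can be illustrated on a much smaller model than VGG19. The sketch below trains a single softmax layer with full-batch gradient descent on synthetic two-class data. All names, the data, and the hyperparameter values (learning rate, epoch count, L2 strength) are assumptions for illustration; dropout, L1 regularization, and mini-batching are omitted for brevity.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train(X, y, num_classes, learning_rate=0.1, epochs=50, l2=1e-3, seed=0):
    """Train one softmax layer; returns weights, bias, and loss history."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], num_classes))  # random init
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    losses = []
    for _ in range(epochs):
        # Forward propagation: class probabilities and loss value.
        P = softmax(X @ W + b)
        cross_entropy = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
        losses.append(cross_entropy + 0.5 * l2 * np.sum(W ** 2))
        # Backpropagation: gradients of the loss w.r.t. the parameters.
        grad_logits = (P - Y) / len(X)
        grad_W = X.T @ grad_logits + l2 * W
        grad_b = grad_logits.sum(axis=0)
        # Gradient-descent update prepares the next forward pass.
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
    return W, b, losses


# Synthetic data (assumed): two Gaussian blobs in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)),
               rng.normal(2, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
W, b, losses = train(X, y, num_classes=2)
print(losses[0], losses[-1])
```

The loss history shows the effect described above: each backward pass adjusts the parameters so that the next forward pass yields a lower loss.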
The VGG-19 is a pre-trained network chosen for its ability to recognize the human iris and face and for its simple network design. It is currently widely recommended for deep network techniques. The VGG-19 has input dimensions of 224 × 224 × 3. Figure 2 shows the VGG-19, which has 16 convolutional layers, 5 pooling layers, and 3 fully connected layers. The first convolution layer has 64 filters, each of size 3 × 3, and the resulting feature map measures 224 × 224 × 64. VGG-19 uses the Rectified Linear Unit (ReLU), a non-linear activation function applied to the convolution layer's output. ReLU replaces negative values with 0:
$$f(X) = \max(0, X) \tag{2}$$
where \(X\) is the convolutional layer's output.
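The ReLU activation and the feature-map dimensions quoted above can be checked with a short sketch. The per-block channel widths (64, 128, 256, 512, 512), the size-preserving 3 × 3 convolutions, and the halving 2 × 2 pooling are taken from the standard VGG-19 configuration and are assumptions here, since the text only states the first feature map size; the function names are hypothetical.

```python
import numpy as np


def relu(x):
    """Rectified Linear Unit: negative values are replaced with 0."""
    return np.maximum(0, x)


def vgg19_feature_map_sizes(input_size=224,
                            channels=(64, 128, 256, 512, 512)):
    """Trace (stage, height, width, channels) through the five blocks."""
    sizes = []
    size = input_size
    for c in channels:
        # Padded 3x3 convolutions keep the spatial size unchanged.
        sizes.append(("conv", size, size, c))
        # 2x2 max pooling with stride 2 halves height and width.
        size //= 2
        sizes.append(("pool", size, size, c))
    return sizes


print(relu(np.array([-2.0, 0.0, 3.0])))
for stage in vgg19_feature_map_sizes():
    print(stage)
```

Under these assumptions the first convolutional block produces the 224 × 224 × 64 feature map mentioned in the text, and the last pooling layer produces a 7 × 7 × 512 map that feeds the fully connected layers.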
3.4 Fusion Methods
3.4.1 Feature-stage Fusion
Fusion at the feature level combines the features that represent the individual traits. The features extracted from the individual traits are combined to produce a new feature set that represents the user. In the training phase of this fusion strategy, the model learns from the merged features. The outputs of the second fully connected layer of the iris CNN and the face CNN are combined: the feature vectors from the second fully connected layer of the two CNN models are concatenated into a single vector, as depicted in Fig. 3 (feature-level fusion) and defined by:
$$a = \left[a_r, a_f\right] \tag{3}$$
where \(a_r\) is the extracted facial image features, and \(a_f\) is the extracted iris image features.
The resulting vector \(a\) is then fed to the softmax classifier, which classifies the image based on a similarity score and identifies the individual.
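Feature-level fusion as described above amounts to concatenating the two feature vectors before classification. The sketch below uses random vectors in place of real CNN features; the feature length of 4096 (the width of the fully connected layers in the standard VGG-19), the random classifier weights, and all function names are assumptions for illustration.

```python
import numpy as np


def fuse_features(face_features, iris_features):
    """Concatenate two feature vectors into one fused vector."""
    return np.concatenate([face_features, iris_features])


def predict_identity(fused, weights, bias):
    """Linear layer followed by argmax over the class scores.

    Softmax is monotonic, so the argmax of the raw scores equals the
    argmax of the softmax probabilities.
    """
    logits = fused @ weights + bias
    return int(np.argmax(logits))


rng = np.random.default_rng(0)
feature_dim = 4096   # assumed length of each CNN's feature vector
num_subjects = 190   # number of enrolled people in the dataset

a_r = rng.normal(size=feature_dim)   # stand-in for facial features
a_f = rng.normal(size=feature_dim)   # stand-in for iris features
a = fuse_features(a_r, a_f)

# Untrained, random classifier weights (placeholder only).
weights = rng.normal(scale=0.01, size=(a.size, num_subjects))
bias = np.zeros(num_subjects)
identity = predict_identity(a, weights, bias)
print(a.shape, identity)
```

The fused vector is twice as long as either input, so the classifier that follows must be trained on the concatenated representation rather than reused from a unimodal model.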
3.4.2 Score-stage Fusion
The outputs of the second fully connected layer for the iris and face are sent to each CNN's softmax function, which produces the matching scores used by the score-level fusion methods. Score-level fusion comprises two steps. First, the score values generated by each CNN are normalized; then the matching scores of the VGG-19 networks are combined using a score combination rule. Finally, the model outputs the identity of the person with the highest total score. Two distinct score fusion rules are used: the product rule and the arithmetic mean rule. The arithmetic mean rule calculates the final score by summing the matching scores of each trait and dividing the result by the total number of traits, as shown in Fig. 4 (score-level fusion).
The arithmetic mean rule is computed as follows:
$$\frac{1}{j}\sum_{d=1}^{j} G_d \tag{4}$$
In Eq. (4), \(G_d\) is the score vector of trait \(d\), and \(j\) is the total number of traits.
Under the product rule, the fused score is determined by multiplying the scores of the two traits:
$$\prod_{d=1}^{j} G_d \tag{5}$$
In Eq. (5), \(G_d\) is the matching score vector of trait \(d\), and \(j\) is the number of traits.
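The two score fusion rules can be sketched as follows. The text does not state which normalization is applied to the scores, so min-max normalization is used here purely as an assumption; the function names and the example score vectors for three enrolled identities are also hypothetical.

```python
import numpy as np


def min_max_normalize(scores):
    """Rescale a score vector to [0, 1] (assumed normalization)."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def mean_rule(score_vectors):
    """Arithmetic mean rule: average the score vectors of all traits."""
    return np.mean(np.stack(score_vectors), axis=0)


def product_rule(score_vectors):
    """Product rule: multiply the score vectors of all traits."""
    return np.prod(np.stack(score_vectors), axis=0)


# Hypothetical softmax scores for three enrolled identities.
face_scores = [0.1, 0.7, 0.2]
iris_scores = [0.2, 0.5, 0.3]
normalized = [min_max_normalize(face_scores),
              min_max_normalize(iris_scores)]

mean_scores = mean_rule(normalized)
product_scores = product_rule(normalized)
# The predicted identity is the index with the highest fused score.
print(int(np.argmax(mean_scores)), int(np.argmax(product_scores)))
```

In this example both rules select the same identity. The product rule penalizes disagreement between traits more strongly, because a low score from either trait pulls the fused score toward zero.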
3.5 Soft-max classifier (SC)
The softmax function, also known as soft argmax, is used for multi-class classification. It takes as input a vector of \(R\) real numbers, where \(R\) is the number of classes, and normalizes it into a probability distribution whose values sum to 1. Many multi-class neural network classifiers use it because its output values always lie between zero and one. When the softmax function is used to determine the probability of each possible class, the predicted class is the one with the highest probability. The softmax function applies the exponential function to each component of the input vector and then normalizes the values by dividing by the sum of all the exponentials:
$$\sigma\left(\vec{y}\right)_j = \frac{e^{y_j}}{\sum_{i=1}^{R} e^{y_i}} \tag{6}$$
where \(j\) indexes the \(j\)th class, \(y_j\) is the \(j\)th component of the input, and \(\vec{y}\) is the input vector.
The output of the softmax function can be expressed as the vector
$$\text{Softmax result} = [S_1, S_2, \ldots, S_R] \tag{7}$$
where \(S_j\) is the probability that the input belongs to class \(j\).
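A minimal sketch of the softmax function in Eq. (6) is given below. Subtracting the maximum input before exponentiating is a standard numerical-stability step that leaves the result unchanged; the function name and the example input vector are assumptions for illustration.

```python
import numpy as np


def softmax(y):
    """Softmax of a 1-D input vector; the outputs sum to 1."""
    y = np.asarray(y, dtype=float)
    # Shifting by the maximum avoids overflow and cancels in the ratio.
    exps = np.exp(y - y.max())
    return exps / exps.sum()


scores = softmax([2.0, 1.0, 0.1])   # example input (assumed)
predicted_class = int(np.argmax(scores))
print(scores, scores.sum(), predicted_class)
```

Each output component lies strictly between 0 and 1, and the largest input always receives the largest probability, so the predicted class is simply the index of the maximum.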
The first four blocks of the pre-trained VGG-19 were frozen to build our CNN models, since the filters in the early layers scan the images for low-level features such as points and lines. Only the top layers, whose filters look for high-level features, are trained.
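The freezing strategy can be expressed as a plan that marks each layer as trainable or frozen. The sketch below is framework-free: it only builds the plan, using layer names that follow the Keras VGG19 naming scheme (`block{n}_conv{m}`, `block{n}_pool`, `fc1`, `fc2`, `predictions`). The function names are hypothetical, and the list omits layers without a freezing decision, such as the input and flatten layers.

```python
# Number of convolutional layers in each of the five VGG-19 blocks.
VGG19_BLOCKS = {1: 2, 2: 2, 3: 4, 4: 4, 5: 4}


def vgg19_layer_names():
    """List conv, pool, and fully connected layer names in order."""
    names = []
    for block, n_convs in VGG19_BLOCKS.items():
        names += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
        names.append(f"block{block}_pool")
    return names + ["fc1", "fc2", "predictions"]


def trainable_plan(frozen_blocks=4):
    """Map each layer name to True (trainable) or False (frozen)."""
    frozen_prefixes = tuple(f"block{b}_"
                            for b in range(1, frozen_blocks + 1))
    return {name: not name.startswith(frozen_prefixes)
            for name in vgg19_layer_names()}


plan = trainable_plan(frozen_blocks=4)
for name, trainable in plan.items():
    print(f"{name:15s} {'trainable' if trainable else 'frozen'}")
```

With a Keras model, such a plan could be applied by setting `layer.trainable = False` on each frozen layer before compiling, so that only the fifth block and the fully connected layers are updated during training.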