Software :-
1) Python
Python is a high-level, flexible, and simple programming language. It is interpreted and widely used for general-purpose programming. The language supports multiple paradigms, including structured, object-oriented, imperative, functional, and procedural styles. Python uses whitespace indentation to delimit code blocks, which allows programs to be written in fewer lines of code. It is very flexible because it can use modular components written in other programming languages such as C++ and Java. It has a large number of libraries such as NumPy, SciPy, and Matplotlib, along with specialized libraries such as Biopython and Astropy.
2) OpenCV
OpenCV was created by Intel to accelerate commercial applications of computer vision, with computational efficiency and a strong focus on real-time applications in mind. It is an open-source computer-vision library, free for commercial, public, and academic use, and it greatly simplifies computer-vision programming. OpenCV can take advantage of multi-core processing and offers many advanced capabilities, such as face detection, face tracking, face recognition, Kalman filtering, and a variety of artificial intelligence (AI) methods, in plug-and-play form. OpenCV is a multi-platform framework that supports Windows, Linux, iOS, Android, and Mac OS X, and it has C++, C, Python, and Java interfaces.
3) Haar Cascade Classifier
It uses Haar-like features for object detection. Haar-like features are digital image features used in object recognition. The detection algorithm is based on the approach to upright human face detection introduced by Viola and Jones. The classifier is trained by giving the algorithm a large number of positive images (containing the object) and negative images (not containing the object).
It consists of four main steps :-
a) Haar Features: Haar features, digital features like those shown in the images above, are used. Each feature is compared with the image, and a single value is obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.
b) Integral Image: An integral image is a two-dimensional lookup table in the form of a matrix with the same size as the original image, in which each entry holds the sum of all pixels above and to the left of it, so the sum under any rectangle can be computed from a few neighbouring entries. Using it, the Haar features are calculated all over the image, which may yield a very large number of features per image; for each feature, the threshold that best separates positives from negatives is found, and the features with the minimum error rate are selected.
c) Adaboost: Summing up all the image pixels and then subtracting them to obtain a single value for every feature is not efficient for real-time applications, so instead of summing all the pixels directly, the integral image is used. AdaBoost then reduces the redundant features: it separates relevant features from irrelevant ones and assigns a weight to each. It constructs a strong classifier by combining many weak classifiers.
d) Cascading: The strong classifier is used to create a cascade, which contains all the mathematical calculations required to detect the targeted animals from the given training dataset. This cascade is stored in an XML format that OpenCV uses to detect objects in real time.
4) K-nearest Neighbours Algorithm
The K-nearest neighbours (KNN) algorithm is a supervised-learning classifier that is often used to classify complex data. Given a labeled training dataset consisting of pairs of points (x, y), it learns a function h: X → Y that predicts y from x.
It has two defining properties:
a) Non-parametric:
It makes no assumptions about the function h, avoiding the dangers of modeling the underlying distribution of the data.
b) Instance-based:
The algorithm doesn’t learn any model. Instead, it chooses to memorize the training instances for the prediction phase.
dist((x, y), (a, b)) = √((x − a)² + (y − b)²)
The algorithm uses the Euclidean distance formula to find the nearest neighbours around the target point, and it boils down to forming a majority vote among the K most similar training instances for a given "unseen" observation. A larger K makes the vote more robust to noisy samples, although too large a value can blur the class boundaries.
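The distance computation and majority vote described above can be sketched in a few lines of plain Python; the training points, labels, and function name are illustrative:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points under Euclidean distance. `train` is a list of
    ((x, y), label) pairs; all names here are illustrative."""
    # Sort training points by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Two toy clusters around (0, 0) and (5, 5).
train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (0.5, 0.5)))  # → a
```

An odd k is the usual choice for two-class problems, since it prevents tied votes.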
Deep learning, a subset of machine learning, which in turn is a subset of artificial intelligence (AI), uses networks capable of learning from data that is unstructured or unlabeled. The approach utilized in this project is the Convolutional Neural Network (CNN), together with the Haar cascade classifiers that help us in the detection of objects.
1. CNN:
The convolutional neural network, or CNN for short, is a specialized kind of neural network model designed for working with two-dimensional image data, although it can also be used with one-dimensional and three-dimensional data. Central to the convolutional neural network is the convolutional layer that gives the network its name. This layer performs an operation known as "convolution". In the context of a convolutional neural network, a convolution is a linear operation that involves the multiplication of a set of weights with the input, much like a traditional neural network. Since the technique was designed for two-dimensional input, the multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or a kernel. The filter is smaller than the input, and the type of multiplication applied between a filter-sized patch of the input and the filter is a dot product: an element-wise multiplication between the filter-sized patch of the input and the filter, which is then summed, always resulting in a single value. Because it produces a single value, the operation is conventionally referred to as the "scalar product". Using a filter smaller than the input is intentional, because it allows the same filter (set of weights) to be multiplied with the input array multiple times at distinct points on the input. Specifically, the filter is applied systematically to every overlapping filter-sized patch of the input, left to right, top to bottom.
This systematic application of the same filter across an image is a powerful idea. If the filter is designed to detect a specific type of feature in the input, then applying that filter systematically across the whole input image gives the filter a chance to discover that feature anywhere in the image. This capability is commonly referred to as translation invariance: the network is concerned with whether the feature is present rather than where exactly it appears.
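The sliding dot product described above can be sketched directly in NumPy. The image, kernel, and function name are illustrative; real CNN layers add channels, strides, padding, and learned weights:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution as used in CNN layers: the kernel is slid
    left-to-right, top-to-bottom, and each overlapping patch is
    multiplied element-wise with the kernel and summed to one value."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image with a vertical edge, and a simple vertical-edge filter.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d(image, kernel))  # responds (value 2) only where the edge is
```

Note how the same filter finds the edge in every row regardless of position, which is exactly the translation invariance described above.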
2. Training the data set:
A data set is a collection of data: it may be a collection of images, alphabets, numbers, documents, or files. The data set we used for object detection is a collection of images of all the objects that are to be identified, with several different images of each object. The more images of each object the data set contains, the more the accuracy can be improved. The important thing to remember is that the data in the data set must be labeled. There are actually three data sets: the training data set, the validation data set, and the testing data set. The training data set usually contains around 85-90% of the total labeled data; it is used to train the machine, and the model is obtained by training on it. The validation data set consists of around 5-10% of the total labeled data and is used for validation. The remaining data set is the testing data set, which is used to test the performance of the model.
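The split described above can be sketched as follows; the file names and the split function are illustrative, not the project's actual data loader:

```python
import random

def split_dataset(samples, train_frac=0.85, val_frac=0.10, seed=0):
    """Shuffle labeled samples and split them into training,
    validation, and test sets, using the fractions quoted in the
    text: ~85% train, ~10% validation, remainder test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    samples = samples[:]
    rng.shuffle(samples)
    n_train = int(len(samples) * train_frac)
    n_val = int(len(samples) * val_frac)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test

# Dummy labeled data: (image file name, class label) pairs.
data = [(f"img_{i}.jpg", i % 3) for i in range(100)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))  # → 85 10 5
```

Shuffling before splitting matters: without it, images of the same object could all end up in one split.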
3. Developing a real time object detector:
For developing a real-time object detector using deep learning and OpenCV, we need to access our webcam in an efficient way and then apply object detection to each and every frame. OpenCV must be installed on the system.
The deep neural network (dnn) module should be installed. First, we import all the required packages:
1. from imutils.video import VideoStream
2. from imutils.video import FPS
3. import numpy as np
4. import argparse
5. import imutils
6. import time
7. import cv2
The next step is to construct the argument parser and then parse the arguments.
--prototxt: provide path to the Caffe prototxt file.
--model: provide path to the pre-trained model.
--confidence: minimum probability threshold to filter weak detections.
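A minimal sketch of this argument parser follows; the example file names passed to parse_args are placeholders, supplied inline so the snippet runs without a command line:

```python
import argparse

# Argument parser mirroring the three options described above.
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
                help="path to the Caffe prototxt file")
ap.add_argument("-m", "--model", required=True,
                help="path to the pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
                help="minimum probability threshold to filter weak detections")

# Placeholder arguments stand in for the real command line here.
args = vars(ap.parse_args(["-p", "deploy.prototxt", "-m", "model.caffemodel"]))
print(args["confidence"])  # → 0.2 (the default, since -c was not given)
```

In the actual script, parse_args() would be called with no arguments so that the values come from the command line.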
The next step is to initialize the CLASS labels and corresponding random COLORS. Each object, when detected, is surrounded by a box with a predefined colour, so we assign each class a specific colour. After that we load our model, providing the paths to our prototxt and model files. With the help of imutils we read the video stream and set the number of frames per second, so that a predefined number of frames is loaded per second. Each frame is treated like an image, and these images are given as inputs to the model. The model processes each input image and produces an output image that carries labels: in practical terms, the raw input image is given to the model, the model processes it, and in the output image every object is identified, surrounded by a rectangular box, with the name of the object also displayed. We only observe the output video stream, not the input video stream.
Object detection is a computer technology used to identify different objects in digital images, such as humans and animals. Many applications depend on this technology, such as robotics, security, face detection, and medicine, to name a few. Object detection algorithms are mainly used to extract features in order to recognize instances of an object. Detecting an object like an animal in surveillance videos is a challenging task due to their different appearances and the variety of poses they can adopt.
To extract these features, we first train the program to detect the required objects, and it gives us an XML file of the object's required features. For this we give the algorithm a huge set of positive images, in which the object is present, and negative images, in which it is not. The algorithm uses various detection techniques, such as Haar features and neural networks, to find a set of usable features: it scans the images for thousands of candidate features and produces a set of useful ones in the form of an XML file. We then use these features to scan a digital image and decide whether the object is present, by checking whether the learned features appear in the image. If they are present, we take the image as positive, i.e. the object is present; otherwise we take it as negative, i.e. the object is absent. This is how an object is detected in an image.