Many researchers have explored this area. Pranali Loke et al. [1] presented a tool that converts hand movements used in ISL into natural language. Segmentation and classification procedures are applied to the data. The photos were taken with an Android app, and pattern recognition was performed in MATLAB. Fazlur Rahman Khan et al. [2] demonstrated a working prototype that can convert the 26 English alphabets from American Sign Language (ASL) into text. Hand motion tracking was performed using a Leap Motion Controller (LMC) as the interface, without any additional hardware. Cross-correlation, Artificial Neural Networks (ANNs), and Geometric Template Matching were used to test the effectiveness of the prototype. According to the test findings, Geometric Template Matching achieved the highest recognition accuracy among the detection algorithms compared. M. N. Pushpalatha et al. [3] presented a paper that translates sign language into text and voice with 92% accuracy using a feature extractor and PoseNet, facilitating communication for people with hearing and speech impairments. Images are captured with a webcam, and PoseNet and an ANN are used to classify common phrases used in daily life. The webcam tracks several body parts, whose movements are then converted into audio and text to show what the user is saying in real time. Nishi Intwala et al. [4] developed an ISL translator whose goal is to translate the sign gestures for the 26 English alphabets into their text equivalents and classify them into the corresponding letters. Preprocessing and feature selection were performed on the dataset, and the photographs were then fed to a CNN implemented in Python, yielding real-time recognition with an accuracy of 87.69%. A. J. Paul et al. [5] suggested a microcontroller architecture based on the ARM Cortex-M7 for detecting alphabets in American Sign Language. They significantly reduced accuracy loss by using interpolation as an augmentation alongside other methods, which allowed the framework to generalize well to previously unseen, noisy data. The inference speed of the model is 20 frames per second, and its post-quantization size is approximately 85 KB. A smart band featuring pressure sensors composed of nanocomposite materials was proposed by R. Ramalingame et al. [6]. The sensors are prepared with an improved synthesis procedure, and the band, worn on the arm, makes it possible to monitor active muscular contraction and relaxation. The smart band was tested on 10 individuals, each of whom performed the ASL numeral gestures from 0 to 9 ten times, producing 100 data sets per individual, all captured at 100 Hz. Feeding these data sets into a machine learning approach that selects the features, biases, and weights allows the gestures to be classified with an overall accuracy of 93%.
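Several of the systems above ([1] in particular, per Table 1's HSV entry) rely on color-space segmentation to isolate the hand before classification. The following is a minimal sketch of that step in Python with OpenCV; the threshold values and function name are illustrative assumptions, not details taken from the cited work.

```python
# A minimal sketch of HSV-based skin segmentation of the kind used in [1];
# the threshold values below are illustrative assumptions, not the paper's.
import cv2
import numpy as np

def segment_hand(image_bgr):
    """Isolate skin-colored regions as a first step before classification."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    # Rough skin-tone range in HSV; real systems tune these per dataset.
    lower = np.array([0, 40, 60], dtype=np.uint8)
    upper = np.array([25, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Morphological opening/closing to remove speckle noise in the mask.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)

# Usage: segmented = segment_hand(cv2.imread("gesture.jpg"))
```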
Shagun Gupta et al. [7] presented a technology that enables hand gestures to be used for communication. The framework captures a patient's hand motion using a webcam and decodes the signal it conveys; the signal is mapped to a message, which is then delivered as audio and text. Whole gestures, rather than individual alphabets, are used to converse. The device includes an alarm that goes off in an emergency and lets the patient choose their preferred communication language for comfort. L. Fernandes et al. [8] demonstrated a system for translating voice and text from sign language to spoken language and vice versa. When speech is provided, it is converted into the corresponding set of ASL gestures. The dataset is extensible, so additional languages can be added. Text acts as a bridge between voice and gesture in this system.
S. M. K. Hasan et al. [9] proposed a simple, inexpensive Bangla sign language translation (BSLT) system that can convert signs into written Bangla. They describe how the universal interpreter software (UIS) was developed and made available to users in Bangladesh and the US. An effective method for skin detection and feature extraction is proposed for this purpose. The technology can decode 11 Bengali numbers and 16 sentences, and achieves an accuracy of approximately 96.46% with the K-Nearest Neighbor technique. The method suggested by S. Masood et al. [10] employs sign language recognition to address the communication gap. A CNN (the Inception model) was trained for spatial features and a recurrent neural network (RNN) for temporal characteristics. The dataset consists of Argentinian Sign Language gestures from 46 categories. Over a large number of images, the suggested solution attained a high accuracy of 95.2%. R. J. Raghavan et al. [11] demonstrated a program that turns text input into an animated gesture sequence. Its three primary components are a linguistic synthesis system that transforms English text into ISL format, an interface where users can type words, and a virtual avatar that serves as an interpreter at the user interface. Both manual and non-manual motions, such as facial expressions and hand placement, are described using the LOTS notation. They also included the epenthesis movement, the inter-sign transition gesture, to reduce jitter when gesturing.
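The spatial/temporal split used in [10], a CNN per frame followed by an RNN over the frame sequence, can be outlined as below. This Keras sketch keeps the paper's 46-class output, but the layer sizes, frame count, and input resolution are illustrative assumptions rather than the authors' actual architecture (which used the Inception model).

```python
# Sketch of a CNN-for-spatial / RNN-for-temporal gesture classifier in the
# spirit of [10]; the architecture details here are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 46          # gesture categories in the Argentinian Sign Language set
FRAMES, H, W, C = 16, 64, 64, 3   # assumed clip length and frame size

# Per-frame CNN extracts spatial features.
cnn = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(H, W, C)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])

# The same CNN is applied to every frame, then an LSTM models the time axis.
model = models.Sequential([
    layers.TimeDistributed(cnn, input_shape=(FRAMES, H, W, C)),
    layers.LSTM(128),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```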
CNNs have also been used extensively for converting sign images to text. A CNN was fully integrated into an HMM by O. Koller et al. [12], with the CNN outputs evaluated within a Bayesian framework. On three difficult benchmark continuous sign language recognition tasks, their embedding outperforms the previous state of the art by a 15–38% relative reduction in word error rate, and by up to 20% absolute. They examine the effects of network pretraining and CNN architecture, weigh the advantages of combining models, and compare hybrid modeling to a tandem strategy. A. Ojha et al. [13] implemented a fingerspelling sign language converter: using a CNN, the software recognizes ASL gestures tracked through the computer's webcam and a desktop program, and instantly transforms them into text and voice. A vision-based technique that translates sign language into text was created by K. Bantupalli et al. [14] to improve communication between signers and non-signers. The suggested method extracts temporal and spatial information from video sequences: a CNN first detects spatial features, and an RNN is then trained on the temporal features. An ASL dataset was used. The LMC serves as the core of the Arabic Sign Language Recognition (ArSLR) approach developed by M. Mohandes et al. [15]. The device detects and tracks the fingertips and palm to provide information on posture and action. They compare the capabilities of the Naive Bayes classifier to those of Multilayer Perceptron (MLP) neural networks. The proposed method yields a classification accuracy of 98% for the Arabic sign alphabets with the Naive Bayes classifier and greater than 99% with the MLP.
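The Naive Bayes versus MLP comparison in [15] amounts to fitting both classifiers on the same per-frame hand features and comparing held-out accuracy. A hedged sketch with scikit-learn follows; the synthetic features, their dimensionality, and the hyperparameters are assumptions for illustration only.

```python
# Sketch of the classifier comparison in [15]: Naive Bayes vs. an MLP on
# per-frame hand features (e.g., fingertip and palm positions from an LMC).
# The synthetic data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2800, 15))          # 2800 frames, 15 assumed geometric features
y = rng.integers(0, 28, size=2800)       # 28 Arabic alphabet classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("MLP", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))]:
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```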
Adversarial, multitask, and transfer learning were used by A. Orbay et al. [16] to search for semi-supervised tokenization techniques that do not require extra labeling. They conducted numerous experiments contrasting all the methods in various settings. Using only sentences as the target annotation, the suggested methodology produces scores of 36.28 ROUGE and 13.25 BLEU-4, a 4-point improvement in BLEU-4 and a 5-point improvement in ROUGE over the state of the art. K. K. Dutta et al. [17] proposed a system that aims to give speech to the mute. With the help of MATLAB, double-handed Indian Sign Language gestures are captured as images, processed, and then converted to the corresponding speech and text.
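The BLEU-4 scores reported for [16] follow the standard n-gram-precision definition and can be computed with an off-the-shelf scorer. The sketch below uses NLTK's corpus_bleu as an illustrative stand-in; the toy sentences are assumptions, and the paper's exact evaluation pipeline may differ.

```python
# Illustrative BLEU-4 computation with NLTK; not the evaluation code of [16].
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One hypothesis translation with one reference (toy example).
references = [[["the", "weather", "will", "be", "sunny", "tomorrow"]]]
hypotheses = [["the", "weather", "is", "sunny", "tomorrow"]]

# BLEU-4: uniform weights over 1- to 4-gram precision, smoothed for short texts.
score = corpus_bleu(references, hypotheses,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {100 * score:.2f}")
```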
For statistical sign language translation and recognition, J. Forster et al. [18] introduced a large-vocabulary, video-based corpus of German Sign Language known as the RWTH-PHOENIX-Weather corpus. The collection contains weather forecasts acquired from German public television and manually glossed to distinguish between sign variants. Time boundaries have also been marked at the sentence and gloss levels. Additionally, a state-of-the-art automatic speech recognition system has been used to semi-automatically transcribe the spoken German weather forecasts. The glosses have also been translated into spoken German a second time to account for allowable translation variability. Along with the corpus, experimental baseline results are provided for head and hand tracking and for statistical recognition and translation of sign language.
Two real-time methods based on hidden Markov models (HMMs) were demonstrated by T. Starner et al. [19] for recognizing continuous ASL sentences while tracking the user's bare hands. The first device monitors the user with a camera mounted on a desk and achieves a word accuracy of 92 percent. The second device mounts the camera in the user's headgear and, employing a different technique, achieves an accuracy of 98 percent (97 percent with an unconstrained grammar). The lexicon used in both tests is 40 words.
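The HMM-based recognition in [19] can be approximated as training one HMM per vocabulary word and classifying a new clip by maximum likelihood. The sketch below uses the hmmlearn library; the 8-dimensional hand features, state count, and toy data are illustrative assumptions, not the paper's actual setup.

```python
# Sketch of HMM-based isolated-word recognition in the style of [19], using
# hmmlearn; feature dimensionality and toy data are illustrative assumptions.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(1)

def train_word_model(sequences):
    """Fit one HMM per vocabulary word from its training clips."""
    X = np.concatenate(sequences)            # stack all frames
    lengths = [len(s) for s in sequences]    # per-clip frame counts
    model = GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

# Toy data: 5 clips per word, each a sequence of 8-D hand-feature vectors.
vocab = ["hello", "thanks"]
word_models = {w: train_word_model(
                   [rng.normal(loc=i, size=(30, 8)) for _ in range(5)])
               for i, w in enumerate(vocab)}

# Recognition: pick the word whose HMM assigns the clip the highest likelihood.
test_clip = rng.normal(loc=1, size=(30, 8))
print(max(word_models, key=lambda w: word_models[w].score(test_clip)))
```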
For individuals with hearing impairment, R. Kaur et al. [20] created an SMS generator in ISL. For this system they developed an advanced visual user interface, a Short Message Service (SMS)-to-speech translation module, and a sign language to English translator. The graphical interface enables a number of signs to be characterized through different services, and an animation module makes it possible to play the sign motions as GIF files.
Table 1. Comparison of the surveyed sign language translation and recognition systems.
Citation | Year of Publication | Sign Language | Technique Used | Dataset | Signs Converted | Accuracy
[1] | 2017 | ISL | Hue Saturation Value (HSV) model | Self-Dataset | 25 alphabets | Not mentioned
[2] | 2016 | ASL | ANN | Self-Dataset | 26 alphabets | 52.56%
[3] | 2022 | ASL | PoseNet, ANN, MobileNet | Self-Dataset | 25 alphabets | 92.8%
[4] | 2019 | ISL | CNN (MobileNet) | Self-Dataset | 26 alphabets | 87.69%
[5] | 2021 | ASL | CNN | 1. Kaggle Sign MNIST dataset 2. Kaggle ASL dataset 3. Self-Dataset 4. Kaggle ASL Alphabet Test dataset | 26 alphabets | Not mentioned
[6] | 2021 | ASL | SVM, ELM, LDA | Self-Dataset | 26 alphabets | 93% |
[7] | 2020 | ASL | OpenCV | Self-Dataset | 26 alphabets | 100%
[8] | 2020 | ASL | OpenCV, Neural Networks | 1. MNIST Dataset 2. Self-Dataset | 26 alphabets | Not mentioned
[9] | 2016 | Bangla Sign Language | PCA, LSVM, KNN | Self-Dataset | 16 Bengali words & 11 Bengali numbers. | 96.46% |
[10] | 2018 | Argentinian Sign Language | CNN, RNN, OpenCV | Argentinian Sign Language Dataset | 2300 videos for 46 gesture categories | 95.21%
[11] | 2014 | ISL | Support Vector Machine (SVM) | Self-Dataset | English sentences | Not mentioned |
[12] | 2018 | German Sign Language | CNN, HMM (Hidden Markov Model) | Three state-of-the-art continuous sign language datasets: RWTH-PHOENIX-Weather 2012, RWTH-PHOENIX-Weather 2014 and SIGNUM | 266, 1080 and 465 signs for PHOENIX 2012, PHOENIX 2014 and SIGNUM, respectively | Not mentioned
[13] | 2020 | ASL | CNN | Self-Dataset | 26 alphabets | 95% |
[14] | 2018 | ASL | CNN, RNN | Image-based dataset created by Neidle et al. | 2400 images | 99%
[15] | 2014 | Arabic Sign Language | Naive Bayes Classifier, MLP Neural Network | Self-Dataset | 2800 frames of data (28 alphabets) | 98.3% |
[16] | 2020 | German Sign Language | CNN, RNN | 1. Dataset of images with weak annotations collected from 3 sources, prepared by Koller et al. 2. Self-Dataset | 30 signs and seven signers | Not mentioned
[17] | 2015 | ISL | Minimum EigenValue Algorithm | Self-Dataset | 125 images | Not mentioned |
[18] | 2012 | German Sign Language | RASR | 1. The SIGNUM Database 2. RWTH-PHOENIX-Weather dataset 3. The ASL Lexicon Video Dataset | 369 images | Not mentioned
[19] | 1998 | ASL | HMM (Hidden Markov Model) | Self-Dataset | 500 sentences | Not mentioned |
[20] | 2017 | ISL | SMS, LFG (Lexical Functional Grammar) method | Self-Dataset | 250 sentences which include basic hand-shapes | Not mentioned |