Journal ArticleDOI

Sign Language Gesture Recognition with Convolutional-Type Features on Ensemble Classifiers and Hybrid Artificial Neural Network

20 Jul 2022 - Applied Sciences - Vol. 12, Iss. 14, Article 7303
TL;DR: The proposed methodologies are able to handle a diverse variety of images that include labyrinthine backgrounds, user-specific distinctions, minuscule discrepancies between classes and image alterations, while producing accuracies comparable with the state-of-the-art literature.
Abstract: The proposed research deals with constructing a sign gesture recognition system to enable improved interaction between sign and non-sign users. With respect to this goal, five types of features are utilized—hand coordinates, convolutional features, convolutional features with finger angles, convolutional features on hand edges and convolutional features on binary robust invariant scalable keypoints—and trained on ensemble classifiers to accurately predict the label of the sign image provided as input. In addition, a hybrid artificial neural network is also constructed that takes two of the aforementioned features, namely convolutional features and convolutional features on hand edges, to precisely locate the hand region of the sign gesture under consideration before classification. Experiments are also performed with convolutional neural networks on those benchmark datasets that are not accurately classified by the previous two methods. Overall, the proposed methodologies are able to handle a diverse variety of images that include labyrinthine backgrounds, user-specific distinctions, minuscule discrepancies between classes and image alterations. As a result, they produce accuracies comparable with the state-of-the-art literature.
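A minimal sketch of the feature-plus-ensemble idea described above: convolutional features are extracted from a pretrained CNN (here applied to Canny hand edges, one of the five feature variants) and fed to an ensemble classifier. The specific choices below (VGG16, Canny thresholds, RandomForestClassifier) are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: convolutional features on hand edges, classified by an ensemble.
import numpy as np
import cv2
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.ensemble import RandomForestClassifier

# Pretrained backbone used purely as a fixed feature extractor (assumption).
cnn = VGG16(weights="imagenet", include_top=False, pooling="avg")

def convolutional_edge_features(image_bgr):
    """Return a CNN feature vector computed on the Canny edge map."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)               # hand-edge variant
    edges = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR) # back to 3 channels for VGG
    x = cv2.resize(edges, (224, 224)).astype("float32")
    x = preprocess_input(x[np.newaxis, ...])
    return cnn.predict(x, verbose=0).ravel()        # 512-d pooled features

# Hypothetical usage, with images/labels as placeholders:
# features = np.stack([convolutional_edge_features(img) for img in images])
# clf = RandomForestClassifier(n_estimators=300).fit(features, labels)
```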


Citations
Journal Article
TL;DR: In this paper, supervised machine learning models are implemented to classify water quality indexes, with SMOTE used to handle the class imbalance in the dataset; the resulting predictions and their explanations make the proposed models more interpretable, accurate and efficient.
Abstract: Water is known as a "universal solvent" and is extraordinarily vulnerable to contamination. Water quality standards are developed based on scientific evidence about the effects of hazardous compounds on a given quantity of water. Machine learning classification techniques can be employed to assess water quality status. In this work, supervised machine learning models are implemented to classify water quality indexes, and SMOTE analysis is used to handle the imbalance in the dataset. An artificial neural network model is built using features such as oxygen, pH, temperature, total suspended sediment, turbidity, nitrogen and phosphorus as inputs and a water quality check as the target variable. This target variable is created using the Canadian Council of Ministers of the Environment Water Quality Index, and the model achieves an accuracy of 87%. Classification is also performed with an XGBoost model, which achieves an accuracy of 90%. Explanations of these models' predictions for individual data instances are produced using explainable artificial intelligence tools such as LIME and SHAP. The resulting interpretations make the proposed models more interpretable, accurate and efficient. This research benefits readers by clarifying which features have the greatest influence on water quality across different machine learning algorithms, helping developers gain insight into the significant factors behind poor water quality and how to address them.
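A minimal sketch of the cited workflow, assuming the standard Python stack (imbalanced-learn for SMOTE, xgboost, shap); the dataset path, column names and hyperparameters are hypothetical placeholders.

```python
# Sketch: rebalance with SMOTE, classify with XGBoost, explain with SHAP.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import shap

df = pd.read_csv("water_quality.csv")          # hypothetical dataset path
X = df[["oxygen", "pH", "temperature", "tss",  # hypothetical column names
        "turbidity", "nitrogen", "phosphorus"]]
y = df["quality_class"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # oversample minority

model = XGBClassifier(n_estimators=200, max_depth=4).fit(X_bal, y_bal)
print("accuracy:", model.score(X_te, y_te))

# Per-instance feature attributions, as in the paper's LIME/SHAP analysis.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
```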
References
Posted Content
TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
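The PReLU activation and the accompanying initialization are simple enough to show directly. The module below mirrors torch.nn.PReLU with the per-channel slope initialised to 0.25 as in the paper; the last two lines show the paper's robust ("Kaiming") initialization as exposed by PyTorch. Layer sizes are illustrative.

```python
# Sketch: PReLU with a learnable per-channel negative slope, plus He init.
import torch
import torch.nn as nn

class PReLU(nn.Module):
    def __init__(self, num_channels: int, init_a: float = 0.25):
        super().__init__()
        # One learnable slope per channel, initialised to 0.25 as in the paper.
        self.a = nn.Parameter(torch.full((num_channels,), init_a))

    def forward(self, x):                      # x: (N, C, H, W)
        a = self.a.view(1, -1, 1, 1)
        return torch.where(x >= 0, x, a * x)   # identity for x>=0, a*x otherwise

# The paper's initialization, accounting for the rectifier nonlinearity:
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
nn.init.kaiming_normal_(conv.weight, a=0.25, nonlinearity="leaky_relu")
```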

11,866 citations

Proceedings ArticleDOI
06 Nov 2011
TL;DR: A comprehensive evaluation on benchmark datasets reveals BRISK's adaptive, high-quality performance on par with state-of-the-art algorithms, albeit at a dramatically lower computational cost (an order of magnitude faster than SURF in some cases).
Abstract: Effective and efficient generation of keypoints from an image is a well-studied problem in the literature and forms the basis of numerous Computer Vision applications. Established leaders in the field are the SIFT and SURF algorithms, which exhibit great performance under a variety of image transformations, with SURF in particular considered the most computationally efficient among the high-performance methods to date. In this paper we propose BRISK, a novel method for keypoint detection, description and matching. A comprehensive evaluation on benchmark datasets reveals BRISK's adaptive, high-quality performance on par with state-of-the-art algorithms, albeit at a dramatically lower computational cost (an order of magnitude faster than SURF in some cases). The key to speed lies in the application of a novel scale-space FAST-based detector in combination with the assembly of a bit-string descriptor from intensity comparisons retrieved by dedicated sampling of each keypoint neighborhood.
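OpenCV ships a BRISK implementation, so the detector/descriptor described above can be exercised in a few lines. The image paths below are placeholders; Hamming-distance matching is the natural metric for BRISK's binary descriptors.

```python
# Sketch: BRISK keypoint detection, description and matching with OpenCV.
import cv2

brisk = cv2.BRISK_create()                      # scale-space FAST-based detector
img1 = cv2.imread("sign_a.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("sign_b.png", cv2.IMREAD_GRAYSCALE)

kp1, des1 = brisk.detectAndCompute(img1, None)  # binary bit-string descriptors
kp2, des2 = brisk.detectAndCompute(img2, None)

# Brute-force matching under Hamming distance, with cross-checking.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance {matches[0].distance}")
```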

3,292 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work presents a weakly supervised framework with deep neural networks for vision-based continuous sign language recognition, where ordered gloss labels, but no exact temporal locations, are available with the video of a sign sentence, and the amount of labeled sentences for training is limited.
Abstract: This work presents a weakly supervised framework with deep neural networks for vision-based continuous sign language recognition, where ordered gloss labels, but no exact temporal locations, are available with the video of a sign sentence, and the amount of labeled sentences for training is limited. Our approach addresses the mapping of video segments to glosses by introducing a recurrent convolutional neural network for spatio-temporal feature extraction and sequence learning. We design a three-stage optimization process for our architecture. First, we develop an end-to-end sequence learning scheme and employ connectionist temporal classification (CTC) as the objective function for alignment proposal. Second, we take the alignment proposal as stronger supervision to tune our feature extractor. Finally, we optimize the sequence learning model with the improved feature representations, and design a weakly supervised detection network for regularization. We apply the proposed approach to a real-world continuous sign language recognition benchmark, and our method, with no extra supervision, achieves results comparable to the state-of-the-art.
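The first optimization stage hinges on CTC, which scores an ordered gloss sequence against per-frame predictions by marginalising over all monotonic alignments, so no frame-level labels are needed. A minimal PyTorch sketch follows; the shapes, vocabulary size and random inputs are illustrative, not the paper's actual architecture.

```python
# Sketch: CTC as an alignment-free objective for gloss sequence learning.
import torch
import torch.nn as nn

T, N, V = 120, 4, 300          # frames, batch size, gloss vocabulary (blank = 0)
# Stand-in for the CNN-RNN's per-frame gloss log-probabilities:
log_probs = torch.randn(T, N, V + 1, requires_grad=True).log_softmax(dim=-1)
targets = torch.randint(1, V + 1, (N, 12))     # ordered gloss labels per video
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)      # sums over all monotonic frame-to-gloss alignments
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                # gradients flow back into the feature extractor
```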

255 citations

Journal ArticleDOI
TL;DR: This paper targets the Indian sign language recognition area using dynamic hand gesture recognition techniques in a real-time scenario; such a system would be helpful in the teaching of, and communication with, hearing-impaired persons.
Abstract: Needs and new technologies always inspire people to devise new ways to interact with machines. This interaction can serve a specific purpose or provide a framework applicable to many applications. Sign language recognition is a very important area where ease of interaction with humans or machines will help a lot of people. India currently has 2.8 million people who cannot speak or cannot hear properly. This paper targets the Indian sign language recognition area using dynamic hand gesture recognition techniques in a real-time scenario. The captured video was converted to the HSV color space for pre-processing, and segmentation was then performed based on skin pixels. Depth information was also used in parallel to obtain more accurate results. Hu moments and motion trajectories were extracted from the image frames, and the gestures were classified with a Support Vector Machine. The system was tested with a webcam as well as with the MS Kinect. This type of system would be helpful in the teaching of, and communication with, hearing-impaired persons.
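A rough sketch of the described per-frame pipeline with OpenCV: HSV conversion, skin-pixel segmentation, and log-scaled Hu moments feeding an SVM. The skin-color thresholds below are common illustrative values, not the authors' exact ones, and the usage lines are placeholders.

```python
# Sketch: HSV skin segmentation + Hu-moment shape features + SVM.
import cv2
import numpy as np
from sklearn.svm import SVC

def frame_features(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 255, 255))  # rough skin range
    hu = cv2.HuMoments(cv2.moments(mask)).ravel()          # 7 shape invariants
    # Log-scale the Hu moments so their magnitudes are comparable.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

# Hypothetical usage over a gesture's frames:
# features = np.stack([frame_features(f) for f in frames]).mean(axis=0)
# clf = SVC(kernel="rbf").fit(gesture_features, labels)
```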

85 citations

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method outperforms state-of-the-art recognition accuracy under a leave-one-out evaluation strategy, evaluated on a public dataset of real depth images captured from various users.
Abstract: Sign language is the most natural and effective way of communication between deaf and hearing people. American Sign Language (ASL) alphabet recognition (i.e. fingerspelling) using a marker-less vision sensor is a challenging task due to the difficulties in hand segmentation and appearance variations among signers. Existing color-based sign language recognition systems suffer from many challenges such as complex backgrounds, hand segmentation, and large inter-class and intra-class variations. In this paper, we propose a new user-independent recognition system for the American Sign Language alphabet using depth images captured from the low-cost Microsoft Kinect depth sensor. Exploiting depth information instead of color images overcomes many problems owing to its robustness against illumination and background variations. The hand region can be segmented by applying a simple preprocessing algorithm to the depth image. Feature learning using convolutional neural network architectures is applied instead of classical hand-crafted feature extraction methods. Local features extracted from the segmented hand are effectively learned using a simple unsupervised Principal Component Analysis Network (PCANet) deep learning architecture. Two strategies of learning the PCANet model are proposed: training a single PCANet model from samples of all users, and training a separate PCANet model for each user. The extracted features are then recognized using a linear Support Vector Machine (SVM) classifier. The performance of the proposed method is evaluated on a public dataset of real depth images captured from various users. Experimental results show that the proposed method outperforms state-of-the-art recognition accuracy under a leave-one-out evaluation strategy.
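A simplified sketch of the depth pipeline: threshold the Kinect depth map to isolate the nearest object (assumed to be the hand), learn a compact unsupervised representation, and classify with a linear SVM. Note that sklearn's PCA is used here as a stand-in for the paper's PCANet, and the depth band and component count are illustrative assumptions.

```python
# Sketch: depth-based hand segmentation + unsupervised features + linear SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

def segment_hand(depth_mm, band=120):
    """Keep pixels within `band` mm of the nearest valid point (the hand)."""
    near = depth_mm[depth_mm > 0].min()
    mask = (depth_mm > 0) & (depth_mm < near + band)
    return (mask * depth_mm).astype(np.float32)

# Hypothetical usage with flattened segmented depth images:
# hands_train: (n_samples, H*W) array; y_train: fingerspelling labels
# pca = PCA(n_components=128).fit(hands_train)   # stand-in for PCANet
# clf = LinearSVC().fit(pca.transform(hands_train), y_train)
# accuracy = clf.score(pca.transform(hands_test), y_test)  # e.g. leave-one-out
```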

69 citations