Author

Qing Zhu

Bio: Qing Zhu is an academic researcher from Beijing University of Technology. The author has contributed to research in topics: Recurrent neural network & Time delay neural network. The author has an h-index of 1 and has co-authored 2 publications receiving 18 citations.

Papers
Proceedings ArticleDOI
21 Jul 2017
TL;DR: A model based on a convolutional neural network (CNN) combined with a Long Short-Term Memory (LSTM) network is formulated to accomplish continuous recognition in a real-time sign language recognition (SLR) system.
Abstract: The goal of sign language recognition (SLR) is to translate sign language into text and provide a convenient communication tool between deaf-mute people and hearing people. In this paper, we formulate a model based on a convolutional neural network (CNN) combined with a Long Short-Term Memory (LSTM) network in order to accomplish continuous recognition. With the strong ability of the CNN, the information in frames captured from Chinese sign language (CSL) videos can be learned and transformed into feature vectors. Since a video can be regarded as an ordered sequence of frames, an LSTM model is connected to the fully-connected layer of the CNN. As a recurrent neural network (RNN), the LSTM is suitable for sequence learning tasks, with the capability of recognizing patterns defined by temporal distance. Compared with a traditional RNN, the LSTM performs better at storing and accessing information. We evaluate this method on our self-built dataset of 40 daily vocabularies. The experimental results show that the CNN-LSTM recognition method can achieve a high recognition rate with small training sets, which meets the needs of a real-time SLR system.
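The paper does not include code, but the pipeline it describes, a CNN applied to each frame whose fully-connected output feeds an LSTM over the frame sequence, can be sketched roughly as follows. The layer sizes, the 40-class head, and the class name CNNLSTM are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a per-frame CNN feeding an LSTM over the frame sequence
# (PyTorch); layer sizes and the 40-class output are assumed for illustration.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=40, feat_dim=256, hidden_dim=256):
        super().__init__()
        # Small CNN applied independently to every frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(64 * 4 * 4, feat_dim)        # "fully-connected layer" of the CNN
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, video):                            # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)                     # (batch*time, 3, H, W)
        feats = self.fc(self.cnn(frames).flatten(1))     # per-frame feature vectors
        out, _ = self.lstm(feats.view(b, t, -1))         # sequence modelling over frames
        return self.classifier(out[:, -1])               # classify from the last time step

model = CNNLSTM()
logits = model(torch.randn(2, 16, 3, 64, 64))            # 2 clips of 16 frames at 64x64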

28 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: An algorithm for dynamically rendering writing output based on a brush model, which achieves a more delicate rendering of Chinese calligraphy to enhance the user's results, produces the distinctive writing effect that separates Chinese calligraphy from other general writing, and greatly enhances the realism of Chinese calligraphy simulation.
Abstract: In order to simulate elaborate stroke trajectories in Chinese calligraphy, this paper puts forward, for the first time, research on writing momentum in the field of non-photorealistic rendering. Through an analysis of pen use in Chinese calligraphy, the writing momentum is divided into three parts, the center, the side, and the back of the writing brush, based on the angle of the brush holder. We design an algorithm for dynamically rendering writing output based on a brush model. From monitored parameters such as the direction, position, and normalized pressure of the pen, we calculate parameters such as the footprint direction, shape, size, and nib bending during writing. The algorithm can also judge the dynamic writing trend of stroke trajectories and even automatically generate forecast stroke trajectories. We achieve a more delicate rendering of Chinese calligraphy that enhances the user's results, and we produce the distinctive writing effect that separates Chinese calligraphy from other general writing, which greatly enhances the realism of the calligraphy simulation, so that people who lack writing skills can easily draw beautiful, expressive characters.
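The abstract gives no formulas, but the mapping it describes from pen direction, tilt, and normalized pressure to a brush footprint can be sketched roughly as below. The tilt thresholds for center/side/back strokes and the linear size and bending formulas are assumptions made for illustration, not the paper's parameters.

# Illustrative sketch of mapping pen state to a brush footprint; thresholds
# and formulas are assumed, not taken from the paper.
import math
from dataclasses import dataclass

@dataclass
class Footprint:
    direction: float   # footprint orientation, radians
    size: float        # footprint radius, pixels
    bending: float     # nib bending amount, 0..1
    mode: str          # "center", "side", or "back" of the writing brush

def brush_footprint(pen_dir, tilt, pressure, max_radius=20.0):
    """pen_dir and tilt in radians; pressure normalized to 0..1."""
    if tilt < math.radians(15):          # brush-holder angle decides the stroke mode
        mode = "center"
    elif tilt < math.radians(45):
        mode = "side"
    else:
        mode = "back"
    size = max_radius * pressure                            # heavier pressure, wider footprint
    bending = min(1.0, pressure * tilt / (math.pi / 2))     # nib bends with tilt and pressure
    return Footprint(direction=pen_dir, size=size, bending=bending, mode=mode)

print(brush_footprint(pen_dir=0.3, tilt=math.radians(30), pressure=0.7))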

Cited by
Journal ArticleDOI
14 May 2020
TL;DR: Transfer learning and fine-tuning of deep convolutional neural networks are used to improve the accuracy of recognizing 32 hand gestures from Arabic sign language.
Abstract: Sign language is the main communication tool for deaf or hearing-impaired people. It is a visual language that uses the hands and other parts of the body to give people who need it full access to communication with the world. Accordingly, the automation of sign language recognition has become one of the important applications in artificial intelligence and machine learning. Specifically, Arabic sign language recognition has been studied using various intelligent and traditional approaches, but with few attempts to improve the process using deep learning networks. This paper uses transfer learning and fine-tuning of deep convolutional neural networks (CNN) to improve the accuracy of recognizing 32 hand gestures from Arabic sign language. The proposed methodology works by creating models matching the VGG16 and ResNet152 structures; the pre-trained model weights are then loaded into the layers of each network, and finally our own softmax classification layer is added after the last fully connected layer. The networks were fed with ordinary 2D images of the Arabic sign language data and were able to achieve an accuracy of nearly 99%.
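One common way to realize the transfer-learning setup described above is sketched below with PyTorch/torchvision: load ImageNet weights into a VGG16 backbone, freeze the convolutional features, and attach a 32-class head (softmax is applied inside the cross-entropy loss). The freezing policy and the exact head are assumptions; the paper's ResNet152 variant and training details are analogous but not shown.

# Rough sketch of VGG16 transfer learning for 32 Arabic sign gestures;
# the freezing policy and classification head are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16(weights="IMAGENET1K_V1")        # pre-trained ImageNet weights
for p in model.features.parameters():         # freeze the convolutional backbone
    p.requires_grad = False
model.classifier[6] = nn.Linear(4096, 32)     # new 32-class classification head

logits = model(torch.randn(1, 3, 224, 224))   # one normal 2D image, 224x224
loss = nn.CrossEntropyLoss()(logits, torch.tensor([5]))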

64 citations

Journal ArticleDOI
TL;DR: This survey provides an overview of the most important work on Chinese sign language recognition and translation, discusses its classification, highlights the features explored in sign language recognition research, presents the available datasets, and outlines trends for future research.
Abstract: Given the huge number of deaf-mute people in China, there is a growing need to integrate them into mainstream society through efficient sign language processing technologies. Sign language processing entails the systematic recognition and translation of sign language images/videos to text or speech. This survey provides an overview of the most important work on Chinese sign language recognition and translation, discusses its classification, highlights the features explored in sign language recognition research, presents the available datasets, and outlines trends for future research.

31 citations

Journal ArticleDOI
19 Feb 2020-PLOS ONE
TL;DR: This study proposes to classify 60 signs from American Sign Language based on data provided by the LeapMotion sensor, using different conventional machine learning and deep learning models, including a model called DeepConvLSTM that integrates convolutional and recurrent layers with Long Short-Term Memory cells.
Abstract: Human activity recognition is an important and difficult topic to study because of the large variability between tasks repeated several times by a subject and between subjects. This work is motivated by providing time-series signal classification with robust validation and test approaches. This study proposes to classify 60 signs from American Sign Language based on data provided by the LeapMotion sensor, using different conventional machine learning and deep learning models, including a model called DeepConvLSTM that integrates convolutional and recurrent layers with Long Short-Term Memory cells. A kinematic model of the right and left forearm/hand/fingers/thumb is proposed, as well as a simple data augmentation technique to improve the generalization of the neural networks. DeepConvLSTM and the convolutional neural network demonstrated the highest accuracy compared to the other models, with 91.1 (3.8)% and 89.3 (4.0)% respectively, compared to the recurrent neural network or multi-layer perceptron. Integrating convolutional layers in a deep learning model seems to be an appropriate solution for sign language recognition with depth-sensor data.
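A DeepConvLSTM-style model for multichannel LeapMotion time series, 1-D convolutions over time followed by stacked LSTM layers, can be sketched roughly as follows. The channel count, kernel sizes, and the 60-class head are illustrative assumptions rather than the architecture used in the study.

# Minimal DeepConvLSTM-style sketch for multichannel sensor time series;
# all sizes are assumptions for illustration.
import torch
import torch.nn as nn

class DeepConvLSTM(nn.Module):
    def __init__(self, in_channels=30, num_classes=60, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                  # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)   # (batch, time, 64)
        out, _ = self.lstm(h)              # temporal modelling with stacked LSTMs
        return self.head(out[:, -1])       # classify from the last time step

model = DeepConvLSTM()
logits = model(torch.randn(4, 30, 100))    # 4 windows, 30 kinematic channels, 100 steps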

22 citations

Journal ArticleDOI
TL;DR: This work replaces the traditional encoder in a neural machine translation (NMT) module with an improved architecture, which incorporates a temporal convolution (T-Conv) unit and a dynamic hierarchical bidirectional GRU (DH-BiGRU) unit sequentially.
Abstract: Sign language translation (SLT) is an important application to bridge the communication gap between deaf and hearing people. In recent years, research on SLT based on neural translation frameworks has attracted wide attention. Despite the progress, current SLT research is still at an initial stage. In fact, current systems perform poorly in processing long sign sentences, which often involve long-distance dependencies and require large resource consumption. To tackle this problem, we propose two explainable adaptations to traditional neural SLT models using optimized tokenization-related modules. First, we introduce a frame stream density compression (FSDC) algorithm for detecting and reducing redundant similar frames, which effectively shortens long sign sentences without losing information. Then, we replace the traditional encoder in a neural machine translation (NMT) module with an improved architecture, which sequentially incorporates a temporal convolution (T-Conv) unit and a dynamic hierarchical bidirectional GRU (DH-BiGRU) unit. The improved component takes the temporal tokenization information into consideration to extract deeper information with reasonable resource consumption. Our experiments on the RWTH-PHOENIX-Weather 2014T dataset show that the proposed model outperforms the state-of-the-art baseline by up to about 1.5 BLEU-4 points.
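The redundancy-removal idea behind FSDC, keeping a frame only when it differs enough from the last kept frame, can be sketched roughly as below. The mean-absolute-difference similarity measure and the threshold are assumptions; the paper's actual FSDC algorithm may measure frame density and similarity differently.

# Hedged sketch of dropping redundant similar frames from a sign video;
# the similarity measure and threshold are assumptions, not the paper's FSDC.
import numpy as np

def compress_frame_stream(frames, threshold=0.05):
    """frames: sequence of HxWxC arrays in [0, 1]; returns indices of kept frames."""
    kept = [0]
    for i in range(1, len(frames)):
        diff = np.abs(frames[i] - frames[kept[-1]]).mean()
        if diff > threshold:               # frame adds new information, keep it
            kept.append(i)
    return kept

frames = [np.random.rand(64, 64, 3) for _ in range(20)]
print(compress_frame_stream(frames))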

17 citations

Journal ArticleDOI
TL;DR: A multimodal deep learning architecture for sign language recognition that effectively combines RGB-D input and two-stream spatiotemporal networks is proposed, obtaining state-of-the-art performance on the CSL and IsoGD datasets.
Abstract: Different from other human behaviors, sign language is characterized by limited local motion of the upper limbs and meticulous hand action. Some sign language gestures are ambiguous in RGB video due to the influence of lighting and background color, which affects recognition accuracy. We propose a multimodal deep learning architecture for sign language recognition that effectively combines RGB-D input and two-stream spatiotemporal networks. Depth videos, as an effective complement to RGB input, supply additional distance information about the signer's hands. A novel sampling method called ARSS (Aligned Random Sampling in Segments) is put forward to select and align optimal RGB-D video frames, which improves the capacity utilization of the multimodal data and reduces redundancy. We obtain the hand ROI from the joint information of the RGB data for local focus in the spatial stream. D-shift Net is proposed for depth motion feature extraction in the temporal stream, fully utilizing the three-dimensional motion information of the sign language. Both streams are fused by a convolutional fusion layer to obtain complementary features. Our approach exploits the multimodal information and enhances recognition precision. It obtains state-of-the-art performance on the CSL (96.7%) and IsoGD (63.78%) datasets.
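Segment-wise aligned sampling in the spirit of ARSS, splitting the video into equal segments and picking the same random frame index in each segment for both the RGB and depth streams, can be sketched roughly as follows. The segment count and alignment policy are assumptions based on the abstract, not the paper's code.

# Illustrative sketch of aligned random sampling in segments for RGB-D clips;
# segment count and sampling policy are assumptions.
import random

def aligned_random_sampling(num_frames, num_segments=8, seed=None):
    """Return frame indices shared by the RGB and depth streams."""
    rng = random.Random(seed)
    bounds = [round(i * num_frames / num_segments) for i in range(num_segments + 1)]
    indices = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        indices.append(rng.randrange(start, max(start + 1, end)))
    return indices

idx = aligned_random_sampling(num_frames=120, num_segments=8, seed=0)
rgb_clip = [("rgb", i) for i in idx]       # the same indices are used for both modalities
depth_clip = [("depth", i) for i in idx]
print(idx)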

15 citations