Book Chapter

Automatic Bharatnatyam Dance Posture Recognition and Expertise Prediction Using Depth Cameras

TL;DR: Pose recognition is performed for some important postures in Bharatnatyam in order to trace the origin of these postures to the Bhangas, and the result is then used to predict the expertise of a Bharatnatyam dancer.
Abstract: Bharatnatyam is an ancient Indian classical dance form consisting of complex postures and movements. One main challenge that the intelligent systems community has not yet addressed is pose recognition for the basic postures of this dance form, called the Bhangas, and its use for expertise prediction. In this paper, pose recognition is performed for some important postures in Bharatnatyam in order to trace the origin of these postures to the Bhangas, and the result is then used to predict the expertise of a Bharatnatyam dancer. The extracted features are 10 joint angles computed from 15 joint locations, which are used to predict the 22 postures derived from the basic postures (Bhangas). A Support Vector Machine classifier with a radial basis function kernel performed best for pose recognition. By performing stick figure analysis and grouping of labels, we estimate the origin of each of these postures from the Bhangas. The grouping is then verified using Hamming distance calculation. Testing is done on our own Bharatnatyam dataset consisting of 102 dancers, achieving an accuracy of 87.14%. Expertise prediction of the dancers for the 22 poses was performed for four ratings (Excellent, Good, Satisfactory and Poor), giving an accuracy of 68.46% without grouping of postures and 80.80% with grouping of postures.
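The abstract describes features built as joint angles from 3D joint locations but does not say which joints form each angle. As a minimal sketch, assuming each angle is taken at a middle joint between its two neighbouring joints, the computation could look like the following. The function name `joint_angle` and the shoulder, elbow, and wrist coordinates are hypothetical and only illustrate the geometry.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b, between segments b->a and b->c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("coincident joints: the angle is undefined")
    cos_theta = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical 3D joint locations in metres (not taken from the paper's dataset).
shoulder = (0.0, 1.4, 2.0)
elbow = (0.3, 1.4, 2.0)
wrist = (0.3, 1.1, 2.0)

elbow_angle = joint_angle(shoulder, elbow, wrist)
print(f"elbow angle: {elbow_angle:.1f} degrees")
```

Because an angle depends only on the relative directions of the two segments, a feature of this kind does not change when the dancer stands closer to or farther from the camera, which is one plausible reason to prefer angles over raw joint coordinates.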
Citations
Proceedings Article
18 Dec 2018
TL;DR: The results show that features based on the state-of-the-art fastText word vector representation perform better for essays than the other features considered in this work.
Abstract: Assessing handwritten essays is a human skill which is very important for school-level language exams. If automated, it will enable scalable assessment and feedback at low cost. This problem involves two modalities, viz. images for Offline Handwriting Recognition (OHR) and Natural Language Processing (NLP) for essay grading. We consider the sequential information of handwriting for getting the transcriptions from text images. We train a Multidimensional Long Short Term Memory (MDLSTM) network with a Connectionist Temporal Classification (CTC) cost function at the output for the task of OHR. The paper discusses the generalization of the handwriting recognition model to images taken from a scanner and a mobile camera. Further, essay grading results are compared for essay features based on GloVe and fastText word vector representation models. We trained different models for the essay grading task, treating it as both a classification and a regression problem. The results show that features based on the state-of-the-art fastText word vector representation perform better for essays than the other features considered in this work. The best performing model shows a Quadratic Weighted Kappa (QWK) agreement of 0.80 for grading between the human graded text essays and model graded text essays. The same model shows a QWK agreement of 0.81 for grading between the human graded text essays and the OHR transcribed essays. In this work, we consider handwritten essays written in English.

2 citations


Additional excerpts

  • ..., spoken communication [16], dancing [19] etc....

    [...]
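The essay-grading abstract above reports agreement as Quadratic Weighted Kappa (QWK). For readers unfamiliar with the metric, the sketch below computes it from its standard definition, with disagreement weights (i - j)^2 / (k - 1)^2 over k rating levels. It is not the authors' evaluation code; the function name and the example grades are hypothetical, and ratings are assumed to be integers in a known range.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic Weighted Kappa between two lists of integer ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = max_rating - min_rating + 1  # number of rating levels
    if k == 1:
        return 1.0  # only one possible rating, so agreement is trivial
    n = len(rater_a)

    # Observed co-occurrence counts and each rater's marginal histogram.
    observed = [[0] * k for _ in range(k)]
    hist_a = [0] * k
    hist_b = [0] * k
    for a, b in zip(rater_a, rater_b):
        i, j = a - min_rating, b - min_rating
        observed[i][j] += 1
        hist_a[i] += 1
        hist_b[j] += 1

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        return 1.0  # both raters always gave the same single rating
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades on a 0-3 scale (not data from the paper).
human = [0, 1, 2, 3, 2, 1]
model = [0, 1, 2, 3, 1, 1]
print(round(quadratic_weighted_kappa(human, model, 0, 3), 3))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and large disagreements are penalised more heavily than adjacent-grade disagreements because of the squared weight.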

Book Chapter
01 Jan 2023
TL;DR: In this paper, the authors propose a method to recognize the Key Postures (KPs) and motions involved in the Adavu using a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), respectively.
Abstract: Bharatanatyam is the oldest Indian Classical Dance (ICD), which is learned and practiced across India and the world. Adavu is the core of this dance form. There exist 15 Adavus and 58 variations. Each Adavu variation comprises a well-defined set of motions and postures (called dance steps) that occur in a particular order. So, while learning Adavus, students not only learn the dance steps but also attend to the sequence in which they occur. This paper proposes a method to recognize these sequences. In this work, we first recognize the Key Postures (KPs) and motions involved in the Adavu using a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), respectively. The CNN achieves a recognition accuracy of 99% and the SVM 84%. Next, we compare these KP and motion sequences with the ground truth to find the best match using the Edit Distance algorithm, with an accuracy of 98%. The paper contributes to the state of the art in areas such as digital heritage and dance tutoring systems. The paper offers three novelties: (a) It recognizes the sequences based on both the KPs and motions, rather than only KPs as reported in earlier works. (b) The performance of the proposed work is measured by analyzing the prediction time per sequence. We also compare our proposed approach with previous works that deal with the same problem statement. (c) It tests the scalability of the proposed approach by including all the Adavu variations, unlike the earlier literature, which uses only one or two variations.
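The sequence-matching step in this abstract compares a recognized sequence of key postures and motions with ground-truth sequences using edit distance. The sketch below uses the standard Levenshtein distance with unit costs, which is an assumption, since the abstract does not state the cost scheme. The labels `KP1`, `M1`, and the Adavu names are hypothetical placeholders.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences, with unit edit costs."""
    # prev[j] is the distance between the processed prefix of seq_a
    # and the first j items of seq_b.
    prev = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        curr = [i]
        for j, item_b in enumerate(seq_b, start=1):
            substitution = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,                   # delete item_a
                            curr[j - 1] + 1,               # insert item_b
                            prev[j - 1] + substitution))   # match or substitute
        prev = curr
    return prev[-1]


def best_match(predicted, references):
    """Return the name of the reference sequence closest to the prediction."""
    return min(references,
               key=lambda name: edit_distance(predicted, references[name]))


# Hypothetical ground-truth sequences of key postures (KP) and motions (M).
references = {
    "adavu_A": ["KP1", "M1", "KP2", "M2", "KP3"],
    "adavu_B": ["KP1", "M3", "KP4"],
}
predicted = ["KP1", "M1", "KP2", "KP3"]  # one motion was missed by the recognizer

print(best_match(predicted, references))
```

Matching by edit distance tolerates a few recognition errors: a sequence with one missed or misclassified step still lands closest to the correct reference as long as the references differ from each other by more than that.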
References
Journal Article
TL;DR: This work takes an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem, and generates confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.
Abstract: We propose a new method to quickly and accurately predict human pose (the 3D positions of body joints) from a single depth image, without depending on information from preceding frames. Our approach is strongly rooted in current object recognition strategies. By designing an intermediate representation in terms of body parts, the difficult pose estimation problem is transformed into a simpler per-pixel classification problem, for which efficient machine learning techniques exist. By using computer graphics to synthesize a very large dataset of training image pairs, one can train a classifier that estimates body part labels from test images invariant to pose, body shape, clothing, and other irrelevances. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs in under 5ms on the Xbox 360. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.

3,034 citations

Proceedings Article
20 Oct 2011
TL;DR: It is verified that OpenNI, with its accompanying libraries, can be used for these activities in multi-platform learning environments, and examples are presented of how Kinect-assisted instruction can be used to achieve some of the learning outcomes in Human Computer Interaction courses as outlined in IT2008.
Abstract: The launch of the Microsoft Kinect for Xbox (a real-time 3D imaging device) and supporting libraries spurred the development of various applications, including those with natural user interfaces. We propose that using Kinect offers opportunities for novel approaches to classroom instruction on natural user interaction. A number of development frameworks have emerged that can be used to facilitate this instruction. We evaluate the current state of this technology and present an overview of some of its development frameworks. We then present examples of how Kinect-assisted instruction can be used to achieve some of the learning outcomes in Human Computer Interaction (HCI) courses as outlined in IT2008. We have verified that OpenNI, with its accompanying libraries, can be used for these activities in multi-platform learning environments.

141 citations

Proceedings Article
24 Feb 2013
TL;DR: Experimental results indicate that the proposed approach of simultaneous feature selection and classification has better recognition accuracy than the earlier reported ones.
Abstract: In the recent past, the need for ubiquitous people identification has increased with the proliferation of human-robot interaction systems. In this paper we propose a methodology for recognizing persons from skeleton data using Kinect. First, a half gait cycle is detected automatically, and then features are calculated on every gait cycle. Of the new features proposed in this paper, two are related to the area of the upper and lower body parts, and twelve are related to the distances between the upper body centroid and the centroids derived from different joints of the upper and lower limbs. Feature selection and classification are performed with a connectionist system using an Adaptive Neural Network (ANN). The recognition accuracy for individual people using the proposed method is compared with the earlier methods proposed by Arian et al. and Pries et al. Experimental results indicate that the proposed approach of simultaneous feature selection and classification has better recognition accuracy than the earlier reported ones.

97 citations

Proceedings Article
13 Oct 2013
TL;DR: This paper has presented a gait based person identification system using 3D human pose modeling for any arbitrary walking pattern in any unrestricted indoor environment, using Microsoft Kinect sensor, and modeled the gait pattern with a spatiotemporal set of key poses and sub-poses which occur periodically in different gait cycles.
Abstract: The importance of automatic person identification using non-intrusive biometric modalities has created enormous interest in the computer vision community over the last few years. For this, gait based person recognition is receiving much more attention in applications such as visual surveillance, security control, and people counting. In this paper, we have presented a gait based person identification system using 3D human pose modeling for any arbitrary walking pattern in any unrestricted indoor environment, using the Microsoft Kinect sensor. Instead of estimating the gait cycle, we have modeled the gait pattern with a spatiotemporal set of key poses and sub-poses which occur periodically in different gait cycles. The robustness of the solution is increased by outlier detection to handle noisy skeleton data obtained from Kinect. The performance of the proposed system is also assessed with a rotating Kinect setup to increase the field of view of a single Kinect. We have evaluated the average and worst case performance of the system with respect to the existing Kinect based approaches. Our proposed person identification system achieves a frame level F-score of more than 90% for 20 subjects with a fixed Kinect setup.

43 citations

Proceedings Article
09 Jan 2012
TL;DR: A sparse representation based dictionary learning technique is used to address dance classification as a new problem in computer vision and to present a new action descriptor to represent a dance video which overcomes the problem of the “Bags-of-Words” model.
Abstract: In this paper, we address an interesting application of computer vision techniques, namely classification of Indian Classical Dance (ICD). To the best of our knowledge, the problem has not been addressed so far in the computer vision domain. To deal with this problem, we use a sparse representation based dictionary learning technique. First, we represent each frame of a dance video by a pose descriptor based on histogram of oriented optical flow (HOOF), in a hierarchical manner. The pose basis is learned using an on-line dictionary learning technique. Finally, each video is represented sparsely as a dance descriptor by pooling the pose descriptors of all the frames. In this work, dance videos are classified using a support vector machine (SVM) with an intersection kernel. Our contributions here are twofold: first, to address dance classification as a new problem in computer vision, and second, to present a new action descriptor to represent a dance video which overcomes the problem of the “Bags-of-Words” model. We have tested our algorithm on our own ICD dataset created from videos collected from YouTube. An accuracy of 86.67% is achieved on this dataset. Since we have also proposed a new action descriptor, we have tested our algorithm on the well known KTH dataset. The performance of the system is comparable to the state-of-the-art.

39 citations
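The last abstract above classifies dance descriptors with an SVM that uses an intersection kernel. As a small illustration, assuming the common histogram intersection form K(x, y) = sum_i min(x_i, y_i) over non-negative descriptors, the kernel and its Gram matrix can be written as follows. The function names and the three-bin descriptors are hypothetical, and the abstract does not confirm this exact formulation.

```python
def intersection_kernel(x, y):
    """Histogram intersection: the sum of element-wise minima of two vectors."""
    if len(x) != len(y):
        raise ValueError("descriptors must have the same length")
    return sum(min(a, b) for a, b in zip(x, y))


def gram_matrix(samples):
    """Pairwise kernel values between all samples, as a list of rows."""
    return [[intersection_kernel(a, b) for b in samples] for a in samples]


# Hypothetical normalised descriptors with three bins each (not from the paper).
descriptors = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.1, 0.1, 0.8],
]

for row in gram_matrix(descriptors):
    print([round(value, 2) for value in row])
```

A Gram matrix of this kind can be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.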