scispace - formally typeset
Author

Neeraj Battan

Bio: Neeraj Battan is an academic researcher at the International Institute of Information Technology, Hyderabad. He has contributed to research on sentiment analysis and deep learning, has an h-index of 2, and has co-authored 6 publications receiving 11 citations.

Papers
Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this paper, a two-stage activity generation method is proposed to synthesize a long-term (> 6000 ms) human motion trajectory across a large variety of human activity classes (> 50).
Abstract: Synthesis of long-term human motion skeleton sequences is essential to aid human-centric video generation [8], with potential applications in Augmented Reality, 3D character animation, pedestrian trajectory prediction, etc. Long-term human motion synthesis is a challenging task due to multiple factors: long-term temporal dependencies among poses, cyclic repetition across poses, bi-directional and multi-scale dependencies among poses, variable speed of actions, and a large, partially overlapping space of temporal pose variations across multiple classes/types of human activities. This paper aims to address these challenges to synthesize a long-term (> 6000 ms) human motion trajectory across a large variety of human activity classes (> 50). We propose a two-stage activity generation method to achieve this goal: the first stage learns the long-term global pose dependencies in activity sequences by learning to synthesize a sparse motion trajectory, while the second stage generates dense motion trajectories from the output of the first stage. We demonstrate the superiority of the proposed method over SOTA methods using various quantitative evaluation metrics on publicly available datasets.
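The sparse-then-dense idea from the abstract can be sketched in a few lines. This is a toy illustration only, with hypothetical helper names; the paper's stages are learned networks, whereas here stage 1 is a random stand-in and stage 2 is plain linear interpolation:

```python
import random

def stage1_sparse_trajectory(n_anchors, pose_dim, seed=0):
    """Stand-in for the learned sparse generator: long-term anchor poses."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(pose_dim)]
            for _ in range(n_anchors)]

def stage2_densify(sparse_poses, frames_between):
    """Stand-in for the learned dense generator: interpolate between anchors."""
    dense = []
    for a, b in zip(sparse_poses, sparse_poses[1:]):
        for t in range(frames_between):
            alpha = t / frames_between
            dense.append([(1 - alpha) * x + alpha * y for x, y in zip(a, b)])
    dense.append(sparse_poses[-1])
    return dense

anchors = stage1_sparse_trajectory(n_anchors=5, pose_dim=3)
motion = stage2_densify(anchors, frames_between=10)
print(len(motion))  # 4 segments * 10 frames + final pose = 41 frames
```

The split mirrors the abstract's motivation: the sparse stage only has to capture long-term structure, while the dense stage fills in frame-rate detail.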

19 citations

Posted Content
TL;DR: This paper introduces curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts and investigates various training strategies for enhancing model performance; the method outperforms state-of-the-art approaches for Hi-En code-mixed sentiment analysis.
Abstract: Sentiment analysis and other semantic tasks are commonly used in social media text analysis to gauge public opinion and make sense of the noise on social media. The language used on social media not only commonly diverges from formal language, but is further compounded by code-mixing between languages, especially in large multilingual societies like India. Traditional methods for learning semantic NLP tasks have long relied on end-to-end, task-specific training, requiring an expensive data creation process, even more so for deep learning methods. This challenge is even more severe for resource-scarce texts like code-mixed language pairs, which lack well-learnt representations to serve as model priors, and whose task-specific datasets can be too few and too small to efficiently exploit recent deep learning approaches. To address the above challenges, we introduce curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts and investigate various training strategies for enhancing model performance. Our method outperforms state-of-the-art methods for Hi-En code-mixed sentiment analysis by 3.31% accuracy, and also shows better model robustness in terms of convergence and variance in test performance.
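The core curriculum-learning loop can be sketched as follows. All names here are hypothetical, and token count is only a toy difficulty proxy, not the paper's curriculum criterion; the point is the easy-to-hard scheduling:

```python
def difficulty(text):
    # toy proxy: longer code-mixed sentences count as harder
    return len(text.split())

def curriculum_batches(samples, n_stages=3):
    """Yield progressively larger, harder training subsets."""
    ordered = sorted(samples, key=difficulty)
    for stage in range(1, n_stages + 1):
        # each stage sees the easiest stage/n_stages fraction of the data
        cutoff = max(1, round(len(ordered) * stage / n_stages))
        yield ordered[:cutoff]

data = ["bahut accha movie", "film was okay",
        "yeh picture bilkul bakwaas thi yaar", "nice"]
stages = list(curriculum_batches(data))
print([len(s) for s in stages])  # -> [1, 3, 4]
```

A model trained on each successive stage in turn sees the full dataset only after it has fit the easier examples, which is the robustness argument the abstract makes.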

4 citations

Book ChapterDOI
10 Aug 2019
TL;DR: This work introduces curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts, and investigates various training strategies for enhancing model performance.
Abstract: Sentiment analysis and other semantic tasks are commonly used in social media text analysis to gauge public opinion and make sense of the noise on social media. The language used on social media not only commonly diverges from formal language, but is further compounded by code-mixing between languages, especially in large multilingual societies like India.

3 citations

Posted Content
TL;DR: This work proposes a 3D human motion descriptor learned using a deep network; the learned embedding exploits inter-class similarity using trajectory cues and performs far better in a self-supervised setting.
Abstract: 3D human motion indexing and retrieval is an interesting problem due to the rise of several data-driven applications aimed at analyzing and/or re-utilizing 3D human skeletal data, such as data-driven animation, analysis of sports biomechanics, human surveillance, etc. Spatio-temporal articulations of humans, noisy/missing data, different speeds of the same motion, etc. make it challenging, and several of the existing state-of-the-art methods use hand-crafted features along with optimization-based or histogram-based comparison in order to perform retrieval. Further, they demonstrate it only for very small datasets and few classes. We make a case for using a learned representation that should recognize the motion as well as enforce a discriminative ranking. To that end, we propose a 3D human motion descriptor learned using a deep network. Our learned embedding is generalizable and applicable to real-world data, addressing the aforementioned challenges, and further enables sub-motion searching in its embedding space using another network. Our model exploits inter-class similarity using trajectory cues and performs far better in a self-supervised setting. State-of-the-art results on all these fronts are shown on two large-scale 3D human motion datasets: NTU RGB+D and HDM05.
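Once motions are mapped to descriptors, retrieval reduces to nearest-neighbour ranking in the embedding space. A minimal sketch, with hand-written stand-in embeddings rather than the paper's learned descriptors:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_emb, database, top_k=2):
    """Rank database motions by cosine similarity to the query descriptor."""
    ranked = sorted(database.items(),
                    key=lambda kv: cosine(query_emb, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

db = {"walk_01": [0.9, 0.1, 0.0],
      "run_07":  [0.7, 0.6, 0.1],
      "jump_03": [0.0, 0.2, 0.9]}
print(retrieve([1.0, 0.2, 0.0], db))  # walk-like query ranks walk_01 first
```

The "discriminative ranking" the abstract argues for is exactly what makes this simple similarity search work: the network is trained so that same-class motions land closer than different-class ones.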

1 citation

Posted Content
TL;DR: In this article, a two-stage activity generation method is proposed to synthesize a long-term (> 6000 ms) human motion trajectory across a large variety of human activity classes (>50).
Abstract: Synthesis of long-term human motion skeleton sequences is essential to aid human-centric video generation, with potential applications in Augmented Reality, 3D character animation, pedestrian trajectory prediction, etc. Long-term human motion synthesis is a challenging task due to multiple factors: long-term temporal dependencies among poses, cyclic repetition across poses, bi-directional and multi-scale dependencies among poses, variable speed of actions, and a large, partially overlapping space of temporal pose variations across multiple classes/types of human activities. This paper aims to address these challenges to synthesize a long-term (> 6000 ms) human motion trajectory across a large variety of human activity classes (> 50). We propose a two-stage activity generation method to achieve this goal: the first stage learns the long-term global pose dependencies in activity sequences by learning to synthesize a sparse motion trajectory, while the second stage generates dense motion trajectories from the output of the first stage. We demonstrate the superiority of the proposed method over SOTA methods using various quantitative evaluation metrics on publicly available datasets.

Cited by
01 Jan 2009
TL;DR: In this paper, the authors present an approach to label mocap data according to a given set of motion categories or classes, each specified by a suitable set of positive example motions.
Abstract: In view of increasing collections of available 3D motion capture (mocap) data, the task of automatically annotating large sets of unstructured motion data is gaining in importance. In this paper, we present an efficient approach to label mocap data according to a given set of motion categories or classes, each specified by a suitable set of positive example motions. For each class, we derive a motion template that captures the consistent and variable aspects of a motion class in an explicit matrix representation. We then present a novel annotation procedure, where the unknown motion data is segmented and annotated by locally comparing it with the available motion templates. This procedure is supported by an efficient keyframe-based preprocessing step, which also significantly improves the annotation quality by eliminating false positive matches. As a further contribution, we introduce a genetic learning algorithm to automatically learn the necessary keyframes from the given example motions. For evaluation, we report on various experiments conducted on two freely available sets of motion capture data (CMU and HDM05).
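The annotation procedure described above (locally comparing unknown motion data with a class template) can be illustrated with a sliding-window match. This is a deliberately simplified 1-D sketch; the paper's motion templates are explicit matrices capturing consistent and variable aspects of a class, not a single vector:

```python
def window_distance(template, window):
    """Mean absolute difference between a template and one window."""
    return sum(abs(t - w) for t, w in zip(template, window)) / len(template)

def annotate(sequence, template, threshold=0.5):
    """Return start indices where the template matches the sequence."""
    hits = []
    for start in range(len(sequence) - len(template) + 1):
        window = sequence[start:start + len(template)]
        if window_distance(template, window) < threshold:
            hits.append(start)
    return hits

template = [1.0, 2.0, 1.0]           # toy 1-D "motion class" template
sequence = [0.0, 1.1, 2.0, 0.9, 0.0, 5.0, 5.0]
print(annotate(sequence, template))  # -> [1]
```

The keyframe-based preprocessing in the paper serves to prune most window positions before this kind of full comparison is run, which is where the efficiency and the false-positive reduction come from.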

97 citations

Proceedings ArticleDOI
24 Mar 2022
TL;DR: This work proposes a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation, and develops a contrastive learning strategy based on audio-text alignment for better audio representations.
Abstract: Generating speech-consistent body and gesture movements is a long-standing problem in virtual avatar creation. Previous studies often synthesize pose movements in a holistic manner, where poses of all joints are generated simultaneously. Such a straightforward pipeline fails to generate fine-grained co-speech gestures. One observation is that the hierarchical semantics in speech and the hierarchical structures of human gestures can be naturally described at multiple granularities and associated together. To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. In HA2G, a Hierarchical Audio Learner extracts audio representations across semantic granularities. A Hierarchical Pose Inferer subsequently renders the entire human pose gradually in a hierarchical manner. To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations. Extensive experiments and human evaluation demonstrate that the proposed method renders realistic co-speech gestures and outperforms previous methods by a clear margin. Project page: https://alvinliu0.github.io/projects/HA2G.
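The coarse-to-fine structure of the Hierarchical Pose Inferer can be caricatured as a chain of per-level decoders, each conditioned on the level above. Everything below is a hypothetical stand-in (scalar "poses", a trivial decoder); only the conditioning pattern reflects the abstract:

```python
HIERARCHY = ["torso", "arms", "hands"]  # coarse -> fine granularities

def decode_part(audio_feature, parent_value, level):
    # stand-in for a learned per-level decoder conditioned on the parent
    return parent_value + audio_feature / (level + 1)

def generate_pose(audio_feature):
    """Render the pose gradually, coarse parts first, as in the abstract."""
    pose, parent = {}, 0.0
    for level, part in enumerate(HIERARCHY):
        parent = decode_part(audio_feature, parent, level)
        pose[part] = parent
    return pose

print(generate_pose(1.0))
```

Because each finer part is generated from the coarser one, errors at the torso level propagate downward but fine detail never has to be predicted from scratch, which is the stated advantage over holistic generation.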

24 citations

Proceedings ArticleDOI
29 Mar 2022
TL;DR: This paper proposes a novel self-supervised approach that explicitly generates 2D-3D pose pairs for augmenting supervision through a self-enhancing dual-loop learning framework, achieving encouraging results that significantly outperform the state of the art and, in some cases, are even on par with fully-supervised methods.
Abstract: Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions like consistency loss to guide the learning, which, inevitably, leads to inferior results in real-world scenarios with unseen poses. In this paper, we propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision, through a self-enhancing dual-loop learning framework. This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator; the three components form two loops during the training process, complementing and strengthening one another. Specifically, the pose estimator transforms an input 2D pose sequence to a low-fidelity 3D output, which is then enhanced by the imitator that enforces physical constraints. The refined 3D poses are subsequently fed to the hallucinator for producing even more diverse data, which are, in turn, strengthened by the imitator and further utilized to train the pose estimator. Such a co-evolution scheme, in practice, enables training a pose estimator on self-generated motion data without relying on any given 3D data. Extensive experiments across various benchmarks demonstrate that our approach yields encouraging results significantly outperforming the state of the art and, in some cases, even on par with results of fully-supervised methods. Notably, it achieves 89.1% 3D PCK on MPI-INF-3DHP under self-supervised cross-dataset evaluation setup, improving upon the previous best self-supervised method [16], [26] by 8.6%.
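The estimator/imitator/hallucinator interplay described above can be written out schematically. All three functions here are crude stand-ins (the real components are learned networks and a reinforcement-learning-based physics imitator); only the data flow of the two loops is faithful to the abstract:

```python
def estimator(pose_2d):
    """Loop component 1: low-fidelity 2D -> 3D lift (stand-in)."""
    return [x * 0.5 for x in pose_2d]

def imitator(pose_3d):
    """Loop component 2: enforce 'physical' constraints (here: clamping)."""
    return [min(max(x, -1.0), 1.0) for x in pose_3d]

def hallucinator(pose_3d):
    """Loop component 3: produce more diverse poses around the input."""
    return [[x + d for x in pose_3d] for d in (-0.1, 0.1)]

def training_round(pose_2d):
    refined = imitator(estimator(pose_2d))                   # first loop
    new_data = [imitator(p) for p in hallucinator(refined)]  # second loop
    return new_data  # in the paper, fed back to retrain the estimator

print(training_round([4.0, -4.0, 0.2]))
```

The co-evolution claim in the abstract corresponds to repeating `training_round` with the estimator retrained on `new_data` each time, so no external 3D ground truth is ever needed.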

18 citations

Proceedings ArticleDOI
25 Mar 2022
TL;DR: It is shown that variable-length motions generated by the proposed action-conditional human motion generation method are better than fixed-length motions generated by the state-of-the-art method in terms of realism and diversity.
Abstract: We propose an action-conditional human motion generation method using variational implicit neural representations (INR). The variational formalism enables action-conditional distributions of INRs, from which one can easily sample representations to generate novel human motion sequences. Our method offers variable-length sequence generation by construction because a part of INR is optimized for a whole sequence of arbitrary length with temporal embeddings. In contrast, previous works reported difficulties with modeling variable-length sequences. We confirm that our method with a Transformer decoder outperforms all relevant methods on HumanAct12, NTU-RGBD, and UESTC datasets in terms of realism and diversity of generated motions. Surprisingly, even our method with an MLP decoder consistently outperforms the state-of-the-art Transformer-based auto-encoder. In particular, we show that variable-length motions generated by our method are better than fixed-length motions generated by the state-of-the-art method in terms of realism and diversity. Code at https://github.com/PACerv/ImplicitMotion.
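The "variable-length by construction" property of an implicit neural representation follows from the motion being a function of continuous time, so any number of frames can be sampled from the same representation. A toy closed-form stand-in (the paper's INR is a neural network decoded from an action-conditional latent):

```python
import math

def make_motion_inr(latent):
    """Stand-in for a decoded INR: maps time t in [0, 1] to a pose value."""
    def inr(t):
        return math.sin(2 * math.pi * latent * t)
    return inr

inr = make_motion_inr(latent=2.0)
short = [inr(i / 9) for i in range(10)]    # 10-frame sampling
long = [inr(i / 99) for i in range(100)]   # 100-frame sampling, same motion
print(len(short), len(long))               # -> 10 100
```

Both samplings come from one representation; contrast this with a decoder that emits a fixed-size tensor, where changing the sequence length means changing the architecture.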

10 citations

Proceedings ArticleDOI
10 Dec 2020
TL;DR: The authors propose an ensemble-based approach built on the hybridization of Naive Bayes, SVM, Linear Regression, and SGD classifiers for sentiment classification of Hindi-English code-mixed text.
Abstract: India is a multilingual and multi-script country, and a large part of its population speaks more than one language. It has been noted that such multilingual speakers switch between languages while communicating informally. Code-mixed language is very common in informal communication and social media, and extracting sentiments from these code-mixed sentences is a challenging task. In this work, we have worked on sentiment classification for one of the most common code-mixed language pairs in India, i.e., Hindi-English. Conventional sentiment analysis techniques designed for a single language do not provide satisfactory results for such texts. We have proposed two approaches for better sentiment classification: an ensembling-based approach built on the hybridization of Naive Bayes, SVM, Linear Regression, and SGD classifiers, and a novel bidirectional-LSTM-based approach. Both approaches provide quite satisfactory results for code-mixed Hindi-English text.
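The hybridization idea, several weak classifiers combined by vote, can be sketched with toy rule-based classifiers. The real system ensembles trained Naive Bayes, SVM, Linear Regression, and SGD models; the keyword rules below are hypothetical placeholders, only the majority vote reflects the approach:

```python
from collections import Counter

def clf_positive_words(text):
    return "pos" if any(w in text for w in ("accha", "good", "nice")) else "neg"

def clf_negative_words(text):
    return "neg" if any(w in text for w in ("bakwaas", "bad")) else "pos"

def clf_exclaim(text):
    return "pos" if text.endswith("!") else "neg"

def ensemble_predict(text, classifiers):
    """Majority vote over the member classifiers' predictions."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

clfs = [clf_positive_words, clf_negative_words, clf_exclaim]
print(ensemble_predict("movie bahut accha!", clfs))  # -> pos
```

The appeal for code-mixed text is that member classifiers with different inductive biases can each cover part of the Hindi-English vocabulary, and the vote smooths over any single member's blind spots.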

9 citations