Author

Anirudh Thatipelli

Bio: Anirudh Thatipelli is an academic researcher from the International Institute of Information Technology, Hyderabad. The author has contributed to research in the topic of Bottleneck, has an h-index of 2, and has co-authored 4 publications receiving 5 citations.
Topics: Bottleneck

Papers
Posted Content
TL;DR: The results from benchmarking the top performers of NTU-120 on Skeletics-152 reveal the challenges and domain gap induced by actions 'in the wild', and proposes new frontiers for human action recognition.
Abstract: In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To begin with, we benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results. To examine skeleton action recognition 'in the wild', we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. The results from benchmarking the top performers of NTU-120 on Skeletics-152 reveal the challenges and domain gap induced by actions 'in the wild'. We extend our study to include out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. Finally, as a new frontier for action recognition, we introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances. Overall, our work characterizes the strengths and limitations of existing approaches and datasets. It also provides an assessment of top-performing approaches across a spectrum of activity settings and, via the introduced datasets, proposes new frontiers for human action recognition.

24 citations

Journal ArticleDOI
TL;DR: Skeletics-152, as discussed by the authors, is a curated, 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset; the authors also introduce Skeleton-Mimetics, derived from the Mimetics dataset, and Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances.
Abstract: In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To study skeleton action recognition in the wild, we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. We extend our study to include out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. We also introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances. We benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results. The results from benchmarking the top performers of NTU-120 on the newly introduced datasets reveal the challenges and domain gap induced by actions in the wild. Overall, our work characterizes the strengths and limitations of existing approaches and datasets. Via the introduced datasets, our work enables new frontiers for human action recognition.

16 citations

Posted Content
TL;DR: In this article, the authors introduce two new pose-based human action datasets, NTU60-X and NTU120-X, which include finger and facial joints, enabling a richer skeleton representation.
Abstract: The lack of fine-grained joints (facial joints, hand fingers) is a fundamental performance bottleneck for state-of-the-art skeleton action recognition models. Despite this bottleneck, the community's efforts seem to be invested only in coming up with novel architectures. To specifically address this bottleneck, we introduce two new pose-based human action datasets - NTU60-X and NTU120-X. Our datasets extend the largest existing action recognition dataset, NTU-RGBD. In addition to the 25 body joints for each skeleton as in NTU-RGBD, the NTU60-X and NTU120-X datasets include finger and facial joints, enabling a richer skeleton representation. We appropriately modify the state-of-the-art approaches to enable training using the introduced datasets. Our results demonstrate the effectiveness of these NTU-X datasets in overcoming the aforementioned bottleneck and improving state-of-the-art performance, overall and on previously worst-performing action categories.

Posted Content
27 Jan 2021
TL;DR: In this article, the authors introduce a new skeleton-based human action dataset, NTU60-X, which includes finger and facial joints, enabling a richer skeleton representation and improving state-of-the-art performance.
Abstract: The lack of fine-grained joints such as hand fingers is a fundamental performance bottleneck for state-of-the-art skeleton action recognition models trained on the largest action recognition dataset, NTU-RGBD. To address this bottleneck, we introduce a new skeleton-based human action dataset - NTU60-X. In addition to the 25 body joints for each skeleton as in NTU-RGBD, the NTU60-X dataset includes finger and facial joints, enabling a richer skeleton representation. We appropriately modify the state-of-the-art approaches to enable training using the introduced dataset. Our results demonstrate the effectiveness of NTU60-X in overcoming the aforementioned bottleneck and improving state-of-the-art performance, overall and on hitherto worst-performing action categories.
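
To make the "richer skeleton representation" concrete, here is a minimal sketch (not the authors' code) of how an extended NTU-X style skeleton could be assembled per frame. Only the 25 NTU-RGBD body joints are stated in the abstract; the finger and facial joint counts below are illustrative assumptions.

```python
# Minimal sketch: assembling an extended skeleton in the spirit of NTU60-X/NTU120-X.
# The finger/face joint counts are assumptions for illustration only.
import numpy as np

NUM_BODY_JOINTS = 25      # as in NTU-RGBD
NUM_FINGER_JOINTS = 42    # assumed: 21 per hand, two hands
NUM_FACE_JOINTS = 51      # assumed facial landmark count

def build_extended_skeleton(body, fingers, face):
    """Concatenate body, finger and facial joints into one (J, 3) array."""
    assert body.shape == (NUM_BODY_JOINTS, 3)
    assert fingers.shape == (NUM_FINGER_JOINTS, 3)
    assert face.shape == (NUM_FACE_JOINTS, 3)
    return np.concatenate([body, fingers, face], axis=0)

# A clip then becomes a (T, J, 3) tensor, here J = 25 + 42 + 51 = 118 joints.
frames = [build_extended_skeleton(np.zeros((25, 3)),
                                  np.zeros((42, 3)),
                                  np.zeros((51, 3))) for _ in range(64)]
clip = np.stack(frames)   # shape (64, 118, 3)
```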

Cited by
Proceedings ArticleDOI
01 Jun 2022
TL;DR: PoseConv3D as mentioned in this paper uses a 3D heatmap volume instead of a graph sequence as the base representation of human skeletons, which is more effective in learning spatio-temporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings.
Abstract: Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt GCNs to extract features on top of human skeletons. Despite the positive results shown in these attempts, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseConv3D, a new approach to skeleton-based action recognition. PoseConv3D relies on a 3D heatmap volume instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseConv3D is more effective in learning spatiotemporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings. Also, PoseConv3D can handle multiple-person scenarios without additional computation costs. The hierarchical features can be easily integrated with other modalities at early fusion stages, providing a great design space to boost the performance. PoseConv3D achieves the state-of-the-art on five of six standard skeleton-based action recognition benchmarks. Once fused with other modalities, it achieves the state-of-the-art on all eight multi-modality action recognition benchmarks. Code has been made available at: https://github.com/kennymckormick/pyskl.
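
The key design choice described above is replacing the skeleton graph sequence with a 3D heatmap volume. Below is a minimal sketch of how 2D keypoints could be rasterized into such a (K, T, H, W) volume using per-joint Gaussian heatmaps; it is an illustration under assumed resolution and sigma, not the released pyskl implementation.

```python
# Minimal sketch: 2D keypoints -> 3D heatmap volume (K, T, H, W), the kind of
# base representation PoseConv3D feeds to a 3D-CNN. Parameters are illustrative.
import numpy as np

def keypoints_to_heatmap_volume(keypoints, H=64, W=64, sigma=1.0):
    """keypoints: (T, K, 2) array of (x, y) coordinates in pixel space."""
    T, K, _ = keypoints.shape
    ys, xs = np.mgrid[0:H, 0:W]
    volume = np.zeros((K, T, H, W), dtype=np.float32)
    for t in range(T):
        for k in range(K):
            x, y = keypoints[t, k]
            volume[k, t] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return volume  # consumed by a 3D-CNN instead of a skeleton graph sequence

# Example: 32 frames of 17 COCO-style keypoints on a 64x64 grid.
kpts = np.random.rand(32, 17, 2) * 64
vol = keypoints_to_heatmap_volume(kpts)   # shape (17, 32, 64, 64)
```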

44 citations

Journal ArticleDOI
18 Dec 2020-Sensors
TL;DR: The AMIRO social robotics framework as discussed by the authors is designed in a modular and robust way for assistive care scenarios, including robotic services for navigation, person detection and recognition, multi-lingual natural language interaction and dialogue management, as well as activity recognition and general behavior composition.
Abstract: Recent studies in social robotics show that it can provide economic efficiency and growth in domains such as retail, entertainment, and active and assisted living (AAL). Recent work also highlights that users expect affordable social robotics platforms providing focused and specific assistance in a robust manner. In this paper, we present the AMIRO social robotics framework, designed in a modular and robust way for assistive care scenarios. The framework includes robotic services for navigation, person detection and recognition, multi-lingual natural language interaction and dialogue management, as well as activity recognition and general behavior composition. We present a platform-independent implementation of AMIRO based on the Robot Operating System (ROS). We focus on quantitative evaluations of each functionality module, discussing their performance in different settings and possible improvements. We showcase the deployment of the AMIRO framework on a popular social robotics platform, the Pepper robot, and present the experience of developing a complex user interaction scenario employing all available functionality modules within AMIRO.
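
As an illustration of the modular, ROS-based design described above, here is a minimal sketch of how one functionality module (person detection is used purely as an example) might publish its results for other modules to compose. The topic name and message type are assumptions, not AMIRO's actual interfaces.

```python
#!/usr/bin/env python
# Minimal sketch (not part of AMIRO): a ROS node publishing module results
# on a topic so other modules can subscribe and compose behaviors.
import rospy
from std_msgs.msg import String

def person_detection_node():
    rospy.init_node('person_detection', anonymous=True)
    pub = rospy.Publisher('/amiro/person_detection', String, queue_size=10)
    rate = rospy.Rate(10)  # publish at 10 Hz
    while not rospy.is_shutdown():
        # In a real module this message would come from a perception pipeline.
        pub.publish(String(data='person_detected'))
        rate.sleep()

if __name__ == '__main__':
    try:
        person_detection_node()
    except rospy.ROSInterruptException:
        pass
```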

9 citations

Journal ArticleDOI
01 Dec 2022
TL;DR: Zoom Transformer, as discussed by the authors, exploits both the low-level single-person motion information and the high-level multi-person interaction information in a uniform model structure with carefully designed relation-aware maps.
Abstract: Skeleton-based human action recognition has attracted increasing attention and many methods have been proposed to boost the performance. However, these methods still confront three main limitations: 1) Focusing on single-person action recognition while neglecting the group activity of multiple people (more than 5 people). In practice, multi-person group activity recognition via skeleton data is also a meaningful problem. 2) Being unable to mine high-level semantic information from the skeleton data, such as interactions among multiple people and their positional relationships. 3) Existing datasets used for multi-person group activity recognition all involve RGB videos and cannot be directly applied to skeleton-based group activity analysis. To address these issues, we propose a novel Zoom Transformer to exploit both the low-level single-person motion information and the high-level multi-person interaction information in a uniform model structure with carefully designed Relation-aware Maps. Besides, we estimate the multi-person skeletons from existing real-world video datasets, i.e., Kinetics and Volleyball-Activity, and release two new benchmarks to verify the effectiveness of our Zoom Transformer. Extensive experiments demonstrate that our model can effectively cope with skeleton-based multi-person group activity. Additionally, experiments on the large-scale NTU-RGB+D dataset validate that our model also achieves remarkable performance for single-person action recognition. The code and the skeleton data are publicly available at https://github.com/Kebii/Zoom-Transformer
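
To illustrate the kind of high-level multi-person information that relation-aware maps are meant to capture, here is a minimal sketch that encodes pairwise positional relationships between people in a frame. The centroid-distance formulation is an assumption for illustration, not the Zoom Transformer implementation.

```python
# Minimal sketch: a simple pairwise relation map over multi-person skeletons,
# illustrating positional relationships between people in one frame.
import numpy as np

def pairwise_relation_map(skeletons):
    """skeletons: (P, J, 3) joints of P people in one frame.

    Returns a (P, P) matrix of distances between the people's joint centroids.
    """
    centers = skeletons.mean(axis=1)                  # (P, 3) per-person centroid
    diff = centers[:, None, :] - centers[None, :, :]  # (P, P, 3)
    return np.linalg.norm(diff, axis=-1)              # (P, P)

# Example: 6 people, 17 joints each.
frame = np.random.rand(6, 17, 3)
rel = pairwise_relation_map(frame)   # symmetric (6, 6) relation map
```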

5 citations

Journal ArticleDOI
01 Dec 2022
TL;DR: Li et al. as mentioned in this paper proposed a Motion Guided Attention Learning (MG-AL) framework, which formulates the action representation learning as a self-supervised motion attention prediction problem.
Abstract: 3D human action recognition has received increasing attention due to its potential application in video surveillance equipment. To guarantee satisfactory performance, previous studies are mainly based on supervised methods, which incur a large amount of manual annotation cost. In addition, general deep networks for video sequences suffer from heavy computational costs and thus cannot satisfy the basic requirements of embedded systems. In this paper, a novel Motion Guided Attention Learning (MG-AL) framework is proposed, which formulates action representation learning as a self-supervised motion attention prediction problem. Specifically, MG-AL is a lightweight network. A set of simple motion priors (e.g., intra-joint variance, inter-frame deviation, and cross-joint covariance), which minimizes additional parameters and computational overhead, is regarded as a supervisory signal to guide the attention generation. The encoder is trained via predicting multiple self-attention tasks to capture action-specific feature representations. Extensive evaluations are performed on three challenging benchmark datasets (NTU-RGB+D 60, NTU-RGB+D 120 and NW-UCLA). The proposed method achieves superior performance compared to state-of-the-art methods while having a very low computational cost.
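
To make the motion priors concrete, here is a minimal sketch computing statistics of the kind named above (intra-joint variance, inter-frame deviation, cross-joint covariance) from a skeleton clip. The exact formulas are assumptions for illustration, not the MG-AL definitions.

```python
# Minimal sketch: simple motion statistics from a skeleton clip of shape (T, J, 3),
# of the kind that could serve as self-supervised targets for attention learning.
import numpy as np

def motion_priors(clip):
    """clip: (T, J, 3) joint positions over T frames."""
    # Intra-joint variance: how much each joint moves over the clip.
    intra_joint_var = clip.var(axis=0).sum(axis=-1)                               # (J,)
    # Inter-frame deviation: mean displacement between consecutive frames.
    inter_frame_dev = np.linalg.norm(np.diff(clip, axis=0), axis=-1).mean(axis=0) # (J,)
    # Cross-joint covariance: covariance across per-joint trajectories (x-coordinate).
    cross_joint_cov = np.cov(clip[..., 0].T)                                      # (J, J)
    return intra_joint_var, inter_frame_dev, cross_joint_cov

# Example: 64 frames, 25 joints.
clip = np.random.rand(64, 25, 3)
iv, ifd, cjc = motion_priors(clip)
```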

4 citations