Journal Article

Learning and fusing multiple hidden substages for action quality assessment

TLDR
This article proposes a learning and fusion network of multiple hidden substages for assessing athletic performance: temporal semantic segmentation splits each video into five substages, a fully-connected-network-based hidden regression model predicts the score of each substage, and these scores are fused into the overall score.
Abstract
Many existing methods for action quality assessment rely on single-stage score regression networks, which are not tailored to the evaluation task and lack a clear rationale. In this work, we seek an action quality assessment method for sports competitions that conforms to objective evaluation rules and field experience. To this end, we define three assessment scenarios: the overall-score-guided scenario, the execution-score-guided scenario, and the difficulty-level-based overall-score-guided scenario. We propose a learning and fusion network of multiple hidden substages that assesses athletic performance by segmenting each video into five substages using temporal semantic segmentation. The feature of each video segment is extracted by five feature backbone networks with shared weights, and a fully-connected-network-based hidden regression model predicts the score of each substage; these scores are then fused into the overall score. We evaluate the proposed method on the UNLV-Diving dataset. The comparison results show that the proposed method, which is based on the objective evaluation rules of sports competitions, outperforms a regression model trained directly on the overall score. The proposed multiple-substage network is more accurate than the single-stage score regression network and achieves state-of-the-art performance, which indicates that objective evaluation rules and field experience are beneficial for building an accurate and reasonable action quality assessment model.
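The pipeline described in the abstract (five substages, a backbone with shared weights, one fully connected regression head per substage, and score fusion) can be illustrated with a minimal sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: `split_into_substages` uses equal-length chunks as a placeholder for temporal semantic segmentation, the "backbone" is mean pooling followed by a single shared linear layer, weights are random rather than trained, and fusion is a plain sum because the abstract does not state the fusion rule. All function and class names are hypothetical.

```python
import random

NUM_SUBSTAGES = 5  # from the abstract: each video is split into five substages


def split_into_substages(frames, num_substages=NUM_SUBSTAGES):
    """Placeholder for temporal semantic segmentation (assumption):
    cut the frame sequence into contiguous, roughly equal-length chunks."""
    n = len(frames)
    if n < num_substages:
        raise ValueError("need at least one frame per substage")
    bounds = [round(i * n / num_substages) for i in range(num_substages + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_substages)]


def mean_pool(segment):
    """Average the per-frame feature vectors of one segment."""
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(rng, in_dim, out_dim):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class SubstageFusionModel:
    def __init__(self, feat_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # A single backbone parameter set reused for every substage (shared weights).
        self.backbone = init_linear(rng, feat_dim, hidden_dim)
        # One fully connected regression head per substage.
        self.heads = [init_linear(rng, hidden_dim, 1) for _ in range(NUM_SUBSTAGES)]

    def substage_scores(self, frames):
        scores = []
        for segment, head in zip(split_into_substages(frames), self.heads):
            feature = relu(linear(mean_pool(segment), *self.backbone))
            scores.append(linear(feature, *head)[0])
        return scores

    def overall_score(self, frames):
        # Assumed fusion rule: sum of the substage scores.
        return sum(self.substage_scores(frames))


if __name__ == "__main__":
    rng = random.Random(1)
    video = [[rng.random() for _ in range(4)] for _ in range(20)]  # 20 frames, 4-dim features
    model = SubstageFusionModel(feat_dim=4)
    print(model.substage_scores(video))
    print(model.overall_score(video))
```

In a trained system, the regression heads would be fitted against whichever target the chosen scenario supplies (overall score or execution score), and the placeholder segmentation and backbone would be replaced by learned components.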

