Book ChapterDOI

Detecting Missed and Anomalous Action Segments Using Approximate String Matching Algorithm

16 Dec 2017-pp 101-111
TL;DR: An exemplar-based Approximate String Matching (ASM) technique is proposed for detecting anomalous and missing segments in action sequences; it shows promising alignment and missed/anomalous notification results on a warm-up exercise dataset.
Abstract: As amateur performers, we often forget action steps and perform unwanted movements during daily exercise routines, dance performances, etc. To improve our proficiency, it is important that we get feedback on our performances in terms of where we went wrong. In this paper, we propose a framework for analyzing and reporting action segments that were missed or anomalously performed. This involves comparing the performed sequence with the standard action sequence and notifying when misalignments occur. We propose an exemplar-based Approximate String Matching (ASM) technique for detecting such anomalous and missing segments in action sequences. We compare the results with those obtained from the conventional Dynamic Time Warping (DTW) algorithm for sequence alignment. Alignment of the action sequences under conventional DTW fails in the presence of missed and anomalous action segments because of its boundary-condition constraints. The performance of the two techniques has been tested on a complex aperiodic human action dataset of warm-up exercise sequences that we developed from correct and incorrect executions by multiple people. The proposed ASM technique shows promising alignment and missed/anomalous notification results on this dataset.
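The idea of comparing a performed sequence against a standard one can be illustrated with a minimal sketch. It assumes each action segment has already been reduced to a single symbol, and it uses plain Levenshtein (edit-distance) alignment as a stand-in for approximate string matching; the chapter's actual ASM formulation, cost function, and features are not given in the abstract, so the function names `align` and `notifications` and the unit costs are illustrative assumptions only. Unlike conventional DTW, which must match the first and last elements of both sequences and has no explicit gap operation, edit-distance alignment can leave a reference symbol unmatched, which is what allows a segment to be reported as missed.

```python
def align(reference, performed):
    """Levenshtein alignment of two symbol sequences (illustrative, unit costs).

    Returns a list of (label, ref_symbol, perf_symbol) tuples, where label is
    "match", "anomalous" (substitution), "missed" (reference symbol skipped),
    or "extra" (performed symbol with no counterpart in the reference).
    """
    n, m = len(reference), len(performed)
    # cost[i][j] = edit distance between reference[:i] and performed[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == performed[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # match or substitution
                cost[i - 1][j] + 1,        # reference symbol skipped
                cost[i][j - 1] + 1,        # extra performed symbol
            )

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if reference[i - 1] == performed[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                label = "match" if sub == 0 else "anomalous"
                ops.append((label, reference[i - 1], performed[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("missed", reference[i - 1], None))
            i -= 1
        else:
            ops.append(("extra", None, performed[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def notifications(ops):
    """Turn alignment operations into human-readable feedback lines."""
    notes = []
    for label, ref, perf in ops:
        if label == "missed":
            notes.append(f"missed: {ref}")
        elif label == "anomalous":
            notes.append(f"anomalous: expected {ref}, got {perf}")
        elif label == "extra":
            notes.append(f"unexpected: {perf}")
    return notes


# Hypothetical example: the performer replaces segment C with X and skips E.
print(notifications(align("ABCDEF", "ABXDF")))
# ['anomalous: expected C, got X', 'missed: E']
```

The symbols here are hypothetical segment labels; in practice they would come from whatever pose or segment quantization precedes the matching step.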
Citations
Journal ArticleDOI
TL;DR: This work proposes a new action scoring system termed Reference Guided Regression (RGR), which comprises a Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges, and a Score Estimation Module that uses the resemblance of a video with a reference video to give the assessment score.
Abstract: Automated vision-based score estimation models can be used to provide an alternate opinion to avoid judgment bias. Existing works have learned score estimation models by regressing the video representation to the ground truth score provided by judges. However, such regression-based solutions lack interpretability in terms of giving reasons for the awarded score. One solution to make the scores more explicable is to compare the given action video with a reference video, which would capture the temporal variations vis-a-vis the reference video and map those variations to the final score. In this work, we propose a new action scoring system termed Reference Guided Regression (RGR), which comprises (1) a Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges, and (2) a Score Estimation Module that uses the first module to find the resemblance of a video with a reference video to give the assessment score. The proposed scoring model is tested on Olympic diving and gymnastic vaults, and the model outperforms the existing state-of-the-art scoring models.

29 citations


Cites background from "Detecting Missed and Anomalous Acti..."

  • ...Few early works [7], [6], [11], [10], [9] in the domain were hand crafted for specific actions and could not be generalised to different types of actions....

    [...]

  • ...rehabilitation [14], exercise [6], [7] and actions of daily living [27]....

    [...]

Journal ArticleDOI
17 Apr 2020
TL;DR: The aim of this study was to develop two novel methods of evaluating performance in the STS: one using a low-cost RGB camera and another using an instrumented chair with load cells in the seat to detect center of pressure movements and ground reaction forces.
Abstract: The sit-to-stand test (STS) is a simple test of function in older people that can identify people at risk of falls. The aim of this study was to develop two novel methods of evaluating performance in the STS: one using a low-cost RGB camera and another using an instrumented chair with load cells in the seat to detect center of pressure movements and ground reaction forces. The two systems were compared to a Kinect and a force plate. Twenty-one younger subjects were tested when performing two 5STS movements at self-selected slow and normal speeds, while 16 older fallers were tested when performing one 5STS at a self-selected pace. All methods had acceptable limits of agreement with an expert for total STS time for younger subjects and older fallers, with smaller errors observed for the chair (−0.18 ± 0.17 s) and force plate (−0.19 ± 0.79 s) than for the RGB camera (−0.30 ± 0.51 s) and the Kinect (−0.38 ± 0.50 s) for older fallers. The chair had the smallest limits of agreement compared to the expert for both younger and older participants. The new device was also able to estimate movement velocity, which could be used to estimate muscle power during the STS movement. Subsequent studies will test the device against opto-electronic systems, incorporate additional sensors, and then develop predictive equations for measures of physical function.

12 citations


Cites methods from "Detecting Missed and Anomalous Acti..."

  • ...Poses estimated using this library are accurate at assessing human movement [25]....

    [...]

Proceedings ArticleDOI
01 Oct 2018
TL;DR: This work presents a novel community detection-based human action segmentation algorithm that marks the existence of community structures in human action videos where the consecutive frames around the key poses group together to form communities similar to social networks.
Abstract: Temporal segmentation of complex human action videos into action primitives plays a pivotal role in building models for human action understanding. Studies in the past have introduced unsupervised frameworks for deriving a known number of motion primitives from action videos. Our work focuses on answering a question: given a set of videos with humans performing an activity, can the action primitives be derived from them without specifying any prior knowledge about the count of the constituent sub-action categories? To this end, we present a novel community detection-based human action segmentation algorithm. Our work marks the existence of community structures in human action videos, where the consecutive frames around the key poses group together to form communities similar to social networks. We test our proposed technique on the stitched Weizmann dataset and the MHADI01-s motion capture dataset, and our technique outperforms the state-of-the-art techniques of complex action segmentation without the count of actions being pre-specified.

5 citations


Cites methods from "Detecting Missed and Anomalous Acti..."

  • ...Bag-of-Features approach [1], Template matching based segmentation approach[2] and Hidden Markov Model (HMM) [3][4]; b) Unsupervised approaches that model the video sequences without their ground-truth labels and have an explicit training phase, e....

    [...]

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This work introduces a novel sequence-to-sequence autoencoder-based scoring model which learns the representation from only expert performances and judges an unknown performance based on how well it can be regenerated from the learned model.
Abstract: Developing a model for the task of assessing the quality of human action is a key research area in computer vision. The quality assessment task has been posed as a supervised regression problem, where models have been trained to predict a score given action representation features. However, human proficiency levels can vary widely, and so do their scores. Providing all such performance variations and their respective scores is an expensive solution, as it requires a domain expert to annotate many videos. The question arises: can we exploit the variations of the performances from that of an expert and map the variations to their respective scores? To this end, we introduce a novel sequence-to-sequence autoencoder-based scoring model which learns the representation from only expert performances and judges an unknown performance based on how well it can be regenerated from the learned model. We evaluated our model in predicting scores of a complex Sun Salutation action sequence, and demonstrate that our model gives remarkable prediction accuracy compared to the baselines.

5 citations


Cites methods from "Detecting Missed and Anomalous Acti..."

  • ...Evaluation Metrics Baseline and Experiment Settings We compare our model with 3 baseline works - 1) Pose vs SVR [1], 2) C3D vs SVR, LSTM+SVR [3] 3) Expert Template Matching Approach [10] For Pose + SVR-based scoring[1], the pose sequences are pre-processed using DCT and DFT operations....

    [...]

  • ...The technique is compared with the state-of-the-art regression-based action scoring techniques[1, 3] and template-based assessment technique[10]....

    [...]

  • ...Following our previous work[10], we use the stacked hourglass networks[11] for human pose estimation....

    [...]

  • ...For the template based approach[10], and our approach, the poses are converted to 7 codebook words considering 7 distinct poses....

    [...]

Posted Content
TL;DR: In this article, the authors proposed a new action scoring system as a two-phase system: (1) a Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges; (2) Score Estimation Module that uses the first module to find the resemblance of a video to a reference video in order to give the assessment score.
Abstract: Automated vision-based score estimation models can be used as an alternate opinion to avoid judgment bias. In past works, the score estimation models were learned by regressing the video representations to the ground truth score provided by the judges. However, such regression-based solutions lack interpretability in terms of giving reasons for the awarded score. One solution to make the scores more explicable is to compare the given action video with a reference video. This would capture the temporal variations w.r.t. the reference video and map those variations to the final score. In this work, we propose a new action scoring system as a two-phase system: (1) a Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges; (2) a Score Estimation Module that uses the first module to find the resemblance of a video to a reference video in order to give the assessment score. The proposed scoring model has been tested on Olympic diving and gymnastic vaults, and the model outperforms the existing state-of-the-art scoring models.

4 citations

References
Journal ArticleDOI
TL;DR: An action tutor system which enables the user to interactively retrieve a learning exemplar of the target action movement and to immediately acquire motion instructions while learning it in front of the Kinect.
Abstract: The difficulty of vision-based posture estimation is greatly decreased with the aid of commercial depth cameras, such as Microsoft Kinect. However, there is still much to do to bridge the results of human posture estimation and the understanding of human movements. Human movement assessment is an important technique for exercise learning in the field of healthcare. In this paper, we propose an action tutor system which enables the user to interactively retrieve a learning exemplar of the target action movement and to immediately acquire motion instructions while learning it in front of the Kinect. The proposed system is composed of two stages. In the retrieval stage, nonlinear time warping algorithms are designed to retrieve video segments similar to the query movement roughly performed by the user. In the learning stage, the user learns according to the selected video exemplar, and the motion assessment, including both static and dynamic differences, is presented to the user in a more effective and organized way, helping him/her to perform the action movement correctly. The experiments are conducted on videos of ten action types, and the results show that the proposed human action descriptor is representative for action video retrieval and that the tutor system can effectively help the user while learning action movements.

68 citations

Book ChapterDOI
01 Apr 2003
TL;DR: A probabilistic model of team play is developed, which is based on the detection of key events in the team behavior, and has been used to assess team performance in three different types of basketball offense, based on trajectories of all players obtained by a whole-body tracker.
Abstract: Most approaches to detection and classification of human activity deal with observing individual persons. However, people often tend to organize into groups to achieve certain goals, and human activity is sometimes more readily defined and observed in the context of the whole group, where the activity is coordinated among its members. An excellent example of this is team sports, which can provide a valuable test ground for the development of methods for analysis of coordinated group activity. We used basketball play in this work and developed a probabilistic model of team play, which is based on the detection of key events in the team behavior. The model is based on expert coach knowledge and has been used to assess team performance in three different types of basketball offense, based on trajectories of all players obtained by a whole-body tracker. Results show that our high-level behaviour model may be used both for activity recognition and performance evaluation in certain basketball activities.

53 citations

Journal ArticleDOI
Chuan-Jun Su
21 Jul 2013
TL;DR: A Kinect-based system - KHRD using Dynamic Time Warping (DTW) algorithm and fuzzy logic for ensuring home-based rehabilitation and the outcomes of the evaluation can be used as a reference for the patient to validate his/her exercise and to prevent adverse events.
Abstract: presence of a professional may cause an adverse event or lead to secondary injury. In this paper, we describe our development of a Kinect-based system, KHRD, using the Dynamic Time Warping (DTW) algorithm and fuzzy logic for ensuring home-based rehabilitation. The KHRD allows a patient to perform a prescribed exercise with the presence of a professional. The exercise performed will then be recorded as a base for evaluating the patient's rehabilitation exercise at home. The outcomes of the evaluation can be used as a reference for the patient to validate his/her exercise and to prevent adverse events. A summary report of the outcomes may also be uploaded to a cloud setting for physicians to monitor the patient's progress and adjust the prescription.

49 citations

Journal ArticleDOI
TL;DR: A novel framework is presented, using data captured by Kinect-based human skeleton tracking, where the evaluation of a user's performance is achieved against a gold-standard performance of a teacher, and a set of quaternionic correlation-based measures (scores) are proposed for evaluating and ranking the performance of a dancer.
Abstract: In this paper, the problem of automatic dance performance evaluation from human Motion Capture (MoCap) data is addressed. A novel framework is presented, using data captured by Kinect-based human skeleton tracking, where the evaluation of a user's performance is achieved against a gold-standard performance of a teacher. The framework addresses several technical challenges, including global and local temporal synchronization, spatial alignment and comparison of two “dance motion signals.” Towards the solution of these technical challenges, a set of appropriate quaternionic vector-signal processing methodologies is proposed, where the 4D (spatiotemporal) human motion data are represented as sequences of pure quaternions. Such a quaternionic representation offers several advantages, including the facts that joint angles and rotations are inherently encoded in the phase of quaternions and the three coordinate variables (X, Y, Z) are treated jointly, with their intra-correlations being taken into account. Based on the theory of quaternions, a number of advantageous algorithms are formulated. Initially, global temporal synchronization of dance MoCap data is achieved by the use of quaternionic cross-correlations, which are invariant to rigid spatial transformations between the users. Secondly, a quaternions-based algorithm is proposed for the fast spatial alignment of dance MoCap data. Thirdly, the MoCap data can be temporally synchronized in a local fashion, using Dynamic Time Warping techniques adapted to the specific problem. Finally, a set of quaternionic correlation-based measures (scores) are proposed for evaluating and ranking the performance of a dancer. These quaternions-based scores are invariant to rigid transformations, as proved and demonstrated. A total score metric, through a weighted combination of three different metrics, is proposed, where the weights are optimized using Particle Swarm Optimization (PSO). The presented experimental results using the Huawei/3DLife/EMC 2 dataset are promising and verify the effectiveness of the proposed methods.

47 citations

Proceedings ArticleDOI
01 Jan 2015
TL;DR: A new approach for quantification of ‘dynamical regularity’ as applied to modeling human actions using approximate entropy-based feature representation to achieve temporal segmentation in untrimmed motion capture data and fine-grained quality assessment of diving actions in videos.
Abstract: In this paper, we propose a new approach for quantification of ‘dynamical regularity’ as applied to modeling human actions. We use approximate entropy-based feature representation to model the dynamics in human movement to achieve temporal segmentation in untrimmed motion capture data and fine-grained quality assessment of diving actions in videos. The principle herein is to quantify regularity (frequency of typical patterns) in the dynamical space computed from trajectories of action data. We extend conventional ideas for modeling dynamics in human movement by introducing multivariate and cross approximate entropy features. Our experimental evaluation on theoretical models and two publicly available databases shows that the proposed features can achieve state-of-the-art results on applications such as temporal segmentation and quality assessment of actions.

42 citations