Book Chapter

Detecting Missed and Anomalous Action Segments Using Approximate String Matching Algorithm

16 Dec 2017, pp. 101-111


Citations
Journal Article

[...]

17 Apr 2020
TL;DR: The aim of this study was to develop two novel methods of evaluating performance in the STS, one using a low-cost RGB camera and the other using an instrumented chair containing load cells in the seat to detect center of pressure movements and ground reaction forces.
Abstract: The sit-to-stand test (STS) is a simple test of function in older people that can identify people at risk of falls. The aim of this study was to develop two novel methods of evaluating performance in the STS, one using a low-cost RGB camera and the other using an instrumented chair containing load cells in the seat to detect center of pressure movements and ground reaction forces. The two systems were compared to a Kinect and a force plate. Twenty-one younger subjects were tested when performing two 5STS movements at self-selected slow and normal speeds, while 16 older fallers were tested when performing one 5STS at a self-selected pace. All methods had acceptable limits of agreement with an expert for total STS time for younger subjects and older fallers, with smaller errors observed for the chair (−0.18 ± 0.17 s) and force plate (−0.19 ± 0.79 s) than for the RGB camera (−0.30 ± 0.51 s) and the Kinect (−0.38 ± 0.50 s) for older fallers. The chair had the smallest limits of agreement compared to the expert for both younger and older participants. The new device was also able to estimate movement velocity, which could be used to estimate muscle power during the STS movement. Subsequent studies will test the device against opto-electronic systems, incorporate additional sensors, and then develop predictive equations for measures of physical function.

6 citations


Cites methods from "Detecting Missed and Anomalous Acti..."

  • [...]

Posted Content

[...]

TL;DR: In this article, the authors propose a new two-phase action scoring system: (1) a Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges; (2) a Score Estimation Module that uses the first module to find the resemblance of a video to a reference video in order to give the assessment score.
Abstract: Automated vision-based score estimation models can be used as an alternate opinion to avoid judgment bias. In past work, score estimation models were learned by regressing the video representations to the ground truth score provided by the judges. However, such regression-based solutions lack interpretability in terms of giving reasons for the awarded score. One solution to make the scores more explicable is to compare the given action video with a reference video. This would capture the temporal variations w.r.t. the reference video and map those variations to the final score. In this work, we propose a new two-phase action scoring system: (1) a Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges; (2) a Score Estimation Module that uses the first module to find the resemblance of a video to a reference video in order to give the assessment score. The proposed scoring model has been tested on Olympics Diving and Gymnastic vaults, and it outperforms the existing state-of-the-art scoring models.

4 citations

Proceedings Article

[...]

01 Oct 2018
TL;DR: This work presents a novel community detection-based human action segmentation algorithm that marks the existence of community structures in human action videos where the consecutive frames around the key poses group together to form communities similar to social networks.
Abstract: Temporal segmentation of complex human action videos into action primitives plays a pivotal role in building models for human action understanding. Studies in the past have introduced unsupervised frameworks for deriving a known number of motion primitives from action videos. Our work focuses on answering a question: given a set of videos with humans performing an activity, can the action primitives be derived from them without specifying any prior knowledge about the count of the constituent sub-action categories? To this end, we present a novel community detection-based human action segmentation algorithm. Our work marks the existence of community structures in human action videos where the consecutive frames around the key poses group together to form communities similar to social networks. We test our proposed technique over the stitched Weizmann dataset and MHADI01-s motion capture dataset, and our technique outperforms the state-of-the-art techniques of complex action segmentation without the count of actions being pre-specified.

4 citations


Cites methods from "Detecting Missed and Anomalous Acti..."

  • [...]

Proceedings Article

[...]

06 Apr 2020
TL;DR: This work introduces a fusion-based technique that combines multiple sensors, leveraging the advantages of the individual sensors so that the resulting assessment is more accurate.
Abstract: Automated clinical tests that assess quality of geriatric screening tests such as the Five-Times-Sit-To-Stand (5STS) and the Timed-Up-and-Go (TUG) are being designed to assess the decline in functional ability of the elderly. The existing techniques to assess the quality of these physical activities include sensor-based techniques using body-mounted sensors, force sensors, and vision and imaging sensors. These sensors have their own advantages and disadvantages for the task of clinical assessment. In this work, we introduce a fusion-based technique that combines multiple sensors, leveraging the advantages of the individual sensors so that the resulting assessment is more accurate. We evaluate our technique for the 5STS test using a fusion of a chair and RGB sensors. In a test of 15 older people, there was no significant difference in performance between the two sensors, obtaining 76% and 73% for the RGB and chair, respectively. However, a significant improvement was obtained for the fusion technique, with 90% accuracy for all the phases of the STS test. The proposed fusion technique was observed to be better than the individual sensor assessment.

2 citations


Cites methods from "Detecting Missed and Anomalous Acti..."

  • [...]

Journal Article

[...]

TL;DR: This work proposes a new action scoring system termed Reference Guided Regression (RGR), which comprises a Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges, and a Score Estimation Module that uses the resemblance of a video with a reference video to give the assessment score.
Abstract: Automated vision-based score estimation models can be used to provide an alternate opinion to avoid judgment bias. Existing works have learned score estimation models by regressing the video representation to the ground truth score provided by judges. However, such regression-based solutions lack interpretability in terms of giving reasons for the awarded score. One solution to make the scores more explicable is to compare the given action video with a reference video, which would capture the temporal variations vis-a-vis the reference video and map those variations to the final score. In this work, we propose a new action scoring system termed Reference Guided Regression (RGR), which comprises (1) a Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges, and (2) a Score Estimation Module that uses the first module to find the resemblance of a video with a reference video to give the assessment score. The proposed scoring model is tested on Olympics Diving and Gymnastic vaults, and it outperforms the existing state-of-the-art scoring models.

2 citations


Cites background from "Detecting Missed and Anomalous Acti..."

  • [...]

  • [...]


References
Journal Article

[...]

H. Sakoe, S. Chiba
TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.
Abstract: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition. First, a general principle of time-normalization is given using a time-warping function. Then, two time-normalized distance definitions, called symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussions and experimental studies. The superiority of the symmetric form algorithm is established. A new technique, called slope constraint, is successfully introduced, in which the warping function slope is restricted so as to improve discrimination between words in different categories. The effective slope constraint characteristic is qualitatively analyzed, and the optimum slope constraint condition is determined through experiments. The optimized algorithm is then extensively subjected to experimental comparison with various DP algorithms previously applied to spoken word recognition by different research groups. The experiment shows that the present algorithm produces no more than about two-thirds of the errors of even the best conventional algorithm.

5,478 citations
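
As an illustration of the time-normalized distance described in the abstract above, the following is a minimal Python sketch of dynamic time warping with symmetric step weights (1 for horizontal and vertical steps, 2 for diagonal steps, normalized by the sum of the two sequence lengths). It is not taken from this page: the function name `dtw_symmetric`, the scalar inputs, and the absolute difference used as the local distance are assumptions made for the example, and the slope constraint mentioned in the abstract is deliberately left out.

```python
import math


def dtw_symmetric(a, b):
    """Time-normalized DTW distance between two numeric sequences.

    Assumptions for this sketch: scalar samples, |x - y| as the local
    distance, symmetric step weights (1, 2, 1), and no slope constraint.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # g[i][j] holds the minimum weighted cost of aligning a[:i+1] with b[:j+1].
    g = [[math.inf] * m for _ in range(n)]
    g[0][0] = 2 * abs(a[0] - b[0])  # the first point carries the diagonal weight

    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = abs(a[i] - b[j])
            best = math.inf
            if i > 0:
                best = min(best, g[i - 1][j] + d)          # vertical step, weight 1
            if j > 0:
                best = min(best, g[i][j - 1] + d)          # horizontal step, weight 1
            if i > 0 and j > 0:
                best = min(best, g[i - 1][j - 1] + 2 * d)  # diagonal step, weight 2
            g[i][j] = best

    # With these weights, the weights along any path sum to n + m.
    return g[n - 1][m - 1] / (n + m)
```

With these weights the normalizing factor does not depend on the path taken, which is why the final cost can simply be divided by the sum of the lengths. A sequence and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align at zero cost.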

Book Chapter

[...]

08 Oct 2016
TL;DR: This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.
Abstract: This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks outcompeting all recent methods.

2,561 citations

Posted Content

[...]

TL;DR: This paper proposes stacked hourglass networks for human pose estimation, in which features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body, and shows that repeated bottom-up, top-down processing with intermediate supervision is critical to improving the performance of the network.
Abstract: This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a "stacked hourglass" network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks outcompeting all recent methods.

2,368 citations

Book Chapter

[...]

06 Sep 2014
TL;DR: A learning-based framework that takes steps towards assessing how well people perform actions in videos by training a regression model from spatiotemporal pose features to scores obtained from expert judges and can provide interpretable feedback on how people can improve their action.
Abstract: While recent advances in computer vision have provided reliable methods to recognize actions in both images and videos, the problem of assessing how well people perform actions has been largely unexplored in computer vision. Since methods for assessing action quality have many real-world applications in healthcare, sports, and video retrieval, we believe the computer vision community should begin to tackle this challenging problem. To spur progress, we introduce a learning-based framework that takes steps towards assessing how well people perform actions in videos. Our approach works by training a regression model from spatiotemporal pose features to scores obtained from expert judges. Moreover, our approach can provide interpretable feedback on how people can improve their action. We evaluate our method on a new Olympic sports dataset, and our experiments suggest our framework is able to rank the athletes more accurately than a non-expert human. While promising, our method is still a long way from rivaling the performance of expert judges, indicating that there is significant opportunity in computer vision research to improve on this difficult yet important task.

130 citations

Journal Article

[...]

01 Sep 2014
TL;DR: This paper presents the development of a Kinect-based system for ensuring home-based rehabilitation using a Dynamic Time Warping (DTW) algorithm and fuzzy logic to assist patients in conducting safe and effective home-based rehabilitation without the immediate supervision of a physician.
Abstract: Most formal rehabilitation facilities are situated in a hospital or care center setting, which may not always be conveniently accessible for patients, especially those in geographically isolated areas. Home-based rehabilitation has the potential to offer greater accessibility and thus increase consistent uptake. In addition, the exercise performed in conventional rehabilitation contexts may be insufficient to ensure the patient's speedy recovery, with complementary rehabilitation exercises at home required to make a difference. The goal is to provide effective home-based rehabilitation offering outcomes similar to those obtained through hospital-based rehabilitation under the supervision of an occupational therapist. This paper presents the development of a Kinect-based system for ensuring home-based rehabilitation using a Dynamic Time Warping (DTW) algorithm and fuzzy logic. The ultimate goal is to assist patients in conducting safe and effective home-based rehabilitation without the immediate supervision of a physician.

100 citations