
A Comparison of Four Approaches to Evaluate the Sit-to-Stand Movement

TL;DR: The aim of this study was to develop two novel methods of evaluating performance in the STS: one using a low-cost RGB camera and the other an instrumented chair containing load cells in the seat of the chair to detect center of pressure movements and ground reaction forces.

Summary (2 min read)

Introduction

  • The sit-to-stand test (STS) is a simple test of function in older people that can identify people at risk of falls.
  • Other studies have used visual sensors to evaluate the STS movement.
  • Background subtraction fails in a cluttered environment when silhouette extraction becomes difficult.
  • A wooden chair with a 47 cm seat height was instrumented with four load cells, which were positioned in a cross with a distance of 31 cm between each adjacent pair of load cells.

B. Single Camera-based Posture Analysis

  • Cameras are readily available in the form of android devices or installed surveillance cameras.
  • One way of accomplishing this is by background subtraction and extraction of the human silhouette.
  • Poses estimated using this library are accurate at assessing human movement [25].
  • The Stacked Hourglass Network method defines local features such as the wrist, ankle, elbow and the orientation and arrangement of these features with respect to each other.
  • Calibration of the camera was performed using the chair as a reference, with the back of the chair measuring 0.5 m.

C. STS Parameter Calculation

  • The total time taken for each 5STS was estimated for each of the four recording systems.
  • An example of head position signals during the 5STS for the RGB and Kinect systems is shown in Fig. 4(a-b).
  • Force data were also low-pass filtered with a 4th-order Butterworth filter with a 2 Hz cut-off frequency.
  • Accordingly, for the chair sit-to-stand phase, when vertical force decreased below 90% of peak force, subjects were considered to have started to stand up, while a subject was considered to be standing when their force decreased below 10% of peak.
  • STS velocity was calculated for the two camera-based systems using the method proposed by Ejupi et al. [15] for the period between the end of the sitting phase and the standing phase of each STS movement.

D. Comparison of STS Parameters

  • The performance of the four systems was compared using data collected from a sample of 21 healthy younger subjects and a sample of 16 older fallers.
  • The ethics committee of the Asian Centre for Medical Education, Research & Innovation approved the study (ACMERI/18/001), with all subjects giving informed consent.
  • Comparative performances of the four methods of obtaining STS time and STS velocity were undertaken using correlation analysis and limits of agreement, using Bland-Altman plots [26].
  • All data processing was performed using custom-built software developed using LabVIEW (Version 2018, National Instruments Corporation, Austin, Texas, USA).

A. Total STS Time

  • The performances of the four systems for young subjects for 5STS time against the expert time of 11.7 ± 2.1 s are shown in Table 1.
  • The performance for 5STS time for the older fallers compared to the expert time of 18.0 ± 3.4 s is shown in Table 2.
  • Bland Altman plots of the limits of agreement for the four methods for both groups of subjects combined when compared to the expert values are shown in Fig.
  • When the ranking of each system was compared for the four measures of performance used against the expert, the chair had the best performance.
  • When the younger subjects were considered, the chair was the best for both LOA measures, along with two second-place rankings for the correlation and error measures.

B. STS Velocity

  • Comparisons for STS velocity are shown in Table 3 for younger participants and in Table 4 for the older fallers.
  • When the younger and older faller results were compared, greater discrepancies for a given system were observed for the two camera-based systems than for the two force-based systems, with lower correlations and higher mean differences, especially for the fallers.
  • A comparison of the STS velocity measures from the four devices was made with gait velocity for the group of older fallers.
  • Both new methods had excellent agreement with the expert estimation of STS time in terms of the number of data points that fell within 2SD of the mean difference.
  • The camera method underestimated the total STS time compared to the expert by around one second.



This manuscript version is made available under the CC-BY-NC-ND 4.0 license
http://creativecommons.org/licenses/by-nc-nd/4.0/
Shukla, B. K., et al. (2020). "A Comparison of Four Approaches to Evaluate the Sit-To-Stand Movement." IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 28: in press.
The final version of this article is available on the publisher’s website:
https://ieeexplore.ieee.org/document/9070208
The DOI of the article is: https://doi.org/10.1109/TNSRE.2020.2987357

A Comparison of Four Approaches to Evaluate the Sit-To-Stand Movement

Brajesh K. Shukla, Hiteshi Jain, Vivek Vijay, Sandeep K. Yadav, Arvind Mathur, and David J. Hewson
Abstract: The sit-to-stand test (STS) is a simple test of function in older people that can identify people at risk of falls. The aim of this study was to develop two novel methods of evaluating performance in the STS: one using a low-cost RGB camera and the other an instrumented chair containing load cells in the seat of the chair to detect center of pressure movements and ground reaction forces. The two systems were compared to a Kinect and a force plate. Twenty-one younger subjects were tested when performing two 5STS movements at self-selected slow and normal speeds, while 16 older fallers were tested when performing one 5STS at a self-selected pace. All methods had acceptable limits of agreement with an expert for total STS time for younger subjects and older fallers, with smaller errors observed for the chair (−0.18 ± 0.17 s) and force plate (−0.19 ± 0.79 s) than for the RGB camera (−0.30 ± 0.51 s) and the Kinect (−0.38 ± 0.50 s) for older fallers. The chair had the smallest limits of agreement compared to the expert for both younger and older participants. The new device was also able to estimate movement velocity, which could be used to estimate muscle power during the STS movement. Subsequent studies will test the device against opto-electronic systems, incorporate additional sensors, and then develop predictive equations for measures of physical function.
Index Terms: Biomedical monitoring, functional screening, Kinect, RGB camera, sit-to-stand.

Submitted for review on the 8th November 2019.
B. K. Shukla, H. Jain, V. Vijay, and S. K. Yadav are with the Indian Institute of Technology Jodhpur, Karwar 342037, India (e-mails: shukla.1@iitj.ac.in; jain.4@iitj.ac.in; vivek@iitj.ac.in; sy@iitj.ac.in).
A. Mathur is with the Asian Centre for Medical Education, Research & Innovation, Jodhpur 342003, India (e-mail: mathurarvindju@gmail.com).
D. J. Hewson is with the Institute for Health Research, University of Bedfordshire, Luton LU1 3JU, UK (e-mail: david.hewson@beds.ac.uk).
I. INTRODUCTION
Falls are a major concern in older people, with around 30% of people aged over 65 falling each year and the prevalence increasing in older age groups [1]. Risk factors for falls include low strength, poor balance and mobility problems [2]. People who are at risk of falls need to be identified to implement targeted fall-reduction programs including balance and strength training [3]. A simple test of physical function to identify fallers is the Five-times Sit-to-Stand test (5STS) [4]. The 5STS test was shown to outperform both the Timed-Up-and-Go (TUG) and single-leg stance tests in differentiating between low, moderate and high risk of falls [5]. The importance of the STS test has been highlighted in many works in the past that have used it to screen for older adults with fall risk [6, 7]. There are two main variations of the test, in which the person either performs five STS as quickly as possible [8] or performs as many STS as possible within 30 seconds [9].
Performance in the STS is typically measured using a
stopwatch to record the time taken for the task or the number of
repetitions performed. However, instrumented versions of both
tests have been developed to improve the accuracy of
measurement and also to extract additional information about
the STS performance. Such tests have used a range of
techniques including body-worn accelerometers [10, 11],
pressure sensors [12], and visual sensors, often using multiple
cameras [13, 14]. In addition to the possibility of automatic detection of STS time, in one study parameters extracted using a Kinect were more closely related to the strength of the participants than was the overall STS time [15]. Such a finding indicates that extracting data on how the STS is performed, rather than simply the time to perform the 5STS, could be beneficial.
Previous techniques to evaluate the STS have included the
use of wearable and visual sensors. For instance, a triaxial
accelerometer mounted on the waist was used to classify
different activities like running, walking, or postures such as
sitting and lying, as well as transitional activities such as the
STS and falling [10]. Accelerometers have also been used to
distinguish between normal subjects and people with
Parkinson’s disease with respect to their STS performance as
part of the TUG test [11]. Although sensor-based tests can be
effective, the user is required to wear the sensors when the test
is being performed, which can be inconvenient. The preferred
locations of wearable sensors have been reported as the wrist,
on glasses, or the arm [16]. In such cases, sensors are not good
at detecting the movement of the entire body, such as that
performed in the STS [17].
Other studies have used visual sensors to evaluate the STS
movement. For instance, Allin et al. [14] used three cameras to
extract 3-D features like the distance between the feet and head,
to construct body centroids. Ellipsoid tracking was then used,
along with the Weka Machine Toolkit, to classify postures
based on the position of the head, feet and torso [18], with an
excellent correlation observed between the Berg Balance Score
and the rise time of the STS. However, this process necessitated
manual labeling of individual body parts for one image of each
subject to enable color information to be learned for each person
tested. Moreover, three carefully positioned cameras were
required to measure the STS time, making such a system
difficult to use outside of a laboratory setting. In another study,
pose-based descriptors from volumetric image data were used
to identify the STS movement [19].
Activities, including the STS, were then identified and
classified using the nearest neighbor method. More recently, 3-
D modeling of a human body in voxel space has been used to
estimate STS time [13]. This study used an ellipse-fitting
algorithm that obtained features from the image to determine
body orientation. The best segmentation accuracies for this
method used the ellipse fit and voxel height. This framework
was suggested as being suitable for real-time video monitoring
of community-dwelling older people to detect fallers, with two
cameras required to calculate human voxels. Furthermore, the
accuracies of background subtraction are highly dependent on
the type of background. A cluttered background leads to false
silhouette extractions and thus a non-robust solution [20].
In response to the difficulties outlined above, the solutions
developed in this paper are two-fold: 1) We propose the design
of a novel device in which four force sensors are built into a
chair to measure individual STS cycles, which removes the
requirement for participants to wear body sensors throughout
the experiment. 2) We propose a low-cost video framework to
measure STS time using only a single inexpensive RGB
camera. The human skeleton from the frames captured with the
RGB camera is extracted using a deep learning network, with
frame sequences then segmented into STS cycles using the
change in the location of the head.
In this paper, we analyze the performances of these two novel
approaches to evaluate the STS and compare them to two
previously used instrumented systems to evaluate the STS, the
Kinect, and a force plate. Our framework provides a number of
advantages, such as the use of a single low-cost RGB camera
that can be easily extended to android phones [15, 21, 22] and
a method that does not involve background subtraction to
extract the human silhouette. Although such a method has been
used previously with an RGB-based camera setup [13], it fails
in a cluttered environment when silhouette extraction becomes
difficult. In contrast, the new method uses a deep pose library
to extract body position. The use of visual sensors allows
monitoring of both the time taken to perform the STS and the
way it is performed, which is not possible in sensor-based
approaches alone. Finally, while both STS performance and
STS time can be analyzed using an RGB camera, the
instrumented chair provides additional information related to
the movement of the center of pressure, which could provide
useful information about the STS movement.
Our goal in this study is to design a framework to evaluate
the STS in an unstructured setting, without requiring human
intervention. In the next section we explain the chair design and
the pose estimation using the RGB camera. Next, we describe
the methodology used to determine STS time and STS velocity
using both the visual sensors (RGB and Kinect) and the force-
based sensors (chair and force plate). We then present our
experimental results, compare the performance of the methods
for the four systems, and conclude with discussions and future
work.
II. OUR FRAMEWORK
In this section we propose two new methods to estimate STS
time and STS velocity during the STS movement. Firstly, an
instrumented chair is designed using four load cells that
eliminates the need for subjects to wear body sensors while
performing the STS test. Next, we introduce a single RGB
camera-based system to capture the STS movement and
propose a technique to estimate STS time. A detailed
description of both modules follows.
A. Instrumented Chair Design
A wooden chair with a 47 cm seat height was instrumented
with four load cells, which were positioned in a cross with a
distance of 31 cm between each adjacent pair of load cells. Each
load cell was rated for 40 kg with a precision of 8 g (CZL 601,
Standard Load Cells, Vadodara, Gujarat, India). The load cells
were fixed to the seat of the chair and covered by an additional
piece of wood. Each pair of load cells on one side of the chair
was connected to a 24-bit analogue to digital converter (ADC)
(HX711 Avia Semiconductors, Xiamen, China), with each
ADC placed on a bracing strut on the side of the chair in which
it was located. The two ADCs receiving signals from the left and right load cells were connected to a microcontroller board (Arduino Mega 2560, Arduino LLC, Somerville, MA, USA), with data acquired at 80 Hz using a custom-built software program written in Python (Fig. 1). The instantaneous center of pressure (CoP) of the forces applied through the chair was calculated as the barycenter of the four load-cell signals. Anteroposterior (AP) and mediolateral (ML) displacements of the CoP were also calculated, while the sum of the forces from the individual load cells was taken to be an estimate of the vertical ground reaction force (Fz).
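To make the barycenter computation concrete, the CoP is the force-weighted mean of the four load-cell positions and Fz is their sum. The sketch below is ours, not the study's acquisition software; the cell coordinates are illustrative placeholders for the actual cross layout:

    import numpy as np

    # Illustrative load-cell coordinates in the chair frame (m); the real
    # chair uses a cross layout with 31 cm between adjacent load cells.
    CELL_XY = np.array([[0.155, 0.0], [-0.155, 0.0],
                        [0.0, 0.155], [0.0, -0.155]])

    def cop_and_fz(forces):
        """forces: (n_samples, 4) load-cell signals in newtons.
        Returns Fz (n_samples,) and the CoP (n_samples, 2) in metres.
        The CoP is only meaningful while the person loads the chair."""
        forces = np.asarray(forces, dtype=float)
        fz = forces.sum(axis=1)                  # vertical force estimate
        cop = (forces @ CELL_XY) / fz[:, None]   # barycenter of the four cells
        return fz, cop

The AP and ML displacement series then follow by subtracting the seated baseline CoP from each sample.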
Fig. 1. Load cell, Arduino, and computer interface.
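Fig. 1 shows the acquisition chain. For illustration, a host-side Python loop reading the load-cell values streamed by the Arduino could look like the following sketch; the serial port, baud rate, and comma-separated line format are our assumptions, not details given in the paper:

    import serial  # pyserial

    PORT, BAUD = "/dev/ttyACM0", 115200  # assumed; set to match the Arduino

    def read_samples(n_samples):
        """Read n_samples lines of 'f1,f2,f3,f4' load-cell values (N)."""
        samples = []
        with serial.Serial(PORT, BAUD, timeout=1.0) as ser:
            while len(samples) < n_samples:
                line = ser.readline().decode("ascii", errors="ignore").strip()
                parts = line.split(",")
                if len(parts) == 4:  # keep only complete four-cell frames
                    samples.append([float(p) for p in parts])
        return samples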

It should be noted that Fz and CoP data can only be obtained
when the person is in contact with the chair during the STS
movement. In addition, data from the force plate is zeroed when
participants are seated prior to the start of any testing.
Calibration of the chair was carried out using a series of known
masses, which were placed at different locations on the seat of
the chair. This was used to verify the CoP and Fz data, with all
values accurate to within the load cell manufacturer’s
specifications of ±32 g for the mass and ±1 mm for the CoP.
B. Single Camera-based Posture Analysis
Cameras are readily available in the form of android devices
or installed surveillance cameras. These visual sensors can be a
useful resource in health care monitoring. Typically, multiple
cameras are used in order to extract human silhouettes from
video recordings [13, 14]. In the method developed for this study, a novel single-camera solution is used to calculate STS time.
Accurate pose estimation is essential to identify people in a
video frame. This requires the location of the body to be
identified in each RGB frame. One way of accomplishing this
is by background subtraction and extraction of the human
silhouette. Although this technique is relatively simple, it gives
false boundaries when the background is cluttered, while the
silhouettes do not define body joints distinctively. Moreover, the exact locations of pixels that correspond to key-points of the body, also known as joint points, are required for an accurate clinical test [23].
Pose estimation is a challenging problem in computer vision research: any pose estimation method needs to deal with clothing, lighting conditions, background, view angles, and occlusion. With the
advent of deep-learning techniques, many solutions to human
pose estimation have been introduced, such as the recently-
introduced Stacked Hourglass Network method [24]. Poses
estimated using this library are accurate at assessing human
movement [25].
The Stacked Hourglass Network method defines local
features such as the wrist, ankle, elbow and the orientation and
arrangement of these features with respect to each other. In
order to capture the right description of human joints, the
images are analyzed at different scales, with a low-level
resolution for joints and a high-level resolution for orientation.
The Stacked Hourglass Network consists of downscaling and
upscaling layers, which resembles an hourglass that is stacked
multiple times. The result of this deep network model is a set of
K heatmaps that correspond to K joint points. The network is
pre-trained on two datasets, FLIC and MPII, such that it can predict different orientations of human bodies.
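To make the heatmap output concrete, joint pixel coordinates are commonly recovered by taking the argmax of each of the K heatmaps. The sketch below illustrates only this generic post-processing step, not the Stacked Hourglass code itself:

    import numpy as np

    def joints_from_heatmaps(heatmaps):
        """heatmaps: (K, H, W) array, one map per joint point.
        Returns a (K, 2) array of (x, y) pixel coordinates and the
        per-joint confidence (the peak value of each heatmap)."""
        k, h, w = heatmaps.shape
        flat = heatmaps.reshape(k, -1)
        ys, xs = np.unravel_index(flat.argmax(axis=1), (h, w))
        conf = flat.max(axis=1)  # low values flag uncertain joints
        return np.stack([xs, ys], axis=1), conf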
A pose consisting of 15 joint locations was estimated by the
network for each frame of the image, as shown in Fig. 2. The
joint locations used are head, right and left shoulder, right and
left elbow, right and left wrist, pelvis, right and left hip, right
and left knee, and right and left ankle. A sample estimation for
a subject performing the STS is shown in Fig. 3, with the
skeleton on the left and heat maps of joint estimation
probability on the right.

Fig. 2. The 15-segment model of a pose used to estimate the STS.
Fig. 3. Example of pose estimation during the STS movement.
Calibration of the camera was performed using the chair as a
reference, with the back of the chair measuring 0.5m. This was
used to ensure that the pixels within the image that covered the
chair corresponded to 0.5m when the other measurements were
taken. For all recordings, the camera was placed 2.3 m from the chair on a line perpendicular to the front of the chair. The frame of reference used for the 3D data from the Kinect has the IR sensor as the origin, while the RGB camera, which is in 2D, has the origin at the top left corner of the image. The frames of reference for both sensors were transformed to a frame of reference fixed on the body of the subject, with the nearest hip of the subject taken as the origin in all directions of movement.
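The calibration described above reduces to a single metres-per-pixel scale factor obtained from the known 0.5 m chair back, followed by a shift of the origin to the nearest hip. A minimal sketch under those assumptions (motion approximately in the calibration plane; function and argument names are ours):

    import numpy as np

    def calibrate_to_body_frame(joints_px, chair_back_px, hip_px,
                                chair_back_m=0.5):
        """joints_px: (K, 2) joint coordinates in pixels, origin at the
        top-left image corner. Returns coordinates in metres in a
        body-fixed frame with the nearest hip as the origin.
        Note: image y grows downward; flip the y column's sign if an
        upward-positive height axis is needed."""
        scale = chair_back_m / chair_back_px  # metres per pixel
        return (np.asarray(joints_px, float) - np.asarray(hip_px, float)) * scale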
C. STS Parameter Calculation
The total time taken for each 5STS was estimated for each of
the four recording systems. The method used to estimate STS
time for both the RGB and Kinect systems was adapted from
that of Ejupi et al. [15]. This consists of an estimation of the
head position obtained from the camera for the duration of the
recording. Position data were low-pass filtered with a 4th-order Butterworth filter with a 2 Hz cut-off frequency. The peaks identified were taken to be the mid-points of the standing positions, while the troughs were taken to be the mid-points of the sitting positions. If the head position was within 5 cm of the nearest peak, the subject was considered to be standing, while a position within 5 cm of the nearest valley was taken to be sitting.
An example of head position signals during the 5STS for the
RGB and Kinect systems is shown in Fig. 4(a-b).
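A sketch of this head-signal processing, using SciPy for the 2 Hz low-pass filter and peak detection; the one-second minimum peak spacing is our assumption rather than a parameter reported in the paper:

    import numpy as np
    from scipy.signal import butter, filtfilt, find_peaks

    def segment_from_head(head_y, fs, band=0.05):
        """head_y: head height in metres; fs: sampling rate in Hz.
        Returns the filtered signal plus boolean standing/sitting masks
        based on the 5 cm bands around the nearest peak or trough.
        Assumes at least one peak and one trough are present."""
        b, a = butter(4, 2.0, fs=fs)                   # 4th-order, 2 Hz low-pass
        y = filtfilt(b, a, head_y)
        peaks, _ = find_peaks(y, distance=int(fs))     # standing mid-points
        troughs, _ = find_peaks(-y, distance=int(fs))  # sitting mid-points
        idx = np.arange(len(y))
        near_pk = peaks[np.abs(idx[:, None] - peaks[None, :]).argmin(axis=1)]
        near_tr = troughs[np.abs(idx[:, None] - troughs[None, :]).argmin(axis=1)]
        standing = np.abs(y - y[near_pk]) < band       # within 5 cm of a peak
        sitting = np.abs(y - y[near_tr]) < band        # within 5 cm of a trough
        return y, standing, sitting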
The mean duration of the 5STS was calculated for the force
plate and the chair, as shown in Fig. 4(c-d). Force data were also
low-pass filtered with a 4th-order Butterworth filter with a 2 Hz
cut-off frequency. For the force plate, the start of each sit-to-stand phase was taken to be 10% of the peak force obtained
during the transition to a standing position, which corresponds
to the same ratio as the 5 cm value used for the two camera-
based systems when compared to the mean standing height of
50 cm. A subject was considered to be standing when the force
reached 90% of the peak force for the individual STS. The
standing phase of the STS was considered to have finished
when vertical force decreased below 90% of peak force, with
subjects considered to have returned to a sitting position when
vertical force reached 10% of the previous peak. For the chair,
the opposite method was used since force decreases during the
sit-to-stand but increases for the force plate. Accordingly, for
the chair sit-to-stand phase, when vertical force decreased
below 90% of peak force, subjects were considered to have
started to stand up, while a subject was considered to be
standing when their force decreased below 10% of peak. The
same approach was used for the stand-to-sit, which began when
force reached 10% of peak force, with subjects considered to be
sitting when 90% of peak force was reached.
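The threshold logic can be expressed compactly. The sketch below handles one repetition recorded on the force plate, where force rises during the sit-to-stand; for the chair, where seat force falls instead, the mirrored thresholds described above apply:

    import numpy as np

    def sit_to_stand_bounds(force, lo=0.10, hi=0.90):
        """force: vertical force (N) over a window containing one
        sit-to-stand on the force plate. Returns (start, standing)
        sample indices: the phase starts when force rises through
        lo*peak and the subject is standing when it reaches hi*peak."""
        peak = float(np.max(force))
        start = int(np.argmax(force >= lo * peak))  # first crossing of 10%
        standing = start + int(np.argmax(force[start:] >= hi * peak))
        return start, standing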
In addition to total STS time, a worthwhile parameter that
can be obtained from an instrumented STS is sit-to-stand
velocity. STS velocity is better able to distinguish between
fallers and non-fallers than total STS time [15]. STS velocity
was calculated for the two camera-based systems using the
method proposed by Ejupi et al. [15] for the period between the
end of the sitting phase and the standing phase of each STS
movement. The height change between these two points was
divided by the time taken to obtain STS velocity. For the force
plate and the chair, velocity was derived using Newton’s second
law of motion over the period when force was between 10% and 90% of maximal force during the sit-to-stand movement.
The force-time curve was divided by mass to produce an
acceleration-time curve, which was then numerically integrated
using the trapezoid rule to produce the velocity-time curve from
which peak STS velocity was obtained. The average of STS
velocity for the five STS movements was used in all subsequent
analyses.
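Both velocity estimates are straightforward to implement. In the sketch below, the camera-based estimate follows Ejupi et al. [15], and the force-based estimate assumes the force signal has already been reduced to net vertical force over the 10-90% window, consistent with the zeroing described earlier; function names are illustrative:

    import numpy as np

    def camera_sts_velocity(height_change_m, transition_time_s):
        # Camera-based estimate: height change between the end of the
        # sitting phase and standing, divided by the time taken.
        return height_change_m / transition_time_s

    def force_peak_velocity(fz_net, mass_kg, fs):
        """fz_net: net vertical force (N) over the 10-90% window of one
        sit-to-stand; mass_kg: body mass; fs: sampling rate (Hz).
        Returns peak vertical velocity (m/s) by trapezoidal integration."""
        accel = np.asarray(fz_net, float) / mass_kg  # Newton's second law
        dt = 1.0 / fs
        vel = np.concatenate(([0.0],
                              np.cumsum((accel[1:] + accel[:-1]) * dt / 2.0)))
        return float(vel.max())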
D. Comparison of STS Parameters
The performance of the four systems was compared using
data collected from a sample of 21 healthy younger subjects and
a sample of 16 older fallers. The younger participants
performed two trials, the first of which was at a self-selected
slow speed, while subjects were asked to perform the second
trial as fast as possible. The older fallers performed a single trial
at a self-selected speed. The ethics committee of the Asian
Centre for Medical Education, Research & Innovation
approved the study (ACMERI/18/001), with all subjects giving
informed consent.
Comparative performances of the four methods of obtaining
STS time and STS velocity were undertaken using correlation
analysis and limits of agreement, using Bland-Altman plots
[26]. Overall STS time was compared to a reference time that
was obtained from the analysis of a frame-by-frame record of
each STS from the RGB camera [13]. The expert manually
identified the beginning and end of each STS, with the
beginning taken to be when the subject began to move their
torso forward in the first STS, while the end of the STS was
estimated as the moment when the subject’s torso returned to
vertical after completing the 5th STS movement. These start
and endpoints were chosen based on the four phases of the STS
movement described previously [27]. The use of an expert
assessment of the video as the gold-standard for STS time was
chosen rather than a stopwatch, as previous research has
reported errors due to delays in starting the stopwatch after the
command was given to start being included in the time, while
errors also occur when stopping the timer [13].
All four methods were compared with that of the expert for
total 5STS time using Bland-Altman plots. For STS velocity,
no expert velocity was available, therefore Bland-Altman plots
were not used. All data processing was performed using
custom-built software developed using LabVIEW (Version
2018, National Instruments Corporation, Austin, Texas, USA).
Statistical analysis was performed using SPSS (version 25, IBM
Corporation, Armonk, New York, USA).
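For reference, the limits of agreement behind a Bland-Altman plot are the mean difference (bias) plus or minus 1.96 standard deviations of the paired differences, with each pair's mean plotted on the x-axis and its difference on the y-axis. A minimal sketch:

    import numpy as np

    def bland_altman(method_s, expert_s):
        """method_s, expert_s: paired STS times (s) from a method and
        the expert. Returns the bias and the 95% limits of agreement."""
        d = np.asarray(method_s, float) - np.asarray(expert_s, float)
        bias = d.mean()
        half_width = 1.96 * d.std(ddof=1)  # 95% limits of agreement
        return bias, (bias - half_width, bias + half_width)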
Fig. 4. Example signals during the 5STS: (a-b) head height (m) over time (s) for the RGB camera and Kinect; (c-d) vertical force (N) over time (s) for the force plate and instrumented chair, with markers for peak standing, valley sitting, and the start and stop of each sit-to-stand and stand-to-sit phase.

Fig. 5. Example recording from the instrumented chair during the 5STS test, showing force (N) against time (s) for the right-back, right-front, left-back and left-front load cells.

Citations
Journal ArticleDOI
04 Sep 2020-Sensors
TL;DR: Observations indicate that the IMU embedded in smart glasses is accurate for measuring vertical acceleration during STS movements; further studies should assess the STS movement in unstandardized settings and report vertical acceleration values in an elderly population of fallers and non-fallers.
Abstract: Wearable sensors have recently been used to evaluate biomechanical parameters of everyday movements, but few have been located at the head level. This study investigated the relative and absolute reliability (intra- and inter-session) and concurrent validity of an inertial measurement unit (IMU) embedded in smart eyeglasses during sit-to-stand (STS) movements for the measurement of maximal acceleration of the head. Reliability and concurrent validity were investigated in nineteen young and healthy participants by comparing the acceleration values of the glasses’ IMU to an optoelectronic system. Sit-to-stand movements were performed in laboratory conditions using standardized tests. Participants wore the smart glasses and completed two testing sessions with STS movements performed at two speeds (slow and comfortable) under two different conditions (with and without a cervical collar). Both the vertical and anteroposterior acceleration values were collected and analyzed. The use of the cervical collar did not significantly influence the results obtained. The relative reliability intra- and inter-session was good to excellent (i.e., intraclass correlation coefficients were between 0.78 and 0.91) and excellent absolute reliability (i.e., standard error of the measurement lower than 10% of the average test or retest value) was observed for the glasses, especially for the vertical axis. Whatever the testing sessions in all conditions, significant correlations (p < 0.001) were found for the acceleration values recorded either in the vertical axis and in the anteroposterior axis between the glasses and the optoelectronic system. Concurrent validity between the glasses and the optoelectronic system was observed. Our observations indicate that the IMU embedded in smart glasses is accurate to measure vertical acceleration during STS movements. Further studies should investigate the use of these smart glasses to assess the STS movement in unstandardized settings (i.e., clinical and/or home) and to report vertical acceleration values in an elderly population of fallers and non-fallers.


Journal ArticleDOI
TL;DR: In this article, a learning and fusion network of multiple hidden substages is proposed to assess athletic performance by segmenting videos into five substages by a temporal semantic segmentation, and a fully-connected-network-based hidden regression model is built to predict the score of each substage, fusing these scores into the overall score.
Abstract: Many of the existing methods for action quality assessment implement single-stage score regression networks that lack pertinence and rationality for the evaluation task. In this work, our target is to find a reasonable action quality assessment method for sports competitions that conforms to objective evaluation rules and field experience. To achieve this goal, three assessment scenarios, i.e., the overall-score-guided scenario, execution-score-guided scenario, and difficulty-level-based overall-score-guided scenario, are defined. A learning and fusion network of multiple hidden substages is proposed to assess athletic performance by segmenting videos into five substages by a temporal semantic segmentation. The feature of each video segment is extracted from the five feature backbone networks with shared weights, and a fully-connected-network-based hidden regression model is built to predict the score of each substage, fusing these scores into the overall score. We evaluate the proposed method on the UNLV-Diving dataset. The comparison results show that the proposed method based on objective evaluation rules of sports competitions outperforms the regression model directly trained on the overall score. The proposed multiple-substage network is more accurate than the single-stage score regression network and achieves state-of-the-art performance by leveraging objective evaluation rules and field experience that are beneficial for building an accurate and reasonable action quality assessment model.


Proceedings Article
01 Jun 2022
TL;DR: In this paper, a low-complexity decision tree algorithm was used to detect sit-to-stand and stand-to-sit postural transitions in addition to other human physical activities such as walk and no-walk.
Abstract: Monitoring and analyzing basic human daily life activities will help in enhancing the quality of life for both healthy and physically handicapped people. The recognition of sit-to-stand and stand-to-sit transitions in activities of daily living (ADL) is a complex task. This is due to the intricate body movements during such postural transitions. This work proposes a novel method for detecting sit-to-stand and stand-to-sit postural transitions in addition to other human physical activities such as walk and no-walk. In contrast to previous methods for determining such transitions, our solution does not require complex time- or frequency-domain based algorithms. Our solution relies on fusing motion data collected from an inertial measurement unit device with light data generated from visible light sensing technology utilizing an RGB photodiode. By utilizing a low-complexity decision tree algorithm, the activity can be precisely recognized in a resource-efficient way. The applicability of our approach was tested through two scenarios representing various ADL in a smart environment.


Journal ArticleDOI
13 Jan 2021-Sensors
TL;DR: In this article, a joint moment estimation system for asymmetric sit-to-stands is proposed based on a kinematic model that estimates segment angles using a single inertial sensor attached to the shank and a force plate.
Abstract: To provide effective diagnosis and rehabilitation, the evaluation of joint moments during sit-to-stand is essential. The conventional systems for the evaluation, which use motion capture cameras, are quite accurate. However, the systems are not widely used in clinics due to their high cost, inconvenience, and the fact they require lots of space. To solve these problems, some studies have attempted to use inertial sensors only, but they were still inconvenient and inaccurate with asymmetric weight-bearing. We propose a novel joint moment estimation system that can evaluate both symmetric and asymmetric sit-to-stands. To make a simplified system, the proposal is based on a kinematic model that estimates segment angles using a single inertial sensor attached to the shank and a force plate. The system was evaluated with 16 healthy people through symmetric and asymmetric weight-bearing sit-to-stand. The results showed that the proposed system (1) has good accuracy in estimating joint moments (root mean square error 0.99) and (2) is clinically relevant due to its simplicity and applicability of asymmetric sit-to-stand.


References
Proceedings ArticleDOI
01 May 2019
TL;DR: It is found that the accuracy of step recognition is improved by adding wearable sensing data to video data shot from two different angles, motivating a hybrid recognition method that utilizes the merits of both video and wearable sensors.
Abstract: In this paper, we propose a hybrid activity recognition method for ballroom dance exercise using video and a wearable sensor. The purpose of our research is to design a mechanism to support ballroom dance exercise, and this paper reports the first outcome toward that purpose: recognizing ballroom dance exercise. There are two conceivable ways to recognize dance exercise: video and wearable sensors. However, each of them has its disadvantages. Using video is a good way to recognize the movement of the body. However, it cannot provide accurate timing or strength of foot actions because the number of frames per second is too small to recognize the fast movements of dancers. On the other hand, while a wearable sensor is good at recognizing foot timing and strength, it is not good at recognizing the movement of the whole body. Therefore we propose a hybrid recognition method utilizing the merits of both video and wearable sensors. This paper focuses on recognizing four different types of steps in Latin American dance, a kind of ballroom dance. For each step, we record wearable sensing data and videos. As a result, it is found that the accuracy of step recognition is improved by adding wearable sensing data to video data shot from two different angles.


Proceedings ArticleDOI
18 Dec 2016
TL;DR: An algorithm is proposed that assesses how well a person practices Sun Salutation in terms of grace and consistency and introduces a dataset for Sun Saluting videos comprising 30 sequences of perfect Sun Salutations performed by seven experts to train the system.
Abstract: There are many exercises which are repetitive in nature and are required to be done with perfection to derive maximum benefits. Sun Salutation or Surya Namaskar is one of the oldest yoga practice known. It is a sequence of ten actions or 'asanas' where the actions are synchronized with breathing and each action and its transition should be performed with minimal jerks. Essentially, it is important that this yoga practice be performed with Grace and Consistency. In this context, Grace is the ability of a person to perform an exercise with smoothness i.e. without sudden movements or jerks during the posture transition and Consistency measures the repeatability of an exercise in every cycle. We propose an algorithm that assesses how well a person practices Sun Salutation in terms of grace and consistency. Our approach works by training individual HMMs for each asana using STIP features[11] followed by automatic segmentation and labeling of the entire Sun Salutation sequence using a concatenated-HMM. The metric of grace and consistency are then laid down in terms of posture transition times. The assessments made by our system are compared with the assessments of the yoga trainer to derive the accuracy of the system. We introduce a dataset for Sun Salutation videos comprising 30 sequences of perfect Sun Salutation performed by seven experts and used this dataset to train our system. While Sun Salutation can be judged on multiple parameters, we focus mainly on judging Grace and Consistency.



"A Comparison of Four Approaches to ..." refers methods in this paper

  • ...Our framework provides a number of advantages, such as the use of a single low-cost RGB camera that can be easily extended to android phones [15], [21], [22] and a method that does not involve background subtraction to extract the human silhouette....

    [...]

Book ChapterDOI
16 Dec 2017
TL;DR: An exemplar based Approximate String Matching (ASM) technique is proposed for detecting such anomalous and missing segments in action sequences and shows promising alignment and missed/anomalous notification results over this dataset.
Abstract: We forget action steps and perform some unwanted action movements as amateur performers during our daily exercise routine, dance performances, etc. To improve our proficiency, it is important that we get a feedback on our performances in terms of where we went wrong. In this paper, we propose a framework for analyzing and issuing reports of action segments that were missed or anomalously performed. This involves comparing the performed sequence with the standard action sequence and notifying when misalignments occur. We propose an exemplar based Approximate String Matching (ASM) technique for detecting such anomalous and missing segments in action sequences. We compare the results with those obtained from the conventional Dynamic Time Warping (DTW) algorithm for sequence alignment. It is seen that the alignment of the action sequences under conventional DTW fails in the presence of missed action segments and anomalous segments due to its boundary condition constraints. The performance of the two techniques has been tested on a complex aperiodic human action dataset with Warm up exercise sequences that we developed from correct and incorrect executions by multiple people. The proposed ASM technique shows promising alignment and missed/anomalous notification results over this dataset.



"A Comparison of Four Approaches to ..." refers methods in this paper

  • ...Poses estimated using this library are accurate at assessing human movement [25]....

    [...]

Frequently Asked Questions (11)
Q1. What are the contributions in this paper?

The aim of this study was to develop two novel methods of evaluating performance in the STS: one using a low-cost RGB camera and the other an instrumented chair containing load cells in the seat of the chair to detect center of pressure movements and ground reaction forces.

With the advent of deep-learning techniques, many solutions to human pose estimation have been introduced, such as the recently-introduced Stacked Hourglass Network method [24].

The use of an expert assessment of the video as the gold-standard for STS time was chosen rather than a stopwatch, as previous research has reported that delays in starting the stopwatch after the command to start are included in the measured time, while errors also occur when stopping the timer [13].

It would also be possible to estimate the power produced during the STS using the method proposed by Lindemann et al., in which the difference between seated height and standing height is combined with the rate of force development to estimate power [32]. 

STS velocity was calculated for the two camera-based systems using the method proposed by Ejupi et al. [15] for the period between the end of the sitting phase and the standing phase of each STS movement. 

For the force plate, the start of each sit-to-stand phase was taken to be 10% of the peak force obtained during the transition to a standing position, which corresponds to the same ratio as the 5 cm value used for the two camera-based systems when compared to the mean standing height of 50 cm.

The error of the chair method was less than 10% of the minimal detectable change for the 5STS, which has been reported to be 2.5 seconds [29]. 

Power during the STS is a strong predictor of overall muscle power and even cross-sectional area of the quadriceps [33, 34], which means the instrumented chair might be able to estimate muscle mass. 

In order to capture the right description of human joints, the images are analyzed at different scales, with a low-level resolution for joints and a high-level resolution for orientation. 

The highest correlation with gait velocity was obtained for chair STS velocity (r=0.76), followed by the force plate (r=0.49), RGB camera (r=0.12), and the Kinect (r=0.07). 

Although the observed relationship between STS velocity and gait velocity was encouraging, it would have been useful to have measures of leg strength for the older subjects rather than using gait velocity as a proxy measure.