This manuscript version is made available under the CC-BY-NC-ND 4.0 license
http://creativecommons.org/licenses/by-nc-nd/4.0/
Shukla, B. K., et al. (2020). "A Comparison of Four Approaches to Evaluate the Sit-To-Stand
Movement." IEEE Transactions on Neural Systems and Rehabilitation Engineering. Vol 28: in
press.
The final version of this article is available on the publisher’s website:
https://ieeexplore.ieee.org/document/9070208
The DOI of the article is: https://doi.org/10.1109/TNSRE.2020.2987357

TNSRE-2019-00352
1
Abstract—The sit-to-stand test (STS) is a simple test of function in older people that can identify people at risk of falls. The aim of this study was to develop two novel methods of evaluating performance in the STS: one using a low-cost RGB camera and the other an instrumented chair containing load cells in the seat to detect center of pressure movements and ground reaction forces. The two systems were compared to a Kinect and a force
plate. Twenty-one younger subjects were tested when performing
two 5STS movements at self-selected slow and normal speeds while
16 older fallers were tested when performing one 5STS at a self-
selected pace. All methods had acceptable limits of agreement with
an expert for total STS time for younger subjects and older fallers,
with smaller errors observed for the chair (-0.18 ± 0.17 s) and force
plate (-0.19 ± 0.79 s) than for the RGB camera (-0.30 ± 0.51 s) and
the Kinect (-0.38 ± 0.50 s) for older fallers. The chair had the
smallest limits of agreement compared to the expert for both
younger and older participants. The new device was also able to
estimate movement velocity, which could be used to estimate
muscle power during the STS movement. Subsequent studies will
test the device against opto-electronic systems, incorporate
additional sensors, and then develop predictive equations for
measures of physical function.
Index Terms—Biomedical monitoring, functional screening, Kinect, RGB camera, sit-to-stand.
I. INTRODUCTION
FALLS are a major concern in older people, with around 30% of people aged over 65 falling each year and the prevalence increasing in older age groups [1]. Risk factors for
falls include low strength, poor balance and mobility problems
[2]. People who are at risk of falls need to be identified to
implement targeted fall-reduction programs including balance
and strength training [3]. A simple test of physical function to
identify fallers is the Five-times Sit-to-Stand test (5STS) [4].
The 5STS test was shown to outperform both the Timed-Up-
and-Go (TUG) and single-leg stance tests in differentiating
between low, moderate and high risk of falls [5]. The
importance of the STS test has been highlighted in many previous works that have used it to screen older adults for fall risk [6, 7]. There are two main variations of the test in which
the person either performs five STS as quickly as possible [8]
Submitted for review on the 8th November 2019.
B. K. Shukla is with the Indian Institute of Technology Jodhpur, Karwar
342037, India (e-mail: shukla.1@iitj.ac.in).
H. Jain is with the Indian Institute of Technology Jodhpur, Karwar 342037,
India (e-mail: jain.4@iitj.ac.in).
V. Vijay is with the Indian Institute of Technology Jodhpur, Karwar 342037,
India (e-mail: vivek@iitj.ac.in).
or the person performs as many STS as possible within 30
seconds [9].
Performance in the STS is typically measured using a
stopwatch to record the time taken for the task or the number of
repetitions performed. However, instrumented versions of both
tests have been developed to improve the accuracy of
measurement and also to extract additional information about
the STS performance. Such tests have used a range of
techniques including body-worn accelerometers [10, 11],
pressure sensors [12], and visual sensors, often using multiple
cameras [13, 14]. In addition to the possibility of automatic
detection of STS time, in one study parameters extracted using
a Kinect were more closely related to the strength of the
participants than was the overall STS time [15]. Such a finding
indicates that extracting data on how the STS is
performed, rather than simply the time to perform the 5STS,
could be beneficial.
Previous techniques to evaluate the STS have included the
use of wearable and visual sensors. For instance, a triaxial
accelerometer mounted on the waist was used to classify
different activities like running, walking, or postures such as
sitting and lying, as well as transitional activities such as the
STS and falling [10]. Accelerometers have also been used to
distinguish between normal subjects and people with
Parkinson’s disease with respect to their STS performance as
part of the TUG test [11]. Although sensor-based tests can be
effective, the user is required to wear the sensors when the test
is being performed, which can be inconvenient. The preferred
locations of wearable sensors have been reported as the wrist,
on glasses, or the arm [16]. In such cases, sensors are not good
at detecting the movement of the entire body, such as that
performed in the STS [17].
Other studies have used visual sensors to evaluate the STS
movement. For instance, Allin et al. [14] used three cameras to
extract 3-D features like the distance between the feet and head,
to construct body centroids. Ellipsoid tracking was then used,
along with the Weka Machine Toolkit, to classify postures
based on the position of the head, feet and torso [18], with an
excellent correlation observed between the Berg Balance Score
and the rise time of the STS. However, this process necessitated
S.K. Yadav is with the Indian Institute of Technology Jodhpur, Karwar
342037, India (e-mail: sy@iitj.ac.in).
A. Mathur is with the Asian Centre for Medical Education, Research &
Innovation, Jodhpur 342003, India (email: mathurarvindju@gmail.com).
D.J. Hewson is with the Institute for Health Research University of
Bedfordshire, Luton LU1 3JU, UK (e-mail: david.hewson@beds.ac.uk).
A Comparison of Four Approaches to Evaluate
the Sit-To-Stand Movement
Brajesh K. Shukla, Hiteshi Jain, Vivek Vijay, Sandeep K Yadav, Arvind Mathur, and David J Hewson
manual labeling of individual body parts for one image of each
subject to enable color information to be learned for each person
tested. Moreover, three carefully positioned cameras were
required to measure the STS time, making such a system
difficult to use outside of a laboratory setting. In another study,
pose-based descriptors from volumetric image data were used
to identify the STS movement [19].
Activities, including the STS, were then identified and
classified using the nearest neighbor method. More recently, 3-
D modeling of a human body in voxel space has been used to
estimate STS time [13]. This study used an ellipse-fitting
algorithm that obtained features from the image to determine
body orientation. The best segmentation accuracies for this
method used the ellipse fit and voxel height. This framework
was suggested as being suitable for real-time video monitoring
of community-dwelling older people to detect fallers, with two
cameras required to calculate human voxels. Furthermore, the accuracy of background subtraction is highly dependent on the type of background: a cluttered background leads to false silhouette extraction and thus a non-robust solution [20].
In response to the difficulties outlined above, the solutions
developed in this paper are two-fold: 1) We propose the design
of a novel device in which four force sensors are built into a
chair to measure individual STS cycles, which removes the
requirement for participants to wear body sensors throughout
the experiment. 2) We propose a low-cost video framework to
measure STS time using only a single inexpensive RGB
camera. The human skeleton from the frames captured with the
RGB camera is extracted using a deep learning network, with
frame sequences then segmented into STS cycles using the
change in the location of the head.
In this paper, we analyze the performances of these two novel
approaches to evaluate the STS and compare them to two
previously used instrumented systems to evaluate the STS, the
Kinect, and a force plate. Our framework provides a number of
advantages, such as the use of a single low-cost RGB camera
that can be easily extended to Android phones [15, 21, 22] and
a method that does not involve background subtraction to
extract the human silhouette. Although such a method has been
used previously with an RGB-based camera setup [13], it fails
in a cluttered environment when silhouette extraction becomes
difficult. In contrast, the new method uses a deep pose library
to extract body position. The use of visual sensors allows
monitoring of both the time taken to perform the STS and the
way it is performed, which is not possible in sensor-based
approaches alone. Finally, while both STS performance and
STS time can be analyzed using an RGB camera, the
instrumented chair provides additional information related to
the movement of the center of pressure, which could provide
useful information about the STS movement.
Our goal in this study is to design a framework to evaluate
the STS in an unstructured setting, without requiring human
intervention. In the next section we explain the chair design and
the pose estimation using the RGB camera. Next, we describe
the methodology used to determine STS time and STS velocity
using both the visual sensors (RGB and Kinect) and the force-
based sensors (chair and force plate). We then present our
experimental results, compare the performance of the methods
for the four systems, and conclude with discussions and future
work.
II. OUR FRAMEWORK
In this section we propose two new methods to estimate STS
time and STS velocity during the STS movement. Firstly, an
instrumented chair is designed using four load cells that
eliminates the need for subjects to wear body sensors while
performing the STS test. Next, we introduce a single RGB
camera-based system to capture the STS movement and
propose a technique to estimate STS time. A detailed
description of both modules follows.
A. Instrumented Chair Design
A wooden chair with a 47 cm seat height was instrumented
with four load cells, which were positioned in a cross with a
distance of 31 cm between each adjacent pair of load cells. Each
load cell was rated for 40 kg with a precision of 8 g (CZL 601,
Standard Load Cells, Vadodara, Gujarat, India). The load cells
were fixed to the seat of the chair and covered by an additional
piece of wood. Each pair of load cells on one side of the chair
was connected to a 24-bit analogue-to-digital converter (ADC)
(HX711 Avia Semiconductors, Xiamen, China), with each
ADC placed on a bracing strut on the side of the chair in which
it was located. The two ADC receiving signals from the left and
right load cells were connected to a microcontroller board
(Arduino Mega 2560, Arduino LLC, Somerville, MA, USA),
with data acquired at 80 Hz using a custom-built software
program written in Python (Fig. 1). Instantaneous center of
pressure (CoP) of the forces applied through the chair was
calculated as the barycenter of the four load cell signals. Anteroposterior (AP) and mediolateral (ML) displacements of the CoP were also calculated, while the sum of the forces from the individual load cells was taken as an estimate of the vertical ground reaction force (Fz).
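The CoP computation described above can be sketched as follows. The cell coordinates below are illustrative only (a square layout consistent with the right-back/right-front/left-back/left-front cell labelling), not the exact geometry of the chair:

```python
import numpy as np

# Illustrative load-cell positions (m) in chair coordinates (ML, AP),
# ordered right back, right front, left back, left front. The true
# geometry of the chair may differ.
CELL_POS = np.array([
    [ 0.155, -0.155],   # right back
    [ 0.155,  0.155],   # right front
    [-0.155, -0.155],   # left back
    [-0.155,  0.155],   # left front
])

def cop_and_fz(forces):
    """forces: (n_samples, 4) vertical forces (N), one column per cell.
    Fz is the sum over cells; the CoP is the barycenter (force-weighted
    mean) of the cell positions."""
    forces = np.asarray(forces, dtype=float)
    fz = forces.sum(axis=1)                 # estimated vertical GRF (N)
    cop = forces @ CELL_POS / fz[:, None]   # (n_samples, 2): ML, AP (m)
    return fz, cop

fz, cop = cop_and_fz([[100.0, 100.0, 100.0, 100.0],
                      [150.0,  50.0, 100.0, 100.0]])
# Equal loading places the CoP at the centre of the seat.
```

With equal forces on all four cells the CoP falls at the seat centre; shifting load toward one cell moves the barycenter toward that cell.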
Fig. 1. Load cell, Arduino, and computer interface

It should be noted that Fz and CoP data can only be obtained
when the person is in contact with the chair during the STS
movement. In addition, data from the force plate is zeroed when
participants are seated prior to the start of any testing.
Calibration of the chair was carried out using a series of known
masses, which were placed at different locations on the seat of
the chair. This was used to verify the CoP and Fz data, with all
values accurate to within the load cell manufacturer’s
specifications of ±32 g for the mass and ±1 mm for the CoP.
B. Single Camera-based Posture Analysis
Cameras are readily available in the form of Android devices
or installed surveillance cameras. These visual sensors can be a
useful resource in health care monitoring. Typically, multiple
cameras are used in order to extract human silhouettes from
video recordings [13, 14]. In the method developed for this
study, a novel solution using only a single camera is used to calculate
STS time.
Accurate pose estimation is essential to identify people in a
video frame. This requires the location of the body to be
identified in each RGB frame. One way of accomplishing this
is by background subtraction and extraction of the human
silhouette. Although this technique is relatively simple, it gives
false boundaries when the background is cluttered, while the
silhouettes do not define body joints distinctively. In contrast,
the exact location of pixels that correspond to key-points of the
body, also known as joint points, are required for an accurate
clinical test [23].
Pose estimation is a long-standing challenge in computer vision research: any pose estimation method needs to deal with variations in clothing, lighting conditions, background, view angle, and occlusion. With the
advent of deep-learning techniques, many solutions to human
pose estimation have been introduced, such as the recently-
introduced Stacked Hourglass Network method [24]. Poses
estimated using this library are accurate at assessing human
movement [25].
The Stacked Hourglass Network method defines local features such as the wrist, ankle, and elbow, and the orientation and arrangement of these features with respect to each other. In
order to capture the right description of human joints, the
images are analyzed at different scales, with a low-level
resolution for joints and a high-level resolution for orientation.
The Stacked Hourglass Network consists of downscaling and upscaling layers resembling an hourglass, a structure that is stacked multiple times. The result of this deep network model is a set of
K heatmaps that correspond to K joint points. The network is
pre-trained on two datasets, FLIC and MPII, so that it can predict a wide range of human body orientations.
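The K heatmaps can be decoded into joint coordinates by taking the location of the maximum response in each map. A minimal sketch of this decoding step, using a synthetic Gaussian heatmap rather than real network output:

```python
import numpy as np

def decode_joints(heatmaps):
    """heatmaps: (K, H, W) array of per-joint score maps, as produced by
    a stacked-hourglass network. Returns a (K, 2) array of (row, col)
    pixel coordinates of the most probable location for each joint."""
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat_idx, (h, w)), axis=1)

# Synthetic example: one Gaussian-blob heatmap peaked at (row=40, col=25)
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((yy - 40) ** 2 + (xx - 25) ** 2) / 50.0)
joints = decode_joints(blob[None])   # add a K=1 leading axis
```

In practice a sub-pixel refinement is often applied around the argmax, but the hard maximum is enough to illustrate how heatmaps map to the 15 joint locations used here.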
A pose consisting of 15 joint locations was estimated by the network for each video frame, as shown in Fig. 2. The
joint locations used are head, right and left shoulder, right and
left elbow, right and left wrist, pelvis, right and left hip, right
and left knee, and right and left ankle. A sample estimation for
a subject performing the STS is shown in Fig. 3, with the
skeleton on the left and heat maps of joint estimation
probability on the right.
Calibration of the camera was performed using the chair as a reference, with the back of the chair measuring 0.5 m. This was used to ensure that the pixels within the image that covered the chair corresponded to 0.5 m when the other measurements were taken. For all recordings, the camera was placed 2.3 m from the chair on a line perpendicular to the front of the chair. The frame of reference
used for the 3D data from Kinect has the IR sensor as the origin,
while the RGB camera, which is in 2D, has the origin at the top
left corner of the image. The frame of reference for both sensors was transformed to a frame of reference fixed on the body of the subject, with the nearest hip of the subject taken as the origin for all directions of movement.
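This calibration and change of reference frame amounts to a per-pixel scale factor and an origin shift, as in the sketch below (the function and variable names are ours, for illustration):

```python
import numpy as np

CHAIR_BACK_M = 0.5  # known length of the chair back used as the scale reference

def to_body_frame(joints_px, chair_back_px, hip_px):
    """Convert 2-D joint coordinates (pixels) into metres in a frame
    centred on the nearest hip. chair_back_px is the measured pixel
    length of the 0.5 m chair back in the same image."""
    scale = CHAIR_BACK_M / chair_back_px   # metres per pixel
    return (np.asarray(joints_px, float) - np.asarray(hip_px, float)) * scale

# Example: the head 200 px above the hip, chair back spanning 250 px.
# Image rows grow downward, so "above" gives a negative second coordinate.
head = to_body_frame([[300, 100]], chair_back_px=250.0, hip_px=[300, 300])
```

Here the scale is 0.5 m / 250 px = 2 mm per pixel, so the head sits 0.4 m from the hip along the vertical image axis.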
C. STS Parameter Calculation
The total time taken for each 5STS was estimated for each of
the four recording systems. The method used to estimate STS
time for both the RGB and Kinect systems was adapted from
that of Ejupi et al. [15]. This consists of an estimation of the
head position obtained from the camera for the duration of the
recording. Position data were low-pass filtered with a 4th-order Butterworth filter with a 2 Hz cut-off frequency. The peaks identified were taken to be the mid-points of the standing positions, while the troughs were taken to be the mid-points of the sitting positions. If the head position was within 5 cm of the nearest peak, the subject was considered to be standing, while a position within 5 cm of the nearest valley was taken to be sitting.
An example of head position signals during the 5STS for the
RGB and Kinect systems is shown in Fig. 4(a-b).
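A sketch of this segmentation on a synthetic head-height trace, assuming the chair's 80 Hz rate for the camera data as well; the `distance` and `prominence` settings are our additions to stabilise the peak picking:

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

FS = 80.0  # Hz, assumed sampling rate

def segment_sts(head_y):
    """Low-pass filter head height (m) at 2 Hz (4th-order Butterworth),
    then locate standing peaks and sitting troughs. Returns the filtered
    signal and the peak and trough sample indices."""
    b, a = butter(4, 2.0, btype="low", fs=FS)
    y = filtfilt(b, a, head_y)  # zero-phase filtering
    peaks, _ = find_peaks(y, distance=int(FS), prominence=0.1)    # standing
    troughs, _ = find_peaks(-y, distance=int(FS), prominence=0.1)  # sitting
    return y, peaks, troughs

# Synthetic 5STS: head oscillates between sitting (0.8 m) and standing
# (1.3 m), five cycles over 10 s
t = np.arange(0, 10, 1 / FS)
head_y = 1.05 - 0.25 * np.cos(2 * np.pi * 0.5 * t)
y, peaks, troughs = segment_sts(head_y)
standing = np.abs(y - y[peaks].mean()) < 0.05   # within 5 cm of a peak
```

On this trace the five standing peaks fall at the cosine minima, with the interior sitting troughs between them; the 5 cm band then labels each sample as standing or not.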
The mean duration of the 5STS was calculated for the force
plate and the chair, as shown in Fig. 4(c-d). Force data were also
low-pass filtered with a 4th-order Butterworth filter with a 2 Hz
cut-off frequency.
Fig. 2. The 15-segment model of a pose used to estimate the STS
Fig. 3. Example of pose estimation during the STS movement
For the force plate, the start of each sit-to-stand phase was taken to be 10% of the peak force obtained
during the transition to a standing position, which corresponds
to the same ratio as the 5 cm value used for the two camera-
based systems when compared to the mean standing height of
50 cm. A subject was considered to be standing when the force
reached 90% of the peak force for the individual STS. The
standing phase of the STS was considered to have finished
when vertical force decreased below 90% of peak force, with
subjects considered to have returned to a sitting position when
vertical force reached 10% of the previous peak. For the chair,
the opposite method was used since force decreases during the
sit-to-stand but increases for the force plate. Accordingly, for
the chair sit-to-stand phase, when vertical force decreased
below 90% of peak force, subjects were considered to have
started to stand up, while a subject was considered to be
standing when their force decreased below 10% of peak. The
same approach was used for the stand-to-sit, which began when
force reached 10% of peak force, with subjects considered to be
sitting when 90% of peak force was reached.
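For the force plate, the threshold logic above reduces to finding the first crossings of 10% and 90% of the peak force (for the chair, the same thresholds are applied to a decreasing force). A minimal sketch of the force-plate case on a synthetic linear ramp:

```python
import numpy as np

def sts_phase_times(force, fs):
    """For a single force-plate sit-to-stand, return the times at which
    the vertical force first rises through 10% of its peak (start of the
    sit-to-stand) and through 90% of its peak (standing reached)."""
    force = np.asarray(force, dtype=float)
    peak = force.max()
    start = np.argmax(force >= 0.10 * peak) / fs      # first 10% crossing
    standing = np.argmax(force >= 0.90 * peak) / fs   # first 90% crossing
    return start, standing

# Synthetic example: force rises linearly from 0 to 800 N over 1 s at 80 Hz
fs = 80
force = np.linspace(0.0, 800.0, fs + 1)
start, standing = sts_phase_times(force, fs)
# For a linear ramp the crossings land at 0.1 s and 0.9 s.
```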
In addition to total STS time, a worthwhile parameter that
can be obtained from an instrumented STS is sit-to-stand
velocity. STS velocity is better able to distinguish between fallers and non-fallers than total STS time [15]. STS velocity
was calculated for the two camera-based systems using the
method proposed by Ejupi et al. [15] for the period between the
end of the sitting phase and the standing phase of each STS
movement. The height change between these two points was
divided by the time taken to obtain STS velocity. For the force
plate and the chair, velocity was derived using Newton's second law of motion over the period when force was between 10% and 90% of maximal force during the sit-to-stand movement.
The force-time curve was divided by mass to produce an
acceleration-time curve, which was then numerically integrated
using the trapezoid rule to produce the velocity-time curve from
which peak STS velocity was obtained. The average of STS
velocity for the five STS movements was used in all subsequent
analyses.
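A sketch of this calculation for the force plate. We subtract body weight (m·g) to obtain net force before dividing by mass, which we assume is what applying Newton's second law implies here, since integrating raw force divided by mass would not yield a movement velocity:

```python
import numpy as np

def peak_sts_velocity(force, mass, fs, g=9.81):
    """Peak vertical velocity from a force-plate trace between the 10%
    and 90% force thresholds. Net acceleration a = (Fz - m*g)/m is
    integrated with the trapezoid rule to give velocity."""
    force = np.asarray(force, dtype=float)
    peak = force.max()
    i0 = int(np.argmax(force >= 0.10 * peak))   # start of sit-to-stand
    i1 = int(np.argmax(force >= 0.90 * peak))   # standing reached
    accel = (force[i0:i1 + 1] - mass * g) / mass
    # cumulative trapezoid integration: v[k] = sum of (a[j] + a[j+1])/2 * dt
    vel = np.concatenate(([0.0], np.cumsum((accel[1:] + accel[:-1]) / 2) / fs))
    return vel.max()

# Synthetic example: a 60 kg subject whose plate force rises linearly
# from body weight by 300 N over one second at 80 Hz
mass, fs = 60.0, 80
force = mass * 9.81 + np.linspace(0.0, 300.0, fs + 1)
v_peak = peak_sts_velocity(force, mass, fs)
```

The average of the per-repetition peak velocities over the five STS movements would then be taken forward, as described above.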
D. Comparison of STS Parameters
The performance of the four systems was compared using
data collected from a sample of 21 healthy younger subjects and
a sample of 16 older fallers. The younger participants
performed two trials, the first of which was at a self-selected
slow speed, while subjects were asked to perform the second
trial as fast as possible. The older fallers performed a single trial
at a self-selected speed. The ethics committee of the Asian
Centre for Medical Education, Research & Innovation
approved the study (ACMERI/18/001), with all subjects giving
informed consent.
Comparative performances of the four methods of obtaining
STS time and STS velocity were undertaken using correlation
analysis and limits of agreement, using Bland-Altman plots
[26]. Overall STS time was compared to a reference time obtained by an expert from a frame-by-frame analysis of each STS recorded by the RGB camera [13]. The expert manually identified the beginning and end of each STS, with the
beginning taken to be when the subject began to move their
torso forward in the first STS, while the end of the STS was
estimated as the moment when the subject’s torso returned to
vertical after completing the 5th STS movement. These start
and endpoints were chosen based on the four phases of the STS
movement described previously [27]. The use of an expert
assessment of the video as the gold-standard for STS time was
chosen rather than a stopwatch, as previous research has reported that delays in starting the stopwatch after the command to begin are included in the measured time, with further errors occurring when stopping the timer [13].
All four methods were compared with that of the expert for
total 5STS time using Bland-Altman plots. For STS velocity,
no expert reference was available, and therefore Bland-Altman plots were not used. All data processing was performed using
custom-built software developed using LabVIEW (Version
2018, National Instruments Corporation, Austin, Texas, USA).
Statistical analysis was performed using SPSS (version 25, IBM
Corporation, Armonk, New York, USA).
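The limits-of-agreement computation reduces to the bias (mean difference) plus or minus 1.96 standard deviations of the differences. The numbers below are made-up illustrative times, not study data:

```python
import numpy as np

def limits_of_agreement(method, reference):
    """Bland-Altman bias and 95% limits of agreement between a method's
    STS times and the expert reference times."""
    d = np.asarray(method, dtype=float) - np.asarray(reference, dtype=float)
    bias = d.mean()
    sd = d.std(ddof=1)                      # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical example: one method's total 5STS times vs expert times (s)
method = [12.1, 10.8, 14.0, 11.5, 13.2]
expert = [12.3, 11.0, 14.2, 11.8, 13.4]
bias, lo, hi = limits_of_agreement(method, expert)
```

A Bland-Altman plot is then the per-subject differences plotted against the per-subject means, with horizontal lines at the bias and at the two limits.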
Fig. 4. Example signals during the 5STS: head height (m) over time for (a) the RGB camera and (b) the Kinect, and vertical force (N) over time for (c) the force plate and (d) the chair, with standing peaks, sitting valleys, and the start and stop points of the sit-to-stand and stand-to-sit phases marked.
Fig. 5. Example recording from the instrumented chair during the 5STS test, showing force (N) against time (s) for the right back, right front, left back, and left front load cells.
