Posture Recognition Based Fall Detection System For Monitoring An Elderly Person In A Smart Home Environment
Miao Yu, Adel Rhuma, Syed Mohsen Naqvi, Liang Wang and Jonathon Chambers
Abstract: We propose a novel computer vision based fall detection system for monitoring an elderly person in a home care application. Background subtraction is applied to extract the foreground human body and the result is improved by using certain post-processing. Information from ellipse fitting and a projection histogram along the axes of the ellipse is used as the features for distinguishing different postures of the human. These features are then fed into a directed acyclic graph support vector machine (DAGSVM) for posture classification, the result of which is then combined with derived floor information to detect a fall. From a dataset of 15 people, we show that our fall detection system can achieve a high fall detection rate (97.08%) and a very low false detection rate (0.8%) in a simulated home environment.

Index Terms: Health care, assistive living, fall detection, multi-class classification, DAGSVM, system integration.
I. INTRODUCTION
In this section, we will briefly review the existing fall detection
systems and describe our new computer vision based fall detection
system.
A. Current fall detection techniques
Nowadays, the trend in western countries is for populations to contain an increasing number of elderly people. As shown in [1], the old-age dependency ratio (the number of people 65 and over relative to those between 15 and 64) in the European Union (EU) is projected to double to 54 percent by 2050, which means that the EU will move from having four persons of working age for every elderly citizen to only two. The topic of home care for elderly people is therefore receiving more and more attention. Among such care, one important issue is to detect whether an elderly person has fallen [2].
According to [2], falls are the leading cause of death due to injury
among the elderly population and 87% of all fractures in this group
are caused by falls. Although many falls do not result in injury, 47% of non-injured fallers cannot get up without assistance, and the period of time spent immobile also affects their health. An efficient fall detection system is essential for monitoring an elderly person and can even save their life in some cases. When an elderly person falls, a fall detection system will detect the anomalous behavior and an alarm signal will be sent to certain caregivers (such as hospitals or health centers) or to the elderly person's family members by a modern communication method. Fig. 1 shows such a fall detection system.
Different methods have been proposed for detecting falls and
are mainly divided into two categories: non-computer vision based
methods and computer vision based methods.
Miao Yu, Adel Rhuma, Syed Mohsen Naqvi and Jonathon Chambers are
with the Advanced Signal Processing Group, School of Electronic, Electrical
and Systems Engineering, Loughborough University, UK, e-mails: (m.yu,
a.rhuma, s.m.r.naqvi, j.a.chambers)@lboro.ac.uk.
Liang Wang is with the National Laboratory of Pattern Recognition
(NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing,
China, e-mail: wangliang@nlpr.ia.ac.cn.
1) Non-computer vision based methods: There are many non-
computer vision based methods for fall detection [3], [4], [5] and [6].
In these methods, different sensors (including acceleration sensors, acoustic sensors and floor vibration sensors) are used to capture sound, vibration and human body movement information, which is then applied to determine whether a fall has occurred.
Veltink et al. [3] were the first to utilize a single axis acceleration
sensor to distinguish dynamic and static activities. In their work,
acceleration sensors were placed over the chest and at the feet to
observe the changes, and a threshold based algorithm was applied on
the measured signals for fall detection. Kangas et al. [4] proposed an improved scheme in which a single three-axis acceleration sensor was attached to the subject's body in different positions, and the dynamic and static acceleration components measured from the sensor were compared with appropriate thresholds to determine a fall. Experimental results confirmed that a simple threshold based algorithm was appropriate for detecting certain falls. Some researchers have
also used acoustic sensors for fall detection. In [5], an acoustic
fall detection system (FADE) that would automatically signal a fall
to the monitoring caregiver was designed. A circular microphone
array was applied to capture and enhance sounds in a room for
the classification of ‘fall’ or ‘non-fall’, and the height information
of the sound source was used to reduce the false alarm rate. The
authors evaluated the performance of FADE using simulated fall and
nonfall sounds performed by three stunt actors trained to behave like
elderly people under different environmental conditions, and good performance was obtained (a 100% fall detection rate and a 3% false detection rate on a dataset consisting of 120 falls and 120 non-falls).
Zigel et al. [6] proposed a fall detection system based on floor
vibration and sound sensing. Temporal and spectral features were
extracted from signals and a Bayes’ classifier was applied to classify
fall and nonfall activities. In their work, a doll which mimicked a
human was used to simulate falls and their system detected such
falls with a fall detection rate of 97.5% and a false detection rate of
1.4%.
Although non-computer vision based methods may appear to
be suitable for wide application in the fall detection field, several
problems do exist; they are either inconvenient (elderly people have
to wear acceleration sensors) or easily affected by noise in the
environment (acoustic sensors and floor vibration sensors). In order
to overcome these problems, computer vision based fall detection
techniques are adopted. Infringement of personal privacy is a concerning issue for computer vision based fall detection systems, and elderly people may worry that they are being ‘watched’ by cameras.
However, in most computer vision based fall detection systems,
only the alarm signal (sometimes with a short video clip as further
confirmation of whether an elderly person has fallen or not) will be
sent to the caregivers or family members when a fall is detected;
additionally, the original video recordings of an elderly person’s
normal activities will not be stored, nor transmitted.
2) Computer vision based methods: In the last ten years, there have been many advances in computer vision and in camera, video and image processing techniques exploiting the real-time movement of the subject, which has opened up a new branch of methods for fall detection.

Fig. 1. Schematic representation of a fall detection system.

For computer vision based fall detection methods, some researchers have extracted information from the captured video and
a simple threshold method has been applied to determine whether
there is a fall or not; representative ones due to Rougier et al. are [7]
and [8]. In these two papers, the head’s velocity information and the
shape change information were extracted and appropriate thresholds
were set manually to differentiate fall and non-fall activities. However, these two methods produce high false detection rates (such as when a fast sitting activity was misclassified as a fall activity [7]) and the performance is strongly dependent on the set threshold. Another threshold based method was proposed in [9], in which calibrated cameras were used to reconstruct the three-dimensional shape of people.
Fall events were detected by analyzing the volume distribution along
the vertical axis, and an alarm was triggered when the major part
of this distribution was abnormally near the floor over a predefined
period of time. The experimental results showed good performance
of this system (achieving 99.7% fall detection rate or better with four
cameras or more) and a graphic processing unit (GPU) was applied
for efficient computation.
With the recent rapid development of pattern recognition techniques, many researchers have exploited such methods in fall detection. Posture recognition based fall detection methods are proposed in
[10], [11], [12] and [13]; in [10], the researchers used a neural fuzzy
network for posture classification, and when the detected posture
changed from ‘stand’ to ‘lie’ in a short time, a fall activity was
detected. A similar idea was proposed in [11] except that the classifier
was replaced with a more common k-nearest neighbour classifier;
moreover, statistical hypothesis testing was applied to obtain the
critical time difference to differentiate a fall incident event from a
lying down event, and a correct detection rate of 84.44% was obtained
according to their experimental results. In [12] and [13], Mihailidis
et al. used a single camera to classify fall and non-fall activities.
Carefully engineered features, such as silhouette features, lighting
features and flow features were extracted to achieve robustness in the
system to lighting, environment and the presence of multiple moving
objects. In [13] three pattern recognition methods were compared
(logistic regression, neural network and support vector machine) and
the neural network achieved the best performance with a fall detection
rate of 92% and a false detection rate of 5%.
Some other researchers classified fall and non-fall activities based
on the features extracted from short video clips. The representative
papers are [14] and [15]. For [14], a bounding box and motion
information were extracted from consecutive silhouettes as features.
These were then used to train a hidden Markov model (HMM) for
classifying fall and non-fall activities. In [15], a person’s three-
dimensional orientation information was extracted from multiple
uncalibrated cameras, and an improved version of the HMM, the layered hidden Markov model (LHMM), was used for fall detection. Although theoretically elegant, insufficient experimental results were provided in this paper (it only considered two kinds of activities: walking and falling).
There are also some other computer vision based methods for
fall detection. Nait-Charif and McKenna [16] proposed a method for
automatically extracting motion trajectory and providing a human
readable summary of activity and detection of unusual inactivity
in a smart home. A fall was detected as a deviation from usual
activity according to the particle filter-based tracking results. This
method exploited an unsupervised approach to detect abnormal events (mainly falls) and, as is common with unsupervised methods, has the disadvantage that a long training period is required. In [17], D.
Anderson proposed a fuzzy logic based linguistic summarization of
video for fall detection. A hierarchy of fuzzy logic was used, where
the output from each level was summarized and fed into the next
level for inference. Corresponding fuzzy rules were designed under
the supervision of nurses to ensure that they reflect the manner in
which elders perform their activities. This system was tested on a
dataset which contained 14 fall activities and 32 non-fall activities;
all the fall activities were correctly detected and only two non-fall
activities were mistaken as fall activities, which shows an acceptable
level of performance.
In this paper, we propose a new computer vision based fall detection
system which is based on posture recognition using a single camera
to monitor an elderly person who lives alone at home. An efficient
codebook background subtraction algorithm is applied to extract
the human body foreground and some post-processing is applied
to improve the results. From the extracted foreground silhouette,
we extract features from the fitted ellipse and projection histogram,
which are used for classification purposes. These features are fed into
the DAGSVM (which is trained from a dataset containing features
extracted from different postures in different orientations) and the
extracted foreground silhouette is classified as one of four different
postures (bend, lie, sit and stand). The classification results, together
with the detected floor information, are then used to determine fall
or non-fall activities. The flow chart of the proposed fall detection
system is shown in Fig. 2. In the next sections, we will describe
different blocks of this flow chart in detail.
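Before detailing the blocks, the per-frame logic of Fig. 2 can be summarized in code. This is an illustrative sketch only: each block of the flow chart is injected as a callable, and every name below is a placeholder of ours, not the authors' implementation.

```python
def process_frame(frame, prev_frame, subtract, post_process, extract, classify, in_floor):
    """One pass of the fall detection pipeline of Fig. 2 (illustrative sketch).
    Each stage of the flow chart is passed in as a callable."""
    fg_mask = subtract(frame)                            # Section II-A1: codebook subtraction
    fg_mask = post_process(fg_mask, frame, prev_frame)   # Section II-A2: blob post-processing
    posture = classify(extract(fg_mask))                 # Section II-B: features -> DAGSVM posture
    # A 'lie' posture whose blob lies in the detected floor region signals a fall.
    return posture == "lie" and in_floor(fg_mask)
```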
II. METHODS
A. Human body extraction
1) Background subtraction: In visual surveillance, a common
approach for discriminating moving objects from the background
is detection by background subtraction. Currently, there are many background subtraction algorithms; these include the single-mode model background subtraction methods [18], [19], the mixture of Gaussians (MoG) background subtraction method [20], the non-parametric density estimation based method [21] and the codebook background subtraction method [22]. In this fall detection system,
we use the codebook method because of its advantages. There is
no parametric assumption on the codebook model and it shows the
following merits as proposed in [22]: (1) resistance to artifacts of
acquisition, digitization and compression, (2) capability of coping
with illumination changes, (3) adaptive and compressed background models that can capture structural background motion over a long period of time under limited memory, and (4) unconstrained training that allows moving foreground objects in the scene during the initial training period.

Fig. 2. The flow chart of the proposed fall detection system.
The codebook method is available for both colour and gray-scale images; it is a pixel-based approach in which a codebook is initially constructed for each pixel during a training phase. Assuming the training dataset $I$ contains $N$ images, $I = \{imag_1, \ldots, imag_N\}$, then a single pixel $(x, y)$ has $N$ training samples $imag_1(x, y), \ldots, imag_N(x, y)$. From these $N$ training samples, a codebook is constructed for this pixel, which includes a certain number of codewords. Each codeword, denoted by $c$, consists of an RGB vector $v = (R, G, B)$ and a 6-tuple $aux = (\hat{I}, \check{I}, f, \lambda, p, q)$. The meanings of the six parameters in $aux$ are described as follows:
$\hat{I}$: the maximum intensity that has been represented by the codeword.
$\check{I}$: the minimum intensity that has been represented by the codeword.
$f$: the number of times that the codeword has been used.
$\lambda$: the maximum negative run-length (MNRL) in number of frames.
$p$: the first frame in which this codeword was used.
$q$: the last frame in which this codeword was used.
The details of the training procedure are given in [22], and the trained codebooks of the pixels are then used for background subtraction. For an incoming colour frame $f$, each pixel $f(x, y) = (R(x, y), G(x, y), B(x, y))$ (a 3-dimensional vector) is determined to be a foreground or background pixel by comparing $f(x, y)$ with the codewords in the codebook of this pixel. If $f(x, y)$ does not match any codeword, it is a foreground pixel. For a particular codeword $c$, we say that $c$ matches $f(x, y)$ if the following two conditions are met:

$$colordist(f(x, y), c) \leq \varepsilon, \qquad brightness(I, \hat{I}, \check{I}) = true \qquad (1)$$

where $\varepsilon$ is a preset threshold value for comparison, $I$ represents the norm of $f(x, y)$, and $\hat{I}$ and $\check{I}$ are the first two parameters of the 6-tuple $aux$ vector of the codeword $c$.
The function $colordist(f(x, y), c)$ measures the chromatic difference between two colour vectors, and can be calculated as:

$$colordist(f(x, y), c) = \sqrt{\|f(x, y)\|^2 - \frac{\langle f(x, y), v \rangle^2}{\|v\|^2}} \qquad (2)$$

where $v$ represents the RGB vector $v = (R, G, B)$ of codeword $c$, and $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ denote respectively the Euclidean norm and dot product operations.
The function $brightness(I, \hat{I}, \check{I})$ is defined as:

$$brightness(I, \hat{I}, \check{I}) = \begin{cases} true & \text{if } I_{low} \leq \|f(x, y)\| \leq I_{hi} \\ false & \text{otherwise} \end{cases} \qquad (3)$$

where $I_{low} = \alpha \hat{I}$ and $I_{hi} = \min\{\beta \hat{I}, \check{I}/\alpha\}$. In our experiment, $\alpha$ and $\beta$ are fixed to 0.5 and 2 respectively for background subtraction.
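For illustration, the matching test of (1)-(3) can be written directly in Python/NumPy. This is a minimal sketch: the codeword layout (a dict with keys v, I_hat, I_check) and the value of the threshold $\varepsilon$ are our own assumptions, not the authors' implementation.

```python
import numpy as np

ALPHA, BETA = 0.5, 2.0   # alpha and beta as fixed in the paper
EPSILON = 10.0           # preset colour threshold; assumed value

def colordist(pixel, v):
    """Chromatic distance of equation (2) between pixel f(x,y) and codeword vector v."""
    pixel, v = np.asarray(pixel, float), np.asarray(v, float)
    proj2 = np.dot(pixel, v) ** 2 / np.dot(v, v)       # squared projection onto v
    return np.sqrt(max(np.dot(pixel, pixel) - proj2, 0.0))

def brightness(I, I_hat, I_check):
    """Brightness test of equation (3)."""
    I_low = ALPHA * I_hat
    I_hi = min(BETA * I_hat, I_check / ALPHA)
    return I_low <= I <= I_hi

def matches(pixel, codeword):
    """Matching condition of equation (1): both tests must pass."""
    I = np.linalg.norm(np.asarray(pixel, float))
    return (colordist(pixel, codeword["v"]) <= EPSILON
            and brightness(I, codeword["I_hat"], codeword["I_check"]))

def is_foreground(pixel, codebook):
    """A pixel is foreground if it matches no codeword in its per-pixel codebook."""
    return not any(matches(pixel, cw) for cw in codebook)
```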
An important problem in background subtraction is background model updating, because the background does not remain constant (consider gradual light changes or movement of the furniture).
The codebook background subtraction method therefore provides a
background model updating scheme. The matched codeword accord-
ing to (1) is updated as shown in [22]. Moreover, an additional cache
model is introduced. If one codeword in this model is matched with
the incoming pixel values for a period longer than a time threshold
(which means this codeword is a new background codeword), it is
added to the original codebook. Conversely, a codeword which has not matched incoming pixels for longer than a time threshold (which means it is no longer a background codeword) is deleted from the codebook. Through the background model updating
scheme, we can cope with change of the background in an indoor
environment.
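A minimal sketch of this cache-based updating, reusing matches() from the sketch above; the per-codeword frame counters (first_match, last_match), the codeword layout and the two thresholds are illustrative assumptions, not values from the paper.

```python
import numpy as np

T_ADD, T_DELETE = 100, 200  # assumed frame-count thresholds for promotion and deletion

def new_codeword(pixel, frame_idx):
    """Create a fresh cache codeword from one pixel sample (illustrative layout)."""
    I = float(np.linalg.norm(np.asarray(pixel, float)))
    return {"v": np.asarray(pixel, float), "I_hat": I, "I_check": I,
            "first_match": frame_idx, "last_match": frame_idx}

def update_codebook(pixel, codebook, cache, frame_idx):
    """Promote stable cache codewords to the background codebook and
    delete background codewords that have stopped matching."""
    hit = next((cw for cw in cache if matches(pixel, cw)), None)
    if hit is None:
        cache.append(new_codeword(pixel, frame_idx))   # start tracking a candidate
    else:
        hit["last_match"] = frame_idx
        if frame_idx - hit["first_match"] > T_ADD:     # stable long enough:
            cache.remove(hit)                          # it is a new background codeword
            codebook.append(hit)
    # remove background codewords that have not matched for too long
    codebook[:] = [cw for cw in codebook
                   if frame_idx - cw["last_match"] <= T_DELETE]
```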
2) Post-processing: The result of the codebook background subtraction is not perfect and needs to be improved to obtain a more accurate human silhouette. As shown in the example in Fig. 3 (d) (the original background subtraction result), there are two types of problems: 1) many noise-like pixel regions (very small areas of less than 50 pixels, marked in blue); 2) ghost foreground regions produced by occasional movement of furniture (marked in yellow in Fig. 3 (d)), where the furniture at its new position can also be taken as foreground (marked in green in Fig. 3 (d)). Both problems deteriorate the result of the human body extraction, so certain post-processing is applied.
As proposed in [23], connected foreground pixels form a region termed a blob. By using the OpenCV blob library [24], we obtain blobs in a binary image format, and small blobs with a size of less than 50 pixels are removed. In this way, the noise-like regions are removed.
The background updating scheme can cope to some extent with the large ghosting errors caused by movement of furniture, and with furniture appearing at a new position, through absorption into the background model [22]. However, there are two problems if we rely solely on the background updating scheme: 1) it takes time for ghosting and furniture to be absorbed into the background model by background updating; 2) the background updating scheme will wrongly absorb a foreground human body into the background model if he/she is static for a long time. In order to solve these two problems, we use a novel three-step blob operation strategy as follows:
Step 1. Blob merging: If the distance between two blobs is less
than a threshold, these two blobs will be merged (as shown in Fig.
3 (d), the blobs B2 and B3 contain several separate blobs which are
near to each other). The distance between two blobs is defined as the
minimum 4-distance [23] between two rectangles which enclose the
blobs as given by:
$$Distance(B1, B2) = \min_{p1 \in R1, \, p2 \in R2} d_4(p1, p2) \qquad (4)$$
where B1 and B2 are two blobs, R1 and R2 are two rectangles
which enclose them, and p1 and p2 are points belonging to R1 and
R2. Fig. 4 shows examples of the distance between two blobs with
respect to their positions.
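For axis-aligned enclosing rectangles, the minimum 4-distance of equation (4) has a closed form, so no scan over point pairs is needed. A sketch follows, with the rectangle tuple layout (x, y, w, h) and the merge threshold as our own assumptions:

```python
def rect_distance_4(r1, r2):
    """Minimum 4-distance (|dx| + |dy|) between two axis-aligned rectangles,
    each given as (x, y, w, h); zero when they touch or overlap."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    dx = max(x1 - (x2 + w2), x2 - (x1 + w1), 0)  # horizontal gap
    dy = max(y1 - (y2 + h2), y2 - (y1 + h1), 0)  # vertical gap
    return dx + dy

MERGE_THRESHOLD = 20  # pixels; assumed value for the blob merging test
```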
Step 2. Active blob determination: If the number of blobs after
blob merging is more than one, it suggests some furniture has been
moved (and we assume that the elderly person lives alone so that
normally there should be only one human moving object). In this
case, we determine which blob is the moving blob by using the frame difference technique [23].

Fig. 3. The background subtraction and the human body blob determination: (a) background image; (b) image with human object; (c) frame difference result obtained from two consecutive frames; (d) original background subtraction result, in which there are three large blobs (B1, B2 and B3) after the blob merging operation, marked red, green and yellow, with blue marking the small noise-like blobs; (e) the final obtained human body blob.

Frame differencing is applied between
consecutive frames to obtain the moving pixels (shown in Fig. 3 (c)), and the blob with the greatest number of moving pixels is taken as the moving blob (human body blob). From Fig. 3, we can see that the blob B1 contains the most moving pixels, so B1 is finally taken as the human body blob.
Step 3. Selective updating: The non-active blobs are removed (as shown in Fig. 3 (e), B2 and B3 are removed from the final background subtraction result) and their pixel values form new codewords which are added to the background codebook immediately for background model updating. No updating is performed for the pixels in the active blob. In this way, ghosting and furniture at a new position are absorbed into the background model immediately, while the foreground human body is not absorbed into the background model even if he/she remains static for a long time.
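Step 2 can be sketched as follows, assuming a cleaned foreground mask, the previous and current grayscale frames, and at least one blob present; the function name and the differencing threshold are illustrative.

```python
import cv2
import numpy as np

DIFF_THRESHOLD = 25  # assumed intensity threshold for frame differencing

def select_active_blob(fg_mask, prev_gray, curr_gray):
    """Step 2: return a mask containing only the blob with the most moving pixels."""
    moving = cv2.absdiff(curr_gray, prev_gray) > DIFF_THRESHOLD
    n, labels = cv2.connectedComponents(fg_mask)
    best_label, best_count = 0, -1
    for i in range(1, n):                     # assumes at least one blob is present
        count = np.count_nonzero(moving & (labels == i))
        if count > best_count:
            best_label, best_count = i, count
    active = (labels == best_label).astype(np.uint8) * 255
    # Step 3 would then absorb the pixels of the non-active blobs
    # (fg_mask minus active) into the background codebook immediately.
    return active
```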
Fig. 4. Four cases of the distance between two blobs with respect to their
relative positions
3) Background model retraining: The trained background codebook model can be invalidated in various ways, such as by a dramatic global illumination change when the light is suddenly turned on. In this situation the codebook needs to be retrained, because the previous codebook is no longer valid. A dramatic global illumination change can be detected from the frame differencing results: if the percentage of active pixels in an image is larger than a threshold (we set 50%), we assume that a dramatic global illumination change has occurred and the background model is retrained.
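This trigger reduces to a single test on the frame difference mask; a sketch under the same assumed names as above:

```python
import numpy as np

RETRAIN_FRACTION = 0.5  # 50% of the pixels active, as set in the paper

def needs_retraining(moving_mask):
    """Detect a dramatic global illumination change from a boolean motion mask."""
    return np.count_nonzero(moving_mask) > RETRAIN_FRACTION * moving_mask.size
```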
Next, having extracted a silhouette representation of the human,
we consider feature extraction to describe the posture of the person.
B. Feature extraction
After human body region extraction, the next step is to extract useful features from the human body region. We extract two kinds of features: global features (which roughly describe the shape of the human body) and local features (which encapsulate the detailed information of the posture of the human body).
To obtain the global features, we apply ellipse fitting [25] to the binary image. The moments of a binary image $f(x, y)$ are given as:

$$m_{pq} = \sum_{x,y} x^p y^q f(x, y), \quad p, q = 0, 1, 2, 3, \ldots \qquad (5)$$
By using the zero and first order spatial moments, we can compute the center of the ellipse as $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$. The angle between the major axis of the person and the horizontal axis $x$ gives the orientation of the ellipse, and it is computed as:

$$\Theta = \frac{1}{2} \arctan\left(\frac{2u_{11}}{u_{20} - u_{02}}\right) \qquad (6)$$
where the central moments can be calculated as:

$$u_{pq} = \sum_{x,y} (x - \bar{x})^p (y - \bar{y})^q f(x, y), \quad p, q = 0, 1, 2, 3, \ldots \qquad (7)$$
The major semi-axis $a$ and the minor semi-axis $b$ can be obtained by calculating the greatest and least moments of inertia, denoted here as $I_{max}$ and $I_{min}$. They can be calculated by evaluating the eigenvalues of the covariance matrix:

$$J = \begin{pmatrix} u_{20} & u_{11} \\ u_{11} & u_{02} \end{pmatrix} \qquad (8)$$
These are calculated as:

$$I_{max} = \frac{u_{20} + u_{02} + \sqrt{(u_{20} - u_{02})^2 + 4u_{11}^2}}{2} \qquad (9)$$

$$I_{min} = \frac{u_{20} + u_{02} - \sqrt{(u_{20} - u_{02})^2 + 4u_{11}^2}}{2} \qquad (10)$$
Finally, according to [8], we can calculate the major semi-axis $a$ and the minor semi-axis $b$ as:

$$a = \left(\frac{4}{\pi}\right)^{1/4} \left[\frac{(I_{max})^3}{I_{min}}\right]^{1/8} \qquad (11)$$

$$b = \left(\frac{4}{\pi}\right)^{1/4} \left[\frac{(I_{min})^3}{I_{max}}\right]^{1/8} \qquad (12)$$
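Equations (5)-(12) map directly onto OpenCV's image moments; the sketch below computes the global features (the $4/\pi$ factor follows the standard moment-based ellipse-fitting formulas):

```python
import cv2
import numpy as np

def fit_ellipse(binary_mask):
    """Fit an ellipse to a binary silhouette via image moments, equations (5)-(12).
    Returns (cx, cy, theta, a, b): center, orientation and semi-axes."""
    m = cv2.moments(binary_mask, binaryImage=True)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]        # ellipse center
    u20, u02, u11 = m["mu20"], m["mu02"], m["mu11"]          # central moments
    theta = 0.5 * np.arctan2(2 * u11, u20 - u02)             # orientation, eq. (6)
    root = np.sqrt((u20 - u02) ** 2 + 4 * u11 ** 2)
    i_max = (u20 + u02 + root) / 2                           # greatest inertia, eq. (9)
    i_min = (u20 + u02 - root) / 2                           # least inertia, eq. (10)
    a = (4 / np.pi) ** 0.25 * (i_max ** 3 / i_min) ** 0.125  # major semi-axis, eq. (11)
    b = (4 / np.pi) ** 0.25 * (i_min ** 3 / i_max) ** 0.125  # minor semi-axis, eq. (12)
    return cx, cy, theta, a, b

# The two global features are then the orientation theta and the ratio a / b.
```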
An ellipse fitting result is depicted in Fig. 5, where we compare the ellipse fitting result with the rectangle fitting result used in [10]. The ellipse fitting is clearly better at describing the human posture in the presence of noise (such as the line underneath the person's feet due to poor segmentation, as shown in Fig. 5). After ellipse fitting, the orientation of the ellipse and the ratio between $a$ and $b$ are taken as the global features, which have been found experimentally to be sufficient to describe the overall posture of a human body.
Such global features are, however, insufficient to describe the
postures in detail, and sometimes it is hard to differentiate two
postures by using only the global information (such as a sit posture
and a sit-like bend posture). We need to use more information (local

References

V. N. Vapnik, Statistical Learning Theory. Wiley, 1998.
C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 1999.
C. R. Wren, A. Azarbayejani, T. Darrell and A. P. Pentland, "Pfinder: real-time tracking of the human body," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780-785, 1997.