Recognizing Workshop Activity Using Body Worn Microphones and Accelerometers

Abstract
The paper presents a technique to automatically track the progress of maintenance or assembly tasks using body worn sensors. The technique is based on a novel way of combining data from accelerometers with simple frequency matching sound classification. This includes the intensity analysis of signals from microphones at different body locations to correlate environmental sounds with user activity. To evaluate our method we apply it to activities in a wood shop. On a simulated assembly task our system can successfully segment and identify most shop activities in a continuous data stream with zero false positives and 84.4% accuracy.


Recognizing Workshop Activity Using Body Worn Microphones and
Accelerometers
P. Lukowicz, J. Ward, H. Junker, M. Stäger, G. Tröster
ETH - Swiss Federal Institute of Technology
Wearable Computing Laboratory
8092 Zürich, Switzerland
www.wearable.ethz.ch
A. Atrash and T. Starner
College of Computing
Georgia Institute of Technology
Atlanta, Georgia 30332-0280
{amin,thad}@cc.gatech.edu
Abstract
Most gesture recognition systems analyze gestures intended
for communication (e.g. sign language) or for command
(e.g. navigation in a virtual world). We attempt instead to
recognize gestures made in the course of performing every-
day work activities. Specifically, we examine activities in
a wood shop, both in isolation as well as in the context of
a simulated assembly task. We apply linear discriminant
analysis (LDA) and hidden Markov model (HMM) tech-
niques to features derived from body-worn accelerometers
and microphones. The resulting system can successfully
segment and identify most shop activities with zero false
positives and 83.5% accuracy.
1 Introduction
Advances in technology are allowing computer support for
mobile applications. Delivery, maintenance, and manufac-
turing personnel are adopting mobile computing devices to
support their work. Similarly, consumers now have access
to mobile electronic tourist guides, communication devices,
and health and wellness monitoring devices.
A key issue in most such mobile applications is the effort required to operate the devices. Whereas
in a desktop setting the computer is the focus of the user’s
attention, the user is forced to focus his attention on the
environment for many mobile applications. Accessing the
computer should require minimal cognitive and physical ef-
fort to prevent distracting the user from his primary task.
1.1 Context Sensitivity in Wearable Systems
In addressing the above issues, wearable computers have recently emerged as a promising new paradigm. To reduce the physical effort required to operate the device, they are designed to be a permanently accessible part of the user's outfit, with mostly hands-free input devices and head-up displays.
With respect to the cognitive load, many wearable sys-
tems focus on context sensitivity and proactiveness (e.g. [1]).
The system should be aware of the user's actions and the activities occurring in his environment. Based on this aware-
ness, the system can adapt its configuration, deliver infor-
mation to the user, or record interesting events without any
explicit user input [17]. For example, a maintenance sup-
port system could recognize what particular task is being
performed by the user and automatically display the rele-
vant manual pages on the system’s head-up display. The
wearable could also record the sequence of operations that
are being performed for later analysis or could warn the user
if an important step has been forgotten.
1.2 Recognition Approach
Past approaches by the authors have used head-mounted
cameras and computer vision techniques to identify user
context [17]. Although the visual signal contains much rel-
evant information about any given situation, vision–based
recognition has several disadvantages. For one, reliable lo-
calization and recognition of the relevant objects (hands,
machine parts, tools) in complex scenes is an open research
problem. In addition, computer vision techniques have difficulty with the unstructured, moving backgrounds and varying lighting conditions common to many wearable scenarios, and relevant parts of the scene might be out of view or obstructed. Finally, video recognition is computationally
intensive, often requiring resources not available on a wear-
able system.
A recognition approach gaining popularity in the wearable community is the use of simple sensors integrated into the user's outfit and into the user's artifacts (e.g. tools, appliances, or parts of the machinery) [10]. One of the key aspects of
this approach is the recognition and tracking of postures and
gestures using motion sensors attached to appropriate loca-
tions on the user’s limbs. Initial experiments have shown
that many activities can be well identified through such
analysis [8]. Another important source of information about

environmental activity is sound. It has been shown that in
many situations ambient sound analysis can be used to dis-
tinguish between different settings, activities, and situations
[4].
1.3 Paper Aims and Contributions
This paper is part of our work aiming to develop a reli-
able context recognition methodology based on the above
approach. It presents a novel way of combining motion
sensor-based gesture recognition with sound data from dis-
tributed microphones. In particular we exploit intensity dif-
ferences between microphones on the wrist of the dominant
hand and on the chest to identify relevant actions performed
by the user’s hand.
In the paper we focus on tracking user activity during as-
sembly or maintenance tasks. Such tasks are among the
most important applications of wearable computing (e.g.
[2, 7]) and could significantly benefit from context sensi-
tivity. At the same time these tasks are well structured and
limited to a reasonable number of often repetitive actions.
In addition, machines and tools typical to a workshop envi-
ronment generate distinct sounds. Therefore, these activi-
ties are well suited for a combination of gesture and sound–
based recognition.
This paper describes our approach and the results pro-
duced in an experiment performed on an assembly task in
a wood workshop. We demonstrate that simple sensors
placed on the user’s body can reliably select and recognize
user actions during a workshop procedure.
1.4 Related Work
Acceleration-based activity recognition has been studied by different research groups [11, 14, 19]. However, all of the above work focused on recognizing comparatively simple activities (walking, running, and sitting). Sound-based situation analysis has been investigated by Peltonen et al. and, in the wearables domain, by Clarkson and Pentland [12, 5].
Intelligent hearing aids have also exploited sound analysis
to improve their performance [3].
2 Experimental Setup
Performing initial experiments on live assembly or maintenance tasks is inadvisable due to cost and safety concerns and the difficulty of obtaining repeatable measurements under experimental conditions. As a consequence, we decided to focus on an "artificial" task performed at the workbench of our lab's wood workshop (see Figure 1). The task
consisted of assembling a simple object made of two pieces
of wood and a piece of metal. The task required 8 processing steps using different tools, and included walking and other gestures similar to an assembly task in a real-world setting.

Figure 1: Left: the wood workshop with 1) grinder, 2) drill, 3) file and saw, 4) vice, and 5) cabinet with drawers. Right: the sensor type and placement is identical to that used in our experiment: 1, 4: microphones; 2, 3, and 5: 3-axis acceleration sensors.
2.1 Procedure
No. Action
1 take the wood out of the drawer
2 put the wood into the vice
3 take out the saw
4 saw
5 put the saw into the drawer
6 take the wood out of the vice
7 drill
8 get the nail and the hammer
9 hammer
10 put away the hammer, get the driver and the screw
11 drive the screw in
12 put away the driver
13 pick up the metal
14 grind
15 put away the metal, pick up the wood
16 put the wood into the vice
17 take the file out of the drawer
18 file
19 put away the file, take the sandpaper
20 sand
21 take the wood out of the vice
Table 1: Steps of workshop assembly task.
The assembly sequence consists of sawing a piece of
wood, drilling a hole in it, grinding a piece of metal, at-
taching it to the piece of wood with a screw, hammering in
a nail to connect the two pieces of wood, and then finish-
ing the product by smoothing away rough edges with a file

and a piece of sandpaper. The wood was fixed in the vice
for sawing, filing, and smoothing (and removed whenever
necessary). The test subject moved between areas in the
workshop between steps. Also, whenever a tool or an ob-
ject (nail, screw, wood) was required, it was retrieved from
its drawer in the cabinet and returned after use.
The exact sequence of actions is listed in Table 1. The
task was to recognize all tool-based activities. Tool-based
activities exclude drawer manipulation, user locomotion,
and clapping (a calibration gesture). The experiment was
repeated 10 times in the same sequence to collect data for
training and testing. For practical reasons, the individual
processing steps were only executed long enough to obtain
an adequate sample of the activity. This policy did not re-
quire the complete execution of any one task (e.g. the wood
was not completely sawn), allowing us to complete the ex-
periment in a reasonable amount of time. However this pro-
tocol influenced only the duration of each activity and not
the manner in which it was performed.
2.2 Data Collection System
The data was collected using the ETH PadNET sensor network [8], equipped with 3-axis accelerometer nodes and two Sony mono microphones connected to a body-worn computer. The position of the sensors on the body is shown in Figure 1: an accelerometer node on each wrist and on the right upper arm, and a microphone on the chest and on the right wrist (the test subject was right-handed).
As can be seen in Figure 1, each PadNET sensor node consists of two modules. The main module incorporates an MSP430F149 low-power 16-bit mixed-signal microprocessor (MPU) from Texas Instruments running at a 6 MHz maximum clock speed. The current module version reads out up to three analog sensor signals, including amplification and filtering, and handles the communication between modules through dedicated I/O pins. The sensors themselves are hosted on an even smaller sensor module that can be either placed directly on the main module or connected through wires. In the experiment described in this
paper, the sensor modules were based on a 3-axis accelerometer package consisting of two ADXL202E devices from Analog Devices. The analog signals from the sensors were lowpass filtered and digitized with 12-bit resolution at a sampling rate of 100 Hz.
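As a rough illustration of this conditioning chain (our own sketch, not the PadNET firmware): the paper gives the 100 Hz sampling rate and 12-bit resolution, but the analog low-pass cutoff is not given here, so the digital filter below uses an assumed 25 Hz cutoff on synthetic data, with SciPy as an assumed choice of tooling.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 100.0      # sampling rate after digitization (from the paper)
CUTOFF_HZ = 25.0   # assumed illustrative cutoff; the paper's value is not given here

def condition_acceleration(raw: np.ndarray) -> np.ndarray:
    """Low-pass filter a (n_samples, 3) acceleration array, as a digital
    stand-in for the analog filtering stage before 12-bit digitization."""
    b, a = butter(N=4, Wn=CUTOFF_HZ / (FS_HZ / 2.0), btype="low")
    return filtfilt(b, a, raw, axis=0)

if __name__ == "__main__":
    t = np.arange(0, 5, 1.0 / FS_HZ)
    # Synthetic 3-axis signal: a 2 Hz "sawing" motion plus broadband noise.
    raw = np.stack([np.sin(2 * np.pi * 2 * t)] * 3, axis=1)
    raw += 0.3 * np.random.randn(len(t), 3)
    smooth = condition_acceleration(raw)
    print(smooth.shape)  # (500, 3)
```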
3 Recognition
3.1 Acceleration Data Analysis
Figure 2 shows a segment of the acceleration data collected
during the experiment. The segment includes sawing, re-
moving the wood from the vice, and drilling. The user ac-
cesses the drawer two times and walks between the vice and
the drill. Clear differences can be seen in the acceleration
signals. For example, sawing clearly reflects a periodic mo-
tion. By contrast, the drawer access (marked as 1a and 1b in
the figure) shows a low frequency “bump” in acceleration.
This bump corresponds to the 90 degree turns of the wrist
as the user releases the drawer handle, retrieves the object,
and grasps the handle again to close the drawer.
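As a toy illustration of this difference (our own sketch, not part of the paper's method; the 100 Hz rate comes from Section 2.2), the dominant frequency of a single wrist axis separates a periodic sawing motion from a slow drawer-access bump:

```python
import numpy as np

def dominant_frequency(axis_signal: np.ndarray, fs: float = 100.0) -> float:
    """Return the strongest non-DC frequency (Hz) in one accelerometer axis.
    A sawing segment shows a clear peak at the stroke rate, whereas a
    drawer-access 'bump' concentrates its energy at very low frequencies."""
    spectrum = np.abs(np.fft.rfft(axis_signal - axis_signal.mean()))
    freqs = np.fft.rfftfreq(len(axis_signal), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]
```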
Given the data, time series recognition techniques such
as hidden Markov models (HMMs) [13] should allow the
recognition of the relevant gestures. However, a closer anal-
ysis reveals two potential problems. First, not all relevant
activities are strictly constrained to a particular sequence of
motions. While the characteristic motions associated with
sawing or hammering are distinct, there is high variation in
drawer manipulation and grinding. Secondly, the activities
are separated by sequences of user motions unrelated to the
task (e.g. the user scratching his head). Such motions may
be confused with the relevant activities. We define a “noise”
class to handle these unrelated gestures.
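As a hedged sketch of how such HMM-based gesture classification might be set up (not the authors' implementation), the snippet below trains one Gaussian HMM per activity on accelerometer feature sequences and labels a new segment with the model that scores the highest log-likelihood. The hmmlearn dependency, the number of states, and the feature layout are all assumptions.

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency; any HMM toolkit would do

def train_activity_hmms(train_segments, n_states=5):
    """train_segments: dict mapping activity name -> list of (T_i, D) arrays
    of accelerometer features. Returns one fitted HMM per activity."""
    models = {}
    for activity, segments in train_segments.items():
        X = np.vstack(segments)               # concatenate the sequences
        lengths = [len(s) for s in segments]  # per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[activity] = m
    return models

def classify_segment(models, segment):
    """Label a (T, D) feature segment by the highest-likelihood model."""
    scores = {a: m.score(segment) for a, m in models.items()}
    return max(scores, key=scores.get)
```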
3.2 Sound Data Analysis
Considering that most gestures relevant to the assembly/maintenance scenario are associated with distinct sounds, sound analysis should help to address the problems described above. We distinguish between three different
types of sounds:
1. Sounds made by a hand tool: Such sounds are directly
correlated with user hand motion. Examples are saw-
ing, hammering, filing, and sanding. These actions are
generally repetitive, quasi–stationary sounds (i.e. rel-
atively constant over time - such that each time slice
on a sample would produce an identical spectrum over
a reasonable length of time). In addition these sounds
are much louder than the background noise (dominant)
and are likely to be much louder at the microphone
on the user’s hand than on his chest. For example,
the intensity curve for sanding (see Figure 2 top right)
reflects the periodic sanding motion with the minima
corresponding to the changes in direction and the max-
ima coinciding with the maximum sanding speed in the
middle of the motion. Since the user’s hand is directly
on the source of the sound the intensity difference is
large. For other activities it is smaller, however in most
cases still detectable.
2. Semi-autonomous sounds: These sounds are initiated
by user’s hand, possibly (but not necessarily) remain-
ing close to the source for most of the sound duration.
This class includes sound produced by a machine, such as the drill or grinder. Although ideally quasi-stationary, sounds in this class may not necessarily be
dominant and tend to have a less distinct intensity difference between the hand and the chest (for example, when a user moves their hand away from the machine during operation).

Figure 2: Left: example accelerometer data from sawing and drilling. Right top: audio profile of sanding from wrist and chest microphones. Right bottom: clustering of activities in LDA space.
3. Autonomous sounds: These are sounds generated by
activities not driven by the user's hands (e.g. loud back-
ground noises or the user speaking).
Obviously the vast majority of relevant actions in assembly
and maintenance are associated with hand tool sounds and semi-autonomous sounds. In principle, these sounds should
be easy to identify using intensity differences between the
wrist and the chest microphone. In addition, if extracted ap-
propriately, these sounds may be treated as quasi-stationary
and can be reliably classified using simple spectrum pattern
matching techniques.
The main problem with this approach is that many ir-
relevant actions are also likely to fall within the definition
of handtool and semi–autonomous sound. Such actions
include scratching or putting down an object. Thus, like
acceleration analysis, sound–based classification also has
problems distinguishing relevant from irrelevant actions and
will produce a number of false positives.
3.3 Recognition Methodology
Neither acceleration nor sound provide enough information
for perfect extraction and classification of all relevant activ-
ities; however, we hypothesize that their sources of error are
likely to be statistically distinct. Thus, we develop a tech-
nique based on the fusion of both methods. Our procedure
consists of three steps:
1. Extraction of the relevant data segments using the in-
tensity difference between the wrist and the chest mi-
crophone. We expect that this technique will segment
the data stream into individual actions (including many
actions we will model as noise).
2. Independent classification of the actions based on
sound or acceleration. This step will yield imperfect
recognition results by both the sound and acceleration
subsystems.
3. Removal of false positives. While the sound and acceleration subsystems are each imperfect, when their classifications of a segment agree, the result may be more reliable (if the sources of error are statistically distinct); a minimal sketch of this agreement rule is given below.
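As a minimal sketch of the agreement rule in step 3 (our own illustration with hypothetical label values, not the authors' code), a segment's label is kept only when the sound-based and acceleration-based classifiers agree; otherwise the segment is treated as noise.

```python
from typing import List, Optional

NOISE = None  # label used for rejected / irrelevant segments

def fuse_labels(sound_labels: List[str],
                accel_labels: List[str]) -> List[Optional[str]]:
    """Keep a segment's label only when both classifiers agree; otherwise
    mark it as noise. This is the false-positive removal step."""
    fused = []
    for s, a in zip(sound_labels, accel_labels):
        fused.append(s if s == a else NOISE)
    return fused

# Example: the third segment is rejected because the classifiers disagree.
print(fuse_labels(["saw", "drill", "hammer"], ["saw", "drill", "file"]))
# ['saw', 'drill', None]
```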
4 Isolated Activity Recognition
As an initial experiment, we segment the activities in the
data files by hand and test the accuracy of the sound and
acceleration methods separately.

4.1 Sound Recognition
4.1.1 Method
The basic classification scheme operates on individual sound segments of fixed length. The approach follows a three-step process: feature extraction, dimensionality reduction, and the actual classification.

The features used are the spectral components of each segment, obtained by a Fast Fourier Transform (FFT). This produces one high-dimensional feature vector per segment.
Rather than attempting to classify such large feature vectors directly, Linear Discriminant Analysis (LDA) [6] is employed to derive an optimal projection of the data into a smaller, (M-1)-dimensional feature space (where M is the number of classes). In the "recognition phase", the LDA transformation is applied to the data segment under test to produce the corresponding (M-1)-dimensional feature vector.
Using a labeled training set, class means are calculated in the (M-1)-dimensional space. Classification is performed simply by choosing the class mean which has the minimum Euclidean distance from the test feature vector (see Figure 2, bottom right).
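To make the pipeline concrete, here is a minimal sketch under assumed parameters: FFT magnitude spectra as features, scikit-learn's LinearDiscriminantAnalysis as a stand-in for the authors' own LDA implementation, and nearest-class-mean classification in the projected space. Window length, sampling rate, and variable names are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def spectral_features(window: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of one audio window (the FFT feature vector)."""
    return np.abs(np.fft.rfft(window))

def fit_lda_classifier(windows, labels):
    """windows: (n_windows, n_samples) array of raw audio windows.
    Projects FFT features into the (M-1)-dimensional LDA space and stores
    the per-class means."""
    feats = np.array([spectral_features(w) for w in windows])
    lda = LinearDiscriminantAnalysis()
    proj = lda.fit_transform(feats, labels)
    means = {c: proj[np.array(labels) == c].mean(axis=0)
             for c in set(labels)}
    return lda, means

def classify_window(lda, means, window):
    """Assign the class whose LDA-space mean is nearest (Euclidean)."""
    p = lda.transform(spectral_features(window)[None, :])[0]
    return min(means, key=lambda c: np.linalg.norm(p - means[c]))
```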
4.1.2 Intensity Analysis
Making use of the fact that signal intensity is inversely proportional to the square of the distance from its source, the ratio of the two intensities, $I_w/I_c$ (wrist over chest), is used as a measure of the distance of the source from the user. Assuming the sound source is at distance $d$ from the wrist microphone and $d_c$ from the chest, the ratio of the intensities will be proportional to the squared ratio of the distances, $I_w/I_c \propto d_c^2/d^2$.

When both microphones are separated by at least $a$, any sound produced at a distance $d$ (where $d \gg a$) from the user will bring this ratio close to one. Sounds produced near the chest microphone (e.g. the user speaking) will cause the ratio to approach zero, whereas any sounds close to the wrist mic will make this ratio large.
Sound extraction is performed by sliding a window over the resampled audio data. On each iteration, the signal energy within the window is calculated for each channel. For these windows, the difference between the intensity ratio $I_w/I_c$ and its reciprocal is obtained and compared to an empirically determined threshold.

This difference, $D = I_w/I_c - I_c/I_w$, provides a convenient metric for thresholding: zero indicates a far-off (or exactly equidistant) sound, while values above or below zero indicate a sound closer to the wrist or to the chest mic, respectively.
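A minimal sketch of this segmentation metric follows; the window length, threshold, and function names are assumed placeholders rather than the paper's values. Per-window energies for the wrist and chest channels are computed, and the difference between the energy ratio and its reciprocal is thresholded.

```python
import numpy as np

def intensity_difference(wrist: np.ndarray, chest: np.ndarray,
                         win: int = 256, eps: float = 1e-12) -> np.ndarray:
    """For each non-overlapping window, return D = Iw/Ic - Ic/Iw.
    D ~ 0: distant or equidistant sound; D >> 0: close to the wrist mic;
    D << 0: close to the chest mic. The window size is an assumed placeholder."""
    n = min(len(wrist), len(chest)) // win
    d = np.empty(n)
    for i in range(n):
        w = wrist[i * win:(i + 1) * win]
        c = chest[i * win:(i + 1) * win]
        iw, ic = np.sum(w ** 2) + eps, np.sum(c ** 2) + eps
        d[i] = iw / ic - ic / iw
    return d

def segment_hand_sounds(wrist, chest, threshold=2.0, win=256):
    """Indices of windows whose sound source is near the user's hand;
    the threshold is illustrative and would be chosen empirically."""
    return np.nonzero(intensity_difference(wrist, chest, win) > threshold)[0]
```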
Sound LDA IA+LDA maj(IA+LDA)
Hammer 96.79 98.85 100
Saw 92.71 92.98 100
Filing 69.68 81.43 100
Drilling 99.59 99.35 100
Sanding 93.66 92.87 100
Grinding 97.77 97.75 100
Screwing 91.17 93.29 100
Vice 80.10 81.14 100
Overall 90.18 92.21 100
Table 2: Isolated Recognition Accuracy Per Sound (in %)
for LDA alone, LDA with IA preselection, and majority decision.
4.2 Results
In order to analyze the performance of the LDA on isolated
classes, individual examples of each class were partitioned
from each of the 10 experiments, providing 10 examples
of each class. Eight examples of each class were used for
training while testing on the remaining two examples.
Earlier work [15] cited a sampling rate of 5 kHz and a window length of 0.05 seconds (256 points) as optimal parameters for general purpose sound recognition tasks. In this task, it was found that recognition rates were improved using a larger window of 0.1 seconds; at the same time the sampling rate could be reduced to 2 kHz without any notable adverse effects.
With these parameters, a sliding-window LDA classification was run directly over all the class-partitioned samples. This process returned an overall recognition rate of
90.19%. The individual class results are given in the first
column of Table 2. We next used intensity analysis to select
only the samples over a given threshold to pass to the LDA
procedure. This technique resulted in a slightly higher accu-
racy of 92.21% as shown in the second column of Table 2.
The third column of Table 2 shows a variation of this tech-
nique where we slide a window over the data and classify
the data at each window segment. A majority decision over
the window segments was used to determine the overall la-
bel for a given isolated activity. This technique resulted in
100% recognition over the test data.
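As an illustration of that majority decision (our sketch with hypothetical per-window labels, not the authors' code), the per-window classifications of one isolated activity are collapsed into a single label:

```python
from collections import Counter

def majority_label(window_labels):
    """Collapse per-window classifications of one isolated activity into a
    single label by majority vote."""
    return Counter(window_labels).most_common(1)[0][0]

# Example with hypothetical per-window decisions for one sawing segment.
print(majority_label(["saw", "saw", "file", "saw", "saw"]))  # 'saw'
```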
Figure 3: HMM topologies.

References

Rabiner, L. R., Juang, B. H.: An introduction to hidden Markov models. IEEE ASSP Magazine 3(1), 4-16 (1986).

Starner, T., Weaver, J., Pentland, A.: Real-time American sign language recognition using desk and wearable computer based video. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(12), 1371-1375 (1998).

Mäntyjärvi, J., Himberg, J., Seppänen, T.: Recognizing human motion with multiple acceleration sensors. In: Proc. IEEE International Conference on Systems, Man, and Cybernetics (2001).