Deep Learning for Electromyographic Hand Gesture
Signal Classification Using Transfer Learning
Ulysse Côté-Allard, Cheikh Latyr Fall, Alexandre Drouin,
Alexandre Campeau-Lecours, Clément Gosselin, Kyrre Glette, François Laviolette, and Benoit Gosselin
Abstract—In recent years, deep learning algorithms have
become increasingly more prominent for their unparalleled
ability to automatically learn discriminant features from large
amounts of data. However, within the field of electromyography-
based gesture recognition, deep learning algorithms are seldom
employed as they require an unreasonable amount of effort from
a single person, to generate tens of thousands of examples.
This work’s hypothesis is that general, informative features can
be learned from the large amounts of data generated by aggre-
gating the signals of multiple users, thus reducing the recording
burden while enhancing gesture recognition. Consequently, this
paper proposes applying transfer learning on aggregated data
from multiple users, while leveraging the capacity of deep learn-
ing algorithms to learn discriminant features from large datasets.
Two datasets comprised of 19 and 17 able-bodied participants
respectively (the first one is employed for pre-training) were
recorded for this work, using the Myo Armband. A third Myo
Armband dataset was taken from the NinaPro database and
is comprised of 10 able-bodied participants. Three different
deep learning networks employing three different modalities
as input (raw EMG, Spectrograms and Continuous Wavelet
Transform (CWT)) are tested on the second and third dataset.
The proposed transfer learning scheme is shown to systematically
and significantly enhance the performance for all three networks
on the two datasets, achieving an offline accuracy of 98.31%
for 7 gestures over 17 participants for the CWT-based ConvNet
and 68.98% for 18 gestures over 10 participants for the raw
EMG-based ConvNet. Finally, a use-case study employing eight
able-bodied participants suggests that real-time feedback allows
users to adapt their muscle activation strategy which reduces the
degradation in accuracy normally experienced over time.
Index Terms—Surface Electromyography, EMG, Transfer
Learning, Domain Adaptation, Deep Learning, Convolutional
Networks, Hand Gesture Recognition
I. INTRODUCTION
Robotics and artificial intelligence can be leveraged to
increase the autonomy of people living with disabilities. This is
accomplished, in part, by enabling users to seamlessly interact
with robots to complete their daily tasks with increased inde-
pendence. In the context of hand prosthetic control, muscle
activity provides an intuitive interface on which to perform
hand gesture recognition [1]. This activity can be recorded by
surface electromyography (sEMG), a non-invasive technique
Ulysse Côté-Allard*, Cheikh Latyr Fall and Benoit Gosselin are with the Department of Computer and Electrical Engineering, Alexandre Drouin and François Laviolette are with the Department of Computer Science and Software Engineering, Alexandre Campeau-Lecours and Clément Gosselin are with the Department of Mechanical Engineering, Université Laval, Québec, Québec, Canada. Kyrre Glette is with RITMO and the Department of Informatics, University of Oslo, Oslo, Norway.
*Contact author email: ulysse.cote-allard.1@ulaval.ca
These authors share senior authorship.
widely adopted both in research and clinical settings. The
sEMG signals, which are non-stationary, represent the sum
of subcutaneous motor action potentials generated through
muscular contraction [1]. Artificial intelligence can then be
leveraged as the bridge between sEMG signals and the pros-
thetic behavior.
The literature on sEMG-based gesture recognition primarily
focuses on feature engineering, with the goal of characterizing
sEMG signals in a discriminative way [1], [2], [3]. Recently,
researchers have proposed deep learning approaches [4], [5],
[6], shifting the paradigm from feature engineering to feature
learning. Regardless of the method employed, the end-goal
remains the improvement of the classifier’s robustness. One
of the main factors for accurate predictions, especially when
working with deep learning algorithms, is the amount of
training data available. Hand gesture recognition creates a
peculiar context where a single user cannot realistically be
expected to generate tens of thousands of examples in a
single sitting. Large amounts of data can however be obtained
by aggregating the recordings of multiple participants, thus
fostering the conditions necessary to learn a general mapping
of users’ sEMG signal. This mapping might then facilitate
the hand gestures’ discrimination task with new subjects.
Consequently, deep learning offers a particularly attractive
context from which to develop a Transfer Learning (TL)
algorithm to leverage inter-user data by pre-training a model
on multiple subjects before training it on a new participant.
As such, the main contribution of this work is to present a
new TL scheme employing a convolutional network (ConvNet)
to leverage inter-user data within the context of sEMG-
based gesture recognition. A previous work [7] has already
shown that learning simultaneously from multiple subjects
significantly enhances the ConvNet’s performance whilst re-
ducing the size of the required training dataset typically seen
with deep learning algorithms. This paper expands upon the
aforementioned conference paper’s work, improving the TL
algorithm to reduce its computational load and improving
its performance. Additionally, three new ConvNet architec-
tures, employing three different input modalities, specifically
designed for the robust and efficient classification of sEMG
signals are presented. The raw signal, short-time Fourier
transform-based spectrogram and Continuous Wavelet Trans-
form (CWT) are considered for the characterization of the
sEMG signals to be fed to these ConvNets. To the best of
the authors’ knowledge, this is the first time that CWTs are
employed as features for the classification of sEMG-based
hand gesture recognition (although they have been proposed

for the analysis of myoelectric signals [8]). Another major
contribution of this article is the publication of a new sEMG-
based gesture classification dataset comprised of 36 able-
bodied participants. This dataset and the implementation of
the ConvNets along with their TL augmented version are
made readily available¹. Finally, this paper further expands
the aforementioned conference paper by proposing a use-case
experiment on the effect of real-time feedback on the online
performance of a classifier without recalibration over a period
of fourteen days. Note that, due to the stochastic nature of the
algorithms presented in this paper, unless stated otherwise, all
experiments are reported as an average of 20 runs.
This paper is organized as follows. An overview of the
related work in hand gesture recognition through deep learning
and transfer learning/domain adaptation is given in Sec. II.
Sec. III presents the proposed new hand gesture recognition
dataset, with data acquisition and processing details alongside
an overview of the NinaPro DB5 dataset. A presentation
of the different state-of-the-art feature sets employed in this
work is given in Sec. IV. Sec. V thoroughly describes the
proposed networks’ architectures, while Sec. VI presents the
TL algorithm used to augment said architecture. Moreover,
comparisons with the state-of-the-art in gesture recognition
are given in Sec. VII. A real-time use-case experiment on the
ability of users to counteract signal drift from sEMG signals is
presented in Sec. VIII. Finally, results are discussed in Sec. IX.
II. RELATED WORK
sEMG signals can vary significantly between subjects,
even when precisely controlling for electrode placement [9].
Regardless, classifiers trained on one user can be applied to new participants, achieving slightly better than random performance [9], and high accuracy (85% over 6 gestures) when augmented with TL on never-before-seen subjects [10].
As such, sophisticated techniques have been proposed to
leverage inter-user information. For example, research has
been done to find a projection of the feature space that bridges
the gap between an original subject and a new user [11],
[12]. Several works have also proposed leveraging a pre-
trained model removing the need to simultaneously work with
data from multiple users [13], [14], [15]. These non-deep
learning TL approaches showed important performance gains
compared to their non-augmented versions, although some of these gains might be due to the baseline's poorly optimized hyperparameters [16].
Short-Time Fourier Transforms (STFTs) have been sparsely employed in recent decades for the classification of sEMG
data [17], [18]. A possible reason for this limited interest in
STFT is that much of the research on sEMG-based gesture
recognition focuses on designing feature ensembles [2]. Because STFTs on their own generate large numbers of features and are relatively computationally expensive, they can be challenging to integrate with other feature types. Addition-
ally, STFTs have also been shown to be less accurate than
Wavelet Transforms [17] on their own for the classification of
sEMG data. Recently however, STFT features, in the form of
¹https://github.com/Giguelingueling/MyoArmbandDataset
spectrograms, have been applied as input feature space for the
classification of sEMG data by leveraging ConvNets [4], [6].
CWT features have been employed for electrocardiogram
analysis [19], electroencephalography [20] and EMG signal
analysis, but mainly for lower limbs [21], [22]. Wavelet-
based features have been used in the past for sEMG-based
hand gesture recognition [23]. The features employed however,
are based on the Discrete Wavelet Transform (DWT) [24] and the
Wavelet Packet Transform (WPT) [17] instead of the CWT.
This preference might be due to the fact that both DWT
and WPT are less computationally expensive than the CWT
and are thus better suited to be integrated into an ensemble
of features. Similarly to spectrograms however, CWT offers
an attractive image-like representation to leverage ConvNets
for sEMG signal classification and can now be efficiently
implemented on embedded systems (see Appendix C). To the
best of the authors’ knowledge, this is the first time that CWT
is utilized for sEMG-based hand gesture recognition.
Recently, ConvNets have started to be employed for hand
gesture recognition using single array [4], [5] and matrix [25]
of electrodes. Additionally, other authors applied deep learning
in conjunction with domain adaptation techniques [6] but
for inter-session classification as opposed to the inter-subject
context of this paper. A thorough overview of deep learning
techniques applied to EMG classification is given in [26]. To
the best of our knowledge, this paper, which is an extension
of [7], is the first time inter-user data is leveraged through TL
for training deep learning algorithms on sEMG data.
III. SEMG DATASETS
A. Myo Dataset
One of the major contributions of this article is to provide a
new, publicly available, sEMG-based hand gesture recognition
dataset, referred to as the Myo Dataset. This dataset contains
two distinct sub-datasets with the first one serving as the pre-
training dataset and the second as the evaluation dataset. The
former, which is comprised of 19 able-bodied participants,
should be employed to build, validate and optimize classi-
fication techniques. The latter, comprised of 17 able-bodied
participants, is utilized only for the final testing. To the best
of our knowledge, this is the largest dataset published utilizing
the commercially available Myo Armband (Thalmic Labs) and
it is our hope that it will become a useful tool for the sEMG-
based hand gesture classification community.
The data acquisition protocol was approved by the Comités d'Éthique de la Recherche avec des êtres humains de l'Université Laval (approbation number: 2017-026/21-02-2016) and informed consent was obtained from all participants.
1) sEMG Recording Hardware: The electromyographic
activity of each subject’s forearm was recorded with the
Myo Armband: an 8-channel, dry-electrode, low-sampling-rate (200Hz), low-cost, consumer-grade sEMG armband.
The Myo is non-intrusive, as the dry-electrodes allow
users to simply slip the bracelet on without any preparation.
Comparatively, gel-based electrodes require the shaving and
washing of the skin to obtain optimal contact between the
subject’s skin and electrodes. Unfortunately, the convenience

of the Myo Armband comes with limitations regarding the
quality and quantity of the sEMG signals that are collected.
Indeed, dry electrodes, such as the ones employed in the
Myo, are less accurate and robust to motion artifact than
gel-based ones [27]. Additionally, while the recommended
frequency range of sEMG signals is 5-500Hz [28], requiring a sampling frequency greater than or equal to 1000Hz, the Myo Armband is limited to 200Hz. This information loss was shown
to significantly impact the ability of various classifiers to
differentiate between hand gestures [29]. As such, robust and
adequate classification techniques are needed to process the
collected signals accurately.
2) Time-Window Length: For real-time control in a closed
loop, input latency is an important factor to consider. A
maximum latency of 300ms was first recommended in [30].
Even though more recent studies suggest that the latency
should optimally be kept between 100-250ms [31], [32],
the performance of the classifier should take priority over
speed [31], [33]. As is the case in [7], a window size of
260ms was selected to achieve a reasonable number of samples
between each prediction due to the low frequency of the Myo.
3) Labeled Data Acquisition Protocol: The seven
hand/wrist gestures considered in this work are depicted in
Fig. 1. For both sub-datasets, the labeled data was created
by requiring the user to hold each gesture for five seconds.
The data recording was manually started by a researcher
only once the participant correctly held the requested gesture.
Generally, five seconds was given to the user between each
gesture. This rest period was not recorded and as a result,
the final dataset is balanced for all classes. The recording of
the full seven gestures for five seconds is referred to as a
cycle, with four cycles forming a round. In the case of the
pre-training dataset, a single round is available per subject.
For the evaluation dataset, three rounds are available, with the first round utilized for training (i.e. 140s per participant) and the last two for testing (i.e. 280s per participant).
Fig. 1. The 7 hand/wrist gestures considered in the Myo Dataset.
During recording, participants were instructed to stand up
and have their forearm parallel to the floor and supported by
themselves. For each of them, the armband was systematically
tightened to its maximum and slid up the user’s forearm, until
the circumference of the armband matched that of the forearm.
This was done in an effort to reduce bias from the researchers,
and to emulate the wide variety of armband positions that end-
users without prior knowledge of optimal electrode placement
might use (see Fig. 2). While the electrode placement was not
controlled for, the orientation of the armband was always such
that the blue light bar on the Myo was facing towards the hand
of the subject. Note that this is the case for both left and right
handed subjects. The raw sEMG data of the Myo is what is
made available with this dataset.
Fig. 2. Examples of the range of armband placements on the subjects’ forearm
Signal processing must be applied to efficiently train a
classifier on the data recorded by the Myo armband. The data
is first separated by applying sliding windows of 52 samples (260ms) with an overlap of 235ms (i.e. 7x190 windows for one cycle (5s of data per gesture)). Employing windows of 260ms allows
40ms for the pre-processing and classification process, while
still staying within the 300ms target [30]. Note that utilizing
sliding windows is viewed as a form of data augmentation in
the present context (see Appendix B). This is done for each
gesture in each cycle on each of the eight channels. As such,
in the dataset, an example corresponds to the eight windows
associated with their respective eight channels. From there, the
processing depends on the classification techniques employed
which will be detailed in Sec. IV and V.
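The segmentation above can be sketched as follows. At the Myo's 200 Hz sampling rate, a 52-sample window spans 260 ms, and a 235 ms overlap corresponds to a stride of 5 samples; the function name and the random toy data are illustrative, not part of the released dataset code.

```python
import numpy as np

def sliding_windows(emg, window=52, stride=5):
    """Segment a (channels, samples) sEMG recording into overlapping windows.

    At 200 Hz, 52 samples span 260 ms and a 5-sample stride reproduces
    the 235 ms overlap described in the text.
    """
    n_channels, n_samples = emg.shape
    starts = range(0, n_samples - window + 1, stride)
    # Each example keeps all eight channels for one 260 ms slice.
    return np.stack([emg[:, s:s + window] for s in starts])

# One 5 s recording of one gesture: 8 channels x 1000 samples.
cycle = np.random.randn(8, 1000)
examples = sliding_windows(cycle)
print(examples.shape)  # (190, 8, 52) -> the 190 windows per gesture per cycle
```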
B. NinaPro DB5
The NinaPro DB5 is a dataset built to benchmark sEMG-
based gesture recognition algorithms [34]. This dataset, which
was recorded with the Myo Armband, contains data from
10 able-bodied participants performing a total of 53 different
movements (including neutral) divided into three exercise sets.
The second exercise set, which contains 17 gestures + neutral
gesture, is of particular interest, as it includes all the gestures
considered so far in this work. The 11 additional gestures
which are presented in [35] include wrist pronation, wrist
supination and diverse finger extension amongst others. While
this particular dataset was recorded with two Myo Armbands, only the lower armband is considered, so as to allow a direct comparison with the preceding dataset.
1) Data Acquisition and Processing: Each participant was
asked to hold a gesture for five seconds followed by three
seconds of neutral gesture and to repeat this action five more
times (total of six repetitions). This procedure was repeated
for all the movements contained within the dataset. The first
four repetitions serve as the training set (20s per gesture) and
the last two (10s per gesture) as the test set for each gesture.
Note that the rest movement (i.e. neutral gesture) was treated
identically as the other gestures (i.e. first four repetitions for
training (12s) and the next two for testing (6s)).
All data processing (e.g. window size, window overlap) is exactly as described in the previous sections.
IV. CLASSIC SEMG CLASSIFICATION
Traditionally, one of the most researched aspects of sEMG-
based gesture recognition comes from feature engineering

(i.e. manually finding a representation for sEMG signals that
allows easy differentiation between gestures). Over the years,
several efficient combinations of features both in the time and
frequency domain have been proposed [36], [37], [38], [39].
This section presents the feature sets used in this work. See
Appendix D for a description of each feature.
A. Feature Sets
As this paper’s main purpose is to present a deep learning-
based TL approach to the problem of sEMG hand gesture
recognition, contextualizing the performance of the proposed
algorithms within the current state-of-the-art is essential. As
such, four different feature sets were taken from the litera-
ture to serve as a comparison basis. The four feature sets
will be tested on five of the most common classifiers employed for sEMG pattern recognition: Support Vector Machine
(SVM) [38], Artificial Neural Networks (ANN) [40], Ran-
dom Forest (RF) [38], K-Nearest Neighbors (KNN) [38] and
Linear Discriminant Analysis (LDA) [39]. Hyperparameters
for each classifier were selected by employing three fold
cross-validation alongside random search, testing 50 different
combinations of hyperparameters for each participant’s dataset
for each classifier. The hyperparameters considered for each
classifier are presented in Appendix E.
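The selection procedure above (random search over 50 hyperparameter draws with three-fold cross-validation) can be sketched with scikit-learn; the feature matrix and the search space below are illustrative placeholders, not the spaces listed in Appendix E.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical feature matrix: one row of engineered features per window.
X = np.random.randn(200, 32)
y = np.random.randint(0, 7, size=200)  # 7 gesture labels

# 50 random hyperparameter combinations, each scored by 3-fold CV,
# mirroring the selection procedure described in the text.
search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={
        "n_estimators": randint(10, 50),
        "max_depth": randint(2, 20),
    },
    n_iter=50,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

In the paper this search is repeated per participant and per classifier, so the selected hyperparameters can differ across subjects.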
As is often the case, dimensionality reduction is applied [1],
[3], [41]. LDA was chosen to perform feature projection as
it is computationally inexpensive, devoid of hyperparameters
and was shown to allow for robust classification accuracy
for sEMG-based gesture recognition [39], [42]. A comparison
of the accuracy obtained with and without dimensionality
reduction on the Myo Dataset is given in Appendix F. This
comparison shows that in the vast majority of cases, the
dimensionality reduction both reduced the computational load
and enhanced the average performances of the feature sets.
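LDA-based feature projection, as used here, is hyperparameter-free: it projects onto at most (number of classes - 1) discriminant axes. A minimal sketch with scikit-learn, using hypothetical feature dimensions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: 200 windows x 32 engineered features, 7 gestures.
X = np.random.randn(200, 32)
y = np.random.randint(0, 7, size=200)

# LDA projects onto at most (n_classes - 1) = 6 discriminant axes;
# no hyperparameters need to be tuned for this projection.
lda = LinearDiscriminantAnalysis()
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (200, 6)
```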
The implementation employed for all the classifiers comes
from the scikit-learn (v.1.13.1) Python package [43]. The four
feature sets employed for comparison purposes are:
1) Time Domain Features (TD) [37]: This set of features,
which is probably the most commonly employed in the litera-
ture [29], often serves as the basis for bigger feature sets [1],
[39], [34]. As such, TD is particularly well suited to serve as
a baseline comparison for new classification techniques. The
four features are: Mean Absolute Value (MAV), Zero Crossing
(ZC), Slope Sign Changes (SSC) and Waveform Length (WL).
2) Enhanced TD [39]: This set of features includes the TD
features in combination with Skewness, Root Mean Square
(RMS), Integrated EMG (IEMG), Autoregression Coefficients
(AR) (P=11) and the Hjorth Parameters. It was shown to
achieve excellent performances on a setup similar to the one
employed in this article.
3) Nina Pro Features [38], [34]: This set of features was
selected as it was found to perform the best in the article
introducing the NinaPro dataset. The set consists of the
following features: RMS, Marginal Discrete Wavelet Trans-
form (mDWT) (wavelet=db7, S=3), EMG Histogram (HIST)
(bins=20, threshold=3σ) and the TD features.
4) SampEn Pipeline [36]: This last feature combination
was selected among fifty features that were evaluated and
ranked to find the most discriminating ones. The SampEn
feature was ranked first amongst all the others. The best multi-
features set found was composed of: SampEn(m=2, r=0.2σ),
Cepstral Coefficient (order=4), RMS and WL.
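As a concrete illustration of the TD baseline (feature set 1), the four Hudgins features can be computed per channel as below; this is a minimal sketch that omits the noise thresholds usually applied to the zero-crossing and slope-sign-change counts.

```python
import numpy as np

def td_features(window):
    """The four time-domain (TD) features for one channel's window.
    Noise thresholds for ZC and SSC are omitted here for brevity."""
    diff = np.diff(window)
    mav = np.mean(np.abs(window))                 # Mean Absolute Value
    zc = np.sum(np.diff(np.sign(window)) != 0)    # Zero Crossings
    ssc = np.sum(np.diff(np.sign(diff)) != 0)     # Slope Sign Changes
    wl = np.sum(np.abs(diff))                     # Waveform Length
    return np.array([mav, zc, ssc, wl])

window = np.sin(np.linspace(0, 4 * np.pi, 52))    # toy 52-sample signal
feats = td_features(window)
print(feats)
```

Concatenating these four values across the armband's eight channels yields a 32-dimensional feature vector per window.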
V. DEEP LEARNING CLASSIFIERS OVERVIEW
ConvNets tend to be computationally expensive and thus
ill-suited for embedded systems, such as those required when
guiding a prosthetic. However, in recent years, algorithmic
improvements and new hardware architectures have allowed
for complex networks to run on very low power systems
(see Appendix C). As previously mentioned, the inherent
limitations of sEMG-based gesture recognition force the pro-
posed ConvNets to contend with a limited amount of data
from any single individual. To address the over-fitting issue,
Monte Carlo Dropout (MC Dropout) [44], Batch Normaliza-
tion (BN) [45], and early stopping are employed.
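MC Dropout differs from standard dropout in that the stochastic masks are kept active at inference time and predictions are averaged over several forward passes. A NumPy sketch with a toy two-layer classifier (the network, shapes, and weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-layer classifier: flattened 8x52 window -> 16 hidden -> 7 gestures.
W1 = rng.standard_normal((16, 8 * 52))
W2 = rng.standard_normal((7, 16))

def forward(x, drop_rate=0.5):
    """One stochastic forward pass with dropout left ON (MC Dropout)."""
    h = np.maximum(W1 @ x, 0.0)
    mask = rng.random(h.shape) > drop_rate  # fresh dropout mask each pass
    h = h * mask / (1.0 - drop_rate)        # inverted dropout scaling
    logits = W2 @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # softmax probabilities

def mc_dropout_predict(x, n_samples=20):
    """Average the class probabilities over stochastic forward passes."""
    return np.mean([forward(x) for _ in range(n_samples)], axis=0)

x = rng.standard_normal(8 * 52)  # one flattened sEMG window
p = mc_dropout_predict(x)
print(p.shape)  # (7,), a valid probability distribution over gestures
```

The spread of the individual passes can additionally serve as an uncertainty estimate, which is the original motivation for MC Dropout [44].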
A. Batch Normalization
BN is a technique that accelerates training and provides
some form of regularization with the aims of maintaining a
standard distribution of hidden layer activation values through-
out training [45]. BN accomplishes this by normalizing the
mean and variance of each dimension of a batch of examples.
To achieve this, a linear transformation based on two learned
parameters is applied to each dimension. This process is done
independently for each layer of the network. Once training is
completed, the whole dataset is fed through the network one
last time to compute the final normalization parameters in a
layer-wise fashion. At test time, these parameters are applied
to normalize the layer activations. BN was shown to yield
faster training times whilst allowing better generalization.
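The training-time normalization step described above can be written out directly; gamma and beta stand for the two learned affine parameters, and at test time the batch statistics are replaced by the dataset-wide statistics computed after training.

```python
import numpy as np

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each dimension of a batch to zero mean and unit variance,
    then apply the learned affine transform (gamma, beta)."""
    mean = batch.mean(axis=0)            # per-dimension batch mean
    var = batch.var(axis=0)              # per-dimension batch variance
    normalized = (batch - mean) / np.sqrt(var + eps)
    return gamma * normalized + beta

# A batch of 128 hidden-layer activations of width 64, far from normalized.
batch = np.random.randn(128, 64) * 3.0 + 5.0
out = batch_norm(batch)
print(out.mean(), out.std())  # approximately 0 and 1
```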
B. Proposed Convolutional Network Architectures
Videos are a representation of how spatial information
(images) change through time. Previous works have combined
this representation with ConvNets to address classification
tasks [46], [47]. One such successful algorithm is the slow-
fusion model [47] (see Fig. 3).
Fig. 3. Typical slow-fusion ConvNet architecture [47]. In this graph, the input
(represented by grey rectangles) is a video (i.e. a sequence of images). The
model separates the temporal part of the examples into disconnected parallel
layers, which are then slowly fused together throughout the network.
When calculating the spectrogram of a signal, the informa-
tion is structured in a Time x Frequency fashion (Time x Scale
for CWT). When the signal comes from an array of electrodes,

5
these examples can naturally be structured as Time x Spa-
tial x Frequency (Time x Spatial x Scale for CWT). As such,
the motivation for using a slow-fusion architecture based Con-
vNet in this work is due to the similarities between videos data
and the proposed characterization of sEMG signals, as both
representations have analogous structures (i.e. Time x Spa-
tial x Spatial for videos) and can describe non-stationary
information. Additionally, the proposed architectures inspired
by the slow-fusion model were by far the most successful of
the ones tried on the pre-training dataset.
1) ConvNet for Spectrograms: The spectrograms, which are
fed to the ConvNet, were calculated with Hann windows of
length 28 and an overlap of 20 yielding a matrix of 4x15. The
first frequency band was removed in an effort to reduce base-
line drift and motion artifact. As the armband features eight
channels, eight such spectrograms were calculated, yielding a
final matrix of 4x8x14 (Time x Channel x Frequency).
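The spectrogram computation above can be reproduced with SciPy; the default Hann-window spectrogram settings of `scipy.signal.spectrogram` are assumed here to stand in for the authors' exact implementation, but the window length (28), overlap (20), and resulting 4x8x14 shape follow the text.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 200                      # Myo sampling rate (Hz)
emg = np.random.randn(8, 52)  # one example: 8 channels x 260 ms

# Hann windows of length 28 with an overlap of 20 give 4 time bins and
# 15 frequency bins per channel; dropping the first frequency band
# (baseline drift / motion artifact) leaves 14.
specs = []
for channel in emg:
    f, t, Sxx = spectrogram(channel, fs=fs, window="hann",
                            nperseg=28, noverlap=20)
    specs.append(Sxx[1:, :])          # remove the first frequency band
example = np.stack(specs, axis=1).T   # -> Time x Channel x Frequency
print(example.shape)                  # (4, 8, 14)
```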
The implementation of the spectrogram ConvNet architec-
ture (see Appendix A, Fig. 8) was created with Theano [48]
and Lasagne [49]. As usual in deep learning, the architecture
was created in a trial and error process taking inspiration from
previous architectures (primarily [4], [6], [47], [7]). The non-
linear activation functions employed are the parametric expo-
nential linear unit (PELU) [50] and PReLU [51]. ADAM [52]
is utilized for the optimization of the ConvNet (learning
rate=0.00681292). The deactivation rate for MC Dropout is
set at 0.5 and the batch size at 128. Finally, to further reduce
overfitting, early stopping is employed by randomly removing
10% of the data from the training and using it as a validation
set at the beginning of the optimization process. Note that
learning rate annealing is applied with a factor of 5 when
the validation loss stops improving. The training stops when two consecutive decays occur with no improvement of the network's performance on the validation set. All hyperparameter values
were found by a random search on the pre-training dataset.
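The stopping rule described above (anneal the learning rate by a factor of 5 when validation loss stalls; stop after two consecutive decays without improvement) can be sketched as a training loop; `train_epoch` and `validate` are hypothetical stand-ins for one training pass and one validation pass.

```python
def train_with_annealing(train_epoch, validate, lr=0.00681292, factor=5.0,
                         max_decays=2):
    """Early stopping with learning-rate annealing, as described in the
    text: divide lr by `factor` whenever validation loss stops improving,
    and stop once `max_decays` consecutive decays bring no improvement."""
    best_loss = float("inf")
    consecutive_decays = 0
    while consecutive_decays < max_decays:
        train_epoch(lr)
        loss = validate()
        if loss < best_loss:
            best_loss = loss
            consecutive_decays = 0   # an improvement resets the decay count
        else:
            lr /= factor             # annealing with a factor of 5
            consecutive_decays += 1
    return best_loss, lr

# Toy run with simulated validation losses: improves twice, then stalls.
losses = iter([1.0, 0.8, 0.9, 0.95])
best, final_lr = train_with_annealing(lambda lr: None, lambda: next(losses))
print(best, final_lr)
```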
2) ConvNet for Continuous Wavelet Transforms: The archi-
tecture for the CWT ConvNet, (Appendix A, Fig. 9), was built
in a similar fashion as the spectrogram ConvNet one. Both the
Morlet and Mexican Hat wavelet were considered for this work
due to their previous application in EMG-related work [53],
[54]. In the end, the Mexican Hat wavelet was selected, as it
was the best performing during cross-validation on the pre-
training dataset. The CWTs were calculated with 32 scales
yielding a 32x52 matrix. Downsampling is then applied at a
factor of 0.25 employing spline interpolation of order 0 to
reduce the computational load of the ConvNet during training
and inference. Following downsampling, similarly to the spec-
trogram, the last row of the calculated CWT was removed as to
reduce baseline drift and motion artifact. Additionally, the last
column of the calculated CWT was also removed as to provide
an even number of time-columns from which to perform the
slow-fusion process. The final matrix shape is thus 12x8x7 (i.e.
Time x Channel x Scale). The MC Dropout deactivation rate,
batch size, optimization algorithm, and activation functions
remained unchanged. The learning rate was set at 0.0879923
(found by cross-validation).
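The CWT input pipeline above can be sketched as follows; the Ricker (Mexican Hat) wavelet convolution is written out explicitly, and the 0.25 downsampling with an order-0 spline uses `scipy.ndimage.zoom`. The toy data is illustrative; shapes follow the text (32x52 per channel, reduced to 7x12, stacked into 12x8x7).

```python
import numpy as np
from scipy import ndimage

def ricker(points, a):
    """Mexican Hat (Ricker) wavelet at scale `a`."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-t ** 2 / (2.0 * a ** 2))

def cwt(x, widths):
    """CWT by convolving the signal with scaled wavelets, one row per scale."""
    return np.stack([np.convolve(x, ricker(min(10 * w, len(x)), w),
                                 mode="same") for w in widths])

emg = np.random.randn(8, 52)  # one example: 8 channels x 260 ms
cwts = []
for channel in emg:
    coeffs = cwt(channel, np.arange(1, 33))      # 32 scales -> 32 x 52
    small = ndimage.zoom(coeffs, 0.25, order=0)  # order-0 spline -> 8 x 13
    cwts.append(small[:-1, :-1])                 # drop last row/column -> 7 x 12
example = np.transpose(np.stack(cwts), (2, 0, 1))  # Time x Channel x Scale
print(example.shape)  # (12, 8, 7)
```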
3) ConvNet for raw EMG: A third ConvNet architecture
taking the raw EMG signal as input is also considered. This
network will help assess if employing time-frequency features
lead to sufficient gains in accuracy performance to justify the
increase in computational cost. As the raw EMG represents
a completely different modality, a new type of architecture
must be employed. To reduce bias from the authors as much
as possible, the architecture considered is the one presented
in [55]. The raw ConvNet architecture can be seen in Ap-
pendix A, Fig. 10. This architecture was selected as it was
also designed to classify a hand gesture dataset employing the
Myo Armband. The architecture implementation (in PyTorch
v.0.4.1) is exactly as described in [55] except for the learning rate (=1.1288378916846883e-5), which was found by cross-validation (testing 20 uniformly distributed values between 1e-6 and 1e-1 on a logarithmic scale), and for the window length, which was extended to match the rest of this manuscript.
The raw ConvNet is further enhanced by introducing a second
convolutional and pooling layer, as well as adding dropout and BN, replacing the ReLU activation function with PReLU, and using ADAM (learning rate=0.002335721469090121) as the
optimizer. The enhanced raw ConvNets architecture, which
is shown in Appendix A, Fig. 11, achieves an average accu-
racy of 97.88% compared to 94.85% for the raw ConvNet.
Consequently, all experiments using raw EMG as input will employ the enhanced raw ConvNet.
VI. TRANSFER LEARNING
One of the main advantages of deep learning comes from
its ability to leverage large amounts of data for learning. As
it would be too time-consuming for a single individual to
record tens of thousands of examples, this work proposes to
aggregate the data of multiple individuals. The main challenge
thus becomes to find a way to leverage data from multiple
users, with the objective of achieving higher accuracy with less
data. TL techniques are well suited for such a task, allowing
the ConvNets to generate more general and robust features
that can be applied to a new subject’s sEMG activity.
As the data recording was purposefully as unconstrained as possible, the armband's orientation from one subject to another can vary widely. As such, to allow for the use of TL, automatic alignment is a necessary first step. The alignment for each subject was performed by identifying the most active channel (calculated using the IEMG feature) for each gesture of the first subject. For subsequent subjects, the channels were then circularly shifted until their activation for each gesture matched that of the first subject as closely as possible.
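A simplified, single-gesture version of this alignment step might look as follows (a NumPy sketch; the full procedure matches activations across all gestures):

```python
import numpy as np

def iemg(emg):
    # Integrated EMG: sum of absolute values per channel.
    # emg has shape (channels, samples).
    return np.abs(emg).sum(axis=1)

def align_channels(reference_emg, subject_emg):
    # Circularly shift the subject's channels so that the most
    # active channel (by IEMG) lines up with the reference subject's.
    ref_peak = int(np.argmax(iemg(reference_emg)))
    sub_peak = int(np.argmax(iemg(subject_emg)))
    shift = ref_peak - sub_peak
    return np.roll(subject_emg, shift, axis=0)

# Toy check: a subject whose armband is rotated by three channels
# relative to the reference is realigned exactly.
rng = np.random.default_rng(0)
reference = rng.random((8, 200))           # 8 Myo channels
subject = np.roll(reference, 3, axis=0)    # same activity, rotated armband
aligned = align_channels(reference, subject)
```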
A. Progressive Neural Networks
Fine-tuning is the most prevalent TL technique in deep learning [56], [57]. It consists of training a model on a source domain (with an abundance of labeled data) and using the trained weights as a starting point when presented with a new task. However, fine-tuning can suffer from catastrophic forgetting [58], where relevant and important features learned during pre-training are lost on the target domain (i.e. the new task). Moreover, by design, fine-tuning is ill-suited when significant differences exist between the source and the target, as it can bias the network toward features that are poorly adapted to the target task.
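For concreteness, plain fine-tuning amounts to warm-starting the target network from the pre-trained weights and then updating all parameters on the new subject's data; because nothing anchors the pre-trained features, they can be overwritten. A minimal PyTorch sketch with an illustrative two-layer network (not the paper's architecture):

```python
import torch
import torch.nn as nn

# Source network, assumed pre-trained on the aggregated multi-user data.
source_net = nn.Sequential(nn.Linear(8, 16), nn.PReLU(), nn.Linear(16, 7))

# Target network: identical topology, warm-started from the source weights.
target_net = nn.Sequential(nn.Linear(8, 16), nn.PReLU(), nn.Linear(16, 7))
target_net.load_state_dict(source_net.state_dict())

# Every parameter remains trainable on the target subject's data;
# the pre-trained features are unprotected and may be forgotten
# (catastrophic forgetting).
optimizer = torch.optim.Adam(target_net.parameters(), lr=1e-3)
```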

Citations
Journal ArticleDOI

A Comprehensive Survey on Transfer Learning

TL;DR: Transfer learning aims to improve the performance of target learners on target domains by transferring the knowledge contained in different but related source domains, reducing the dependence on large amounts of target-domain data when constructing target learners.
Journal ArticleDOI

EMG Pattern Recognition in the Era of Big Data and Deep Learning

TL;DR: The main factors that expand EMG data resources into the era of big data are introduced and directions for future research in EMG pattern recognition are outlined and discussed.
Journal ArticleDOI

Deep Learning in Physiological Signal Data: A Survey.

TL;DR: The objective of this paper is to conduct a detailed study to comprehend, categorize, and compare the key parameters of the deep-learning approaches that have been used in physiological signal analysis for various medical applications.
Journal ArticleDOI

Deep Learning for EMG-based Human-Machine Interaction: A Review

TL;DR: In this paper, a literature review describes the role that deep learning plays in EMG-based human-machine interaction (HMI) applications and provides an overview of typical network structures and processing schemes.
Journal ArticleDOI

Real-Time Surface EMG Pattern Recognition for Hand Gestures Based on an Artificial Neural Network.

TL;DR: A real-time hand gesture recognition model using sEMG is proposed that might be able to recognize a gesture before the gesture is completed, and a feedforward artificial neural network (ANN) is founded and trained by the training dataset.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art ImageNet classification performance.
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Frequently Asked Questions (15)
Q1. What are the contributions mentioned in the paper "Deep learning for electromyographic hand gesture signal classification using transfer learning" ?

This work's hypothesis is that general, informative features can be learned from the large amounts of data generated by aggregating the signals of multiple users, thus reducing the recording burden while enhancing gesture recognition. Consequently, this paper proposes applying transfer learning on aggregated data from multiple users, while leveraging the capacity of deep learning algorithms to learn discriminant features from large datasets. Two datasets, comprised of 19 and 17 able-bodied participants respectively (the first one is employed for pre-training), were recorded for this work using the Myo Armband. Finally, a use-case study employing eight able-bodied participants suggests that real-time feedback allows users to adapt their muscle activation strategy, which reduces the degradation in accuracy normally experienced over time.

Future works will focus on adapting and testing the proposed TL algorithm on upper-extremity amputees. This will provide additional challenges due to the greater muscle variability across amputees and the decrease in classification accuracy compared to able-bodied participants [35]. Additionally, tests for the application of the proposed TL algorithm to inter-session classification will be conducted so as to be able to leverage labeled information for long-term classification.

Due to the application of the multi-stream AdaBatch scheme, the source task in the present context is to learn the general mapping between muscle activity and gestures. 

For the evaluation dataset, three rounds are available, with the first round utilized for training (i.e. 140 s per participant) and the last two for testing (i.e. 240 s per participant).

The most straightforward way of addressing this would be to numerically remove the relevant channels from the dataset used for pre-training. 

Hyperparameters for each classifier were selected by employing three-fold cross-validation alongside random search, testing 50 different combinations of hyperparameters for each participant's dataset for each classifier.

1) Data Acquisition and Processing: Each participant was asked to hold a gesture for five seconds followed by three seconds of neutral gesture and to repeat this action five more times (total of six repetitions). 

As this paper's main purpose is to present a deep learning-based TL approach to the problem of sEMG hand gesture recognition, contextualizing the performance of the proposed algorithms within the current state-of-the-art is essential.

One could replace the learned scalar layers in the target network by convolutions or fully connected layers to bridge the dimensionality gap between potentially vastly different source and second networks.

As such, in an effort to quantify the impact of muscle fatigue on the classifier's performance, the average accuracy of the eight participants over the five-minute session is computed as a function of time.

These transitions are not part of the training dataset, because they are too time-consuming to record, as the number of possible transitions equals n² − n, where n is the number of gestures.

When calculating the spectrogram of a signal, the information is structured in a Time × Frequency fashion (Time × Scale for CWT).

Showing that deep learning algorithms can be efficiently trained, within the inherent constraints of sEMG-based hand gesture recognition, offers exciting new research avenues for this field.

This suggests that the proposed TL algorithm enables the network to learn features that can generalize not only across participants but also for never-seen-before gestures.