Deep Learning for Electromyographic Hand Gesture
Signal Classification Using Transfer Learning
Ulysse Côté-Allard, Cheikh Latyr Fall, Alexandre Drouin,
Alexandre Campeau-Lecours, Clément Gosselin, Kyrre Glette, François Laviolette, and Benoit Gosselin
Abstract—In recent years, deep learning algorithms have
become increasingly more prominent for their unparalleled
ability to automatically learn discriminant features from large
amounts of data. However, within the field of electromyography-
based gesture recognition, deep learning algorithms are seldom
employed as they require an unreasonable amount of effort from
a single person, to generate tens of thousands of examples.
This work’s hypothesis is that general, informative features can
be learned from the large amounts of data generated by aggre-
gating the signals of multiple users, thus reducing the recording
burden while enhancing gesture recognition. Consequently, this
paper proposes applying transfer learning on aggregated data
from multiple users, while leveraging the capacity of deep learn-
ing algorithms to learn discriminant features from large datasets.
Two datasets comprised of 19 and 17 able-bodied participants
respectively (the first one is employed for pre-training) were
recorded for this work, using the Myo Armband. A third Myo
Armband dataset was taken from the NinaPro database and
is comprised of 10 able-bodied participants. Three different
deep learning networks employing three different modalities
as input (raw EMG, Spectrograms and Continuous Wavelet
Transform (CWT)) are tested on the second and third dataset.
The proposed transfer learning scheme is shown to systematically
and significantly enhance the performance for all three networks
on the two datasets, achieving an offline accuracy of 98.31%
for 7 gestures over 17 participants for the CWT-based ConvNet
and 68.98% for 18 gestures over 10 participants for the raw
EMG-based ConvNet. Finally, a use-case study employing eight
able-bodied participants suggests that real-time feedback allows
users to adapt their muscle activation strategy which reduces the
degradation in accuracy normally experienced over time.
Index Terms—Surface Electromyography, EMG, Transfer
Learning, Domain Adaptation, Deep Learning, Convolutional
Networks, Hand Gesture Recognition
I. INTRODUCTION
Robotics and artificial intelligence can be leveraged to
increase the autonomy of people living with disabilities. This is
accomplished, in part, by enabling users to seamlessly interact
with robots to complete their daily tasks with increased inde-
pendence. In the context of hand prosthetic control, muscle
activity provides an intuitive interface on which to perform
hand gesture recognition [1]. This activity can be recorded by
surface electromyography (sEMG), a non-invasive technique
Ulysse Côté-Allard*, Cheikh Latyr Fall and Benoit Gosselin are with the Department of Computer and Electrical Engineering, Alexandre Drouin and François Laviolette are with the Department of Computer Science and Software Engineering, Alexandre Campeau-Lecours and Clément Gosselin are with the Department of Mechanical Engineering, Université Laval, Québec, Québec, Canada. Kyrre Glette is with RITMO and the Department of Informatics, University of Oslo, Oslo, Norway.
*Contact author email: ulysse.cote-allard.1@ulaval.ca
These authors share senior authorship.
widely adopted both in research and clinical settings. The
sEMG signals, which are non-stationary, represent the sum
of subcutaneous motor action potentials generated through
muscular contraction [1]. Artificial intelligence can then be
leveraged as the bridge between sEMG signals and the pros-
thetic behavior.
The literature on sEMG-based gesture recognition primarily
focuses on feature engineering, with the goal of characterizing
sEMG signals in a discriminative way [1], [2], [3]. Recently,
researchers have proposed deep learning approaches [4], [5],
[6], shifting the paradigm from feature engineering to feature
learning. Regardless of the method employed, the end-goal
remains the improvement of the classifier’s robustness. One
of the main factors for accurate predictions, especially when
working with deep learning algorithms, is the amount of
training data available. Hand gesture recognition creates a
peculiar context where a single user cannot realistically be
expected to generate tens of thousands of examples in a
single sitting. Large amounts of data can however be obtained
by aggregating the recordings of multiple participants, thus
fostering the conditions necessary to learn a general mapping
of users’ sEMG signal. This mapping might then facilitate
the hand gestures’ discrimination task with new subjects.
Consequently, deep learning offers a particularly attractive
context from which to develop a Transfer Learning (TL)
algorithm to leverage inter-user data by pre-training a model
on multiple subjects before training it on a new participant.
As such, the main contribution of this work is to present a
new TL scheme employing a convolutional network (ConvNet)
to leverage inter-user data within the context of sEMG-
based gesture recognition. A previous work [7] has already
shown that learning simultaneously from multiple subjects
significantly enhances the ConvNet’s performance whilst re-
ducing the size of the required training dataset typically seen
with deep learning algorithms. This paper expands upon the
aforementioned conference paper’s work, improving the TL
algorithm to reduce its computational load and improving
its performance. Additionally, three new ConvNet architec-
tures, employing three different input modalities, specifically
designed for the robust and efficient classification of sEMG
signals are presented. The raw signal, short-time Fourier
transform-based spectrogram and Continuous Wavelet Trans-
form (CWT) are considered for the characterization of the
sEMG signals to be fed to these ConvNets. To the best of
the authors’ knowledge, this is the first time that CWTs are
employed as features for the classification of sEMG-based
hand gesture recognition (although they have been proposed

for the analysis of myoelectric signals [8]). Another major
contribution of this article is the publication of a new sEMG-
based gesture classification dataset comprised of 36 able-
bodied participants. This dataset and the implementation of
the ConvNets along with their TL augmented version are
made readily available¹. Finally, this paper further expands
the aforementioned conference paper by proposing a use-case
experiment on the effect of real-time feedback on the online
performance of a classifier without recalibration over a period
of fourteen days. Note that, due to the stochastic nature of the
algorithms presented in this paper, unless stated otherwise, all
experiments are reported as an average of 20 runs.
This paper is organized as follows. An overview of the
related work in hand gesture recognition through deep learning
and transfer learning/domain adaptation is given in Sec. II.
Sec. III presents the proposed new hand gesture recognition
dataset, with data acquisition and processing details alongside
an overview of the NinaPro DB5 dataset. A presentation
of the different state-of-the-art feature sets employed in this
work is given in Sec. IV. Sec. V thoroughly describes the
proposed networks’ architectures, while Sec. VI presents the
TL algorithm used to augment said architecture. Moreover,
comparisons with the state-of-the-art in gesture recognition
are given in Sec. VII. A real-time use-case experiment on the
ability of users to counteract signal drift from sEMG signals is
presented in Sec. VIII. Finally, results are discussed in Sec. IX.
II. RELATED WORK
sEMG signals can vary significantly between subjects,
even when precisely controlling for electrode placement [9].
Regardless, classifiers trained on one user can be applied to new participants, achieving slightly better than random performance [9], and high accuracy (85% over 6 gestures) when augmented with TL on never-before-seen subjects [10].
As such, sophisticated techniques have been proposed to
leverage inter-user information. For example, research has
been done to find a projection of the feature space that bridges
the gap between an original subject and a new user [11],
[12]. Several works have also proposed leveraging a pre-
trained model removing the need to simultaneously work with
data from multiple users [13], [14], [15]. These non-deep
learning TL approaches showed important performance gains
compared to their non-augmented versions, although some of these gains might be due to the baseline's poorly optimized hyperparameters [16].
Short-Time Fourier Transforms (STFTs) have been sparsely employed in recent decades for the classification of sEMG
data [17], [18]. A possible reason for this limited interest in
STFT is that much of the research on sEMG-based gesture
recognition focuses on designing feature ensembles [2]. Because STFTs on their own generate large numbers of features and are relatively computationally expensive, they can be challenging to integrate with other feature types. Addition-
ally, STFTs have also been shown to be less accurate than
Wavelet Transforms [17] on their own for the classification of
sEMG data. Recently however, STFT features, in the form of
¹https://github.com/Giguelingueling/MyoArmbandDataset
spectrograms, have been applied as input feature space for the
classification of sEMG data by leveraging ConvNets [4], [6].
CWT features have been employed for electrocardiogram
analysis [19], electroencephalography [20] and EMG signal
analysis, but mainly for lower limbs [21], [22]. Wavelet-
based features have been used in the past for sEMG-based
hand gesture recognition [23]. The features employed however,
are based on the Discrete Wavelet Transform (DWT) [24] and the
Wavelet Packet Transform (WPT) [17] instead of the CWT.
This preference might be due to the fact that both DWT
and WPT are less computationally expensive than the CWT
and are thus better suited to be integrated into an ensemble
of features. Similarly to spectrograms however, CWT offers
an attractive image-like representation to leverage ConvNets
for sEMG signal classification and can now be efficiently
implemented on embedded systems (see Appendix C). To the
best of the authors’ knowledge, this is the first time that CWT
is utilized for sEMG-based hand gesture recognition.
Recently, ConvNets have started to be employed for hand
gesture recognition using single array [4], [5] and matrix [25]
of electrodes. Additionally, other authors applied deep learning
in conjunction with domain adaptation techniques [6] but
for inter-session classification as opposed to the inter-subject
context of this paper. A thorough overview of deep learning
techniques applied to EMG classification is given in [26]. To
the best of our knowledge, this paper, which is an extension
of [7], is the first time inter-user data is leveraged through TL
for training deep learning algorithms on sEMG data.
III. SEMG DATASETS
A. Myo Dataset
One of the major contributions of this article is to provide a
new, publicly available, sEMG-based hand gesture recognition
dataset, referred to as the Myo Dataset. This dataset contains
two distinct sub-datasets with the first one serving as the pre-
training dataset and the second as the evaluation dataset. The
former, which is comprised of 19 able-bodied participants,
should be employed to build, validate and optimize classi-
fication techniques. The latter, comprised of 17 able-bodied
participants, is utilized only for the final testing. To the best
of our knowledge, this is the largest dataset published utilizing
the commercially available Myo Armband (Thalmic Labs) and
it is our hope that it will become a useful tool for the sEMG-
based hand gesture classification community.
The data acquisition protocol was approved by the Comités d'Éthique de la Recherche avec des êtres humains de l'Université Laval (approbation number: 2017-026/21-02-2016) and informed consent was obtained from all participants.
1) sEMG Recording Hardware: The electromyographic
activity of each subject’s forearm was recorded with the
Myo Armband: an 8-channel, dry-electrode, low-sampling-rate (200Hz), low-cost, consumer-grade sEMG armband.
The Myo is non-intrusive, as the dry-electrodes allow
users to simply slip the bracelet on without any preparation.
Comparatively, gel-based electrodes require the shaving and
washing of the skin to obtain optimal contact between the
subject’s skin and electrodes. Unfortunately, the convenience

of the Myo Armband comes with limitations regarding the
quality and quantity of the sEMG signals that are collected.
Indeed, dry electrodes, such as the ones employed in the
Myo, are less accurate and robust to motion artifact than
gel-based ones [27]. Additionally, while the recommended
frequency range of sEMG signals is 5-500Hz [28], requiring a sampling frequency greater than or equal to 1000Hz, the Myo Armband is limited to 200Hz. This information loss was shown
to significantly impact the ability of various classifiers to
differentiate between hand gestures [29]. As such, robust and
adequate classification techniques are needed to process the
collected signals accurately.
2) Time-Window Length: For real-time control in a closed
loop, input latency is an important factor to consider. A
maximum latency of 300ms was first recommended in [30].
Even though more recent studies suggest that the latency
should optimally be kept between 100-250ms [31], [32],
the performance of the classifier should take priority over
speed [31], [33]. As is the case in [7], a window size of
260ms was selected to achieve a reasonable number of samples
between each prediction due to the low frequency of the Myo.
3) Labeled Data Acquisition Protocol: The seven
hand/wrist gestures considered in this work are depicted in
Fig. 1. For both sub-datasets, the labeled data was created
by requiring the user to hold each gesture for five seconds.
The data recording was manually started by a researcher
only once the participant correctly held the requested gesture.
Generally, five seconds was given to the user between each
gesture. This rest period was not recorded and as a result,
the final dataset is balanced for all classes. The recording of
the full seven gestures for five seconds is referred to as a
cycle, with four cycles forming a round. In the case of the
pre-training dataset, a single round is available per subject.
For the evaluation dataset, three rounds are available, with the first round utilized for training (i.e. 140s per participant) and the last two for testing (i.e. 280s per participant).
Fig. 1. The 7 hand/wrist gestures considered in the Myo Dataset.
During recording, participants were instructed to stand up
and have their forearm parallel to the floor and supported by
themselves. For each of them, the armband was systematically
tightened to its maximum and slid up the user’s forearm, until
the circumference of the armband matched that of the forearm.
This was done in an effort to reduce bias from the researchers,
and to emulate the wide variety of armband positions that end-
users without prior knowledge of optimal electrode placement
might use (see Fig. 2). While the electrode placement was not
controlled for, the orientation of the armband was always such
that the blue light bar on the Myo was facing towards the hand
of the subject. Note that this is the case for both left and right
handed subjects. The raw sEMG data of the Myo is what is
made available with this dataset.
Fig. 2. Examples of the range of armband placements on the subjects’ forearm
Signal processing must be applied to efficiently train a
classifier on the data recorded by the Myo armband. The data
is first separated by applying sliding windows of 52 samples (260ms) with an overlap of 235ms (i.e. 7x190 windows for one cycle (5s of data per gesture)). Employing windows of 260ms allows
40ms for the pre-processing and classification process, while
still staying within the 300ms target [30]. Note that utilizing
sliding windows is viewed as a form of data augmentation in
the present context (see Appendix B). This is done for each
gesture in each cycle on each of the eight channels. As such,
in the dataset, an example corresponds to the eight windows
associated with their respective eight channels. From there, the
processing depends on the classification techniques employed
which will be detailed in Sec. IV and V.
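The segmentation above can be sketched as follows. At the Myo's 200 Hz sampling rate, a 52-sample window spans 260 ms, and a 235 ms overlap corresponds to a stride of 5 samples; the function name and the random toy data are illustrative, not part of the released dataset code.

```python
import numpy as np

def sliding_windows(emg, window=52, stride=5):
    """Segment a (channels, samples) sEMG recording into overlapping windows.

    At 200 Hz, 52 samples span 260 ms and a 5-sample stride reproduces
    the 235 ms overlap described in the text.
    """
    n_channels, n_samples = emg.shape
    starts = range(0, n_samples - window + 1, stride)
    # Each example keeps all eight channels for one 260 ms slice.
    return np.stack([emg[:, s:s + window] for s in starts])

# One 5 s recording of one gesture: 8 channels x 1000 samples.
cycle = np.random.randn(8, 1000)
examples = sliding_windows(cycle)
print(examples.shape)  # (190, 8, 52) -> the 190 windows per gesture per cycle
```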
B. NinaPro DB5
The NinaPro DB5 is a dataset built to benchmark sEMG-
based gesture recognition algorithms [34]. This dataset, which
was recorded with the Myo Armband, contains data from
10 able-bodied participants performing a total of 53 different
movements (including neutral) divided into three exercise sets.
The second exercise set, which contains 17 gestures + neutral
gesture, is of particular interest, as it includes all the gestures
considered so far in this work. The 11 additional gestures
which are presented in [35] include wrist pronation, wrist
supination and diverse finger extension amongst others. While
this particular dataset was recorded with two Myo Armbands, only the lower armband is considered, so as to allow a direct comparison with the preceding dataset.
1) Data Acquisition and Processing: Each participant was
asked to hold a gesture for five seconds followed by three
seconds of neutral gesture and to repeat this action five more
times (total of six repetitions). This procedure was repeated
for all the movements contained within the dataset. The first
four repetitions serve as the training set (20s per gesture) and
the last two (10s per gesture) as the test set for each gesture.
Note that the rest movement (i.e. neutral gesture) was treated
identically as the other gestures (i.e. first four repetitions for
training (12s) and the next two for testing (6s)).
All data processing (e.g. window size, window overlap) is exactly as described in the previous sections.
IV. CLASSIC SEMG CLASSIFICATION
Traditionally, one of the most researched aspects of sEMG-
based gesture recognition comes from feature engineering

(i.e. manually finding a representation for sEMG signals that
allows easy differentiation between gestures). Over the years,
several efficient combinations of features both in the time and
frequency domain have been proposed [36], [37], [38], [39].
This section presents the feature sets used in this work. See
Appendix D for a description of each feature.
A. Feature Sets
As this paper’s main purpose is to present a deep learning-
based TL approach to the problem of sEMG hand gesture
recognition, contextualizing the performance of the proposed
algorithms within the current state-of-the-art is essential. As
such, four different feature sets were taken from the litera-
ture to serve as a comparison basis. The four feature sets
will be tested on five of the most common classifiers employed for sEMG pattern recognition: Support Vector Machine
(SVM) [38], Artificial Neural Networks (ANN) [40], Ran-
dom Forest (RF) [38], K-Nearest Neighbors (KNN) [38] and
Linear Discriminant Analysis (LDA) [39]. Hyperparameters
for each classifier were selected by employing three fold
cross-validation alongside random search, testing 50 different
combinations of hyperparameters for each participant’s dataset
for each classifier. The hyperparameters considered for each
classifier are presented in Appendix E.
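The selection procedure above (random search over 50 hyperparameter draws with three-fold cross-validation) can be sketched with scikit-learn; the feature matrix and the search space below are illustrative placeholders, not the spaces listed in Appendix E.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical feature matrix: one row of engineered features per window.
X = np.random.randn(200, 32)
y = np.random.randint(0, 7, size=200)  # 7 gesture labels

# 50 random hyperparameter combinations, each scored by 3-fold CV,
# mirroring the selection procedure described in the text.
search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={
        "n_estimators": randint(10, 50),
        "max_depth": randint(2, 20),
    },
    n_iter=50,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

In the paper this search is repeated per participant and per classifier, so the selected hyperparameters can differ across subjects.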
As is often the case, dimensionality reduction is applied [1],
[3], [41]. LDA was chosen to perform feature projection as
it is computationally inexpensive, devoid of hyperparameters
and was shown to allow for robust classification accuracy
for sEMG-based gesture recognition [39], [42]. A comparison
of the accuracy obtained with and without dimensionality
reduction on the Myo Dataset is given in Appendix F. This
comparison shows that in the vast majority of cases, the
dimensionality reduction both reduced the computational load
and enhanced the average performances of the feature sets.
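LDA-based feature projection, as used here, is hyperparameter-free: it projects onto at most (number of classes - 1) discriminant axes. A minimal sketch with scikit-learn, using hypothetical feature dimensions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: 200 windows x 32 engineered features, 7 gestures.
X = np.random.randn(200, 32)
y = np.random.randint(0, 7, size=200)

# LDA projects onto at most (n_classes - 1) = 6 discriminant axes;
# no hyperparameters need to be tuned for this projection.
lda = LinearDiscriminantAnalysis()
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (200, 6)
```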
The implementation employed for all the classifiers comes
from the scikit-learn (v.1.13.1) Python package [43]. The four
feature sets employed for comparison purposes are:
1) Time Domain Features (TD) [37]: This set of features,
which is probably the most commonly employed in the litera-
ture [29], often serves as the basis for bigger feature sets [1],
[39], [34]. As such, TD is particularly well suited to serve as
a baseline comparison for new classification techniques. The
four features are: Mean Absolute Value (MAV), Zero Crossing
(ZC), Slope Sign Changes (SSC) and Waveform Length (WL).
2) Enhanced TD [39]: This set of features includes the TD
features in combination with Skewness, Root Mean Square
(RMS), Integrated EMG (IEMG), Autoregression Coefficients
(AR) (P=11) and the Hjorth Parameters. It was shown to
achieve excellent performances on a setup similar to the one
employed in this article.
3) Nina Pro Features [38], [34]: This set of features was
selected as it was found to perform the best in the article
introducing the NinaPro dataset. The set consists of the
following features: RMS, Marginal Discrete Wavelet Trans-
form (mDWT) (wavelet=db7, S=3), EMG Histogram (HIST)
(bins=20, threshold=3σ) and the TD features.
4) SampEn Pipeline [36]: This last feature combination
was selected among fifty features that were evaluated and
ranked to find the most discriminating ones. The SampEn
feature was ranked first amongst all the others. The best multi-
features set found was composed of: SampEn(m=2, r=0.2σ),
Cepstral Coefficient (order=4), RMS and WL.
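As a concrete illustration of the TD baseline (feature set 1), the four Hudgins features can be computed per channel as below; this is a minimal sketch that omits the noise thresholds usually applied to the zero-crossing and slope-sign-change counts.

```python
import numpy as np

def td_features(window):
    """The four time-domain (TD) features for one channel's window.
    Noise thresholds for ZC and SSC are omitted here for brevity."""
    diff = np.diff(window)
    mav = np.mean(np.abs(window))                 # Mean Absolute Value
    zc = np.sum(np.diff(np.sign(window)) != 0)    # Zero Crossings
    ssc = np.sum(np.diff(np.sign(diff)) != 0)     # Slope Sign Changes
    wl = np.sum(np.abs(diff))                     # Waveform Length
    return np.array([mav, zc, ssc, wl])

window = np.sin(np.linspace(0, 4 * np.pi, 52))    # toy 52-sample signal
feats = td_features(window)
print(feats)
```

Concatenating these four values across the armband's eight channels yields a 32-dimensional feature vector per window.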
V. DEEP LEARNING CLASSIFIERS OVERVIEW
ConvNets tend to be computationally expensive and thus
ill-suited for embedded systems, such as those required when
guiding a prosthetic. However, in recent years, algorithmic
improvements and new hardware architectures have allowed
for complex networks to run on very low power systems
(see Appendix C). As previously mentioned, the inherent
limitations of sEMG-based gesture recognition force the pro-
posed ConvNets to contend with a limited amount of data
from any single individual. To address the over-fitting issue,
Monte Carlo Dropout (MC Dropout) [44], Batch Normaliza-
tion (BN) [45], and early stopping are employed.
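MC Dropout differs from standard dropout in that the stochastic masks are kept active at inference time and predictions are averaged over several forward passes. A NumPy sketch with a toy two-layer classifier (the network, shapes, and weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-layer classifier: flattened 8x52 window -> 16 hidden -> 7 gestures.
W1 = rng.standard_normal((16, 8 * 52))
W2 = rng.standard_normal((7, 16))

def forward(x, drop_rate=0.5):
    """One stochastic forward pass with dropout left ON (MC Dropout)."""
    h = np.maximum(W1 @ x, 0.0)
    mask = rng.random(h.shape) > drop_rate  # fresh dropout mask each pass
    h = h * mask / (1.0 - drop_rate)        # inverted dropout scaling
    logits = W2 @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # softmax probabilities

def mc_dropout_predict(x, n_samples=20):
    """Average the class probabilities over stochastic forward passes."""
    return np.mean([forward(x) for _ in range(n_samples)], axis=0)

x = rng.standard_normal(8 * 52)  # one flattened sEMG window
p = mc_dropout_predict(x)
print(p.shape)  # (7,), a valid probability distribution over gestures
```

The spread of the individual passes can additionally serve as an uncertainty estimate, which is the original motivation for MC Dropout [44].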
A. Batch Normalization
BN is a technique that accelerates training and provides
some form of regularization with the aims of maintaining a
standard distribution of hidden layer activation values through-
out training [45]. BN accomplishes this by normalizing the
mean and variance of each dimension of a batch of examples.
To achieve this, a linear transformation based on two learned
parameters is applied to each dimension. This process is done
independently for each layer of the network. Once training is
completed, the whole dataset is fed through the network one
last time to compute the final normalization parameters in a
layer-wise fashion. At test time, these parameters are applied
to normalize the layer activations. BN was shown to yield
faster training times whilst allowing better generalization.
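The training-time normalization step described above can be written out directly; gamma and beta stand for the two learned affine parameters, and at test time the batch statistics are replaced by the dataset-wide statistics computed after training.

```python
import numpy as np

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each dimension of a batch to zero mean and unit variance,
    then apply the learned affine transform (gamma, beta)."""
    mean = batch.mean(axis=0)            # per-dimension batch mean
    var = batch.var(axis=0)              # per-dimension batch variance
    normalized = (batch - mean) / np.sqrt(var + eps)
    return gamma * normalized + beta

# A batch of 128 hidden-layer activations of width 64, far from normalized.
batch = np.random.randn(128, 64) * 3.0 + 5.0
out = batch_norm(batch)
print(out.mean(), out.std())  # approximately 0 and 1
```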
B. Proposed Convolutional Network Architectures
Videos are a representation of how spatial information
(images) change through time. Previous works have combined
this representation with ConvNets to address classification
tasks [46], [47]. One such successful algorithm is the slow-
fusion model [47] (see Fig. 3).
Fig. 3. Typical slow-fusion ConvNet architecture [47]. In this graph, the input
(represented by grey rectangles) is a video (i.e. a sequence of images). The
model separates the temporal part of the examples into disconnected parallel
layers, which are then slowly fused together throughout the network.
When calculating the spectrogram of a signal, the informa-
tion is structured in a Time x Frequency fashion (Time x Scale
for CWT). When the signal comes from an array of electrodes,

5
these examples can naturally be structured as Time x Spa-
tial x Frequency (Time x Spatial x Scale for CWT). As such,
the motivation for using a slow-fusion architecture based Con-
vNet in this work is due to the similarities between videos data
and the proposed characterization of sEMG signals, as both
representations have analogous structures (i.e. Time x Spa-
tial x Spatial for videos) and can describe non-stationary
information. Additionally, the proposed architectures inspired
by the slow-fusion model were by far the most successful of
the ones tried on the pre-training dataset.
1) ConvNet for Spectrograms: The spectrograms, which are
fed to the ConvNet, were calculated with Hann windows of
length 28 and an overlap of 20 yielding a matrix of 4x15. The
first frequency band was removed in an effort to reduce base-
line drift and motion artifact. As the armband features eight
channels, eight such spectrograms were calculated, yielding a
final matrix of 4x8x14 (Time x Channel x Frequency).
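The spectrogram computation above can be reproduced with SciPy; the default Hann-window spectrogram settings of `scipy.signal.spectrogram` are assumed here to stand in for the authors' exact implementation, but the window length (28), overlap (20), and resulting 4x8x14 shape follow the text.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 200                      # Myo sampling rate (Hz)
emg = np.random.randn(8, 52)  # one example: 8 channels x 260 ms

# Hann windows of length 28 with an overlap of 20 give 4 time bins and
# 15 frequency bins per channel; dropping the first frequency band
# (baseline drift / motion artifact) leaves 14.
specs = []
for channel in emg:
    f, t, Sxx = spectrogram(channel, fs=fs, window="hann",
                            nperseg=28, noverlap=20)
    specs.append(Sxx[1:, :])          # remove the first frequency band
example = np.stack(specs, axis=1).T   # -> Time x Channel x Frequency
print(example.shape)                  # (4, 8, 14)
```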
The implementation of the spectrogram ConvNet architec-
ture (see Appendix A, Fig. 8) was created with Theano [48]
and Lasagne [49]. As usual in deep learning, the architecture
was created in a trial and error process taking inspiration from
previous architectures (primarily [4], [6], [47], [7]). The non-
linear activation functions employed are the parametric expo-
nential linear unit (PELU) [50] and PReLU [51]. ADAM [52]
is utilized for the optimization of the ConvNet (learning
rate=0.00681292). The deactivation rate for MC Dropout is
set at 0.5 and the batch size at 128. Finally, to further reduce
overfitting, early stopping is employed by randomly removing
10% of the data from the training and using it as a validation
set at the beginning of the optimization process. Note that
learning rate annealing is applied with a factor of 5 when
the validation loss stops improving. The training stops when two consecutive decays occur with no improvement of the network's performance on the validation set. All hyperparameter values
were found by a random search on the pre-training dataset.
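The stopping rule described above (anneal the learning rate by a factor of 5 when validation loss stalls; stop after two consecutive decays without improvement) can be sketched as a training loop; `train_epoch` and `validate` are hypothetical stand-ins for one training pass and one validation pass.

```python
def train_with_annealing(train_epoch, validate, lr=0.00681292, factor=5.0,
                         max_decays=2):
    """Early stopping with learning-rate annealing, as described in the
    text: divide lr by `factor` whenever validation loss stops improving,
    and stop once `max_decays` consecutive decays bring no improvement."""
    best_loss = float("inf")
    consecutive_decays = 0
    while consecutive_decays < max_decays:
        train_epoch(lr)
        loss = validate()
        if loss < best_loss:
            best_loss = loss
            consecutive_decays = 0   # an improvement resets the decay count
        else:
            lr /= factor             # annealing with a factor of 5
            consecutive_decays += 1
    return best_loss, lr

# Toy run with simulated validation losses: improves twice, then stalls.
losses = iter([1.0, 0.8, 0.9, 0.95])
best, final_lr = train_with_annealing(lambda lr: None, lambda: next(losses))
print(best, final_lr)
```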
2) ConvNet for Continuous Wavelet Transforms: The archi-
tecture for the CWT ConvNet, (Appendix A, Fig. 9), was built
in a similar fashion as the spectrogram ConvNet one. Both the
Morlet and Mexican Hat wavelet were considered for this work
due to their previous application in EMG-related work [53],
[54]. In the end, the Mexican Hat wavelet was selected, as it
was the best performing during cross-validation on the pre-
training dataset. The CWTs were calculated with 32 scales
yielding a 32x52 matrix. Downsampling is then applied at a
factor of 0.25 employing spline interpolation of order 0 to
reduce the computational load of the ConvNet during training
and inference. Following downsampling, similarly to the spec-
trogram, the last row of the calculated CWT was removed as to
reduce baseline drift and motion artifact. Additionally, the last
column of the calculated CWT was also removed as to provide
an even number of time-columns from which to perform the
slow-fusion process. The final matrix shape is thus 12x8x7 (i.e.
Time x Channel x Scale). The MC Dropout deactivation rate,
batch size, optimization algorithm, and activation functions
remained unchanged. The learning rate was set at 0.0879923
(found by cross-validation).
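The CWT input pipeline above can be sketched as follows; the Ricker (Mexican Hat) wavelet convolution is written out explicitly, and the 0.25 downsampling with an order-0 spline uses `scipy.ndimage.zoom`. The toy data is illustrative; shapes follow the text (32x52 per channel, reduced to 7x12, stacked into 12x8x7).

```python
import numpy as np
from scipy import ndimage

def ricker(points, a):
    """Mexican Hat (Ricker) wavelet at scale `a`."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-t ** 2 / (2.0 * a ** 2))

def cwt(x, widths):
    """CWT by convolving the signal with scaled wavelets, one row per scale."""
    return np.stack([np.convolve(x, ricker(min(10 * w, len(x)), w),
                                 mode="same") for w in widths])

emg = np.random.randn(8, 52)  # one example: 8 channels x 260 ms
cwts = []
for channel in emg:
    coeffs = cwt(channel, np.arange(1, 33))      # 32 scales -> 32 x 52
    small = ndimage.zoom(coeffs, 0.25, order=0)  # order-0 spline -> 8 x 13
    cwts.append(small[:-1, :-1])                 # drop last row/column -> 7 x 12
example = np.transpose(np.stack(cwts), (2, 0, 1))  # Time x Channel x Scale
print(example.shape)  # (12, 8, 7)
```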
3) ConvNet for raw EMG: A third ConvNet architecture
taking the raw EMG signal as input is also considered. This
network will help assess if employing time-frequency features
lead to sufficient gains in accuracy performance to justify the
increase in computational cost. As the raw EMG represents
a completely different modality, a new type of architecture
must be employed. To reduce bias from the authors as much
as possible, the architecture considered is the one presented
in [55]. The raw ConvNet architecture can be seen in Ap-
pendix A, Fig. 10. This architecture was selected as it was
also designed to classify a hand gesture dataset employing the
Myo Armband. The architecture implementation (in PyTorch
v.0.4.1) is exactly as described in [55] except for the learning rate (=1.1288378916846883e-5), which was found by cross-validation (testing 20 uniformly distributed values between 1e-6 and 1e-1 on a logarithmic scale), and for the window length, which was extended to match the rest of this manuscript.
The raw ConvNet is further enhanced by introducing a second
convolutional and pooling layer, as well as adding dropout and BN, replacing the ReLU activation function with PReLU, and using ADAM (learning rate=0.002335721469090121) as the
optimizer. The enhanced raw ConvNets architecture, which
is shown in Appendix A, Fig. 11, achieves an average accu-
racy of 97.88% compared to 94.85% for the raw ConvNet.
Consequently, all experiments using raw EMG as input will employ the enhanced raw ConvNet.
VI. TRANSFER LEARNING
One of the main advantages of deep learning comes from
its ability to leverage large amounts of data for learning. As
it would be too time-consuming for a single individual to
record tens of thousands of examples, this work proposes to
aggregate the data of multiple individuals. The main challenge
thus becomes to find a way to leverage data from multiple
users, with the objective of achieving higher accuracy with less
data. TL techniques are well suited for such a task, allowing
the ConvNets to generate more general and robust features
that can be applied to a new subject’s sEMG activity.
As the data recording was purposefully as unconstrained as possible, the armband's orientation from one subject to another can vary widely. As such, to allow for the use of TL, automatic alignment is a necessary first step. The alignment for each subject was performed by identifying the most active channel (calculated using the IEMG feature) for each gesture of the first subject. For subsequent subjects, the channels were then circularly shifted until their activation for each gesture matched that of the first subject as closely as possible.
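A simplified, single-gesture version of this alignment step might look as follows (a NumPy sketch; the full procedure matches activations across all gestures):

```python
import numpy as np

def iemg(emg):
    # Integrated EMG: sum of absolute values per channel.
    # emg has shape (channels, samples).
    return np.abs(emg).sum(axis=1)

def align_channels(reference_emg, subject_emg):
    # Circularly shift the subject's channels so that the most
    # active channel (by IEMG) lines up with the reference subject's.
    ref_peak = int(np.argmax(iemg(reference_emg)))
    sub_peak = int(np.argmax(iemg(subject_emg)))
    shift = ref_peak - sub_peak
    return np.roll(subject_emg, shift, axis=0)

# Toy check: a subject whose armband is rotated by three channels
# relative to the reference is realigned exactly.
rng = np.random.default_rng(0)
reference = rng.random((8, 200))           # 8 Myo channels
subject = np.roll(reference, 3, axis=0)    # same activity, rotated armband
aligned = align_channels(reference, subject)
```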
A. Progressive Neural Networks
Fine-tuning is the most prevalent TL technique in deep learning [56], [57]. It consists of training a model on a source domain (with an abundance of labeled data) and using the trained weights as a starting point when presented with a new task. However, fine-tuning can suffer from catastrophic forgetting [58], where relevant and important features learned during pre-training are lost on the target domain (i.e. the new task). Moreover, by design, fine-tuning is ill-suited when significant differences exist between the source and the target, as it can bias the network toward features that are poorly adapted to the target task.
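For concreteness, plain fine-tuning amounts to warm-starting the target network from the pre-trained weights and then updating all parameters on the new subject's data; because nothing anchors the pre-trained features, they can be overwritten. A minimal PyTorch sketch with an illustrative two-layer network (not the paper's architecture):

```python
import torch
import torch.nn as nn

# Source network, assumed pre-trained on the aggregated multi-user data.
source_net = nn.Sequential(nn.Linear(8, 16), nn.PReLU(), nn.Linear(16, 7))

# Target network: identical topology, warm-started from the source weights.
target_net = nn.Sequential(nn.Linear(8, 16), nn.PReLU(), nn.Linear(16, 7))
target_net.load_state_dict(source_net.state_dict())

# Every parameter remains trainable on the target subject's data;
# the pre-trained features are unprotected and may be forgotten
# (catastrophic forgetting).
optimizer = torch.optim.Adam(target_net.parameters(), lr=1e-3)
```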

Citations
Journal ArticleDOI

A Comprehensive Survey on Transfer Learning

TL;DR: Transfer learning aims to improve the performance of target learners on target domains by transferring the knowledge contained in different but related source domains, reducing the dependence on large amounts of target-domain data when constructing target learners.
Journal ArticleDOI

EMG Pattern Recognition in the Era of Big Data and Deep Learning

TL;DR: The main factors that expand EMG data resources into the era of big data are introduced and directions for future research in EMG pattern recognition are outlined and discussed.
Journal ArticleDOI

Deep Learning in Physiological Signal Data: A Survey.

TL;DR: The objective of this paper is to conduct a detailed study to comprehend, categorize, and compare the key parameters of the deep-learning approaches that have been used in physiological signal analysis for various medical applications.
Journal ArticleDOI

Deep Learning for EMG-based Human-Machine Interaction: A Review

TL;DR: In this paper, a literature review describes the role that deep learning plays in EMG-based human-machine interaction (HMI) applications and provides an overview of typical network structures and processing schemes.
Journal ArticleDOI

Real-Time Surface EMG Pattern Recognition for Hand Gestures Based on an Artificial Neural Network.

TL;DR: A real-time hand gesture recognition model using sEMG is proposed that might be able to recognize a gesture before the gesture is completed, and a feedforward artificial neural network (ANN) is founded and trained by the training dataset.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art ImageNet classification performance.
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Frequently Asked Questions (15)
Q1. What are the contributions mentioned in the paper "Deep learning for electromyographic hand gesture signal classification using transfer learning" ?

This work's hypothesis is that general, informative features can be learned from the large amounts of data generated by aggregating the signals of multiple users, thus reducing the recording burden while enhancing gesture recognition. Consequently, this paper proposes applying transfer learning on aggregated data from multiple users, while leveraging the capacity of deep learning algorithms to learn discriminant features from large datasets. Two datasets, comprised of 19 and 17 able-bodied participants respectively (the first one is employed for pre-training), were recorded for this work using the Myo Armband. Finally, a use-case study employing eight able-bodied participants suggests that real-time feedback allows users to adapt their muscle activation strategy, which reduces the degradation in accuracy normally experienced over time.

Future works will focus on adapting and testing the proposed TL algorithm on upper-extremity amputees. This will provide additional challenges due to the greater muscle variability across amputees and the decrease in classification accuracy compared to able-bodied participants [35]. Additionally, tests for the application of the proposed TL algorithm to inter-session classification will be conducted so as to be able to leverage labeled information for long-term classification.

Due to the application of the multi-stream AdaBatch scheme, the source task in the present context is to learn the general mapping between muscle activity and gestures. 

For the evaluation dataset, three rounds are available, with the first round utilized for training (i.e. 140 s per participant) and the last two for testing (i.e. 240 s per participant).

The most straightforward way of addressing this would be to numerically remove the relevant channels from the dataset used for pre-training. 

Hyperparameters for each classifier were selected by employing three-fold cross-validation alongside random search, testing 50 different combinations of hyperparameters for each participant's dataset for each classifier.

1) Data Acquisition and Processing: Each participant was asked to hold a gesture for five seconds followed by three seconds of neutral gesture and to repeat this action five more times (total of six repetitions). 

As this paper's main purpose is to present a deep learning-based TL approach to the problem of sEMG hand gesture recognition, contextualizing the performance of the proposed algorithms within the current state-of-the-art is essential.

One could replace the learned scalar layers in the target network by convolutions or fully connected layers to bridge the dimensionality gap between potentially vastly different source and second networks.

As such, in an effort to quantify the impact of muscle fatigue on the classifier's performance, the average accuracy of the eight participants over the five-minute session is computed as a function of time.

These transitions are not part of the training dataset, because they are too time-consuming to record, as the number of possible transitions equals n² − n, where n is the number of gestures.

When calculating the spectrogram of a signal, the information is structured in a Time × Frequency fashion (Time × Scale for CWT).

Showing that deep learning algorithms can be efficiently trained, within the inherent constraints of sEMG-based hand gesture recognition, offers exciting new research avenues for this field.

This suggests that the proposed TL algorithm enables the network to learn features that can generalize not only across participants but also for never-seen-before gestures.