Scientific Reports | 7:41011 | DOI: 10.1038/srep41011
www.nature.com/scientificreports
Genetic algorithm for the optimization of features and neural networks in ECG signals classification
Hongqiang Li¹, Danyang Yuan¹, Xiangdong Ma¹, Dianyin Cui¹ & Lu Cao²
Feature extraction and classification of electrocardiogram (ECG) signals are necessary for the
automatic diagnosis of cardiac diseases. In this study, a novel method based on genetic algorithm-back
propagation neural network (GA-BPNN) for classifying ECG signals with feature extraction using
wavelet packet decomposition (WPD) is proposed. WPD combined with the statistical method is
utilized to extract the effective features of ECG signals. The statistical features of the wavelet packet
coefficients are calculated as the feature sets. GA is employed to decrease the dimensions of the
feature sets and to optimize the weights and biases of the back propagation neural network (BPNN).
Thereafter, the optimized BPNN classifier is applied to classify six types of ECG signals. In addition, an
experimental platform is constructed for ECG signal acquisition to supply the ECG data for verifying the
effectiveness of the proposed method. The GA-BPNN method with the MIT-BIH arrhythmia database
achieved a dimension reduction of nearly 50% and produced good classification results with an accuracy
of 97.78%. The experimental results based on the established acquisition platform indicated that the
GA-BPNN method achieved a high classification accuracy of 99.33% and could be efficiently applied in
the automatic identification of cardiac arrhythmias.
An electrocardiogram (ECG) is a complete representation of the electrical activity of the heart on the surface
of the human body, and it is extensively applied in the clinical diagnosis of heart diseases [1–3]. Many studies have
developed arrhythmia recognition approaches that utilize automatic analysis and diagnosis systems based on
ECG signals [4–7], in which feature extraction and classification are particularly important for the analysis and
diagnosis of cardiac diseases. Numerous techniques for classifying ECG signals have been proposed in recent
years. A modified artificial bee colony algorithm was established for ECG heartbeat classification to classify time
domain features, and good results were achieved [8]. An automatic ECG classification method using BPNN
combined with wave characteristics was presented to distinguish and diagnose heart diseases [9]. A technique based on
time domain features and support vector machine was applied to an ECG dataset to analyze and classify cardiac
arrhythmias [10]. Although ECG features in the time domain can be easily obtained, these features rely excessively
on waveform detection and are easily affected by noise. Transform methods are also widely applied in feature
extraction because of their good time–frequency property. Discrete biorthogonal wavelet decomposition was
utilized for extracting ECG features, and a radial basis function neural network was used for ECG classification [11].
A combined neural network model was designed for the classification of ECG beats; this model was trained and
tested using discrete wavelet transform on the extracted features [12]. Wavelet algorithm was applied in extracting
features, and fuzzy neuro learning vector quantisation (FLVQ) was used as the classifier for arrhythmia beats [13].
An ECG beat classification method was presented, wherein discrete cosine transform converted RR intervals
and random forest was used as the classifier [14]. Feature extraction using discrete wavelet transform and multiclass
support vector machines was employed for the classification of four types of ECG beats [15]. Moreover, combining
several methods is a common strategy in ECG feature extraction and classification. Features obtained by
independent component analysis, together with the use of the RR interval as the feature vector, were entered into
neural networks for ECG beats classification [16]. Cross-correlation was utilized as a formidable feature extraction
¹Tianjin Key Laboratory of Optoelectronic Detection Technology and Systems, School of Electronics and Information
Engineering, Tianjin Polytechnic University, Tianjin 300387, China. ²Tianjin Chest Hospital, Tianjin 300222, China.
Correspondence and requests for materials should be addressed to H.L. (email: lihongqiang@tjpu.edu.cn)
Received: 08 July 2016
Accepted: 14 December 2016
Published: 31 January 2017
OPEN
tool and the least squares support vector machine (LS-SVM) was employed as an automated ECG beat classifier [17].
A combined method based on stacked generalisation was proposed for classifying ECG beats; in this method,
multilayer perceptron classifiers were utilized as the base classifiers trained by the back propagation algorithm [18].
Higher-order statistics (HOSs) of ECG signals and three time interval features were fed as features into a bee
algorithm–radial basis function classifier to classify five types of ECG beats [19]. HOSs of WPD coefficients were
used as the features for ECG heartbeats classification, and the obtained features were classified by a k-nearest
neighbor classifier [20].
In the present study, we used the WPD combined with the statistical method (WPD-statistical method) to
extract useful features. Then, we applied the GA-BPNN method to filter the extracted features and classify the six
types of ECG signals. Prior to feature extraction, a method based on the improved threshold of the lifting wavelet
was applied to remove the noise from ECG signals in preprocessing [21,22]. Then, the GA-BPNN method was employed
to select representative features and optimize the BPNN classifier. The filtered features were inputted into the
optimized BPNN classifier for classification. In this study, the ECG signals derived from the MIT-BIH
arrhythmia database [23] were classified into six categories, namely, normal beat (N), left bundle branch block beat (L),
right bundle branch block beat (R), atrial premature beat (A), paced beat (P), and premature ventricular
contraction (V). We also constructed an experimental platform of ECG acquisition to supply six types of ECG signals for
verifying the effectiveness of the proposed method. Figure 1 presents the overall block diagram of the proposed
method for ECG signal classification.
Results
Data sets from MIT-BIH database. Complete experimental analysis was conducted to evaluate the
performance of the proposed approach. In this study, six types of ECG signals were obtained from the MIT-BIH
arrhythmia database, and the sampling rate was 360 Hz [23]. We used a segment of 1000 points from each type
containing relevant ECG signal information and selected 360 samples for ECG classification. The sampling data
collected in this study from the MIT-BIH arrhythmia database are listed in Table 1.
Feature extraction based on WPD-statistical method. In this study, we extracted feature vectors by
using the WPD-statistical method. We selected the db6 wavelet as the mother wavelet. The ECG signal segments were
decomposed into 4 levels as shown in Fig. 2. Then, by employing the statistical method, 16 wavelet packet
coefficients (WPCs) in the fourth level of WPD were calculated to obtain the ECG features. Each ECG signal segment
contained 16 WPCs. As such, the feature matrix consisted of 48 (16 × 3) dimensions, and the as-extracted features
were used for ECG feature selection and classification.
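The construction of the 48-dimensional feature vector can be illustrated with a short sketch. It is not the authors' implementation: to stay free of external libraries it uses the Haar filter pair in place of db6, and it assumes mean, standard deviation and energy as the three statistics per packet, since this section does not name them. All function names are hypothetical.

```python
import math
import statistics

def haar_split(x):
    """One analysis step: return (approximation, detail) coefficient lists.
    A trailing sample is dropped when len(x) is odd."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packets(signal, level=4):
    """Full wavelet packet tree: every node is split at every level,
    so the last level holds 2**level coefficient sets (16 for level 4)."""
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

def packet_features(signal, level=4):
    """Concatenate three statistics per packet (assumed: mean, std, energy)."""
    features = []
    for coeffs in wavelet_packets(signal, level):
        features.append(statistics.fmean(coeffs))
        features.append(statistics.pstdev(coeffs))
        features.append(sum(c * c for c in coeffs))
    return features

# Synthetic 1000-point segment standing in for one ECG sample at 360 Hz.
segment = [math.sin(2 * math.pi * 5 * n / 360) for n in range(1000)]
features = packet_features(segment)
print(len(features))  # 16 packets x 3 statistics = 48
```

Stacking one such vector per segment would give the 180 × 48 training and testing matrices used in the next step.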
Feature selection and the BPNN structure optimization using GA. After extracting the features using
the WPD-statistical method, we obtained a 180 × 48 training feature matrix and a 180 × 48 testing feature matrix.
To improve the classification efficiency and reduce computation, it was essential to remove redundant features.
Therefore, the GA-BPNN method was employed to filter representative features for ECG signals classification.
Furthermore, the initial weights and biases of the BPNN were optimized by GA because their randomness would
affect the testing result. The parameters of GA were set as follows: the number of individuals was 48; the
population size was 20; and the maximum generation was 100. The fitness curve of GA is illustrated in Fig. 3. After a
series of iterations, the average fitness and the best fitness gradually improved, and a set of input arguments
was filtered by GA optimization. After 100 iterations, the filtered feature numbers were as follows: 1, 5, 8, 10, 12,
Figure 1. The block diagram of the proposed method for ECG signals classification. The classification
method consists of preprocessing, feature extraction, GA optimization and classification. Preprocessing is
performed to remove noise from the original ECG signals. Feature extraction is conducted to obtain ECG
features using the WPD-statistical method. GA optimization is employed to reduce the feature dimensions
and to optimize the weights and biases of BPNN. Classification refers to classifying ECG signals into six types,
namely, N, L, R, P, V and A.
Type     MIT-BIH records    Training set    Testing set
N        100, 105, 215      30              30
L        109, 111, 214      30              30
R        118, 124, 212      30              30
P        102, 107, 217      30              30
V        106, 223           30              30
A        207, 209, 232      30              30
Total                       180             180
Table 1. The ECG data sampled from the MIT-BIH database. Each type of ECG signal had 30 samples for
the training set and 30 samples for the testing set. Samples of N were obtained from records 100, 105 and 215.
Samples of L were derived from records 109, 111 and 214. Samples of R were obtained from records 118, 124
and 212. Samples of P were obtained from records 102, 107 and 217. We obtained samples of V from records 106
and 223 and those of A from records 207, 209 and 232.
[Figure 2 panels: S1, S2 and S1400–S1415; vertical axis: Amplitude; horizontal axis: 0 to 1000.]
Figure 2. Results of ECG signal decomposition using WPD. S1 and S2 refer to the original and preprocessed
ECG signals, respectively. S1400–S1415 represent the 16 WPCs.
13, 14, 17, 18, 20, 22, 23, 26, 27, 29, 30, 32, 33, 34, 35, 36, 39, 40, 45, and 46. The dimensions of the feature sets were
reduced to approximately 50% by utilizing GA.
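The selection step can be pictured as a GA over binary masks of length 48, where bit i decides whether feature i is kept; the 25 features listed above correspond to 25/48 ≈ 52% of the original dimensions. The sketch below is only an illustration under stated assumptions: the population size (20) and generation count (100) come from the text, whereas tournament selection, single-point crossover, the crossover and mutation rates, elitism and the toy fitness function are all assumptions. In the paper the fitness is tied to the BPNN error and the GA also tunes the weights and biases, which is not shown here.

```python
import random

N_FEATURES = 48      # chromosome length: one bit per extracted feature
POP_SIZE = 20        # population size reported in the text
GENERATIONS = 100    # maximum generation reported in the text

def toy_fitness(mask, informative):
    """Hypothetical stand-in for a classifier-based fitness:
    reward selected informative features, penalise every selected feature."""
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)

def tournament(population, scores, rng, k=3):
    """Return the best of k randomly drawn individuals."""
    picks = rng.sample(range(len(population)), k)
    return population[max(picks, key=lambda i: scores[i])]

def evolve(fitness, rng, p_cross=0.8, p_mut=0.02):
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        scores = [fitness(ind) for ind in population]
        best = population[max(range(POP_SIZE), key=lambda i: scores[i])]
        children = [best[:]]                      # elitism: keep the best mask
        while len(children) < POP_SIZE:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            child = a[:]
            if rng.random() < p_cross:            # single-point crossover
                cut = rng.randrange(1, N_FEATURES)
                child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        population = children
    scores = [fitness(ind) for ind in population]
    return population[max(range(POP_SIZE), key=lambda i: scores[i])]

rng = random.Random(0)
informative = set(range(0, N_FEATURES, 2))        # made-up "useful" features
best_mask = evolve(lambda m: toy_fitness(m, informative), rng)
selected = [i + 1 for i, bit in enumerate(best_mask) if bit]  # 1-based, as in the text
print(len(selected), selected)
```

Applying the resulting mask to the columns of a 180 × 48 matrix yields the reduced matrix that is passed to the classifier.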
The ECG classification results of the BPNN classifiers. The filtered feature sets were inputted into
the optimized BPNN classifier. A 180 × 25 feature matrix was used as the training set to train the optimal BPNN
model, and a 180 × 25 feature matrix was utilized as the testing set for classification and prediction. The training
parameters of the BPNN classifier used in this study were as follows: The momentum back propagation algorithm
was applied to train the BPNN classifier. The structure of the BPNN classifier consisted of one input layer, two
hidden layers and one output layer. Logistic functions were used in the hidden layers. A total of 48 input layer
nodes, 50 hidden layer nodes and 6 output layer nodes were set. The maximum iteration was 1000 epochs, the
minimum error goal was set as 0.01 and the learning rate was 0.05.
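A compact sketch of momentum back propagation on a network of this shape is given below. It is illustrative only: the learning rate (0.05) and the 48-input, 6-output layout follow the text, while the split of the 50 hidden nodes into two layers of 25, the momentum coefficient of 0.9, the logistic output layer, the squared-error loss and the weight initialization range are assumptions. The input vector is random placeholder data.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Weights (n_out x n_in) and biases drawn uniformly from [-0.5, 0.5]."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return w, b

def forward(layers, x):
    """Return the activations of every layer, input included."""
    activations = [x]
    for w, b in layers:
        x = [logistic(sum(wi * xi for wi, xi in zip(row, x)) + bi)
             for row, bi in zip(w, b)]
        activations.append(x)
    return activations

def train_step(layers, velocity, x, target, lr=0.05, momentum=0.9):
    """One momentum update on a single sample; returns the squared error
    measured before the update."""
    acts = forward(layers, x)
    out = acts[-1]
    # Output-layer delta for squared error with logistic outputs.
    delta = [(o - t) * o * (1 - o) for o, t in zip(out, target)]
    for k in range(len(layers) - 1, -1, -1):
        w, b = layers[k]
        vw, vb = velocity[k]
        prev = acts[k]
        if k > 0:
            # Delta for the layer below, computed before these weights change.
            below = [sum(w[j][i] * delta[j] for j in range(len(delta)))
                     * prev[i] * (1 - prev[i]) for i in range(len(prev))]
        for j, d in enumerate(delta):
            for i, p in enumerate(prev):
                vw[j][i] = momentum * vw[j][i] - lr * d * p
                w[j][i] += vw[j][i]
            vb[j] = momentum * vb[j] - lr * d
            b[j] += vb[j]
        if k > 0:
            delta = below
    return sum((o - t) ** 2 for o, t in zip(out, target))

rng = random.Random(1)
sizes = [48, 25, 25, 6]   # two hidden layers of 25 nodes each is an assumption
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
velocity = [([[0.0] * a for _ in range(b)], [0.0] * b)
            for a, b in zip(sizes, sizes[1:])]

x = [rng.random() for _ in range(48)]        # placeholder feature vector
target = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # one-hot label, e.g. class N
errors = [train_step(layers, velocity, x, target) for _ in range(200)]
print(errors[0], errors[-1])
```

In a GA-BPNN setting, the random initialization in `init_layer` is the part that the GA would replace with evolved starting weights and biases.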
We also used a single BPNN classifier to classify the features extracted by the WPD-statistical method, and
the results were compared with those obtained by the GA-BPNN method. The average modeling time of the
optimized BPNN classifier was only 3.1652 s, whereas that of the single BPNN model was 8.0231 s, indicating that
the modeling time was significantly reduced by GA optimization. The classification results of the two classifiers
are presented in Figs 4 and 5. Labels 1 to 6 represent N, L, R, P, V and A. Labels ‘°’ and ‘*’ denote the training and
testing sets, respectively. As shown in Figs 4 and 5, the training sets of the two classifiers were classified correctly.
Only four samples of the testing set were incorrectly classified by the GA-BPNN method, whereas the single
Figure 3. Fitness curve of GA. Average fitness and best fitness gradually increased over a series of
iterations. When the number of generations reached 100, the average fitness and the best fitness reached their
maximum values; that is, the sum of squared errors of the test set reached its minimum.
Figure 4. Classification results of the single BPNN classifier. The classification accuracy of the training set
was 100%. The six types of ECG signals in the testing set had different classification results. Samples of L, V and A
were correctly classified. Two samples of N were classified as L. Four samples of R and one sample of P were
classified as V. Accordingly, the classification accuracies of N, L, R, P, V, and A were 93.33%, 100%, 86.67%,
96.67%, 100%, and 100%, respectively.
BPNN classifier incorrectly classified seven samples of the testing set. Four statistical indices, namely, sensitivity
(Se), specificity (Sp), positive predictive value (PPV) and classification accuracy (Acc), were calculated for analysis
and comparison to better evaluate the performance of the two classifiers. These statistical indices were defined in
the following equations:
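$$\mathrm{Se} = \frac{TP}{TP + FN} \times 100\% \tag{1}$$

$$\mathrm{Sp} = \frac{TN}{TN + FP} \times 100\% \tag{2}$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \times 100\% \tag{3}$$

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% = \frac{N_T - N_E}{N_T} \times 100\% \tag{4}$$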
where TP, TN, FP and FN denote true positive, true negative, false positive and false negative, respectively. N_T
represents the total number of classified ECG signals, whereas N_E indicates the number of incorrectly
classified ECG signals. The performance statistics of the two classifiers are shown in Tables 2 and 3. The GA-BPNN
method achieved a classification accuracy of 97.78%, higher than the 96.11% obtained
by the single BPNN classifier, which suggested the proposed method based on the GA-BPNN classifier could
Figure 5. Classification results of the GA-BPNN classifier. The classification accuracy of the training set was
100%. In the testing set, the six types of ECG signals had different classification results. Samples of L and V
were correctly classified. One sample of N was wrongly classified as L. One sample of R was categorized as V.
One sample of P was classified as R. One sample of A was wrongly categorized as N. Thus, the classification
accuracies of N, L, R, P, V, and A were 96.67%, 100%, 96.67%, 96.67%, 100%, and 96.67%, respectively.
Type       Se        Sp        PPV
N          93.33%    100%      100%
L          100%      98.62%    93.75%
R          86.67%    100%      100%
P          96.67%    100%      100%
V          100%      96.62%    85.71%
A          100%      100%      100%
Average    96.11%    99.21%    96.58%
Acc        96.11%
Table 2. The performance statistics of the single BPNN classifier. The six types of ECG signals had different
performance in the classification results. A had the best performance statistics of sensitivity, specificity and positive
predictive value. L performed well with a sensitivity of 100%, a specificity of 98.62% and a positive predictive
value of 93.75%. N, R and P had good performance in specificity and positive predictive value, but the sensitivity
of N was lower than that of most other types. Moreover, V had poor performance in positive predictive value.
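As a worked check of these indices, the per-class values can be recomputed from the misclassifications described in the Figure 4 caption. The sketch below assumes the usual one-vs-rest counting of TP, FN, FP and TN for each class and uses sensitivity in its standard form, TP/(TP + FN); the helper names are hypothetical. Computed this way, the specificities of L and V come out as 98.67% and 96.67%, marginally different from the 98.62% and 96.62% listed in Table 2, while the other entries agree.

```python
LABELS = ["N", "L", "R", "P", "V", "A"]

# Rows: true class, columns: predicted class, both in LABELS order.
# Counts follow the Figure 4 description of the single BPNN classifier.
confusion = [
    [28, 2, 0, 0, 0, 0],   # N: two samples classified as L
    [0, 30, 0, 0, 0, 0],   # L
    [0, 0, 26, 0, 4, 0],   # R: four samples classified as V
    [0, 0, 0, 29, 1, 0],   # P: one sample classified as V
    [0, 0, 0, 0, 30, 0],   # V
    [0, 0, 0, 0, 0, 30],   # A
]

def per_class_metrics(cm, k):
    """Return (Se, Sp, PPV) in percent for class index k, one-vs-rest."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # class-k samples predicted as others
    fp = sum(row[k] for row in cm) - tp     # other samples predicted as class k
    tn = total - tp - fn - fp
    se = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    ppv = 100.0 * tp / (tp + fp)
    return se, sp, ppv

def overall_accuracy(cm):
    """Correctly classified samples over all samples, in percent."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return 100.0 * correct / total

for k, name in enumerate(LABELS):
    se, sp, ppv = per_class_metrics(confusion, k)
    print(f"{name}: Se={se:.2f}% Sp={sp:.2f}% PPV={ppv:.2f}%")
print(f"Acc={overall_accuracy(confusion):.2f}%")  # 173 of 180 correct: 96.11%
```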