Journal Article

The Appropriate Use of Approximate Entropy and Sample Entropy with Short Data Sets

TL;DR: The results demonstrate that both ApEn and SampEn are extremely sensitive to parameter choices, especially for very short data sets (N ≤ 200), and that extreme caution is warranted when choosing parameters for experimental studies with either algorithm.
Abstract: Approximate entropy (ApEn) and sample entropy (SampEn) are mathematical algorithms created to measure the repeatability or predictability within a time series. Both algorithms are extremely sensitive to their input parameters: m (length of the data segment being compared), r (similarity criterion), and N (length of data). There is no established consensus on parameter selection for short data sets, especially for biological data. Therefore, the purpose of this research was to examine the robustness of these two entropy algorithms by exploring the effect of changing parameter values on short data sets. Data with known theoretical entropy qualities, as well as experimental data from both healthy young and older adults, were utilized. Our results demonstrate that both ApEn and SampEn are extremely sensitive to parameter choices, especially for very short data sets (N ≤ 200). We suggest using N larger than 200, an m of 2, and examining several r values before selecting parameters. Extreme caution should be used when choosing parameters for experimental studies with either algorithm. Based on our current findings, SampEn appears to be more reliable for short data sets: it was less sensitive to changes in data length and demonstrated fewer problems with relative consistency.
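As a concrete illustration of how m, r, and N enter the calculation, here is a minimal SampEn sketch in the common Richman-Moorman formulation: templates of length m and m + 1 are compared under the Chebyshev norm, self-matches are excluded, and r is expressed as a fraction of the series' standard deviation. This is an illustrative sketch, not the authors' code.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Minimal SampEn sketch: -ln(A/B), where B counts template pairs
    matching for m points and A those still matching for m + 1 points.
    r is taken as a fraction of the series' standard deviation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * np.std(x)

    def count_matches(length):
        # Both template sets run over i = 0 .. n - m - 1, so the m and
        # m + 1 counts are taken over the same number of vectors.
        t = np.array([x[i:i + length] for i in range(n - m)])
        total = 0
        for i in range(len(t) - 1):
            # Chebyshev (max-norm) distance; self-matches are excluded.
            d = np.max(np.abs(t[i + 1:] - t[i]), axis=1)
            total += int(np.sum(d <= tol))
        return total

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 else np.inf

# White noise at N = 200, the boundary the paper flags as "very short".
rng = np.random.default_rng(0)
print(sample_entropy(rng.standard_normal(200), m=2, r=0.2))
```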


Citations
Journal Article
18 May 2018 - Sensors
TL;DR: From these results, two new sets of recommended EMG features (along with a novel feature, L-scale) are identified that provide better performance for these emerging low-sampling rate systems.
Abstract: Specialized myoelectric sensors have been used in prosthetics for decades, but, with recent advancements in wearable sensors, wireless communication, and embedded technologies, wearable electromyographic (EMG) armbands are now commercially available for the general public. Due to physical, processing, and cost constraints, however, these armbands typically sample EMG signals at a lower frequency (e.g., 200 Hz for the Myo armband) than their clinical counterparts. It remains unclear whether existing EMG feature extraction methods, which largely evolved based on EMG signals sampled at 1000 Hz or above, are still effective for use with these emerging lower-bandwidth systems. In this study, the effects of sampling rate (low: 200 Hz vs. high: 1000 Hz) on the classification of hand and finger movements were evaluated for twenty-six different individual features and eight sets of multiple features, using a variety of datasets comprising both able-bodied and amputee subjects. The results show that, on average, classification accuracies drop significantly at the lower sampling rate.
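The "L-scale" feature mentioned in the TL;DR is the second sample L-moment, a dispersion measure more robust to outliers than the variance. Below is a minimal sketch using the standard probability-weighted-moment estimator; the function name and example data are illustrative, not drawn from the cited study.

```python
import numpy as np

def l_scale(x):
    """Second sample L-moment (L-scale), via probability-weighted moments:
    with x sorted ascending, b0 = mean(x), b1 = mean(((i-1)/(n-1)) * x_(i)),
    and l2 = 2*b1 - b0."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    i = np.arange(1, n + 1)
    b0 = xs.mean()
    b1 = np.sum((i - 1) / (n - 1) * xs) / n
    return 2.0 * b1 - b0

# Example on a synthetic 200-sample window (one second at 200 Hz).
rng = np.random.default_rng(7)
print(l_scale(rng.standard_normal(200)))
```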

201 citations


Cites background from "The Appropriate Use of Approximate ..."

  • ...[54] suggested that SampEn is extremely sensitive to parameter choices in very short datasets (<200 data points) and recommended the number of data points to be larger than 200, and as large as possible with respect to the practical constraints of the application....


Journal Article
TL;DR: DistEn is a promising measure for prompt clinical examination of cardiovascular function; it showed relatively low sensitivity to the predetermined parameters and remained stable even when quantifying the complexity of extremely short series.
Abstract: Complexity of heartbeat interval series is typically measured by entropy. Recent studies have found that sample entropy (SampEn) or fuzzy entropy (FuzzyEn) quantifies essentially the randomness, which may not be uniformly identical to complexity. Additionally, these entropy measures are heavily dependent on the predetermined parameters and confined to data length. Aiming at improving the robustness of complexity assessment for short-term RR interval series, this study developed a novel measure—distribution entropy (DistEn). The DistEn took full advantage of the inherent information underlying the vector-to-vector distances in the state space by probability density estimation. Performances of DistEn were examined by theoretical data and experimental short-term RR interval series. Results showed that DistEn correctly ranked the complexity of simulated chaotic series and Gaussian noise series. The DistEn had relatively lower sensitivity to the predetermined parameters and showed stability even for quantifying the complexity of extremely short series. Analysis further showed that the DistEn indicated the loss of complexity in both healthy aging and heart failure patients (both p < 0.01), whereas neither the SampEn nor the FuzzyEn achieved comparable results (all p ≥ 0.05). This study suggested that the DistEn would be a promising measure for prompt clinical examination of cardiovascular function.
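Based on the definition summarized in this abstract, DistEn can be sketched as follows: embed the series, collect all pairwise Chebyshev distances between embedding vectors, estimate their empirical distribution with a fixed-bin histogram, and take the normalized Shannon entropy. The bin count (512 here) replaces the r threshold as the main tuning parameter. This is a sketch under those assumptions, not the authors' reference implementation.

```python
import numpy as np

def dist_en(x, m=2, bins=512):
    """DistEn sketch: normalized Shannon entropy of the empirical
    distribution of all pairwise Chebyshev distances between the
    m-dimensional state-space vectors."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m + 1
    emb = np.array([x[i:i + m] for i in range(n)])   # embedding vectors
    # All pairwise Chebyshev distances (upper triangle, i < j).
    d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=-1)
    d = d[np.triu_indices(n, k=1)]
    # Empirical probability density via a fixed-bin histogram.
    p, _ = np.histogram(d, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(bins)

# Remains computable even for an extremely short RR-like series.
rng = np.random.default_rng(1)
print(dist_en(rng.standard_normal(50), m=2, bins=512))
```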

184 citations


Cites background from "The Appropriate Use of Approximate ..."

  • ...Larger values for m were not tested, since it was commonly set at 2, 3, or 4 and other values had rarely been selected [40]....


  • ...It has been suggested that they are still highly unstable in short series [40] and lack consistency due to their great sensitivity to the predetermined parameters, especially to the threshold value r (similarity criterion) [16, 19]....


Journal Article
TL;DR: Experimental results validate the effectiveness of the methodology and demonstrate that the proposed algorithm can be applied to recognize the different categories and severities of rolling bearings.

176 citations

Journal Article
TL;DR: Experimental results on five mental tasks show that the combination strategies can effectively improve classification performance when the order of the autoregressive model is greater than 5, and that the second strategy is superior to the first in terms of classification accuracy.
Abstract: Classification of electroencephalogram (EEG) signals is an important task in brain-computer interface systems. This paper presents two strategies for combining feature extraction methods on EEG signals. In the first strategy, autoregressive coefficients and approximate entropy are calculated separately, and the features are obtained by assembling them. In the second strategy, the EEG signals are first decomposed into sub-bands by wavelet packet decomposition. The wavelet packet coefficients are then sent to the autoregressive model to calculate autoregressive coefficients, which are used as features extracted from the original EEG signals. These features are fed to a support vector machine for classifying the EEG signals. Classification accuracy was used to evaluate performance. Experimental results on five mental tasks show that the combination strategies can effectively improve classification performance when the order of the autoregressive model is greater than 5, and that the second strategy is superior to the first in terms of classification accuracy.
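A rough sketch of the second strategy (wavelet packet decomposition followed by sub-band AR modelling) is given below, assuming PyWavelets for the decomposition, a plain Yule-Walker fit for the AR coefficients, and scikit-learn's SVC as the classifier. The wavelet choice ("db4"), decomposition level, and AR order are illustrative stand-ins, not values reported by the cited study.

```python
import numpy as np
import pywt  # PyWavelets, for the wavelet packet decomposition
from sklearn.svm import SVC

def ar_coeffs(x, order=6):
    """Yule-Walker AR coefficients; the paper found order > 5 works best."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([x[: len(x) - k] @ x[k:] for k in range(order + 1)]) / len(x)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def wpd_ar_features(epoch, level=3, order=6):
    """Strategy 2 sketch: decompose the epoch into 2**level wavelet-packet
    sub-bands, then fit an AR model to each sub-band's coefficients."""
    wp = pywt.WaveletPacket(data=epoch, wavelet="db4", maxlevel=level)
    bands = [node.data for node in wp.get_level(level, order="natural")]
    return np.concatenate([ar_coeffs(b, order) for b in bands])

# Demo on synthetic data (stand-ins for real EEG epochs and task labels).
rng = np.random.default_rng(5)
epochs = rng.standard_normal((20, 512))
y = np.repeat([0, 1], 10)
X = np.array([wpd_ar_features(e) for e in epochs])
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))
```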

162 citations

Journal Article
TL;DR: This paper describes a methodology for classifying wake, rapid-eye-movement (REM) sleep, and non-REM (NREM) light and deep sleep on a 30 s epoch basis, achieving a Cohen's kappa coefficient of 0.49 and an accuracy of 69% in the classification of wake, REM, light, and deep sleep.
Abstract: Automatic sleep stage classification with cardiorespiratory signals has attracted increasing attention. In contrast to the traditional manual scoring based on polysomnography, these signals can be measured using advanced unobtrusive techniques that are currently available, promising application for personal and continuous home sleep monitoring. This paper describes a methodology for classifying wake, rapid-eye-movement (REM) sleep, and non-REM (NREM) light and deep sleep on a 30 s epoch basis. A total of 142 features were extracted from the electrocardiogram and thoracic respiratory effort measured with respiratory inductance plethysmography. To improve the quality of these features, subject-specific Z-score normalization and spline smoothing were used to reduce between-subject and within-subject variability. A modified sequential forward selection feature selector procedure was applied, yielding 80 features while preventing the introduction of bias in the estimation of cross-validation performance. PSG data from 48 healthy adults were used to validate our methods. Using a linear discriminant classifier and a ten-fold cross-validation, we achieved a Cohen's kappa coefficient of 0.49 and an accuracy of 69% in the classification of wake, REM, light, and deep sleep. These values increased to kappa = 0.56 and accuracy = 80% when the classification problem was reduced to three classes: wake, REM sleep, and NREM sleep.
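The subject-specific normalization and evaluation steps can be sketched as follows, assuming scikit-learn; the feature extraction, spline smoothing, and sequential forward selection stages are omitted, and all inputs (X, y, subjects) are synthetic placeholders. GroupKFold is used here to keep each subject's epochs in a single fold, a design choice on top of the paper's reported ten-fold cross-validation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict, GroupKFold
from sklearn.metrics import cohen_kappa_score, accuracy_score

def zscore_per_subject(X, subjects):
    """Subject-specific Z-score normalization: standardize each feature
    within each subject to reduce between-subject variability."""
    Xn = np.empty_like(X, dtype=float)
    for s in np.unique(subjects):
        idx = subjects == s
        mu, sd = X[idx].mean(axis=0), X[idx].std(axis=0)
        Xn[idx] = (X[idx] - mu) / np.where(sd > 0, sd, 1.0)
    return Xn

# Synthetic stand-ins: real inputs would be cardiorespiratory features
# per 30 s epoch, sleep-stage labels, and subject IDs.
rng = np.random.default_rng(6)
X = rng.standard_normal((480, 12))
y = rng.integers(0, 4, size=480)           # wake / REM / light / deep
subjects = np.repeat(np.arange(10), 48)    # 10 subjects, 48 epochs each

Xn = zscore_per_subject(X, subjects)
pred = cross_val_predict(LinearDiscriminantAnalysis(), Xn, y,
                         groups=subjects, cv=GroupKFold(n_splits=10))
print(cohen_kappa_score(y, pred), accuracy_score(y, pred))
```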

148 citations

References
Journal Article
TL;DR: In this article, the authors present the first algorithms that allow the estimation of non-negative Lyapunov exponents from an experimental time series; these exponents provide a qualitative and quantitative characterization of dynamical behavior.

8,128 citations

Journal Article
TL;DR: A new and related complexity measure, sample entropy (SampEn), is developed, and ApEn and SampEn are compared by using them to analyze sets of random numbers with known probabilistic character; SampEn agreed with theory much more closely than ApEn over a broad range of conditions.
Abstract: Entropy, as it relates to dynamical systems, is the rate of information production. Methods for estimation of the entropy of a system represented by a time series are not, however, well suited to analysis of the short and noisy data sets encountered in cardiovascular and other biological studies. Pincus introduced approximate entropy (ApEn), a set of measures of system complexity closely related to entropy, which is easily applied to clinical cardiovascular and other time series. ApEn statistics, however, lead to inconsistent results. We have developed a new and related complexity measure, sample entropy (SampEn), and have compared ApEn and SampEn by using them to analyze sets of random numbers with known probabilistic character. We have also evaluated cross-ApEn and cross-SampEn, which use cardiovascular data sets to measure the similarity of two distinct time series. SampEn agreed with theory much more closely than ApEn over a broad range of conditions. The improved accuracy of SampEn statistics should make them useful in the study of experimental clinical cardiovascular and other biological time series.

6,088 citations

Journal Article
TL;DR: Analysis of a recently developed family of formulas and statistics, approximate entropy (ApEn), suggests that ApEn can classify complex systems, given at least 1000 data values in diverse settings that include both deterministic chaotic and stochastic processes.
Abstract: Techniques to determine changing system complexity from data are evaluated. Convergence of a frequently used correlation dimension algorithm to a finite value does not necessarily imply an underlying deterministic model or chaos. Analysis of a recently developed family of formulas and statistics, approximate entropy (ApEn), suggests that ApEn can classify complex systems, given at least 1000 data values in diverse settings that include both deterministic chaotic and stochastic processes. The capability to discern changing complexity from such a relatively small amount of data holds promise for applications of ApEn in a variety of contexts.
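For comparison with the SampEn sketch shown earlier on this page, here is ApEn in its standard phi(m) - phi(m + 1) form. The key difference is that each template is counted as matching itself, the bias that SampEn was later designed to remove. Again, an illustrative sketch, not Pincus's original code.

```python
import numpy as np

def approximate_entropy(x, m=2, r=0.2):
    """ApEn(m, r, N) as phi(m) - phi(m + 1). Unlike SampEn, each template
    is compared against itself, which biases ApEn toward regularity."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * np.std(x)

    def phi(mm):
        t = np.array([x[i:i + mm] for i in range(n - mm + 1)])
        # C_i: fraction of templates within tol of template i (self included),
        # so every C_i > 0 and the logarithm is always defined.
        c = [np.mean(np.max(np.abs(t - t[i]), axis=1) <= tol)
             for i in range(len(t))]
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

# ApEn estimates become far more stable around N = 1000, as noted above.
rng = np.random.default_rng(3)
for n in (100, 1000):
    print(n, approximate_entropy(rng.standard_normal(n)))
```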

5,055 citations

Journal Article
TL;DR: The MSE method is applied to the analysis of coding and noncoding DNA sequences and it is found that the latter have higher multiscale entropy, consistent with the emerging view that so-called "junk DNA" sequences contain important biological information.
Abstract: Traditional approaches to measuring the complexity of biological signals fail to account for the multiple time scales inherent in such time series. These algorithms have yielded contradictory findings when applied to real-world datasets obtained in health and disease states. We describe in detail the basis and implementation of the multiscale entropy (MSE) method. We extend and elaborate previous findings showing its applicability to the fluctuations of the human heartbeat under physiologic and pathologic conditions. The method consistently indicates a loss of complexity with aging, with an erratic cardiac arrhythmia (atrial fibrillation), and with a life-threatening syndrome (congestive heart failure). Further, these different conditions have distinct MSE curve profiles, suggesting diagnostic uses. The results support a general "complexity-loss" theory of aging and disease. We also apply the method to the analysis of coding and noncoding DNA sequences and find that the latter have higher multiscale entropy, consistent with the emerging view that so-called "junk DNA" sequences contain important biological information.
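The MSE procedure reduces to coarse-graining plus SampEn at each scale, as in the sketch below; the SampEn formulation from earlier on this page is repeated compactly so the snippet stays self-contained. As in the published MSE method, the tolerance is fixed from the original series' standard deviation across all scales.

```python
import numpy as np

def sample_entropy(x, m, tol):
    """Compact SampEn (same formulation as the sketch earlier on this
    page), with the tolerance passed in so MSE can hold it fixed."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    def count(length):
        t = np.array([x[i:i + length] for i in range(n - m)])
        return sum(int(np.sum(np.max(np.abs(t[i + 1:] - t[i]), axis=1) <= tol))
                   for i in range(len(t) - 1))
    a, b = count(m + 1), count(m)
    return -np.log(a / b) if a > 0 else np.inf

def coarse_grain(x, scale):
    """MSE coarse-graining: average consecutive, non-overlapping windows
    of length `scale`; scale 1 returns the original series."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // scale) * scale
    return x[:n].reshape(-1, scale).mean(axis=1)

def multiscale_entropy(x, max_scale=10, m=2, r=0.2):
    """SampEn of the coarse-grained series at scales 1..max_scale,
    with r fixed from the original (scale 1) standard deviation."""
    tol = r * np.std(x)
    return [sample_entropy(coarse_grain(x, s), m, tol)
            for s in range(1, max_scale + 1)]

# White noise: entropy falls with scale; 1/f-like signals stay flatter.
rng = np.random.default_rng(4)
print(multiscale_entropy(rng.standard_normal(3000), max_scale=5))
```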

2,101 citations

Book
01 Jan 1997
TL;DR: An overview of the dynamics of complex systems: examples, questions, methods, and concepts.
Abstract (table of contents): Overview: The Dynamics of Complex Systems - Examples, Questions, Methods and Concepts; Introduction and Preliminaries; Neural Networks I: Subdivision and Hierarchy; Neural Networks II: Models of Mind; Protein Folding I: Size Scaling of Time; Protein Folding II: Kinetic Pathways; Life I: Evolution - Origin of Complex Organisms; Life II: Developmental Biology - Complex by Design; Human Civilization I: Defining Complexity; Human Civilization II: A Complex(ity) Transition.

1,703 citations


"The Appropriate Use of Approximate ..." refers background in this paper

  • ...Given the time series f(n) = f(1), f(2), ....


  • ...Given the time series g(n) = g(1), g(2), ....
