
Showing papers on "Cepstrum published in 2019"


Journal ArticleDOI
TL;DR: The results show that TQWT performs comparably to or better than the state-of-the-art speech signal processing techniques used in Parkinson's disease (PD) classification, and that the Mel-frequency cepstral and tunable-Q wavelet coefficients, which give the highest accuracies, contain complementary information for the PD classification problem, resulting in an improved system when combined using a filter feature selection technique.

303 citations


Posted Content
TL;DR: In this article, the authors investigated the possibility of using complex cepstrum for glottal flow estimation on a large-scale database and showed that the proposed method has the potential to be used for voice quality analysis.
Abstract: Complex cepstrum is known in the literature for linearly separating causal and anticausal components. Relying on advances achieved by the Zeros of the Z-Transform (ZZT) technique, we here investigate the possibility of using complex cepstrum for glottal flow estimation on a large-scale database. Via a systematic study of the windowing effects on the deconvolution quality, we show that the complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation when specific windowing criteria are met. It is also shown that this complex cepstral decomposition gives similar glottal estimates as obtained with the ZZT method. However, as complex cepstrum uses FFT operations instead of requiring the factoring of high-degree polynomials, the method benefits from a much higher speed. Finally in our tests on a large corpus of real expressive speech, we show that the proposed method has the potential to be used for voice quality analysis.

66 citations
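The causal/anticausal decomposition at the heart of this method can be sketched in a few lines of NumPy. This is a minimal illustration, assuming benign phase unwrapping and omitting the windowing criteria and linear-phase handling the paper shows to be essential:

```python
import numpy as np

def complex_cepstrum(x):
    """Complex cepstrum via FFT of the log spectrum.

    Assumes phase unwrapping is benign (no zeros on the unit circle);
    linear-phase removal, needed for general signals, is omitted.
    """
    X = np.fft.fft(x)
    log_X = np.log(np.abs(X)) + 1j * np.unwrap(np.angle(X))
    return np.fft.ifft(log_X).real

def causal_anticausal_split(c):
    """Positive quefrencies (first half) vs. negative quefrencies (second half)."""
    half = len(c) // 2
    causal = np.zeros_like(c)
    anticausal = np.zeros_like(c)
    causal[:half] = c[:half]        # n >= 0, includes the gain term at n = 0
    anticausal[half:] = c[half:]    # n < 0, wrapped into the upper half
    return causal, anticausal

def inverse_complex_cepstrum(c):
    """Invert the chain: cepstrum -> log spectrum -> spectrum -> signal."""
    return np.fft.ifft(np.exp(np.fft.fft(c))).real
```

Because everything here is FFT-based, it runs far faster than factoring high-degree polynomials as in ZZT, which is exactly the speed advantage the abstract reports.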


Journal ArticleDOI
TL;DR: A new bearing fault classification method based on convolutional neural networks (CNNs) is presented and demonstrated to have strong classification ability under the interference of factory noise and Gaussian noise.
Abstract: Bearing fault diagnosis is an important technique in industrial production, as bearings are among the key components in rotating machines. Complex environmental noise leads to inaccurate diagnostic results, so bearing fault classification methods should be noise-resistant and robust. Previous studies have mainly focused on noise-free conditions, measured signals, and signals with simulated noise, and many effective approaches have been proposed; but in real-world working conditions, strong and complex noise often leads to inaccurate results. Accordingly, this work focuses on bearing fault classification under the influence of factory noise and white Gaussian noise. To eliminate the noise interference and take the possible connection between signal frames into consideration, this paper presents a new bearing fault classification method based on convolutional neural networks (CNNs). Exploiting the sensitivity of spectral kurtosis (SK) to impulses, noise is suppressed by a proposed SK-based filtering approach. Mel-frequency cepstral coefficients (MFCC) and the delta cepstrum are extracted as features because of their satisfactory performance in sound recognition. To capture the connection between frames, a feature arrangement method is presented that transfers feature vectors into feature images, so that the strengths of CNNs in image processing can be exploited. Experiments demonstrate that the proposed method classifies reliably under the interference of both factory noise and Gaussian noise.

53 citations
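The delta cepstrum mentioned above is conventionally the regression delta used in speech front ends; a minimal NumPy sketch (the window half-width N=2 is an assumed default, not necessarily the paper's setting):

```python
import numpy as np

def delta_features(feats, N=2):
    """Regression delta over +/- N frames.

    feats: array of shape (num_frames, num_coeffs), e.g. per-frame MFCCs.
    Edge frames are repeated so the output keeps the same shape.
    """
    padded = np.pad(feats, ((N, N), (0, 0)), mode='edge')
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    num = sum(n * (padded[N + n:N + n + len(feats)] -
                   padded[N - n:N - n + len(feats)])
              for n in range(1, N + 1))
    return num / denom
```

On a linearly increasing feature track the interior deltas come out as the constant slope, which is a quick sanity check for the implementation.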


Journal ArticleDOI
TL;DR: Comparative analysis reveals that considerable improvement in the performance of emotion recognition is obtained using DNN with the identified combination of perceptual features.
Abstract: This paper investigates the performance of perceptually based speech features for emotion detection. Mel frequency cepstral coefficients (MFCC), perceptual linear predictive cepstrum (PLPC), Mel frequency perceptual linear prediction cepstrum (MFPLPC), bark frequency cepstral coefficients (BFCC), revised perceptual linear prediction coefficients (RPLP) and inverted Mel frequency cepstral coefficients (IMFCC) are the perceptual features considered. The algorithm using these auditory cues is evaluated with deep neural networks (DNN). The novelty of the work lies in analysing the perceptual features to identify the predominant features that carry significant emotional information about the speaker. The validity of the algorithm is analysed on the publicly available Berlin database with seven emotions, in a 1-dimensional categorical space and in a 2-dimensional continuous space of valence and arousal. Comparative analysis reveals that considerable improvement in emotion recognition performance is obtained using a DNN with the identified combination of perceptual features.

46 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: Mel Frequency Cepstrum Coefficient features were extracted from speech signals to detect the underlying emotion of the speech and this approach provides an efficient solution to classifying different emotions using speech signals.
Abstract: Understanding human emotion is a complicated task for humans themselves; however, this has not stopped researchers from trying to make machines capable of understanding human emotions. Many approaches have been followed, and using speech signals to detect emotions has been popular among them. In this study, Mel Frequency Cepstrum Coefficient (MFCC) features were extracted from speech signals to detect the underlying emotion of the speech. Extracted features were used to classify different emotions using an LMT classifier. For each frame of a speech signal, 13-dimensional feature vectors were extracted and Logistic Model Tree (LMT) models were trained using these features. For classifying an unknown speech signal, the 13-dimensional frame features are first extracted from the signal and each frame is classified using the trained model. Using a voting mechanism on the classified frames, the emotion of the speech signal is detected. Experimental results on two datasets, the Berlin Database of Emotional Speech (Emo-DB) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), show that our approach works very well in classifying certain emotions while it struggles to discern the differences between some pairs of emotions. Among the trained models, the maximum accuracy achieved was 70% in detecting 7 different emotions. Considering the small dimension of the feature vectors used, this approach provides an efficient solution to classifying different emotions using speech signals.

45 citations
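The frame-wise classification and voting scheme can be sketched as follows; `classify_frame` stands in for the trained LMT model, which is not reproduced here:

```python
from collections import Counter

def classify_utterance(frame_features, classify_frame):
    """Classify each frame's 13-dim MFCC vector, then majority-vote over frames."""
    votes = [classify_frame(f) for f in frame_features]
    return Counter(votes).most_common(1)[0][0]
```

The emotion labels and the stand-in threshold classifier below are purely illustrative.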


Journal ArticleDOI
TL;DR: The history, current situation and potential future development of the application of cepstral analysis to structural modal analysis are described, an application that appears to be greatly under-utilised.

32 citations


Proceedings ArticleDOI
01 Nov 2019
TL;DR: A simple 2D-convolution multi-branch network architecture for replay detection, which can model the distortion both in the time and frequency domains and performance can be further improved by combining both magnitude-based and phase-based feature.
Abstract: Automatic Speaker Verification (ASV) technology is vulnerable to various kinds of spoofing attacks, including speech synthesis, voice conversion, and replay. Among them, the replay attack is easy to implement, posing a more severe threat to ASV. The constant-Q cepstrum coefficient (CQCC) feature is effective for detecting replay attacks, but it only utilizes the magnitude of the constant-Q transform (CQT) and discards the phase information. Meanwhile, the commonly used Gaussian mixture model (GMM) cannot model the reverberation present in far-field recordings. In this paper, we incorporate the CQT and the modified group delay function (MGD) in order to utilize the phase of the CQT. We also present a simple 2D-convolution multi-branch network architecture for replay detection, which can model distortion in both the time and frequency domains. The experiment shows that the proposed CQT-based MGD feature outperforms the traditional MGD feature, and performance can be further improved by combining both magnitude-based and phase-based features. Our best fusion system achieves 0.0096 min-tDCF and 0.39% EER on the ASVspoof 2019 Physical Access evaluation set. Compared with the CQCC-GMM baseline system provided by the organizer, the min-tDCF is relatively reduced by 96.09% and the EER by 96.46%. Our system was submitted to the ASVspoof 2019 Physical Access sub-challenge and won 1st place.

26 citations
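The modified group delay function builds on the plain group delay, which can be computed from two FFTs without phase unwrapping; a minimal sketch (the cepstral smoothing and exponent parameters of the full MGD are omitted):

```python
import numpy as np

def group_delay(x, eps=1e-12):
    """Group delay tau(w) = Re[ FFT(n*x) * conj(FFT(x)) ] / |FFT(x)|^2.

    This identity avoids explicit phase unwrapping; eps guards divisions
    near spectral zeros (the full MGD replaces the denominator with a
    cepstrally smoothed spectrum for exactly this reason).
    """
    n = np.arange(len(x))
    X = np.fft.fft(x)
    Xn = np.fft.fft(n * x)
    return (Xn * np.conj(X)).real / (np.abs(X) ** 2 + eps)
```

For a pure delay (an impulse at sample n0), the group delay is constant and equal to n0 at every frequency, a handy sanity check.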


Journal ArticleDOI
TL;DR: A comparative study using various algorithms, i.e., wavelet analysis, cepstrum, fast Fourier transform, and autocorrelation function for heart rate measurement, which achieved relatively good results despite the remarkable amount of motion artifact produced owing to the frequent body movements and/or vibrations of the massage chair during stress relief massage.
Abstract: Nonintrusive monitoring and long-term monitoring of vital signs are essential requirements for early diagnosis and prevention due to many reasons, one of the most important being improving the quality of life. In this paper, we present a comparative study using various algorithms, i.e., wavelet analysis, cepstrum, fast Fourier transform, and autocorrelation function for heart rate measurement. The heart rate was measured from noisy ballistocardiogram signals acquired from 50 subjects in a sitting position using a massage chair. The signals were unobtrusively collected from a microbend fiber-optic sensor embedded within the headrest of the chair and then transmitted to a computer through a Bluetooth connection. The multiresolution analysis of the maximal overlap discrete wavelet transform was implemented for heart rate measurement. The error between the proposed method and the reference electrocardiogram is estimated in beats per minute using the mean absolute error, in which the system achieved relatively good results (10.12 ± 4.69) despite the remarkable amount of motion artifact produced owing to the frequent body movements and/or vibrations of the massage chair during stress relief massage. In contrast, the error between the proposed method and the reference signal was very large when the other algorithms, i.e., cepstrum, fast Fourier transform, and autocorrelation function, were implemented for heart rate measurement.

25 citations
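Of the compared algorithms, the autocorrelation function is the simplest to sketch for heart-rate estimation: the strongest autocorrelation lag within a plausible beat-period range gives the rate. The sampling rate and BPM bounds below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def heart_rate_autocorr(x, fs, bpm_lo=40.0, bpm_hi=150.0):
    """Heart rate (BPM) from the dominant autocorrelation lag.

    The search is limited to lags corresponding to plausible beat periods.
    """
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    lo = int(fs * 60.0 / bpm_hi)   # shortest plausible beat period, in samples
    hi = int(fs * 60.0 / bpm_lo)   # longest plausible beat period, in samples
    lag = lo + int(np.argmax(ac[lo:hi]))
    return 60.0 * fs / lag
```

On a clean periodic signal this recovers the rate to within the lag quantization; on real ballistocardiograms, motion artifacts are what degrade it, as the abstract notes.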


Journal ArticleDOI
TL;DR: The proposed method is based on the wavelet derived from the popular biorthogonal Cohen-Daubechies-Feauveau 9/7 filter bank, which offers superior frequency selectivity, symmetry, and better time-frequency localization.
Abstract: In this paper, a novel technique based on a wavelet cepstrum feature is discussed for an iris recognition system. The proposed method is based on the wavelet derived from the popular biorthogonal Cohen-Daubechies-Feauveau 9/7 filter bank. Being biorthogonal in nature, it offers superior frequency selectivity, symmetry, and better time-frequency localization. The suggested scheme computes the two-level detail coefficients from the normalized iris template. These detail coefficients are then divided into non-uniform bins in a logarithmic manner, which reduces the dimension of the wavelet coefficients and assigns non-uniform weights to the different frequency components. The discrete cosine transform of the result is then computed, from which the energy feature is extracted. The proposed technique is experimentally validated with publicly available databases: CASIAv3, UBIRISv1, and IITD. The performance of the proposed approach is found to be superior to that of state-of-the-art methods.

25 citations


Journal ArticleDOI
TL;DR: In this article, low frequency frame-wise normalization (LFFN) is proposed as a module in the feature extraction process that is hypothesized to help in capturing the artifacts from playback speech.

24 citations


Journal ArticleDOI
TL;DR: Comparing vibration and AE across the 9 tests of the experimental system, vibration gave better results than AE in 6 tests, specifically for the inner race and rolling element faults; for the remaining 3 tests, corresponding to the outer race fault, AE gave the better result.
Abstract: In this study, an experimental system was built to acquire vibration and acoustic emission (AE) signals from faulted bearings. A methodology based on cepstrum pre-whitening (CPW), previously tested on vibration signals, was applied to both types of signals to compare and enhance results in machine condition monitoring. The methodology was applied to 9 vibration and 9 AE signals from the experimental system database. Of the 18 analyzed signals, in 5 the identification of fault components was easily made, in 12 the fault identification was possible, and in 1 the identification was not completed. Comparing vibration and AE across the 9 tests, vibration gave better results than AE in 6 tests, specifically for the inner race and rolling element faults; for the remaining 3 tests, corresponding to the outer race fault, AE gave the better result.

Journal ArticleDOI
TL;DR: An empirical and automated nonlinear filtering process is proposed in which different components of a signal are decomposed based on their powers to seek the presence of bearing characteristic frequencies; it can be seen as complementary to narrowband amplitude demodulation techniques.

Journal ArticleDOI
16 May 2019-PLOS ONE
TL;DR: This study investigates the granular scattering effect in identification of chemicals with THz spectral absorption features and proposes a signal processing technique in the so-called “quefrency” domain to improve the ability to resolve these spectral features in the diffuse scattered THz images.
Abstract: Terahertz (THz) imaging is a widely used technique in the study and detection of many chemicals and biomolecules in polycrystalline form because the spectral absorption signatures of these target materials often lie in the THz frequencies. When the size of dielectric grain boundaries are comparable to the THz wavelengths, spectral features can be obscured due to electromagnetic scattering. In this study, we first investigate this granular scattering effect in identification of chemicals with THz spectral absorption features. We then will propose a signal processing technique in the so-called "quefrency" domain to improve the ability to resolve these spectral features in the diffuse scattered THz images. We created a pellet with α-lactose monohydrate and riboflavin, two biologically significant materials with well-known vibrational spectral resonances, and buried the pellet in a highly scattering medium. THz transmission measurements were taken at all angles covering the half focal plane. We show that, while spectral features of lactose and riboflavin cannot be distinguished in the scattered image, application of cepstrum filtering can mitigate these scattering effects. By employing our quefrency-domain signal processing technique, we were able to unambiguously detect the dielectric resonance of lactose in the diffused scattering geometries. Finally we will discuss the limitation of the new proposed technique in spectral identification of chemicals.
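The quefrency-domain filtering described here is a form of cepstral liftering: the smooth spectral envelope lives at low quefrencies, while rapid scattering-induced ripple lands at high quefrencies. A generic short-pass lifter on a magnitude spectrum might look like this (the cutoff is an illustrative parameter, not the paper's):

```python
import numpy as np

def lifter_magnitude(mag, cutoff):
    """Short-pass lifter: keep only low-quefrency (smooth) spectral structure.

    mag: magnitude spectrum (1-D, positive); cutoff: quefrency bin limit.
    """
    c = np.fft.ifft(np.log(mag + 1e-12)).real   # real cepstrum of the spectrum
    w = np.zeros(len(c))
    w[:cutoff] = 1.0
    w[len(c) - cutoff + 1:] = 1.0               # mirror the negative quefrencies
    return np.exp(np.fft.fft(c * w).real)
```

Because the lifter works on the log spectrum, multiplicative ripple (as produced by scattering and etalon effects) separates cleanly from the smooth absorption envelope.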

Journal ArticleDOI
TL;DR: A road vehicle recognition and classification approach for intelligent transportation systems using a roadside installed low-cost magnetometer and associated data collection system and a 3-dimensional map algorithm using Vector Quantization to classify vehicle magnetic features to 4 typical types of vehicles in Australian suburbs.
Abstract: This paper presents a road vehicle recognition and classification approach for intelligent transportation systems. The approach uses a roadside-installed low-cost magnetometer and an associated data collection system. The system measures changes in the magnetic field, detects passing vehicles, and recognizes vehicle types. We introduce Mel Frequency Cepstral Coefficients (MFCC) to analyze vehicle magnetic signals and extract vehicle features represented by the cepstrum, frame energy, and gap cepstrum of the magnetic signals. We design a 3-dimensional map algorithm using Vector Quantization (VQ) to classify vehicle magnetic features into 4 vehicle types typical of Australian suburbs: sedan, van, truck, and bus. To train an accurate classifier, training samples are selected using Dynamic Time Warping (DTW). Verification experiments show that our approach achieves a high level of accuracy for vehicle detection and classification.

Journal ArticleDOI
TL;DR: The experimental results demonstrate that the proposed method surpasses most previous studies in terms of classification accuracy and establishes the applicability and efficacy of cepstrum-based features in classifying sEMG signals of hand movements.
Abstract: It is of great importance to effectively process and interpret surface electromyogram (sEMG) signals to actuate the robotic and prosthetic exoskeleton hands needed by hand amputees. In this paper, we propose a cepstrum analysis-based method for classification of basic hand movement sEMG signals. Cepstral analysis, primarily used for acoustic and seismological signals, is exploited to extract features from time-domain sEMG signals by computing mel-frequency cepstral coefficients (MFCCs). The extracted feature vector of MFCCs is then fed to a generalized regression neural network (GRNN) to classify basic hand movements. The proposed method has been tested on the sEMG for Basic Hand movements Data Set and achieved an average accuracy rate of 99.34% for the five individual subjects and an overall mean accuracy rate of 99.23% for the collective (mixed) dataset. The experimental results demonstrate that the proposed method surpasses most previous studies in terms of classification accuracy. The discrimination ability of the cepstral features is quantified using the Kruskal-Wallis statistical test. Evidenced by the experimental results, this study establishes the applicability and efficacy of cepstrum-based features in classifying sEMG signals of hand movements. Owing to the non-iterative training of the adopted neural network type, the proposed method does not demand much time to build the model in the training phase.

Journal ArticleDOI
TL;DR: A new framework to identify and assess progressive structural damage is developed; the new damage feature outperforms the conventional principal component analysis-based feature, and a comprehensive test framework including extensive progressive damage cases validates the proposed technique.
Abstract: This article aims at developing a new framework to identify and assess progressive structural damage. The method relies solely on output measurements to establish the frequency response functions o...


Journal ArticleDOI
TL;DR: It is concluded that the time and frequency domain characteristics of wheezes are not steady; hence, tunable time-scale representations are more successful in discriminating polyphonic and monophonic wheeze types than conventional fixed-resolution representations.

Proceedings ArticleDOI
01 Feb 2019
TL;DR: An ensemble learning method named Gradient Boosting (GB) is proposed to predict future fault classes based on the data obtained from analyzing the recorded fault data, and can detect and classify different types of bearing faults with 99.58% accuracy.
Abstract: Monitoring the condition of rolling element bearings and diagnosing their faults are cumbersome jobs. Fortunately, we have machines to do the burdensome task for us. Contemporary developments in machine learning allow us not only to extract features from fault signals accurately but also to analyze them and predict future bearing faults in a systematic manner. Utilizing an ensemble learning method named Gradient Boosting (GB), our paper proposes a technique to predict future fault classes based on the data obtained from analyzing the recorded fault data. To demonstrate the cogency of the method, we applied it to the REB fault data provided by the Case Western Reserve University (CWRU) Lab. Employing this supervised learning algorithm after preprocessing the fault signals using real cepstrum analysis, we can detect and classify different types of bearing faults with a staggering 99.58% accuracy.

Proceedings ArticleDOI
12 May 2019
TL;DR: The method reported here realizes an inaudible echo-hiding based speech watermarking by using sparse subspace clustering (SSC) and the evaluation results verify the feasibility and effectiveness of this method.
Abstract: The method reported here realizes an inaudible echo-hiding based speech watermarking by using sparse subspace clustering (SSC). Speech signal is first analyzed with SSC to obtain its sparse and low-rank components. Watermarks are embedded as the echoes of the sparse component for robust extraction. Self-compensated echoes consisting of two independent echo kernels are designed to have similar delay offsets but opposite amplitudes. A one-bit watermark is embedded by separately performing the echo kernels on the sparse and low-rank components. As a result, the sound distortion caused by one echo signal can be quickly compensated by the other echo signal, which enables better inaudibility. Since the embedded echoes have the same sparsity as the sparse component, watermarks can be extracted with a basic cepstrum analysis even if the echo kernels are not directly performed on the original speech. The evaluation results verify the feasibility and effectiveness of this method.
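Echo hiding rests on a classic cepstral property: an echo at delay D produces a peak at quefrency D in the real cepstrum. A bare-bones sketch of a single-echo embedder (circular, for simplicity) and detector, without the sparse/low-rank decomposition or the self-compensated kernel pair described above:

```python
import numpy as np

def embed_echo(x, delay, alpha):
    """Add a single echo at the given sample delay (circular, for simplicity)."""
    return x + alpha * np.roll(x, delay)

def detect_echo_delay(y, min_lag, max_lag):
    """The real cepstrum of an echoed signal peaks at the echo delay."""
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(y)) + 1e-12)).real
    return min_lag + int(np.argmax(c[min_lag:max_lag]))
```

In a real watermarking scheme the bit value is encoded by choosing between two candidate delays and comparing the cepstral peaks at each.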

Journal ArticleDOI
TL;DR: Experimental results show that the proposed MOMEDA-based method for fault detection of parallel shaft gearboxes is more effective than traditional methods.
Abstract: In this paper, a new method for fault detection of parallel shaft gearboxes based on Empirical Mode Decomposition (EMD) and Multipoint Optimal Minimum Entropy Deconvolution (MOMEDA) is proposed. MOMEDA overcomes the shortcomings of Minimum Entropy Deconvolution (MED) and Maximum Correlated Kurtosis Deconvolution (MCKD), and is introduced to extract the fault cycle from gearbox signals. Gearbox vibration signals are complex, containing fault signals, noise, and deterministic components such as gear meshing. The fault signal is often buried in these other components, which increases the difficulty of gearbox fault detection; thus, EMD is used to decompose the signal and extract the fault impact components. A preset-fault experiment on a parallel shaft gearbox is carried out to verify the effectiveness of the method. In addition, traditional methods such as the Fourier transform, cepstrum analysis, MED and MCKD are used for comparison. Experimental results show that the proposed method is more effective than the traditional methods.

Journal ArticleDOI
TL;DR: The proposed method uses a Deep Neural Network based regression model to estimate the clean phase and clean amplitude for speech reconstruction; the overall quality of speech improved for factory, restaurant, car, airport and babble noise.
Abstract: In low signal-to-noise-ratio environments, phase information is an important factor, and this article therefore considers the importance of the clean phase in single-channel speech enhancement. The proposed method uses a Deep Neural Network based regression model to estimate the clean phase and clean amplitude for speech reconstruction. Experiments are conducted over five different noises (factory, restaurant, car, airport and babble) at different levels, and results are evaluated using objective quality measures such as Perceptual Evaluation of Speech Quality, Weighted Spectral Slope, Cepstrum Distance, frequency-weighted segmental Signal-to-Noise Ratio and Log Likelihood Ratio. The overall quality of speech improved by 12% for factory noise, 8% for restaurant noise, 13% for car noise, 10% for airport noise and 14% for babble noise.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed feature set not only displays a high recognition rate and excellent anti-noise performance in speech recognition, but also fully characterizes the auditory and energy information in the speech signals.
Abstract: Environmental noise can pose a threat to the stable operation of current speech recognition systems, so it is essential to develop a front-end feature set able to identify speech under low signal-to-noise ratios. In this paper, a robust fusion feature is proposed that can fully characterize speech information. To obtain the cochlear filter cepstral coefficients (CFCC), a novel feature is first extracted by a power-law nonlinear function, which can simulate the auditory characteristics of the human ear. Speech enhancement technology is then introduced into the front end of feature extraction, and the extracted features and their first-order differences are combined into new mixed features. An energy feature, the Teager energy operator cepstral coefficient (TEOCC), is also extracted and combined with the above mixed features to form the fusion feature set. Principal component analysis (PCA) is then applied for feature selection and optimization of the feature set, and the final feature set is used in a speaker-independent, isolated-word, small-vocabulary speech recognition system. Finally, a comparative speech recognition experiment using a support vector machine (SVM) is designed to verify the advantages of the proposed feature set. The experimental results show that the proposed feature set not only displays a high recognition rate and excellent anti-noise performance in speech recognition, but also fully characterizes the auditory and energy information in the speech signals.

Journal ArticleDOI
Jian Zhao1, Weiwen Su1, Jian Jia1, Chao Zhang1, Tingting Lu1 
TL;DR: A multi-modal fusion algorithm based on the speech signal and facial image sequence for depression diagnosis, which can easily be applied at low cost to the hardware and software of existing hospital instruments, is an accurate and effective method for diagnosing depression.
Abstract: Due to the false positive rate of traditional depression diagnosis methods, this paper proposes a multi-modal fusion algorithm based on the speech signal and facial image sequence for depression diagnosis. Spectral subtraction is introduced to enhance the depressed speech signal, and the cepstrum method is used to extract pitch frequency features with a large variation rate and formant features with significant differences; the short-time energy and Mel-frequency cepstral coefficient parameters for different emotional speech are analyzed in both the time and frequency domains, and a model is established for training and identification. Meanwhile, this paper implements the orthogonal matching pursuit algorithm to obtain a sparse linear combination of face test samples, and cascades the voice and facial emotions in proportion. The experimental results show that the recognition rate of the proposed depression detection algorithm fusing speech and facial emotions reaches 81.14%. Compared to doctors' existing accuracy of 47.3%, this is a relative improvement of 71.54%. Additionally, it can easily be applied at low cost to the hardware and software of existing hospital instruments. Therefore, it is an accurate and effective method for diagnosing depression.

Journal ArticleDOI
TL;DR: The experimental results showed that the proposed method significantly improved the naturalness and similarity of the converted voice compared to the baselines, even with the noisy inputs of source speakers.
Abstract: This paper presents a noise-robust voice conversion method with high-quefrency boosting via sub-band cepstrum conversion and fusion based on bidirectional long short-term memory (BLSTM) neural networks that can convert vocal tract parameters of a source speaker into those of a target speaker. With the implementation of state-of-the-art machine learning methods, voice conversion has achieved good performance given abundant clean training data. However, the quality and similarity of the converted voice are significantly degraded compared to that of a natural target voice due to various factors, such as limited training data and noisy input speech from the source speaker. To address the problem of noisy input speech, an architecture of voice conversion with statistical filtering and sub-band cepstrum conversion and fusion is introduced. The impact of noise on the converted voice is reduced by the accurate reconstruction of the sub-band cepstrum and the subsequent statistical filtering. By normalizing the mean and variance of the converted cepstrum to those of the target cepstrum in the training phase, a cepstrum filter is constructed to further improve the quality of the converted voice. The experimental results showed that the proposed method significantly improved the naturalness and similarity of the converted voice compared to the baselines, even with noisy inputs from source speakers.
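The statistical filtering step, normalizing the converted cepstrum's mean and variance to the target speaker's, amounts to per-coefficient mean-variance matching; a sketch with illustrative names (the BLSTM conversion itself is not reproduced):

```python
import numpy as np

def match_cepstrum_stats(conv, tgt_mean, tgt_std, eps=1e-8):
    """Per-coefficient mean-variance matching to target speaker statistics.

    conv: converted cepstra, shape (num_frames, num_coeffs);
    tgt_mean / tgt_std: per-coefficient target statistics from training data.
    """
    return (conv - conv.mean(axis=0)) / (conv.std(axis=0) + eps) * tgt_std + tgt_mean
```

After this transform, the output trajectories carry the target speaker's global cepstral statistics regardless of scale or offset errors in the conversion.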

Proceedings ArticleDOI
08 Jul 2019
TL;DR: The multi-channel sEMG based human lower limb motion intention recognition method is reliable and effective, improving the average motion recognition rate from 86.3% ± 8.24% to 93.6% ± 2.6%.
Abstract: The paper presents a multi-channel sEMG based human lower limb motion intention recognition method, aiming at solving the problem of recognizing human lower limb motion intention when using an exoskeletal robot. The cepstrum distance is used to automatically detect the endpoints of the sEMG signal for each motion. The time-domain and frequency-domain characteristic parameters of the multi-channel sEMG signal are extracted and merged to construct a joint feature matrix. The joint feature matrix is reduced by the principal component analysis (PCA) method, and a low-dimensional matrix for each motion is obtained. A traditional back propagation (BP) neural network model is optimized using the particle swarm optimization (PSO) algorithm, and the low-dimensional matrix of each human lower limb motion is identified by the optimized BP neural network model. In the recognition experiment, the improved method raised the average motion recognition rate from 86.3% ± 8.24% to 93.6% ± 2.6% compared with the classical BP neural network algorithm. The multi-channel sEMG based human lower limb motion intention recognition method is reliable and effective.

Proceedings ArticleDOI
01 Apr 2019
TL;DR: An autonomous algorithm is proposed for person identification by analyzing vocal sounds and speech patterns; it correctly identifies the speaker with accuracy, specificity and sensitivity of 83.33%, 86.67% and 80%, respectively.
Abstract: Speech processing has emerged as one of the important and crucial domains over the past decade. Many researchers have worked on voice recognition and verification, and some of the reported work has been done in the field of biometrics. This paper proposes an autonomous algorithm for person identification by analyzing vocal sounds and speech patterns. First, the input voice signal is fed to the proposed system, from which the low-frequency content is extracted using a finite impulse response (FIR) low-pass filter based on a Hamming window. The proposed system then performs a cepstral analysis and extracts two distinct features from the signal spectrum, i.e. the maximum pitch frequency and the maximum cepstrum value. The 2D extracted feature set is passed to a multi-level classification system built on a supervised Support Vector Machine (SVM), which first discriminates the person's gender and then classifies the person within that gender. A total of 120 samples were used to train the proposed classification system, and the system correctly identifies the speaker with accuracy, specificity and sensitivity of 83.33%, 86.67% and 80%, respectively.
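The two cepstral features used here come from standard cepstral pitch analysis: the largest cepstral peak within the plausible pitch-lag range gives both the pitch estimate and the peak value. A sketch (the search range is an assumed default, not the paper's):

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return (pitch_hz, peak_value) from the real cepstrum of one frame.

    The pitch-lag search range [fs/fmax, fs/fmin] is an illustrative default.
    """
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(frame)) + 1e-12)).real
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(c[lo:hi]))
    return fs / lag, c[lag]
```

The returned pair corresponds to the paper's "maximum pitch frequency" and "maximum cepstrum value" features, which then feed the SVM stage.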

Journal ArticleDOI
TL;DR: The new approach exceeds the performance of a formerly introduced classical signal processing-based cepstral excitation manipulation (CEM) method in terms of noise attenuation by about 1.5 dB and shows that this gain also holds true when comparing serial combinations of envelope and excitation enhancement.
Abstract: This contribution aims at speech model-based speech enhancement by exploiting the source-filter model of human speech production. The proposed method enhances the excitation signal in the cepstral domain by making use of a deep neural network (DNN). We investigate two types of target representations along with the significant effects of their normalization. The new approach exceeds the performance of a previously introduced classical signal processing-based cepstral excitation manipulation (CEM) method in terms of noise attenuation by about 1.5 dB. We show that this gain also holds when comparing serial combinations of envelope and excitation enhancement. In the important low-SNR conditions, no significant trade-off in speech component quality or speech intelligibility is induced, while substantially higher noise attenuation is achieved. In total, a traditional purely statistical state-of-the-art speech enhancement system is outperformed by more than 3 dB in noise attenuation.
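The source-filter split that CEM-style excitation enhancement relies on can be illustrated by liftering the real cepstrum: low quefrencies approximate the spectral envelope (vocal tract filter), and the remainder is the excitation part that a DNN would then enhance. The following is a minimal sketch with an assumed 1 ms quefrency cutoff, not the paper's system.

```python
import numpy as np

def cepstral_split(frame, fs, cutoff_ms=1.0):
    """Split a frame's log magnitude spectrum into a smooth envelope part
    (low quefrencies) and an excitation part (the remainder)."""
    n = len(frame)
    log_spec = np.log(np.abs(np.fft.rfft(frame * np.hanning(n))) + 1e-10)
    ceps = np.fft.irfft(log_spec)
    cut = int(fs * cutoff_ms / 1000)       # quefrency cutoff in samples
    lifter = np.zeros(n)
    lifter[:cut] = 1.0                     # positive low quefrencies
    lifter[-(cut - 1):] = 1.0              # mirrored negative quefrencies
    envelope = np.fft.rfft(ceps * lifter).real
    excitation = log_spec - envelope
    return log_spec, envelope, excitation

# toy check: a harmonic-rich frame; the envelope should be much smoother
fs = 8000
t = np.arange(2048) / fs
log_spec, env, exc = cepstral_split(np.sign(np.sin(2 * np.pi * 100 * t)), fs)
print(np.std(np.diff(env)) < np.std(np.diff(log_spec)))
```

Enhancing `exc` (in the paper, with a DNN) and adding `env` back reconstructs an enhanced log spectrum, which is the sense in which envelope and excitation enhancement can be combined serially.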

Proceedings ArticleDOI
15 Sep 2019
TL;DR: In this article, the authors used a simple continuous F0 tracker which does not apply a strict voiced / unvoiced decision, and a convolutional neural network to predict continuous vocoder parameters (ContF0, Maximum Voiced Frequency and Mel-Generalized Cepstrum).
Abstract: Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text-to-speech synthesizers were shown to produce higher quality speech when using a continuous pitch estimate, which takes non-zero pitch values even when voicing is not present. Therefore, in this paper on UTI-based SSI, we use a simple continuous F0 tracker which does not apply a strict voiced / unvoiced decision. Continuous vocoder parameters (ContF0, Maximum Voiced Frequency and Mel-Generalized Cepstrum) are predicted using a convolutional neural network, with UTI as input. The results demonstrate that during the articulatory-to-acoustic mapping experiments, the continuous F0 is predicted with lower error, and the continuous vocoder produces slightly more natural synthesized speech than the baseline vocoder using standard discontinuous F0.
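The "continuous F0" idea above, a pitch contour that never drops to zero in unvoiced regions, can be imitated with simple interpolation over a conventional discontinuous track. This is a hypothetical sketch only: the paper uses a dedicated continuous F0 tracker, not this post-hoc interpolation.

```python
import numpy as np

def make_continuous_f0(f0):
    """Replace unvoiced (zero) F0 values by linear interpolation between
    neighbouring voiced values, holding the edge values constant."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        return f0
    idx = np.arange(len(f0))
    return np.interp(idx, idx[voiced], f0[voiced])

# a short voiced/unvoiced track in Hz (0 marks unvoiced frames)
track = [0, 0, 110, 115, 0, 0, 0, 130, 0]
cont = make_continuous_f0(track)
print(cont)
```

A contour like `cont` is everywhere non-zero, which is what lets a neural network regress F0 as a single smooth target without a separate voiced/unvoiced classification.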
