
Showing papers on "Cepstrum" published in 2018


Journal ArticleDOI
TL;DR: The concept of cepstrum is applied to eliminate the wave-shape function influence on the TF analysis, and a new algorithm, named de-shape synchrosqueezing transform (de-shape SST), is proposed.
Abstract: We propose to combine cepstrum and nonlinear time–frequency (TF) analysis to study multiple component oscillatory signals with time-varying frequency and amplitude and with time-varying non-sinusoidal oscillatory pattern. The concept of cepstrum is applied to eliminate the wave-shape function influence on the TF analysis, and we propose a new algorithm, named de-shape synchrosqueezing transform (de-shape SST). The mathematical model, adaptive non-harmonic model, is introduced and the de-shape SST algorithm is theoretically analyzed. In addition to simulated signals, several different physiological, musical and biological signals are analyzed to illustrate the proposed algorithm.
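The cepstrum underlying this approach is simple to compute: the inverse FFT of the log magnitude spectrum, in which periodic structure in the spectrum (a harmonic comb, or an echo) appears as a peak at the corresponding quefrency. A minimal NumPy sketch, illustrative only and not the authors' de-shape SST implementation:

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x)))
    return np.fft.ifft(log_mag).real

# A smooth pulse plus an echo 200 samples later: the echo shows up as a
# clean cepstral peak at quefrency 200, well separated from the
# low-quefrency region that carries the spectral envelope ("wave shape").
n = np.arange(1024)
s = 0.9 ** n                 # smooth, minimum-phase-like source pulse
x = s.copy()
x[200:] += 0.6 * s[:-200]    # echo with amplitude 0.6
c = real_cepstrum(x)
quefrency = 20 + np.argmax(c[20:512])
print(quefrency)             # 200
```

This separation of the slowly varying envelope (low quefrency) from periodic structure (higher quefrency) is what lets a cepstral step strip the wave-shape function's influence before the synchrosqueezing stage.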

76 citations


Journal ArticleDOI
TL;DR: Deep Neural Networks are developed taking Fourier coefficients, Mel-Frequency Cepstrum data, and Wavelet features as input for differentiating normal from malignant current measurements, enabling the classifier to reach 99.95% accuracy for binary classification and 95.61% for multi-device classification.

51 citations


Journal ArticleDOI
TL;DR: Correlation analysis is conducted to quantify the feature discrimination abilities, and the results show that "frequency spectrum of states", "energy", and "entropy" are the top domains contributing effective features.
Abstract: This paper proposes a method using multidomain features and a support vector machine (SVM) for classifying normal and abnormal heart sound recordings. The database was provided by the PhysioNet/CinC Challenge 2016. A total of 515 features are extracted from nine feature domains, i.e., time interval, frequency spectrum of states, state amplitude, energy, frequency spectrum of records, cepstrum, cyclostationarity, high-order statistics, and entropy. Correlation analysis is conducted to quantify the feature discrimination abilities, and the results show that "frequency spectrum of states", "energy", and "entropy" are the top domains contributing effective features. An SVM with a radial basis kernel function was trained for signal quality estimation and classification. The SVM classifier is independently trained and tested with many groups of top features. The average sensitivity, specificity, and overall score reach 0.88, 0.87, and 0.88, respectively, when the top 400 features are used. This score is competitive with the best previous scores. The classifier performs very well even with a small number of top features for training, and its output is stable regardless of which features are randomly selected for training. These simulations demonstrate that the proposed features and SVM classifier are jointly powerful for classifying heart sound recordings.
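As a toy illustration of the final classification stage only (not the challenge pipeline or its 515 features), an RBF-kernel SVM separating two synthetic two-dimensional feature clusters, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Toy stand-ins for two of the paper's feature domains, e.g. an "energy"
# and an "entropy" value per recording (purely synthetic numbers).
normal = rng.normal([1.0, 0.5], 0.15, size=(200, 2))
abnormal = rng.normal([1.6, 1.1], 0.15, size=(200, 2))
X = np.vstack([normal, abnormal])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(round(acc, 2))
```

Standardizing before the RBF kernel matters in practice: features from nine different domains live on very different scales.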

40 citations


Journal ArticleDOI
TL;DR: Pitch-adaptive front-end signal processing for deriving the Mel-frequency cepstral coefficient features is explored to reduce sensitivity to pitch variation, and the effectiveness of existing speaker normalization techniques remains intact even with the proposed pitch-adaptive MFCCs.

35 citations


Journal ArticleDOI
TL;DR: It is shown that the proposed frequency-warped cepstral coefficients outperform MFCCs on both simulated and measured data sets for four-class and eight-class human activity classification problems.
Abstract: Micro-Doppler signature analysis and speech processing share a common approach, as both rely on extracting features from the signal's time-frequency distribution for classification. As a result, features such as the mel-frequency cepstrum coefficients (MFCCs), which have shown success in speech processing, have been proposed for use in micro-Doppler classification. MFCCs were originally designed to take into account the auditory properties of the human ear by filtering the signal with a filter bank spaced according to the mel-frequency scale. However, the physics underlying radar micro-Doppler is unrelated to that of human hearing or speech. This work shows that the mel-scale filter bank results in the loss of frequency components significant to the classification of radar micro-Doppler. A novel method for frequency-warped cepstral feature design is proposed as a means of optimizing the efficacy of features in a data-driven fashion specifically for micro-Doppler analysis. The proposed frequency-warped cepstral coefficients are shown to outperform MFCCs on both simulated and measured data sets for four-class and eight-class human activity classification problems.
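The mel-scale filter bank the paper argues against is straightforward to construct: filter centers equally spaced in mel, converted back to FFT bins, giving triangular filters that widen with frequency. A sketch of the standard construction (the paper's data-driven warping would replace the mel spacing below):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs, f_lo=0.0, f_hi=None):
    """Triangular filters with centers equally spaced on the mel scale."""
    f_hi = f_hi or fs / 2.0
    mel_pts = np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, ctr, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, ctr):          # rising edge of the triangle
            fb[i, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):          # falling edge
            fb[i, k] = (hi - k) / max(hi - ctr, 1)
    return fb

fb = mel_filter_bank(n_filters=26, n_fft=512, fs=16000)
print(fb.shape)    # (26, 257)
```

Because the triangle widths grow with frequency, high-frequency detail is averaged away, which is exactly the loss of micro-Doppler-relevant components the paper identifies.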

33 citations


Journal ArticleDOI
03 Jul 2018
TL;DR: Wavelet-domain algorithms focus on decomposing the signal into different components so that the component which agrees with the vital signs can be selected, i.e., the selected component contains only information about the heart cycles or respiratory cycles, respectively.
Abstract: First, time-domain algorithms focus on detecting local maxima or local minima using a moving window, and thereby finding the interval between the dominant J-peaks of the ballistocardiogram (BCG) signal. However, this approach has many limitations due to the nonlinear and nonstationary behavior of the BCG signal, which does not display consistent J-peaks; this is usually the case for overnight, in-home monitoring, particularly with frail elderly subjects. Additionally, its accuracy is undoubtedly affected by motion artifacts. Second, frequency-domain algorithms do not provide information about interbeat intervals. Nevertheless, they can provide information about heart rate variability. This is usually done by taking the fast Fourier transform, or the inverse Fourier transform of the logarithm of the estimated spectrum, i.e., the cepstrum of the signal, using a sliding window. Thereafter, the dominant frequency is obtained in a particular frequency range. The limitation of these algorithms is that the spectral peak may get wider and multiple peaks may appear, which can cause problems in measuring the vital signs. Finally, the objective of wavelet-domain algorithms is to decompose the signal into different components so that the component which agrees with the vital signs can be selected, i.e., the selected component contains only information about the heart cycles or respiratory cycles, respectively. Empirical mode decomposition is an alternative to wavelet decomposition, and it is also well suited to nonlinear and nonstationary signals such as cardiorespiratory signals. Apart from the above-mentioned algorithms, machine learning approaches have been implemented for measuring heartbeats; however, manual labeling of training data remains a limiting factor.
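The frequency-domain step described above, picking the dominant frequency in a physiologically plausible band from a windowed spectrum, can be sketched as follows. This is illustrative NumPy code on a synthetic BCG-like signal; the 50 Hz sampling rate and the band limits are assumptions, not values from the paper:

```python
import numpy as np

def dominant_freq(x, fs, f_lo=0.8, f_hi=3.0):
    """Dominant frequency in the heart-rate band (0.8-3 Hz, i.e. 48-180 bpm)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band][np.argmax(spec[band])]

fs = 50.0                       # assumed BCG sampling rate
t = np.arange(0, 30, 1 / fs)    # one 30 s sliding window
rng = np.random.default_rng(1)
bcg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * rng.normal(size=t.size)
f = dominant_freq(bcg, fs)
print(round(f * 60))            # 72 beats per minute
```

Restricting the search to the heart-rate band is what keeps respiration and low-frequency drift from hijacking the peak; the widening/multiple-peak failure mode the abstract mentions is exactly what breaks the single argmax here.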

30 citations


Journal ArticleDOI
TL;DR: An ECG-based driver distraction detection system using a Mel-frequency cepstrum representation and convolutional neural networks (CNN) is presented, along with a recipe to extract Mel frequency filter bank coefficients in the time and frequency domains.

22 citations


Journal ArticleDOI
TL;DR: Simulation results prove that the NPF as a feature in speaker identification enhances the performance of the speaker identification system, especially with the Discrete Cosine Transform (DCT) and wavelet denoising pre-processing step.
Abstract: This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most relevant speaker identification methods adopt a cepstral strategy. Inclusion of the pitch frequency as a new feature in the speaker identification process is expected to enhance the speaker identification accuracy. In the proposed framework for speaker identification, a neural classifier with a single hidden layer is used. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a pre-processing noise reduction step is used prior to the feature extraction process to enhance the performance of the speaker identification system. Simulation results show that the NPF as a feature in speaker identification enhances the performance of the speaker identification system, especially with the Discrete Cosine Transform (DCT) and wavelet denoising pre-processing step.
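The pitch feature at the heart of the NPF can be estimated in several standard ways; one simple option (an illustration, not necessarily the authors' estimator) is the autocorrelation method, which picks the lag of strongest self-similarity within the voiced pitch range:

```python
import numpy as np

def pitch_autocorr(frame, fs, f_lo=60, f_hi=400):
    """Pitch estimate from the autocorrelation peak in the voiced range."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo, lag_hi = int(fs / f_hi), int(fs / f_lo)
    lag = lag_lo + np.argmax(ac[lag_lo:lag_hi])   # strongest periodicity
    return fs / lag

fs = 8000
t = np.arange(0, 0.04, 1 / fs)     # one 40 ms analysis frame
frame = np.sin(2 * np.pi * 160 * t) + 0.5 * np.sin(2 * np.pi * 320 * t)
print(round(pitch_autocorr(frame, fs)))    # 160
```

Normalizing such a raw pitch estimate (per speaker or per utterance) is then what turns it into a pitch-based feature that can be appended to the cepstral vector.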

22 citations


Journal ArticleDOI
TL;DR: A novel, simple but discriminative algorithm using a minimum number of physiological signals and a time-varying singular value decomposition (TSVD) approach, which provides a computationally efficient and robust characterization of the signals in the presence of individual differences and noise.

22 citations


Journal ArticleDOI
TL;DR: A novel source cell-phone identification system suitable for both clean and noisy environments, using spectral distribution features of the constant Q transform (CQT) domain and a multi-scene training method; experimental results show that the proposed features have superior performance.
Abstract: With the widespread availability of cell-phone recording devices, source cell-phone identification has become a hot topic in multimedia forensics. At present, research on source cell-phone identification in clean conditions has achieved good results, but results in noisy environments are not ideal. This paper proposes a novel source cell-phone identification system suitable for both clean and noisy environments, using spectral distribution features of the constant Q transform (CQT) domain and a multi-scene training method. Based on the analysis, it is found that the identification difficulty lies in distinguishing different models of cell-phones of the same brand, whose tiny differences are mainly in the middle and low frequency bands. Therefore, this paper extracts spectral distribution features from the CQT domain, which has a higher frequency resolution in the mid-low frequencies. To evaluate the effectiveness of the proposed features, four classification techniques, namely Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN), and Recurrent Neural Network with Bidirectional Long Short-Term Memory (RNN-BLSTM), are used to identify the source recording device. Experimental results show that the features proposed in this paper have superior performance. Compared with the Mel frequency cepstral coefficient (MFCC) and the linear frequency cepstral coefficient (LFCC), they enhance the accuracy for cell-phones of the same brand, whether the speech to be tested comprises clean or noisy speech files. In addition, the CNN classification effect is outstanding. In terms of models, the model is established by the multi-scene training method, which improves its distinguishing ability in noisy environments compared with the single-scene training method.
The average accuracy rate of the CNN for clean speech files on the CKC speech database (CKC-SD) and the TIMIT Recaptured Database (TIMIT-RD) increased from 95.47% and 97.89% to 97.08% and 99.29%, respectively. For noisy speech files with both seen and unseen noise types, the performance was greatly improved, and most of the recognition rates exceeded 90%. Therefore, the source identification system in this paper is robust to noise.
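The mid-low-frequency resolution advantage of the CQT comes from its geometric bin spacing: with a fixed number of bins per octave, the quality factor Q is constant, so the absolute bandwidth of each filter shrinks proportionally at low frequencies. A small numeric illustration (the bin count and minimum frequency are arbitrary choices, not the paper's settings):

```python
import numpy as np

# Constant-Q analysis with 12 bins per octave: Q = f_k / bandwidth_k is
# the same for every bin, so low-frequency bins are much narrower -- finer
# resolution exactly where the paper says cell-phone differences live.
bins_per_octave = 12
Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
f_min, n_bins = 32.7, 60
f_k = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
bw = f_k / Q                              # absolute bandwidth per bin
print(round(Q, 1))                        # 16.8
print(round(bw[0], 2), round(bw[-1], 2))  # narrow at low f, wide at high f
```

An ordinary STFT, by contrast, has the same absolute bandwidth in every bin, so it cannot match this low-frequency resolution without sacrificing time resolution everywhere.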

21 citations


Journal ArticleDOI
TL;DR: Experiments showed that speech quality is significantly improved by the proposed mel-cepstrum-based quantization noise shaping method, which effectively masks the white noise introduced by the quantization typically used in neural-network-based speech waveform synthesis systems.
Abstract: This paper presents a mel-cepstrum-based quantization noise shaping method for improving the quality of synthetic speech generated by neural-network-based speech waveform synthesis systems. Since mel-cepstral coefficients closely match the characteristics of human auditory perception, the proposed method effectively masks the white noise introduced by the quantization typically used in neural-network-based speech waveform synthesis systems. The paper also describes a computationally efficient implementation of the proposed method using the structure of the mel-log spectrum approximation filter. Experiments using the WaveNet generative model, which is a state-of-the-art model for neural-network-based speech waveform synthesis, showed that speech quality is significantly improved by the proposed method.

Journal ArticleDOI
TL;DR: The source-filter model of human speech production is employed in combination with a hidden Markov model and/or a deep neural network approach to estimate clean envelope-representing coefficients in the cepstral domain to improve the quality of the speech component and obtain considerable SNR improvement.
Abstract: In this paper, we propose and compare various techniques for the estimation of clean spectral envelopes in noisy conditions. The source-filter model of human speech production is employed in combination with a hidden Markov model and/or a deep neural network approach to estimate clean envelope-representing coefficients in the cepstral domain. The cepstral estimators for speech spectral envelope-based noise reduction are evaluated both alone and in combination with the recently introduced cepstral excitation manipulation (CEM) technique for a priori SNR estimation in a noise reduction framework. Relative to the classical MMSE short-time spectral amplitude estimator, we obtain more than 2 dB higher noise attenuation, and relative to our recent CEM technique still 0.5 dB more, in both cases maintaining the quality of the speech component and obtaining considerable SNR improvement.

Journal ArticleDOI
TL;DR: Infrared depth sensors provide data that can be preprocessed in various ways to form a basis for reliable fall detection, and they can be promising tools for unobtrusive fall detection.

Patent
27 Mar 2018
TL;DR: In this paper, a voice enhancement method based on a multiresolution auditory cepstrum and a deep convolutional neural network is proposed, which consists of three steps: first, establishing new characteristic parameters, namely MR-GFCC, capable of distinguishing voice from noise; second, establishing a self-adaptive masking threshold based on ideal ratio masking (IRM) and ideal binary masking (IBM) according to noise variations; and finally, training an established seven-layer neural network using the newly extracted characteristic parameters and their first/second derivatives, together with the self-adaptive masking threshold.
Abstract: The invention discloses a voice enhancement method based on a multiresolution auditory cepstrum and a deep convolutional neural network. The method comprises the following steps: first, establishing new characteristic parameters, namely multiresolution auditory cepstrum coefficients (MR-GFCC), capable of distinguishing voice from noise; second, establishing a self-adaptive masking threshold based on ideal ratio masking (IRM) and ideal binary masking (IBM) according to noise variations; further, training an established seven-layer neural network using the newly extracted characteristic parameters and their first/second derivatives, with the self-adaptive masking threshold, as the input and output of the deep convolutional neural network (DCNN); and finally, enhancing noise-containing voice using the self-adaptive masking threshold estimated by the DCNN. The method fully exploits the working mechanism of the human ear: voice characteristic parameters simulating the human auditory physiological model are designed, so that not only is a relatively large amount of voice information retained, but the extraction process is also simple and feasible.

Journal ArticleDOI
TL;DR: The experiments reveal that the features obtained from the HS, in combination with the MFCCs, enhance the performance of the TDSV system, and are found to be consistently more effective than cepstral/energy features obtained from the raw IMFs under noisy conditions.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: Using LSTM to recognize Indonesian speech digits, MFCC feature extraction achieves a better accuracy of 96.58% than LPC feature extraction, which reaches 93.79%.
Abstract: This paper presents recognition of Indonesian spoken decimal digits (0–9) using Deep Learning Long-Short Term Memory (LSTM). LPC (Linear Predictive Coding) and MFCC (Mel-Frequency Cepstrum) feature extraction were used as input to the LSTM model, and their recognition accuracies were compared. LPC extracts speech features based on the pitch, or fundamental frequency, while MFCC extracts speech features based on the sound spectrum. We used 7990 speech digits, each consisting of 12 LPC coefficients and 12 MFCC coefficients, as training data, while 790 samples were used for classification on the trained LSTM. The results show that, for recognizing Indonesian speech digits with LSTM, MFCC feature extraction achieves a better accuracy of 96.58% than LPC feature extraction, which reaches 93.79%.
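The LPC front end mentioned above fits an all-pole model to each frame; its coefficients come from the Levinson-Durbin recursion on the frame's autocorrelation. A self-contained sketch, verified on a synthetic AR(2) signal rather than speech:

```python
import numpy as np

def lpc(x, order):
    """LPC via the autocorrelation (Levinson-Durbin) method.
    Returns a with a[0] = 1 such that A(z) = sum_i a[i] z^-i whitens x."""
    r = np.array([x[:len(x) - i] @ x[i:] for i in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                            # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k                        # prediction error update
    return a

# Synthetic AR(2) process x[n] = 0.6 x[n-1] - 0.2 x[n-2] + e[n]:
# the recovered coefficients should be close to [1, -0.6, 0.2].
rng = np.random.default_rng(0)
e = rng.normal(size=4000)
x = np.zeros_like(e)
for n in range(2, len(e)):
    x[n] = 0.6 * x[n - 1] - 0.2 * x[n - 2] + e[n]
a = lpc(x, 2)
print(np.round(a, 2))
```

On voiced speech, the order is typically 10-16 at 8 kHz; the resulting coefficients (or cepstra derived from them) form the 12-dimensional LPC vectors the paper feeds the LSTM.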

Proceedings ArticleDOI
01 Dec 2018
TL;DR: In this paper, the authors proposed a cheap, efficient, and accurate model to diagnose whether a patient suffers from one of three vocal disorders on the FEMH 2018 challenge.
Abstract: Vocal disorders have affected several patients all over the world. Due to the inherent difficulty of diagnosing vocal disorders without sophisticated equipment and trained personnel, a number of patients remain undiagnosed. To alleviate the monetary cost of diagnosis, there has been a recent growth in the use of data analysis to accurately detect and diagnose individuals for a fraction of the cost. We propose a cheap, efficient and accurate model to diagnose whether a patient suffers from one of three vocal disorders on the FEMH 2018 challenge.

01 Jun 2018
TL;DR: In this article, Deep Neural Networks (DNNs) were used for detecting and disrupting electronic arc faults, leveraging Internet of Things connectivity, artificial intelligence, and adaptive learning, and achieved 99.95% accuracy for binary classification and 95.61% for multi-device classification, with trigger-to-trip latency under 220 ms.
Abstract: We examine methods for detecting and disrupting electronic arc faults, proposing an approach leveraging Internet of Things connectivity, artificial intelligence, and adaptive learning. We develop Deep Neural Networks (DNNs) taking Fourier coefficients, Mel-Frequency Cepstrum data, and Wavelet features as input for differentiating normal from malignant current measurements. We further discuss how hardware-accelerated signal capture facilitates real-time classification, enabling our classifier to reach 99.95% accuracy for binary classification and 95.61% for multi-device classification, with trigger-to-trip latency under 220 ms. Finally, we discuss how IoT supports aggregate and user-specific risk models and suggest how future versions of this system might effectively supervise multiple circuits.

Proceedings ArticleDOI
15 Apr 2018
TL;DR: In a subjective comparison category rating test, the proposed ABE solution significantly outperforms the competing ABE baseline and was found to improve NB speech quality by 0.80 CMOS points, while the computation time is reduced to about 3% compared to the ABE baseline.
Abstract: In this work, we present a simple deep neural network (DNN)-based regression approach to artificial speech bandwidth extension (ABE) in the frequency domain for estimating missing speech components in the range of 4–7 kHz. The upper band (UB) spectral magnitudes are found by first estimating the UB cepstrum by means of a DNN regression and subsequent conversion to the spectral domain, leading to a more efficient and generalizing model training than estimating highly redundant UB magnitudes directly. As a second novelty, the phase information for the estimated upper band spectral magnitudes is generated by spectrally shifting the NB phase. Apart from framing, this very simple approach does not introduce additional algorithmic delay. A cross-database and cross-language task is defined for training and evaluation of the ABE framework. In a subjective comparison category rating test, the proposed ABE solution significantly outperforms the competing ABE baseline and was found to improve NB speech quality by 0.80 CMOS points, while the computation time is reduced to about 3% compared to the ABE baseline.

Journal ArticleDOI
TL;DR: The VAS and CPP cut-off points of OS of voice disorder demonstrated a high power to discriminate between different severities of voice disorder and are suggested as cut-off points for clinical use.
Abstract: Purpose: The aims of this study were to: (1) determine the visual analogue scale (VAS) and cepstrum peak prominence (CPP) cut-off points on the ratings of numerical scale (NS) related to the severi...

Proceedings ArticleDOI
03 Apr 2018
TL;DR: Variational mode decomposition (VMD) is used for extracting relevant information from the speech signal, and the VMD-based features outperform Mel-frequency cepstral coefficients (MFCC).
Abstract: This paper presents the analysis and classification of Parkinson's disease. In people suffering from Parkinson's disease, the vocal folds and vocal tract are severely affected, and speech characteristics are therefore altered during phonation. In this paper, variational mode decomposition (VMD) is used for extracting relevant information from the speech signal. VMD decomposes the speech signal into modes, or sub-signals. Various statistical features (mean, variance, skewness, and kurtosis), energy, and energy entropy are used for Parkinson's disease detection. In the experiments, the VMD-based features outperform the Mel-frequency cepstral coefficients (MFCC). The proposed features achieve a classification accuracy of 96.29%.
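The per-mode statistics the paper lists (mean, variance, skewness, kurtosis, energy, and an energy entropy) are easy to compute directly. Here is an illustrative NumPy version, with the entropy taken over the normalized power spectrum, which is one plausible reading of "energy entropy":

```python
import numpy as np

def mode_features(mode):
    """Statistics for one decomposed mode: mean, variance, skewness,
    excess kurtosis, energy, and spectral energy entropy."""
    mu, sd = mode.mean(), mode.std()
    skew = np.mean((mode - mu) ** 3) / sd ** 3
    kurt = np.mean((mode - mu) ** 4) / sd ** 4 - 3.0   # excess kurtosis
    energy = np.sum(mode ** 2)
    p = np.abs(np.fft.rfft(mode)) ** 2
    p = p / p.sum()
    entropy = -np.sum(p * np.log2(p + 1e-12))           # low if energy is concentrated
    return np.array([mu, sd ** 2, skew, kurt, energy, entropy])

t = np.linspace(0, 1, 1000, endpoint=False)
feats = mode_features(np.sin(2 * np.pi * 50 * t))       # one clean "mode"
print(feats.shape)   # (6,)
```

For the pure tone above, the spectral entropy is near zero (all energy in one bin) and the excess kurtosis is -1.5, the known value for a sinusoid; noisier modes drive both numbers up, which is what makes these statistics discriminative.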

Proceedings ArticleDOI
18 Jul 2018
TL;DR: The proposed algorithmic method for differentiating a normal heart sound from an abnormal one using the PCG sound data has achieved an accuracy of 95% in correctly classifying a heart sound PCG signal as normal or abnormal.
Abstract: Cardiovascular diseases are very common these days, so there is a need for regular diagnosis. The phonocardiogram is an effective diagnostic tool for analysing heart sounds and provides useful information regarding the clinical condition of the heart. This paper proposes an algorithmic method for differentiating a normal heart sound from an abnormal one using PCG sound data. Cepstrum analysis has been performed on both types of signals, and features are extracted from the heart sound. The extracted features are trained and tested with the help of a support vector machine classifier. The proposed method has achieved an accuracy of 95% in correctly classifying a heart sound PCG signal as normal or abnormal.

Proceedings ArticleDOI
01 Aug 2018
TL;DR: Experiments show that compared with the traditional single MFCC features and Wavelet Packet Decomposition Energy features, the MFCC fusion features and SVM method have higher classification accuracy under the same noise environment.
Abstract: In order to effectively improve the expressiveness and classification accuracy of equipment fault signal characteristics, this paper proposes a fault diagnosis method based on Mel-Frequency Cepstrum Coefficient (MFCC) fusion and Support Vector Machines (SVM). First, the MFCC features, the Wavelet Packet Decomposition Energy features, and the Zero-Crossing Rate (ZCR) features of the signal are separately extracted. Then, the three features are linearly combined, anchored on the MFCC features, to obtain the MFCC fusion features, and an SVM classifier is used to classify the faults. Experiments show that, compared with the traditional single MFCC features and Wavelet Packet Decomposition Energy features, the MFCC fusion features and SVM method achieve higher classification accuracy under the same noise environment.
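Of the three fused feature families, the zero-crossing rate is the cheapest to compute; a minimal version, checked on a tone whose crossing rate is known:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

fs = 1000
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 25 * t)   # a 25 Hz tone crosses zero 50 times/s
print(round(zero_crossing_rate(x) * fs))   # 50
```

The fusion step itself is then just concatenation, e.g. something like `np.concatenate([mfcc_vec, wpd_energy_vec, [zcr]])` (names hypothetical), producing the single vector the SVM is trained on.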

Patent
19 Jan 2018
TL;DR: In this article, a similarity calculation method based on multiple sound characteristics is proposed, which belongs to the technical field of audio signal processing, and consists of the following steps of firstly, preprocessing a sound signal, wherein the pre-processing process comprises performing pre-emphasis, framing, and function windowing; and then extracting the time domain characteristics, the frequency domain characteristics and the cepstrum domain characteristics of the sound signal.
Abstract: The invention relates to a similarity calculation method based on multiple sound characteristics, and belongs to the technical field of audio signal processing. The method comprises the following steps of firstly, preprocessing a sound signal, wherein the pre-processing process comprises performing pre-emphasis, framing, and function windowing; and then extracting the time domain characteristics,the frequency domain characteristics and the cepstrum domain characteristics of the sound signal, wherein the time domain characteristics comprise a short-time average zero-crossing rate and a short-time self-correlation function; the frequency-domain characteristics comprise a short-time power spectral density function; the cepstrum domain characteristics comprise a Mel frequency cepstral coefficient and a linear prediction cepstrum coefficient; performing the similarity value calculation on the extracted audio characteristics. The similarity value calculated by each characteristic parameteris obtained by performing mutual correlation on the audio characteristics to be tested.
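The final step, cross-correlation-based similarity, can be sketched as a peak normalized cross-correlation between two feature tracks. This is illustrative; the patent's exact normalization is not specified here:

```python
import numpy as np

def similarity(f1, f2):
    """Peak normalized cross-correlation between two feature sequences."""
    f1 = (f1 - f1.mean()) / (f1.std() + 1e-12)
    f2 = (f2 - f2.mean()) / (f2.std() + 1e-12)
    xc = np.correlate(f1, f2, mode="full") / len(f1)
    return xc.max()

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = np.concatenate([np.zeros(40), a])[:500]   # same track, delayed by 40
print(round(similarity(a, b), 2))             # high (near 1) for matching tracks
```

Because the peak is taken over all lags, the measure is invariant to a time offset between the two recordings, which is exactly what one wants when comparing feature tracks extracted from unaligned audio.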

Proceedings ArticleDOI
01 Aug 2018
TL;DR: A novel detection system that characterizes the impact of secondary tasks of calling and texting on the driver based on the spectrogram and MEL Cepstrum representation of the GSR signals and convolutional neural networks (CNN) modeling and achieves a high accuracy of detecting the state of inattention.
Abstract: Driver distraction is one of the major causes of road accidents, which can lead to severe physical injuries and deaths. Statistics indicate the need for a reliable driver distraction detection system that can continuously and ubiquitously monitor the driver's distraction and issue an alert before there is a chance of disaster on the road. Therefore, early detection of driver distraction can help decrease the cost of roadway disasters. Physiological signals such as the electrocardiogram (ECG) and electroencephalogram (EEG) have been extensively used for driver state monitoring at the physiological level. More recently, galvanic skin response (GSR) analysis, a minimally intrusive technology, has been investigated to develop monitoring systems that alert drivers early. In this paper, we propose a novel detection system that characterizes the impact of the secondary tasks of calling and texting on the driver, based on the spectrogram and Mel cepstrum representation of the GSR signals and convolutional neural network (CNN) modeling. The proposed detection system decomposes the GSR signals into a 2D time-frequency representation to decode spectro-temporal patterns. We further isolate the spectral envelope and then extract Mel frequency filter bank coefficients in time and frequency. Our proposed deep CNN structure is designed to automatically learn reliable discriminative visual patterns in the 2D spectrogram and Mel cepstrum space. Passing through the layers of the CNN, the low-level features are transformed into high-level features representing the impact of the secondary tasks. This process replaces the traditional ad hoc hand-crafted feature extraction used when working with high-dimensional time-series datasets. The classification accuracy of the proposed prediction algorithm is evaluated on a set of GSR signals recorded from 7 driver subjects during naturalistic driving.
The experimental results demonstrate that the proposed algorithm achieves a high accuracy of detecting the state of inattention, 93.28%.

Journal ArticleDOI
TL;DR: Findings indicated the role of phase information in ECG-based cognitive load estimation; integrating dynamic and static characteristics of the cepstral coefficients in a multivariate approach improved the performance of the system.

Patent
21 Dec 2018
TL;DR: In this article, the authors proposed a voice noise reduction method for a conference terminal based on a neural network model, which comprises the following steps: S1, an audio file is collected by the conference terminal device to generate a digital audio signal in the time domain; S2, the digital audio signal is framed, and the short-time Fourier transform is performed; S3, the amplitude spectrum of the frequency domain is mapped into frequency bands, and the Mel-frequency cepstral coefficients are computed.
Abstract: The invention provides a voice noise reduction method for a conference terminal based on a neural network model. The method comprises the following steps: S1, an audio file is collected by the conference terminal device to generate a digital audio signal in the time domain; S2, the digital audio signal is framed, and the short-time Fourier transform is performed; S3, the amplitude spectrum of the frequency domain is mapped into frequency bands, and the Mel-frequency cepstral coefficients are computed; S4, first-order and second-order differential coefficients are calculated from the Mel-frequency cepstral coefficients, a pitch correlation coefficient is calculated for each frequency band, and pitch period features and VAD features are further extracted; S5, the input characteristic parameters of the audio are used as the input of the neural network model, the neural network is trained offline to learn the frequency band gains that produce the noise-reduced speech, and the trained weights are fixed; S6, the neural network model is used to generate the frequency band gains, the output gains are mapped onto the spectrum, the phase information is added, and the noise-reduced speech signal is recovered through the inverse Fourier transform. The method has the advantage that real-time noise reduction can be achieved.

Patent
19 Jan 2018
TL;DR: In this article, a speech recognition method based on neural network stacking autoencoder multi-feature fusion was proposed, which has higher recognition accuracy compared with the traditional single feature extraction method.
Abstract: The invention relates to a speech recognition method based on multi-feature fusion with a neural network stacked autoencoder. First, the original sound data is framed and windowed, and the typical time-domain linear predictive cepstrum coefficient features and frequency-domain Mel frequency cepstrum coefficient features are respectively extracted from the framed and windowed data. The extracted features are then spliced, the initial feature representation vector of the acoustic signals is constructed, and the training feature library is created. Next, the multi-layer neural network stacked autoencoder is used for feature fusion and learning; the multi-layer autoencoder is trained with the extreme learning machine algorithm. Finally, the fused features are trained using the extreme learning machine algorithm to obtain the classifier model, and the constructed model is used to classify and identify test samples. The method adopts multi-feature fusion based on an extreme learning machine multi-layer neural network stacked autoencoder, which has higher recognition accuracy compared with traditional single-feature extraction methods.

Patent
06 Nov 2018
TL;DR: In this article, a deep learning-based unusual speech distinguishing method is proposed. The method comprises the following steps: input speech is acquired, and resampling, pre-emphasis, framing, and windowing preprocessing are carried out to obtain preprocessed speech; a mel-frequency cepstral coefficient (MFCC) characteristic vector is extracted from the preprocessed speech; and speech segments with different numbers of frames are regularized to a fixed number of frames, so each speech segment obtains a corresponding MFCC characteristic vector.
Abstract: The invention discloses a deep learning-based unusual speech distinguishing method. The method comprises the following steps: input speech is acquired, resampling, pre-emphasis and framing and windowing preprocessing are carried out on the input speech, and preprocessed speech is obtained; a mel-frequency cepstral coefficient (MFCC) characteristic vector is extracted for the preprocessed speech; the speech segments with different frames are regularized to a fixed number of frames, and each speech segment obtains a corresponding mel-frequency cepstral coefficient characteristic vector; a convolutional depth confidence network is built; the mel-frequency cepstral coefficient characteristic vectors are inputted to the convolutional depth confidence network for training, and the states of input speech are classified; and according to a classification result, a hidden Markov model is called for template matching and a speech recognition result is obtained. Multiple nonlinear transform layers of the convolutional depth confidence network are used, the inputted MFCC characteristics are mapped to higher-dimensional space, the hidden Markov model is then used to carry out modeling on different states of speech, and the speech recognition accuracy is improved.
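The pre-emphasis/framing/windowing front end that this patent and the other MFCC-based systems above share can be sketched in a few lines. The frame length and hop below are the common 25 ms / 10 ms choices at 16 kHz, assumed here rather than taken from the patent:

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1], boosting high frequencies."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(x, frame_len=400, hop=160):
    """Split into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

x = np.random.default_rng(0).normal(size=16000)   # 1 s of audio at 16 kHz
frames = frame_and_window(pre_emphasis(x))
print(frames.shape)    # (98, 400)
```

Each windowed row is then ready for the FFT, mel filter bank, log, and DCT steps that produce the MFCC vector per frame.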

Proceedings ArticleDOI
01 Oct 2018
TL;DR: Experiments show that the algorithm can identify manual, mechanical, and vehicle signals in vibration signal recognition for the optical fiber pre-warning system (OFPS).
Abstract: Oil and gas pipelines, boundary lines, and other sites need their safety status monitored in real time. The optical fiber pre-warning system is a good choice because of its high sensitivity, corrosion resistance, and concealment; it provides early warning by detecting fiber vibration signals. In this paper, an improved Mel frequency cepstrum coefficient (MFCC) method is proposed for recognizing the cepstral characteristics of different typical optical fiber vibration signals. First, we pre-process the intrusion signals and obtain their power spectral density (PSD) to quantify the spectral differences between intrusion types. Second, an adaptive filter bank is designed according to the distribution of the signal power spectrum to improve the conventional MFCC method, and the MFCC coefficients are obtained through analysis of the characteristic parameters. Finally, the mean-crossing rates (MCR) of the MFCCs are calculated, and appropriate thresholds are selected to classify the typical vibration signals. Compared with the traditional MFCC, this improved method realizes adaptive division of the frequency bands according to the distribution of the signal power spectrum. Experiments show that the algorithm can identify manual, mechanical, and vehicle signals in vibration signal recognition for the optical fiber pre-warning system (OFPS).