
Showing papers on "Cepstrum published in 2017"


Journal ArticleDOI
TL;DR: In this paper, it was shown that even though it is not possible to apply the complex cepstrum to stationary signals, it is possible to extract the modal part of the response (with a small extra damping of each mode corresponding to the window) and combine this with the original phase to obtain edited time signals.

187 citations
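The editing idea in the TL;DR above — lifter the real cepstrum to modify the log-amplitude spectrum, then recombine it with the original (unmodified) phase — can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation; the low-pass lifter and cutoff are assumptions, whereas the paper edits specific quefrency bands.

```python
import numpy as np

def cepstrum_edit(x, keep_quefrencies=30):
    """Edit the log-amplitude spectrum via the real cepstrum, then
    recombine with the ORIGINAL phase (sketch of the editing idea)."""
    X = np.fft.fft(x)
    log_mag = np.log(np.abs(X) + 1e-12)
    ceps = np.fft.ifft(log_mag).real              # real cepstrum
    lifter = np.zeros_like(ceps)
    lifter[:keep_quefrencies] = 1.0               # keep low quefrencies...
    if keep_quefrencies > 1:
        lifter[-(keep_quefrencies - 1):] = 1.0    # ...symmetrically
    edited_log_mag = np.fft.fft(ceps * lifter).real
    # rebuild the spectrum from the edited amplitude and original phase
    X_edited = np.exp(edited_log_mag) * np.exp(1j * np.angle(X))
    return np.fft.ifft(X_edited).real
```

With the lifter covering all quefrencies the signal is returned essentially unchanged, which is a useful sanity check on the amplitude/phase round trip.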


Journal ArticleDOI
TL;DR: A model of rotating machine signals is introduced which sheds light on the various components to be expected in the squared envelope spectrum, and a critical comparison is made of three sophisticated methods, namely the improved synchronous average, cepstrum prewhitening, and the generalized synchronous average, used for suppressing the deterministic part.

125 citations


Journal ArticleDOI
TL;DR: In this paper, a pre-whitening technique was proposed to group the deterministic multi-harmonic signal content in a cepstral peak at the corresponding quefrency, making it more suitable for removing the discrete frequency peaks.

74 citations
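Cepstrum pre-whitening, as used in this line of work, reduces to a very short operation: zeroing the entire real cepstrum is equivalent to flattening the magnitude spectrum to one while keeping the phase, which suppresses discrete frequency peaks in a single step. A minimal sketch:

```python
import numpy as np

def cepstrum_prewhitening(x):
    """Zero the whole real cepstrum: equivalent to setting the magnitude
    spectrum to 1 while keeping the original phase."""
    X = np.fft.fft(x)
    return np.fft.ifft(np.exp(1j * np.angle(X))).real
```

After pre-whitening, the deterministic multi-harmonic content loses its energy advantage, so transient and random components dominate the subsequent envelope analysis.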


Journal ArticleDOI
TL;DR: In this paper, the authors presented a new feature extraction method by combining the empirical mode decomposition (EMD) and autocorrelation local cepstrum (ALC) for fault diagnosis of sophisticated multistage gearbox.

73 citations


Journal ArticleDOI
TL;DR: In this paper, the amplitude and frequency modulated signals emanating from the bearing were analyzed using four steps, i.e., standardisation, empirical mode decomposition, principal component analysis (PCA), envelope and cepstral envelope techniques.
Abstract: The kinematics of a bearing is erratic and random in nature and requires timely attention to avoid catastrophic failure. In this study, the authors analyse the amplitude- and frequency-modulated signals emanating from the bearing using four steps: standardisation, empirical mode decomposition, principal component analysis (PCA), and envelope and cepstral envelope techniques. First, the standardised frequency-modulated signals are decomposed into stationary non-linear modes called intrinsic mode functions (IMFs). PCA is then applied to the decomposed IMFs to produce uncorrelated signals. The uncorrelated signals whose kurtosis is above the average are recombined to form a modified signal. The modified signal is then passed through spectrum, envelope, cepstrum, and cepstral envelope techniques to identify the features. It is observed that the proposed combined approach effectively and adaptively identifies inner-race/ball faults, the shaft rotating frequency, and the corresponding harmonics with minimal utilisation of IMFs.

27 citations


Journal ArticleDOI
TL;DR: The proposed spatial cepstrum does not require the positions of the microphones and is robust against the synchronization mismatch of channels, thus ensuring its suitability for use with a distributed microphone array.
Abstract: In this paper, with the aim of using the spatial information obtained from a distributed microphone array employed for acoustic scene analysis, we propose a robust and efficient method, which is called the spatial cepstrum. In our approach, similarly to the cepstrum, which is widely used as a spectral feature, the logarithm of the amplitude in multichannel observation is converted to a feature vector by a linear orthogonal transformation. This linear orthogonal transformation is achieved by principal component analysis (PCA) in general. Moreover, we also show that for a circularly symmetric microphone arrangement with an isotropic sound field, PCA is identical to the inverse discrete Fourier transform and the spatial cepstrum exactly corresponds to the cepstrum. The proposed approach does not require the positions of the microphones and is robust against synchronization mismatch between channels, thus ensuring its suitability for use with a distributed microphone array. Experimental results obtained using actual environmental sounds verify the validity of our approach even when a smaller feature dimension than the original one is used, which is achieved by dimensionality reduction through PCA. Additionally, experimental results also indicate that the robustness of the proposed method is satisfactory for observations that have a synchronization mismatch between channels.

25 citations
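The core transform — PCA of multichannel log amplitudes, requiring no microphone positions — can be sketched as follows. Shapes and the component count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def spatial_cepstrum(log_amp, n_components=4):
    """log_amp: (frames, channels) log-amplitudes from a distributed array.
    Project onto an orthogonal basis learned by PCA from the data
    (a sketch of the paper's linear orthogonal transformation)."""
    centered = log_amp - log_amp.mean(axis=0)
    # eigenvectors of the channel covariance give the orthogonal transform
    cov = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]             # decreasing variance
    basis = eigvecs[:, order[:n_components]]
    return centered @ basis                       # (frames, n_components)
```

Dimensionality reduction falls out for free: keeping only the leading components gives the smaller feature dimension the experiments exploit.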


Journal ArticleDOI
TL;DR: Investigation of different feature types for voice quality classification using multiple classifiers showed that MFCC and dynamic MFCC features were able to classify modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms.
Abstract: The goal of this study was to investigate the performance of different feature types for voice quality classification using multiple classifiers. The study compared the COVAREP feature set; which included glottal source features, frequency warped cepstrum, and harmonic model features; against the mel-frequency cepstral coefficients (MFCCs) computed from the acoustic voice signal, acoustic-based glottal inverse filtered (GIF) waveform, and electroglottographic (EGG) waveform. Our hypothesis was that MFCCs can capture the perceived voice quality from either of these three voice signals. Experiments were carried out on recordings from 28 participants with normal vocal status who were prompted to sustain vowels with modal and nonmodal voice qualities. Recordings were rated by an expert listener using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and the ratings were transformed into a dichotomous label (presence or absence) for the prompted voice qualities of modal voice, breathiness, strain, and roughness. The classification was done using support vector machines, random forests, deep neural networks, and Gaussian mixture model classifiers, which were built as speaker independent using a leave-one-speaker-out strategy. The best classification accuracy of 79.97% was achieved for the full COVAREP set. The harmonic model features were the best performing subset, with 78.47% accuracy, and the static+dynamic MFCCs scored at 74.52%. A closer analysis showed that MFCC and dynamic MFCC features were able to classify modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms. Reduced classification performance was exhibited by the EGG waveform.

20 citations


Journal ArticleDOI
TL;DR: A method for determining the optimal bandwidth based on a mean squared error (MSE) criterion is given and an example of a damped oscillatory process in which the approximate optimal bandwidth can be written as a function of the damping parameter.
Abstract: A systematic method for bandwidth parameter selection is desired for Thomson multitaper spectrum estimation. We give a method for determining the optimal bandwidth based on a mean squared error (MSE) criterion. When the true spectrum has a second-order Taylor series expansion, one can express quadratic local bias as a function of the curvature of the spectrum, which can be estimated using a simple spline approximation. This is combined with a variance estimate, obtained by jackknifing over individual spectrum estimates, to produce an estimated MSE for the log spectrum estimate for each choice of time-bandwidth product. The bandwidth that minimizes the estimated MSE then gives the desired spectrum estimate. Additionally, the bandwidth obtained by using our method is also optimal for cepstrum estimates. We give an example of a damped oscillatory (Lorentzian) process in which the approximate optimal bandwidth can be written as a function of the damping parameter. The true optimal bandwidth agrees well with that given by minimizing the estimated MSE in these examples.

18 citations
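For context, the Thomson multitaper estimate whose time-bandwidth product NW the paper selects can be sketched as follows. This assumes SciPy's DPSS window helper is available; K = 2NW − 1 is a common taper-count choice, not the paper's selection rule.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, NW=4.0):
    """Thomson multitaper spectrum estimate: average the eigenspectra
    obtained with K Slepian (DPSS) tapers of time-bandwidth product NW."""
    N = len(x)
    K = int(2 * NW - 1)
    tapers = dpss(N, NW, Kmax=K)                  # (K, N) Slepian tapers
    eigspec = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return eigspec.mean(axis=0)                   # average over tapers
```

The paper's procedure amounts to computing such estimates over a grid of NW values and picking the one minimizing the estimated MSE of the log spectrum.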


Patent
31 May 2017
TL;DR: In this paper, after hearing-based speech separation of the pre-processed noisy mixed speech, frequency cepstrum coefficients and perceptual linear prediction coefficients are extracted from the signal; exploiting the background distinguishability of the noise, the two coefficient types are analysed under different noise environments to complete feature fusion; finally, the fused features are matched against a pre-established voiceprint feature template library using a Gaussian mixture model-universal background model (GMM-UBM).
Abstract: An embodiment of the invention provides a voiceprint feature recognition method and system. The method comprises the following steps: after hearing-based speech separation of the pre-processed noisy mixed speech, extracting the frequency cepstrum coefficient and the perceptual linear prediction coefficient of the signal; exploiting the background distinguishability of the noise, analysing the frequency cepstrum coefficient and the perceptual linear prediction coefficient under different noise environments to complete feature fusion; and finally, in a pre-established voiceprint feature template library, performing pattern matching on the fused features using a Gaussian mixture model-universal background model (GMM-UBM). The method combines human auditory system features with traditional voiceprint recognition, addresses the low voiceprint recognition rate under noise from a bionics standpoint, and effectively improves voiceprint recognition accuracy and system robustness in noisy environments.

15 citations


Proceedings ArticleDOI
Li Su
01 Dec 2017
TL;DR: In this article, the equivalence relation between a generalized cepstrum and a DNN in terms of their structures and functionality is demonstrated for the task of multi- pitch estimation, and a new feature designed in the same fashion is proposed for pitch salience function.
Abstract: This paper presents a new way to understand how deep neural networks (DNNs) work by applying homomorphic signal processing techniques. Focusing on the task of multi-pitch estimation (MPE), this paper demonstrates the equivalence relation between a generalized cepstrum and a DNN in terms of their structures and functionality. Such an equivalence relation, together with pitch perception theories and the recently established rectified-correlations-on-a-sphere (RECOS) filter analysis, provides an alternative way of explaining the role of the nonlinear activation function and the multi-layer structure, both of which exist in a cepstrum as well as in a DNN. To validate the efficacy of this new approach, a new feature designed in the same fashion is proposed for the pitch salience function. The new feature outperforms the one-layer spectrum in the MPE task and, as predicted, it addresses the issue of the missing fundamental effect and also achieves better robustness to noise.

14 citations


Patent
Sun Cheng, Zhao Zhuo, Zhang Yin, Li Li, Shou Lili 
20 Jun 2017
TL;DR: In this paper, a vibration event mode identification method is proposed, which includes acquiring the original vibration signals, comprising both vibration and non-vibration signals, via a vibration sensor; separating the vibration signal from the original signals; denoising the vibration signal; and performing feature extraction on the denoised vibration signal.
Abstract: The invention discloses a vibration event mode identification method. The method includes acquiring the original vibration signals, comprising both vibration and non-vibration signals, via a vibration sensor; separating the vibration signal from the original signals; denoising the vibration signal; and performing feature extraction on the denoised vibration signal to obtain a characteristic vector comprising three sets of features: (A) performing wavelet packet decomposition in the time-frequency domain to obtain the energy feature vector, (B) performing cepstrum analysis to extract cepstral parameter features, and (C) extracting signal features in the time domain. An identification model composed of two stages of classifiers is then established: the first-stage classifier, based on an SVM, divides vibration events into non-intrusive and intrusion events, taking the feature vector extracted from the vibration signal as input; the second-stage classifier performs identification of the intrusion events based on an artificial neural network to obtain the classification result.

Patent
14 Jul 2017
TL;DR: In this paper, a recording-device clustering method based on Gaussian mean supervectors and spectral clustering is proposed: the Mel-frequency cepstral coefficient (MFCC) features that characterize the recording device are extracted from each speech sample; the MFCC features of all speech samples are used to train a universal background model (UBM) via the expectation-maximization (EM) algorithm; the UBM parameters are updated via the maximum a posteriori (MAP) algorithm to obtain the Gaussian mixture model (GMM) of each speech sample; and the mean vectors of all Gaussian components of each GMM are concatenated to form a Gaussian mean supervector, which is then clustered.
Abstract: The invention provides a recording-device clustering method based on Gaussian mean supervectors and spectral clustering. The method comprises the following steps: the Mel-frequency cepstral coefficient (MFCC) feature, which characterizes the recording device, is extracted from each speech sample; the MFCC features of all speech samples are used as input, and a universal background model (UBM) is trained through the expectation-maximization (EM) algorithm; the MFCC feature of each speech sample is used as input, and the UBM parameters are updated through the maximum a posteriori (MAP) algorithm to obtain the Gaussian mixture model (GMM) of each speech sample; the mean vectors of all Gaussian components of each GMM are concatenated in turn to form a Gaussian mean supervector; a spectral clustering algorithm is used to cluster the Gaussian mean supervectors of all speech samples; the number of recording devices is estimated; and the speech samples from the same recording device are merged. With this method, the speech samples collected by the same recording device can be found without prior knowledge of the type or number of recording devices, giving the method a wide scope of application.

Journal ArticleDOI
TL;DR: A new attack strategy to spoof phase-based SSDs with the objective of increasing the security of voice verification systems by enabling the development of more generalized SSDs using a complex cepstrum vocoder as a postprocessor.
Abstract: State-of-the-art speaker verification systems are vulnerable to spoofing attacks. To address the issue, high-performance synthetic speech detectors (SSDs) for existing spoofing methods have been proposed. Phase-based SSDs that exploit the fact that most of the parametric speech coders use minimum-phase filters are particularly successful when synthetic speech is generated with a parametric vocoder. Here, we propose a new attack strategy to spoof phase-based SSDs with the objective of increasing the security of voice verification systems by enabling the development of more generalized SSDs. As opposed to other parametric vocoders, the complex cepstrum approach uses mixed-phase filters, which makes it an ideal candidate for spoofing the phase-based SSDs. We propose using a complex cepstrum vocoder as a postprocessor to existing techniques to spoof the speaker verification system as well as the phase-based SSDs. Once synthetic speech is generated with a speech synthesis or a voice conversion technique, for each synthetic speech frame, a natural frame is selected from a training database using a spectral distance measure. Then, complex cepstrum parameters of the natural frame are used for resynthesizing the synthetic frame. In the proposed method, complex cepstrum-based resynthesis is used as a postprocessor. Hence, it can be used in tandem with any synthetic speech generator. Experimental results showed that the approach is successful at spoofing four phase-based SSDs across nine parametric attack algorithms. Moreover, performance at spoofing the speaker verification system did not substantially degrade compared to the case when no postprocessor is employed.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: The Mel-frequency cepstrum gives good results for isolating the phonetic characteristics of speech signals, and a practical solution to the problem of faster processing is the use of parallel computing algorithms.
Abstract: Speech recognition is one of the important processes of information technology. It plays a key role in many systems, such as voice control, IP telephony, personal identification, recognition of individual words and phrases, accepting applications for reference services, and search systems. Many research companies in this area are developing and improving methods, algorithms, and applications for segmenting the speech signal and for calculating parametric indicators of the selected fragments of the speech signal. In the preliminary stages of speech processing, algorithms are implemented for extracting phonetic characteristics, which are subjected to syntactic and semantic analysis in subsequent stages. When isolating the phonetic characteristics of the input speech signals, the calculation of cepstral characteristics is one of the important processes in speech recognition, and the Mel-frequency cepstrum gives good results here. However, the calculation of Mel-frequency cepstral coefficients takes a lot of time, which is clearly evident in real-time systems such as IP telephony. To solve this problem, stream computing is needed: a practical solution for faster processing is the use of parallel computing algorithms. The hardware platform for implementing parallel algorithms for the calculation of Mel-frequency cepstral coefficients can be multi-core processors.
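The MFCC pipeline being parallelized here is itself short in serial form. A minimal single-frame sketch follows (the 512-sample window, 26 filters, and 13 coefficients are common defaults, not the paper's settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sr=16000, n_filters=26, n_ceps=13):
    """Single-frame MFCCs: power spectrum -> mel filterbank -> log -> DCT."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(fbank @ power + 1e-10)
    # DCT-II of the log filterbank energies gives the cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return dct @ log_energy
```

Since each frame is processed independently, the per-frame computation distributes naturally across cores, which is exactly the parallelism the abstract argues for.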

Proceedings ArticleDOI
01 Dec 2017
TL;DR: The main approach is to perform isolated-word speech recognition with cepstrum features and vector quantization, and the results show that all digits give good performance.
Abstract: Speech recognition is a process to identify the speaker on the basis of individual information within the speech wave. Recent developments have brought voice recognition into security systems. In this paper, the implementation of a spoken-digit recognition system is discussed. This technique is mainly used in voice-based person identification and access control, such as banking by telephone, voice dialing, and database access services. Utterances of the digits zero to nine were collected as speech data. Spoken-digit recognition mainly involves two parts: feature extraction and feature matching. The main approach is isolated-word recognition using the cepstrum and vector quantization: the cepstrum technique is used for feature extraction, and vector quantization is used for feature matching. The results show that all digits give good performance. The proposed digit recognition algorithm is implemented using MATLAB software.
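The feature-matching half described above can be sketched directly: a per-digit codebook trained by k-means (standing in for LBG vector quantization) and a distortion score for matching a test utterance. Function names and sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def train_codebook(features, k=8, iters=20, seed=0):
    """Toy k-means codebook over cepstral feature vectors (one per digit)."""
    rng = np.random.default_rng(seed)
    code = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest codeword, then update centroids
        d = ((features[:, None, :] - code[None]) ** 2).sum(-1)
        lab = d.argmin(1)
        for j in range(k):
            if np.any(lab == j):
                code[j] = features[lab == j].mean(0)
    return code

def vq_distortion(features, code):
    """Average squared distance to the nearest codeword; the digit whose
    codebook gives the smallest distortion is the recognized digit."""
    d = ((features[:, None, :] - code[None]) ** 2).sum(-1)
    return d.min(1).mean()
```

At recognition time, the test utterance's cepstral vectors are scored against all ten digit codebooks and the lowest-distortion codebook wins.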

Book ChapterDOI
12 Sep 2017
TL;DR: Four neural network architectures are investigated and applied using this continuous vocoder to model F0, MVF, and Mel-Generalized Cepstrum for more natural sounding speech synthesis and show that the proposed framework converges faster and gives state-of-the-art speech synthesis performance while outperforming the conventional feed-forward DNN.
Abstract: In our earlier work in statistical parametric speech synthesis, we proposed a vocoder using continuous F0 in combination with Maximum Voiced Frequency (MVF), which was successfully used with a feed-forward deep neural network (DNN). The advantage of a continuous vocoder in this scenario is that vocoder parameters are simpler to model than traditional vocoders with discontinuous F0. However, DNNs lack sequence modeling, which might degrade the quality of synthesized speech. In order to avoid this problem, we propose the use of sequence-to-sequence modeling with recurrent neural networks (RNNs). In this paper, four neural network architectures (long short-term memory (LSTM), bidirectional LSTM (BLSTM), gated recurrent network (GRU), and standard RNN) are investigated and applied using this continuous vocoder to model F0, MVF, and Mel-Generalized Cepstrum (MGC) for more natural sounding speech synthesis. Experimental results from objective and subjective evaluations have shown that the proposed framework converges faster and gives state-of-the-art speech synthesis performance while outperforming the conventional feed-forward DNN.

Proceedings ArticleDOI
01 Dec 2017
TL;DR: An unsupervised learning approach to automating the hammering test for diagnosis of concrete structures is presented: sound samples are clustered using fuzzy clustering while incorporating physical spatial information, and the Mel-Frequency Cepstrum is used in order to reproduce human hearing when conducting the hammering test.
Abstract: The hammering test is a popular non-destructive testing method whose automation is in high demand for efficient diagnosis of concrete structures. The objective is to correctly determine whether a hammering sound originated from a defect in the structure. In this paper, we present an unsupervised learning approach to automating the hammering test for diagnosis of concrete structures, among others. Sound samples are clustered using fuzzy clustering while incorporating physical spatial information, and the Mel-Frequency Cepstrum is used in order to reproduce human hearing when conducting the hammering test. Experiments using concrete test blocks showed good results, in both single- and multiple-defect cases.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A speech recognition system is implemented to control a robot arm that picks and places objects; tested with trained (85%) and untrained (80%) respondents, it shows good agreement in identifying speech.
Abstract: This study describes the implementation of speech recognition to pick and place an object using a 5-DoF robot arm based on an Arduino microcontroller. To identify speech, the Mel-Frequency Cepstrum Coefficients (MFCC) method is used for feature extraction, and the K-Nearest Neighbors (KNN) method is used to learn and identify the speech, implemented in Python 2.7. The speech database uses 12 features for the KNN process; tests with trained (85% accuracy) and untrained (80% accuracy) respondents show good agreement in identifying speech. Finally, the speech recognition system is implemented to control the robot arm to pick and place objects.

Proceedings ArticleDOI
15 May 2017
TL;DR: A low-cost system which classifies different road conditions (asphalt, gravel, snowy and stony road) using acoustic signal processing is proposed to estimate road/tire friction forces in the active safety systems.
Abstract: In this study, a low-cost system that classifies different road conditions (asphalt, gravel, snowy, and stony roads) using acoustic signal processing is proposed, with the aim of estimating road/tire friction forces for active safety systems. Classical acoustic signal processing methods, namely linear predictive coding (LPC), power spectrum (PSC), and mel-frequency cepstrum coefficients (MFCC), are used with the minimum-variance and maximum-distance principle in this system. The classification is performed by a support vector machine (SVM).

Patent
04 Jul 2017
TL;DR: In this paper, a music identification method and system are proposed: a to-be-identified music clip is acquired; the mel-frequency cepstrum coefficients of each frame of the clip are extracted; an audio feature vector is formed from these coefficients; the per-frame feature vectors are combined into a feature matrix for the clip; the feature matrix is compared against those of sample music in a music library; and the matching music information is output.
Abstract: The invention discloses a music identification method and system. The method comprises the steps of acquiring a to-be-identified music clip; extracting, for each frame of the clip, the mel-frequency cepstrum coefficient, the first difference of the mel-frequency cepstrum coefficient, the linear prediction cepstrum coefficient, and the perceptual linear prediction coefficient; forming an audio feature vector from these coefficients; combining the feature vectors of each audio frame into a feature matrix for the to-be-identified music clip; comparing the feature matrix of the to-be-identified clip with the feature matrices of sample music in a music library to find the maximum-similarity feature matrix; acquiring the music information of the corresponding sample; and outputting the music information. The maximum-similarity feature matrix belongs to the sample music with the greatest similarity to the to-be-identified music. The method and system have good noise resistance, high identification efficiency, and an ideal identification effect.

01 Jan 2017
TL;DR: This research focused on improving cepstrum analysis as a post-processing method in order to extract information about the leak, the pipe features, and the leak location from the pressure transient signal.
Abstract: Nowadays, the pipeline system is one of the powerful technologies implemented in the real world. It is essential for transporting fluid, especially water, from one point to the next. However, pipeline systems also develop defects such as leaks for many reasons. Pressure transient analysis is a newly developed method to detect and localize leaks, since the signal carries information about the leak phenomenon. The basic principle is that water spouting out of a leak in a pressurized pipe generates a signal, and the signal may contain information on whether a leak exists and where it is located. To extract this information, many signal analysis methods have been applied by researchers, such as cross-correlation, genetic algorithms, and wavelet transforms. Cepstrum analysis is proposed here as a method to extract leak and pipe-feature information from the pressure transient signal, since it can analyse non-stationary data. Because pure data are hard to capture in real tests due to environmental noise and the low signal-to-noise ratio, a pre-processing filtering stage is applied to the real signal before it goes through cepstrum analysis as the post-processing method. This research focused on improving cepstrum analysis in order to extract information about the leak, the pipe features, and the leak location. The Discrete Wavelet Transform (DWT) and Principal Component Analysis (PCA) were proposed as pre-processing methods, with cepstrum analysis as the post-processing method.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: Experimental results show that BC speech processed by the proposed method becomes more similar to AC speech than that processed by the conventional method.
Abstract: In this paper, we propose a bone-conducted (BC) speech enhancement method using a deep neural network (DNN). Noting that the vocal tract components appear only in the low-order cepstrum, we restore the low-order cepstrum of BC speech so that it is close to that of air-conducted (AC) speech. Experimental results show that BC speech processed by the proposed method becomes more similar to AC speech than that processed by the conventional method.
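The low-order-cepstrum restriction the method relies on can be illustrated with a short liftering sketch: keeping only the low quefrencies yields the smoothed spectral envelope (the vocal-tract part). The cutoff `n_low` is an assumed illustrative value; the paper's DNN maps such low-order coefficients from BC to AC speech rather than applying this simple lifter.

```python
import numpy as np

def spectral_envelope(frame, n_low=30):
    """Keep only the low-order real cepstrum and return the smoothed
    log-magnitude spectral envelope (the vocal-tract component)."""
    spec = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
    ceps = np.fft.ifft(spec).real                 # real cepstrum
    lifter = np.zeros_like(ceps)
    lifter[:n_low] = 1.0                          # low quefrencies...
    if n_low > 1:
        lifter[-(n_low - 1):] = 1.0               # ...kept symmetrically
    return np.fft.fft(ceps * lifter).real
```

With the lifter covering all quefrencies the original log spectrum is recovered exactly, confirming that only the high-quefrency (excitation) detail is being discarded.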

Book ChapterDOI
TL;DR: The experimental results demonstrate that the use of an optimization technique in cepstrum filtering improves the PSNR of the restored image, but encounters some ringing effect.
Abstract: Objectives: In this paper, an image restoration technique is designed based on the genetic algorithm (GA) and cepstrum filtering. Methods/Statistical Analysis: We used the cepstrum method to find the motion-blur parameters, namely the angle and length of the spectrum, from the observed image for cepstrum filtering. Findings: The GA, as an optimization strategy, can adjust the parameters and provide appropriate values of theta and length. Optimized values of theta and length help to compute a PSF that is close to the real PSF, which increases the PSNR. The experimental results demonstrate that the use of an optimization technique in cepstrum filtering improves the PSNR of the restored image, but encounters some ringing effect.

Journal ArticleDOI
TL;DR: Experiments on real periodic vibration signals generated by an electric hammer under different collecting distances and transmission media are conducted to show the superiority of the distance estimation method proposed in this paper.

Journal ArticleDOI
TL;DR: Cepstrum analysis of PD signals combined with an artificial neural network (ANN) is proposed to classify PD types from different PD sources simultaneously under noisy conditions, and it is found that Cepstrum–ANN yields higher classification accuracy for noisy PD signals than the other methods tested.
Abstract: In high-voltage equipment insulation, multiple partial discharge (PD) sources may exist at the same time. Therefore, it is important to identify PDs from different PD sources under noisy conditions in insulation, with the highest accuracy. Although many studies on classifying different PD types in insulation have been performed, some signal processing methods have not been used in the past for this application. Thus, in this work, Cepstrum analysis of PD signals combined with an artificial neural network (ANN) is proposed to classify PD types from different PD sources simultaneously under noisy conditions. Measurement data from different sources of artificial PD signals were recorded from insulation materials. Feature extraction was performed on the recorded signals, including Cepstrum analysis, the discrete wavelet transform, the discrete Fourier transform, and the wavelet packet transform, for comparison between the different methods. The extracted features were used to train the ANN. To investigate the classification accuracy under noisy signals, the remaining data were corrupted with artificial noise. The noisy data were classified using the ANN, which had been trained on noise-free PD signals. It is found that Cepstrum–ANN yields higher classification accuracy for noisy PD signals than the other methods tested. © 2016 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

Journal ArticleDOI
23 Sep 2017-Sensors
TL;DR: A monaural sound localization system based on a reflective structure around the microphone is designed, and a software toolchain spanning propagation physics and algorithm simulation realizes the optimal 3D-printed structure.
Abstract: The asymmetric structure around the receiver provides a particular time delay for each specific incoming propagation direction. This paper designs a monaural sound localization system based on a reflective structure around the microphone. The reflective plates are placed to present a direction-wise time delay, which is naturally processed by convolutional operation with the sound source. The received signal is separated to estimate the dominant time delay using homomorphic deconvolution, which applies the real cepstrum and inverse cepstrum sequentially to derive the propagation response's autocorrelation. Once the localization system accurately estimates this information, the time delay model computes the corresponding reflection for localization. Because of structural limitations, the localization process performs the estimation in two stages: range and angle. A software toolchain spanning propagation physics and algorithm simulation realizes the optimal 3D-printed structure. Acoustic experiments in the anechoic chamber show that 79.0% of the study range data from the isotropic signal is properly detected by the response value, and 87.5% of the specific direction data from the study range signal is properly estimated by the response time. The product of both rates gives an overall hit rate of 69.1%.
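The homomorphic-deconvolution step can be illustrated on a synthetic echo: the real cepstrum of a signal plus a delayed, attenuated copy shows a peak at the delay quefrency. This is a sketch under the assumption of a single dominant circular echo; the paper's two-stage range/angle pipeline is not reproduced.

```python
import numpy as np

def estimate_delay(x, min_lag=1):
    """Return the dominant echo lag as the peak quefrency of the real
    cepstrum (the homomorphic-deconvolution idea in a single step)."""
    spec = np.log(np.abs(np.fft.fft(x)) + 1e-12)
    ceps = np.fft.ifft(spec).real
    half = ceps[: len(ceps) // 2]                 # ignore mirrored part
    return min_lag + int(np.argmax(half[min_lag:]))
```

For an echo of relative amplitude a at lag D, the real cepstrum carries a peak of height roughly a/2 at quefrency D, which is what the search locates.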

Journal ArticleDOI
TL;DR: A musical-instrument classification method that couples the optimized phase-space reconstruction (OPSR) with a flexible neural tree (FNT) is proposed that gives superior performance over other comparable algorithms and can classify 12 musical instruments with an accuracy of 98.2 %.
Abstract: Traditional musical-instrument classification methods mainly use time- and/or frequency-domain characteristics, cepstrum characteristics, and MPEG-7 characteristics, and they often lead to erroneous classification. There is therefore a need for a method better suited to the nonlinear characteristics of musical-instrument signals that avoids these problems. In this paper, a musical-instrument classification method that couples optimized phase-space reconstruction (OPSR) with a flexible neural tree (FNT) is proposed. Following nonlinear dynamic theory, principal component analysis and the correlation coefficient are used to optimize the phase-space reconstruction (PSR) method. Multidimensional PSR results for different musical-instrument signals are extracted as the main components, and the dimensionality is reduced by the OPSR method. A probability density function (PDF) is introduced in the feature-extraction step to differentiate musical instruments according to their phase-space-reconstruction characteristics. An FNT is adopted as the classifier to handle the variability of musical-instrument signals and to improve adaptability across target classification problems. Experiments show that the proposed OPSR–PDF–FNT algorithm outperforms comparable algorithms and classifies 12 musical instruments with an accuracy of 98.2%.
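Phase-space reconstruction is typically implemented as a Takens delay embedding; a minimal sketch (the paper's PCA- and correlation-based optimization of the embedding is omitted, and the dimension and lag here are illustrative):

```python
def delay_embed(signal, dim, tau):
    """Phase-space reconstruction by delay embedding: each point is
    the vector (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(signal) - (dim - 1) * tau
    return [[signal[t + i * tau] for i in range(dim)] for t in range(n)]

x = [0, 1, 2, 3, 4, 5, 6, 7]
points = delay_embed(x, dim=3, tau=2)
# points[0] == [0, 2, 4]; four embedded points in total
```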

Patent
15 Dec 2017
TL;DR: In this paper, a Gammatone filterbank is used to simulate the characteristics of the basilar membrane of the human ear, voice signals are subjected to frequency-division processing, and the interaural cross-correlation delay is estimated in a reverberant environment.
Abstract: The invention provides a binaural time-delay estimation method based on frequency division and improved generalized cross-correlation in a reverberant environment, and relates to the field of sound source localization. A Gammatone filterbank is used to simulate the characteristics of the basilar membrane of the human ear, voice signals undergo frequency-division processing, and the interaural cross-correlation delay is estimated in the reverberant environment. Compared with the standard generalized cross-correlation delay estimation method, the proposed method estimates the time delay more accurately, making the sound source localization system more robust. The Gammatone filterbank splits the binaural signals into sub-bands, and each sub-band signal is transformed back to the time domain after cepstral dereverberation and pre-filtering. The left- and right-ear sub-band signals are then subjected to generalized cross-correlation, in which an improved phase-transform weight function is employed; the cross-correlation values of all sub-bands are summed, and the interaural time difference corresponding to the maximal cross-correlation value is obtained.
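The generalized cross-correlation step can be sketched with the standard PHAT weighting (the patent's Gammatone filterbank, per-sub-band summation, and improved weight function are not reproduced; this is a single-band baseline on an idealized shifted signal):

```python
import numpy as np

def gcc_phat_delay(x1, x2):
    """Estimate the delay of x2 relative to x1 (in samples) with the
    standard PHAT weighting; an improved weight function would replace
    the 1/|cross| term below."""
    n = len(x1)
    X1, X2 = np.fft.fft(x1), np.fft.fft(x2)
    cross = np.conj(X1) * X2
    gcc = np.fft.ifft(cross / (np.abs(cross) + 1e-12)).real
    lag = int(np.argmax(gcc))
    return lag if lag <= n // 2 else lag - n   # wrap negative lags

rng = np.random.default_rng(0)
x1 = rng.standard_normal(256)
x2 = np.roll(x1, 5)            # x2 is x1 delayed by 5 samples
print(gcc_phat_delay(x1, x2))  # → 5
```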

Journal ArticleDOI
TL;DR: A representation of the spectral characteristics of the vocal tract system in the form of the Hilbert envelope of the numerator of group delay (HNGD) spectrum is explored, and better performance is observed on the HNGD spectrum than on the DFT spectrum.
Abstract: This work explores the spectral characteristics of the vocal tract system to derive features for classifying clean speech versus speech with background music. A representation of these spectral characteristics in the form of the Hilbert envelope of the numerator of group delay (HNGD) spectrum is explored for the task. This representation complements existing methods of computing spectral characteristics in terms of temporal resolution. The HNGD spectrum has additive and high-resolution properties, giving a better representation of the formants, especially the higher ones. A feature known as the spectral contrast across sub-bands is extracted from the HNGD spectrum; it essentially represents the relative spectral characteristics of the vocal tract system. The vocal tract system is also represented approximately by mel frequency cepstral coefficients (MFCCs), which capture the average spectral characteristics. The MFCCs and the sum of the spectral contrast on the HNGD spectrum thus represent the average and relative spectral characteristics of the vocal tract system, respectively. These features complement each other and can be combined in a multidimensional framework to provide good discrimination between clean speech and speech with background music segments. The spectral contrast on the HNGD spectrum is compared with the spectral contrast on the discrete Fourier transform (DFT) spectrum, which also represents relative spectral characteristics, and better performance is achieved with the HNGD spectrum. The features are classified using Gaussian mixture models and support vector machines.
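Spectral contrast can be illustrated on a plain DFT spectrum, the paper's baseline (the HNGD spectrum computation itself is not reproduced; the band count and peak/valley fraction here are illustrative assumptions):

```python
import numpy as np

def spectral_contrast(frame, n_bands=4, alpha=0.2):
    """Per-sub-band spectral contrast: the mean of the top alpha-fraction
    (peaks) minus the mean of the bottom alpha-fraction (valleys) of the
    sorted log-magnitude spectrum in each band."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
    contrast = []
    for band in np.array_split(log_mag, n_bands):
        s = np.sort(band)
        k = max(1, int(alpha * len(band)))
        contrast.append(s[-k:].mean() - s[:k].mean())
    return np.array(contrast)

# A pure low-frequency tone: a strong peak in the lowest sub-band
frame = np.sin(2 * np.pi * 0.05 * np.arange(512))
c = spectral_contrast(frame)
```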

Proceedings ArticleDOI
01 Nov 2017
TL;DR: The results indicate that the combined MFCC feature with its SDC component, of size 52, provides the best results using GMM as the modeling technique, compared with the basic MFPLPC feature of size 13 using clustering as the modeling technique.
Abstract: Identifying the spoken language from speech is an emerging research area. For this language identification task, experiments are carried out with two approaches, vector quantization (VQ) based clustering and Gaussian mixture modelling (GMM), using mel frequency linear predictive cepstrum (MFPLPC), mel frequency cepstrum (MFCC), and their shifted delta cepstral (SDC) features. The hypothesized language is identified from the minimum of averaged distances and the maximum log-likelihood value of the corresponding model, using minimum-distance and maximum a posteriori probability (MAP) classifiers. Good performance is observed for the basic MFPLPC feature with VQ-based clustering. The results indicate that the MFCC feature combined with its SDC component, of size 52, provides the best results with GMM as the modeling technique. Similarly, the MFPLPC feature combined with its SDC component, of size 52, provides the next-best results, compared with the basic MFPLPC feature of size 13 with clustering as the modeling technique. The overall performance of the system is 99.81%. The database considered in this work contains speech utterances in seven classical and phonetically rich speaker-specific Indian languages: Bengali, Hindi, Kannada, Malayalam, Marathi, Tamil, and Telugu.