
Showing papers on "Cepstrum" published in 2008


Proceedings ArticleDOI
12 May 2008
TL;DR: A novel feature set for speaker recognition, based on the voice source signal, that is robust to LPC analysis errors and low-frequency phase distortion and compares favourably to other proposed voice source feature sets.
Abstract: We propose a novel feature set for speaker recognition that is based on the voice source signal. The feature extraction process uses closed-phase LPC analysis to estimate the vocal tract transfer function. The LPC spectrum envelope is converted to cepstrum coefficients which are used to derive the voice source features. Unlike approaches based on inverse-filtering, our procedure is robust to LPC analysis errors and low-frequency phase distortion. We have performed text-independent closed-set speaker identification experiments on the TIMIT and the YOHO databases using a standard Gaussian mixture model technique. Compared to using mel-frequency cepstrum coefficients alone, the misclassification rate for the TIMIT database was reduced from 1.51% to 0.16% when they were combined with the proposed voice source features. For the YOHO database the misclassification rate decreased from 13.79% to 10.07%. The new feature vector also compares favourably to other proposed voice source feature sets.

89 citations
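
The conversion from an LPC envelope to cepstrum coefficients mentioned in the abstract above is commonly done with a short recursion. The following is a minimal sketch of that standard recursion, not the paper's feature extraction. It assumes a unit-gain all-pole model H(z) = 1 / (1 - sum_k a_k z^-k); the function name `lpc_to_cepstrum` and the sign convention for the coefficients are illustrative assumptions.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of H(z) = 1 / (1 - sum_k a[k-1] * z**-k).

    Unit gain is assumed, so c_0 = 0 and is not returned.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] stays 0 (log of unit gain)
    for n in range(1, n_ceps + 1):
        # Direct term exists only while n is within the LPC order.
        acc = a[n - 1] if n <= p else 0.0
        # Sum over k with 1 <= k <= n-1 and 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Sanity check: for a single pole a, the closed form is c_n = a**n / n.
print(lpc_to_cepstrum([0.5], 3))
```

The single-pole case is a convenient check because the series expansion of -log(1 - a z^-1) gives the coefficients directly.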


Journal ArticleDOI
TL;DR: The proposed method is an extension of the homomorphic deconvolution, which is used here only to compute the initial estimate of the point-spread function, and gives stable results of clearly higher spatial resolution and better defined tissue structures than in the input images and than the results of the homomorphic deconvolution alone.
Abstract: A new approach to 2-D blind deconvolution of ultrasonic images in a Bayesian framework is presented. The radio-frequency image data are modeled as a convolution of the point-spread function and the tissue function, with additive white noise. The deconvolution algorithm is derived from statistical assumptions about the tissue function, the point-spread function, and the noise. It is solved as an iterative optimization problem. In each iteration, additional constraints are applied as a projection operator to further stabilize the process. The proposed method is an extension of the homomorphic deconvolution, which is used here only to compute the initial estimate of the point-spread function. Homomorphic deconvolution is based on the assumption that the point-spread function and the tissue function lie in different bands of the cepstrum domain, which is not completely true. This limiting constraint is relaxed in the subsequent iterative deconvolution. The deconvolution is applied globally to the complete radiofrequency image data. Thus, only the global part of the point-spread function is considered. This approach, together with the need for only a few iterations, makes the deconvolution potentially useful for real-time applications. Tests on phantom and clinical images have shown that the deconvolution gives stable results of clearly higher spatial resolution and better defined tissue structures than in the input images and than the results of the homomorphic deconvolution alone.

68 citations


Journal ArticleDOI
TL;DR: The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic music signal and automatic sound-to-MIDI conversion, and multipitch estimation accuracy is evaluated against manually annotated MIDI data.
Abstract: This paper introduces a new music signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with cepstrum which is the inverse Fourier transform of log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound have a common harmonic pattern, the sound spectrum can be regarded as a sum of linearly stretched common harmonic structures along frequency. In the log-frequency domain, it is formulated as the convolution of a common harmonic structure and the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic music signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic music signals and compared with manually annotated MIDI data.

60 citations
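
The deconvolution step described in the abstract above can be illustrated on synthetic data. This is a minimal sketch under simplifying assumptions: the spectrum is already sampled on a log-frequency axis, the convolution is circular, and the harmonic pattern `h` (its amplitudes and the 12-bins-per-octave grid) is hypothetical rather than taken from the paper. The iterative quasi-optimisation of the harmonic structure is not shown.

```python
import numpy as np


def specmurt_deconvolve(v, h):
    """Recover the fundamental-frequency distribution u from v = h (*) u.

    v: observed linear power spectrum on a log-frequency axis
    h: assumed common harmonic structure on the same axis (same length)
    With circular convolution, deconvolution is a division in the
    Fourier-transformed (specmurt) domain.
    """
    return np.real(np.fft.ifft(np.fft.fft(v) / np.fft.fft(h)))


bins_per_octave = 12
n_bins = 96

# Hypothetical harmonic pattern: harmonic k lies log2(k) octaves above f0.
h = np.zeros(n_bins)
for k, amp in zip([1, 2, 3, 4], [1.0, 0.4, 0.25, 0.15]):
    h[int(round(bins_per_octave * np.log2(k)))] += amp

# Two simultaneous tones at log-frequency bins 10 and 17.
u_true = np.zeros(n_bins)
u_true[10] = 1.0
u_true[17] = 0.6

# Observed spectrum: circular convolution of the pattern with u_true.
v = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(u_true)))

u_est = specmurt_deconvolve(v, h)
print(np.argsort(u_est)[-2:])  # indices of the two strongest fundamentals
```

The chosen amplitudes keep the transform of `h` away from zero, so the division is well conditioned; a real spectrum would need regularisation.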


Journal Article
TL;DR: An integrated system for ECG diagnosis is presented that combines a cepstrum coefficient method for feature extraction from long-term ECG signals with artificial neural network (ANN) models for classification.
Abstract: An integrated system for ECG diagnosis that combines a cepstrum coefficient method for feature extraction from long-term ECG signals and artificial neural network (ANN) models for the classification is presented in this paper. Unlike previous methods that use only a single heartbeat for analysis, we analyze a meaningful segment of ECG data, usually containing 5-6 heartbeats, to obtain the corresponding cepstrum coefficients and classify the cardiac systems through ANN models. Utilizing the proposed method, one can identify the characteristics hidden inside an ECG signal and then classify the signal as well as diagnose the abnormalities. To evaluate this method, various types of ECG data from the MIT/BIH database were used for verification. The experimental results showed that the accuracy of diagnosing cardiac disease was above 97.5%. The proposed method successfully extracted the corresponding feature vectors, distinguished the differences and classified ECG signals.

38 citations


01 Jan 2008
TL;DR: Algorithms for real-time pitch detection that generalise well over a range of single ‘voiced’ musical instruments and some novel ways of visualising the pitch and vibrato information are presented.
Abstract: Precise pitch is important to musicians. We created algorithms for real-time pitch detection that generalise well over a range of single ‘voiced’ musical instruments. A high pitch detection accuracy is achieved whilst maintaining a fast response using a special normalisation of the autocorrelation (SNAC) function and its windowed version, WSNAC. Incremental versions of these functions provide pitch values updated at every input sample. A robust octave detection is achieved through a modified cepstrum, utilising properties of human pitch perception and putting the pitch of the current frame within the context of its full note duration. The algorithms have been tested thoroughly both with synthetic waveforms and sounds from real instruments. A method for detecting note changes using only pitch is also presented. Furthermore, we describe a real-time method to determine vibrato parameters (higher-level information about pitch variations), including the envelopes of vibrato speed, height, phase and centre offset. Some novel ways of visualising the pitch and vibrato information are presented. Our project ‘Tartini’ provides music students, teachers, performers and researchers with new visual tools to help them learn their art, refine their technique and advance their fields.

37 citations
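
A normalised autocorrelation of the kind named in the abstract above can be sketched as follows. The normalisation used here (twice the lag product sum divided by the summed squares of the overlapping samples) is the form assumed for SNAC in this sketch; the peak-picking rule, the 0.9 threshold and the function names are illustrative assumptions. The windowed, incremental and cepstrum-based octave-detection parts are not shown.

```python
import numpy as np


def snac(x):
    """Normalised autocorrelation for lags 0 .. len(x)//2 - 1, in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    w = len(x)
    out = np.empty(w // 2)
    for tau in range(w // 2):
        a, b = x[: w - tau], x[tau:]          # overlapping samples at this lag
        denom = np.dot(a, a) + np.dot(b, b)
        out[tau] = 2.0 * np.dot(a, b) / denom if denom > 0 else 0.0
    return out


def estimate_pitch(x, fs, k=0.9):
    """Pick the first local maximum above k times the highest one."""
    n = snac(x)
    negative = np.flatnonzero(n < 0)
    if negative.size == 0:
        return None
    start = max(int(negative[0]), 1)          # skip the lag-0 lobe
    peaks = [t for t in range(start, len(n) - 1) if n[t - 1] < n[t] >= n[t + 1]]
    if not peaks:
        return None
    best = max(n[t] for t in peaks)
    for t in peaks:
        if n[t] >= k * best:
            return fs / t                      # integer-lag estimate, no interpolation
    return None


fs = 8000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
print(estimate_pitch(tone, fs))
```

Taking the first sufficiently high peak, rather than the global maximum, is what keeps the estimate from jumping to a multiple of the true period.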


Patent
02 Dec 2008
TL;DR: A method and apparatus for vibration-based automatic condition monitoring of a wind turbine, in which a set of vibration measurement values is determined, a frequency spectrum of those values and a cepstrum of that spectrum are calculated, at least one quefrency in the cepstrum is selected, and an alarm condition is detected based upon the amplitude of the cepstrum at the selected quefrency.
Abstract: Method and apparatus for vibration-based automatic condition monitoring of a wind turbine, comprising the steps of: determining a set of vibration measurement values of the wind turbine; calculating a frequency spectrum of the set of vibration measurement values; calculating a cepstrum of the frequency spectrum; selecting at least one quefrency in the cepstrum, and detecting an alarm condition based upon an amplitude of the cepstrum at the selected quefrency, and a wind turbine therefor.

31 citations
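
The sequence of steps in the abstract above (spectrum, cepstrum, selected quefrency, amplitude check) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the power-cepstrum definition, the threshold value, the quefrency bin and the synthetic "faulty" signal (noise plus a delayed copy of itself, which puts a periodic ripple in the log spectrum) are all hypothetical choices.

```python
import numpy as np


def power_cepstrum(x):
    """Inverse FFT of the log power spectrum of a real signal."""
    spectrum = np.fft.fft(x)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)  # small floor avoids log(0)
    return np.real(np.fft.ifft(log_power))


def cepstrum_alarm(vibration, quefrency_bin, threshold):
    """Return (alarm, amplitude) for the cepstrum at one selected quefrency."""
    amplitude = power_cepstrum(vibration)[quefrency_bin]
    return bool(amplitude > threshold), float(amplitude)


rng = np.random.default_rng(0)
healthy = rng.standard_normal(4096)
# Stand-in for a fault signature: a copy delayed by 50 samples.
faulty = healthy + 0.5 * np.roll(healthy, 50)

print(cepstrum_alarm(healthy, quefrency_bin=50, threshold=0.2))
print(cepstrum_alarm(faulty, quefrency_bin=50, threshold=0.2))
```

In practice the quefrency would be chosen from a known period of the monitored component, and the threshold from baseline measurements.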


Patent
18 Jun 2008
TL;DR: A method of identifying the fundamental frequency for detecting the cable force of a cable-stayed bridge, which obtains a first fundamental frequency from an autopower spectrum module and a second fundamental frequency from a cepstrum module, and then determines whether the quotient of the absolute difference of the two frequencies and half of their sum is less than or equal to a set threshold.
Abstract: The invention discloses a method of identifying the fundamental frequency for detecting the cable force of a cable-stayed bridge. A first fundamental frequency is obtained by an autopower spectrum module and a second fundamental frequency by a cepstrum module; the method then determines whether the quotient of the following two values is less than or equal to the set threshold: (1) the absolute value of the difference between the first fundamental frequency and the second fundamental frequency; (2) half of the sum of the first fundamental frequency and the second fundamental frequency. In this way, it is determined whether the fundamental frequency of the stay cable is half of the sum of the first fundamental frequency and the second fundamental frequency. By adopting the method, the accuracy and precision of the fundamental frequencies obtained are significantly improved. In addition, because the vibration acceleration response time interval signal acquired by an acceleration sensor first passes through a signal conditioning module for filtering and smoothing, the environmental noise in that signal is effectively suppressed and the anti-jamming capability is improved, so that the fundamental frequency is identified more clearly and accurately.

30 citations
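
The consistency check described in the abstract above reduces to a short calculation. The sketch below assumes the two frequency estimates are already available; the function name `fuse_fundamental`, the example values, and the choice to return `None` when the estimates disagree are illustrative assumptions, since the abstract does not say what happens in that case.

```python
def fuse_fundamental(f_spectrum, f_cepstrum, threshold):
    """Accept the mean of two estimates when their relative gap is small.

    f_spectrum: estimate from the autopower spectrum (Hz)
    f_cepstrum: estimate from the cepstrum (Hz)
    Returns the fused frequency, or None when the estimates disagree.
    """
    mean = 0.5 * (f_spectrum + f_cepstrum)          # half of the sum
    quotient = abs(f_spectrum - f_cepstrum) / mean  # relative difference
    return mean if quotient <= threshold else None


print(fuse_fundamental(1.00, 1.02, threshold=0.05))  # estimates agree
print(fuse_fundamental(1.00, 2.00, threshold=0.05))  # e.g. an octave error
```

Using the mean as the denominator makes the test symmetric in the two estimates, so neither module is treated as the reference.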


Journal ArticleDOI
TL;DR: The study demonstrates that the SIFT algorithm has the potential to be a reliable and robust method for the estimation of MTBS in the presence of a small signal-to-noise ratio, a large spacing variation between regular scatterers, and a large scattering strength ratio of diffuse scatterers to regular ones.
Abstract: Ultrasonic backscatter signals provide useful information relevant to bone tissue characterization. Trabecular bone microstructures have been considered as quasi-periodic tissues with a collection of regular and diffuse scatterers. This paper investigates the potential of a novel technique using a simplified inverse filter tracking (SIFT) algorithm to estimate mean trabecular bone spacing (MTBS) from ultrasonic backscatter signals. In contrast to other frequency-based methods, the SIFT algorithm is a time-based method and utilizes the amplitude and phase information of backscatter echoes, thus retaining the advantages of both the autocorrelation and the cepstral analysis techniques. The SIFT algorithm was applied to backscatter signals from simulations, phantoms, and bovine trabeculae in vitro. The estimated MTBS results were compared with those of the autoregressive (AR) cepstrum and quadratic transformation (QT). The SIFT estimates are better than the AR cepstrum estimates and are comparable with the QT values. The study demonstrates that the SIFT algorithm has the potential to be a reliable and robust method for the estimation of MTBS in the presence of a small signal-to-noise ratio, a large spacing variation between regular scatterers, and a large scattering strength ratio of diffuse scatterers to regular ones.

26 citations


Proceedings ArticleDOI
07 Jul 2008
TL;DR: Experimental results and theoretical verification show that the proposed steganalysis algorithm can steadily achieve a correct classification rate of 85%.
Abstract: Audio steganalysis has attracted more attention recently. Echo steganalysis is one of the most challenging research fields. In this paper, an effective steganalysis method based on statistical moments of peak frequency is proposed. Combined with power cepstrum, it statistically analyzes the peak frequency using short window extracting, and then calculates the eight high-order central moments of peak frequency as the feature vector. The SVM classifier is utilized in classification. All of the 1200 audio signals are trained and tested in our extensive experimental work. With 600 randomly selected audio signals for training and the remaining 600 audio signals for testing, and with various embedding parameter combinations such as hiding segment length, attenuation coefficient, and echo delay for hiding, the proposed steganalysis algorithm can steadily achieve a correct classification rate of 85%. Experimental results and theoretical verification show that this method is an effective method of audio echo steganalysis.

25 citations


Journal ArticleDOI
TL;DR: The sound analysis software PsySound3, which was written by the authors, is demonstrated, which includes a range of DSP‐based analysis techniques, as well as implementations of psychoacoustical algorithms often associated with sound quality.
Abstract: This paper demonstrates the sound analysis software PsySound3, which was written by the authors. The software currently includes a range of DSP‐based analysis techniques (e.g., spectrum, cepstrum, autocorrelation, Hilbert transform, sound level meter emulator), as well as implementations of psychoacoustical algorithms often associated with sound quality (e.g., loudness, sharpness, loudness fluctuation, roughness, pitch, binaural attributes). In some cases, PsySound3 makes available multiple models of the one auditory attribute ‐ for example it implements dynamic and static loudness models using Erb‐ and Bark‐based auditory filters. The program is extensible, and so has the potential to allow researchers to share their analysis models using a common interface. PsySound3 is written in Matlab, and is also available as a stand‐alone program. The software is freely available from www.psysound.org.

23 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: Experimental results show that the proposed audio watermarking scheme is not only imperceptible, but also robust against various common signal processing attacks such as noise adding, re-sampling, low-pass filtering, re-quantization, MP3 compression and cropping.
Abstract: A novel audio watermarking algorithm is proposed in this paper for audio copyright protection. The method is based on cepstrum domain transform. This algorithm embeds the watermark data into the original audio signal using mean quantization of cepstrum coefficients. Experimental results show that our audio watermarking scheme is not only imperceptible, but also robust against various common signal processing attacks such as noise adding, re-sampling, low-pass filtering, re-quantization, MP3 compression and cropping. In addition, the algorithm can extract the watermark without the help of the original audio signal, and its performance is better than that of the cepstrum domain audio watermarking scheme based on statistical mean manipulation.

Proceedings ArticleDOI
20 Jun 2008
TL;DR: An effective steganalysis method based on statistical moments of peak frequency is proposed. Combined with power cepstrum, it statistically analyzes the peak frequency using short window extracting, and then calculates the second order and third order central moments of the peak frequency as the feature vector.
Abstract: Audio steganalysis has attracted more attention recently. Echo steganalysis is one of the most challenging research fields. In this paper, an effective steganalysis method based on statistical moments of peak frequency is proposed. Combined with power cepstrum, it statistically analyzes the peak frequency using short window extracting, and then calculates the second order and third order central moments of the peak frequency as the feature vector. The Bayes classifier is utilized in classification. All of the 1200 audio signals are trained and tested in our extensive experimental work. With 600 randomly selected audio signals for training and the remaining 600 audio signals for testing, and with various embedding parameter combinations such as hiding segment length, attenuation coefficient, and echo delay for hiding, the proposed steganalysis algorithm can steadily achieve a correct classification rate of 80%, thus indicating significant advancement in steganalysis.

Journal ArticleDOI
TL;DR: This paper shows that the use of the cepstrum to determine the pitch of a signal does not work on periodic signals.
Abstract: In a paper by A. Michael Noll [J. Acoust. Soc. Am. 41, 293–309 (1967)], the use of the cepstrum was proposed to determine the pitch of a signal. This paper shows that such a method does not work on periodic signals.

Journal Article
TL;DR: Robust Speech Recognition Using Perceptual Wavelet Denoising and Mel-frequency Product Spectrum Cepstral Coefficient Features is presented.
Abstract: Robust Speech Recognition Using Perceptual Wavelet Denoising and Mel-frequency Product Spectrum Cepstral Coefficient Features

Proceedings ArticleDOI
08 Dec 2008
TL;DR: Compared with high performance feature extraction method MFCC (Mel-frequency cepstrum coefficient), the proposed Haar-like filtering can be approximately 85.77% efficient in terms of the amount of add and multiply calculations while capable of achieving the error rate of only 3.03% relative to MFCC.
Abstract: Haar-like filtering based speech detection is proposed as a new and very low calculation cost method for sensornet applications. The simple Haar-like filters having variable filter width and shift width are trained to learn appropriate filter parameters from the training samples to detect speech. Our method yielded speech/nonspeech classification accuracy of 96.93% for the input length of 0.1s. Compared with high performance feature extraction method MFCC (Mel-frequency cepstrum coefficient), the proposed Haar-like filtering can be approximately 85.77% efficient in terms of the amount of add and multiply calculations while capable of achieving the error rate of only 3.03% relative to MFCC.

Journal ArticleDOI
TL;DR: It is shown that the kepstrum approach contributes to speech enhancement and noise cancellation with an improved performance in signal-to-noise ratio (SNR) by using only two microphones with a small physical dimension in size.

01 Jan 2008
TL;DR: Results presented in this paper show that cepstral trajectories corresponding to lower (3-14 Hz) modulation frequencies provide best discrimination.
Abstract: This paper proposes using modulation cepstrum coefficients instead of cepstral coefficients for extracting metadata information such as age and gender. These coefficients are extracted by applying discrete cosine transform to a time-sequence of cepstral coefficients. Lower order coefficients of this transformation represent smooth cepstral trajectories over time. Results presented in this paper show that cepstral trajectories corresponding to lower (3-14 Hz) modulation frequencies provide best discrimination. The proposed system achieves 50.2% overall accuracy for this 7-class task while accuracy of human labelers on a subset of evaluation material used in this work is 54.7%.
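
The extraction step described in the abstract above (a discrete cosine transform applied along time to a sequence of cepstral coefficients) can be sketched as follows. The unnormalised DCT-II, the function names, and the 100 frames-per-second rate in the example are assumptions made for illustration; the paper's exact transform length, normalisation and frame rate are not given here.

```python
import numpy as np


def dct_ii_matrix(n_frames):
    """Unnormalised DCT-II basis: row k is cos(pi * k * (t + 0.5) / n_frames)."""
    t = np.arange(n_frames)
    k = np.arange(n_frames)[:, None]
    return np.cos(np.pi * k * (t + 0.5) / n_frames)


def modulation_cepstrum(cepstra):
    """DCT along time of a (n_frames, n_ceps) cepstral coefficient sequence."""
    return dct_ii_matrix(cepstra.shape[0]) @ cepstra


def modulation_frequency_hz(k, n_frames, frame_rate_hz):
    """Modulation frequency represented by DCT index k (k/2 cycles per block)."""
    return k * frame_rate_hz / (2.0 * n_frames)


# One cepstral coefficient whose trajectory follows DCT basis function 3.
n_frames = 50
frames = np.arange(n_frames)
trajectory = np.cos(np.pi * 3 * (frames + 0.5) / n_frames)[:, None]

mod = modulation_cepstrum(trajectory)
print(int(np.argmax(np.abs(mod[:, 0]))))
print(modulation_frequency_hz(3, n_frames, frame_rate_hz=100.0))
```

Low DCT indices capture slow, smooth trajectories, which is how a band such as 3-14 Hz can be selected by keeping only a range of indices.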

Proceedings Article
01 Jan 2008
TL;DR: An alternative approach to automatic pathology detection from speech, based on cepstral-domain processing, is presented; it does not require pitch estimation, giving a gain in both simplicity and robustness.
Abstract: The majority of speech signal analysis procedures for automatic pathology detection mostly rely on parameters extracted from time-domain processing. Moreover, calculation of these parameters often requires prior pitch period estimation; therefore, their validity heavily depends on the robustness of pitch detection. Within this paper, an alternative approach based on cepstral-domain processing is presented which has the advantage of not requiring pitch estimation, thus providing a gain in both simplicity and robustness. While the proposed scheme is similar to solutions based on Mel-frequency cepstral parameters, already present in literature, it has an easier physical interpretation while achieving similar performance standards.

Proceedings ArticleDOI
12 May 2008
TL;DR: The results demonstrate that the Mel frequency based true envelope estimator achieves superior envelope estimation with significantly reduced model order.
Abstract: In this work we consider the problem of spectral envelope estimation using spectra with perceptually warped frequency axis. The goal of this work is the reduction of the order of the spectral envelope model which will facilitate the use of these envelopes for training of voice conversion systems. We adapt the true-envelope estimator to Mel-frequency representations and adapt a recently proposed cepstral model order selection criterion taking into account the distortion of the frequency axis. We evaluate the modified order selection procedure using a perceptual framework for the evaluation of envelope estimation errors. The experimental evaluation carried out with real speech confirms our modifications. The results demonstrate that the Mel frequency based true envelope estimator achieves superior envelope estimation with significantly reduced model order.

Proceedings ArticleDOI
12 May 2008
TL;DR: It is proved that the direction of cepstrum vectors strongly depends on vocal tract length and that this dependency is represented as rotation in the n-dimensional cepstrum space.
Abstract: In this paper, we prove that the direction of cepstrum vectors strongly depends on vocal tract length and that this dependency is represented as rotation in the n-dimensional cepstrum space. In speech recognition studies, vocal tract length normalization (VTLN) techniques are widely used to cancel age- and gender-differences. In VTLN, a frequency warping is often carried out and it can be implemented as a linear transformation in a cepstrum space; c' = Ac. However, the geometric properties of this transformation matrix A have not been well discussed. In this study, its properties are made clear using n-dimensional geometry and it is shown that the matrix rotates any cepstrum vector similarly and apparently. Experimental results using resynthesized speech demonstrate that cepstrum vectors extracted from a speaker of 180 [cm] in height and those from another speaker of 120 [cm] in height are reasonably orthogonal. This result makes clear one of the reasons why children's speech is very difficult for conventional speech recognizers to deal with adequately.

Proceedings ArticleDOI
17 Mar 2008
TL;DR: To meet the problem of the energy consumption of sensor nodes and privacy concerns for wearers and non-wearers, "siglet" sensing is proposed and speech siglet detection is studied.
Abstract: "Business Microscope" is a tool which provides knowledge workers with a bird's-eye view of their daily communication. To meet the problem of the energy consumption of sensor nodes and privacy concerns for wearers and non-wearers, "siglet" sensing is proposed. Siglet sensing is a way to capture very short and noise-like signals by sensors operating on a low duty ratio. To extract the useful information on workers' communication, speech siglet detection is studied. The LBG trained speech and workplace nonspeech models with Mel frequency cepstrum coefficients (MFCCs) as feature vectors are utilized. A hierarchical pruning technique is studied to reduce the calculation cost of the matching process to nearly 25% and refine the classification accuracy. Our approach achieved average speech and nonspeech classification accuracy of 99.96% on 0.1 s long test siglets.

Proceedings ArticleDOI
18 Jun 2008
TL;DR: This scheme attempts to simulate a biological model using the averaged cepstrum, where human perception tends to pick up the areas of large cepstral changes.
Abstract: Driven by the demand of information retrieval, video editing and human-computer interface, in this paper we propose a novel spectral feature for music and speech discrimination. This scheme attempts to simulate a biological model using the averaged cepstrum, where human perception tends to pick up the areas of large cepstral changes. The cepstrum data that is away from the mean value will be exponentially reduced in magnitude. We conduct experiments of music/speech discrimination by comparing the performance of the proposed feature with that of previously proposed features in classification. The dynamic time warping based classification verifies that the proposed feature has the best quality of music/speech classification in the test database.

Journal ArticleDOI
TL;DR: Experimental results show that acoustic speech features can be predicted from MFCC vectors with good accuracy, and an alternative scheme that substitutes the higher-order MFCCs with acoustic features for transmission delivers accurate acoustic features but at the expense of a significant reduction in speech recognition accuracy.
Abstract: The aim of this work is to develop methods that enable acoustic speech features to be predicted from mel-frequency cepstral coefficient (MFCC) vectors as may be encountered in distributed speech recognition architectures. The work begins with a detailed analysis of the multiple correlation between acoustic speech features and MFCC vectors. This confirms the existence of correlation, which is found to be higher when measured within specific phonemes rather than globally across all speech sounds. The correlation analysis leads to the development of a statistical method of predicting acoustic speech features from MFCC vectors that utilizes a network of hidden Markov models (HMMs) to localize prediction to specific phonemes. Within each HMM, the joint density of acoustic features and MFCC vectors is modeled and used to make a maximum a posteriori prediction. Experimental results are presented across a range of conditions, such as with speaker-dependent, gender-dependent, and gender-independent constraints, and these show that acoustic speech features can be predicted from MFCC vectors with good accuracy. A comparison is also made against an alternative scheme that substitutes the higher-order MFCCs with acoustic features for transmission. This delivers accurate acoustic features but at the expense of a significant reduction in speech recognition accuracy.

Journal Article
TL;DR: A novel register array based low power FFT processor for Mel Frequency Cepstral Coefficient (MFCC) is proposed, which reduces power consumption and is very attractive for the speech feature extraction of MFCC.
Abstract: Fast Fourier Transform (FFT) plays an important role in the field of digital signal processing. High performance FFT processors are widely used in different applications, such as speech processing, image processing, and communication systems. In this paper, we propose a novel register array based low power FFT processor for Mel Frequency Cepstral Coefficient (MFCC). Compared with [9-12], this novel architecture consumes less power. This approach is very attractive for the speech feature extraction of MFCC.

Proceedings ArticleDOI
12 May 2008
TL;DR: Comparisons of the method previously proposed of estimating fundamental frequency (F0) based on complex cepstrum analysis with nine typical methods over huge speech-sound datasets in both artificial and realistic reverberant environments demonstrated that it was much better than the previously reported methods in terms of robustness and providing accurate F0 estimates.
Abstract: This paper reports comparative evaluations of the method we previously proposed of estimating fundamental frequency (F0) based on complex cepstrum analysis with nine typical methods over huge speech-sound datasets in both artificial and realistic reverberant environments (in room acoustics). They involve several classic algorithms (Cepstrum, AMDF, TPC, and modified autocorrelation) and a few modern algorithms (TEMPO, YIN, and PHIA). The comparative results revealed that the percentage correct rates of the estimated F0s using them were drastically reduced as the reverberation time increased while F0 estimated with the proposed method was completely robust and accurate. They also demonstrated that homomorphic analysis and the concept of a source-filter model were relatively effective for estimating F0. The results also demonstrated that it was much better than the previously reported methods in terms of robustness and providing accurate F0 estimates in both artificial and realistic reverberant environments.

Proceedings ArticleDOI
16 May 2008
TL;DR: In this article, the impulse response of the sounder in the cepstrum domain, which shows physical features of targets, can be obtained from the radiated-noise of targets using LPC and Mel Cepstrums.
Abstract: A passive sonar target can be regarded as a sounder from the viewpoint of sonar operators. The impulse response of the sounder in the cepstrum domain, which shows physical features of targets, can be obtained from the radiated noise of targets using LPC cepstrum and Mel cepstrum. A set of cepstrum-domain features is extracted based on the above two cepstrums of the impulse response. The neural network target classifier was designed using cepstrum-domain features. The classification experiments were carried out for three different kinds of targets based on practical data. The experimental results show that the feature extraction method based on the two cepstrums is useful.

Journal Article
TL;DR: A new method of feature extraction, viz., Teager Energy Based Mel Frequency Cepstral Coefficients (T-MFCC) is developed for identification of perceptually similar languages and an LID system is presented for Hindi and Urdu.
Abstract: Language Identification (LID) refers to the task of identifying an unknown language from the test utterances. In this paper, a new method of feature extraction, viz., Teager Energy Based Mel Frequency Cepstral Coefficients (T-MFCC) is developed for identification of perceptually similar languages. Finally, an LID system is presented for Hindi and Urdu (perceptually similar Indian languages) to demonstrate effectiveness of newly proposed feature set with short discussion on experimental results. Keywords: Language identification, Teager Energy Operator (TEO), Mel cepstrum, polynomial classifier, discriminative training.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: Experimental results indicate that the proposed set of noise-robust features based on conventional MFCC feature extraction method leads to improved ASR performance in noisy environments and its computational overhead is quite small.
Abstract: Mel-frequency cepstral coefficients (MFCC) are the most widely used features for speech recognition. However, MFCC-based speech recognition performance degrades in presence of additive noise. In this paper, we propose a set of noise-robust features based on conventional MFCC feature extraction method. Our proposed method consists of two steps. In the first step, mel sub-band Wiener filtering is carried out. The second step consists of estimating SNR in each sub-band and calculating the sub-band entropy by defining a weight parameter based on sub-band SNR to entropy ratio. The weighting has been carried out in a way that gives more important roles, in cepstrum parameter formation, to sub-bands that are less affected by noise. Experimental results indicate that this method leads to improved ASR performance in noisy environments. Furthermore, due to the simplicity of the implementation of our method, its computational overhead in comparison to MFCC is quite small.

Proceedings Article
01 Aug 2008
TL;DR: A new algorithm for spike detection has been developed: it applies a Cepstrum of Bispectrum (CoB) estimated inverse filter to provide blind equalization and can detect 99% of spike events with less than 1% false positives (insertions) from the extracellular signal at up to -10dB SNR.
Abstract: Signals from extracellular electrodes in neural systems record voltages resulting from activity in many neurons. Detecting action potentials (spikes) in a small number of specific (target) neurons is difficult because of poor SNR due to noise generated by the firing of neighbouring neurons. A new algorithm for spike detection has been developed: it applies a Cepstrum of Bispectrum (CoB) estimated inverse filter to provide blind equalization. This CoB based technique can detect 99% of spike events with less than 1% false positives (insertions) from the extracellular signal at up to −10dB SNR. We compare performance with four established techniques and report that the CoB based algorithm performs best.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: In this work, F-ratio is computed as a theoretical measure to validate the experimental results and also highlight the best choice of feature set among all the proposed features for 50 speakers chosen randomly from the "TIMIT" database.
Abstract: The main objective of this paper is to explore the effectiveness of features for identifying speakers. We propose features such as line spectral frequency (LSF), differential line spectral frequency (DLSF), mel frequency cepstral coefficients (MFCC), discrete cosine transform cepstrum (DCTC), perceptual linear predictive cepstrum (PLP) and mel frequency perceptual linear predictive cepstrum (MF-PLP). These features are captured and training models are developed by K-means clustering procedure. A speaker identification system is evaluated on noise-added test speech and the experimental results reveal the performance of the proposed algorithm in identifying speakers based on minimum distance between test features and clusters and also highlight the best choice of feature set among all the proposed features for 50 speakers chosen randomly from the "TIMIT" database. In this work, F-ratio is computed as a theoretical measure to validate the experimental results.