
Showing papers on "Linear predictive coding published in 2009"


Patent
10 Dec 2009
Abstract: A speech signal processing system comprises an audio processor (103) for providing a first signal representing an acoustic speech signal of a speaker. An EMG processor (109) provides a second signal which represents an electromyographic signal for the speaker, captured simultaneously with the acoustic speech signal. A speech processor (105) is arranged to process the first signal in response to the second signal to generate a modified speech signal. The processing may, for example, be beamforming, noise compensation, or speech encoding. Improved speech processing may be achieved, in particular in an acoustically noisy environment.

547 citations


Proceedings ArticleDOI
19 Apr 2009
TL;DR: This new codec, which exhibits consistently high quality for speech, music and mixed audio content, forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding.
Abstract: Traditionally, speech coding and audio coding were separate worlds. Based on different technical approaches and different assumptions about the source signal, neither of the two coding schemes could efficiently represent both speech and music at low bitrates. This paper presents a unified speech and audio codec, which efficiently combines techniques from both worlds. This results in a codec that exhibits consistently high quality for speech, music and mixed audio content. The paper gives an overview of the codec architecture and presents results of formal listening tests comparing this new codec with HE-AAC(v2) and AMR-WB+. This new codec forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding.

108 citations


Proceedings ArticleDOI
04 Feb 2009
TL;DR: A comparison among different structures of neural networks is conducted here for a better understanding of the problem and its possible solutions.
Abstract: This paper presents a Bangla speech recognition system. The system is divided into two major parts: the first is speech signal processing and the second is the speech pattern recognition technique. The speech processing stage consists of speech starting- and end-point detection, windowing, filtering, calculation of the Linear Predictive Coding (LPC) and cepstral coefficients, and finally construction of the codebook by vector quantization. The second part is a pattern recognition system using an Artificial Neural Network (ANN). Speech signals are recorded using an audio wave recorder in a normal room environment. The recorded speech signal is passed through the starting- and end-point detection algorithm to detect the presence of the speech signal and remove the silence and pause portions of the signal. The resulting signal is then filtered to remove unwanted background noise. The filtered signal is then windowed, ensuring half-frame overlap. After windowing, the LPC and cepstral coefficients of the speech signal are calculated. The feature extractor uses a standard LPC cepstrum coder, which converts the incoming speech signal into the LPC cepstrum feature space. A Self-Organizing Map (SOM) neural network maps each variable-length LPC trajectory of an isolated word onto a fixed-length LPC trajectory, thereby producing the fixed-length feature vector to be fed into the recognizer. The structure of the neural network is designed with a Multi-Layer Perceptron approach and tested with 3, 4, and 5 hidden layers using tanh sigmoid transfer functions. A comparison among different structures of neural networks is conducted here for a better understanding of the problem and its possible solutions.
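
The LPC-plus-cepstrum front end described above is a textbook pipeline. As a minimal illustration, here is a numpy sketch of the standard autocorrelation/Levinson-Durbin recursion and the usual LPC-to-cepstrum conversion (illustrative only, not code from the paper; frame is one windowed float array):

    import numpy as np

    def lpc(frame, order):
        # Autocorrelation method + Levinson-Durbin recursion.
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = np.zeros(order + 1)
        a[0], err = 1.0, r[0] + 1e-12
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
            a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
            err *= 1.0 - k * k
        return a, err  # A(z) coefficients (a[0] = 1) and residual energy

    def lpc_cepstrum(a, n_ceps):
        # Standard recursion from the LPC polynomial to cepstral coefficients.
        p, c = len(a) - 1, np.zeros(n_ceps + 1)
        for n in range(1, n_ceps + 1):
            c[n] = -(a[n] if n <= p else 0.0)
            for k in range(1, n):
                if n - k <= p:
                    c[n] -= (k / n) * c[k] * a[n - k]
        return c[1:]

Endpoint detection, windowing and vector quantization wrap around these two routines; the SOM-based trajectory normalization is a separate stage.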

92 citations


Patent
Nobuyuki Washio1
22 Dec 2009
TL;DR: In this article, an information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers, a second speech dataset storing speech data uttered by a plurality of speakers, and a third speech dataset storing speech data to be mixed with the speech data of the second dataset.
Abstract: An information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers; a second speech dataset storing speech data uttered by a plurality of speakers; a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset; a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset; a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.

81 citations


Dissertation
01 Jan 2009
TL;DR: A study of the implementation of a speech generative model whereby speech is synthesized and recovered from its MFCC representation; the spectral distance between the original speech signal and the one produced from the MFCC vectors has been computed.
Abstract: The classical front-end analysis in speech recognition is a spectral analysis which parametrizes the speech signal into feature vectors; the most popular set of these is the Mel Frequency Cepstral Coefficients (MFCC). They are based on a standard power spectrum estimate which is first subjected to a log-based transform of the frequency axis (mel-frequency scale), and then decorrelated by using a modified discrete cosine transform. Following a focused introduction to speech production, perception and analysis, this paper gives a study of the implementation of a speech generative model, whereby the speech is synthesized and recovered back from its MFCC representation. The work has been developed in two steps: first, the computation of the MFCC vectors from the source speech files by using the HTK software; and second, the implementation of the generative model itself, which represents the conversion chain from HTK-generated MFCC vectors to speech reconstruction. In order to assess the quality of the speech coding into feature vectors and to evaluate the generative model, the spectral distance between the original speech signal and the one produced from the MFCC vectors has been computed. For that, spectral models based on Linear Predictive Coding (LPC) analysis have been used. During the implementation of the generative model, results have been obtained in terms of the reconstruction of the spectral representation and the quality of the synthesized speech.
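
The MFCC chain summarized above (power spectrum, mel filterbank, log, DCT) is easy to sketch. A compact numpy version for a single pre-emphasized, windowed frame (parameter values are illustrative, not those of the HTK configuration used in the dissertation):

    import numpy as np
    from scipy.fft import dct

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mfcc(frame, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
        spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        # Triangular filters spaced uniformly on the mel scale.
        mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
        fbank = np.zeros((n_mels, n_fft // 2 + 1))
        for m in range(1, n_mels + 1):
            l, c, r = bins[m - 1], bins[m], bins[m + 1]
            fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
            fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
        # Log filterbank energies, decorrelated by a DCT.
        log_e = np.log(fbank @ spec + 1e-10)
        return dct(log_e, type=2, norm="ortho")[:n_ceps]

The generative model studied in the dissertation essentially inverts this chain, which is why an LPC-based spectral distance is a natural measure of reconstruction fidelity.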

49 citations


Proceedings ArticleDOI
24 Aug 2009
TL;DR: A measure based on the similarity between the time-varying spectral envelopes of target speech and system output, as measured by correlation, can provide a more meaningful evaluation measure for nonlinear speech enhancement systems, as well as providing a transparent objective function for the optimization of such systems.
Abstract: Applying a binary mask to a pure noise signal can result in speech that is highly intelligible, despite the absence of any of the target speech signal. Therefore, to estimate the intelligibility benefit of highly nonlinear speech enhancement techniques, we contend that SNR is not useful; instead we propose a measure based on the similarity between the time-varying spectral envelopes of target speech and system output, as measured by correlation. As with previous correlation-based intelligibility measures, our system can broadly match subjective intelligibility for a range of enhanced signals. Our system, however, is notably simpler and we explain the practical motivation behind each stage. This measure, freely available as a small Matlab implementation, can provide a more meaningful evaluation measure for nonlinear speech enhancement systems, as well as providing a transparent objective function for the optimization of such systems.
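
In the spirit of the proposed measure (the paper defines the exact processing stages), a toy "correlation between time-varying spectral envelopes" can be written as follows; frame sizes and the band count are arbitrary choices for illustration:

    import numpy as np

    def envelope_correlation(target, output, frame=256, hop=128, n_bands=16):
        # Short-time magnitude spectrogram with a Hann window.
        def spectrogram(x):
            w = np.hanning(frame)
            n = 1 + (len(x) - frame) // hop
            return np.array([np.abs(np.fft.rfft(w * x[i * hop:i * hop + frame]))
                             for i in range(n)])
        S_t, S_o = spectrogram(target), spectrogram(output)
        # Collapse FFT bins into coarse bands to obtain smooth envelopes.
        edges = np.linspace(0, S_t.shape[1], n_bands + 1).astype(int)
        rho = []
        for b in range(n_bands):
            et = S_t[:, edges[b]:edges[b + 1]].sum(axis=1)
            eo = S_o[:, edges[b]:edges[b + 1]].sum(axis=1)
            if et.std() > 0 and eo.std() > 0:
                rho.append(np.corrcoef(et, eo)[0, 1])
        return float(np.mean(rho))

Because only envelope similarity is scored, a binary-masked noise signal that tracks the target's envelopes rates well, exactly the case where SNR fails.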

43 citations


Journal ArticleDOI
TL;DR: The results show that residual features carry speaker-dependent information, and their combination with the LPCC or the MFCC shows global improvements in terms of robustness under different mismatches.

42 citations


Journal ArticleDOI
TL;DR: The proposed modulation scheme was compared to the regular frequency shift keying (FSK) method, and a performance improvement of ARDMA over FSK is observed at higher bit rates for the three GSM speech coders compared.

35 citations


Patent
12 Mar 2009
TL;DR: In this paper, the authors describe methods and apparatus for code excited linear prediction (CELP) audio encoding and decoding that employ linear predictive coding (LPC) synthesis filters controlled by LPC parameters.
Abstract: The invention relates to the coding of audio signals that may include both speech-like and non-speech-like signal components. It describes methods and apparatus for code excited linear prediction (CELP) audio encoding and decoding that employ linear predictive coding (LPC) synthesis filters controlled by LPC parameters, a plurality of codebooks each having codevectors, at least one codebook providing an excitation more appropriate for non-speech-like signals and at least one codebook providing an excitation more appropriate for speech-like signals, and a plurality of gain factors, each associated with a codebook. The encoding methods and apparatus select from the codebooks codevectors and/or associated gain factors by minimizing a measure of the difference between the audio signal and a reconstruction of the audio signal derived from the codebook excitations. The decoding methods and apparatus generate a reconstructed output signal from the LPC parameters, codevectors, and gain factors.
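
The selection rule in the last two sentences is the classic analysis-by-synthesis search. A deliberately naive sketch (exhaustive search, one scalar gain per codevector, no perceptual weighting; all names are ours):

    import numpy as np
    from scipy.signal import lfilter

    def select_excitation(target, lpc_a, codebooks):
        # lpc_a = [1, a1, ..., ap]; each codebook is a list of codevectors.
        best = (None, None, 0.0, np.inf)
        for cb_idx, cb in enumerate(codebooks):     # e.g. one codebook tuned
            for cv_idx, cv in enumerate(cb):        # for speech, one for music
                synth = lfilter([1.0], lpc_a, cv)   # 1/A(z) synthesis filter
                g = np.dot(target, synth) / (np.dot(synth, synth) + 1e-12)
                err = np.sum((target - g * synth) ** 2)
                if err < best[3]:
                    best = (cb_idx, cv_idx, g, err)
        return best  # chosen codebook, codevector index, gain, squared error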

34 citations


Proceedings ArticleDOI
24 Aug 2009
TL;DR: A simple condition on the speech spectrum level of every subband that maximizes the SII for a given noise spectrum level is derived and used to obtain a theoretical bound for the maximum achievable SII, as well as a new SII-optimized algorithm for near-end listening enhancement.
Abstract: Signal processing algorithms for near-end listening enhancement make it possible to improve the intelligibility of clean (far-end) speech for the near-end listener, who perceives not only the far-end speech but also ambient background noise. A typical scenario is mobile communication conducted in the presence of acoustic background noise such as traffic or babble noise.

28 citations


Proceedings Article
01 Aug 2009
TL;DR: The original DFT was replaced by the state-of-the-art MDCT transform, and the vector quantization by the combination of a scalar quantization and an evolved context-adaptive arithmetic coder, to enhance the coding efficiency of AMR-WB+ while maintaining its high flexibility.
Abstract: Coding audio material at low bit rates with consistent quality over a wide range of signals is a current and challenging problem. The high-granularity switched speech and audio coder AMR-WB+ performs especially well for speech and mixed content by promptly adapting its coding scheme to the signal. However, the high adaptation rate comes at the price of limited performance for non-speech signals. The aim of the paper is to enhance the coding efficiency of AMR-WB+ while maintaining its high flexibility. For this purpose, the original DFT was replaced by the state-of-the-art MDCT transform, and the vector quantization by the combination of a scalar quantization and an evolved context-adaptive arithmetic coder. The improvements were measured by both objective and subjective evaluations.

Journal ArticleDOI
TL;DR: The experimental results showed the GMM can achieve a better recognition rate with feature extraction using the FLPCS method, and it is suggested the GMM can complete training and identification in a very short time.
Abstract: In this paper, a frame linear predictive coding spectrum (FLPCS) technique for speaker identification is presented. Traditionally, linear predictive coding (LPC) has been applied in many speech recognition applications; in this study, a modification of LPC termed FLPCS is proposed for speaker identification. The analysis procedure consists of feature extraction and voice classification. In the feature extraction stage, the representative characteristics were extracted using the FLPCS technique. Through this approach, the size of a speaker's feature vector can be reduced while maintaining an acceptable recognition rate. In the classification stage, a general regression neural network (GRNN) and a Gaussian mixture model (GMM) were applied because of their rapid response and simplicity of implementation. In the experimental investigation, the performances of FLPCS coefficients of different orders, derived from the LPC spectrum, were compared with one another. Further, a capability analysis of the GRNN and the GMM is also described. The experimental results showed the GMM can achieve a better recognition rate with feature extraction using the FLPCS method. It is also suggested the GMM can complete training and identification in a very short time.
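
The GMM stage is standard and easy to reproduce. A compact sketch using scikit-learn (our tooling choice, not necessarily the authors'), where features maps each enrolled speaker to a matrix of FLPCS-style vectors:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_models(features, n_components=8):
        # features: dict of speaker id -> (n_frames, n_coeffs) array
        models = {}
        for spk, X in features.items():
            gmm = GaussianMixture(n_components=n_components,
                                  covariance_type="diag")
            models[spk] = gmm.fit(X)
        return models

    def identify(models, X_test):
        # Highest average log-likelihood over the test frames wins.
        return max(models, key=lambda spk: models[spk].score(X_test))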

Patent
20 Jul 2009
TL;DR: In this paper, a voice conversion rule is used to convert the voice quality of source speech into that of target speech using spectral parameters of the target speech; a speech waveform is then generated from the converted spectral parameters.
Abstract: A voice conversion apparatus stores, in a parameter memory, target speech spectral parameters of target speech; stores, in a voice conversion rule memory, a voice conversion rule for converting the voice quality of source speech into the voice quality of the target speech; extracts, from an input source speech, a source speech spectral parameter; converts the extracted source speech spectral parameter into a first conversion spectral parameter by using the voice conversion rule; selects a target speech spectral parameter similar to the first conversion spectral parameter from the parameter memory; generates an aperiodic component spectral parameter from the selected target speech spectral parameter; mixes a periodic component spectral parameter included in the first conversion spectral parameter with the aperiodic component spectral parameter to obtain a second conversion spectral parameter; and generates a speech waveform from the second conversion spectral parameter.

Proceedings ArticleDOI
08 Mar 2009
TL;DR: The experimental results show that the highest recognition rate that can be achieved by the system is 87%.
Abstract: This paper describes speech recognition implemented on an ATmega162 microcontroller. A word (voice command) in a speech signal is used to control the movement of a mobile robot; the robot moves according to the voice command in the speech signal. There are five commands, in Indonesian, used to control the movement of the mobile robot: “maju”, “mundur”, “kiri”, “kanan”, and “stop”, which command the mobile robot to move forward, move backward, turn left, turn right and stop, respectively. The methods used in this research are Linear Predictive Coding (LPC) and the Hidden Markov Model (HMM). LPC is used to extract word data from a speech signal. HMM is used to recognize the word pattern data extracted from a speech signal. The sampling rate of the speech signal is 8 kHz and the speech signal is sampled for 0.5 seconds. Experiments were done with several variations of the number of observation symbols and the number of samples. The experimental results show that the highest recognition rate that can be achieved by the system is 87%. The mobile robot can move in accordance with the voice command that is given to the robot.
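
Conceptually, the recognizer picks the command-word HMM with the highest likelihood for the observed LPC feature sequence. A desktop sketch using the hmmlearn package (our stand-in; the paper's implementation runs on the microcontroller itself):

    import numpy as np
    from hmmlearn import hmm

    def train_word_models(train_data, n_states=5):
        # train_data: dict of word ("maju", "mundur", "kiri", "kanan",
        # "stop") -> list of (n_frames, n_lpc) LPC feature arrays.
        models = {}
        for word, seqs in train_data.items():
            X, lengths = np.vstack(seqs), [len(s) for s in seqs]
            m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
            models[word] = m.fit(X, lengths)
        return models

    def recognize(models, lpc_seq):
        # The word model with the highest log-likelihood wins.
        return max(models, key=lambda w: models[w].score(lpc_seq))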

Proceedings ArticleDOI
Jianle Chen1, Woo-Jin Han1
07 Nov 2009
TL;DR: This paper investigates the linear prediction method for block-based lossy image coding and proposes a method that merges the linear prediction technique into the H.264/AVC video coding framework; experimental results show that the proposed technique improves coding efficiency.
Abstract: The linear prediction model has been well investigated and applied in lossless image and video coding. In this paper, we investigate the linear prediction method for block-based lossy image coding and propose a method that merges the linear prediction technique into the H.264/AVC video coding framework. A block-based linear prediction method is designed instead of a pixel-based one in order to cooperate with the transform module. Furthermore, line-based linear prediction with a 1D transform is developed by considering the coding gain tradeoff between prediction and transform. The linear prediction model coefficients are derived from neighboring reconstructed data with the least-squares-error method. The model coefficients implicitly embed the local texture characteristics, and no bit overhead is needed for signaling the coefficients since they can be derived by the same process at the decoder side. We insert block-based and line-based linear prediction modes into H.264/AVC as additional intra prediction modes and select the best mode in the minimum rate-distortion sense. Experimental results show that the proposed technique improves the coding efficiency of H.264/AVC intra pictures, with an average 4.3% bit saving and up to 7.0% bit saving.
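
The decoder-side derivation is the key trick: encoder and decoder fit the same predictor to already-reconstructed neighbors, so no coefficients are transmitted. A least-squares sketch (the context shape and training-region size are our illustrative choices; the block is assumed not to sit on the image border):

    import numpy as np

    def derive_predictor(recon, by, bx, size=4):
        # Fit weights on the causal (above/left) region next to block (by, bx).
        ctxs, vals = [], []
        for y in range(by - size, by):
            for x in range(bx - size, bx):
                ctxs.append([recon[y, x - 1], recon[y - 1, x],
                             recon[y - 1, x - 1], recon[y - 1, x + 1]])
                vals.append(recon[y, x])
        w, *_ = np.linalg.lstsq(np.array(ctxs, float),
                                np.array(vals, float), rcond=None)
        return w  # weights implicitly encode the local texture

    def predict_pixel(recon, y, x, w):
        ctx = np.array([recon[y, x - 1], recon[y - 1, x],
                        recon[y - 1, x - 1], recon[y - 1, x + 1]], float)
        return float(w @ ctx)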

Proceedings ArticleDOI
Jie Zhan1, Ki-hyun Choo1, Eunmi Oh1
19 Apr 2009
TL;DR: Subjective testing results show that the presented technology exhibits performance comparable to 3GPP AMR-WB+ at the same bit rate in the framework of China's Audio Video coding Standard (AVS) Part 10 - Mobile Speech and Audio Codec.
Abstract: We propose a new frequency-domain BandWidth Extension (BWE) technology. In the new technology, FFT-based frequency-domain gain shaping, combined with Linear Prediction Coding (LPC) based spectral envelope shaping, is used to generate the high-frequency signals. To preserve the amount of noise component in the reconstructed band, gain reduction controlled by a Spectrum Flatness Measurement (SFM) is employed. Subjective testing results show that the presented technology exhibits performance comparable to 3GPP AMR-WB+ at the same bit rate in the framework of China's Audio Video coding Standard (AVS) Part 10 - Mobile Speech and Audio Codec. This technology has been formally adopted as the artificial high-band coding module in AVS P10.
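
The SFM that controls the gain reduction is a standard quantity, the ratio of the geometric to the arithmetic mean of the power spectrum:

    import numpy as np

    def spectral_flatness(power_spectrum, eps=1e-12):
        p = np.asarray(power_spectrum) + eps
        return np.exp(np.mean(np.log(p))) / np.mean(p)

Values near 1 indicate a noise-like band and values near 0 a tonal one, which is what lets an SFM-driven gain preserve the noise component of the reconstructed band.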

Patent
26 Jan 2009
TL;DR: A pitch pre-processing procedure forms, from the input speech signal, a revised speech signal biased toward an ideal voiced and stationary characteristic, allowing the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater amount of the speech components of an input speech signal than would otherwise be possible.
Abstract: In accordance with one aspect of the invention, a selector supports the selection of a first encoding scheme or a second encoding scheme based upon the detection or absence of the triggering characteristic in the interval of the input speech signal. The first encoding scheme has a pitch pre-processing procedure for processing the input speech signal to form a revised speech signal biased toward an ideal voiced and stationary characteristic. The pre-processing procedure allows the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater amount of the speech components of an input speech signal than would otherwise be possible. In accordance with another aspect of the invention, the second encoding scheme entails a long-term prediction mode for encoding the pitch on a sub-frame by sub-frame basis. The long-term prediction mode is tailored to cases where the generally periodic component of the speech is not stationary, or less than completely periodic, and requires more frequent updates from the adaptive codebook to achieve a desired perceptual quality of the reproduced speech under a long-term predictive procedure.

Proceedings ArticleDOI
19 Apr 2009
TL;DR: In this work, accurate spectral envelope estimation is applied to Voice Conversion in order to achieve High-Quality timbre conversion and shows improved spectral conversion performance as well as increased converted-speech quality when compared to Linear Prediction.
Abstract: In this work, accurate spectral envelope estimation is applied to Voice Conversion in order to achieve high-quality timbre conversion. True-Envelope based estimators allow model order selection, leading to an adaptation of the spectral features to the characteristics of the speaker. Optimal residual signals can also be computed following a local adaptation of the model order in terms of the F0. A new perceptual criterion is proposed to measure the impact of the spectral conversion error. The proposed envelope models show improved spectral conversion performance as well as increased converted-speech quality when compared to Linear Prediction.

Proceedings ArticleDOI
30 Sep 2009
TL;DR: A Grassmannian prediction and predictive coding framework for delayed feedback systems is proposed that exploits the memory in the channel, and a prediction step size optimization criterion is derived for correlated time series evolving on the Grassmann manifold.
Abstract: Limited feedback in multiple antenna wireless systems is a practical technique for obtaining channel state information at the transmitter. When the channel is time-varying with memory, however, the selected codeword may become outdated before its use at the transmitter. To overcome this problem, we propose a Grassmannian prediction and predictive coding framework for delayed feedback systems that exploits the memory in the channel. A prediction step size optimization criterion for correlated time series evolving on the Grassmann manifold is derived. The proposed predictive coding framework uses optimized prediction to account for the feedback delay. Application to a delayed limited feedback multiuser multiple antenna system shows sum rate improvement and robustness to delay.

Proceedings ArticleDOI
19 Apr 2009
TL;DR: An analysis model is proposed that jointly finds the two predictors by adding a regularization term in the minimization process to impose sparsity constraints on a high order predictor, resulting in a linear predictor that can be easily factorized into the short-term and long-term predictors.
Abstract: In low bit-rate coders, the near-sample and far-sample redundancies of the speech signal are usually removed by a cascade of a short-term and a long-term linear predictor. These two predictors are usually found in a sequential, and therefore suboptimal, approach. In this paper we propose an analysis model that jointly finds the two predictors by adding a regularization term to the minimization process to impose sparsity constraints on a high-order predictor. The result is a linear predictor that can be easily factorized into the short-term and long-term predictors. This estimation method is then incorporated into an Algebraic Code Excited Linear Prediction scheme and is shown to perform better than traditional cascade methods and other joint optimization methods, offering lower distortion and higher perceptual speech quality.
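
The core idea, sparsity-regularized joint estimation, can be illustrated by fitting a single high-order predictor with an L1 penalty (scikit-learn's Lasso as a stand-in for the paper's regularized minimization; order and penalty weight are arbitrary):

    import numpy as np
    from sklearn.linear_model import Lasso

    def sparse_predictor(x, order=120, lam=0.01):
        # x: 1-D numpy array of speech samples.
        X = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
        y = x[order:]
        model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
        model.fit(X, y)
        return model.coef_  # coef_[k] is the predictor tap at lag k + 1

The few surviving taps separate into low-lag (short-term) taps and a cluster around the pitch lag (long-term), which is the factorization the paper exploits.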

Patent
27 Feb 2009
TL;DR: Voice activity detection in a low SNR environment is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal and increasing the difference in feature vectors between speech and non-speech.
Abstract: A voice activity detection method for low SNR environments. Voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal and increasing the difference in feature vectors between speech and non-speech, (i) using the long-term spectrum variation component feature or (ii) using a long-term spectrum variation component extraction and a harmonic structure feature extraction. The correct rate and accuracy rate of the voice activity detection are improved over conventional methods by using a long-term spectrum variation component having a window length longer than the average phoneme duration of an utterance in the speech signal. The voice activity detection system and method provide speech processing, automatic speech recognition, and speech output capable of very accurate voice activity detection.

Proceedings ArticleDOI
18 Sep 2009
TL;DR: A novel algorithm for speech endpoint detection based on the Hilbert-Huang transform is provided, and results show that the speech signal can be effectively detected by this algorithm at low signal-to-noise ratios.
Abstract: Speech endpoint detection in strong noise environments plays an important role in speech signal processing. The Hilbert-Huang Transform (HHT) is based on the local characteristics of signals and is an adaptive and efficient transformation method. It is particularly suitable for analyzing non-linear and non-stationary signals such as speech. In this paper, we consider noisy speech signals whose signal-to-noise ratio is negative. A novel algorithm for speech endpoint detection based on the Hilbert-Huang transform is provided after analyzing the noisy speech signal. The signal is first decomposed by Empirical Mode Decomposition (EMD), and part of the decomposition results are processed by the Hilbert transform. The noise threshold is estimated by analyzing the front portion of the signal's Hilbert amplitude spectrum. The speech segments and non-speech segments can then be distinguished using the threshold and the whole signal's Hilbert amplitude spectrum. Simulation results show that the speech signal can be effectively detected by this algorithm at low signal-to-noise ratios.
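
A bare-bones version of such a detector, using the third-party PyEMD package and scipy's Hilbert transform (the IMF selection and thresholding here are our simplifications, not the paper's):

    import numpy as np
    from scipy.signal import hilbert
    from PyEMD import EMD  # third-party package, our tooling choice

    def endpoint_mask(x, sr, noise_head=0.1, factor=3.0):
        imfs = EMD().emd(x)
        # Keep a few mid-order IMFs; the first is dominated by wideband noise.
        band = imfs[1:4].sum(axis=0) if len(imfs) > 3 else imfs.sum(axis=0)
        amp = np.abs(hilbert(band))
        # Threshold from the (assumed speech-free) head of the recording.
        head = amp[: int(noise_head * sr)]
        threshold = head.mean() + factor * head.std()
        return amp > threshold  # True where speech is declared present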

Patent
Jong-Hoon Jeong1, Geon-Hyoung Lee1, Chul-woo Lee1, Nam-Suk Lee1, Han-gil Moon1 
29 Jan 2009
TL;DR: A method and apparatus for encoding or decoding an audio signal by adaptively interpolating a linear predictive coding (LPC) coefficient are presented; the interpolation is performed selectively, depending on whether a transient section is present in the current frame, thereby preventing noise when interpolating LPC coefficients in the transient section.
Abstract: Provided are a method and apparatus for encoding or decoding an audio signal by adaptively interpolating a linear predictive coding (LPC) coefficient. In the method and apparatus of encoding or decoding an audio signal, LPC coefficient interpolation is selectively performed depending on whether a transient section is present in a current frame, thereby preventing noise from occurring when interpolating LPC coefficients in the transient section.

Proceedings ArticleDOI
24 Aug 2009
TL;DR: This paper evaluates the performance of four established state-of-the-art algorithms for pitch estimation in additive noise and reverberation and shows how accurate estimation of the pitch of a speech signal can influence objective speech quality measurement algorithms.
Abstract: Pitch estimation has a central role in many speech processing applications. In voiced speech, pitch can be objectively defined as the rate of vibration of the vocal folds. However, pitch is an inherently subjective quantity and cannot be directly measured from the speech signal. It is a nonlinear function of the signal's spectral and temporal energy distribution. A number of methods for pitch estimation have been developed but none can claim to work accurately in the presence of high levels of additive noise or reverberation. Any system of practical importance must be robust to additive noise and reverberation as these are encountered frequently in the field of operation of voice telecommunications systems. In non-intrusive speech quality measurement algorithms, such as the P.563 and LCQA, pitch is used as a feature for quality assessment. The accuracy of this feature in noisy speech signals will be shown to correlate with the accuracy of the objective measure of the quality of the speech signal. In this paper we evaluate the performance of four established state-of-the-art algorithms for pitch estimation in additive noise and reverberation. Furthermore, we show how accurate estimation of the pitch of a speech signal can influence objective speech quality measurement algorithms.
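
For reference, the kind of baseline these algorithms improve on is the plain autocorrelation estimator below, which degrades quickly in noise and reverberation:

    import numpy as np

    def pitch_autocorr(frame, sr, fmin=60.0, fmax=400.0):
        # Largest autocorrelation peak within the plausible lag range.
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(sr / fmax), int(sr / fmin)
        lag = lo + int(np.argmax(ac[lo:hi]))
        return sr / lag  # pitch estimate in Hz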

Proceedings ArticleDOI
15 Jun 2009
TL;DR: This paper describes how the selection of parameters for the variance fractal dimension (VFD) multiscale time-domain algorithm can create an amplification of the fractal Dimension trajectory that is obtained for a natural-speech waveform in the presence of ambient noise.
Abstract: This paper describes how the selection of parameters for the variance fractal dimension (VFD) multiscale time-domain algorithm can create an amplification of the fractal dimension trajectory that is obtained for a natural-speech waveform in the presence of ambient noise. The technique is based on the variance fractal dimension trajectory (VFDT) algorithm that is used not only to detect the external boundaries of an utterance, but also its internal pauses representing the unvoiced speech. The VFDT algorithm can also amplify internal features of phonemes. This fractal feature amplification is accomplished when the time increments are selected in a dyadic manner rather than selecting the increments in a unit distance sequence. These amplified trajectories for different phonemes are more distinct, thus providing a better characterization of the individual segments in the speech signal. This approach is superior to other energy-based boundary-detection techniques. These observations are based on extensive experimental results on speech utterances digitized at 44.1 kilosamples per second, with 16 bits in each sample.
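
The VFD computation itself is compact; the dyadic time increments the paper highlights appear below as lags of 2^k samples (a sketch of the standard variance fractal dimension estimate, not the authors' exact implementation):

    import numpy as np

    def variance_fractal_dimension(x, n_scales=6):
        # Slope of log Var[x(t+dt) - x(t)] versus log dt equals 2H;
        # the trajectory dimension is D = 2 - H.
        log_dt, log_var = [], []
        for k in range(1, n_scales + 1):
            dt = 2 ** k  # dyadic increments rather than unit steps
            inc = x[dt:] - x[:-dt]
            log_dt.append(np.log(dt))
            log_var.append(np.log(np.var(inc) + 1e-20))
        slope = np.polyfit(log_dt, log_var, 1)[0]
        return 2.0 - slope / 2.0

Sliding this estimate over successive windows yields the trajectory (VFDT) whose amplified phoneme features the paper analyzes.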

Journal ArticleDOI
TL;DR: A method for exploiting both the temporal and spatial redundancy, typical of multidimensional biomedical signals, has been proposed and proved to be superior to previous coding schemes.
Abstract: In this paper, we propose a model-based lossy coding technique for biomedical signals in multiple dimensions. The method is based on the codebook-excited linear prediction approach and models signals as filtered noise. The filter models short-term redundancy in time; the shape of the power spectrum of the signal and the residual noise, quantized using an algebraic codebook, is used for reconstruction of the waveforms. In addition to temporal redundancy, redundancy in the coding of the filter and residual noise across spatially related signals is also exploited, yielding better compression performance in terms of SNR for a given bit rate. The proposed coding technique was tested on sets of multichannel electromyography (EMG) and EEG signals as representative examples. For 2-D EMG recordings of 56 signals, the coding technique resulted in SNR greater than 3.4 ± 1.3 dB with respect to independent coding of the signals in the grid when the compression ratio was 89%. For EEG recordings of 15 signals and the same compression ratio as for EMG, the average gain in SNR was 2.4 ± 0.1 dB. In conclusion, a method for exploiting both the temporal and spatial redundancy, typical of multidimensional biomedical signals, has been proposed and proved to be superior to previous coding schemes.

Patent
Eun-Mi Oh1, Jung-Hoe Kim1, Ki-Hyun Choo1, Ho-Sang Sung1, Miyoung Kim1 
14 Jul 2009
TL;DR: In this paper, a method and apparatus to encode and decode an audio/speech signal is described, where an inputted audio signal or speech signal may be transformed into at least one of a high frequency resolution signal and a high temporal resolution signal.
Abstract: A method and apparatus to encode and decode an audio/speech signal are provided. An input audio signal or speech signal may be transformed into at least one of a high frequency resolution signal and a high temporal resolution signal. The signal may be encoded by determining an appropriate resolution; the encoded signal may be decoded; and thus the audio signal, the speech signal, and a mixed signal of the audio signal and the speech signal may be processed.

Proceedings ArticleDOI
19 Apr 2009
TL;DR: A previously suggested time-varying quasi-harmonic model is extended in order to estimate the chirp rate of each sinusoidal component, thus successfully tracking fast variations in frequency and amplitude.
Abstract: The speech signal is usually considered stationary during short analysis time intervals. Though this assumption may be sufficient in some applications, it is not valid for high-resolution speech analysis or for applications such as speech transformation and objective voice function assessment for the detection of voice disorders. In speech, there are non-stationary components, for instance time-varying amplitudes and frequencies, which may change quickly over short time intervals. In this paper, a previously suggested time-varying quasi-harmonic model is extended in order to estimate the chirp rate of each sinusoidal component, thus successfully tracking fast variations in frequency and amplitude. The parameters of the model are estimated through linear Least Squares, and the model accuracy is evaluated on synthetic chirp signals. Experiments on speech signals indicate that the new model is able to efficiently estimate the signal component chirp rates, providing means to develop more accurate speech models for high-quality speech transformations.

Proceedings ArticleDOI
04 Dec 2009
TL;DR: Results show that modified speech created by amplifying transient speech and adding it to original speech has higher percent word recognition scores than original speech in the presence of background noise.
Abstract: Speech transients have been shown to be important cues for identifying and discriminating speech sounds. We previously described a wavelet packet-based method for extracting transient speech (Rasetshwane et al. WASPAA 2007, pp. 179–182). The algorithm uses a “transitivity function” to characterize the rate of change of wavelet coefficients, and it can be implemented in real-time to process continuous speech. Psycho-acoustic experiments to select parameters for and to evaluate this method are presented. Results show that modified speech created by amplifying transient speech and adding it to original speech has higher percent word recognition scores than original speech in the presence of background noise.

Patent
10 Dec 2009
TL;DR: In this paper, the authors proposed a method of regenerating wideband speech from narrowband speech using a modulation signal adapted to upshift each frequency in the first range of frequencies by an amount determined by the modulating frequency.
Abstract: A method of regenerating wideband speech from narrowband speech, the method comprising: receiving samples of a narrowband speech signal in a first range of frequencies; modulating received samples of the narrowband speech signal with a modulation signal having a modulating frequency adapted to upshift each frequency in the first range of frequencies by an amount determined by the modulating frequency, wherein the modulating frequency is selected to translate a selected frequency band within the first range of frequencies into a target band; filtering the modulated samples using a target band filter to form a regenerated speech signal in the target band; and combining the narrowband speech signal with the regenerated speech signal in the target band to regenerate a wideband speech signal, the method comprising the step of controlling the modulated samples to lie in a second range of frequencies identified by determining a signal characteristic of frequencies in the first range of frequencies.
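
The modulation-plus-filtering core of the claim can be sketched directly; this assumes the narrowband signal nb has already been resampled to the wideband rate sr_wide, and the filter order and band edges (e.g. target_band = (4000.0, 7000.0)) are illustrative:

    import numpy as np
    from scipy.signal import butter, sosfilt

    def regenerate_wideband(nb, sr_wide, f_mod, target_band):
        # Cosine modulation shifts the selected band up by f_mod.
        t = np.arange(len(nb)) / sr_wide
        shifted = nb * np.cos(2.0 * np.pi * f_mod * t)
        # The target-band filter keeps only the regenerated high band.
        sos = butter(6, target_band, btype="bandpass", fs=sr_wide,
                     output="sos")
        highband = sosfilt(sos, shifted)
        return nb + highband  # wideband = narrowband + regenerated band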