Showing papers by "Andreas Spanias" published in 1997


Proceedings ArticleDOI
02 Jul 1997
TL;DR: This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, covering both research and standardization activities, including the ISO/MPEG family and the Dolby AC-3 algorithms.
Abstract: Considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. First, psychoacoustic principles are described with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Then, we review methodologies which achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms which manipulate transform components and subband signal decompositions. The discussion concentrates on architectures and applications of those techniques which utilize psychoacoustic models to exploit efficiently masking characteristics of the human receiver. Several algorithms which have become international and/or commercial standards are also presented, including the ISO/MPEG family and the Dolby AC-3 algorithms. The paper concludes with a brief discussion of future research directions.

92 citations


Proceedings Article
01 Jan 1997
TL;DR: In this paper, a review of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio signals is presented, including algorithms which manipulate transform components and subband signal decompositions.
Abstract: Considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. First, psychoacoustic principles are described with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Then, we review methodologies which achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms which manipulate transform components and subband signal decompositions. The discussion concentrates on architectures and applications of those techniques which utilize psychoacoustic models to exploit efficiently masking characteristics of the human receiver. Several algorithms which have become international and/or commercial standards are also presented, including the ISO/MPEG family and the Dolby AC-3 algorithms. The paper concludes with a brief discussion of future research directions.

77 citations


Proceedings ArticleDOI
21 Apr 1997
TL;DR: The hidden Markov model (HMM) based minimum mean square error (MMSE) estimator is extended to incorporate a ternary voicing state and applied to a harmonic representation of voiced speech, which improves noise reduction during voiced sounds.
Abstract: This paper describes a technique for reduction of non-stationary noise in electronic voice communication systems. Removal of noise is needed in many such systems, particularly those deployed in harsh mobile or otherwise dynamic acoustic environments. The proposed method employs state-based statistical models of both speech and noise, and is thus capable of tracking variations in noise during sustained speech. This work extends the hidden Markov model (HMM) based minimum mean square error (MMSE) estimator to incorporate a ternary voicing state, and applies it to a harmonic representation of voiced speech. Noise reduction during voiced sounds is thereby improved. Performance is evaluated using speech and noise from standard databases. The extended algorithm is demonstrated to improve speech quality as measured by informal preference tests and objective measures, to preserve speech intelligibility as measured by informal diagnostic rhyme tests, and to improve the performance of a low bit-rate speech coder and a speech recognition system when used as a pre-processor.

19 citations


Journal ArticleDOI
TL;DR: In this article, an approximate harmonic representation is used wherein voiced speech is represented by a set of sine waves at multiples of the fundamental frequency and several additional components at frequencies near each harmonic.
Abstract: A procedure for estimating the parameters of a sinusoidal model from speech corrupted by additive noise is described. An approximate harmonic representation is used wherein voiced speech is represented by a set of sine waves at multiples of the fundamental frequency and several additional components at frequencies near each harmonic. Amplitudes and phases of the sinusoidal components are estimated using a state-based technique that employs hidden Markov models (HMMs) to classify speech and noise spectra. Voicing and fundamental frequency are determined using an analysis-by-synthesis approach. Simulation results are presented, comparing the performance of the proposed algorithm to that of the standard HMM-based minimum mean square error (MMSE) estimator. The proposed method was found to reduce the structured residual noise associated with HMM-based algorithms.

10 citations
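
To make the harmonic representation in the abstract above concrete, here is a minimal sketch of synthesizing one voiced frame as a sum of sine waves at integer multiples of the fundamental frequency. It is an illustration only, not the paper's estimator: the function name `synthesize_harmonic_frame`, the parameter names, and the example values (100 Hz fundamental, 8 kHz sampling rate, three harmonics) are all assumptions, and the additional components near each harmonic that the abstract mentions are left out.

```python
import math


def synthesize_harmonic_frame(f0_hz, amplitudes, phases, sample_rate_hz, num_samples):
    """Return one frame built from cosines at k * f0_hz, k = 1, 2, ...

    amplitudes[k-1] and phases[k-1] belong to the k-th harmonic.
    All names here are hypothetical, chosen for illustration.
    """
    frame = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # time of sample n in seconds
        sample = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            sample += amp * math.cos(2.0 * math.pi * k * f0_hz * t + phase)
        frame.append(sample)
    return frame


# Assumed example values: 20 ms frame at 8 kHz with a 100 Hz fundamental.
frame = synthesize_harmonic_frame(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 8000.0, 160)
```

With a 100 Hz fundamental sampled at 8 kHz, the frame repeats every 80 samples, which is the periodicity a purely harmonic model imposes on voiced speech.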


Proceedings ArticleDOI
21 Apr 1997
TL;DR: Performance analysis on a large speech database indicates considerable improvement in temporal and spectral matching between the original and reconstructed signals as compared to other sinusoidal phase models as well as improved subjective quality of the reproduced speech.
Abstract: A new phase modeling algorithm for sinusoidal analysis and synthesis of speech signals is presented. Short-time sinusoidal phases are efficiently approximated by incorporating linear prediction, spectral sampling, delay compensation, and phase correction techniques. The algorithm differs from phase compensation methods proposed for multi-pulse LPC in that it has been tailored to sinusoidal transform coding of speech signals. Performance analysis on a large speech database indicates considerable improvement in temporal and spectral matching between the original and reconstructed signals as compared to other sinusoidal phase models, as well as improved subjective quality of the reproduced speech.

3 citations


Journal ArticleDOI
TL;DR: The subspace approach is used as a preprocessing step in a hidden Markov model (HMM) based system to enhance discrimination of acoustically similar pairs of words and is compared with standard linear discriminant analysis techniques and shown to yield as much as 4.5% improvement.
Abstract: This paper describes the use of the divergence measure as a criterion for finding a transformation matrix which will map the original speech observations onto a subspace with more discriminative ability than the original. A gradient-based algorithm is also proposed to compute the transformation matrix efficiently. The subspace approach is used as a preprocessing step in a hidden Markov model (HMM) based system to enhance discrimination of acoustically similar pairs of words. This approach is compared with standard linear discriminant analysis (LDA) techniques and shown to yield as much as 4.5% improvement. The subspace approach is also applied successfully to a more general recognition problem, i.e., discrimination of K confusable words, using the average divergence measure.

1 citation
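
As a rough illustration of the divergence criterion named in the abstract above, the sketch below computes the symmetric divergence between two classes under the assumption that each class is Gaussian with a diagonal covariance. That assumption, the function name `gaussian_divergence_diag`, and the example numbers are not taken from the paper; the paper's transformation matrix and gradient-based search are not shown.

```python
def gaussian_divergence_diag(mean1, var1, mean2, var2):
    """Symmetric divergence J = KL(p1||p2) + KL(p2||p1) for two Gaussians
    with diagonal covariances (assumed model, for illustration only).

    Per dimension: 0.5 * [(v1 - v2) * (1/v2 - 1/v1) + (1/v1 + 1/v2) * (m1 - m2)^2]
    """
    total = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        total += (v1 - v2) * (1.0 / v2 - 1.0 / v1)       # covariance mismatch term
        total += (1.0 / v1 + 1.0 / v2) * (m1 - m2) ** 2  # mean separation term
    return 0.5 * total


# Hypothetical one-dimensional classes with unit variance and means 0 and 2.
separated = gaussian_divergence_diag([0.0], [1.0], [2.0], [1.0])
identical = gaussian_divergence_diag([0.0], [1.0], [0.0], [1.0])
```

A larger value means the two classes are easier to tell apart, which is why a projection that increases this quantity is attractive for separating acoustically similar words.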


Proceedings ArticleDOI
S. Ahmadi, Andreas Spanias
17 Dec 1997
TL;DR: Performance analysis on a large speech database indicates that the use of the proposed algorithms resulted in considerable improvement in temporal and spectral signal matching, as well as improved subjective quality of the reproduced speech.
Abstract: This paper addresses the design, development, evaluation, and implementation of efficient low bit rate speech coding algorithms based on the sinusoidal model. A series of algorithms have been developed for pitch frequency determination and voicing detection, simultaneous modeling of the sinusoidal amplitudes and phases, and mid-frame interpolation. An improved sinusoidal phase matching algorithm is presented, where short-time sinusoidal phases are approximated using an elaborate combination of linear prediction, spectral sampling, delay compensation, and phase correction techniques. A voicing-dependent perceptual split vector quantization scheme is used to encode the sinusoidal amplitudes. The perceptual properties of the human auditory system are effectively exploited in the developed algorithms. The algorithms have been successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder has been evaluated in terms of subjective tests such as the mean opinion score and the diagnostic rhyme test, as well as some perceptually-motivated objective distortion measures. Performance analysis on a large speech database indicates that the use of the proposed algorithms resulted in considerable improvement in temporal and spectral signal matching, as well as improved subjective quality of the reproduced speech.