Showing papers by "Andreas Spanias" published in 1997

Open Access

Proceedings Article

A review of algorithms for perceptual coding of digital audio signals

T. Painter¹, Andreas Spanias¹

02 Jul 1997

TL;DR: Algorithms for perceptually transparent coding of CD-quality digital audio are reviewed, covering both research and standardization activities and including the ISO/MPEG family and the Dolby AC-3 algorithms.

Abstract: Considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. First, psychoacoustic principles are described with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Then, we review methodologies which achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms which manipulate transform components and subband signal decompositions. The discussion concentrates on architectures and applications of those techniques which utilize psychoacoustic models to exploit efficiently masking characteristics of the human receiver. Several algorithms which have become international and/or commercial standards are also presented, including the ISO/MPEG family and the Dolby AC-3 algorithms. The paper concludes with a brief discussion of future research directions.
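Psychoacoustic principles of the kind reviewed here are usually expressed through two standard curves: the absolute threshold of hearing in quiet and the mapping from frequency to critical-band rate (Bark). The Python sketch below uses the widely cited Terhardt approximation for the threshold and Zwicker's arctangent approximation for the Bark scale. These formulas are common in the perceptual coding literature but are an assumption here, not taken from this listing, and the function names are hypothetical.

```python
import math

def absolute_threshold_db(f_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula).

    Assumes f_hz > 0; the formula diverges at 0 Hz.
    """
    f = f_hz / 1000.0  # work in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def hz_to_bark(f_hz: float) -> float:
    """Map frequency in Hz to critical-band rate in Bark (Zwicker's approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

if __name__ == "__main__":
    for f in (100.0, 1000.0, 3300.0, 10000.0):
        print(f"{f:>8.0f} Hz: {hz_to_bark(f):5.2f} Bark, "
              f"threshold {absolute_threshold_db(f):6.2f} dB SPL")
```

The threshold curve dips below 0 dB SPL near 3 to 4 kHz, where hearing is most sensitive, so quantization noise there must be kept lowest.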

92 citations

Proceedings Article

Review of algorithms for perceptual coding of digital audio signals

Ted Painter¹, Andreas Spanias

Arizona State University¹

01 Jan 1997

TL;DR: In this paper, a review of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio signals is presented, including algorithms which manipulate transform components and subband signal decompositions.

Abstract: Considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. First, psychoacoustic principles are described with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Then, we review methodologies which achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms which manipulate transform components and subband signal decompositions. The discussion concentrates on architectures and applications of those techniques which utilize psychoacoustic models to exploit efficiently masking characteristics of the human receiver. Several algorithms which have become international and/or commercial standards are also presented, including the ISO/MPEG family and the Dolby AC-3 algorithms. The paper concludes with a brief discussion of future research directions.
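The way a psychoacoustic model drives a coder can be illustrated with a greedy bit allocation: given a signal-to-mask ratio (SMR) per band, each step gives one more bit to whichever band currently has the largest noise-to-mask ratio (NMR). The sketch below assumes the textbook rule of thumb of about 6.02 dB of SNR per bit and charges one bit per band per step; real coders such as the ISO/MPEG layers charge per-sample costs and use standardized tables, so `allocate_bits` is a hypothetical illustration rather than any standard's procedure.

```python
from typing import List

DB_PER_BIT = 6.02  # assumed SNR gain per bit of uniform quantization

def allocate_bits(smr_db: List[float], total_bits: int, max_bits: int = 15) -> List[int]:
    """Greedily assign bits to the band with the worst noise-to-mask ratio."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Only bands that have not reached the per-band cap can receive bits
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        # NMR = SMR - SNR(bits); larger NMR means more audible quantization noise
        worst = max(candidates, key=lambda i: smr_db[i] - DB_PER_BIT * bits[i])
        bits[worst] += 1
    return bits

if __name__ == "__main__":
    print(allocate_bits([20.0, 5.0, -3.0], 5))
```

For `allocate_bits([20.0, 5.0, -3.0], 5)` the loop gives three bits to the first band, one to the second, then a fourth to the first, returning `[4, 1, 0]`; the band whose SMR is already negative (signal below the mask) receives nothing.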

77 citations

Proceedings Article

HMM-based speech enhancement using harmonic modeling

Michael E. Deisher¹, Andreas Spanias²

Intel¹, Arizona State University²

21 Apr 1997

TL;DR: The hidden Markov model (HMM)-based minimum mean square error (MMSE) estimator is extended to incorporate a ternary voicing state and applied to a harmonic representation of voiced speech, which improves noise reduction during voiced sounds.

Abstract: This paper describes a technique for reduction of non-stationary noise in electronic voice communication systems. Removal of noise is needed in many such systems, particularly those deployed in harsh mobile or otherwise dynamic acoustic environments. The proposed method employs state-based statistical models of both speech and noise, and is thus capable of tracking variations in noise during sustained speech. This work extends the hidden Markov model (HMM) based minimum mean square error (MMSE) estimator to incorporate a ternary voicing state, and applies it to a harmonic representation of voiced speech. Noise reduction during voiced sounds is thereby improved. Performance is evaluated using speech and noise from standard databases. The extended algorithm is demonstrated to improve speech quality as measured by informal preference tests and objective measures, to preserve speech intelligibility as measured by informal diagnostic rhyme tests, and to improve the performance of a low bit-rate speech coder and a speech recognition system when used as a pre-processor.
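A common way to describe the standard HMM-based MMSE estimator is as a weighted sum of Wiener filters, one per pair of speech and noise states, each weighted by the posterior probability of that pair given the noisy observations. The sketch below computes that combined spectral gain for a single frame. It assumes the posteriors and per-state power spectra are already available and does not model the ternary voicing state or the harmonic representation that this paper adds; `mmse_gain` is a hypothetical name.

```python
from typing import List, Sequence

def mmse_gain(posteriors: Sequence[float],
              speech_psd: Sequence[Sequence[float]],
              noise_psd: Sequence[Sequence[float]]) -> List[float]:
    """Posterior-weighted sum of per-state Wiener gains for one frame.

    posteriors[s]    : P(state pair s | noisy observations); should sum to 1
    speech_psd[s][k] : speech power spectrum of state pair s at frequency bin k
    noise_psd[s][k]  : noise power spectrum of state pair s at frequency bin k
    """
    n_bins = len(speech_psd[0])
    gain = [0.0] * n_bins
    for p, ps, pn in zip(posteriors, speech_psd, noise_psd):
        for k in range(n_bins):
            # Wiener gain conditioned on this speech/noise state pair
            gain[k] += p * ps[k] / (ps[k] + pn[k])
    return gain

if __name__ == "__main__":
    # Hypothetical two-state example with two frequency bins
    print(mmse_gain([0.75, 0.25], [[9.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 3.0]]))
```

In this toy example the result is approximately `[0.8, 0.4375]`: the bin where the likely state has high speech power is passed almost unchanged, while the noisier bin is attenuated. Because the posteriors can change frame by frame, the gain can follow non-stationary noise.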

19 citations

Journal Article

Speech enhancement using state-based estimation and sinusoidal modeling

Michael E. Deisher¹, Andreas Spanias

Arizona State University¹

01 Aug 1997-Journal of the Acoustical Society of America

TL;DR: In this article, an approximate harmonic representation is used wherein voiced speech is represented by a set of sine waves at multiples of the fundamental frequency and several additional components at frequencies near each harmonic.

Abstract: A procedure for estimating the parameters of a sinusoidal model from speech corrupted by additive noise is described. An approximate harmonic representation is used wherein voiced speech is represented by a set of sine waves at multiples of the fundamental frequency and several additional components at frequencies near each harmonic. Amplitudes and phases of the sinusoidal components are estimated using a state-based technique that employs hidden Markov models (HMMs) to classify speech and noise spectra. Voicing and fundamental frequency are determined using an analysis-by-synthesis approach. Simulation results are presented, comparing the performance of the proposed algorithm to that of the standard HMM-based minimum mean square error (MMSE) estimator. The proposed method was found to reduce the structured residual noise associated with HMM-based algorithms.
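The harmonic part of the representation described above can be sketched as a sum of cosines at integer multiples of the fundamental frequency. The function below is a minimal illustration with a hypothetical name, `synthesize_harmonic`; it omits the additional near-harmonic components and the HMM-based amplitude and phase estimation described in the abstract.

```python
import math
from typing import List, Sequence

def synthesize_harmonic(f0_hz: float, amps: Sequence[float], phases: Sequence[float],
                        fs_hz: float, n_samples: int) -> List[float]:
    """Sum of cosines at harmonics k * f0 (k = 1, 2, ...) with given amplitudes and phases."""
    out = []
    for n in range(n_samples):
        t = n / fs_hz
        out.append(sum(a * math.cos(2.0 * math.pi * (k + 1) * f0_hz * t + ph)
                       for k, (a, ph) in enumerate(zip(amps, phases))))
    return out

if __name__ == "__main__":
    frame = synthesize_harmonic(100.0, [1.0, 0.5], [0.0, 0.0], 8000.0, 160)
    print(frame[:4])
```

Because every component is a multiple of f0, a 100 Hz fundamental at an 8 kHz sampling rate repeats every 80 samples; the near-harmonic components mentioned in the abstract would break this strict periodicity.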

10 citations

Proceedings Article

A new sinusoidal phase modeling algorithm

S. Ahmadi¹, Andreas Spanias

Arizona State University¹

21 Apr 1997

TL;DR: Performance analysis on a large speech database indicates considerable improvement in temporal and spectral matching between the original and reconstructed signals as compared to other sinusoidal phase models as well as improved subjective quality of the reproduced speech.

Abstract: A new phase modeling algorithm for sinusoidal analysis and synthesis of speech signals is presented. Short-time sinusoidal phases are efficiently approximated by incorporating linear prediction, spectral sampling, delay compensation, and phase correction techniques. The algorithm is different from phase compensation methods proposed for multi-pulse LPC in that it has been tailored to sinusoidal transform coding of speech signals. Performance analysis on a large speech database indicates considerable improvement in temporal and spectral matching between the original and reconstructed signals as compared to other sinusoidal phase models as well as improved subjective quality of the reproduced speech.
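To make the linear prediction, spectral sampling, and delay compensation steps concrete, one simple phase model samples the phase response of the LP synthesis filter 1/A(z) at the harmonic frequencies and subtracts a linear-phase term for a time delay. The sketch below shows only those steps under that assumption; the paper's exact combination and its phase correction stage are not reproduced, and `lp_harmonic_phases` and `delay_samples` are hypothetical names.

```python
import cmath
import math
from typing import List, Sequence

def lp_harmonic_phases(lpc: Sequence[float], f0_hz: float, fs_hz: float,
                       n_harmonics: int, delay_samples: float = 0.0) -> List[float]:
    """Phase of 1/A(e^{jw}) at harmonics k*w0, minus a linear-phase delay term.

    A(z) = 1 + a1 z^-1 + ... + ap z^-p, with lpc = [a1, ..., ap].
    """
    w0 = 2.0 * math.pi * f0_hz / fs_hz  # fundamental in radians per sample
    phases = []
    for k in range(1, n_harmonics + 1):
        w = k * w0
        # Spectral sampling: evaluate A(e^{jw}) at the k-th harmonic
        a = 1 + sum(c * cmath.exp(-1j * w * (i + 1)) for i, c in enumerate(lpc))
        # All-pole phase, then delay compensation as a linear phase shift
        phases.append(cmath.phase(1 / a) - w * delay_samples)
    return phases

if __name__ == "__main__":
    print(lp_harmonic_phases([-0.9], 200.0, 8000.0, 5, delay_samples=1.5))
```

The delay term shifts the pitch pulse in time without altering the spectral envelope, which is why it is modeled separately from the LP phase.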

3 citations

Journal Article

Improving discrimination of confusable words using the divergence measure

Philipos C. Loizou¹, Andreas Spanias

University of Arkansas at Little Rock¹

01 Feb 1997-Journal of the Acoustical Society of America

TL;DR: The subspace approach is used as a preprocessing step in a hidden Markov model (HMM) based system to enhance discrimination of acoustically similar pairs of words and is compared with standard linear discriminant analysis techniques and shown to yield as much as 4.5% improvement.

Abstract: This paper describes the use of the divergence measure as a criterion for finding a transformation matrix which will map the original speech observations onto a subspace with more discriminative ability than the original. A gradient-based algorithm is also proposed to compute the transformation matrix efficiently. The subspace approach is used as a preprocessing step in a hidden Markov model (HMM) based system to enhance discrimination of acoustically similar pairs of words. This approach is compared with standard linear discriminant analysis (LDA) techniques and shown to yield as much as 4.5% improvement. The subspace approach is also applied successfully to a more general recognition problem, i.e., discrimination of K confusable words, using the average divergence measure.
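For two Gaussian classes the symmetric divergence has a closed form, and a linear map y = W^T x turns each class mean into W^T μ and each covariance into W^T Σ W. The sketch below restricts W to a single projection direction in two dimensions and finds the best direction by grid search instead of the gradient-based algorithm the paper proposes. The Gaussian class model, the function names, and the grid search are all assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

def project(w: Sequence[float], mean: Sequence[float],
            cov: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Mean and variance of a Gaussian class after the projection y = w^T x."""
    m = sum(wi * mi for wi, mi in zip(w, mean))
    v = sum(w[i] * cov[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))
    return m, v

def divergence_1d(m1: float, v1: float, m2: float, v2: float) -> float:
    """Symmetric (Kullback) divergence between two scalar Gaussians."""
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0) + 0.5 * (1.0 / v1 + 1.0 / v2) * (m1 - m2) ** 2

def best_direction_2d(class1, class2, n_angles: int = 180) -> float:
    """Grid-search the angle t of the unit vector (cos t, sin t) maximizing divergence."""
    def score(t: float) -> float:
        w = (math.cos(t), math.sin(t))
        return divergence_1d(*project(w, *class1), *project(w, *class2))
    return max((math.pi * i / n_angles for i in range(n_angles)), key=score)

if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(best_direction_2d(([0.0, 0.0], eye), ([1.0, 0.0], eye)))
```

With equal identity covariances and means (0, 0) and (1, 0), the divergence along angle t is cos²t, so the search returns t = 0, the direction that separates the means. For K confusable words the same score would be averaged over all class pairs.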

1 citation

Proceedings Article

New algorithms for sinusoidal speech coding at low bit rates

S. Ahmadi¹, Andreas Spanias¹

Nokia¹

17 Dec 1997

TL;DR: Performance analysis on a large speech database indicates that the use of the proposed algorithms resulted in considerable improvement in temporal and spectral signal matching, as well as improved subjective quality of the reproduced speech.

Abstract: This paper addresses the design, development, evaluation, and implementation of efficient low bit rate speech coding algorithms based on the sinusoidal model. A series of algorithms have been developed for pitch frequency determination and voicing detection, simultaneous modeling of the sinusoidal amplitudes and phases, and mid-frame interpolation. An improved sinusoidal phase matching algorithm is presented, where short-time sinusoidal phases are approximated using an elaborate combination of linear prediction, spectral sampling, delay compensation, and phase correction techniques. A voicing-dependent perceptual split vector quantization scheme is used to encode the sinusoidal amplitudes. The perceptual properties of the human auditory system are effectively exploited in the developed algorithms. The algorithms have been successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder has been evaluated in terms of subjective tests such as the mean opinion score and the diagnostic rhyme test, as well as some perceptually-motivated objective distortion measures. Performance analysis on a large speech database indicates that the use of the proposed algorithms resulted in considerable improvement in temporal and spectral signal matching, as well as improved subjective quality of the reproduced speech.
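Split vector quantization, used here for the sinusoidal amplitudes, divides a long vector into shorter subvectors and quantizes each against its own small codebook, which keeps codebook search and storage manageable. The sketch below shows encoding and decoding with an optional per-element weight vector standing in for perceptual weighting; the voicing-dependent codebook selection is omitted, and all names and codebooks are hypothetical.

```python
from typing import List, Optional, Sequence

Codebook = Sequence[Sequence[float]]

def nearest(codebook: Codebook, vec: Sequence[float], weights: Sequence[float]) -> int:
    """Index of the codeword with the smallest weighted squared error."""
    def err(cw: Sequence[float]) -> float:
        return sum(w * (x - c) ** 2 for w, x, c in zip(weights, vec, cw))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))

def split_vq_encode(vec: Sequence[float], splits: Sequence[int],
                    codebooks: Sequence[Codebook],
                    weights: Optional[Sequence[float]] = None) -> List[int]:
    """Quantize consecutive subvectors of `vec` (sizes given by `splits`) separately."""
    if weights is None:
        weights = [1.0] * len(vec)
    indices, start = [], 0
    for size, cb in zip(splits, codebooks):
        end = start + size
        indices.append(nearest(cb, vec[start:end], weights[start:end]))
        start = end
    return indices

def split_vq_decode(indices: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Concatenate the selected codewords back into a full vector."""
    out: List[float] = []
    for idx, cb in zip(indices, codebooks):
        out.extend(cb[idx])
    return out

if __name__ == "__main__":
    books = [[[0.0, 0.0], [1.0, 1.0]], [[0.0], [5.0]]]
    idx = split_vq_encode([0.9, 1.2, 4.0], [2, 1], books)
    print(idx, split_vq_decode(idx, books))
```

Each index costs log2 of its codebook size in bits. As rough arithmetic at 2.4 kbps, an assumed 20 ms frame leaves 2400 * 0.02 = 48 bits per frame for all parameters, including pitch, voicing, and amplitudes, which is why compact split codebooks matter.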
