Topic

Audio signal processing

About: Audio signal processing is a research topic. Over the lifetime, 21463 publications have been published within this topic receiving 319597 citations. The topic is also known as: audio processing & Acoustic signal processing.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Perceptual coding of digital audio

[...]

T. Painter¹, Andreas Spanias¹•Institutions (1)

Arizona State University¹

01 Apr 2000

TL;DR: This paper reviews methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model.

...read moreread less

Abstract: During the last decade, CD-quality digital audio has essentially replaced analog audio. Emerging digital audio applications for network, wireless, and multimedia computing systems face a series of constraints such as reduced channel bandwidth, limited storage capacity, and low cost. These new applications have created a demand for high-quality digital audio delivery at low bit rates. In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. This paper is organized as follows. First, psychoacoustic principles are described, with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Next, filter bank design issues and algorithms are addressed, with a particular emphasis placed on the modified discrete cosine transform, a perfect reconstruction cosine-modulated filter bank that has become of central importance in perceptual audio coding. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model. These discussions concentrate on architectures and applications of those techniques that utilize psychoacoustic models to exploit efficiently masking characteristics of the human receiver. Several algorithms that have become international and/or commercial standards receive in-depth treatment, including the ISO/IEC MPEG family (-1, -2, -4), the Lucent Technologies PAC/EPAC/MPAC, the Dolby AC-2/AC-3, and the Sony ATRAC/SDDS algorithms. Then, we describe subjective evaluation methodologies in some detail, including the ITU-R BS.1116 recommendation on subjective measurements of small impairments. This paper concludes with a discussion of future research directions.

...read moreread less

938 citations

Book•

Fast Algorithms for Digital Signal Processing

[...]

Richard E. Blahut

01 Sep 1985

TL;DR: Fast algorithms for digital signal processing, Fast algorithms fordigital signal processing , and so on.

...read moreread less

Abstract: Fast algorithms for digital signal processing , Fast algorithms for digital signal processing , مرکز فناوری اطلاعات و اطلاع رسانی کشاورزی

...read moreread less

797 citations

Journal Article•DOI•

Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation

[...]

Alexey Ozerov¹, Cédric Févotte¹•Institutions (1)

Télécom ParisTech¹

01 Mar 2010-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: In this article, a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals, is considered.

...read moreread less

Abstract: We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the short-time Fourier transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired from nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first one consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization (EM) algorithm. The second method consists of maximizing the sum of individual likelihoods of all channels using a multiplicative update algorithm inspired from NMF methodology. Our decomposition algorithms are applied to stereo audio source separation in various settings, covering blind and supervised separation, music and speech sources, synthetic instantaneous and convolutive mixtures, as well as professionally produced music recordings. Our EM method produces competitive results with respect to state-of-the-art as illustrated on two tasks from the international Signal Separation Evaluation Campaign (SiSEC 2008).

...read moreread less

636 citations

Journal Article•DOI•

An overview of automatic speaker diarization systems

[...]

S. E. Tranter¹, Douglas A. Reynolds²•Institutions (2)

University of Cambridge¹, Massachusetts Institute of Technology²

01 Sep 2006-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An overview of the approaches currently used in a key area of audio diarization, namely speaker diarizations, are provided and their relative merits and limitations are discussed.

...read moreread less

Abstract: Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used for helping speech recognition, facilitating the searching and indexing of audio archives, and increasing the richness of automatic transcriptions, making them more readable. In this paper, we provide an overview of the approaches currently used in a key area of audio diarization, namely speaker diarization, and discuss their relative merits and limitations. Performances using the different techniques are compared within the framework of the speaker diarization task in the DARPA EARS Rich Transcription evaluations. We also look at how the techniques are being introduced into real broadcast news systems and their portability to other domains and tasks such as meetings and speaker verification

...read moreread less

634 citations

Journal Article•DOI•

Environmental Sound Recognition With Time–Frequency Audio Features

[...]

Selina Chu¹, Shrikanth S. Narayanan¹, C.-C.J. Kuo¹•Institutions (1)

University of Southern California¹

01 Aug 2009-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An empirical feature analysis for audio environment characterization is performed and a matching pursuit algorithm is proposed to use to obtain effective time-frequency features to yield higher recognition accuracy for environmental sounds.

...read moreread less

Abstract: The paper considers the task of recognizing environmental sounds for the understanding of a scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs) which describe the audio spectral shape. Environmental sounds, such as chirpings of insects and sounds of rain which are typically noise-like with a broad flat spectrum, may include strong temporal domain signatures. However, only few temporal-domain features have been developed to characterize such diverse audio signals previously. Here, we perform an empirical feature analysis for audio environment characterization and propose to use the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method utilizes a dictionary of atoms for feature selection, resulting in a flexible, intuitive and physically interpretable set of features. The MP-based feature is adopted to supplement the MFCC features to yield higher recognition accuracy for environmental sounds. Extensive experiments are conducted to demonstrate the effectiveness of these joint features for unstructured environmental sound classification, including listening tests to study human recognition capabilities. Our recognition system has shown to produce comparable performance as human listeners.

...read moreread less

626 citations

Collapse

Network Information

Performance

Metrics

21,541

Papers

328,867

Citations

No. of papers in the topic in previous years
Year	Papers
2023	19
2022	63
2021	217
2020	525
2019	659
2018	597

Audio signal processing

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics