A robust audio classification and segmentation method

doi:10.1145/500141.500173

Proceedings Article

- pp 203-211

TLDR

A robust algorithm that is capable of segmenting and classifying an audio stream into speech, music, environment sound and silence is presented and some new features such as the noise frame ratio and band periodicity are introduced.

Abstract:

In this paper, we present a robust algorithm for audio classification that is capable of segmenting and classifying an audio stream into speech, music, environment sound and silence. Audio classification is performed in two steps, which makes it suitable for different applications. The first step is speech and non-speech discrimination, for which a novel algorithm based on KNN and LSP VQ is presented. The second step further divides the non-speech class into music, environment sounds and silence with a rule-based classification scheme. Some new features, such as the noise frame ratio and band periodicity, are introduced and discussed in detail. Our experiments in the context of video structure parsing have shown that the algorithms produce very satisfactory results.
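The two-step structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, feature inputs, and every threshold below are hypothetical, the LSP VQ refinement of the first step is left out, and the direction of the rules in the second step (high band periodicity suggesting music, high noise frame ratio suggesting environment sound) is an assumption made for illustration only.

```python
import math
from collections import Counter


def knn_is_speech(features, training, k=3):
    """Step 1 (sketch): k-nearest-neighbour vote, speech vs. non-speech.

    `training` is a list of (feature_vector, label) pairs with labels
    "speech" or "non-speech". The LSP VQ stage named in the abstract
    is not modelled here.
    """
    nearest = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0] == "speech"


def classify_non_speech(energy, band_periodicity, noise_frame_ratio,
                        silence_energy=0.01,     # hypothetical threshold
                        music_periodicity=0.6,   # hypothetical threshold
                        max_music_noise=0.5):    # hypothetical threshold
    """Step 2 (sketch): rule-based split of the non-speech class."""
    if energy < silence_energy:
        return "silence"
    if band_periodicity >= music_periodicity and noise_frame_ratio < max_music_noise:
        return "music"
    return "environment sound"


def classify_segment(segment, training, k=3):
    """Run both steps on one segment, given as a dict of precomputed features."""
    if knn_is_speech(segment["step1_features"], training, k):
        return "speech"
    return classify_non_speech(segment["energy"],
                               segment["band_periodicity"],
                               segment["noise_frame_ratio"])
```

Because the first step only decides speech versus non-speech, an application that needs just that distinction could call `knn_is_speech` alone, which may be one reading of why the two-step design suits different applications.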

Citations

Journal Article

DEAP: A Database for Emotion Analysis Using Physiological Signals

Sander Koelstra, +8 more

- 01 Jan 2012 -

IEEE Transactions on Affective Computing

TL;DR: A multimodal data set for the analysis of human affective states is presented, and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.

Proceedings Article

A user attention model for video summarization

Yu-Fei Ma, +3 more

TL;DR: A generic framework of video summarization based on the modeling of viewer's attention is presented, which takes advantage of computational attention models and eliminates the need for complex heuristic rules in video summarization.

Journal Article

A generic framework of user attention model and its application in video summarization

Yu-Fei Ma, +3 more

- 01 Oct 2005 -

IEEE Transactions on Multimedia

TL;DR: A generic framework of a user attention model is presented, which estimates the attentions viewers may pay to video contents, and a set of modeling methods for visual and aural attentions are proposed.

Journal Article

Content analysis for audio classification and segmentation

Lie Lu, +2 more

- 10 Dec 2002 -

IEEE Transactions on Speech and Audio Pr...

TL;DR: A robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence is proposed, and an unsupervised speaker segmentation algorithm using a novel scheme based on quasi-GMM and LSP correlation analysis is developed.

Patent

Modular intelligent transportation system

Paul J. Lagassey

TL;DR: A modular intelligent transportation system comprising an environmentally protected enclosure; a system communications bus; a processor module that communicates with the bus, has an image data input and an audio input, analyzes the image data and/or audio input for data patterns represented therein, and has at least one available option slot; a power supply; and a communication link for external communications.

References

Journal Article

An Algorithm for Vector Quantizer Design

Y. Linde, +2 more

- 01 Jan 1980 -

IEEE Transactions on Communications

TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.

Journal Article

Speaker recognition: a tutorial

J.P. Campbell Jr.

TL;DR: A tutorial on the design and development of automatic speaker-recognition systems is presented, and a new automatic speaker-recognition system is described that achieves 98.9% correct identification.

Journal Article

Content-based classification, search, and retrieval of audio

E. Wold, +3 more

- 01 Sep 1996 -

IEEE MultiMedia

TL;DR: The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features, which lets users search or retrieve sounds by any one feature or a combination of them, by specifying previously learned classes based on these features.

Proceedings Article

Construction and evaluation of a robust multifeature speech/music discriminator

Eric D. Scheirer, +1 more

TL;DR: A real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input is constructed and extensive data on system performance and the cross-validated training/test setup used to evaluate the system is provided.

Proceedings Article

Real-time discrimination of broadcast speech/music

J. Saunders

TL;DR: A technique which is successful at discriminating speech from music on broadcast FM radio is described, which provides the capability to robustly distinguish the two classes and runs easily in real time.
