A speech/music discriminator based on RMS and zero-crossings

doi:10.1109/TMM.2004.840604

Journal Article

Costas Panagiotakis, +1 more

IEEE Transactions on Multimedia, Vol. 7, Iss. 1, pp. 155-166, 01 Feb 2005

TLDR

The goal was to develop a system that first segments the audio signal and then classifies it into one of two main categories, speech or music. Experimental results show that efficiency is exceptionally good, without sacrificing performance.

Abstract:

Over the last several years, major efforts have been made to develop methods for extracting information from audiovisual media so that they can be stored and retrieved in databases automatically, based on their content. In this work we deal with the characterization of an audio signal, which may be part of a larger audiovisual system or may be autonomous, as for example in the case of an audio recording stored digitally on disk. Our goal was first to develop a system for segmentation of the audio signal, and then to classify each segment into one of two main categories: speech or music. Among the system's requirements are its processing speed and its ability to function in a real-time environment with a short response delay. Because of the restriction to two classes, the number of characteristics that are extracted is considerably reduced and, moreover, the required computations are straightforward. Experimental results show that efficiency is exceptionally good, without sacrificing performance. Segmentation is based on the distribution of the mean signal amplitude, whereas classification utilizes an additional characteristic related to the frequency. The classification algorithm may be used either in conjunction with the segmentation algorithm, in which case it verifies or refutes a music-speech or speech-music change, or autonomously, on given audio segments. The basic characteristics are computed in 20 ms intervals, so the segment boundaries are specified to within 20 ms. The smallest segment length is one second. The segmentation and classification algorithms were benchmarked on a large data set, with correct segmentation about 97% of the time and correct classification about 95% of the time.
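The title names RMS and zero-crossings as the two underlying measurements, and the abstract states that the basic characteristics are computed in 20 ms intervals. The following is a minimal sketch of that per-frame computation, not the authors' implementation: the function name `frame_features`, the use of non-overlapping frames, the dropping of a trailing partial frame, and the sign convention for counting a crossing are all assumptions made for illustration.

```python
import math


def frame_features(samples, sample_rate, frame_ms=20):
    """Return one (rms, zero_crossings) tuple per non-overlapping frame.

    Hypothetical helper: frame layout and crossing convention are
    assumptions, not details taken from the paper.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")

    features = []
    # Step through whole frames only; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Root mean square amplitude of the frame.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between consecutive samples (zero counts as non-negative).
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((rms, crossings))
    return features


if __name__ == "__main__":
    # One second of silence at 8 kHz yields 50 frames of 160 samples each.
    print(len(frame_features([0.0] * 8000, 8000)))
```

Under these assumptions, the one-second minimum segment length mentioned in the abstract corresponds to 50 consecutive 20 ms frames, and a segment boundary can only fall on a frame edge, which is consistent with the stated 20 ms accuracy.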



