Journal Article

A speech/music discriminator based on RMS and zero-crossings

Costas Panagiotakis, +1 more
01 Feb 2005, Vol. 7, Iss. 1, pp. 155-166
TLDR
The goal was to develop a system that first segments the audio signal and then classifies each segment into one of two main categories: speech or music. Experimental results show that efficiency is exceptionally good, without sacrificing performance.
Abstract
Over the last several years, major efforts have been made to develop methods for extracting information from audiovisual media so that they can be stored in and retrieved from databases automatically, based on their content. In this work we deal with the characterization of an audio signal, which may be part of a larger audiovisual system or may be autonomous, as for example in the case of an audio recording stored digitally on disk. Our goal was first to develop a system for segmentation of the audio signal, and then to classify each segment into one of two main categories: speech or music. Among the system's requirements are its processing speed and its ability to function in a real-time environment with a short response delay. Because of the restriction to two classes, the characteristics that are extracted are considerably reduced and, moreover, the required computations are straightforward. Experimental results show that efficiency is exceptionally good, without sacrificing performance. Segmentation is based on the distribution of the mean signal amplitude, whereas classification utilizes an additional characteristic related to the frequency. The classification algorithm may be used either in conjunction with the segmentation algorithm, in which case it verifies or refutes a music-speech or speech-music change, or autonomously, with given audio segments. The basic characteristics are computed in 20 ms intervals, so the segments' limits are specified to within 20 ms. The minimum segment length is one second. The segmentation and classification algorithms were benchmarked on a large data set, with correct segmentation about 97% of the time and correct classification about 95%.
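
The title names RMS and zero-crossings as the two basic characteristics, and the abstract states that they are computed in 20 ms intervals. The abstract gives no formulas, so the following is only a minimal sketch of per-frame feature extraction under stated assumptions: the function name `frame_features`, the use of non-overlapping frames, the 8 kHz sample rate in the usage lines, and the convention that a zero sample counts as non-negative are all illustrative choices, not the authors' implementation. The segmentation and classification rules themselves are not reproduced here.

```python
import math


def frame_features(samples, sample_rate, frame_ms=20):
    """Return a list of (rms, zero_crossings) pairs, one per frame.

    Frames are non-overlapping and frame_ms long (an assumption; the
    abstract only says the characteristics are computed in 20 ms intervals).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")

    features = []
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Root mean square of the frame amplitude.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Number of sign changes between consecutive samples
        # (a zero sample is treated as non-negative).
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((rms, zero_crossings))
    return features


# Usage: one second of a 440 Hz sine tone sampled at a hypothetical 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
feats = frame_features(tone, sample_rate)
```

With 20 ms frames, one second of audio yields 50 feature pairs, which matches the abstract's one-second minimum segment length being a multiple of the frame interval.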

