Proceedings Article

A robust audio classification and segmentation method

Lie Lu, +2 more
- pp 203-211
TL;DR
A robust algorithm that is capable of segmenting and classifying an audio stream into speech, music, environment sound and silence is presented and some new features such as the noise frame ratio and band periodicity are introduced.
Abstract
In this paper, we present a robust algorithm for audio classification that is capable of segmenting and classifying an audio stream into speech, music, environment sound and silence. Audio classification is performed in two steps, which makes it suitable for different applications. The first step is speech and non-speech discrimination, for which a novel algorithm based on KNN and LSP VQ is presented. The second step further divides the non-speech class into music, environment sounds and silence with a rule-based classification scheme. Some new features, such as the noise frame ratio and band periodicity, are introduced and discussed in detail. Our experiments in the context of video structure parsing have shown that the algorithms produce very satisfactory results.
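
The two-step structure described in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain k-nearest-neighbour vote stands in for the paper's combination of KNN and LSP VQ, the feature vectors are arbitrary, and the `energy` and `periodicity` inputs, the threshold values, and all function names are hypothetical placeholders that the abstract does not specify.

```python
from collections import Counter
from math import dist

# Hypothetical thresholds; the abstract gives no numeric values.
SILENCE_ENERGY = 0.01
MUSIC_PERIODICITY = 0.6


def knn_is_speech(features, training, k=3):
    """Step 1: k-nearest-neighbour vote over labelled (vector, label) pairs."""
    nearest = sorted(training, key=lambda item: dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0] == "speech"


def classify_non_speech(energy, periodicity):
    """Step 2: rule-based split of a non-speech segment (illustrative rules)."""
    if energy < SILENCE_ENERGY:
        return "silence"
    if periodicity > MUSIC_PERIODICITY:
        return "music"
    return "environment sound"


def classify_segment(features, energy, periodicity, training, k=3):
    """Run step 1, and fall through to step 2 only for non-speech segments."""
    if knn_is_speech(features, training, k):
        return "speech"
    return classify_non_speech(energy, periodicity)
```

Because the second step runs only on segments the first step rejects, an application that needs only speech/non-speech discrimination could stop after `knn_is_speech`, which is one way to read the abstract's remark that the two-step design suits different applications.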


Citations
Journal Article

DEAP: A Database for Emotion Analysis Using Physiological Signals

TL;DR: A multimodal data set for the analysis of human affective states is presented, and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
Proceedings Article

A user attention model for video summarization

TL;DR: A generic framework of video summarization based on the modeling of viewer's attention is presented, which takes advantage of computational attention models and eliminates the need for complex heuristic rules in video summarization.
Journal Article

A generic framework of user attention model and its application in video summarization

TL;DR: A generic framework of a user attention model is presented, which estimates the attention viewers may pay to video contents, and a set of modeling methods for visual and aural attention is proposed.
Journal Article

Content analysis for audio classification and segmentation

TL;DR: A robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence is proposed, and an unsupervised speaker segmentation algorithm using a novel scheme based on quasi-GMM and LSP correlation analysis is developed.
Patent

Modular intelligent transportation system

TL;DR: A modular intelligent transportation system is described, comprising an environmentally protected enclosure; a system communications bus; a processor module that communicates with the bus, has an image data input and an audio input, analyzes the image data and/or audio input for data patterns represented therein, and has at least one available option slot; a power supply; and a communication link for external communications.
References
Journal Article

An Algorithm for Vector Quantizer Design

TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.
Journal Article

Speaker recognition: a tutorial

TL;DR: A tutorial on the design and development of automatic speaker-recognition systems is presented, and a new automatic speaker-recognition system is given that performs with 98.9% correct identification.
Journal Article

Content-based classification, search, and retrieval of audio

TL;DR: The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features, which lets users search or retrieve sounds by any one feature or a combination of them, by specifying previously learned classes based on these features.
Proceedings Article

Construction and evaluation of a robust multifeature speech/music discriminator

TL;DR: A real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input is constructed, and extensive data on system performance and the cross-validated training/test setup used to evaluate the system are provided.
Proceedings Article

Real-time discrimination of broadcast speech/music

J. Saunders
TL;DR: A technique that successfully discriminates speech from music on broadcast FM radio is described; it robustly distinguishes the two classes and runs easily in real time.