
Showing papers on "Audio signal processing published in 2000"


Journal ArticleDOI
01 Apr 2000
TL;DR: This paper reviews methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model.
Abstract: During the last decade, CD-quality digital audio has essentially replaced analog audio. Emerging digital audio applications for network, wireless, and multimedia computing systems face a series of constraints such as reduced channel bandwidth, limited storage capacity, and low cost. These new applications have created a demand for high-quality digital audio delivery at low bit rates. In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. This paper is organized as follows. First, psychoacoustic principles are described, with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Next, filter bank design issues and algorithms are addressed, with a particular emphasis placed on the modified discrete cosine transform, a perfect reconstruction cosine-modulated filter bank that has become of central importance in perceptual audio coding. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model. These discussions concentrate on architectures and applications of those techniques that utilize psychoacoustic models to exploit efficiently the masking characteristics of the human receiver.
Several algorithms that have become international and/or commercial standards receive in-depth treatment, including the ISO/IEC MPEG family (-1, -2, -4), the Lucent Technologies PAC/EPAC/MPAC, the Dolby AC-2/AC-3, and the Sony ATRAC/SDDS algorithms. Then, we describe subjective evaluation methodologies in some detail, including the ITU-R BS.1116 recommendation on subjective measurements of small impairments. This paper concludes with a discussion of future research directions.

938 citations


Journal ArticleDOI
TL;DR: This work describes audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval.
Abstract: Multimedia content analysis refers to the computerized understanding of the semantic meanings of a multimedia document, such as a video sequence with an accompanying audio track. With a multimedia document, its semantics are embedded in multiple forms that are usually complementary to each other. Therefore, it is necessary to analyze all types of data: image frames, sound tracks, texts that can be extracted from image frames, and spoken words that can be deciphered from the audio track. This usually involves segmenting the document into semantically meaningful units, classifying each unit into a predefined scene type, and indexing and summarizing the document for efficient retrieval and browsing. We review advances in using audio and visual information jointly for accomplishing the above tasks. We describe audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval. We also describe audio and visual descriptors and description schemes that are being considered by the MPEG-7 standard for multimedia content description.

552 citations


Patent
29 Dec 2000
TL;DR: In this article, a system for dynamic distribution of audio signals at a site based on defined zones within the site is presented, where a plurality of addressable audio devices are coupled to a local network for the site which are configured to receive a designated digital audio stream over the local network and to output the received audio stream to audio equipment located at the site.
Abstract: Systems and methods are provided for dynamic distribution of audio signals at a site based on defined zones within the site. A plurality of addressable audio devices are coupled to a local network for the site which are configured to receive a designated digital audio stream over the local network and to output the received digital audio stream to audio equipment located at the site. A zone manager defines a plurality of zones for the site which may include a plurality of the addressable audio devices. The zone manager defines a relationship between a characteristic of the audio signal for a reference audio device and for the addressable audio devices in the zones. An audio interface receives digital audio streams and outputs the digital audio streams on the local network addressed to selected ones of the audio devices based on the defined zones, the defined relationship between a characteristic of the audio signal for a reference audio device and for the addressable audio devices and a control input associated with the characteristic. A user interface is provided which is configured to receive a user designation of the control input. Systems and methods for dynamic aggregation of audio equipment in zones are also provided.

519 citations


Proceedings ArticleDOI
30 Jul 2000
TL;DR: This method can find individual note boundaries or even natural segment boundaries such as verse/chorus or speech/music transitions, even in the absence of cues such as silence, by analyzing local self-similarity.
Abstract: The paper describes methods for automatically locating points of significant change in music or audio, by analyzing local self-similarity. This method can find individual note boundaries or even natural segment boundaries such as verse/chorus or speech/music transitions, even in the absence of cues such as silence. This approach uses the signal to model itself, and thus does not rely on particular acoustic cues nor requires training. We present a wide variety of applications, including indexing, segmenting, and beat tracking of music and audio. The method works well on a wide variety of audio sources.
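The self-similarity approach described above can be sketched in a few lines: build a cosine similarity matrix over per-frame feature vectors and correlate a checkerboard kernel along its main diagonal, so that novelty peaks mark segment boundaries. This is a minimal illustration of the idea, not the authors' implementation; the feature vectors, kernel size, and toy data below are assumptions.

```python
import numpy as np

def novelty_curve(frames, kernel_size=8):
    """Checkerboard-kernel novelty from a cosine self-similarity matrix.

    frames: (n_frames, n_features) array of per-frame feature vectors.
    """
    # Cosine self-similarity matrix
    unit = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-12)
    S = unit @ unit.T

    # Checkerboard kernel: +1 on same-segment quadrants, -1 across a boundary
    half = kernel_size // 2
    q = np.ones((half, half))
    kernel = np.block([[q, -q], [-q, q]])

    # Correlate the kernel along the main diagonal
    n = S.shape[0]
    novelty = np.zeros(n)
    for i in range(half, n - half):
        patch = S[i - half:i + half, i - half:i + half]
        novelty[i] = np.sum(patch * kernel)
    return novelty

# Toy example: two blocks of dissimilar frames -> novelty peaks at the boundary
rng = np.random.default_rng(0)
a = rng.normal(loc=[1.0, 0.0], scale=0.05, size=(40, 2))
b = rng.normal(loc=[0.0, 1.0], scale=0.05, size=(40, 2))
nov = novelty_curve(np.vstack([a, b]))
boundary = int(np.argmax(nov))
```

Because the signal is compared against itself, nothing here depends on a specific acoustic cue, which matches the paper's claim of working without silence detection or training.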

442 citations


Journal ArticleDOI
TL;DR: The model performance is demonstrated to be comparable to those of recent time-domain models that apply a multichannel analysis, and can be run in real time using typical personal computers.
Abstract: A computationally efficient model for multipitch and periodicity analysis of complex audio signals is presented. The model essentially divides the signal into two channels, below and above 1000 Hz, computes a "generalized" autocorrelation of the low-channel signal and of the envelope of the high-channel signal, and sums the autocorrelation functions. The summary autocorrelation function (SACF) is further processed to obtain an enhanced SACF (ESACF). The SACF and ESACF representations are used in observing the periodicities of the signal. The model performance is demonstrated to be comparable to those of recent time-domain models that apply a multichannel analysis. In contrast to the multichannel models, the proposed pitch analysis model can be run in real time using typical personal computers. The parameters of the model are experimentally tuned for best multipitch discrimination with typical mixtures of complex tones. The proposed pitch analysis model may be used in complex audio signal processing applications, such as sound source separation, computational auditory scene analysis, and structural representation of audio signals. The performance of the model is demonstrated by pitch analysis examples using sound mixtures which are available for download at http://www.acoustics.hut.fi/~ttolonen/pitchAnalysis/.
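A minimal sketch of the two-channel analysis: split at 1 kHz, take the "generalized" autocorrelation (the inverse FFT of the magnitude spectrum raised to a power k < 2) of the low channel and of the high-channel envelope, and sum. The envelope extraction, the exponent value, and the test signal are illustrative assumptions, not the paper's tuned design.

```python
import numpy as np

def generalized_acf(x, k=0.67):
    # "Generalized" autocorrelation: inverse FFT of the magnitude spectrum
    # raised to a power k < 2 (k = 2 gives the ordinary autocorrelation).
    spec = np.abs(np.fft.rfft(x, n=2 * len(x)))
    return np.fft.irfft(spec ** k)[:len(x)]

def sacf(x, fs, split_hz=1000.0):
    """Summary autocorrelation: low channel + envelope of the high channel."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = np.fft.irfft(np.where(freqs < split_hz, spec, 0), n=len(x))
    high = np.fft.irfft(np.where(freqs >= split_hz, spec, 0), n=len(x))
    envelope = np.abs(high)          # crude rectifier envelope (assumption)
    envelope = envelope - envelope.mean()
    return generalized_acf(low) + generalized_acf(envelope)

# Toy check: a 200 Hz harmonic complex -> SACF peak near the 5 ms period
fs = 8000
t = np.arange(2048) / fs
x = sum(np.sin(2 * np.pi * 200 * h * t) for h in range(1, 9))
s = sacf(x, fs)
lag = int(np.argmax(s[20:200])) + 20   # skip the zero-lag peak
period_ms = 1000.0 * lag / fs
```

Both channels vote for the same periodicity here: the low channel through its partials and the high channel through the 200 Hz beating of its envelope, which is what makes the summation robust.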

389 citations


Patent
Herve Schulz1, Tom Weidner1
05 Sep 2000
TL;DR: A self-powered medical device, which is operated independently of the public utility network, has at least one voltage source, a signal input for accepting an analog input signal and a signal processing unit as mentioned in this paper.
Abstract: A self-powered medical device, which is operated independently of the public utility network, has at least one voltage source, a signal input for accepting an analog input signal and a signal processing unit. The sampling rate for the input signal or the clock frequency of at least one digital component can be varied. Therefore, the energy demand can be reduced according to the requirements of the signal processing or when the discharge state of the voltage source requires a reduction.

330 citations


Patent
24 Aug 2000
TL;DR: In this paper, a digital sound processor is provided to enhance the vocal to non-vocal noise ratio of the signal processed by a vehicle audio system such as a cellular telephone, emergency communication device, or other audio device.
Abstract: A digital sound processor is provided to enhance the vocal to non-vocal noise ratio of the signal processed by a vehicle audio system such as a cellular telephone, emergency communication device, or other audio device. Optionally, an indicator is provided for use with the vehicular audio system in order to provide a user of the audio system with a status signal relating to a reception quality of a vocal signal from the user. The microphone of the audio system may be mounted within an accessory module, which may be mounted to an interior surface of a vehicle windshield. The accessory module provides a fixed orientation of the microphone and is easily installed to the vehicle as it is manufactured or as an aftermarket device. The indicator may be mounted at the accessory module or elsewhere at the mirror assembly.

327 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: This paper restricts its considerations to the case where only a single microphone recording of the noisy signal is available and proposes a method based on temporal quantiles in the power spectral domain, which is compared with pause detection and recursive averaging.
Abstract: Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing. In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available. The algorithms which we investigate proceed in two steps. First, the noise power spectrum is estimated. A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging. The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering. The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars. Without noise reduction, we obtain an error rate of 11.7%. Quantile based noise estimation and Wiener filtering reduce the error rate to 8.6%. Similar improvements are achieved in an experiment with artificial, non-stationary noise.
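The two-step structure above lends itself to a short sketch: estimate the noise floor as a per-frequency temporal quantile of the magnitude spectrogram, then remove it by spectral subtraction with a floor. The STFT parameters, the choice of quantile, and the flooring constant are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def stft_mag(x, frame=256, hop=128):
    # Magnitude STFT with a Hann window (assumed parameters).
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1))

def quantile_noise_estimate(mag, q=0.5):
    # Per-frequency temporal quantile of the magnitude spectrogram.
    # Speech is sparse in time-frequency, so a median-like quantile over
    # time tracks the noise floor without explicit pause detection.
    return np.quantile(mag, q, axis=0)

def spectral_subtraction(mag, noise, floor=0.05):
    # Subtract the noise estimate, flooring to avoid negative magnitudes.
    return np.maximum(mag - noise[None, :], floor * mag)

# Toy check: stationary white noise plus a 1 kHz tone burst
rng = np.random.default_rng(1)
fs = 8000
noise_sig = 0.1 * rng.standard_normal(fs)
tone = np.zeros(fs)
t = np.arange(2000) / fs
tone[3000:5000] = np.sin(2 * np.pi * 1000 * t)
mag = stft_mag(noise_sig + tone)
noise_est = quantile_noise_estimate(mag)
clean = spectral_subtraction(mag, noise_est)
```

The appeal of the quantile estimator, as the paper argues, is that it needs neither a pause detector nor an explicit noise-only training segment.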

226 citations


Patent
27 Jul 2000
TL;DR: In this paper, the authors proposed a method and system capable of playing different audio signals in different nodes of a small environment, consisting of a number of nodes, which may be rooms of a house or hotel, or offices of a business.
Abstract: A method and system capable of playing different audio signals in different nodes of a small environment. The system is comprised of a number of nodes, which may be rooms of a house or hotel, or offices of a business. Each node has at least one audio speaker. The system further comprises an audio signal distribution device, which is connected to the nodes and delivers audio signals to the nodes. The audio signal distribution device further comprises a storage device for storing the audio signals. At least one of the nodes has a control interface for selecting the audio signals to be transferred to the nodes. In this fashion, each node is capable of playing a different audio signal than any other node is playing concurrently.

208 citations


Journal ArticleDOI
Stan Z. Li1
TL;DR: The results show that the NFL-based method produces consistently better results than the NN-based and other methods.
Abstract: A method is presented for content-based audio classification and retrieval. It is based on a new pattern classification method called the nearest feature line (NFL). In the NFL, information provided by multiple prototypes per class is explored. This contrasts to the nearest neighbor (NN) classification in which the query is compared to each prototype individually. Regarding audio representation, perceptual and cepstral features and their combinations are considered. Extensive experiments are performed to compare various classification methods and feature sets. The results show that the NFL-based method produces consistently better results than the NN-based and other methods. A system resulting from this work has achieved the error rate of 9.78%, as compared to that of 18.34% of a compelling existing system, as tested on a common audio database.
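The nearest feature line idea is compact enough to sketch directly: instead of comparing a query to each prototype (as in NN), compare it to the line through every pair of same-class prototypes, which interpolates between observed examples. The feature values below are a toy illustration, not audio features from the paper.

```python
import numpy as np

def nfl_distance(x, p1, p2):
    """Distance from query x to the feature line through prototypes p1, p2."""
    d = p2 - p1
    t = np.dot(x - p1, d) / (np.dot(d, d) + 1e-12)
    proj = p1 + t * d                     # closest point on the line to x
    return np.linalg.norm(x - proj)

def nfl_classify(x, prototypes):
    """prototypes: dict class_label -> (n, dim) array, n >= 2 per class.
    The class whose best feature line lies nearest the query wins."""
    best_label, best_dist = None, np.inf
    for label, protos in prototypes.items():
        n = len(protos)
        for i in range(n):
            for j in range(i + 1, n):
                dist = nfl_distance(x, protos[i], protos[j])
                if dist < best_dist:
                    best_label, best_dist = label, dist
    return best_label

# Toy example: the query lies between two class-"a" prototypes, so the
# feature line interpolates across the gap and class "a" wins.
protos = {
    "a": np.array([[0.0, 0.0], [2.0, 0.0]]),
    "b": np.array([[0.0, 2.0], [2.0, 2.0], [1.0, 3.0]]),
}
label = nfl_classify(np.array([1.0, 0.1]), protos)
```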

203 citations


Journal ArticleDOI
01 Aug 2000
TL;DR: This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data.
Abstract: With the advent of essentially unlimited data storage capabilities and with the proliferation of the use of the Internet, it becomes reasonable to imagine a world in which it would be possible to access any of the stored information at will with a few keystrokes or voice commands. Since much of this data will be in the form of speech from various sources, it becomes important to develop the technologies necessary for indexing and browsing such audio data. This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories, based on their topic content, and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.

Patent
26 Oct 2000
TL;DR: In this paper, an audio recognition peripheral system consisting of a feature extractor and a vector processor is presented. And the extracted audio recognition features are transmitted to the programmable processor and processed in accordance with an audio classification algorithm.
Abstract: The present invention includes a novel audio recognition peripheral system and method. The audio recognition peripheral system comprises an audio recognition peripheral and a programmable processor such as a microprocessor or microcontroller. In one embodiment, the audio recognition peripheral includes a feature extractor and vector processor. The feature extractor receives an audio signal and extracts recognition features. The extracted audio recognition features are transmitted to the programmable processor and processed in accordance with an audio recognition algorithm. During execution of the audio recognition algorithm, the programmable processor signals the audio recognition peripheral to perform vector operations. Thus, computationally intensive recognition operations are advantageously offloaded to the peripheral.

Patent
19 Oct 2000
TL;DR: In this article, a portable audio player (10) that includes a removable media drive (30) and audio playback components (20) to play audio data stored on the removable media (12).
Abstract: A portable audio player (10) that includes a removable media drive (30) and audio playback components (20) to play audio data stored on the removable media (12). The audio playback components (20) include an audio decoder, an audio Codec (24), and a digital to analog converter (25) which outputs analog audio signals to headphones (50). A memory (23) within the audio player (10) stores an operating system and a plurality of software Codecs. A suitable software Codec is selected from the plurality of Codecs to decompress the audio data file prior to conversion to analog signals. If the suitable Codec is not stored in memory, it may be read from the removable media (12) such that the portable audio player (10) may properly play the audio content. The portable audio player (10) may also be operated as a removable data storage device for a personal computer.

Patent
19 Jun 2000
TL;DR: In this paper, a multi-channel audio compression technology is presented that extends the range of sampling frequencies compared to existing technologies and/or lowers the noise floor while remaining compatible with those earlier generation technologies.
Abstract: A multi-channel audio compression technology is presented that extends the range of sampling frequencies compared to existing technologies and/or lowers the noise floor while remaining compatible with those earlier generation technologies. The high-sampling frequency multi-channel audio (12) is decomposed into core audio up to the existing sampling frequencies and a difference signal up to the sampling frequencies of the next generation technologies. The core audio is encoded (18) using the first generation technology such as DTS, DOLBY AC-3 or MPEG I or MPEG II such that the encoded core bit stream (20) is fully compatible with a comparable decoder in the market. The difference signal (34) is encoded (36) using technologies that extend the sampling frequency and/or improve the quality of the core audio. The compressed difference signal (38) is attached as an extension to the core bit stream (20). The extension data will be ignored by the first generation decoders but can be decoded by the second generation decoders. By summing the decoded core and extension audio signals together (28), a second generation decoder can effectively extend the audio signal bandwidth and/or improve the signal to noise ratio beyond that available through the core decoder alone.

Patent
17 Feb 2000
TL;DR: In this paper, a technique for enhancing audio signals generated from compressed digital audio files is described using a Bass Maximizer module, a Harmonic Exciter module and a Quasi Stereo module.
Abstract: A technique for enhancing audio signals generated from compressed digital audio files is described. The technique uses a Bass Maximizer module, a Harmonic Exciter module and a Quasi Stereo module. The Bass Maximizer module enhances the intensity, depth and punch of the bass audio content by creating harmonic sequences from low frequency components contained in the original input signal. The Harmonic Exciter module adds to the treble audio content of the original input signal by generating harmonic series from the high frequency components contained in the input signal. The Quasi Stereo module creates a stereo image of the enhanced input signal by adding and subtracting delayed and filtered versions of the enhanced input signal with itself to create left and right channeled stereo-like outputs. The technique provides a useful tool for regenerating more pleasant and enjoyable sound from a compressed audio signal.

PatentDOI
TL;DR: In this article, an audio signal processing device arranges objects in a virtual 3D space and generates audio signals by performing, at a prescribed listening position, audio simulation to sounds generated from a prescribed sounding position.
Abstract: The audio signal processing device of the present invention arranges objects in a virtual three-dimensional space and generates audio signals by performing, at a prescribed listening position, audio simulation to sounds generated from a prescribed sounding position. This invention is characterized in that the sound field space subject to audio simulation is structured by combining spatial objects and audio simulation is performed thereto. Here, “spatial object” shall mean the space (sound field space) for audio simulation which was simply modeled in order to simplify audio simulation, and is a virtual object provided with prescribed audio parameters. The sound field space is structured by combining these spatial objects.

Patent
Steven D. Curtin1
11 Apr 2000
TL;DR: In this paper, a digital wireless premises audio system, a method of operating the same and a home theater system incorporating the audio system or the method, is presented, which includes a digital audio encoder/transmitter, located on the premises, that accepts an audio channel in digital form, encodes the channel into a stream of digital data and wirelessly transmits the stream about the premises.
Abstract: A digital wireless premises audio system, a method of operating the same and a home theater system incorporating the audio system or the method. In one embodiment, the audio system includes: (1) a digital audio encoder/transmitter, located on the premises, that accepts an audio channel in digital form, encodes the channel into a stream of digital data and wirelessly transmits the stream about the premises and (2) a speaker module, located on the premises, couplable to a power source and including, in series, a digital audio receiver/decoder, an audio amplifier and a speaker, that receives the stream, decodes the audio channel therefrom, converts the audio channel to analog form and employs power from the power source to amplify the audio channel and drive the speaker therewith.

Proceedings ArticleDOI
30 Jul 2000
TL;DR: A novel algorithm for video scene segmentation that fuses the resulting segments using a nearest neighbor algorithm that is further refined using a time-alignment distribution derived from the ground truth.
Abstract: We present a novel algorithm for video scene segmentation. We model a scene as a semantically consistent chunk of audio-visual data. Central to the segmentation framework is the idea of a finite-memory model. We separately segment the audio and video data into scenes, using data in the memory. The audio segmentation algorithm determines the correlations amongst the envelopes of audio features. The video segmentation algorithm determines the correlations amongst shot key-frames. The scene boundaries in both cases are determined using local correlation minima. Then, we fuse the resulting segments using a nearest neighbor algorithm that is further refined using a time-alignment distribution derived from the ground truth. The algorithm was tested on a difficult data set, the first hour of a commercial film, with good results. It achieves a scene segmentation accuracy of 84%.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: A predominant-F0 estimation method called PreFEst is proposed that does not rely on the F0's unreliable frequency component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range.
Abstract: This paper describes a robust method for estimating the fundamental frequency (F0) of melody and bass lines in monaural real-world musical audio signals containing sounds of various instruments. Most previous F0-estimation methods had great difficulty dealing with such complex audio signals because they were designed to deal with mixtures of only a few sounds. To make it possible to estimate the F0 of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst that does not rely on the F0's unreliable frequency component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. It evaluates the relative dominance of every possible F0 by using the expectation-maximization algorithm and considers the temporal continuity of F0s by using a multiple-agent architecture. Experimental results show that our real-time system can detect the melody and bass lines in audio signals sampled from commercially distributed compact discs.
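PreFEst itself weights F0 candidates with an expectation-maximization procedure and tracks them with a multiple-agent architecture; those parts are well beyond a snippet. As a much-simplified illustration of its central idea only, scoring each F0 candidate by the harmonic support it finds within an intentionally limited frequency range, consider the following sketch (candidate grid, harmonic count, and test signal are all assumptions):

```python
import numpy as np

def predominant_f0(x, fs, f0_min=40.0, f0_max=260.0, n_harm=6):
    """Pick the F0 candidate whose harmonics carry the most spectral
    magnitude inside a limited search range (bass-line style search)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    candidates = np.arange(f0_min, f0_max, 1.0)
    best_f0, best_score = candidates[0], -1.0
    for f0 in candidates:
        score = 0.0
        for h in range(1, n_harm + 1):
            bin_idx = int(round(f0 * h * len(x) / fs))
            if bin_idx < len(spec):
                score += spec[bin_idx]   # harmonic support at h * f0
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0

# Toy bass line: a 110 Hz harmonic complex with 1/h amplitude roll-off
fs = 8000
t = np.arange(4096) / fs
bass = sum((1.0 / h) * np.sin(2 * np.pi * 110 * h * t) for h in range(1, 6))
f0 = predominant_f0(bass, fs)
```

The real method avoids relying on the (often unreliable) fundamental component itself and adds temporal continuity; this sketch shows only the "most predominant harmonic support" scoring.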

Patent
13 Jun 2000
TL;DR: In this paper, a wireless communications system for transmitting and receiving images and audio information includes a receiver including a display for displaying received images and a speaker for producing audio signals, a processing unit for storing ID code signals for decoding the received image and audio signals and receives an RF signal containing images and audio information.
Abstract: A wireless communications system for transmitting and receiving images and audio information includes a receiver with a display for displaying received images, a speaker for producing audio signals, and a processing unit for storing ID code signals for decoding the received image and audio signals; the receiver receives an RF signal containing images and audio information. The receiver is responsive to the ID code signal for extracting the received image and audio signals of the images and the audio information from the RF signal. An electronic camera has an image sensor for capturing one or more images of a scene and producing image signals, and receives audio signals to produce audio information signals. A second processing unit is provided for encoding and then superimposing the image and audio signals onto an RF carrier, and transmits the RF carrier and superimposed signals. First and second interconnects respectively included in the receiver and the electronic camera are adapted when interconnected to transfer the ID code signal from the camera to the receiver where it is stored, to thereby permit the electronic camera to communicate with the receiver.


Patent
23 Aug 2000
TL;DR: An audio system includes a memory storing audio data and an audio signal processor for processing the audio data as discussed by the authors, where addressing circuitry addresses the memory and a pre-fetch storage area stores data for a current address and for one or more following addresses.
Abstract: An audio system includes a memory storing audio data and an audio signal processor for processing the audio data. Addressing circuitry addresses the memory and a pre-fetch storage area stores data for a current address and for one or more following addresses to hide memory access latency during address changes of the addressing circuitry.

Patent
10 Nov 2000
TL;DR: In this article, a system and method that allows digital audio files, either streaming or stored to be controlled and selected and provides an analog audio signal for broadcast by a radio or amplifier without interfering with the operation of the host PC is presented.
Abstract: A system and method that allows digital audio files, either streaming or stored, to be controlled and selected, and provides an analog audio signal for broadcast by a radio or amplifier without interfering with the operation of the host PC. A remote device is provided which facilitates control of the system.

Proceedings ArticleDOI
Xin Li1, H.H. Yu
30 Jul 2000
TL;DR: The experiment results have shown that the novel audio data hiding scheme in the cepstrum domain can achieve transparent and robust data hiding at the capacity region of above 20 bps.
Abstract: We propose a novel data hiding scheme for audio signals in the cepstrum domain. Cepstrum representation of audio can be shown to be very robust to a wide range of attacks, including the particularly challenging time-scaling and pitch-shifting warps. In the cepstrum domain, we propose to embed data by manipulating the statistical mean of selected cepstrum coefficients. An intuitive psychoacoustic model is employed to control the audibility of introduced distortion. Our experimental results have shown that the novel audio data hiding scheme in the cepstrum domain can achieve transparent and robust data hiding at the capacity region of above 20 bps.
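The core embedding operation, shifting the mean of selected cepstrum coefficients, can be sketched as below. The quefrency band, step size, and threshold detector are illustrative assumptions (the paper uses a psychoacoustic model to control the distortion, which is omitted here):

```python
import numpy as np

def real_cepstrum(x):
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    return np.fft.irfft(np.log(np.abs(np.fft.rfft(x)) + 1e-12))

def embed_bit(x, bit, lo=20, hi=40, delta=0.1):
    """Embed one bit by shifting the mean of real-cepstrum coefficients in
    the quefrency band [lo, hi). Band and step are assumed, not the
    paper's tuned values."""
    n = len(x)
    phase = np.angle(np.fft.rfft(x))
    ceps = real_cepstrum(x)
    shift = delta if bit else -delta
    ceps[lo:hi] += shift
    ceps[n - hi + 1:n - lo + 1] += shift   # keep the cepstrum symmetric
    log_mag = np.fft.rfft(ceps).real       # back to the log spectrum
    return np.fft.irfft(np.exp(log_mag) * np.exp(1j * phase), n=n)

def detect_bit(x, lo=20, hi=40):
    # Blind detection: the sign of the band mean carries the bit.
    return float(np.mean(real_cepstrum(x)[lo:hi]) > 0.0)

rng = np.random.default_rng(2)
host = rng.standard_normal(2048)           # stand-in for an audio frame
b1 = detect_bit(embed_bit(host, 1))
b0 = detect_bit(embed_bit(host, 0))
```

Detection needs no reference signal because the unmarked band mean hovers near zero, which is part of what makes the statistic robust to warping attacks.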

Proceedings ArticleDOI
08 Aug 2000
TL;DR: An algorithm-independent framework for rigorously comparing digital watermarking algorithms with respect to bit rate, perceptual quality, computational complexity, and robustness to signal processing is proposed.
Abstract: We propose an algorithm-independent framework for rigorously comparing digital watermarking algorithms with respect to bit rate, perceptual quality, computational complexity, and robustness to signal processing. The framework is used to evaluate five audio watermarking algorithms from the literature, revealing that frequency domain techniques perform well under the criteria.

Proceedings ArticleDOI
30 Jul 2000
TL;DR: A system to continuously monitor the user's voice and facial motions for recognizing emotional expressions is described, crucial for intelligent computers that take on a social role such as an actor or a companion.
Abstract: Visual and auditory modalities are two of the most commonly used media in interactions between humans. The authors describe a system to continuously monitor the user's voice and facial motions for recognizing emotional expressions. Such an ability is crucial for intelligent computers that take on a social role such as an actor or a companion. We outline methods to extract audio and visual features useful for classifying emotions. Audio and visual information must be handled appropriately in single-modal and bimodal situations. We report audio-only and video-only emotion recognition on the same subjects, in person-dependent and person-independent fashions, and outline methods to handle bimodal recognition.

Patent
Donald W. Moses1, Robert W. Moses1
12 Oct 2000
TL;DR: In this article, a computer-implemented system for providing a digital watermark in an audio signal is presented. But, the system is limited to the use of audio signals.
Abstract: The foregoing problems are solved and a technical advance is achieved by a computer-implemented system for providing a digital watermark in an audio signal. In a preferred embodiment, an audio file (108), such as a .WAV file, representing an audio signal to be watermarked is processed using an algorithm of the present invention herein referred to as the 'PAWS algorithm' (104) to determine and log the location and number of opportunities that exist for inserting a watermark into the audio signal such that it will be masked by the audio signal. The user can adjust (17) certain parameters (112) of the PAWS algorithm (104) before the audio file is processed. A/B/X testing between the original and watermarked files is also supported to allow the user to undo or re-encode the watermark, if desired.

Journal ArticleDOI
TL;DR: This letter unveils an efficient algorithm for sampling rate conversion (SRC) technique from 44.1 kHz compact disc (CD) to 48 kHz digital audio tape (DAT) that requires fewer million instructions per second (MIPS) and memory.
Abstract: This letter unveils an efficient algorithm for sampling rate conversion (SRC) technique from 44.1 kHz compact disc (CD) to 48 kHz digital audio tape (DAT). This method involves upsampling the input signal by two, and then passing the interpolated signal through a fractional delay filter that employs a simple decimation. This method can also be used for SRC from DAT to CD without changing the filter coefficients. The proposed algorithm is simulated in Matlab and can be implemented in a realtime digital signal processor (DSP). Compared with other existing methods, the proposed method has the advantage that it requires fewer million instructions per second (MIPS) and memory.
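The CD-to-DAT ratio reduces to 160/147, and the paper's contribution is realizing it cheaply as upsample-by-2 plus a fractional-delay decimator. As a stand-in for that filter, the sketch below evaluates each 48 kHz output instant by linear interpolation between the two nearest 44.1 kHz samples; it demonstrates the rational timing, not the letter's efficient filter design.

```python
import numpy as np

def resample_44k1_to_48k(x):
    up, down = 160, 147            # 48000 / 44100 in lowest terms
    n_out = (len(x) - 1) * up // down
    out = np.empty(n_out)
    for m in range(n_out):
        pos = m * down / up        # output instant in input-sample units
        i = int(pos)
        frac = pos - i             # fractional delay for this output sample
        out[m] = (1 - frac) * x[i] + frac * x[i + 1]
    return out

fs_in, fs_out = 44100, 48000
t = np.arange(4410) / fs_in        # 100 ms of a 1 kHz tone at CD rate
x = np.sin(2 * np.pi * 1000 * t)
y = resample_44k1_to_48k(x)
```

A production SRC would replace the linear interpolation with a designed polyphase/fractional-delay filter, which is exactly where the letter's MIPS and memory savings come from.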

Patent
07 Sep 2000
TL;DR: In this article, a method for overlapping stored audio elements in a system for providing a customized radio broadcast is proposed, which includes the steps of dividing a first audio element into a plurality of audio element components.
Abstract: A method for overlapping stored audio elements in a system for providing a customized radio broadcast. The method includes the steps of dividing a first audio element into a plurality of audio element components; selecting one of said audio element components; decompressing the selected audio element component; selecting a second audio element; decompressing the second audio element; mixing the decompressed audio element component with the decompressed second audio element to form a mixed audio element component; and compressing the mixed audio element component to form a compressed overlapping audio element component. The compressed overlapping audio element component may replace the selected audio component. The first audio element may be a song, while the second audio element may be a DJ introduction. Accordingly, the compressed overlapping audio element may be broadcast followed by the remaining components of the song audio element.

PatentDOI
TL;DR: In this article, a system and method for locating program boundaries and commercial boundaries using audio categories is described. But the system is not suitable for use in a video signal processor, as it requires the use of an audio classifier controller that determines the rates of change of audio categories.
Abstract: For use in a video signal processor, there is disclosed a system and method for locating program boundaries and commercial boundaries using audio categories. The system comprises an audio classifier controller that obtains information concerning the audio categories of the segments of an audio signal. Audio categories include such categories as silence, music, noise and speech. The audio classifier controller determines the rates of change of the audio categories. The audio classifier controller then compares each rate of change of the audio categories with a threshold value to locate the boundaries of the programs and commercials. The audio classifier controller is also capable of classifying at least one feature of an audio category change rate using a multifeature classifier to locate the boundaries of the programs and commercials.
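The rate-of-change test at the heart of the controller can be sketched simply: given one category label per segment, count label changes in a sliding window and flag positions where the change rate crosses a threshold. Window length and threshold are assumptions; the patent's multifeature classifier is omitted.

```python
def boundary_candidates(labels, window=5, threshold=0.5):
    """Flag positions where the local rate of audio-category change
    exceeds a threshold -- a crude stand-in for the classifier controller.

    labels: one category string per segment (e.g. per second of audio).
    """
    # 1 where consecutive segments differ in category, else 0
    changes = [int(a != b) for a, b in zip(labels, labels[1:])]
    bounds = []
    for i in range(len(changes) - window + 1):
        rate = sum(changes[i:i + window]) / window
        if rate > threshold:
            bounds.append(i)
    return bounds

# Steady program audio, then the rapid category alternation typical of a
# transition into a commercial break, then steady audio again.
labels = (["speech"] * 20
          + ["music", "silence", "music", "noise", "music"]
          + ["speech"] * 10)
cands = boundary_candidates(labels)
```

The intuition matches the patent: program material tends to hold one category for long stretches, while commercial boundaries cluster many category switches into a short span.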