
Showing papers on "Audio signal processing published in 2000"


Journal ArticleDOI
01 Apr 2000
TL;DR: This paper reviews methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model.
Abstract: During the last decade, CD-quality digital audio has essentially replaced analog audio. Emerging digital audio applications for network, wireless, and multimedia computing systems face a series of constraints such as reduced channel bandwidth, limited storage capacity, and low cost. These new applications have created a demand for high-quality digital audio delivery at low bit rates. In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities. This paper is organized as follows. First, psychoacoustic principles are described, with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Next, filter bank design issues and algorithms are addressed, with a particular emphasis placed on the modified discrete cosine transform, a perfect reconstruction cosine-modulated filter bank that has become of central importance in perceptual audio coding. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model. These discussions concentrate on architectures and applications of those techniques that utilize psychoacoustic models to exploit efficiently the masking characteristics of the human receiver.
Several algorithms that have become international and/or commercial standards receive in-depth treatment, including the ISO/IEC MPEG family (-1, -2, -4), the Lucent Technologies PAC/EPAC/MPAC, the Dolby AC-2/AC-3, and the Sony ATRAC/SDDS algorithms. Then, we describe subjective evaluation methodologies in some detail, including the ITU-R BS.1116 recommendation on subjective measurements of small impairments. This paper concludes with a discussion of future research directions.

938 citations


Journal ArticleDOI
TL;DR: This work describes audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval.
Abstract: Multimedia content analysis refers to the computerized understanding of the semantic meanings of a multimedia document, such as a video sequence with an accompanying audio track. With a multimedia document, its semantics are embedded in multiple forms that are usually complementary to each other. Therefore, it is necessary to analyze all types of data: image frames, sound tracks, texts that can be extracted from image frames, and spoken words that can be deciphered from the audio track. This usually involves segmenting the document into semantically meaningful units, classifying each unit into a predefined scene type, and indexing and summarizing the document for efficient retrieval and browsing. We review advances in using audio and visual information jointly for accomplishing the above tasks. We describe audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval. We also describe audio and visual descriptors and description schemes that are being considered by the MPEG-7 standard for multimedia content description.

552 citations


Patent
29 Dec 2000
TL;DR: In this article, a system for dynamic distribution of audio signals at a site based on defined zones within the site is presented, where a plurality of addressable audio devices are coupled to a local network for the site which are configured to receive a designated digital audio stream over the local network and to output the received audio stream to audio equipment located at the site.
Abstract: Systems and methods are provided for dynamic distribution of audio signals at a site based on defined zones within the site. A plurality of addressable audio devices are coupled to a local network for the site which are configured to receive a designated digital audio stream over the local network and to output the received digital audio stream to audio equipment located at the site. A zone manager defines a plurality of zones for the site which may include a plurality of the addressable audio devices. The zone manager defines a relationship between a characteristic of the audio signal for a reference audio device and for the addressable audio devices in the zones. An audio interface receives digital audio streams and outputs the digital audio streams on the local network addressed to selected ones of the audio devices based on the defined zones, the defined relationship between a characteristic of the audio signal for a reference audio device and for the addressable audio devices and a control input associated with the characteristic. A user interface is provided which is configured to receive a user designation of the control input. Systems and methods for dynamic aggregation of audio equipment in zones are also provided.

519 citations


Proceedings ArticleDOI
30 Jul 2000
TL;DR: This method can find individual note boundaries or even natural segment boundaries such as verse/chorus or speech/music transitions, even in the absence of cues such as silence, by analyzing local self-similarity.
Abstract: The paper describes methods for automatically locating points of significant change in music or audio, by analyzing local self-similarity. This method can find individual note boundaries or even natural segment boundaries such as verse/chorus or speech/music transitions, even in the absence of cues such as silence. This approach uses the signal to model itself, and thus does not rely on particular acoustic cues nor requires training. We present a wide variety of applications, including indexing, segmenting, and beat tracking of music and audio. The method works well on a wide variety of audio sources.
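The self-similarity approach described above can be sketched in a few lines: build a cosine similarity matrix over per-frame feature vectors and correlate a checkerboard kernel along its main diagonal, so that novelty peaks mark segment boundaries. This is a minimal illustration of the idea, not the authors' implementation; the feature vectors, kernel size, and toy data below are assumptions.

```python
import numpy as np

def novelty_curve(frames, kernel_size=8):
    """Checkerboard-kernel novelty from a cosine self-similarity matrix.

    frames: (n_frames, n_features) array of per-frame feature vectors.
    """
    # Cosine self-similarity matrix
    unit = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-12)
    S = unit @ unit.T

    # Checkerboard kernel: +1 on same-segment quadrants, -1 across a boundary
    half = kernel_size // 2
    q = np.ones((half, half))
    kernel = np.block([[q, -q], [-q, q]])

    # Correlate the kernel along the main diagonal
    n = S.shape[0]
    novelty = np.zeros(n)
    for i in range(half, n - half):
        patch = S[i - half:i + half, i - half:i + half]
        novelty[i] = np.sum(patch * kernel)
    return novelty

# Toy example: two blocks of dissimilar frames -> novelty peaks at the boundary
rng = np.random.default_rng(0)
a = rng.normal(loc=[1.0, 0.0], scale=0.05, size=(40, 2))
b = rng.normal(loc=[0.0, 1.0], scale=0.05, size=(40, 2))
nov = novelty_curve(np.vstack([a, b]))
boundary = int(np.argmax(nov))
```

Because the signal is compared against itself, nothing here depends on a specific acoustic cue, which matches the paper's claim of working without silence detection or training.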

442 citations


Journal ArticleDOI
TL;DR: The model performance is demonstrated to be comparable to those of recent time-domain models that apply a multichannel analysis, and can be run in real time using typical personal computers.
Abstract: A computationally efficient model for multipitch and periodicity analysis of complex audio signals is presented. The model essentially divides the signal into two channels, below and above 1000 Hz, computes a "generalized" autocorrelation of the low-channel signal and of the envelope of the high-channel signal, and sums the autocorrelation functions. The summary autocorrelation function (SACF) is further processed to obtain an enhanced SACF (ESACF). The SACF and ESACF representations are used in observing the periodicities of the signal. The model performance is demonstrated to be comparable to those of recent time-domain models that apply a multichannel analysis. In contrast to the multichannel models, the proposed pitch analysis model can be run in real time using typical personal computers. The parameters of the model are experimentally tuned for best multipitch discrimination with typical mixtures of complex tones. The proposed pitch analysis model may be used in complex audio signal processing applications, such as sound source separation, computational auditory scene analysis, and structural representation of audio signals. The performance of the model is demonstrated by pitch analysis examples using sound mixtures which are available for download at http://www.acoustics.hut.fi/~ttolonen/pitchAnalysis/.
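A minimal sketch of the two-channel analysis: split at 1 kHz, take the "generalized" autocorrelation (the inverse FFT of the magnitude spectrum raised to a power k < 2) of the low channel and of the high-channel envelope, and sum. The envelope extraction, the exponent value, and the test signal are illustrative assumptions, not the paper's tuned design.

```python
import numpy as np

def generalized_acf(x, k=0.67):
    # "Generalized" autocorrelation: inverse FFT of the magnitude spectrum
    # raised to a power k < 2 (k = 2 gives the ordinary autocorrelation).
    spec = np.abs(np.fft.rfft(x, n=2 * len(x)))
    return np.fft.irfft(spec ** k)[:len(x)]

def sacf(x, fs, split_hz=1000.0):
    """Summary autocorrelation: low channel + envelope of the high channel."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = np.fft.irfft(np.where(freqs < split_hz, spec, 0), n=len(x))
    high = np.fft.irfft(np.where(freqs >= split_hz, spec, 0), n=len(x))
    envelope = np.abs(high)          # crude rectifier envelope (assumption)
    envelope = envelope - envelope.mean()
    return generalized_acf(low) + generalized_acf(envelope)

# Toy check: a 200 Hz harmonic complex -> SACF peak near the 5 ms period
fs = 8000
t = np.arange(2048) / fs
x = sum(np.sin(2 * np.pi * 200 * h * t) for h in range(1, 9))
s = sacf(x, fs)
lag = int(np.argmax(s[20:200])) + 20   # skip the zero-lag peak
period_ms = 1000.0 * lag / fs
```

Both channels vote for the same periodicity here: the low channel through its partials and the high channel through the 200 Hz beating of its envelope, which is what makes the summation robust.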

389 citations


Patent
Herve Schulz1, Tom Weidner1
05 Sep 2000
TL;DR: A self-powered medical device, which is operated independently of the public utility network, has at least one voltage source, a signal input for accepting an analog input signal and a signal processing unit as mentioned in this paper.
Abstract: A self-powered medical device, which is operated independently of the public utility network, has at least one voltage source, a signal input for accepting an analog input signal and a signal processing unit. The sampling rate for the input signal or the clock frequency of at least one digital component can be varied. Therefore, the energy demand can be reduced according to the requirements of the signal processing or when the discharge state of the voltage source requires a reduction.

330 citations


Patent
24 Aug 2000
TL;DR: In this paper, a digital sound processor is provided to enhance the vocal to non-vocal noise ratio of the signal processed by a vehicle audio system such as a cellular telephone, emergency communication device, or other audio device.
Abstract: A digital sound processor is provided to enhance the vocal to non-vocal noise ratio of the signal processed by a vehicle audio system such as a cellular telephone, emergency communication device, or other audio device. Optionally, an indicator is provided for use with the vehicular audio system in order to provide a user of the audio system with a status signal relating to a reception quality of a vocal signal from the user. The microphone of the audio system may be mounted within an accessory module, which may be mounted to an interior surface of a vehicle windshield. The accessory module provides a fixed orientation of the microphone and is easily installed to the vehicle as it is manufactured or as an aftermarket device. The indicator may be mounted at the accessory module or elsewhere at the mirror assembly.

327 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: This paper restricts its considerations to the case where only a single microphone recording of the noisy signal is available and proposes a method based on temporal quantiles in the power spectral domain, which is compared with pause detection and recursive averaging.
Abstract: Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing. In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available. The algorithms which we investigate proceed in two steps. First, the noise power spectrum is estimated. A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging. The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering. The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars. Without noise reduction, we obtain an error rate of 11.7%. Quantile based noise estimation and Wiener filtering reduce the error rate to 8.6%. Similar improvements are achieved in an experiment with artificial, non-stationary noise.
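The two-step structure above lends itself to a short sketch: estimate the noise floor as a per-frequency temporal quantile of the magnitude spectrogram, then remove it by spectral subtraction with a floor. The STFT parameters, the choice of quantile, and the flooring constant are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def stft_mag(x, frame=256, hop=128):
    # Magnitude STFT with a Hann window (assumed parameters).
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1))

def quantile_noise_estimate(mag, q=0.5):
    # Per-frequency temporal quantile of the magnitude spectrogram.
    # Speech is sparse in time-frequency, so a median-like quantile over
    # time tracks the noise floor without explicit pause detection.
    return np.quantile(mag, q, axis=0)

def spectral_subtraction(mag, noise, floor=0.05):
    # Subtract the noise estimate, flooring to avoid negative magnitudes.
    return np.maximum(mag - noise[None, :], floor * mag)

# Toy check: stationary white noise plus a 1 kHz tone burst
rng = np.random.default_rng(1)
fs = 8000
noise_sig = 0.1 * rng.standard_normal(fs)
tone = np.zeros(fs)
t = np.arange(2000) / fs
tone[3000:5000] = np.sin(2 * np.pi * 1000 * t)
mag = stft_mag(noise_sig + tone)
noise_est = quantile_noise_estimate(mag)
clean = spectral_subtraction(mag, noise_est)
```

The appeal of the quantile estimator, as the paper argues, is that it needs neither a pause detector nor an explicit noise-only training segment.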

226 citations


Patent
27 Jul 2000
TL;DR: In this paper, the authors proposed a method and system capable of playing different audio signals in different nodes of a small environment, consisting of a number of nodes, which may be rooms of a house or hotel, or offices of a business.
Abstract: A method and system capable of playing different audio signals in different nodes of a small environment. The system is comprised of a number of nodes, which may be rooms of a house or hotel, or offices of a business. Each node has at least one audio speaker. The system further comprises an audio signal distribution device, which is connected to the nodes and delivers audio signals to the nodes. The audio signal distribution device further comprises a storage device for storing the audio signals. At least one of the nodes has a control interface for selecting the audio signals to be transferred to the nodes. In this fashion, each node is capable of playing a different audio signal than any other node is playing concurrently.

208 citations


Journal ArticleDOI
Stan Z. Li1
TL;DR: The results show that the NFL-based method produces consistently better results than the NN-based and other methods.
Abstract: A method is presented for content-based audio classification and retrieval. It is based on a new pattern classification method called the nearest feature line (NFL). In the NFL, information provided by multiple prototypes per class is explored. This contrasts to the nearest neighbor (NN) classification in which the query is compared to each prototype individually. Regarding audio representation, perceptual and cepstral features and their combinations are considered. Extensive experiments are performed to compare various classification methods and feature sets. The results show that the NFL-based method produces consistently better results than the NN-based and other methods. A system resulting from this work has achieved the error rate of 9.78%, as compared to that of 18.34% of a compelling existing system, as tested on a common audio database.
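The nearest feature line idea is compact enough to sketch directly: instead of comparing a query to each prototype (as in NN), compare it to the line through every pair of same-class prototypes, which interpolates between observed examples. The feature values below are a toy illustration, not audio features from the paper.

```python
import numpy as np

def nfl_distance(x, p1, p2):
    """Distance from query x to the feature line through prototypes p1, p2."""
    d = p2 - p1
    t = np.dot(x - p1, d) / (np.dot(d, d) + 1e-12)
    proj = p1 + t * d                     # closest point on the line to x
    return np.linalg.norm(x - proj)

def nfl_classify(x, prototypes):
    """prototypes: dict class_label -> (n, dim) array, n >= 2 per class.
    The class whose best feature line lies nearest the query wins."""
    best_label, best_dist = None, np.inf
    for label, protos in prototypes.items():
        n = len(protos)
        for i in range(n):
            for j in range(i + 1, n):
                dist = nfl_distance(x, protos[i], protos[j])
                if dist < best_dist:
                    best_label, best_dist = label, dist
    return best_label

# Toy example: the query lies between two class-"a" prototypes, so the
# feature line interpolates across the gap and class "a" wins.
protos = {
    "a": np.array([[0.0, 0.0], [2.0, 0.0]]),
    "b": np.array([[0.0, 2.0], [2.0, 2.0], [1.0, 3.0]]),
}
label = nfl_classify(np.array([1.0, 0.1]), protos)
```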

203 citations


Journal ArticleDOI
01 Aug 2000
TL;DR: This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data.
Abstract: With the advent of essentially unlimited data storage capabilities and with the proliferation of the use of the Internet, it becomes reasonable to imagine a world in which it would be possible to access any of the stored information at will with a few keystrokes or voice commands. Since much of this data will be in the form of speech from various sources, it becomes important to develop the technologies necessary for indexing and browsing such audio data. This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories, based on their topic content, and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.

Patent
26 Oct 2000
TL;DR: In this paper, an audio recognition peripheral system consisting of a feature extractor and a vector processor is presented. And the extracted audio recognition features are transmitted to the programmable processor and processed in accordance with an audio classification algorithm.
Abstract: The present invention includes a novel audio recognition peripheral system and method. The audio recognition peripheral system comprises an audio recognition peripheral and a programmable processor such as a microprocessor or microcontroller. In one embodiment, the audio recognition peripheral includes a feature extractor and vector processor. The feature extractor receives an audio signal and extracts recognition features. The extracted audio recognition features are transmitted to the programmable processor and processed in accordance with an audio recognition algorithm. During execution of the audio recognition algorithm, the programmable processor signals the audio recognition peripheral to perform vector operations. Thus, computationally intensive recognition operations are advantageously offloaded to the peripheral.

Patent
19 Oct 2000
TL;DR: In this article, a portable audio player (10) that includes a removable media drive (30) and audio playback components (20) to play audio data stored on the removable media (12).
Abstract: A portable audio player (10) that includes a removable media drive (30) and audio playback components (20) to play audio data stored on the removable media (12). The audio playback components (20) include an audio decoder, an audio Codec (24), and a digital to analog converter (25) which outputs analog audio signals to headphones (50). A memory (23) within the audio player (10) stores an operating system and a plurality of software Codecs. A suitable software Codec is selected from the plurality of Codecs to decompress the audio data file prior to conversion to analog signals. If the suitable Codec is not stored in memory, it may be read from the removable media (12) such that the portable audio player (10) may properly play the audio content. The portable audio player (10) may also be operated as a removable data storage device for a personal computer.

Patent
19 Jun 2000
TL;DR: In this paper, a multi-channel audio compression technology is presented that extends the range of sampling frequencies compared to existing technologies and/or lowers the noise floor while remaining compatible with those earlier generation technologies.
Abstract: A multi-channel audio compression technology is presented that extends the range of sampling frequencies compared to existing technologies and/or lowers the noise floor while remaining compatible with those earlier generation technologies. The high-sampling frequency multi-channel audio (12) is decomposed into core audio up to the existing sampling frequencies and a difference signal up to the sampling frequencies of the next generation technologies. The core audio is encoded (18) using the first generation technology such as DTS, DOLBY AC-3 or MPEG I or MPEG II such that the encoded core bit stream (20) is fully compatible with a comparable decoder in the market. The difference signal (34) is encoded (36) using technologies that extend the sampling frequency and/or improve the quality of the core audio. The compressed difference signal (38) is attached as an extension to the core bit stream (20). The extension data will be ignored by the first generation decoders but can be decoded by the second generation decoders. By summing the decoded core and extension audio signals together (28), a second generation decoder can effectively extend the audio signal bandwidth and/or improve the signal to noise ratio beyond that available through the core decoder alone.

Patent
17 Feb 2000
TL;DR: In this paper, a technique for enhancing audio signals generated from compressed digital audio files is described using a Bass Maximizer module, a Harmonic Exciter module and a Quasi Stereo module.
Abstract: A technique for enhancing audio signals generated from compressed digital audio files is described. The technique uses a Bass Maximizer module, a Harmonic Exciter module and a Quasi Stereo module. The Bass Maximizer module enhances the intensity, depth and punch of the bass audio content by creating harmonic sequences from low frequency components contained in the original input signal. The Harmonic Exciter module adds to the treble audio content of the original input signal by generating harmonic series from the high frequency components contained in the input signal. The Quasi Stereo module creates a stereo image of the enhanced input signal by adding and subtracting delayed and filtered versions of the enhanced input signal with itself to create left and right channeled stereo-like outputs. The technique provides a useful tool for regenerating more pleasant and enjoyable sound from a compressed audio signal.

PatentDOI
TL;DR: In this article, an audio signal processing device arranges objects in a virtual 3D space and generates audio signals by performing, at a prescribed listening position, audio simulation to sounds generated from a prescribed sounding position.
Abstract: The audio signal processing device of the present invention arranges objects in a virtual three-dimensional space and generates audio signals by performing, at a prescribed listening position, audio simulation to sounds generated from a prescribed sounding position. This invention is characterized in that the sound field space subject to audio simulation is structured by combining spatial objects and audio simulation is performed thereto. Here, “spatial object” shall mean the space (sound field space) for audio simulation which was simply modeled in order to simplify audio simulation, and is a virtual object provided with prescribed audio parameters. The sound field space is structured by combining these spatial objects.

Patent
Steven D. Curtin1
11 Apr 2000
TL;DR: In this paper, a digital wireless premises audio system, a method of operating the same and a home theater system incorporating the audio system or the method, is presented, which includes a digital audio encoder/transmitter, located on the premises, that accepts an audio channel in digital form, encodes the channel into a stream of digital data and wirelessly transmits the stream about the premises.
Abstract: A digital wireless premises audio system, a method of operating the same and a home theater system incorporating the audio system or the method. In one embodiment, the audio system includes: (1) a digital audio encoder/transmitter, located on the premises, that accepts an audio channel in digital form, encodes the channel into a stream of digital data and wirelessly transmits the stream about the premises and (2) a speaker module, located on the premises, couplable to a power source and including, in series, a digital audio receiver/decoder, an audio amplifier and a speaker, that receives the stream, decodes the audio channel therefrom, converts the audio channel to analog form and employs power from the power source to amplify the audio channel and drive the speaker therewith.

Proceedings ArticleDOI
30 Jul 2000
TL;DR: A novel algorithm for video scene segmentation that fuses the resulting segments using a nearest neighbor algorithm that is further refined using a time-alignment distribution derived from the ground truth.
Abstract: We present a novel algorithm for video scene segmentation. We model a scene as a semantically consistent chunk of audio-visual data. Central to the segmentation framework is the idea of a finite-memory model. We separately segment the audio and video data into scenes, using data in the memory. The audio segmentation algorithm determines the correlations amongst the envelopes of audio features. The video segmentation algorithm determines the correlations amongst shot key-frames. The scene boundaries in both cases are determined using local correlation minima. Then, we fuse the resulting segments using a nearest neighbor algorithm that is further refined using a time-alignment distribution derived from the ground truth. The algorithm was tested on a difficult data set, the first hour of a commercial film, with good results. It achieves a scene segmentation accuracy of 84%.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: A predominant-F0 estimation method called PreFEst is proposed that does not rely on the F0's unreliable frequency component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range.
Abstract: This paper describes a robust method for estimating the fundamental frequency (F0) of melody and bass lines in monaural real-world musical audio signals containing sounds of various instruments. Most previous F0-estimation methods had great difficulty dealing with such complex audio signals because they were designed to deal with mixtures of only a few sounds. To make it possible to estimate the F0 of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst that does not rely on the F0's unreliable frequency component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. It evaluates the relative dominance of every possible F0 by using the expectation-maximization algorithm and considers the temporal continuity of F0s by using a multiple-agent architecture. Experimental results show that our real-time system can detect the melody and bass lines in audio signals sampled from commercially distributed compact discs.
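PreFEst itself weights F0 candidates with an expectation-maximization procedure and tracks them with a multiple-agent architecture; those parts are well beyond a snippet. As a much-simplified illustration of its central idea only, scoring each F0 candidate by the harmonic support it finds within an intentionally limited frequency range, consider the following sketch (candidate grid, harmonic count, and test signal are all assumptions):

```python
import numpy as np

def predominant_f0(x, fs, f0_min=40.0, f0_max=260.0, n_harm=6):
    """Pick the F0 candidate whose harmonics carry the most spectral
    magnitude inside a limited search range (bass-line style search)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    candidates = np.arange(f0_min, f0_max, 1.0)
    best_f0, best_score = candidates[0], -1.0
    for f0 in candidates:
        score = 0.0
        for h in range(1, n_harm + 1):
            bin_idx = int(round(f0 * h * len(x) / fs))
            if bin_idx < len(spec):
                score += spec[bin_idx]   # harmonic support at h * f0
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0

# Toy bass line: a 110 Hz harmonic complex with 1/h amplitude roll-off
fs = 8000
t = np.arange(4096) / fs
bass = sum((1.0 / h) * np.sin(2 * np.pi * 110 * h * t) for h in range(1, 6))
f0 = predominant_f0(bass, fs)
```

The real method avoids relying on the (often unreliable) fundamental component itself and adds temporal continuity; this sketch shows only the "most predominant harmonic support" scoring.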

Patent
13 Jun 2000
TL;DR: In this paper, a wireless communications system for transmitting and receiving images and audio information includes a receiver including a display for displaying received images and a speaker for producing audio signals, a processing unit for storing ID code signals for decoding the received image and audio signals and receives an RF signal containing images and audio information.
Abstract: A wireless communications system for transmitting and receiving images and audio information includes a receiver with a display for displaying received images, a speaker for producing audio signals, and a processing unit for storing ID code signals for decoding the received image and audio signals; the receiver receives an RF signal containing images and audio information. The receiver is responsive to the ID code signal for extracting the received image and audio signals of the images and the audio information from the RF signal. An electronic camera has an image sensor for capturing one or more images of a scene and producing image signals, and receives audio signals to produce audio information signals. A second processing unit is provided for encoding and then superimposing the image and audio signals onto an RF carrier, and transmits the RF carrier and superimposed signals. First and second interconnects respectively included in the receiver and the electronic camera are adapted when interconnected to transfer the ID code signal from the camera to the receiver where it is stored, to thereby permit the electronic camera to communicate with the receiver.


Patent
23 Aug 2000
TL;DR: An audio system includes a memory storing audio data and an audio signal processor for processing the audio data as discussed by the authors, where addressing circuitry addresses the memory and a pre-fetch storage area stores data for a current address and for one or more following addresses.
Abstract: An audio system includes a memory storing audio data and an audio signal processor for processing the audio data. Addressing circuitry addresses the memory and a pre-fetch storage area stores data for a current address and for one or more following addresses to hide memory access latency during address changes of the addressing circuitry.

Patent
10 Nov 2000
TL;DR: In this article, a system and method that allows digital audio files, either streaming or stored to be controlled and selected and provides an analog audio signal for broadcast by a radio or amplifier without interfering with the operation of the host PC is presented.
Abstract: A system and method that allows digital audio files, either streaming or stored, to be controlled and selected, and provides an analog audio signal for broadcast by a radio or amplifier without interfering with the operation of the host PC. A remote device is provided which facilitates control of the system.

Proceedings ArticleDOI
Xin Li1, H.H. Yu
30 Jul 2000
TL;DR: The experiment results have shown that the novel audio data hiding scheme in the cepstrum domain can achieve transparent and robust data hiding at the capacity region of above 20 bps.
Abstract: We propose a novel data hiding scheme for audio signals in the cepstrum domain. Cepstrum representation of audio can be shown to be very robust to a wide range of attacks, including the particularly challenging time-scaling and pitch-shifting warps. In the cepstrum domain, we propose to embed data by manipulating the statistical mean of selected cepstrum coefficients. An intuitive psychoacoustic model is employed to control the audibility of introduced distortion. Our experimental results have shown that the novel audio data hiding scheme in the cepstrum domain can achieve transparent and robust data hiding at the capacity region of above 20 bps.
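The core embedding operation, shifting the mean of selected cepstrum coefficients, can be sketched as below. The quefrency band, step size, and threshold detector are illustrative assumptions (the paper uses a psychoacoustic model to control the distortion, which is omitted here):

```python
import numpy as np

def real_cepstrum(x):
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    return np.fft.irfft(np.log(np.abs(np.fft.rfft(x)) + 1e-12))

def embed_bit(x, bit, lo=20, hi=40, delta=0.1):
    """Embed one bit by shifting the mean of real-cepstrum coefficients in
    the quefrency band [lo, hi). Band and step are assumed, not the
    paper's tuned values."""
    n = len(x)
    phase = np.angle(np.fft.rfft(x))
    ceps = real_cepstrum(x)
    shift = delta if bit else -delta
    ceps[lo:hi] += shift
    ceps[n - hi + 1:n - lo + 1] += shift   # keep the cepstrum symmetric
    log_mag = np.fft.rfft(ceps).real       # back to the log spectrum
    return np.fft.irfft(np.exp(log_mag) * np.exp(1j * phase), n=n)

def detect_bit(x, lo=20, hi=40):
    # Blind detection: the sign of the band mean carries the bit.
    return float(np.mean(real_cepstrum(x)[lo:hi]) > 0.0)

rng = np.random.default_rng(2)
host = rng.standard_normal(2048)           # stand-in for an audio frame
b1 = detect_bit(embed_bit(host, 1))
b0 = detect_bit(embed_bit(host, 0))
```

Detection needs no reference signal because the unmarked band mean hovers near zero, which is part of what makes the statistic robust to warping attacks.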

Proceedings ArticleDOI
08 Aug 2000
TL;DR: An algorithm-independent framework for rigorously comparing digital watermarking algorithms with respect to bit rate, perceptual quality, computational complexity, and robustness to signal processing is proposed.
Abstract: We propose an algorithm-independent framework for rigorously comparing digital watermarking algorithms with respect to bit rate, perceptual quality, computational complexity, and robustness to signal processing. The framework is used to evaluate five audio watermarking algorithms from the literature, revealing that frequency domain techniques perform well under the criteria.

Proceedings ArticleDOI
30 Jul 2000
TL;DR: A system to continuously monitor the user's voice and facial motions for recognizing emotional expressions is described, crucial for intelligent computers that take on a social role such as an actor or a companion.
Abstract: Visual and auditory modalities are two of the most commonly used media in interactions between humans. The authors describe a system to continuously monitor the user's voice and facial motions for recognizing emotional expressions. Such an ability is crucial for intelligent computers that take on a social role such as an actor or a companion. We outline methods to extract audio and visual features useful for classifying emotions. Audio and visual information must be handled appropriately in single-modal and bimodal situations. We report audio-only and video-only emotion recognition on the same subjects, in person-dependent and person-independent fashions, and outline methods to handle bimodal recognition.

Patent
Donald W. Moses1, Robert W. Moses1
12 Oct 2000
TL;DR: In this article, a computer-implemented system for providing a digital watermark in an audio signal is presented. But, the system is limited to the use of audio signals.
Abstract: The foregoing problems are solved and a technical advance is achieved by a computer-implemented system for providing a digital watermark in an audio signal. In a preferred embodiment, an audio file (108), such as a .WAV file, representing an audio signal to be watermarked is processed using an algorithm of the present invention herein referred to as the 'PAWS algorithm' (104) to determine and log the location and number of opportunities that exist for inserting a watermark into the audio signal such that it will be masked by the audio signal. The user can adjust (17) certain parameters (112) of the PAWS algorithm (104) before the audio file is processed. A/B/X testing between the original and watermarked files is also supported to allow the user to undo or re-encode the watermark, if desired.

Journal ArticleDOI
TL;DR: This letter unveils an efficient algorithm for sampling rate conversion (SRC) technique from 44.1 kHz compact disc (CD) to 48 kHz digital audio tape (DAT) that requires fewer million instructions per second (MIPS) and memory.
Abstract: This letter unveils an efficient algorithm for sampling rate conversion (SRC) technique from 44.1 kHz compact disc (CD) to 48 kHz digital audio tape (DAT). This method involves upsampling the input signal by two, and then passing the interpolated signal through a fractional delay filter that employs a simple decimation. This method can also be used for SRC from DAT to CD without changing the filter coefficients. The proposed algorithm is simulated in Matlab and can be implemented in a realtime digital signal processor (DSP). Compared with other existing methods, the proposed method has the advantage that it requires fewer million instructions per second (MIPS) and memory.
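The CD-to-DAT ratio reduces to 160/147, and the paper's contribution is realizing it cheaply as upsample-by-2 plus a fractional-delay decimator. As a stand-in for that filter, the sketch below evaluates each 48 kHz output instant by linear interpolation between the two nearest 44.1 kHz samples; it demonstrates the rational timing, not the letter's efficient filter design.

```python
import numpy as np

def resample_44k1_to_48k(x):
    up, down = 160, 147            # 48000 / 44100 in lowest terms
    n_out = (len(x) - 1) * up // down
    out = np.empty(n_out)
    for m in range(n_out):
        pos = m * down / up        # output instant in input-sample units
        i = int(pos)
        frac = pos - i             # fractional delay for this output sample
        out[m] = (1 - frac) * x[i] + frac * x[i + 1]
    return out

fs_in, fs_out = 44100, 48000
t = np.arange(4410) / fs_in        # 100 ms of a 1 kHz tone at CD rate
x = np.sin(2 * np.pi * 1000 * t)
y = resample_44k1_to_48k(x)
```

A production SRC would replace the linear interpolation with a designed polyphase/fractional-delay filter, which is exactly where the letter's MIPS and memory savings come from.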

Patent
07 Sep 2000
TL;DR: In this article, a method for overlapping stored audio elements in a system for providing a customized radio broadcast is proposed, which includes the steps of dividing a first audio element into a plurality of audio element components.
Abstract: A method for overlapping stored audio elements in a system for providing a customized radio broadcast. The method includes the steps of dividing a first audio element into a plurality of audio element components; selecting one of said audio element components; decompressing the selected audio element component; selecting a second audio element; decompressing the second audio element; mixing the decompressed audio element component with the decompressed second audio element to form a mixed audio element component; and compressing the mixed audio element component to form a compressed overlapping audio element component. The compressed overlapping audio element component may replace the selected audio component. The first audio element may be a song, while the second audio element may be a DJ introduction. Accordingly, the compressed overlapping audio element may be broadcast followed by the remaining components of the song audio element.

PatentDOI
TL;DR: In this article, a system and method for locating program boundaries and commercial boundaries using audio categories is described. But the system is not suitable for use in a video signal processor, as it requires the use of an audio classifier controller that determines the rates of change of audio categories.
Abstract: For use in a video signal processor, there is disclosed a system and method for locating program boundaries and commercial boundaries using audio categories. The system comprises an audio classifier controller that obtains information concerning the audio categories of the segments of an audio signal. Audio categories include such categories as silence, music, noise and speech. The audio classifier controller determines the rates of change of the audio categories. The audio classifier controller then compares each rate of change of the audio categories with a threshold value to locate the boundaries of the programs and commercials. The audio classifier controller is also capable of classifying at least one feature of an audio category change rate using a multifeature classifier to locate the boundaries of the programs and commercials.
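The rate-of-change test at the heart of the controller can be sketched simply: given one category label per segment, count label changes in a sliding window and flag positions where the change rate crosses a threshold. Window length and threshold are assumptions; the patent's multifeature classifier is omitted.

```python
def boundary_candidates(labels, window=5, threshold=0.5):
    """Flag positions where the local rate of audio-category change
    exceeds a threshold -- a crude stand-in for the classifier controller.

    labels: one category string per segment (e.g. per second of audio).
    """
    # 1 where consecutive segments differ in category, else 0
    changes = [int(a != b) for a, b in zip(labels, labels[1:])]
    bounds = []
    for i in range(len(changes) - window + 1):
        rate = sum(changes[i:i + window]) / window
        if rate > threshold:
            bounds.append(i)
    return bounds

# Steady program audio, then the rapid category alternation typical of a
# transition into a commercial break, then steady audio again.
labels = (["speech"] * 20
          + ["music", "silence", "music", "noise", "music"]
          + ["speech"] * 10)
cands = boundary_candidates(labels)
```

The intuition matches the patent: program material tends to hold one category for long stretches, while commercial boundaries cluster many category switches into a short span.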