Showing papers on "Audio signal processing published in 2002"


Journal ArticleDOI
TL;DR: The automatic classification of audio signals into a hierarchy of musical genres is explored, and three feature sets representing timbral texture, rhythmic content, and pitch content are proposed.
Abstract: Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members. These characteristics typically are related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently, musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals. In this paper, the automatic classification of audio signals into a hierarchy of musical genres is explored. More specifically, three feature sets for representing timbral texture, rhythmic content, and pitch content are proposed. The performance and relative importance of the proposed features are investigated by training statistical pattern recognition classifiers using real-world audio collections. Both whole-file and real-time frame-based classification schemes are described. Using the proposed feature sets, a classification accuracy of 61% for ten musical genres is achieved. This result is comparable to results reported for human musical genre classification.

2,668 citations
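
As a rough illustration of the whole-file scheme, the sketch below (NumPy/scikit-learn, not the authors' code) computes a few timbral statistics (spectral centroid, rolloff, and zero-crossing rate, standing in for the paper's full timbral-texture, rhythm, and pitch feature sets) and trains a K-nearest-neighbor classifier on labelled excerpts; function names and parameter values are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def timbral_features(x, sr, frame=1024, hop=512):
    """Per-frame spectral centroid, rolloff and zero-crossing rate,
    summarized over the whole file by their means and standard deviations."""
    rows = []
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame]
        mag = np.abs(np.fft.rfft(seg * np.hanning(frame))) + 1e-12
        centroid = np.sum(freqs * mag) / np.sum(mag)
        cum = np.cumsum(mag)
        rolloff = freqs[np.searchsorted(cum, 0.85 * cum[-1])]
        zcr = np.mean(np.abs(np.diff(np.sign(seg)))) / 2.0
        rows.append([centroid, rolloff, zcr])
    rows = np.asarray(rows)
    return np.concatenate([rows.mean(axis=0), rows.std(axis=0)])

def train_genre_classifier(examples, k=5):
    """examples: iterable of (mono_signal, sample_rate, genre_label) tuples."""
    X = np.array([timbral_features(x, sr) for x, sr, _ in examples])
    y = [label for _, _, label in examples]
    return KNeighborsClassifier(n_neighbors=k).fit(X, y)
```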


Patent
25 Jun 2002
TL;DR: In this paper, the authors present systems and methods for building distributed conversational applications using a Web services-based model where speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways.
Abstract: Systems and methods for conversational computing and, in particular, to systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), Wireless, Internet, and VoIP (voice over IP)). Systems and methods are further provided for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

619 citations


Journal ArticleDOI
Lie Lu1, Hong-Jiang Zhang1, Hao Jiang1
TL;DR: A robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence is proposed, and an unsupervised speaker segmentation algorithm using a novel scheme based on quasi-GMM and LSP correlation analysis is developed.
Abstract: We present our study of audio content analysis for classification and segmentation, in which an audio stream is segmented according to audio type or speaker identity. We propose a robust approach that is capable of classifying and segmenting an audio stream into speech, music, environment sound, and silence. Audio classification is processed in two steps, which makes it suitable for different applications. The first step of the classification is speech and nonspeech discrimination. In this step, a novel algorithm based on K-nearest-neighbor (KNN) and linear spectral pairs-vector quantization (LSP-VQ) is developed. The second step further divides the nonspeech class into music, environment sounds, and silence with a rule-based classification scheme. A set of new features such as the noise frame ratio and band periodicity are introduced and discussed in detail. We also develop an unsupervised speaker segmentation algorithm using a novel scheme based on quasi-GMM and LSP correlation analysis. Without a priori knowledge, this algorithm can support open-set speakers, online speaker modeling, and real-time segmentation. Experimental results indicate that the proposed algorithms can produce very satisfactory results.

559 citations
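
A minimal sketch of the second, rule-based step under stated assumptions: a crude band-periodicity measure and a precomputed noise frame ratio split a nonspeech segment into music, environment sound, and silence. The thresholds and the FFT-domain band-limiting are illustrative, not the paper's.

```python
import numpy as np

def band_periodicity(x, sr, band=(500.0, 1000.0), frame=1024, hop=512):
    """Mean of the maximum normalized autocorrelation of a band-limited frame,
    with the band-limiting done crudely by zeroing FFT bins."""
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    peaks = []
    for start in range(0, len(x) - frame, hop):
        spec = np.fft.rfft(x[start:start + frame])
        spec[(freqs < band[0]) | (freqs > band[1])] = 0.0
        b = np.fft.irfft(spec)
        ac = np.correlate(b, b, mode="full")[frame - 1:]
        if ac[0] > 0:
            peaks.append(np.max(ac[1:] / ac[0]))
    return float(np.mean(peaks)) if peaks else 0.0

def classify_nonspeech(x, sr, noise_frame_ratio):
    """Second-step rule-based classification of a nonspeech segment.
    `noise_frame_ratio` is assumed to be computed elsewhere; all thresholds
    are illustrative, not the paper's."""
    if np.mean(x ** 2) < 1e-6:
        return "silence"
    if band_periodicity(x, sr) > 0.6 and noise_frame_ratio < 0.3:
        return "music"
    return "environment sound"
```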


Book
15 May 2002
TL;DR: DAFX - Digital Audio Effects features contributions from Daniel Arfib, Xavier Amatrain, Jordi Bonada, Giovanni de Poli, Pierre Dutilleux, Gianpaolo Evangelista, Florian Keiler, Alex Loscos, Davide Rocchesso, Mark Sandler, Xavier Serra, and Todor Todoroff.
Abstract: From the Publisher: Digital Audio Effects stand for the highest possible sound quality and the finest level of control in the modern world of music and sound. Digital Audio Effects (DAFX) is also the name chosen for the European Research Project COST G6 which investigates the use of digital signal processing, its application to sounds, and its musical use designed to put effects on a sound. The aim of the project and this book is to present the main fields of digital audio effects. It systematically introduces the reader to digital signal processing concepts as well as software implementations using MATLAB. Highly acclaimed contributors analyse the latest findings and developments in filters, delays, modulators, and time-frequency processing of sound. Features include: Chapters on time-domain, non-linear, time-segment, time-frequency, source-filter, spectral, bitstream signal processing; spatial effects, time and frequency warping and control of DAFX. MATLAB implementations throughout the book illustrate essential DSP algorithms for sound processing. Accompanying website with sound examples available The approach of applying digital signal processing to sound will appeal to sound engineers as well as to researchers and engineers in the field of signal processing. DAFX - Digital Audio Effects features contributions from Daniel Arfib, Xavier Amatrain, Jordi Bonada, Giovanni de Poli, Pierre Dutilleux, Gianpaolo Evangelista, Florian Keiler, Alex Loscos, Davide Rocchesso, Mark Sandler, Xavier Serra, and Todor Todoroff.

505 citations


Proceedings ArticleDOI
09 Dec 2002
TL;DR: Different techniques mapping functional parts to blocks of a unified framework for audio fingerprinting are reviewed, with a focus on pattern matching and robust hashing.
Abstract: An audio fingerprint is a content-based compact signature that summarizes an audio recording. Audio fingerprinting technologies have recently attracted attention since they allow the monitoring of audio independently of its format and without the need of meta-data or watermark embedding. The different approaches to fingerprinting are usually described with different rationales and terminology depending on the background: pattern matching, multimedia (music) information retrieval or cryptography (robust hashing). In this paper, we review different techniques, mapping functional parts to blocks of a unified framework.

346 citations
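
The paper reviews fingerprinting approaches rather than prescribing one; as a concrete stand-in, this sketch extracts one common style of binary fingerprint (signs of band-energy differences across adjacent sub-bands and frames) and compares two fingerprints by bit error rate. The band edges, frame sizes, and frequency range are illustrative assumptions.

```python
import numpy as np

def fingerprint(x, sr, frame=2048, hop=1024, n_bands=17):
    """Binary fingerprint: bit (n, m) is 1 when the energy difference between
    adjacent log-spaced sub-bands m and m+1 increases from frame n-1 to n."""
    edges = np.geomspace(300.0, 2000.0, n_bands + 1)      # band edges in Hz
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    energies = []
    for start in range(0, len(x) - frame, hop):
        mag2 = np.abs(np.fft.rfft(x[start:start + frame] * np.hanning(frame))) ** 2
        energies.append([mag2[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    E = np.asarray(energies)
    band_diff = np.diff(E, axis=1)          # difference between adjacent bands
    return np.diff(band_diff, axis=0) > 0   # change of that difference over time

def bit_error_rate(f1, f2):
    """Fraction of differing bits between two equally shaped fingerprints."""
    return np.count_nonzero(f1 != f2) / f1.size
```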


Patent
15 Jul 2002
TL;DR: In this article, a method and system for direct audio capture and identification of the captured audio is presented, where a user may then be offered the opportunity to purchase recordings directly over the Internet or similar outlet.
Abstract: A method and system for direct audio capture and identification of the captured audio. A user may then be offered the opportunity to purchase recordings directly over the Internet or similar outlet. The system preferably includes one or more user-carried portable audio capture devices that employ a microphone, analog to digital converter, signal processor, and memory to store samples of ambient audio or audio features calculated from the audio. Users activate their capture devices when they hear a recording that they would like to identify or purchase. Later, the user may connect the capture device to a personal computer to transfer the audio samples or audio feature samples to an Internet site for identification. The Internet site preferably uses automatic pattern recognition techniques to identify the captured samples from a library of recordings offered for sale. The user can then verify that the sample is from the desired recording and place an order online. The pattern recognition process uses features of the audio itself and does not require the presence of artificial codes or watermarks. Audio to be identified can be from any source, including radio and television broadcasts or recordings that are played locally.

345 citations


PatentDOI
TL;DR: In this paper, the authors propose a speech recognition technique for video and audio signals that consists of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.
Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.

302 citations


Patent
28 Jun 2002
TL;DR: In this paper, the authors describe a digital audio player with a menu-driven user interface for selection, sorting, and playback of stored audio data files, with preset equalization modes for genres such as jazz, pop, and rock.
Abstract: A digital audio player (10) and a method for processing encoded digital audio data, wherein the digital audio data is encoded using one of a plurality of encoding formats. The exemplary audio data player includes a hard disk or other data storage medium (32) for storing data files, a microcontroller (22), buffer memory (25) for anti-skip protection, and an audio decoder (12). The encoded audio data files and associated decoder files are downloaded from a personal computer or similar device to the audio data player hard drive. The player provides a menu-driven user interface (21, 26) for selection, sorting, and playback of stored audio data files. The audio decoder, generally a digital signal processor, provides various preset equalization modes. The preset modes are specific to audio genres such as jazz, pop, and rock. The user may select a specific equalization mode by using the user interface (21, 26), or the preset equalization mode will be automatically set based on the genre, or other attribute information, included in a tag portion of the audio data file.

281 citations


PatentDOI
TL;DR: In this paper, the authors describe a personal communication device comprising a transmitter/receiver coupled to a communication medium for transmitting and receiving audio signals, and control circuitry that controls transmission, reception, and processing of call and audio signals.
Abstract: A mobile phone or other personal communication device includes resources applying measures of an individual's hearing profile, personal choice profile, and induced hearing loss profile, separately or in combination, to build the basis of sound enhancement. A personal communication device thus comprises a transmitter/receiver coupled to a communication medium for transmitting and receiving audio signals, control circuitry that controls transmission, reception, and processing of call and audio signals, a speaker, and a microphone. The control circuitry includes logic applying one or more of a hearing profile of the user, a user preference related to hearing, and environmental noise factors in processing the audio signals. The control circuitry may include instruction memory and an instruction execution processor such as a digital signal processor.

212 citations


Patent
29 Apr 2002
TL;DR: A sound processing apparatus for creating virtual sound sources in a three dimensional space includes a number of modules as mentioned in this paper, including an aural exciter module, panning module, distance control module, delay module, occlusion and air absorption module, Doppler module for pitch shifting, a location processor module, and an output.
Abstract: A sound processing apparatus for creating virtual sound sources in a three dimensional space includes a number of modules. These include an aural exciter module; an automated panning module; a distance control module; a delay module; an occlusion and air absorption module; a Doppler module for pitch shifting; a location processor module; and an output.

188 citations


Patent
07 Aug 2002
TL;DR: A directional signal processing system for beamforming information signals is presented, which includes an oversampled filterbank having an analysis filterbank for transforming the information signals in the time domain into channel signals in the transform domain, a synthesis filterbank, and a signal processor.
Abstract: A directional signal processing system for beamforming information signals. The system includes an oversampled filterbank, which has an analysis filterbank for transforming the information signals in the time domain into channel signals in the transform domain, a synthesis filterbank, and a signal processor. The signal processor processes the outputs of the analysis filterbank for beamforming the information signals. The synthesis filterbank transforms the outputs of the signal processor to a single information signal in the time domain.
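
A minimal sketch of the idea, with a plain STFT and overlap-add standing in for the patent's oversampled analysis/synthesis filterbank and delay-and-sum standing in for the signal processor's beamforming; the per-channel steering delays are assumed to be known.

```python
import numpy as np

def delay_and_sum_stft(mics, sr, delays, frame=512, hop=128):
    """mics: array of shape (n_mics, n_samples); delays: per-channel steering
    delays in seconds.  Each channel is transformed to the time-frequency
    domain, delayed by a phase ramp so the wavefronts line up, summed, and
    resynthesized by overlap-add."""
    n_mics, n = mics.shape
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    out = np.zeros(n)
    norm = np.zeros(n)
    for start in range(0, n - frame, hop):
        acc = np.zeros(frame // 2 + 1, dtype=complex)
        for m in range(n_mics):
            spec = np.fft.rfft(mics[m, start:start + frame] * window)
            acc += spec * np.exp(-2j * np.pi * freqs * delays[m])
        y = np.fft.irfft(acc / n_mics)
        out[start:start + frame] += y * window
        norm[start:start + frame] += window ** 2
    return out / np.maximum(norm, 1e-8)
```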

PatentDOI
TL;DR: In this paper, an audio signal is analyzed using multiple psychoacoustic criteria to identify a region of the signal in which time scaling and/or pitch shifting processing would be inaudible or minimally audible.
Abstract: In one alternative, an audio signal is analyzed using multiple psychoacoustic criteria to identify a region of the signal in which time scaling and/or pitch shifting processing would be inaudible or minimally audible, and the signal is time scaled and/or pitch shifted within that region. In another alternative, the signal is divided into auditory events, and the signal is time scaled and/or pitch shifted within an auditory event. In a further alternative, the signal is divided into auditory events, and the auditory events are analyzed using a psychoacoustic criterion to identify those auditory events in which the time scaling and/or pitch shifting processing of the signal would be inaudible or minimally audible. Further alternatives provide for multiple channels of audio.

Patent
13 Jun 2002
TL;DR: In this paper, a portable mixing recorder which enables a user to readily produce music using overdubbing and/or other recording techniques while suppressing degradation of sound quality to the minimum without excessive concern for space restriction is provided.
Abstract: There is provided a portable mixing recorder which enables a user to readily produce music using overdubbing and/or other recording techniques while suppressing degradation of sound quality to the minimum without excessive concern for space restriction. An input analog audio signal is converted to a digital audio signal by an A/D converter section. A decoder reads out a compressed audio signal from an original source file stored in a memory card, and then decompresses the compressed audio signal into a digital audio signal. A mixing section mixes the digital audio signal obtained by the A/D conversion by the A/D converter section and the digital audio signal obtained by the decompression by the decoder. An encoder compresses the digital audio signal obtained by the mixing by the mixing section to a compressed audio signal (mixed file). The mixed file obtained by the compression by the encoder is stored as a new source file in the memory card.

Patent
Walter Etter1
07 Jun 2002
TL;DR: In this article, sound signals are generated by mixing weighted time- and frequency-domain processed signals, the former generally representing speech-based signals and the latter music-based signals.
Abstract: Time-scaled sound signals (i.e., sounds output at differing speeds) are generated by mixing weighted time- and frequency-domain processed signals, the former generally representing speech-based signals and the latter music-based signals. The weights applied to each type of signal may be determined by a scaling factor, which in turn is related to the speed at which a listener desires to hear a sound signal. In one example of the invention, only stationary signal portions of an input sound signal are used to generate time-scaled processed signals. An adaptive frame size may also be used to pre-process the separate signals prior to being weighted, which at least decreases the amount of unwanted reverberative sound qualities in a resulting sound signal. Together, the techniques envisioned by the present invention produce improved, speed-adjusted sound signals.

Patent
28 Feb 2002
TL;DR: In this paper, transition points are detected in the audio and video signals and used to align them in time; the edited video signal is then merged with the audio signal to form a music video.
Abstract: Music videos are automatically produced from source audio and video signals. The music video contains edited portions of the video signal synchronized with the audio signal. An embodiment detects transition points in the audio signal and the video signal. The transition points are used to align in time the video and audio signals. The video signal is edited according to its alignment with the audio signal. The resulting edited video signal is merged with the audio signal to form a music video.

PatentDOI
Philip R. Wiser1, LeeAnn Heringer1, Gerry Kearby1, Leon Rishniw1, Jason S. Brownell1 
TL;DR: In this paper, audio processing profiles are organized according to specific delivery bandwidths such that a sound engineer can quickly and efficiently encode audio signals for each of a number of distinct delivery media.
Abstract: Essentially all of the processing parameters which control processing of a source audio signal to produce an encoded audio signal are stored in an audio processing profile. Multiple audio processing profiles are stored in a processing profile database such that specific combinations of processing parameters can be retrieved and used at a later time. Audio processing profiles are organized according to specific delivery bandwidths such that a sound engineer can quickly and efficiently encode audio signals for each of a number of distinct delivery media. Synchronized A/B switching during playback of various encoded audio signals allows the sound engineer to detect nuances in the sound characteristics of the various encoded audio signals.
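
A loose sketch of a processing-profile store keyed by delivery bandwidth; every field name and value is hypothetical, since the patent does not enumerate the parameters.

```python
from dataclasses import dataclass, asdict

@dataclass
class AudioProcessingProfile:
    # Field names and values are illustrative; the patent does not enumerate them.
    name: str
    delivery_kbps: int          # target delivery bandwidth
    sample_rate: int
    channels: int
    codec: str
    eq_preset: str
    target_loudness_db: float

# A small in-memory "processing profile database" keyed by delivery bandwidth.
PROFILE_DB = {
    p.delivery_kbps: p
    for p in (
        AudioProcessingProfile("dialup", 32, 22050, 1, "mp3", "speech", -20.0),
        AudioProcessingProfile("broadband", 128, 44100, 2, "mp3", "flat", -16.0),
    )
}

def parameters_for_bandwidth(kbps):
    """Retrieve the stored parameter set for a delivery bandwidth; the caller
    would hand these to the actual encoder (not shown)."""
    return asdict(PROFILE_DB[kbps])
```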

Patent
26 Jun 2002
TL;DR: A large capacity, user-defined audio content delivery system is presented that delivers uninterrupted music and information content (e.g., news) by evaluating and encoding an input audio stream while outputting another stream.
Abstract: The present invention provides a large capacity, user defined audio content delivery system. The system delivers uninterrupted music and information content (e.g., news) by evaluating and encoding an input audio stream while outputting another stream. Undesirable audio content (e.g., advertisements and unwanted news) is not present in the output audio stream, as only desired portions of information content are stored for playback on demand. The invention also includes a user interface that is simple enough to facilitate utilization of the audio system in an automobile and employs standard hardware available in typical computing and/or personal digital assistant equipment. Additionally, the audio system can be portable (e.g., as portable as a personal digital assistant) and can be updated in real time or off line via a personal computer.

Patent
23 Apr 2002
TL;DR: In this article, the authors propose a method for synchronizing the playback of a digital audio broadcast on a plurality of network output devices by inserting an audio waveform sample in an audio stream of the audio broadcast, such that the time between the first and second unique signals must be significantly greater than the latency between sending and receiving devices.
Abstract: A method is provided for synchronizing the playback of a digital audio broadcast on a plurality of network output devices by inserting an audio waveform sample in an audio stream of the digital audio broadcast. The method includes the steps of outputting a first unique signal as part of an audio signal which has unique identifying characteristics and is regularly occurring, outputting a second unique signal such that the time between the first and second unique signals is significantly greater than the latency between sending and receiving devices, and coordinating play of audio using the audio waveform sample, assuring the simultaneous output of the audio signal from multiple devices. An algorithm in hardware, software, or a combination of the two identifies the audio waveform sample in the audio stream. The digital audio broadcast from multiple receivers does not present to a listener any audible delay or echo effect.

Patent
25 Feb 2002
TL;DR: A method for time aligning audio signals, wherein one signal has been derived from the other or both have been derived from another signal, comprises deriving reduced-information characterizations of the audio signals using auditory scene analysis.
Abstract: A method for time aligning audio signals, wherein one signal has been derived from the other or both have been derived from another signal, comprises deriving reduced-information characterizations of the audio signals using auditory scene analysis. The time offset of one characterization with respect to the other characterization is calculated, and the temporal relationship of the audio signals with respect to each other is modified in response to the time offset such that the audio signals are coincident with each other. These principles may also be applied to a method for time aligning a video signal and an audio signal that will be subjected to differential time offsets.
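
A minimal sketch of the offset estimation, with a coarse frame-energy envelope standing in for the auditory-scene-analysis characterization the patent derives; the cross-correlation peak of the two characterizations gives the shift that makes the signals coincident.

```python
import numpy as np

def envelope(x, frame=1024):
    """Reduced-information characterization: coarse frame-energy envelope
    (a stand-in for the auditory-scene-analysis characterization)."""
    n = len(x) // frame
    return np.array([np.sum(x[i * frame:(i + 1) * frame] ** 2) for i in range(n)])

def alignment_shift(a, b, frame=1024):
    """Shift in samples to apply to b (e.g. with np.roll) so it lines up with a,
    estimated from the cross-correlation peak of the two envelopes."""
    ea, eb = envelope(a, frame), envelope(b, frame)
    ea = (ea - ea.mean()) / (ea.std() + 1e-12)
    eb = (eb - eb.mean()) / (eb.std() + 1e-12)
    lag = np.argmax(np.correlate(ea, eb, mode="full")) - (len(eb) - 1)
    return lag * frame

# b_aligned = np.roll(b, alignment_shift(a, b))   # frame-resolution alignment
```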

Journal ArticleDOI
TL;DR: A novel audio watermarking algorithm to protect against unauthorized copying of digital audio is presented; it includes a psychoacoustic model of MPEG audio coding and introduces no audible distortion after watermark insertion.
Abstract: Digital watermark technology is now drawing attention as a new method of protecting digital content from unauthorized copying. This paper presents a novel audio watermarking algorithm to protect against unauthorized copying of digital audio. The proposed watermarking scheme includes a psychoacoustic model of MPEG audio coding to ensure that the watermarking does not affect the quality of the original sound. After embedding the watermark, our scheme extracts copyright information without access to the original signal by using a whitening procedure for linear prediction filtering before correlation. Experimental results show that our watermarking scheme is robust against common signal processing attacks and it introduces no audible distortion after watermark insertion.
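
A sketch of the extraction side only, assuming an additive spread-spectrum watermark with a known pseudo-random sequence: the received signal is whitened by its linear-prediction inverse filter and then correlated with the sequence. The LPC order and the omission of the psychoacoustic embedding stage are simplifications, not the paper's settings.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_coefficients(x, order=10):
    """Linear-prediction coefficients by the autocorrelation (Yule-Walker)
    method; the predictor is x[n] ~ sum_k a[k] * x[n-1-k]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])

def watermark_score(x, pn_sequence, order=10):
    """Whiten the received signal with its LPC inverse (prediction error) filter,
    then correlate with the known pseudo-random sequence.  The decision
    threshold is application dependent and not shown."""
    a = lpc_coefficients(x, order)
    residual = lfilter(np.concatenate(([1.0], -a)), [1.0], x)
    n = min(len(residual), len(pn_sequence))
    return float(np.dot(residual[:n], pn_sequence[:n]) / n)
```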

Patent
John Barile1
16 Jan 2002
TL;DR: In this article, a communication terminal (10) for video conferencing with remote participants, including a receiver receiving audio and video signals from a plurality of the remote participants and a display (30), is presented.
Abstract: A communication terminal (10) for video conferencing with remote participants, including a receiver receiving audio and video signals from a plurality of the remote participants, and a display (30). In one form, a comparator compares the audio signals and a controller controls the display (30) to display the video images extracted from the video signals based on the comparison of the received audio signals. In another form, the display has a height greater than its width and operates in a portrait mode in a default condition, and a controller controls the display to display the extracted video images in a landscape mode when the receiver receives the video signals from a plurality of the remote participants. In yet another form, a processor associates the received audio signals with the video signal received from the same remote participant, with the display displaying one of the video images on the right and another video image on the left, where an audio output sends the audio signal associated with the one video signal to a right speaker and sends the audio signal associated with the other video signal to a left speaker.

Patent
09 Oct 2002
TL;DR: In this paper, the authors present a method and apparatus for identifying broadcast digital audio signals, wherein the digital audio signal is provided to a processing structure configured to identify a program-identifying code in the received digital audio signal, identify a program-identifying code in a decompressed received digital audio signal, identify a feature signature in the received digital audio signal, and identify a feature signature in the decompressed received digital audio signal.
Abstract: Method and apparatus for identifying broadcast digital audio signals include structure and/or function whereby the digital audio signal is provided to processing structure which is configured to (i) identify a program-identifying code in the received digital audio signal, (ii) identify a program-identifying code in a decompressed received digital audio signal, (iii) identify a feature signature in the received digital audio signal, and (iv) identify a feature signature in the decompressed received digital audio signal. Preferably, such processing structure is disposed in a dwelling or a monitoring site in an audience measurement system, such as the Nielsen TV ratings system.

Patent
12 Jun 2002
TL;DR: In this paper, a method for processing and transducing audio signals is proposed, which divides the first audio signal into a first spectral band signal and a second spectral band signal, and then scales the first spectral band signal by a scaling factor proportional to the amplitude of the second audio signal.
Abstract: A method for processing and transducing audio signals. An audio system has a first audio signal and a second audio signal that have amplitudes. A method for processing the audio signals includes dividing the first audio signal into a first spectral band signal and a second spectral band signal; scaling the first spectral band signal by a first scaling factor proportional to the amplitude of the second audio signal; and scaling the first spectral band signal by a second scaling factor to create a second signal portion. Other portions of the disclosure include application of the signal processing method to multichannel audio systems, and to audio systems having different combinations of directional loudspeakers, full range loudspeakers, and limited range loudspeakers.
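
A minimal sketch of the band-split-and-scale idea: the first signal is divided into two spectral bands with an FFT-domain mask, and its low band is scaled in proportion to the amplitude of the second signal. The crossover frequency and gain constant are illustrative assumptions.

```python
import numpy as np

def split_bands(x, sr, crossover_hz=200.0):
    """Split x into low and high spectral band signals with an FFT-domain mask
    (a crude stand-in for the filters a real implementation would use)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    low = np.fft.irfft(np.where(freqs < crossover_hz, spec, 0), n=len(x))
    high = np.fft.irfft(np.where(freqs >= crossover_hz, spec, 0), n=len(x))
    return low, high

def process(first, second, sr, gain=0.5):
    """Scale the first signal's low band in proportion to the RMS amplitude of
    the second signal; `gain` and the crossover are illustrative choices."""
    low, high = split_bands(first, sr)
    amplitude = np.sqrt(np.mean(second ** 2))
    return gain * amplitude * low + high
```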

Journal ArticleDOI
TL;DR: A new approach to signal models for audio signal encoding is discussed, based upon hybrid models that simultaneously feature transient, tonal, and stochastic components of the signal.

Patent
13 Sep 2002
TL;DR: In this paper, a system and method for automatically combining image and audio data to create a multimedia presentation is presented, where audio and image data are received by the system and the audio data includes a list of events that correspond to points of interest in an audio file or audio stream.
Abstract: The present invention provides a system and method for automatically combining image and audio data to create a multimedia presentation. In one embodiment, audio and image data are received by the system. The audio data includes a list of events that correspond to points of interest in an audio file. The audio data may also include an audio file or audio stream. The received images are then matched to the audio file or stream using the event times. In one embodiment, the events represent times within the audio file or stream at which there is a certain feature or characteristic in the audio file. The audio events list may be processed to remove, sort, predict, or otherwise generate audio events. Image processing may also occur, and may include image analysis to determine image matching to the event list, deleting images, and processing images to incorporate effects. Image effects may include cropping, panning, zooming, and other visual effects.
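
A small sketch of the time-based matching step, assuming the images carry capture times and the audio event list is a plain sequence of timestamps; each image is assigned to its nearest event.

```python
import numpy as np

def assign_images_to_events(image_times, event_times):
    """Match each image (by capture time, in seconds) to the closest audio
    event time; returns (image_index, event_index) pairs."""
    events = np.asarray(event_times)
    return [(i, int(np.argmin(np.abs(events - t)))) for i, t in enumerate(image_times)]

# assign_images_to_events([2.0, 9.0, 15.0], [0.0, 4.0, 8.0, 12.0, 16.0])
# -> [(0, 0), (1, 2), (2, 4)]
```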

Patent
23 Apr 2002
TL;DR: In this paper, the authors propose a method for synchronizing the playback of a digital audio broadcast on a plurality of network output devices by inserting a control track pulse in an audio stream of the audio broadcast.
Abstract: A method is provided for synchronizing the playback of a digital audio broadcast on a plurality of network output devices by inserting a control track pulse in an audio stream of the digital audio broadcast. The method includes the steps of outputting a first control track pulse as part of an audio signal which has unique identifying characteristics and is regularly occurring, outputting a second control track pulse such that the time between the first and second control track pulses is significantly greater than the latency between sending and receiving devices, and coordinating play of audio at the time of transmission of the second control track pulse, assuring the simultaneous output of the audio signal from multiple devices. The control track pulses have a value unique from any other portion of the audio stream. The digital audio broadcast from multiple receivers does not present to a listener any audible delay or echo effect.

Patent
07 May 2002
TL;DR: A system is presented for generating at least one remote virtual speaker location, in connection with at least a partial reflective environment (12, 12, 14 or 15) and in combination with an audio speaker, for creating multiple sound effects including a virtual sound source from the reflective environment which is perceived by a listener (53) as an original sound source: a primary direct audio output is generated by emitting audio compression waves toward the listener, and a secondary indirect audio output is generated from at least one virtual speaker (24, 25 or 26) remote from the audio speakers by emitting ultrasonic sound from at least one parametric speaker (20, 21 or 22) oriented toward the reflective environment.
Abstract: A system for generating at least one remote virtual speaker location in connection with at least a partial reflective environment (12, 12, 14 or 15) in combination with an audio speaker for creating multiple sound effects including a virtual sound source from the reflective environment which is perceived by a listener (53) as an original sound source, by generating a primary direct audio output by emitting audio compression waves toward a listener, and generating a secondary indirect audio output from at least one virtual speaker (24, 25 or 26) remote from the audio speakers, by emitting ultrasonic sound from at least one parametric speaker (20, 21 or 22) associated with the audio speakers and oriented toward at least one reflective environment which is remote from the audio speakers.

Patent
26 Feb 2002
TL;DR: In this paper, an audio signal is divided into auditory events, each of which tends to be perceived as separate and distinct, by calculating the spectral content of successive time blocks of the audio signal.
Abstract: In one aspect, the invention divides an audio signal into auditory events, each of which tends to be perceived as separate and distinct, by calculating the spectral content of successive time blocks of the audio signal, calculating the difference in spectral content between successive time blocks of the audio signal, and identifying an auditory event boundary as the boundary between successive time blocks when the difference in the spectral content between such successive time blocks exceeds a threshold. In another aspect, the invention generates a reduced-information representation of an audio signal by dividing an audio signal into auditory events, each of which tends to be perceived as separate and distinct, and formatting and storing information relating to the auditory events. Optionally, the invention may also assign a characteristic to one or more of the auditory events. Auditory events may be determined according to the first aspect of the invention or by another method.
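
A minimal sketch of the first aspect: successive blocks are compared by their level-normalized magnitude spectra, and an event boundary is declared where the spectral difference exceeds a threshold. Windowing details, the characteristic assignment, and the threshold value are simplified or assumed.

```python
import numpy as np

def auditory_event_boundaries(x, block=512, threshold=0.25):
    """Sample indices at which an auditory event boundary is detected: each
    block's level-normalized magnitude spectrum is compared with the previous
    block's, and a boundary is declared when the summed absolute difference
    exceeds the threshold (threshold and block size are illustrative)."""
    boundaries = []
    prev = None
    for start in range(0, len(x) - block, block):
        mag = np.abs(np.fft.rfft(x[start:start + block] * np.hanning(block)))
        norm = mag / (np.sum(mag) + 1e-12)     # normalize out overall level
        if prev is not None and np.sum(np.abs(norm - prev)) > threshold:
            boundaries.append(start)
        prev = norm
    return boundaries
```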

PatentDOI
TL;DR: In this article, a controller is operatively linked to an ultrasonic audio processor, which converts the audio signal received from the controller into an ultrasonic signal that is emitted along a column of air in front of the gaming apparatus and demodulated into audible sound substantially confined within that column.
Abstract: A casino gaming apparatus is disclosed which includes a controller programmed to generate an audio signal. The controller is operatively linked to an ultrasonic audio processor. The ultrasonic audio processor is programmed to convert the audio signal received from the controller into an ultrasonic signal. The ultrasonic audio processor is operatively linked to an ultrasonic emitter which emits the ultrasonic signal along a column of air in front of the gaming apparatus. The ultrasonic signal is demodulated into audible sounds along the column by interaction of the ultrasonic signal with air to produce audible sound substantially confined within the column. The column intersects the position where the player stands or sits. Accordingly, sounds generated by the gaming apparatus are confined to an area occupied by the player and provide little or no distraction for players using adjacent gaming apparatuses.

Patent
11 Jul 2002
TL;DR: In this article, the authors propose methods to control the extent of reverberation by dynamically adjusting all-pass filter coefficients with inter-channel coherence cues, to segment a signal in the time domain finely in the lower frequency region and coarsely in the higher frequency region, and to control the crossover frequency used for mixing based on the bit rate.
Abstract: In conventional inventions for coding multi-channel audio signals, three of the major processes involved are: generation of a reverberation signal using an all-pass filter; segmentation of a signal in the time and frequency domains for the purpose of level adjustment; and mixing of a coded binaural signal with an original signal coded up to a fixed crossover frequency. These processes pose the problems addressed by the present invention. The present invention proposes the following three embodiments: to control the extent of reverberation by dynamically adjusting all-pass filter coefficients with the inter-channel coherence cues; to segment a signal in the time domain finely in the lower frequency region and coarsely in the higher frequency region; and to control a crossover frequency used for mixing based on a bit rate and, if the original signal is coarsely quantized, to mix a downmix signal with an original signal in proportions determined by an inter-channel coherence cue.
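
A sketch of the first embodiment's idea, using a single Schroeder all-pass section whose feedback coefficient shrinks as the inter-channel coherence cue grows; the coherence-to-coefficient mapping is an assumption, not the one defined in the patent.

```python
import numpy as np
from scipy.signal import lfilter

def allpass_reverb(x, coherence, delay=223, max_gain=0.7):
    """Schroeder all-pass section H(z) = (-g + z**-D) / (1 - g*z**-D), with the
    feedback gain g reduced as inter-channel coherence increases.  The mapping
    g = max_gain * (1 - coherence) is an illustrative choice."""
    g = max_gain * (1.0 - np.clip(coherence, 0.0, 1.0))
    b = np.zeros(delay + 1)
    a = np.zeros(delay + 1)
    b[0], b[delay] = -g, 1.0
    a[0], a[delay] = 1.0, -g
    return lfilter(b, a, x)
```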