Showing papers on "Audio signal processing" published in 2013


Patent
07 May 2013
TL;DR: In this paper, a scanning code symbol reading system includes an analog scan data signal processor for producing digitized data signals: during each laser beam scanning cycle, a light collection and photo-detection module generates an analog scan data signal corresponding to a laser-scanned code symbol, an analog scan data signal processor/digitizer converts that signal into digital data signals, and a synchronized digital gain control module automatically processes the digitized data signals in response to start-of-scan (SOS) signals generated by a SOS detector.
Abstract: A scanning code symbol reading system includes an analog scan data signal processor for producing digitized data signals, wherein during each laser beam scanning cycle, a light collection and photo-detection module generates an analog scan data signal corresponding to a laser scanned code symbol, an analog scan data signal processor/digitizer processes the analog scan data signal to generate digital data signals corresponding thereto, and a synchronized digital gain control module automatically processes the digitized data signals in response to start of scan (SOS) signals generated by a SOS detector. The synchronized digital gain control module generates digital control data which is transmitted to the analog scan data signal processor for use in controlling the gain of a signal processing stage in the light collection and photo-detection module and/or analog scan data signal processor, during the corresponding laser beam scanning cycle.

329 citations


Patent
15 Mar 2013
TL;DR: In this article, a communication component modifies production of an audio waveform at determined modification segments to mitigate the effects of a delay in processing and/or receiving a subsequent audio waveform.
Abstract: A communication component modifies production of an audio waveform at determined modification segments to thereby mitigate the effects of a delay in processing and/or receiving a subsequent audio waveform. The audio waveform and/or data associated with the audio waveform are analyzed to identify the modification segments based on characteristics of the audio waveform and/or data associated therewith. The modification segments show where the production of the audio waveform may be modified without substantially affecting the clarity of the sound or audio. In one embodiment, the invention modifies the sound production at the identified modification segments to extend production time and thereby mitigate the effects of delay in receiving and/or processing a subsequent audio waveform for production.

302 citations


Patent
15 Mar 2013
TL;DR: In this article, a power delivery method and system for powering a headset are presented, where a power signal is combined with an audio signal to form a composite signal that is communicated over a shared channel to the headset.
Abstract: A power delivery method and system for powering a headset. A power signal is combined with an audio signal to form a composite signal that is communicated over a shared channel to the headset. The power signal is generated by modulating a carrier signal with a modulation signal. The modulation signal is derived from the amplitude of the audio signal so that the peak levels of the composite signal do not exceed the maximum allowable output of an audio I/O circuit driving the headset.
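As a rough illustration of the headroom idea (a hedged sketch, not the patent's disclosed circuit): the carrier's modulation envelope below is derived from the short-term peak amplitude of the audio so that, by construction, the composite signal never exceeds the maximum allowable output. The carrier frequency, window length, and function names are all hypothetical.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d

def composite_signal(audio, fs, carrier_hz=50_000.0, max_level=1.0):
    """Combine audio with an amplitude-modulated power carrier so that
    composite peaks stay within max_level (illustrative parameters)."""
    t = np.arange(len(audio)) / fs
    # Short-term peak envelope of the audio (sliding maximum over ~5 ms).
    win = max(1, int(0.005 * fs))
    envelope = maximum_filter1d(np.abs(audio), size=win)
    # Headroom left for the carrier at each instant.
    headroom = np.clip(max_level - envelope, 0.0, None)
    carrier = np.sin(2 * np.pi * carrier_hz * t)
    return audio + headroom * carrier

# Example: 10 ms of a 1 kHz tone at a rate high enough to carry 50 kHz.
fs = 192_000
tone = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(fs // 100) / fs)
mix = composite_signal(tone, fs)
print(np.max(np.abs(mix)))  # <= 1.0 by construction
```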

297 citations


Journal ArticleDOI
TL;DR: Starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, the worth of individual features across these three domains is assessed on four audio databases with observer annotations in the arousal and valence dimensions, revealing a high degree of cross-domain consistency in encoding the two main dimensions of affect.
Abstract: Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning each of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of 'the sound that something makes', in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step towards a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to .78 for arousal (training on sound and testing on enacted speech) and .60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.

223 citations


Proceedings ArticleDOI
01 Jan 2013
TL;DR: An overview of systems submitted to the public evaluation challenge on acoustic scene classification and detection of sound events within a scene as well as a detailed evaluation of the results achieved by those systems are provided.
Abstract: This paper describes a newly-launched public evaluation challenge on acoustic scene classification and detection of sound events within a scene. Systems dealing with such tasks are far from exhibiting human-like performance and robustness. Undermining factors are numerous: the extreme variability of sources of interest possibly interfering, the presence of complex background noise as well as room effects like reverberation. The proposed challenge is an attempt to help the research community move forward in defining and studying the aforementioned tasks. Apart from the challenge description, this paper provides an overview of systems submitted to the challenge as well as a detailed evaluation of the results achieved by those systems.

186 citations


Journal ArticleDOI
TL;DR: This paper presents a meta-modelling architecture suitable for high-performance digital signal processing (DSP) in microwave and millimeter-wave radio systems with real-time requirements.
Abstract: Today's exploding demand for faster, more reliable, and ubiquitous radio systems in communication, instrumentation, radar, and sensors poses unprecedented challenges in microwave and millimeter-wave engineering. Recently, the predominant trend has been to place an increasing emphasis on digital signal processing (DSP). However, while offering device compactness and processing flexibility, DSP suffers fundamental drawbacks, such as high-cost analog-to-digital conversion, high power consumption, and poor performance at high frequencies.

176 citations


Patent
15 Mar 2013
TL;DR: In this article, perceptual and robustness evaluation is integrated into audio watermark embedding to optimize audio quality relative to the original signal and to optimize robustness or data capacity; these methods are applied to audio segments in audio embedder and detector configurations to support real-time operation.
Abstract: Audio signal processing enhances audio watermark embedding and detecting processes. Audio signal processes include audio classification and adapting watermark embedding and detecting based on classification. Advances in audio watermark design include adaptive watermark signal structure data protocols, perceptual models, and insertion methods. Perceptual and robustness evaluation is integrated into audio watermark embedding to optimize audio quality relative to the original signal, and to optimize robustness or data capacity. These methods are applied to audio segments in audio embedder and detector configurations to support real-time operation. Feature extraction and matching are also used to adapt audio watermark embedding and detecting.

174 citations


Patent
28 Aug 2013
TL;DR: In this article, the authors describe a system for rendering object-based audio content through individually addressable drivers, including at least one driver configured to project sound waves toward one or more surfaces within a listening environment for reflection to a listening area within that environment.
Abstract: Embodiments are described for rendering object-based audio content through a system that includes individually addressable drivers, including at least one driver that is configured to project sound waves toward one or more surfaces within a listening environment for reflection to a listening area within the listening environment; a renderer configured to receive and process audio streams and one or more metadata sets associated with each of the audio streams and specifying a playback location of a respective audio stream; and a playback system coupled to the renderer and configured to render the audio streams to a plurality of audio feeds corresponding to the array of audio drivers in accordance with the one or more metadata sets.

161 citations


Patent
22 Feb 2013
TL;DR: In this article, an Angle and Distance Processing (ADP) module is employed on a mobile device and configured to provide runtime angle and distance information to an adaptive beamformer for canceling noise signals; it also provides a means for building a table of filter coefficients for adaptive filters used in echo cancellation, enables faster and more accurate Automatic Gain Control (AGC), provides delay information for a classifier in a Voice Activity Detector (VAD), and assists in separating echo path changes from double talk.
Abstract: The disclosed system and method for a mobile device combines information derived from onboard sensors with conventional signal processing information derived from a speech or audio signal to assist in noise and echo cancellation. In some implementations, an Angle and Distance Processing (ADP) module is employed on a mobile device and configured to provide runtime angle and distance information to an adaptive beamformer for canceling noise signals, provides a means for building a table of filter coefficients for adaptive filters used in echo cancellation, provides faster and more accurate Automatic Gain Control (AGC), provides delay information for a classifier in a Voice Activity Detector (VAD), provides a means for automatic switching between the speakerphone and handset modes of the mobile device or between the primary and reference microphones, and assists in separating echo path changes from double talk.

122 citations


Patent
Pei Xiang1, Dipanjan Sen1
18 Jul 2013
TL;DR: In this paper, techniques are described for grouping audio objects into clusters, where the cluster analysis module is configured to receive information from at least one of a transmission channel, a decoder, and a renderer, and wherein a maximum value for L is based on the information received.
Abstract: In general, techniques are described for grouping audio objects into clusters. In some examples, a device for audio signal processing comprises a cluster analysis module configured to group, based on spatial information for each of N audio objects, a plurality of audio objects that includes the N audio objects into L clusters, where L is less than N, wherein the cluster analysis module is configured to receive information from at least one of a transmission channel, a decoder, and a renderer, and wherein a maximum value for L is based on the information received. The device also comprises a downmix module configured to mix the plurality of audio objects into L audio streams, and a metadata downmix module configured to produce, based on the spatial information and the grouping, metadata that indicates spatial information for each of the L audio streams.
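A minimal sketch of the grouping-and-downmix idea, assuming k-means over object positions as the cluster analysis and a plain sum as the downmix (the patent does not mandate either choice; all names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_downmix(objects, positions, L):
    """Group N audio objects into L clusters by spatial position and
    downmix each cluster into one audio stream.

    objects:   (N, T) per-object audio samples
    positions: (N, 3) xyz positions (the 'spatial information')
    Returns (L, T) streams and (L, 3) centroids standing in for the
    per-stream spatial metadata.
    """
    km = KMeans(n_clusters=L, n_init=10, random_state=0).fit(positions)
    streams = np.zeros((L, objects.shape[1]))
    for i, label in enumerate(km.labels_):
        streams[label] += objects[i]         # simple sum downmix
    return streams, km.cluster_centers_      # metadata: one position per stream

# Example: 8 random objects grouped into L = 3 streams.
rng = np.random.default_rng(0)
streams, meta = cluster_and_downmix(rng.normal(size=(8, 480)),
                                    rng.uniform(-1, 1, size=(8, 3)), L=3)
print(streams.shape, meta.shape)  # (3, 480) (3, 3)
```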

120 citations


Journal ArticleDOI
TL;DR: This work derives two conjugate gradient algorithms for the computation of the filter coefficients and shows improved audio source separation performance compared to the classical Wiener filter both in oracle and in blind conditions.
Abstract: Wiener filtering is one of the most ubiquitous tools in signal processing, in particular for signal denoising and source separation. In the context of audio, it is typically applied in the time-frequency domain by means of the short-time Fourier transform (STFT). Such processing generally does not take into account the relationship between STFT coefficients in different time-frequency bins due to the redundancy of the STFT, which we refer to as consistency. We propose to enforce this relationship in the design of the Wiener filter, either as a hard constraint or as a soft penalty. We derive two conjugate gradient algorithms for the computation of the filter coefficients and show improved audio source separation performance compared to the classical Wiener filter both in oracle and in blind conditions.
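A toy rendition of the consistency idea, replacing the paper's conjugate-gradient solvers with a simpler alternation between the classical Wiener estimate and a projection onto consistent spectrograms (ISTFT followed by STFT); `lam` loosely plays the role of the soft-penalty weight:

```python
import numpy as np
from scipy.signal import stft, istft

def consistent_wiener(mix, ps_target, ps_noise, fs, nperseg=1024,
                      n_iter=10, lam=1.0):
    """Wiener filtering with an approximate STFT-consistency constraint.
    ps_target / ps_noise are power spectrograms on the same STFT grid."""
    _, _, X = stft(mix, fs, nperseg=nperseg)
    gain = ps_target / (ps_target + ps_noise + 1e-12)  # classical Wiener gain
    W = gain * X                    # classical (possibly inconsistent) estimate
    S = W.copy()
    for _ in range(n_iter):
        _, s = istft(S, fs, nperseg=nperseg)       # back to the time domain
        if len(s) < len(mix):
            s = np.pad(s, (0, len(mix) - len(s)))
        _, _, C = stft(s[:len(mix)], fs, nperseg=nperseg)  # consistent projection
        S = (W + lam * C) / (1.0 + lam)    # balance Wiener fit vs consistency
    _, s_hat = istft(S, fs, nperseg=nperseg)
    return s_hat[:len(mix)]

# Oracle example: a tone (target) buried in white noise.
fs, n = 16000, 16000
t = np.arange(n) / fs
target = np.sin(2 * np.pi * 440 * t)
noise = 0.3 * np.random.default_rng(0).normal(size=n)
_, _, T = stft(target, fs, nperseg=1024)
_, _, N = stft(noise, fs, nperseg=1024)
est = consistent_wiener(target + noise, np.abs(T)**2, np.abs(N)**2, fs)
```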

Patent
18 Sep 2013
TL;DR: In this paper, a processing system receives an audio signal encoding a portion of an utterance and receives context information associated with the utterance, wherein the context information is not derived from the audio signal or any other audio signal.
Abstract: A processing system receives an audio signal encoding a portion of an utterance. The processing system receives context information associated with the utterance, wherein the context information is not derived from the audio signal or any other audio signal. The processing system provides, as input to a neural network, data corresponding to the audio signal and the context information, and generates a transcription for the utterance based on at least an output of the neural network.

Patent
20 Dec 2013
TL;DR: In this article, the authors propose a speech-triggered transition of a host processor and/or computing device from a low functionality mode to a high functionality mode in which full vocabulary speech recognition can be accomplished.
Abstract: Disclosed are embodiments for seamless, single-step, and speech-triggered transition of a host processor and/or computing device from a low functionality mode to a high functionality mode in which full vocabulary speech recognition can be accomplished. First audio samples are captured by a low power audio processor while the host processor is in a low functionality mode. The low power audio processor may identify a predetermined audio pattern. The low power audio processor, upon identifying the predetermined audio pattern, triggers the host processor to transition to a high functionality mode. An end portion of the first audio samples that follow an end-point of the predetermined audio pattern may be stored in system memory accessible by the host processor. Second audio samples are captured and stored with the end portion of the first audio samples. Once the host processor transitions to a high functionality mode, multi-channel full vocabulary speech recognition can be performed and functions can be executed based on detected speech interaction phrases.

Proceedings ArticleDOI
14 Nov 2013
TL;DR: It is an open problem for signal processing and machine learning to reliably identify bird sounds in real-world audio data collected in an acoustic monitoring scenario.
Abstract: Birds have been widely used as biological indicators for ecological research. They respond quickly to environmental changes and can be used to infer about other organisms (e.g., insects they feed on). Traditional methods for collecting data about birds involve costly human effort. A promising alternative is acoustic monitoring. There are many advantages to recording audio of birds compared to human surveys, including increased temporal and spatial resolution and extent, applicability in remote sites, reduced observer bias, and potentially lower cost. However, it is an open problem for signal processing and machine learning to reliably identify bird sounds in real-world audio data collected in an acoustic monitoring scenario. Some of the major challenges include multiple simultaneously vocalizing birds, other sources of non-bird sound (e.g., buzzing insects), and background noise like wind, rain, and motor vehicles.

Journal ArticleDOI
TL;DR: A new adaptive audio watermarking algorithm based on Empirical Mode Decomposition (EMD) is introduced and the robustness of the hidden watermark for additive noise, MP3 compression, re-quantization, filtering, cropping and resampling is shown.
Abstract: In this paper, a new adaptive audio watermarking algorithm based on Empirical Mode Decomposition (EMD) is introduced. The audio signal is divided into frames and each one is decomposed adaptively, by EMD, into intrinsic oscillatory components called Intrinsic Mode Functions (IMFs). The watermark and the synchronization codes are embedded into the extrema of the last IMF, a low-frequency mode stable under different attacks and preserving the audio perceptual quality of the host signal. The data embedding rate of the proposed algorithm is 46.9-50.3 b/s. Relying on exhaustive simulations, we show the robustness of the hidden watermark against additive noise, MP3 compression, re-quantization, filtering, cropping, and resampling. The comparison analysis shows that our method performs better than recently reported watermarking schemes.
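A hedged sketch of the embedding step, using the PyEMD package for the decomposition and a QIM-style quantization of the last mode's extrema; the paper's actual embedding rule, synchronization codes, and frame layout are not reproduced, and `delta` is an illustrative quantization step:

```python
import numpy as np
from scipy.signal import find_peaks
from PyEMD import EMD   # pip install EMD-signal

def embed_bits_last_imf(frame, bits, delta=0.01):
    """Embed bits by quantizing extrema of a frame's last IMF to
    even/odd quantization bins (a QIM-style stand-in)."""
    imfs = EMD().emd(frame)              # adaptive decomposition into IMFs
    last = imfs[-1].copy()               # low-frequency, attack-stable mode
    peaks, _ = find_peaks(np.abs(last))  # extrema locations (approximate)
    for idx, bit in zip(peaks, bits):    # embeds min(len(peaks), len(bits)) bits
        q = int(np.round(last[idx] / delta))
        if q % 2 != bit:
            q += 1
        last[idx] = q * delta
    imfs[-1] = last
    return imfs.sum(axis=0)              # resynthesize the watermarked frame

# Example: embed 8 bits into one 1024-sample frame of a noisy low tone.
rng = np.random.default_rng(1)
frame = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1024)) + 0.1 * rng.normal(size=1024)
marked = embed_bits_last_imf(frame, bits=[1, 0, 1, 1, 0, 0, 1, 0])
```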

Patent
13 Mar 2013
TL;DR: In this paper, a method for retrieving first direction of arrival (DOA) data corresponding to a first audio output device from a memory of the audio processing device and generating a first null beam directed toward the first output device based on the first DOA data was proposed.
Abstract: A method includes, while operating an audio processing device in a use mode, retrieving first direction of arrival (DOA) data corresponding to a first audio output device from a memory of the audio processing device and generating a first null beam directed toward the first audio output device based on the first DOA data. The method also includes retrieving second DOA data corresponding to a second audio output device from the memory of the audio processing device and generating a second null beam directed toward the second audio output device based on the second DOA data. The first DOA data and the second DOA data are stored in the memory during operation of the audio processing device in a calibration mode.
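For intuition, here is a two-microphone delay-and-subtract null steered toward a given DOA; in the patent the DOA data would be retrieved from memory after a calibration pass, whereas here it is simply an argument, and the far-field geometry (8 cm spacing) is illustrative:

```python
import numpy as np

def null_beam_2mic(x1, x2, doa_deg, fs, mic_dist=0.08, c=343.0):
    """Cancel a source at doa_deg by aligning the second channel to the
    first (fractional delay via FFT phase shift) and subtracting.
    The sign convention assumes x2 lags x1 for sources at doa_deg."""
    tau = mic_dist * np.cos(np.deg2rad(doa_deg)) / c   # inter-mic delay (s)
    freqs = np.fft.rfftfreq(len(x2), d=1.0 / fs)
    advance = np.exp(2j * np.pi * freqs * tau)         # undo the lag of x2
    x2_aligned = np.fft.irfft(np.fft.rfft(x2) * advance, len(x2))
    return x1 - x2_aligned                             # null toward doa_deg

# Example: a 440 Hz source at 60 degrees is cancelled almost exactly.
fs, n = 16000, 4096
src = np.sin(2 * np.pi * 440 * np.arange(n) / fs)
tau = 0.08 * np.cos(np.deg2rad(60)) / 343.0
delay = np.exp(-2j * np.pi * np.fft.rfftfreq(n, 1 / fs) * tau)
x1, x2 = src, np.fft.irfft(np.fft.rfft(src) * delay, n)
print(np.max(np.abs(null_beam_2mic(x1, x2, 60, fs))))  # ~ 0
```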

Proceedings ArticleDOI
01 Jan 2013
TL;DR: A novel, exemplar-based method for audio event detection based on non-negative matrix factorisation, which model events as a linear combination of dictionary atoms, and mixtures as alinear combination of overlapping events.
Abstract: We present a novel, exemplar-based method for audio event detection based on non-negative matrix factorisation. Building on recent work in noise robust automatic speech recognition, we model events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events. The weights of activated atoms in an observation serve directly as evidence for the underlying event classes. The atoms in the dictionary span multiple frames and are created by extracting all possible fixed-length exemplars from the training data. To combat data scarcity of small training datasets, we propose to artificially augment the amount of training data by linear time warping in the feature domain at multiple rates. The method is evaluated on the Office Live and Office Synthetic datasets released by the AASP Challenge on Detection and Classification of Acoustic Scenes and Events.
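A compact sketch of the activation step, with a fixed exemplar dictionary and per-frame non-negative least squares standing in for the paper's NMF updates; activation weights are pooled per class to give the event evidence described above (all shapes are illustrative):

```python
import numpy as np
from scipy.optimize import nnls

def event_evidence(V, W, labels, n_classes):
    """V: (F, T) observations in the stacked multi-frame feature space.
    W: (F, K) dictionary of exemplar atoms; labels: (K,) class of each atom.
    Returns (n_classes, T) evidence pooled from non-negative activations."""
    evidence = np.zeros((n_classes, V.shape[1]))
    for t in range(V.shape[1]):
        h, _ = nnls(W, V[:, t])                       # non-negative activations
        for k in range(n_classes):
            evidence[k, t] = h[labels == k].sum()     # pool atoms by class
    return evidence

# Example: 20 random atoms (2 classes) of dimension 64, 5 observations.
rng = np.random.default_rng(0)
W = np.abs(rng.normal(size=(64, 20)))
labels = np.repeat([0, 1], 10)
V = np.abs(rng.normal(size=(64, 5)))
print(event_evidence(V, W, labels, 2).shape)  # (2, 5)
```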

Journal ArticleDOI
TL;DR: A novel method that exploits correlation between audio-visual dynamics of a video to segment and localize objects that are the dominant source of audio to solve the problem of audio-video synchronization and is used to aid interactive segmentation.
Abstract: In this paper, we propose a novel method that exploits correlation between audio-visual dynamics of a video to segment and localize objects that are the dominant source of audio. Our approach consists of a two-step spatiotemporal segmentation mechanism that relies on velocity and acceleration of moving objects as visual features. Each frame of the video is segmented into regions based on motion and appearance cues using the QuickShift algorithm, which are then clustered over time using K-means, so as to obtain a spatiotemporal video segmentation. The video is represented by motion features computed over individual segments. The Mel-Frequency Cepstral Coefficients (MFCC) of the audio signal, and their first order derivatives are exploited to represent audio. The proposed framework assumes there is a non-trivial correlation between these audio features and the velocity and acceleration of the moving and sounding objects. The canonical correlation analysis (CCA) is utilized to identify the moving objects which are most correlated to the audio signal. In addition to moving-sounding object identification, the same framework is also exploited to solve the problem of audio-video synchronization, and is used to aid interactive segmentation. We evaluate the performance of our proposed method on challenging videos. Our experiments demonstrate significant increase in performance over the state-of-the-art both qualitatively and quantitatively, and validate the feasibility and superiority of our approach.
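The CCA step can be sketched in a few lines with scikit-learn, using synthetic stand-ins for the per-segment motion features and the MFCC-based audio features (dimensions and the injected shared component are illustrative):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
T = 200                                   # aligned time samples
motion = rng.normal(size=(T, 4))          # velocity/acceleration features
audio = rng.normal(size=(T, 13))          # MFCCs (the paper adds deltas)
audio[:, 0] += 0.8 * motion[:, 0]         # inject audio-visual correlation

cca = CCA(n_components=1).fit(motion, audio)
u, v = cca.transform(motion, audio)
r = np.corrcoef(u[:, 0], v[:, 0])[0, 1]   # canonical correlation
print(f"canonical correlation: {r:.2f}")
```

Segments whose motion features yield a high canonical correlation with the audio would be selected as the dominant sound source.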

Journal ArticleDOI
TL;DR: A statistical technique to model and estimate the amount of reverberation and background noise variance in an audio recording is described and an energy-based voice activity detection method is proposed for automatic decaying-tail-selection from anaudio recording.
Abstract: An audio recording is subject to a number of possible distortions and artifacts. Consider, for example, artifacts due to acoustic reverberation and background noise. The acoustic reverberation depends on the shape and the composition of a room, and it causes temporal and spectral smearing of the recorded sound. The background noise, on the other hand, depends on the secondary audio source activities present in the evidentiary recording. Extraction of acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and ballistic settings. We describe a statistical technique to model and estimate the amount of reverberation and background noise variance in an audio recording. An energy-based voice activity detection method is proposed for automatic decaying-tail-selection from an audio recording. Effectiveness of the proposed method is tested using a data set consisting of speech recordings. The performance of the proposed method is also evaluated for both speaker-dependent and speaker-independent scenarios.
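A minimal sketch of the energy-based VAD used for decaying-tail selection, assuming a simple relative energy threshold (the paper's statistical reverberation and noise model is beyond this snippet; frame size and threshold are illustrative):

```python
import numpy as np

def decaying_tails(x, fs, frame_ms=20, thresh_db=-35.0):
    """Return sample indices where voice activity turns off, i.e. the
    starts of decaying tails from which reverberation and noise
    statistics could be estimated."""
    n = int(fs * frame_ms / 1000)
    frames = x[: len(x) // n * n].reshape(-1, n)
    energy_db = 10 * np.log10(np.mean(frames**2, axis=1) + 1e-12)
    active = energy_db > energy_db.max() + thresh_db     # relative threshold
    offsets = np.flatnonzero(active[:-1] & ~active[1:])  # active -> inactive
    return [(i + 1) * n for i in offsets]

# Example: a noise burst followed by silence yields one detected tail.
fs = 16000
sig = np.concatenate([0.5 * np.random.default_rng(0).normal(size=fs),
                      np.zeros(fs)])
print(decaying_tails(sig, fs))  # one offset near sample 16000
```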

Patent
30 Jul 2013
TL;DR: In this article, the authors describe systems, components, and methods to acoustically determine the positions of audio sources, such as vocal users, for providing audio spaces and spatial sound field reproduction for remote listeners.
Abstract: Embodiments of the invention relate generally to electrical and electronic hardware, computer software, wired and wireless network communications, and wearable computing devices to facilitate production and/or reproduction of a spatial sound field and/or one or more audio spaces. More specifically, disclosed are systems, components and methods to acoustically determine the positions of audio sources, such as vocal users, for providing audio spaces and spatial sound field reproduction for remote listeners. In one embodiment, a media device includes a housing, transducers disposed in the housing to emit audible acoustic signals into a region including one or more audio sources, acoustic probe transducers configured to emit ultrasonic signals, and acoustic sensors configured to sense received ultrasonic signals reflected from an audio source. A controller can determine a position of the audio source.

Journal ArticleDOI
TL;DR: To achieve real-time processing independent of signal length, slice-wise processing of the full input signal is proposed and referred to as the sliCQ transform; it overcomes the computational inefficiency and lack of invertibility of classical constant-Q transform implementations.
Abstract: Audio signal processing frequently requires time-frequency representations and in many applications, a non-linear spacing of frequency bands is preferable. This paper introduces a framework for efficient implementation of invertible signal transforms allowing for non-uniform frequency resolution. Non-uniformity in frequency is realized by applying nonstationary Gabor frames with adaptivity in the frequency domain. The realization of a perfectly invertible constant-Q transform is described in detail. To achieve real-time processing, independent of signal length, slice-wise processing of the full input signal is proposed and referred to as sliCQ transform. By applying frame theory and FFT-based processing, the presented approach overcomes computational inefficiency and lack of invertibility of classical constant-Q transform implementations. Numerical simulations evaluate the efficiency of the proposed algorithm and the method's applicability is illustrated by experiments on real-life audio signals.
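The non-uniform spacing behind a constant-Q transform is easy to make concrete: center frequencies are spaced geometrically and window lengths scale inversely with frequency so that Q = f/bandwidth stays constant. The sketch below computes only these kernel parameters; it does not attempt the paper's invertible nonstationary Gabor construction:

```python
import numpy as np

def cq_kernel_params(fmin=32.7, fmax=8000.0, bins_per_octave=12, fs=16000):
    """Geometric center frequencies and per-bin window lengths for a
    constant-Q analysis (standard textbook formulas)."""
    Q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)     # constant Q factor
    n_bins = int(np.ceil(bins_per_octave * np.log2(fmax / fmin)))
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    win_lens = np.round(Q * fs / freqs).astype(int)  # long at low f, short at high f
    return freqs, win_lens

freqs, wins = cq_kernel_params()
print(freqs[:3], wins[:3])    # low bins: fine spacing, long windows
print(freqs[-3:], wins[-3:])  # high bins: coarse spacing, short windows
```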

Proceedings ArticleDOI
26 May 2013
TL;DR: In this article, a probabilistic model based on a recurrent neural network was proposed to learn realistic output distributions given the input and devise an efficient algorithm to search for the global mode of that distribution.
Abstract: We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input and we devise an efficient algorithm to search for the global mode of that distribution. The resulting method produces musically plausible transcriptions even under high levels of noise and drastically outperforms previous state-of-the-art approaches on five datasets of synthesized sounds and real recordings, approximately halving the test error rate.

Journal ArticleDOI
TL;DR: A novel method to improve the sound event classification performance in severe mismatched noise conditions is proposed, based on the subband power distribution (SPD) image - a novel two-dimensional representation that characterizes the spectral power distribution over time in each frequency subband.
Abstract: The ability to automatically recognize a wide range of sound events in real-world conditions is an important part of applications such as acoustic surveillance and machine hearing. Our approach takes inspiration from both audio and image processing fields, and is based on transforming the sound into a two-dimensional representation, then extracting an image feature for classification. This provided the motivation for our previous work on the spectrogram image feature (SIF). In this paper, we propose a novel method to improve the sound event classification performance in severe mismatched noise conditions. This is based on the subband power distribution (SPD) image - a novel two-dimensional representation that characterizes the spectral power distribution over time in each frequency subband. Here, the high-powered reliable elements of the spectrogram are transformed to a localized region of the SPD, hence can be easily separated from the noise. We then extract an image feature from the SPD, using the same approach as for the SIF, and develop a novel missing feature classification approach based on a nearest neighbor classifier (kNN). We carry out comprehensive experiments on a database of 50 environmental sound classes over a range of challenging noise conditions. The results demonstrate that the SPD-IF is both discriminative over the broad range of sound classes, and robust in severe non-stationary noise.
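A rough sketch of constructing an SPD image from a spectrogram: normalize the power to [0, 1], split the frequency rows into subbands, and histogram each subband's values over time; high-powered reliable elements then occupy a distinct region of the image (bin counts and sizes are illustrative):

```python
import numpy as np
from scipy.signal import spectrogram

def spd_image(x, fs, n_subbands=32, n_power_bins=32):
    """Subband power distribution image: (n_subbands, n_power_bins)."""
    f, t, S = spectrogram(x, fs, nperseg=512)
    S_db = 10 * np.log10(S + 1e-12)
    S_db = (S_db - S_db.min()) / (S_db.max() - S_db.min() + 1e-12)  # to [0, 1]
    rows = np.array_split(S_db, n_subbands, axis=0)  # group FFT rows into subbands
    return np.stack([np.histogram(r, bins=n_power_bins, range=(0, 1),
                                  density=True)[0] for r in rows])

fs = 16000
x = np.random.default_rng(0).normal(size=fs)  # 1 s of noise as a stand-in
print(spd_image(x, fs).shape)                 # (32, 32)
```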

Patent
02 Jul 2013
TL;DR: In this article, a method for echo reduction by an electronic device is described, which includes nulling at least one speaker and mixing a set of runtime audio signals based on the set of acoustic paths to determine a reference signal.
Abstract: A method for echo reduction by an electronic device is described. The method includes nulling at least one speaker. The method also includes mixing a set of runtime audio signals based on a set of acoustic paths to determine a reference signal. The method also includes receiving at least one composite audio signal that is based on the set of runtime audio signals. The method further includes reducing echo in the at least one composite audio signal based on the reference signal.

Patent
22 Oct 2013
TL;DR: In this article, the authors provide methods and systems for digitally processing audio signals, in which an audio signal is converted to a digital signal that is gain-adjusted, filtered, compressed, and equalized before being output through an amplifier and driver circuit.
Abstract: The present invention provides methods and systems for digitally processing audio signals. Some embodiments receive an audio signal and convert it to a digital signal. The gain of the digital signal may be adjusted a first time, using a digital processing device located between a receiver and a driver circuit. The adjusted signal can be filtered with a first low shelf filter. The systems and methods may compress the filtered signal with a first compressor, process the signal with a graphic equalizer, and compress the processed signal with a second compressor. The gain of the compressed signal can be adjusted a second time. These may be done using the digital processing device. The signal may then be output through an amplifier and driver circuit to drive a personal audio listening device. In some embodiments, the systems and methods described herein may be part of the personal audio listening device.
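A hedged sketch of such a chain: an RBJ 'cookbook' low-shelf biquad followed by a toy RMS compressor applied twice, with the graphic-equalizer stage omitted; the biquad coefficients follow the standard cookbook formulas, and every gain, threshold, and frequency is illustrative:

```python
import numpy as np
from scipy.signal import lfilter

def low_shelf(x, fs, f0=120.0, gain_db=4.0):
    """RBJ 'cookbook' low-shelf biquad (shelf slope S = 1)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / 2 * np.sqrt(2.0)
    cosw = np.cos(w0)
    b = np.array([A * ((A + 1) - (A - 1) * cosw + 2 * np.sqrt(A) * alpha),
                  2 * A * ((A - 1) - (A + 1) * cosw),
                  A * ((A + 1) - (A - 1) * cosw - 2 * np.sqrt(A) * alpha)])
    a = np.array([(A + 1) + (A - 1) * cosw + 2 * np.sqrt(A) * alpha,
                  -2 * ((A - 1) + (A + 1) * cosw),
                  (A + 1) + (A - 1) * cosw - 2 * np.sqrt(A) * alpha])
    return lfilter(b / a[0], a / a[0], x)

def compress(x, thresh_db=-20.0, ratio=4.0, win=256):
    """Toy static compressor driven by a moving RMS envelope."""
    rms = np.sqrt(np.convolve(x**2, np.ones(win) / win, mode="same") + 1e-12)
    over = np.maximum(20 * np.log10(rms) - thresh_db, 0.0)
    return x * 10 ** (-over * (1 - 1 / ratio) / 20)

# Chain loosely following the claim: gain -> shelf -> comp -> (EQ) -> comp -> gain.
fs = 48000
x = 0.5 * np.sin(2 * np.pi * 80 * np.arange(fs) / fs)
y = 0.9 * compress(compress(low_shelf(1.2 * x, fs)), thresh_db=-10.0)
```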

Patent
21 Nov 2013
TL;DR: In this article, an audio processing system selectively identifies certain environmental sounds and plays back these sounds, or a representation of these sounds, in the vehicle's cabin.
Abstract: An audio processing system may selectively identify certain environmental sounds and play back these sounds, or a representation of these sounds, in the vehicle's cabin. The audio processing system may filter the environmental sounds to identify a particular sound that matches an event such as a bouncing ball, squealing tires, footsteps, and the like. The audio processing system may then provide an audio alert to an occupant in the vehicle. For example, the system may process the identified sound (e.g., amplify and/or isolate the sound) and use a speaker to output the processed sound into the interior of the vehicle. In another embodiment, the audio processing system may use environmental sounds as an audio masking sound for creating privacy zones within the vehicle. The audio processing system may filter the environmental sounds to identify a continuous sound, which is then output to generate the privacy zones.

Patent
18 Apr 2013
TL;DR: In this article, a method for signal level matching by an electronic device is described, which includes capturing a plurality of audio signals from multiple microphones and determining a difference signal based on an inter-microphone subtraction.
Abstract: A method for signal level matching by an electronic device is described. The method includes capturing a plurality of audio signals from a plurality of microphones. The method also includes determining a difference signal based on an inter-microphone subtraction. The difference signal includes multiple harmonics. The method also includes determining whether a harmonicity of the difference signal exceeds a harmonicity threshold. The method also includes preserving the harmonics to determine an envelope. The method further applies the envelope to a noise-suppressed signal.

Patent
Jie Su1, Samuel Oyetunji1
08 Mar 2013
TL;DR: In this article, a controller configured to be coupled to an audio speaker is presented, where the controller receives an audio input signal and, based on a displacement transfer function associated with the audio speaker, processes the audio input signal to generate an output audio signal communicated to the speaker.
Abstract: In accordance with these and other embodiments of the present disclosure, systems and methods may include a controller configured to be coupled to an audio speaker, wherein the controller receives an audio input signal, and based on a displacement transfer function associated with the audio speaker, processes the audio input signal to generate an output audio signal communicated to the audio speaker, wherein the displacement transfer function correlates an amplitude and a frequency of the audio input signal to an expected displacement of the audio speaker in response to the amplitude and the frequency of the audio input signal.

Proceedings Article
01 Jan 2013
TL;DR: It is demonstrated that specific degradations can reduce or even reverse the performance difference between two competing methods, and it is shown that performance strongly depends on the combination of method and degradation applied.
Abstract: We introduce the Audio Degradation Toolbox (ADT) for the controlled degradation of audio signals, and propose its usage as a means of evaluating and comparing the robustness of audio processing algorithms. Music recordings encountered in practical applications are subject to varied, sometimes unpredictable degradation. For example, audio is degraded by low-quality microphones, noisy recording environments, MP3 compression, dynamic compression in broadcasting or vinyl decay. In spite of this, no standard software for the degradation of audio exists, and music processing methods are usually evaluated against clean data. The ADT fills this gap by providing Matlab scripts that emulate a wide range of degradation types. We describe 14 degradation units, and how they can be chained to create more complex, ‘real-world’ degradations. The ADT also provides functionality to adjust existing ground-truth, correcting for temporal distortions introduced by degradation. Using four different music informatics tasks, we show that performance strongly depends on the combination of method and degradation applied. We demonstrate that specific degradations can reduce or even reverse the performance difference between two competing methods. ADT source code, sounds, impulse responses and definitions are freely available for download.
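Since the ADT itself is distributed as Matlab scripts, the following is only an illustrative Python analogue of chaining a few degradation units (additive noise at a target SNR, a lowpass standing in for a low-quality microphone, and clipping); these are stand-ins, not the ADT's implementations:

```python
import numpy as np
from scipy.signal import butter, lfilter

def add_noise(x, snr_db, seed=0):
    """Additive white noise scaled to a target SNR in dB."""
    noise = np.random.default_rng(seed).normal(size=len(x))
    noise *= np.sqrt(np.mean(x**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return x + noise

def cheap_mic(x, fs, cutoff=4000.0):
    """Crude low-quality-microphone stand-in: a 4th-order lowpass."""
    b, a = butter(4, cutoff / (fs / 2))
    return lfilter(b, a, x)

def clip(x, limit=0.5):
    """Hard clipping, a blunt stand-in for dynamic-range abuse."""
    return np.clip(x, -limit, limit)

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
degraded = clip(cheap_mic(add_noise(x, snr_db=20), fs))  # chained units
```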

Patent
10 Jan 2013
TL;DR: In this paper, a sound processing system suitable for use in a vehicle having multiple acoustic zones is presented; it includes a plurality of microphone In-Car Communication (Mic-ICC) instances and a plurality of loudspeaker In-Car Communication (Ls-ICC) instances.
Abstract: An In-Car Communication (ICC) system supports the communication paths within a car by receiving the speech signals of a speaking passenger and playing them back for one or more listening passengers. Signal processing tasks are split into a microphone-related part and a loudspeaker-related part. A sound processing system suitable for use in a vehicle having multiple acoustic zones includes a plurality of microphone In-Car Communication (Mic-ICC) instances and a plurality of loudspeaker In-Car Communication (Ls-ICC) instances. The system further includes a dynamic audio routing matrix with a controller, coupled to the Mic-ICC instances, a mixer coupled to the plurality of Mic-ICC instances, and a distributor coupled to the Ls-ICC instances.