
Showing papers on "Audio signal processing" published in 2017


Journal ArticleDOI
TL;DR: This paper proposes to analyze a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering.
Abstract: Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, or hearing aids. In addition, they are crucial preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multichannel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet, a comprehensive overview of the common foundations and the differences between these approaches is lacking at present. In this paper, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.

452 citations
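
As a concrete illustration of the second axis (the spatial filter design criterion), the sketch below computes textbook MVDR weights for a single frequency bin with NumPy. It is not taken from the paper; the steering vector and noise covariance are simply assumed to have been estimated beforehand.

import numpy as np

def mvdr_weights(noise_cov, steering):
    # Textbook MVDR spatial filter: w = R^-1 d / (d^H R^-1 d).
    # noise_cov: (M, M) noise spatial covariance for one frequency bin.
    # steering:  (M,)   acoustic transfer function (steering vector) of the target.
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Toy example: 4 microphones, one frequency bin, random data standing in for estimates.
rng = np.random.default_rng(0)
M = 4
d = np.exp(-2j * np.pi * rng.random(M))                 # assumed steering vector
noise = rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200))
R = noise @ noise.conj().T / 200 + 1e-3 * np.eye(M)     # regularised covariance estimate
w = mvdr_weights(R, d)
print(abs(w.conj() @ d))                                # distortionless constraint: ~1.0

In practice such weights would be computed per frequency bin and applied to the multichannel STFT, possibly followed by a postfilter (the fourth axis).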


Journal ArticleDOI
TL;DR: An innovative audio-based system for activity analysis (and tracking) of construction heavy equipment that consists of multiple steps, including filtering the audio signals, converting them into time-frequency representations, and window filtering the classifier output to differentiate between different patterns of activities.

89 citations


Journal ArticleDOI
TL;DR: A new architecture, design flow, and field-programmable gate array (FPGA) implementation analysis of a neuromorphic binaural auditory sensor, designed completely in the spike domain, is presented, allowing researchers to implement their own parameterized neuromorphic auditory systems in a low-cost FPGA in order to study the audio processing and learning activity that takes place in the brain.
Abstract: This paper presents a new architecture, design flow, and field-programmable gate array (FPGA) implementation analysis of a neuromorphic binaural auditory sensor, designed completely in the spike domain. Unlike digital cochleae that decompose audio signals using classical digital signal processing techniques, the model presented in this paper processes information directly encoded as spikes using pulse frequency modulation and provides a set of frequency-decomposed audio information using an address-event representation interface. In this case, a systematic approach to design led to a generic process for building, tuning, and implementing audio frequency decomposers with different features, facilitating synthesis with custom features. This allows researchers to implement their own parameterized neuromorphic auditory systems in a low-cost FPGA in order to study the audio processing and learning activity that takes place in the brain. In this paper, we present a 64-channel binaural neuromorphic auditory system implemented in a Virtex-5 FPGA using a commercial development board. The system was excited with a diverse set of audio signals in order to analyze its response and characterize its features. The neuromorphic auditory system response times and frequencies are reported. The experimental results of the proposed system implementation with 64-channel stereo are: a frequency range between 9.6 Hz and 14.6 kHz (adjustable), a maximum output event rate of 2.19 Mevents/s, a power consumption of 29.7 mW, a slice requirement of 11,141, and a system clock frequency of 27 MHz.

66 citations


Journal ArticleDOI
TL;DR: The FRQA representation is a normalized state that facilitates basic audio signal operations targeting the amplitude and time parameters; these operations can be employed as the major components to build advanced operations for specific applications, as well as to facilitate secure transmission of audio content in the quantum computing domain.

61 citations


Posted Content
TL;DR: Kapre implements time-frequency conversions, normalisation, and data augmentation as Keras layers for audio and music signal preprocessing; simple benchmark results are reported, showing that real-time on-GPU preprocessing adds a reasonable amount of computation.
Abstract: We introduce Kapre, Keras layers for audio and music signal preprocessing. Music research using deep neural networks requires a heavy and tedious preprocessing stage, for which audio processing parameters are often ignored in parameter optimisation. To solve this problem, Kapre implements time-frequency conversions, normalisation, and data augmentation as Keras layers. We report simple benchmark results, showing real-time on-GPU preprocessing adds a reasonable amount of computation.

57 citations
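
To make the layer-based preprocessing idea concrete, here is a minimal sketch of a log-spectrogram implemented as a Keras layer, so that the time-frequency conversion runs on the GPU together with the rest of the network. This is a generic illustration built on tf.signal, not Kapre's own API or parameter names.

import tensorflow as tf

class LogSpectrogram(tf.keras.layers.Layer):
    # Minimal illustration of on-GPU time-frequency conversion as a layer
    # (the idea behind Kapre); this is not Kapre's actual API.
    def __init__(self, frame_length=1024, frame_step=256, **kwargs):
        super().__init__(**kwargs)
        self.frame_length = frame_length
        self.frame_step = frame_step

    def call(self, waveforms):                            # waveforms: (batch, samples)
        stft = tf.signal.stft(waveforms,
                              frame_length=self.frame_length,
                              frame_step=self.frame_step)
        mag = tf.abs(stft)
        return tf.math.log(mag + 1e-6)[..., tf.newaxis]   # (batch, frames, bins, 1)

model = tf.keras.Sequential([
    LogSpectrogram(),
    tf.keras.layers.Conv2D(16, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
outputs = model(tf.random.normal([2, 16000]))             # e.g. two 1-second 16 kHz clips
print(outputs.shape)                                      # (2, 10)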


Proceedings Article
17 Feb 2017
TL;DR: In this article, the authors propose a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks; the model is trained on pairs of low- and high-quality audio examples and predicts missing samples within a low-resolution signal, in an interpolation process similar to image super-resolution.
Abstract: We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks. Our model is trained on pairs of low and high-quality audio examples; at test-time, it predicts missing samples within a low-resolution signal in an interpolation process similar to image super-resolution. Our method is simple and does not involve specialized audio processing techniques; in our experiments, it outperforms baselines on standard speech and music benchmarks at upscaling ratios of 2x, 4x, and 6x. The method has practical applications in telephony, compression, and text-to-speech generation; it demonstrates the effectiveness of feed-forward convolutional architectures on an audio generation task.

54 citations
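
A hedged sketch of the general recipe (not the authors' architecture): the low-resolution signal is first brought back to the target length by naive interpolation, and a small 1D convolutional network is trained on (interpolated, high-resolution) pairs to predict the missing detail.

import numpy as np
import tensorflow as tf

def make_pair(hr, ratio=4):
    # Simulate a low sample rate by decimation, then naively interpolate back up.
    lr = hr[::ratio]
    lr_up = np.interp(np.arange(len(hr)), np.arange(0, len(hr), ratio), lr)
    return lr_up.astype('float32'), hr.astype('float32')

hr = np.sin(2 * np.pi * 440 * np.arange(8192) / 16000).astype('float32')  # toy target
x, y = make_pair(hr)

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 9, padding='same', activation='relu'),
    tf.keras.layers.Conv1D(32, 9, padding='same', activation='relu'),
    tf.keras.layers.Conv1D(1, 9, padding='same'),         # predicts the high-res samples
])
model.compile(optimizer='adam', loss='mse')
model.fit(x[None, :, None], y[None, :, None], epochs=5, verbose=0)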


Proceedings ArticleDOI
01 Oct 2017
TL;DR: It is demonstrated that auditory and visual information play complementary roles in object perception, and further, that the representation learned on synthetic audio-visual data can transfer to real-world scenarios.
Abstract: Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine of such competency, however, is very challenging, due to the great difficulty in capturing large-scale, clean data of objects with both their appearance and the sound they make. In this paper, we present a novel, open-source pipeline that generates audiovisual data, purely from 3D object shapes and their physical properties. Through comparison with audio recordings and human behavioral studies, we validate the accuracy of the sounds it generates. Using this generative model, we are able to construct a synthetic audio-visual dataset, namely Sound-20K, for object perception tasks. We demonstrate that auditory and visual information play complementary roles in object perception, and further, that the representation learned on synthetic audio-visual data can transfer to real-world scenarios.

34 citations


Journal ArticleDOI
TL;DR: A cascaded delay line reservoir computer capable of real-time audio processing on standard computing equipment, aimed at black-box system identification of nonlinear audio systems.
Abstract: Background: Real-time processing of audio or audio-like signals is a promising research topic for the field of machine learning, with many potential applications in music and communications. We present a cascaded delay line reservoir computer capable of real-time audio processing on standard computing equipment, aimed at black-box system identification of nonlinear audio systems. The cascaded reservoir blocks use two-pole filtered virtual neurons to match their timescales to that of the target signals. The reservoir blocks receive both the global input signal and the target estimate from the previous block (local input). The units in the cascade are trained in a successive manner on a single input-output training pair, such that a successively better approximation of the target is reached. A cascade of 5 dual-input reservoir blocks of 100 neurons each is trained to mimic the distortion of a measured guitar amplifier. This cascade outperforms both a single delay reservoir with the same total number of neurons and a cascade with only single-input blocks. We show that the presented structure is a viable platform for real-time audio applications on present-day computing hardware. A benefit of this structure is that it works directly from the audio samples as input, avoiding computationally intensive preprocessing.

33 citations
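
For readers unfamiliar with reservoir computing, the sketch below trains a plain echo-state reservoir with leaky neurons and a ridge-regression readout on a toy distortion target. It is a simplified stand-in for the paper's cascaded, dual-input, two-pole-filtered design, not a reimplementation of it.

import numpy as np

def run_reservoir(u, n_res=100, leak=0.3, rho=0.9, seed=0):
    # Drive a random recurrent reservoir with input u and return the state matrix.
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, n_res)
    w = rng.standard_normal((n_res, n_res))
    w *= rho / max(abs(np.linalg.eigvals(w)))        # scale spectral radius
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        pre = np.tanh(w_in * u_t + w @ x)
        x = (1 - leak) * x + leak * pre              # leaky neuron, a crude low-pass stand-in
        states[t] = x
    return states

# Toy target: a static nonlinearity standing in for an amplifier's distortion.
u = np.random.default_rng(1).uniform(-1, 1, 4000)
y = np.tanh(3 * u)

X = run_reservoir(u)
w_out = np.linalg.solve(X.T @ X + 1e-4 * np.eye(X.shape[1]), X.T @ y)   # ridge readout
print(np.mean((X @ w_out - y) ** 2))                 # small training error on the toy target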


Journal ArticleDOI
TL;DR: An overview of perceptually motivated techniques is presented, with a focus on multichannel audio recording and reproduction, audio source and reflection culling, and artificial reverberators.
Abstract: Developments in immersive audio technologies have been evolving in two directions: physically motivated systems and perceptually motivated systems. Physically motivated techniques aim to reproduce a physically accurate approximation of desired sound fields by employing a very high equipment load and sophisticated, computationally intensive algorithms. Perceptually motivated techniques, however, aim to render only the perceptually relevant aspects of the sound scene by means of modest computational and equipment load. This article presents an overview of perceptually motivated techniques, with a focus on multichannel audio recording and reproduction, audio source and reflection culling, and artificial reverberators.

32 citations
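
As an example of the artificial reverberators mentioned above, the classic Schroeder structure (parallel feedback combs followed by series allpass filters) is a perceptually motivated design with very modest computational load. The sketch below is a textbook version with commonly quoted delay values; it is not taken from the article.

import numpy as np

def comb(x, delay, g):
    # Feedback comb filter: y[n] = x[n] + g * y[n - delay].
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    # Schroeder allpass: y[n] = -g * x[n] + x[n - delay] + g * y[n - delay].
    y = np.zeros_like(x)
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y

def schroeder_reverb(x, fs=16000):
    # Four parallel feedback combs, two series allpasses, plus dry/wet mix.
    comb_delays = [int(fs * t) for t in (0.0297, 0.0371, 0.0411, 0.0437)]
    wet = sum(comb(x, d, 0.77) for d in comb_delays) / 4.0
    wet = allpass(wet, int(fs * 0.005), 0.7)
    wet = allpass(wet, int(fs * 0.0017), 0.7)
    return 0.7 * x + 0.3 * wet

impulse = np.zeros(16000); impulse[0] = 1.0
ir = schroeder_reverb(impulse)                       # decaying, diffuse impulse response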


Book ChapterDOI
TL;DR: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics by focusing on frame theory in a filter bank approach, which is probably the most relevant view point for audio signal processing.
Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant view point for audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
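
A small numerical illustration of the frame/filter-bank connection (not from the chapter): for an undecimated analysis filter bank, frame bounds A and B can be read off the summed squared magnitude responses, and the ratio B/A tells how far the bank is from a tight frame. The Gaussian filter bank below is a toy construction chosen only for illustration.

import numpy as np

# For an undecimated analysis filter bank, A <= sum_k |H_k(w)|^2 <= B for all w.
n_fft = 1024
freqs = np.fft.rfftfreq(n_fft, d=1 / 16000.0)

# A toy bank of overlapping Gaussian band-pass filters (illustration only).
centers = np.geomspace(100, 7000, 24)
bank = np.array([np.exp(-0.5 * ((freqs - fc) / (0.2 * fc)) ** 2) for fc in centers])

response = np.sum(np.abs(bank) ** 2, axis=0)
A, B = response.min(), response.max()
print(f"frame bounds: A={A:.3g}, B={B:.3g}, B/A={B/A:.3g}")
# B/A close to 1 means a snug (nearly tight) frame; A close to 0 means signal content can be lost.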

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new notion of equivalence of feature-network pairs is introduced, and the relationship between features and networks is shown for the example of mel-spectrogram input on the one hand and varying analysis windows on the other.
Abstract: Convolutional Neural Networks have established a new standard in many machine learning applications, not only in image but also in audio processing. In this contribution we investigate the interplay between the primary representation mapping a raw audio signal to some kind of image (feature) and the convolutional layers of an ensuing neural network. We introduce a new notion of equivalence of feature-network pairs and show the relationship between features and networks for the example of mel-spectrogram input on the one hand and varying analysis windows on the other.

Journal ArticleDOI
TL;DR: The design and implementation of a wireless acoustic sensor, located at the edge of a WASN for recording and processing environmental sounds, that can be applied to AAL systems for personal healthcare thanks to its significant advantages: low cost, small size, and audio sampling and computation capabilities for audio processing.
Abstract: Ambient Assisted Living (AAL) has become an attractive research topic due to growing interest in remote monitoring of older people. Developments in sensor technologies and advances in wireless communications allow smart assistance to be offered remotely and those people to be monitored at their own home, increasing their quality of life. In this context, Wireless Acoustic Sensor Networks (WASN) provide a suitable way to implement AAL systems that can infer hazardous situations via environmental sound identification. Nevertheless, satisfactory sensor solutions combining both low cost and high performance have not yet been found. In this paper, we report the design and implementation of a wireless acoustic sensor located at the edge of a WASN for recording and processing environmental sounds, which can be applied to AAL systems for personal healthcare because it has the following significant advantages: low cost, small size, and audio sampling and computation capabilities for audio processing. The proposed wireless acoustic sensor is able to record audio samples at a sampling frequency of at least 10 kHz with 12-bit resolution. It is also capable of performing audio signal processing without compromising the sample rate or the energy consumption, by using a new microcontroller released in the last quarter of 2016. The proposed low-cost wireless acoustic sensor has been verified using four randomness tests for statistical analysis and a classification system for the recorded sounds based on audio fingerprints.

Patent
02 Mar 2017
TL;DR: In this article, a method and an apparatus for adjusting delay and gain parameters for calibrating a multichannel audio system to which a plurality of loudspeakers is connected is presented.
Abstract: A method and an apparatus for adjusting delay and gain parameters for calibrating a multichannel audio system to which a plurality of loudspeakers is connected. A calibration process includes emitting a plurality of test tones by an audio processing device on a plurality of loudspeakers with predetermined timings and amplitude levels, according to a calibration signal. A calibration device having a microphone captures the audio signal corresponding to the test tones from the listener's position. The captured audio signal is analyzed, either by the calibration device or the audio processing device, to determine the delays between loudspeakers and the difference of amplitude levels between loudspeakers. Corresponding delay and gain parameters are determined and used by the audio processing device to correct the sound to be played back. A calibration device and an audio processing device implementing the method are disclosed, as well as a calibration signal utilized in the calibration process.
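
The patent does not spell out the analysis algorithm; a common way to obtain such delay and gain parameters from a captured test signal is cross-correlation against the reference plus an RMS ratio, as in the following sketch (signal and values are made up for illustration).

import numpy as np

def estimate_delay_and_gain(reference, captured, fs):
    # Cross-correlate the captured signal against the reference to find the lag,
    # and use an RMS ratio for the gain (a common approach; the patent does not
    # specify the analysis algorithm).
    corr = np.correlate(captured, reference, mode='full')
    lag = int(np.argmax(np.abs(corr))) - (len(reference) - 1)
    gain = np.sqrt(np.mean(captured ** 2) / np.mean(reference ** 2))
    return 1000.0 * lag / fs, gain

fs = 48000
rng = np.random.default_rng(0)
burst = rng.standard_normal(4800)                        # 100 ms test burst (stand-in signal)
captured = np.concatenate([np.zeros(240), 0.5 * burst])  # heard 5 ms later, 6 dB quieter

delay_ms, gain = estimate_delay_and_gain(burst, captured, fs)
print(delay_ms, gain)                                    # roughly 5.0 ms and 0.5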

Patent
10 May 2017
TL;DR: In this paper, audio signal processing equipment consisting of a microphone array, an audio localization device, a camera, an image localization device and a sound source classifier is described.
Abstract: The invention discloses audio signal processing equipment and an audio signal processing method, as well as electronic equipment. The audio signal processing equipment comprises a microphone array, an audio localization device, a camera, an image localization device and a sound source classifier, wherein the microphone array comprises a plurality of directional microphones having different sound pickup areas; the audio localization device is used for identifying a first group of sound sources and for determining the position of each sound source in an audio coordinate system; the camera is used for capturing scene images of a current scene, wherein the current scene at least covers the sound pickup areas of the plurality of directional microphones; the image localization device is used for identifying a second group of sound sources and for determining the position of each sound source in an image coordinate system; and the sound source classifier is used for classifying each sound source in the first and second groups of sound sources in accordance with a registration relation between the audio and image coordinate systems, the position of each sound source in the audio coordinate system, and the position of each sound source in the image coordinate system. Therefore, precise classification of the sound sources can be achieved on the basis of the double localization provided by the directional microphones and the camera.

Patent
Wang Zhijian1
13 Jun 2017
TL;DR: In this article, an audio processing method and device based on artificial intelligence are described. The method includes the following steps: an audio file to be processed is converted into an image; the content feature of the image is extracted; a target image is determined according to the style feature and the content feature, where the style feature is obtained from a template image converted from a template audio file; and the target image is converted into the processed audio file.
Abstract: The invention discloses an audio processing method and device based on artificial intelligence. One concrete implementation of the method includes the following steps: an audio file to be processed is converted into an image; the content feature of the image is extracted; a target image is determined according to the style feature and the content feature, the style feature being obtained from a template image converted from a template audio file; and the target image is converted into the processed audio file. By means of this implementation, the processed audio file takes on the style of the template audio without changing the content of the audio file to be processed, and audio processing efficiency and flexibility are improved.

Journal ArticleDOI
TL;DR: 3D audio enhances BAQ as well as OLE over both stereo and surround sound, and the BAQ- and OLE-based assessments turned out to deliver consistent and reliable results.
Abstract: During the past decades, spatial reproduction of audio signals has evolved from simple two-channel stereo to surround sound (e.g., 5.1 or 7.1) and, more recently, to three-dimensional (3D) sound including height speakers, such as 9.1 or 22.2. With an increasing number of speakers, greater spatial fidelity and listener envelopment are expected. This paper reviews popular methods for subjective assessment of audio. Moreover, it provides an experimental evaluation of the subjective quality provided by these formats, contrasting the well-known basic audio quality (BAQ) type of evaluation with the more recent evaluation of overall listening experience (OLE). Commonalities and differences in findings between both assessment approaches are discussed. The results of the evaluation indicate that 3D audio enhances BAQ as well as OLE over both stereo and surround sound. Furthermore, the BAQ- and OLE-based assessments turned out to deliver consistent and reliable results.

Proceedings Article
11 Apr 2017
TL;DR: This paper proposes using a deep neural network (DNN) to learn the relationship between noisy speech features and the correct VAD decision; the resulting algorithm outperforms the classic, statistical-model-based VAD for both seen and unseen noises.
Abstract: Voice Activity Detectors (VAD) are important components in audio processing algorithms. In general, VADs are two-way classifiers, flagging the audio frames where we have voice activity. Most of them are based on the signal energy and build statistical models of the noise background and the speech signal. In the process of derivation, we are limited to simplified statistical models, and this limits the accuracy of the classification. Using more precise, but also more complex, statistical models makes the analytical derivation of the solution practically impossible. In this paper, we propose using a deep neural network (DNN) to learn the relationship between the noisy speech features and the correct VAD decision. In most cases we need a causal algorithm, i.e. one working in real time and using only current and past audio samples. This is why we use audio segments that consist only of current and previous audio frames, thus making real-time implementations possible. The proposed algorithm and DNN structure outperform the classic, statistical-model-based VAD for both seen and unseen noises.
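
A minimal sketch of the described setup: per-frame features are stacked with a few previous frames only (keeping the classifier causal) and fed to a small feed-forward DNN that outputs a per-frame voice-activity probability. The feature dimension, context length and topology below are placeholders, not the authors' configuration.

import numpy as np
import tensorflow as tf

def causal_context(features, n_prev=4):
    # Stack each frame with its n_prev predecessors (current + past frames only,
    # so the classifier stays causal); the feature choice here is ours, not the paper's.
    padded = np.vstack([np.zeros((n_prev, features.shape[1])), features])
    return np.hstack([padded[i:i + len(features)] for i in range(n_prev + 1)])

# Toy data: 40-dim log-mel-like features with per-frame voice/no-voice labels.
rng = np.random.default_rng(0)
frames = rng.standard_normal((1000, 40)).astype('float32')
labels = (rng.random(1000) > 0.5).astype('float32')

x = causal_context(frames)                                # (1000, 200)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),       # P(voice active) per frame
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, labels, epochs=2, verbose=0)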

Journal ArticleDOI
TL;DR: Feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux, which highlighted the importance of source separation in the feature extraction.
Abstract: By varying the dynamics in a musical performance, the musician can convey structure and different expressions. Spectral properties of most musical instruments change in a complex way with the performed dynamics, but dedicated audio features for modeling this parameter are lacking. In this study, feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux. Previously, ground-truth ratings of performed dynamics had been collected by asking listeners to rate how soft/loud the musicians played in a set of audio files. The ratings, averaged over subjects, were used to train three different machine learning models, using the audio features developed for the study as input. The highest result was produced by an ensemble of multilayer perceptrons with an R² of 0.84. This result seems to be close to the upper bound, given the estimated uncertainty of the ground-truth data. The result is well above that of individual human listeners in the previous listening experiment, and on par with the performance achieved from the average rating of six listeners. Features were analyzed with a factorial design, which highlighted the importance of source separation in the feature extraction.
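
The paper's sectional spectral flux is its own construction; as a simple point of reference, the sketch below computes a plain spectral flux (summed positive magnitude differences between consecutive STFT frames) within a few frequency bands.

import numpy as np
from scipy.signal import stft

def band_spectral_flux(x, fs, bands=((0, 500), (500, 2000), (2000, 8000))):
    # Plain per-band spectral flux; the band edges are arbitrary illustration values.
    f, _, Z = stft(x, fs=fs, nperseg=1024, noverlap=512)
    mag = np.abs(Z)
    flux = []
    for lo, hi in bands:
        sel = (f >= lo) & (f < hi)
        diff = np.diff(mag[sel], axis=1)
        flux.append(np.sum(np.maximum(diff, 0.0), axis=0))
    return np.array(flux)                    # shape: (n_bands, n_frames - 1)

fs = 16000
x = np.random.default_rng(0).standard_normal(fs)   # 1 s of noise as a stand-in signal
print(band_spectral_flux(x, fs).shape)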

Posted Content
TL;DR: It is demonstrated that deep neural networks that use lower bit precision significantly reduce the processing time (up to 30x); however, their performance impact remains low only in the case of classification tasks such as those present in voice activity detection.
Abstract: While deep neural networks have shown powerful performance in many audio applications, their large computation and memory demand has been a challenge for real-time processing. In this paper, we study the impact of scaling the precision of neural networks on the performance of two common audio processing tasks, namely, voice-activity detection and single-channel speech enhancement. We determine the optimal pair of weight/neuron bit precision by exploring its impact on both the performance and processing time. Through experiments conducted with real user data, we demonstrate that deep neural networks that use lower bit precision significantly reduce the processing time (up to 30x). However, their performance impact is low (< 3.14%) only in the case of classification tasks such as those present in voice activity detection.
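
The paper's precision-scaling setup is not reproduced here; as a minimal stand-in, the sketch below applies symmetric uniform post-training quantization to a weight matrix at a few bit widths and reports the resulting quantization error.

import numpy as np

def quantize_uniform(weights, n_bits):
    # Symmetric uniform post-training quantization of a weight tensor to n_bits.
    levels = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(weights)) / levels
    return np.round(weights / scale) * scale

w = np.random.default_rng(0).standard_normal((256, 256)).astype('float32')
for bits in (8, 4, 2):
    err = np.mean((quantize_uniform(w, bits) - w) ** 2)
    print(f"{bits}-bit weights: quantization MSE = {err:.5f}")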

Patent
01 Feb 2017
TL;DR: In this paper, a method performed in an audio decoder for decoding M encoded audio channels representing N audio channels is disclosed, which includes analyzing the M audio channels to detect a location of a transient.
Abstract: A method performed in an audio decoder for decoding M encoded audio channels representing N audio channels is disclosed. The method includes receiving a bitstream containing the M encoded audio channels and a set of spatial parameters, decoding the M encoded audio channels, and extracting the set of spatial parameters from the bitstream. The method also includes analyzing the M audio channels to detect a location of a transient, decorrelating the M audio channels, and deriving N audio channels from the M audio channels and the set of spatial parameters. A first decorrelation technique is applied to a first subset of each audio channel and a second decorrelation technique is applied to a second subset of each audio channel. The first decorrelation technique represents a first mode of operation of a decorrelator, and the second decorrelation technique represents a second mode of operation of the decorrelator.

Patent
07 Dec 2017
TL;DR: In this paper, a system and method executed by audio processing software on one or more electronic devices in a computer system to process digital audio signals is described, where the system comprises a digitizer for digitizing a received audio signal; and processor for performing a plurality of audio processing functions on the digitized audio signals.
Abstract: A system and method executed by audio processing software on one or more electronic devices in a computer system to process digital audio signals. The system comprises a digitizer for digitizing a received audio signal; and a processor for performing a plurality of audio processing functions on the digitized audio signals, each of the audio processing functions having at least one programmable parameter, and wherein each of the audio processing functions is categorized and grouped as audio objects, and organized into a channel strip, the channel strip processing digitized audio signals for a particular received audio signal, and wherein, the audio objects are fixed in order, so that the digitized received audio signals are processed by a predefined number of N audio objects, and wherein the N audio objects occur in a fixed sequence, and further wherein, the N audio objects comprise a first subset of non-exchangeable audio objects and a second subset of exchangeable audio objects, such that any one or more of the second subset of audio objects can be exchanged by a replacement audio object, and further wherein when the audio processing functions are programmed, they can be saved without compiling the audio processing software.

Patent
25 May 2017
TL;DR: In this article, the authors provide input and output mode control for audio processing on a user device by monitoring audio activity on a device having at least one microphone and a digital audio processing unit, collecting information from the monitoring of the activity, and determining a context for the audio processing.
Abstract: Systems and methods provide input and output mode control for audio processing on a user device. Audio processing may be configured by monitoring audio activity on a device having at least one microphone and a digital audio processing unit, collecting information from the monitoring of the activity, including an identification of at least one application utilizing audio processing, and determining a context for the audio processing, the context including at least one context resource having associated metadata. An audio configuration is determined based on the application and determined context, and an action is performed to control the audio processing mode. User controls providing additional mode control may be displayed automatically based on the current application and determined context.

Journal ArticleDOI
TL;DR: In this paper, calibrated features based on a re-embedding technique are shown to improve the performance of audio steganalysis; it is also shown that the least significant bit is the most sensitive bit plane to data hiding algorithms, and therefore it can be employed as a universal embedding method.
Abstract: Calibration and higher-order statistics are standard components of image steganalysis. However, these techniques have not yet found adequate attention in audio steganalysis. Specifically, most current studies are either non-calibrated or based only on noise removal. The goal of this study is to fill these gaps and to show that calibrated features based on a re-embedding technique improve the performance of audio steganalysis. Furthermore, the authors show that the least significant bit is the most sensitive bit plane to data hiding algorithms, and therefore it can be employed as a universal embedding method. The proposed features also benefit from an efficient model which is tailored to the needs of audio steganalysis and represents the maximum deviation from the human auditory system. Performance of the proposed method is evaluated on a wide range of data hiding algorithms in both targeted and universal paradigms. The results show the effectiveness of the proposed method in detecting the finest traces of data hiding algorithms at very low embedding rates. The system detects Steghide at a capacity of 0.06 bit per symbol with a sensitivity of 98.6% (music) and 78.5% (speech). These figures are, respectively, 7.1% and 27.5% higher than the state-of-the-art results based on R-Mel-frequency cepstral coefficient features.
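
To make the two key ingredients concrete, the sketch below shows LSB replacement in 16-bit PCM samples and a calibrated feature obtained by re-embedding a full random payload and taking the feature difference. The feature itself (the LSB-plane transition rate) is a deliberately simple stand-in, not the model used in the paper.

import numpy as np

def lsb_embed(samples, bits):
    # Replace the least significant bit of 16-bit PCM samples with payload bits.
    out = samples.copy()
    out[:len(bits)] = (out[:len(bits)] & ~1) | bits
    return out

def feature(samples):
    # A deliberately simple stand-in feature: LSB-plane transition rate.
    lsb = samples & 1
    return np.mean(lsb[1:] != lsb[:-1])

rng = np.random.default_rng(0)
cover = rng.integers(-2000, 2000, 50000).astype(np.int16)
stego = lsb_embed(cover, rng.integers(0, 2, 25000).astype(np.int16))

for name, x in (("cover", cover), ("stego", stego)):
    # Calibration by re-embedding: embed a full random payload and take the difference.
    recal = lsb_embed(x, rng.integers(0, 2, len(x)).astype(np.int16))
    print(name, feature(x), feature(x) - feature(recal))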

Proceedings ArticleDOI
05 Jun 2017
TL;DR: The 4.7 μW audio processing IC performs audio acquisition with 4–32× compression, and the complete stand-alone system achieves 38 min of speech recording and energy-autonomous operation in room light.
Abstract: We present a complete, fully functional energy-autonomous audio sensor node with a 6×5×4 mm³ form factor. The system uses a new audio processing IC integrated with a MEMS microphone, a general-purpose 32-bit processor, 8 Mb of Flash, an RF transceiver with a custom 3D antenna, PV cells for energy harvesting, and a battery. The 4.7 μW audio processing IC performs audio acquisition with 4–32× compression. The complete stand-alone system achieves 38 min of speech recording and energy-autonomous operation in room light.

Journal ArticleDOI
TL;DR: Experimental results show that the transparency and imperceptibility of the proposed algorithm are satisfactory, and that robustness is strong against popular audio signal processing attacks.

Abstract: Digital watermarking technology is concerned with solving the problems of copyright protection, data authentication, content identification, distribution, and duplication of digital media that arise from the great developments in computers and Internet technology. Recently, protection of digital audio signals has attracted the attention of researchers. This paper proposes a new audio watermarking scheme based on the discrete wavelet transform (DWT), singular value decomposition (SVD), and quantization index modulation (QIM), with a synchronization code and two encrypted watermark images or logos embedded into a stereo audio signal. In this algorithm, the original audio signal is split into blocks, each block is decomposed with a two-level DWT, and the approximate low-frequency sub-band coefficients are then decomposed by the SVD transform to obtain a diagonal matrix. The prepared watermark and synchronization code bit stream is embedded into the diagonal matrix using QIM. After that, we perform the inverse singular value decomposition (ISVD) and inverse discrete wavelet transform (IDWT) to obtain the watermarked audio signal. The watermark can be blindly extracted without knowledge of the original audio signal. Experimental results show that the transparency and imperceptibility of the proposed algorithm are satisfactory, and that robustness is strong against popular audio signal processing attacks. A high watermarking payload is achieved through the proposed scheme.
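
Only the QIM step of the scheme is sketched here (the DWT and SVD stages, the encryption and the synchronization code are omitted), with an arbitrary coefficient and a made-up quantization step: a bit is embedded by snapping the coefficient onto an even or odd lattice and can be blindly extracted afterwards.

import numpy as np

def qim_embed(coeff, bit, step):
    # Quantization index modulation: map the coefficient to the even or odd lattice
    # representative inside its 2*step cell, depending on the bit.
    return step * (2 * np.floor(coeff / (2 * step)) + 0.5 + bit)

def qim_extract(coeff, step):
    # Blind extraction: recover the lattice index parity.
    return int(np.round(coeff / step - 0.5)) % 2

step = 0.05                                        # made-up quantization step
for bit in (0, 1):
    marked = qim_embed(1.2345, bit, step)
    assert qim_extract(marked, step) == bit
    # Robustness margin: extraction survives perturbations smaller than step/2.
    assert qim_extract(marked + 0.4 * step, step) == bit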

Patent
29 Aug 2017
TL;DR: In this article, a personalization processor receives user information and outputs a binaural parameter for controlling binaural rendering on the basis of the user information, which is then used by a renderer.
Abstract: Disclosed is an audio signal processing apparatus. A personalization processor receives user information and outputs a binaural parameter for controlling binaural rendering on the basis of the user information. A binaural renderer performs binaural rendering of a source audio on the basis of the binaural parameter.

Patent
20 Jun 2017
TL;DR: In this article, an audio signal processing method, an apparatus and a terminal are described. The method comprises the following steps: collecting audio signals played by a sound playback apparatus; analyzing the sound effect of the sound playback apparatus according to the audio signals; and adjusting a processing parameter of an audio signal processing module in the terminal so as to correct the sound effect.
Abstract: The invention provides an audio signal processing method, an apparatus and a terminal. The method comprises the following steps: collecting audio signals played by a sound playback apparatus; analyzing the sound effect of the sound playback apparatus according to the audio signals; and adjusting a processing parameter of an audio signal processing module in the terminal according to the sound effect, so as to correct it. In an embodiment of the invention, the processing parameter in the audio processing module is automatically adjusted according to the sound effect played by the loudspeaker or telephone receiver, so that each sound source has a corresponding optimal processing parameter; therefore, whichever sound source is played, the sound effect produced by the loudspeaker or the telephone receiver is in an optimal state.

Journal ArticleDOI
TL;DR: The IB (information bottleneck) principle is first concisely presented, and then the practical issues related to applying the IB principle to acoustic event detection are described in detail, including definitions of the various variables, the criterion for determining the number of acoustic events, the tradeoff between the amount of information preserved and the compression of the initial representation, and the detection steps.

Patent
25 Apr 2017
TL;DR: In this article, a method of wireless audio transmission and playback is described that includes the steps of dividing the audio data into audio segments, transmitting the audio segments to each of the audio playback devices, and determining, by the host based on the acknowledgment(s) thus received, that at least one audio playback device has received a first specific audio segment.
Abstract: A method of wireless audio transmission and playback includes the steps of: a) dividing, by a host, the audio data into audio segments; b) transmitting, by the host, the audio segments to each of the audio playback devices; c) transmitting to the host, by each of the audio playback devices, with respect to each of the audio segments received thereby, an acknowledgment indicating that the audio playback device has received the audio segment; and d) when determining, by the host based on the acknowledgment(s) thus received, that at least one of the audio playback devices has received a first specific audio segment, controlling all of the audio playback devices having received the first specific audio segment to play it synchronously with each other.
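
A compact host-side sketch of the claimed flow; the transport primitives (send, receive_ack, broadcast_play) are hypothetical placeholders rather than a real API.

# Host-side sketch of steps a) through d); transport calls are hypothetical placeholders.
def host_loop(audio_data, devices, segment_size, send, receive_ack, broadcast_play):
    segments = [audio_data[i:i + segment_size]
                for i in range(0, len(audio_data), segment_size)]   # step a): divide
    acked = {i: set() for i in range(len(segments))}

    for i, seg in enumerate(segments):
        for dev in devices:
            send(dev, i, seg)                                       # step b): transmit

    while any(not devs for devs in acked.values()):
        dev, seg_idx = receive_ack()                                # step c): collect acks
        acked[seg_idx].add(dev)
        if len(acked[seg_idx]) == 1:
            # step d): once at least one device holds the segment, instruct all devices
            # that have it to play it synchronously.
            broadcast_play(acked[seg_idx], seg_idx)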