
Showing papers on "Audio signal processing" published in 2017


Journal ArticleDOI
TL;DR: This paper proposes to analyze a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering.
Abstract: Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, or hearing aids. In addition, they are crucial preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multichannel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet, a comprehensive overview of the common foundations and the differences between these approaches is lacking at present. In this paper, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.

452 citations
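
As a concrete illustration of the second axis (the spatial filter design criterion), the sketch below computes textbook MVDR weights for a single frequency bin with NumPy. It is not taken from the paper; the steering vector and noise covariance are simply assumed to have been estimated beforehand.

import numpy as np

def mvdr_weights(noise_cov, steering):
    # Textbook MVDR spatial filter: w = R^-1 d / (d^H R^-1 d).
    # noise_cov: (M, M) noise spatial covariance for one frequency bin.
    # steering:  (M,)   acoustic transfer function (steering vector) of the target.
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Toy example: 4 microphones, one frequency bin, random data standing in for estimates.
rng = np.random.default_rng(0)
M = 4
d = np.exp(-2j * np.pi * rng.random(M))                 # assumed steering vector
noise = rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200))
R = noise @ noise.conj().T / 200 + 1e-3 * np.eye(M)     # regularised covariance estimate
w = mvdr_weights(R, d)
print(abs(w.conj() @ d))                                # distortionless constraint: ~1.0

In practice such weights would be computed per frequency bin and applied to the multichannel STFT, possibly followed by a postfilter (the fourth axis).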


Journal ArticleDOI
TL;DR: An innovative audio-based system for activity analysis (and tracking) of construction heavy equipment that consists of multiple steps, including filtering the audio signals, converting them into time-frequency representations, and window filtering the classifier output to differentiate between different patterns of activities.

89 citations


Journal ArticleDOI
TL;DR: A new architecture, design flow, and field-programmable gate array (FPGA) implementation analysis of a neuromorphic binaural auditory sensor, designed completely in the spike domain, is presented, allowing researchers to implement their own parameterized neuromorphic auditory systems in a low-cost FPGA in order to study the audio processing and learning activity that takes place in the brain.
Abstract: This paper presents a new architecture, design flow, and field-programmable gate array (FPGA) implementation analysis of a neuromorphic binaural auditory sensor, designed completely in the spike domain. Unlike digital cochleae that decompose audio signals using classical digital signal processing techniques, the model presented in this paper processes information directly encoded as spikes using pulse frequency modulation and provides a set of frequency-decomposed audio information using an address-event representation interface. In this case, a systematic approach to design led to a generic process for building, tuning, and implementing audio frequency decomposers with different features, facilitating synthesis with custom features. This allows researchers to implement their own parameterized neuromorphic auditory systems in a low-cost FPGA in order to study the audio processing and learning activity that takes place in the brain. In this paper, we present a 64-channel binaural neuromorphic auditory system implemented in a Virtex-5 FPGA using a commercial development board. The system was excited with a diverse set of audio signals in order to analyze its response and characterize its features. The neuromorphic auditory system response times and frequencies are reported. The experimental results of the proposed system implementation with 64-channel stereo are: a frequency range between 9.6 Hz and 14.6 kHz (adjustable), a maximum output event rate of 2.19 Mevents/s, a power consumption of 29.7 mW, a slice requirement of 11,141, and a system clock frequency of 27 MHz.

66 citations


Journal ArticleDOI
TL;DR: The FRQA representation is a normalized state that facilitates basic audio signal operations targeting the amplitude and time parameters; these operations can be employed as the major components to build advanced operations for specific applications, as well as to facilitate secure transmission of audio content in the quantum computing domain.

61 citations


Posted Content
TL;DR: Kapre implements time-frequency conversions, normalisation, and data augmentation as Keras layers for audio and music signal preprocessing; simple benchmark results are reported, showing that real-time on-GPU preprocessing adds a reasonable amount of computation.
Abstract: We introduce Kapre, Keras layers for audio and music signal preprocessing. Music research using deep neural networks requires a heavy and tedious preprocessing stage, for which audio processing parameters are often ignored in parameter optimisation. To solve this problem, Kapre implements time-frequency conversions, normalisation, and data augmentation as Keras layers. We report simple benchmark results, showing real-time on-GPU preprocessing adds a reasonable amount of computation.

57 citations
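
To make the layer-based preprocessing idea concrete, here is a minimal sketch of a log-spectrogram implemented as a Keras layer, so that the time-frequency conversion runs on the GPU together with the rest of the network. This is a generic illustration built on tf.signal, not Kapre's own API or parameter names.

import tensorflow as tf

class LogSpectrogram(tf.keras.layers.Layer):
    # Minimal illustration of on-GPU time-frequency conversion as a layer
    # (the idea behind Kapre); this is not Kapre's actual API.
    def __init__(self, frame_length=1024, frame_step=256, **kwargs):
        super().__init__(**kwargs)
        self.frame_length = frame_length
        self.frame_step = frame_step

    def call(self, waveforms):                            # waveforms: (batch, samples)
        stft = tf.signal.stft(waveforms,
                              frame_length=self.frame_length,
                              frame_step=self.frame_step)
        mag = tf.abs(stft)
        return tf.math.log(mag + 1e-6)[..., tf.newaxis]   # (batch, frames, bins, 1)

model = tf.keras.Sequential([
    LogSpectrogram(),
    tf.keras.layers.Conv2D(16, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
outputs = model(tf.random.normal([2, 16000]))             # e.g. two 1-second 16 kHz clips
print(outputs.shape)                                      # (2, 10)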


Proceedings Article
17 Feb 2017
TL;DR: In this article, the authors propose a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks; the model is trained on pairs of low- and high-quality audio examples and predicts missing samples within a low-resolution signal, in an interpolation process similar to image super-resolution.
Abstract: We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks. Our model is trained on pairs of low and high-quality audio examples; at test-time, it predicts missing samples within a low-resolution signal in an interpolation process similar to image super-resolution. Our method is simple and does not involve specialized audio processing techniques; in our experiments, it outperforms baselines on standard speech and music benchmarks at upscaling ratios of 2x, 4x, and 6x. The method has practical applications in telephony, compression, and text-to-speech generation; it demonstrates the effectiveness of feed-forward convolutional architectures on an audio generation task.

54 citations
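
A hedged sketch of the general recipe (not the authors' architecture): the low-resolution signal is first brought back to the target length by naive interpolation, and a small 1D convolutional network is trained on (interpolated, high-resolution) pairs to predict the missing detail.

import numpy as np
import tensorflow as tf

def make_pair(hr, ratio=4):
    # Simulate a low sample rate by decimation, then naively interpolate back up.
    lr = hr[::ratio]
    lr_up = np.interp(np.arange(len(hr)), np.arange(0, len(hr), ratio), lr)
    return lr_up.astype('float32'), hr.astype('float32')

hr = np.sin(2 * np.pi * 440 * np.arange(8192) / 16000).astype('float32')  # toy target
x, y = make_pair(hr)

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 9, padding='same', activation='relu'),
    tf.keras.layers.Conv1D(32, 9, padding='same', activation='relu'),
    tf.keras.layers.Conv1D(1, 9, padding='same'),         # predicts the high-res samples
])
model.compile(optimizer='adam', loss='mse')
model.fit(x[None, :, None], y[None, :, None], epochs=5, verbose=0)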


Proceedings ArticleDOI
01 Oct 2017
TL;DR: It is demonstrated that auditory and visual information play complementary roles in object perception, and further, that the representation learned on synthetic audio-visual data can transfer to real-world scenarios.
Abstract: Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine of such competency, however, is very challenging, due to the great difficulty in capturing large-scale, clean data of objects with both their appearance and the sound they make. In this paper, we present a novel, open-source pipeline that generates audiovisual data, purely from 3D object shapes and their physical properties. Through comparison with audio recordings and human behavioral studies, we validate the accuracy of the sounds it generates. Using this generative model, we are able to construct a synthetic audio-visual dataset, namely Sound-20K, for object perception tasks. We demonstrate that auditory and visual information play complementary roles in object perception, and further, that the representation learned on synthetic audio-visual data can transfer to real-world scenarios.

34 citations


Journal ArticleDOI
TL;DR: A cascaded delay line reservoir computer capable of real-time audio processing on standard computing equipment, aimed at black-box system identification of nonlinear audio systems.
Abstract: Background: Real-time processing of audio or audio-like signals is a promising research topic for the field of machine learning, with many potential applications in music and communications. We present a cascaded delay line reservoir computer capable of real-time audio processing on standard computing equipment, aimed at black-box system identification of nonlinear audio systems. The cascaded reservoir blocks use two-pole filtered virtual neurons to match their timescales to that of the target signals. The reservoir blocks receive both the global input signal and the target estimate from the previous block (local input). The units in the cascade are trained in a successive manner on a single input-output training pair, such that a successively better approximation of the target is reached. A cascade of 5 dual-input reservoir blocks of 100 neurons each is trained to mimic the distortion of a measured guitar amplifier. This cascade outperforms both a single delay reservoir with the same total number of neurons and a cascade with only single-input blocks. We show that the presented structure is a viable platform for real-time audio applications on present-day computing hardware. A benefit of this structure is that it works directly from the audio samples as input, avoiding computationally intensive preprocessing.

33 citations
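
For readers unfamiliar with reservoir computing, the sketch below trains a plain echo-state reservoir with leaky neurons and a ridge-regression readout on a toy distortion target. It is a simplified stand-in for the paper's cascaded, dual-input, two-pole-filtered design, not a reimplementation of it.

import numpy as np

def run_reservoir(u, n_res=100, leak=0.3, rho=0.9, seed=0):
    # Drive a random recurrent reservoir with input u and return the state matrix.
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, n_res)
    w = rng.standard_normal((n_res, n_res))
    w *= rho / max(abs(np.linalg.eigvals(w)))        # scale spectral radius
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        pre = np.tanh(w_in * u_t + w @ x)
        x = (1 - leak) * x + leak * pre              # leaky neuron, a crude low-pass stand-in
        states[t] = x
    return states

# Toy target: a static nonlinearity standing in for an amplifier's distortion.
u = np.random.default_rng(1).uniform(-1, 1, 4000)
y = np.tanh(3 * u)

X = run_reservoir(u)
w_out = np.linalg.solve(X.T @ X + 1e-4 * np.eye(X.shape[1]), X.T @ y)   # ridge readout
print(np.mean((X @ w_out - y) ** 2))                 # small training error on the toy target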


Journal ArticleDOI
TL;DR: An overview of perceptually motivated techniques is presented, with a focus on multichannel audio recording and reproduction, audio source and reflection culling, and artificial reverberators.
Abstract: Developments in immersive audio technologies have been evolving in two directions: physically motivated systems and perceptually motivated systems. Physically motivated techniques aim to reproduce a physically accurate approximation of desired sound fields by employing a very high equipment load and sophisticated, computationally intensive algorithms. Perceptually motivated techniques, however, aim to render only the perceptually relevant aspects of the sound scene by means of modest computational and equipment load. This article presents an overview of perceptually motivated techniques, with a focus on multichannel audio recording and reproduction, audio source and reflection culling, and artificial reverberators.

32 citations
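
As an example of the artificial reverberators mentioned above, the classic Schroeder structure (parallel feedback combs followed by series allpass filters) is a perceptually motivated design with very modest computational load. The sketch below is a textbook version with commonly quoted delay values; it is not taken from the article.

import numpy as np

def comb(x, delay, g):
    # Feedback comb filter: y[n] = x[n] + g * y[n - delay].
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    # Schroeder allpass: y[n] = -g * x[n] + x[n - delay] + g * y[n - delay].
    y = np.zeros_like(x)
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + x_d + g * y_d
    return y

def schroeder_reverb(x, fs=16000):
    # Four parallel feedback combs, two series allpasses, plus dry/wet mix.
    comb_delays = [int(fs * t) for t in (0.0297, 0.0371, 0.0411, 0.0437)]
    wet = sum(comb(x, d, 0.77) for d in comb_delays) / 4.0
    wet = allpass(wet, int(fs * 0.005), 0.7)
    wet = allpass(wet, int(fs * 0.0017), 0.7)
    return 0.7 * x + 0.3 * wet

impulse = np.zeros(16000); impulse[0] = 1.0
ir = schroeder_reverb(impulse)                       # decaying, diffuse impulse response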


Book ChapterDOI
TL;DR: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics by focusing on frame theory in a filter bank approach, which is probably the most relevant view point for audio signal processing.
Abstract: This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant view point for audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
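
A small numerical illustration of the frame/filter-bank connection (not from the chapter): for an undecimated analysis filter bank, frame bounds A and B can be read off the summed squared magnitude responses, and the ratio B/A tells how far the bank is from a tight frame. The Gaussian filter bank below is a toy construction chosen only for illustration.

import numpy as np

# For an undecimated analysis filter bank, A <= sum_k |H_k(w)|^2 <= B for all w.
n_fft = 1024
freqs = np.fft.rfftfreq(n_fft, d=1 / 16000.0)

# A toy bank of overlapping Gaussian band-pass filters (illustration only).
centers = np.geomspace(100, 7000, 24)
bank = np.array([np.exp(-0.5 * ((freqs - fc) / (0.2 * fc)) ** 2) for fc in centers])

response = np.sum(np.abs(bank) ** 2, axis=0)
A, B = response.min(), response.max()
print(f"frame bounds: A={A:.3g}, B={B:.3g}, B/A={B/A:.3g}")
# B/A close to 1 means a snug (nearly tight) frame; A close to 0 means signal content can be lost.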

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new notion of equivalence of feature-network pairs is introduced, and the relationship between features and networks is shown for the example of mel-spectrogram input on the one hand and varying analysis windows on the other.
Abstract: Convolutional Neural Networks have established a new standard in many machine learning applications, not only in image but also in audio processing. In this contribution we investigate the interplay between the primary representation mapping a raw audio signal to some kind of image (feature) and the convolutional layers of an ensuing neural network. We introduce a new notion of equivalence of feature-network pairs and show the relationship between features and networks for the example of mel-spectrogram input on the one hand and varying analysis windows on the other.

Journal ArticleDOI
TL;DR: The design and implementation of a wireless acoustic sensor, located at the edge of a WASN for recording and processing environmental sounds, that can be applied to AAL systems for personal healthcare thanks to its significant advantages: low cost, small size, and audio sampling and computation capabilities for audio processing.
Abstract: Ambient Assisted Living (AAL) has become an attractive research topic due to growing interest in remote monitoring of older people. Developments in sensor technologies and advances in wireless communications allow smart assistance to be offered remotely and those people to be monitored at their own home, increasing their quality of life. In this context, Wireless Acoustic Sensor Networks (WASN) provide a suitable way to implement AAL systems that can infer hazardous situations via environmental sound identification. Nevertheless, satisfactory sensor solutions combining both low cost and high performance have not yet been found. In this paper, we report the design and implementation of a wireless acoustic sensor located at the edge of a WASN for recording and processing environmental sounds, which can be applied to AAL systems for personal healthcare because it has the following significant advantages: low cost, small size, and audio sampling and computation capabilities for audio processing. The proposed wireless acoustic sensor is able to record audio samples at a sampling frequency of at least 10 kHz with 12-bit resolution. It is also capable of performing audio signal processing without compromising the sample rate or the energy consumption, by using a new microcontroller released in the last quarter of 2016. The proposed low-cost wireless acoustic sensor has been verified using four randomness tests for statistical analysis and a classification system for the recorded sounds based on audio fingerprints.

Patent
02 Mar 2017
TL;DR: In this article, a method and an apparatus for adjusting delay and gain parameters for calibrating a multichannel audio system to which a plurality of loudspeakers is connected is presented.
Abstract: A method and an apparatus for adjusting delay and gain parameters for calibrating a multichannel audio system to which a plurality of loudspeakers is connected. A calibration process includes emitting a plurality of test tones by an audio processing device on a plurality of loudspeakers with predetermined timings and amplitude levels, according to a calibration signal. A calibration device having a microphone captures the audio signal corresponding to the test tones from the listener's position. The captured audio signal is analyzed, either by the calibration device or the audio processing device, to determine the delays between loudspeakers and the difference of amplitude levels between loudspeakers. Corresponding delay and gain parameters are determined and used by the audio processing device to correct the sound to be played back. A calibration device and an audio processing device implementing the method are disclosed, as well as a calibration signal utilized in the calibration process.
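
The patent does not spell out the analysis algorithm; a common way to obtain such delay and gain parameters from a captured test signal is cross-correlation against the reference plus an RMS ratio, as in the following sketch (signal and values are made up for illustration).

import numpy as np

def estimate_delay_and_gain(reference, captured, fs):
    # Cross-correlate the captured signal against the reference to find the lag,
    # and use an RMS ratio for the gain (a common approach; the patent does not
    # specify the analysis algorithm).
    corr = np.correlate(captured, reference, mode='full')
    lag = int(np.argmax(np.abs(corr))) - (len(reference) - 1)
    gain = np.sqrt(np.mean(captured ** 2) / np.mean(reference ** 2))
    return 1000.0 * lag / fs, gain

fs = 48000
rng = np.random.default_rng(0)
burst = rng.standard_normal(4800)                        # 100 ms test burst (stand-in signal)
captured = np.concatenate([np.zeros(240), 0.5 * burst])  # heard 5 ms later, 6 dB quieter

delay_ms, gain = estimate_delay_and_gain(burst, captured, fs)
print(delay_ms, gain)                                    # roughly 5.0 ms and 0.5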

Patent
10 May 2017
TL;DR: In this paper, audio signal processing equipment consisting of a microphone array, an audio localization device, a camera, an image localization device and a sound source classifier is described.
Abstract: The invention discloses audio signal processing equipment and an audio signal processing method, as well as electronic equipment. The audio signal processing equipment comprises a microphone array, an audio localization device, a camera, an image localization device and a sound source classifier, wherein the microphone array comprises a plurality of directional microphones having different sound pickup areas; the audio localization device is used for identifying a first group of sound sources and for determining the position of each sound source in an audio coordinate system; the camera is used for capturing scene images of a current scene, wherein the current scene at least covers the sound pickup areas of the plurality of directional microphones; the image localization device is used for identifying a second group of sound sources and for determining the position of each sound source in an image coordinate system; and the sound source classifier is used for classifying each sound source in the first and second groups of sound sources in accordance with a registration relation between the audio and image coordinate systems, the position of each sound source in the audio coordinate system, and the position of each sound source in the image coordinate system. Therefore, precise classification of the sound sources can be achieved on the basis of the double localization provided by the directional microphones and the camera.

Patent
Wang Zhijian1
13 Jun 2017
TL;DR: In this article, an audio processing method and device based on artificial intelligence are described. The method includes the following steps: an audio file to be processed is converted into an image; the content feature of the image is extracted; a target image is determined according to the style feature and the content feature, where the style feature is obtained from a template image converted from a template audio file; and the target image is converted into the processed audio file.
Abstract: The invention discloses an audio processing method and device based on artificial intelligence. One concrete implementation of the method includes the following steps: an audio file to be processed is converted into an image; the content feature of the image is extracted; a target image is determined according to the style feature and the content feature, the style feature being obtained from a template image converted from a template audio file; and the target image is converted into the processed audio file. By means of this implementation, the processed audio file takes on the style of the template audio without changing the content of the audio file to be processed, and audio processing efficiency and flexibility are improved.

Journal ArticleDOI
TL;DR: 3D audio enhances BAQ as well as OLE over both stereo and surround sound, and the BAQ- and OLE-based assessments turned out to deliver consistent and reliable results.
Abstract: During the past decades, spatial reproduction of audio signals has evolved from simple two-channel stereo to surround sound (e.g., 5.1 or 7.1) and, more recently, to three-dimensional (3D) sound including height speakers, such as 9.1 or 22.2. With an increasing number of speakers, greater spatial fidelity and listener envelopment are expected. This paper reviews popular methods for subjective assessment of audio. Moreover, it provides an experimental evaluation of the subjective quality provided by these formats, contrasting the well-known basic audio quality (BAQ) type of evaluation with the more recent evaluation of overall listening experience (OLE). Commonalities and differences in findings between both assessment approaches are discussed. The results of the evaluation indicate that 3D audio enhances BAQ as well as OLE over both stereo and surround sound. Furthermore, the BAQ- and OLE-based assessments turned out to deliver consistent and reliable results.

Proceedings Article
11 Apr 2017
TL;DR: This paper proposes using a deep neural network (DNN) to learn the relationship between noisy speech features and the correct VAD decision; the resulting algorithm outperforms the classic, statistical-model-based VAD for both seen and unseen noises.
Abstract: Voice Activity Detectors (VAD) are important components in audio processing algorithms. In general, VADs are two-way classifiers, flagging the audio frames where we have voice activity. Most of them are based on the signal energy and build statistical models of the noise background and the speech signal. In the process of derivation, we are limited to simplified statistical models, and this limits the accuracy of the classification. Using more precise, but also more complex, statistical models makes the analytical derivation of the solution practically impossible. In this paper, we propose using a deep neural network (DNN) to learn the relationship between the noisy speech features and the correct VAD decision. In most cases we need a causal algorithm, i.e. one working in real time and using only current and past audio samples. This is why we use audio segments that consist only of current and previous audio frames, thus making real-time implementations possible. The proposed algorithm and DNN structure outperform the classic, statistical-model-based VAD for both seen and unseen noises.
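
A minimal sketch of the described setup: per-frame features are stacked with a few previous frames only (keeping the classifier causal) and fed to a small feed-forward DNN that outputs a per-frame voice-activity probability. The feature dimension, context length and topology below are placeholders, not the authors' configuration.

import numpy as np
import tensorflow as tf

def causal_context(features, n_prev=4):
    # Stack each frame with its n_prev predecessors (current + past frames only,
    # so the classifier stays causal); the feature choice here is ours, not the paper's.
    padded = np.vstack([np.zeros((n_prev, features.shape[1])), features])
    return np.hstack([padded[i:i + len(features)] for i in range(n_prev + 1)])

# Toy data: 40-dim log-mel-like features with per-frame voice/no-voice labels.
rng = np.random.default_rng(0)
frames = rng.standard_normal((1000, 40)).astype('float32')
labels = (rng.random(1000) > 0.5).astype('float32')

x = causal_context(frames)                                # (1000, 200)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),       # P(voice active) per frame
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, labels, epochs=2, verbose=0)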

Journal ArticleDOI
TL;DR: Feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux, which highlighted the importance of source separation in the feature extraction.
Abstract: By varying the dynamics in a musical performance, the musician can convey structure and different expressions. Spectral properties of most musical instruments change in a complex way with the performed dynamics, but dedicated audio features for modeling this parameter are lacking. In this study, feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux. Previously, ground-truth ratings of performed dynamics had been collected by asking listeners to rate how soft/loud the musicians played in a set of audio files. The ratings, averaged over subjects, were used to train three different machine learning models, using the audio features developed for the study as input. The highest result was produced by an ensemble of multilayer perceptrons with an R² of 0.84. This result seems to be close to the upper bound, given the estimated uncertainty of the ground-truth data. The result is well above that of individual human listeners in the previous listening experiment, and on par with the performance achieved from the average rating of six listeners. Features were analyzed with a factorial design, which highlighted the importance of source separation in the feature extraction.
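
The paper's sectional spectral flux is its own construction; as a simple point of reference, the sketch below computes a plain spectral flux (summed positive magnitude differences between consecutive STFT frames) within a few frequency bands.

import numpy as np
from scipy.signal import stft

def band_spectral_flux(x, fs, bands=((0, 500), (500, 2000), (2000, 8000))):
    # Plain per-band spectral flux; the band edges are arbitrary illustration values.
    f, _, Z = stft(x, fs=fs, nperseg=1024, noverlap=512)
    mag = np.abs(Z)
    flux = []
    for lo, hi in bands:
        sel = (f >= lo) & (f < hi)
        diff = np.diff(mag[sel], axis=1)
        flux.append(np.sum(np.maximum(diff, 0.0), axis=0))
    return np.array(flux)                    # shape: (n_bands, n_frames - 1)

fs = 16000
x = np.random.default_rng(0).standard_normal(fs)   # 1 s of noise as a stand-in signal
print(band_spectral_flux(x, fs).shape)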

Posted Content
TL;DR: It is demonstrated that deep neural networks that use lower bit precision significantly reduce the processing time (up to 30x); however, their performance impact remains low only in the case of classification tasks such as those present in voice activity detection.
Abstract: While deep neural networks have shown powerful performance in many audio applications, their large computation and memory demand has been a challenge for real-time processing. In this paper, we study the impact of scaling the precision of neural networks on the performance of two common audio processing tasks, namely, voice-activity detection and single-channel speech enhancement. We determine the optimal pair of weight/neuron bit precision by exploring its impact on both the performance and processing time. Through experiments conducted with real user data, we demonstrate that deep neural networks that use lower bit precision significantly reduce the processing time (up to 30x). However, their performance impact is low (< 3.14%) only in the case of classification tasks such as those present in voice activity detection.
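
The paper's precision-scaling setup is not reproduced here; as a minimal stand-in, the sketch below applies symmetric uniform post-training quantization to a weight matrix at a few bit widths and reports the resulting quantization error.

import numpy as np

def quantize_uniform(weights, n_bits):
    # Symmetric uniform post-training quantization of a weight tensor to n_bits.
    levels = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(weights)) / levels
    return np.round(weights / scale) * scale

w = np.random.default_rng(0).standard_normal((256, 256)).astype('float32')
for bits in (8, 4, 2):
    err = np.mean((quantize_uniform(w, bits) - w) ** 2)
    print(f"{bits}-bit weights: quantization MSE = {err:.5f}")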

Patent
01 Feb 2017
TL;DR: In this paper, a method performed in an audio decoder for decoding M encoded audio channels representing N audio channels is disclosed, which includes analyzing the M audio channels to detect a location of a transient.
Abstract: A method performed in an audio decoder for decoding M encoded audio channels representing N audio channels is disclosed. The method includes receiving a bitstream containing the M encoded audio channels and a set of spatial parameters, decoding the M encoded audio channels, and extracting the set of spatial parameters from the bitstream. The method also includes analyzing the M audio channels to detect a location of a transient, decorrelating the M audio channels, and deriving N audio channels from the M audio channels and the set of spatial parameters. A first decorrelation technique is applied to a first subset of each audio channel and a second decorrelation technique is applied to a second subset of each audio channel. The first decorrelation technique represents a first mode of operation of a decorrelator, and the second decorrelation technique represents a second mode of operation of the decorrelator.

Patent
07 Dec 2017
TL;DR: In this paper, a system and method executed by audio processing software on one or more electronic devices in a computer system to process digital audio signals is described, where the system comprises a digitizer for digitizing a received audio signal; and processor for performing a plurality of audio processing functions on the digitized audio signals.
Abstract: A system and method executed by audio processing software on one or more electronic devices in a computer system to process digital audio signals. The system comprises a digitizer for digitizing a received audio signal; and a processor for performing a plurality of audio processing functions on the digitized audio signals, each of the audio processing functions having at least one programmable parameter, and wherein each of the audio processing functions is categorized and grouped as audio objects, and organized into a channel strip, the channel strip processing digitized audio signals for a particular received audio signal, and wherein, the audio objects are fixed in order, so that the digitized received audio signals are processed by a predefined number of N audio objects, and wherein the N audio objects occur in a fixed sequence, and further wherein, the N audio objects comprise a first subset of non-exchangeable audio objects and a second subset of exchangeable audio objects, such that any one or more of the second subset of audio objects can be exchanged by a replacement audio object, and further wherein when the audio processing functions are programmed, they can be saved without compiling the audio processing software.

Patent
25 May 2017
TL;DR: In this article, the authors provide input and output mode control for audio processing on a user device by monitoring audio activity on a device having at least one microphone and a digital audio processing unit, collecting information from the monitoring of the activity, and determining a context for the audio processing.
Abstract: Systems and methods provide input and output mode control for audio processing on a user device. Audio processing may be configured by monitoring audio activity on a device having at least one microphone and a digital audio processing unit, collecting information from the monitoring of the activity, including an identification of at least one application utilizing audio processing, and determining a context for the audio processing, the context including at least one context resource having associated metadata. An audio configuration is determined based on the application and determined context, and an action is performed to control the audio processing mode. User controls providing additional mode control may be displayed automatically based on the current application and determined context.

Journal ArticleDOI
TL;DR: In this paper, calibrated features based on a re-embedding technique are shown to improve the performance of audio steganalysis; it is also shown that the least significant bit is the most sensitive bit plane to data hiding algorithms, and therefore it can be employed as a universal embedding method.
Abstract: Calibration and higher-order statistics are standard components of image steganalysis. However, these techniques have not yet found adequate attention in audio steganalysis. Specifically, most current studies are either non-calibrated or based only on noise removal. The goal of this study is to fill these gaps and to show that calibrated features based on a re-embedding technique improve the performance of audio steganalysis. Furthermore, the authors show that the least significant bit is the most sensitive bit plane to data hiding algorithms, and therefore it can be employed as a universal embedding method. The proposed features also benefit from an efficient model which is tailored to the needs of audio steganalysis and represents the maximum deviation from the human auditory system. Performance of the proposed method is evaluated on a wide range of data hiding algorithms in both targeted and universal paradigms. The results show the effectiveness of the proposed method in detecting the finest traces of data hiding algorithms at very low embedding rates. The system detects Steghide at a capacity of 0.06 bit per symbol with a sensitivity of 98.6% (music) and 78.5% (speech). These figures are, respectively, 7.1% and 27.5% higher than the state-of-the-art results based on R-Mel-frequency cepstral coefficient features.
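
To make the two key ingredients concrete, the sketch below shows LSB replacement in 16-bit PCM samples and a calibrated feature obtained by re-embedding a full random payload and taking the feature difference. The feature itself (the LSB-plane transition rate) is a deliberately simple stand-in, not the model used in the paper.

import numpy as np

def lsb_embed(samples, bits):
    # Replace the least significant bit of 16-bit PCM samples with payload bits.
    out = samples.copy()
    out[:len(bits)] = (out[:len(bits)] & ~1) | bits
    return out

def feature(samples):
    # A deliberately simple stand-in feature: LSB-plane transition rate.
    lsb = samples & 1
    return np.mean(lsb[1:] != lsb[:-1])

rng = np.random.default_rng(0)
cover = rng.integers(-2000, 2000, 50000).astype(np.int16)
stego = lsb_embed(cover, rng.integers(0, 2, 25000).astype(np.int16))

for name, x in (("cover", cover), ("stego", stego)):
    # Calibration by re-embedding: embed a full random payload and take the difference.
    recal = lsb_embed(x, rng.integers(0, 2, len(x)).astype(np.int16))
    print(name, feature(x), feature(x) - feature(recal))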

Proceedings ArticleDOI
05 Jun 2017
TL;DR: The 4.7 μW audio processing IC performs audio acquisition with 4–32× compression, and the complete stand-alone system achieves 38 min of speech recording and energy-autonomous operation in room light.
Abstract: We present a complete, fully functional energy-autonomous audio sensor node with a 6×5×4 mm³ form factor. The system uses a new audio processing IC integrated with a MEMS microphone, a general-purpose 32-bit processor, 8 Mb of Flash, an RF transceiver with a custom 3D antenna, PV cells for energy harvesting, and a battery. The 4.7 μW audio processing IC performs audio acquisition with 4–32× compression. The complete stand-alone system achieves 38 min of speech recording and energy-autonomous operation in room light.

Journal ArticleDOI
TL;DR: Experimental results show that the transparency and imperceptibility of the proposed algorithm are satisfactory, and that robustness is strong against popular audio signal processing attacks.

Abstract: Digital watermarking technology is concerned with solving the problems of copyright protection, data authentication, content identification, distribution, and duplication of digital media that arise from the great developments in computers and Internet technology. Recently, protection of digital audio signals has attracted the attention of researchers. This paper proposes a new audio watermarking scheme based on the discrete wavelet transform (DWT), singular value decomposition (SVD), and quantization index modulation (QIM), with a synchronization code and two encrypted watermark images or logos embedded into a stereo audio signal. In this algorithm, the original audio signal is split into blocks, each block is decomposed with a two-level DWT, and the approximate low-frequency sub-band coefficients are then decomposed by the SVD transform to obtain a diagonal matrix. The prepared watermark and synchronization code bit stream is embedded into the diagonal matrix using QIM. After that, we perform the inverse singular value decomposition (ISVD) and inverse discrete wavelet transform (IDWT) to obtain the watermarked audio signal. The watermark can be blindly extracted without knowledge of the original audio signal. Experimental results show that the transparency and imperceptibility of the proposed algorithm are satisfactory, and that robustness is strong against popular audio signal processing attacks. A high watermarking payload is achieved through the proposed scheme.
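
Only the QIM step of the scheme is sketched here (the DWT and SVD stages, the encryption and the synchronization code are omitted), with an arbitrary coefficient and a made-up quantization step: a bit is embedded by snapping the coefficient onto an even or odd lattice and can be blindly extracted afterwards.

import numpy as np

def qim_embed(coeff, bit, step):
    # Quantization index modulation: map the coefficient to the even or odd lattice
    # representative inside its 2*step cell, depending on the bit.
    return step * (2 * np.floor(coeff / (2 * step)) + 0.5 + bit)

def qim_extract(coeff, step):
    # Blind extraction: recover the lattice index parity.
    return int(np.round(coeff / step - 0.5)) % 2

step = 0.05                                        # made-up quantization step
for bit in (0, 1):
    marked = qim_embed(1.2345, bit, step)
    assert qim_extract(marked, step) == bit
    # Robustness margin: extraction survives perturbations smaller than step/2.
    assert qim_extract(marked + 0.4 * step, step) == bit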

Patent
29 Aug 2017
TL;DR: In this article, a personalization processor receives user information and outputs a binaural parameter for controlling binaural rendering on the basis of the user information, which is then used by a renderer.
Abstract: Disclosed is an audio signal processing apparatus. A personalization processor receives user information and outputs a binaural parameter for controlling binaural rendering on the basis of the user information. A binaural renderer performs binaural rendering of a source audio on the basis of the binaural parameter.

Patent
20 Jun 2017
TL;DR: In this article, an audio signal processing method, an apparatus and a terminal are described. The method comprises the following steps: collecting audio signals played by a sound playback apparatus; analyzing the sound effect of the sound playback apparatus according to the audio signals; and adjusting a processing parameter of an audio signal processing module in the terminal so as to correct the sound effect.
Abstract: The invention provides an audio signal processing method, an apparatus and a terminal. The method comprises the following steps: collecting audio signals played by a sound playback apparatus; analyzing the sound effect of the sound playback apparatus according to the audio signals; and adjusting a processing parameter of an audio signal processing module in the terminal according to the sound effect, so as to correct it. In an embodiment of the invention, the processing parameter in the audio processing module is automatically adjusted according to the sound effect played by the loudspeaker or telephone receiver, so that each sound source has a corresponding optimal processing parameter; therefore, whichever sound source is played, the sound effect produced by the loudspeaker or the telephone receiver is in an optimal state.

Journal ArticleDOI
TL;DR: The IB (information bottleneck) principle is first concisely presented, and then the practical issues related to applying the IB principle to acoustic event detection are described in detail, including definitions of the various variables, the criterion for determining the number of acoustic events, the tradeoff between the amount of information preserved and the compression of the initial representation, and the detection steps.

Patent
25 Apr 2017
TL;DR: In this article, a method of wireless audio transmission and playback is described that includes the steps of dividing the audio data into audio segments, transmitting the audio segments to each of the audio playback devices, and determining, by the host based on the acknowledgment(s) thus received, that at least one audio playback device has received a first specific audio segment.
Abstract: A method of wireless audio transmission and playback includes the steps of: a) dividing, by a host, the audio data into audio segments; b) transmitting, by the host, the audio segments to each of the audio playback devices; c) transmitting to the host, by each of the audio playback devices, with respect to each of the audio segments received thereby, an acknowledgment indicating that the audio playback device has received the audio segment; and d) when determining, by the host based on the acknowledgment(s) thus received, that at least one of the audio playback devices has received a first specific audio segment, controlling all of the audio playback devices having received the first specific audio segment to play it synchronously with each other.
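
A compact host-side sketch of the claimed flow; the transport primitives (send, receive_ack, broadcast_play) are hypothetical placeholders rather than a real API.

# Host-side sketch of steps a) through d); transport calls are hypothetical placeholders.
def host_loop(audio_data, devices, segment_size, send, receive_ack, broadcast_play):
    segments = [audio_data[i:i + segment_size]
                for i in range(0, len(audio_data), segment_size)]   # step a): divide
    acked = {i: set() for i in range(len(segments))}

    for i, seg in enumerate(segments):
        for dev in devices:
            send(dev, i, seg)                                       # step b): transmit

    while any(not devs for devs in acked.values()):
        dev, seg_idx = receive_ack()                                # step c): collect acks
        acked[seg_idx].add(dev)
        if len(acked[seg_idx]) == 1:
            # step d): once at least one device holds the segment, instruct all devices
            # that have it to play it synchronously.
            broadcast_play(acked[seg_idx], seg_idx)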