
Showing papers on "Audio signal processing published in 2016"


01 Jan 2016
TL;DR: Digital Signal Processing: A Computer-Based Approach is universally compatible with any device and is available in the digital library, with online access set as public so it can be downloaded instantly.
Abstract: Digital Signal Processing: A Computer-Based Approach is available in our digital library, and online access to it is set as public so you can download it instantly. Our book collection is saved in multiple countries, allowing you to download any of our books with minimal latency. Merely said, Digital Signal Processing: A Computer-Based Approach is universally compatible with any device.

343 citations


01 Jan 2016
TL;DR: Advanced Digital Signal Processing and Noise Reduction is universally compatible with any device and can be downloaded instantly from the digital library.
Abstract: Advanced Digital Signal Processing and Noise Reduction is available in our digital library, and online access to it is set as public so you can download it instantly. Our book collection spans multiple locations, allowing you to download any of our books with minimal latency. Kindly said, Advanced Digital Signal Processing and Noise Reduction is universally compatible with any device.

197 citations


01 Jan 2016
TL;DR: Thank you very much for reading Advanced Digital Signal Processing and Noise Reduction; people may have searched hundreds of times for their chosen books, but they end up in harmful downloads and face malware on their laptops instead.
Abstract: Thank you very much for reading Advanced Digital Signal Processing and Noise Reduction. As you may know, people have searched hundreds of times for their chosen books like this one, but end up in harmful downloads. Rather than reading a good book with a cup of coffee in the afternoon, they instead face malware on their laptops.

195 citations


Journal ArticleDOI
TL;DR: A general taxonomy, inspired by the more widespread video surveillance field, is proposed to systematically describe the methods covering background subtraction, event classification, object tracking, and situation analysis, highlighting the target applications of each described method and providing the reader with a systematic and schematic view.
Abstract: Despite surveillance systems becoming increasingly ubiquitous in our living environment, automated surveillance, currently based on video sensory modality and machine intelligence, most of the time lacks the robustness and reliability required in many real applications. To tackle this issue, audio sensory devices have been incorporated, both alone and in combination with video, giving birth, over the past decade, to a considerable amount of research. In this article, audio-based automated surveillance methods are organized into a comprehensive survey: a general taxonomy, inspired by the more widespread video surveillance field, is proposed to systematically describe the methods, covering background subtraction, event classification, object tracking, and situation analysis. For each of these tasks, all the significant works are reviewed, detailing their pros and cons and the context for which they have been proposed. Moreover, a specific section is devoted to audio features, discussing their expressiveness and their employment in the above-described tasks. Differing from other surveys on audio processing and analysis, the present one is specifically targeted at automated surveillance, highlighting the target applications of each described method and providing the reader with a systematic and schematic view useful for retrieving the most suited algorithms for each specific requirement.

192 citations


Proceedings ArticleDOI
01 Oct 2016
TL;DR: madmom is an open-source audio processing and music information retrieval (MIR) library written in Python that facilitates fast prototyping of MIR applications; prototypes can be seamlessly converted into callable processing pipelines through madmom's concept of Processors, which run transparently on multiple cores.
Abstract: In this paper, we present madmom, an open-source audio processing and music information retrieval (MIR) library written in Python. madmom features a concise, NumPy-compatible, object oriented design with simple calling conventions and sensible default values for all parameters, which facilitates fast prototyping of MIR applications. Prototypes can be seamlessly converted into callable processing pipelines through madmom's concept of Processors, callable objects that run transparently on multiple cores. Processors can also be serialised, saved, and re-run to allow results to be easily reproduced anywhere. Apart from low-level audio processing, madmom puts emphasis on musically meaningful high-level features. Many of these incorporate machine learning techniques and madmom provides a module that implements some methods commonly used in MIR such as hidden Markov models and neural networks. Additionally, madmom comes with several state-of-the-art MIR algorithms for onset detection, beat, downbeat and meter tracking, tempo estimation, and chord recognition. These can easily be incorporated into bigger MIR systems or run as stand-alone programs.

146 citations
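The Processor concept described in the madmom abstract above, callable stages chained into one callable pipeline, can be sketched in plain Python. This is an illustrative toy only, not madmom's actual API; all names and stages below are invented for the example.

```python
# Toy sketch of a "Processor"-style pipeline: callable stages chained
# into one callable object. Names and stages are invented for this
# example and are not madmom's actual API.
class Pipeline:
    def __init__(self, *stages):
        self.stages = stages

    def __call__(self, data):
        # Feed the output of each stage into the next.
        for stage in self.stages:
            data = stage(data)
        return data

# Stand-ins for framing, feature extraction, and peak picking.
def frame(signal):
    return [signal[i:i + 4] for i in range(0, len(signal), 4)]

def energy(frames):
    return [sum(x * x for x in f) for f in frames]

def pick(envelope, threshold=10.0):
    return [i for i, e in enumerate(envelope) if e > threshold]

pipeline = Pipeline(frame, energy, pick)
print(pipeline([0.1] * 4 + [2.0] * 4 + [0.1] * 4))  # → [1]
```

Because each stage is just a callable, a pipeline built this way can itself be used as a stage in a larger pipeline, which is the design property the abstract highlights.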


Posted Content
TL;DR: Madmom is an open-source audio processing and music information retrieval (MIR) library written in Python that features a concise, NumPy-compatible, object oriented design with simple calling conventions and sensible default values for all parameters that facilitates fast prototyping of MIR applications.
Abstract: In this paper, we present madmom, an open-source audio processing and music information retrieval (MIR) library written in Python. madmom features a concise, NumPy-compatible, object oriented design with simple calling conventions and sensible default values for all parameters, which facilitates fast prototyping of MIR applications. Prototypes can be seamlessly converted into callable processing pipelines through madmom's concept of Processors, callable objects that run transparently on multiple cores. Processors can also be serialised, saved, and re-run to allow results to be easily reproduced anywhere. Apart from low-level audio processing, madmom puts emphasis on musically meaningful high-level features. Many of these incorporate machine learning techniques and madmom provides a module that implements some methods commonly used in MIR, such as hidden Markov models and neural networks. Additionally, madmom comes with several state-of-the-art MIR algorithms for onset detection, beat, downbeat and meter tracking, tempo estimation, and piano transcription. These can easily be incorporated into bigger MIR systems or run as stand-alone programs.

140 citations


Proceedings ArticleDOI
11 Apr 2016
TL;DR: This work exposes a serious vulnerability in FDM based additive manufacturing systems exploitable by physical-to-cyber attacks that may lead to theft of Intellectual Property (IP) and trade secrets.
Abstract: Additive manufacturing systems, such as 3D printers, emit sounds while creating objects. Our work demonstrates that these sounds carry process information that can be used to indirectly reconstruct the objects being printed, without requiring access to the original design. This is an example of a physical-to-cyber domain attack, where information gathered from the physical domain, such as acoustic side-channel, can be used to reveal information about the cyber domain. Our novel attack model consists of a pipeline of audio signal processing, machine learning algorithms, and context-based post-processing to improve the accuracy of the object reconstruction. In our experiments, we have successfully reconstructed the test objects (designed to test the attack model under various benchmark parameters) and their corresponding G-codes with an average accuracy for axis prediction of 78.35% and an average length prediction error of 17.82% on a Fused Deposition Modeling (FDM) based additive manufacturing system. Our work exposes a serious vulnerability in FDM based additive manufacturing systems exploitable by physical-to-cyber attacks that may lead to theft of Intellectual Property (IP) and trade secrets. To the best of our knowledge this kind of attack has not yet been explored in additive manufacturing systems.

137 citations
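As a rough illustration of the acoustic side-channel idea (not the paper's actual learned model), one can imagine classifying which printer axis is moving from the dominant frequency of an audio frame. The sample rate and per-axis frequencies below are invented for the example; the real attack uses trained machine learning models rather than fixed frequency lookups.

```python
import numpy as np

# Toy sketch of the acoustic side-channel idea: each printer axis is
# assumed (hypothetically) to emit a distinct dominant frequency, and a
# frame is classified by the peak of its magnitude spectrum.
FS = 8000  # sample rate in Hz (assumed)
AXIS_FREQS = {"X": 440.0, "Y": 880.0}  # illustrative, not measured values

def classify_axis(frame, fs=FS):
    # Window the frame and find the frequency of the spectral peak.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    peak_hz = np.fft.rfftfreq(len(frame), 1.0 / fs)[np.argmax(spectrum)]
    # The nearest assumed axis frequency wins.
    return min(AXIS_FREQS, key=lambda a: abs(AXIS_FREQS[a] - peak_hz))

t = np.arange(1024) / FS
print(classify_axis(np.sin(2 * np.pi * 880.0 * t)))  # → Y
```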



Patent
28 Jun 2016
TL;DR: A system for detecting similar audio being received by separate voice activated electronic devices, and ignoring those commands, is described: a voice activated device may be activated by a wakeword output by an additional electronic device, such as a television or radio, may capture audio of the sound following the wakeword, and may send audio data representing the sound to a backend system.
Abstract: Systems and methods for detecting similar audio being received by separate voice activated electronic devices, and ignoring those commands, are described herein. In some embodiments, a voice activated electronic device may be activated by a wakeword that is output by an additional electronic device, such as a television or radio; it may capture audio of the sound subsequently following the wakeword and send audio data representing the sound to a backend system. Upon receipt, the backend system may, in parallel with performing automated speech recognition processing on the audio data, generate a sound profile of the audio data and compare that sound profile to sound profiles of recently received audio data and/or flagged sound profiles. If the generated sound profile is determined to match another sound profile, then the automated speech recognition processing may be stopped, and the voice activated electronic device may be instructed to return to a keyword spotting mode. If the matching sound profile is not already stored in a database of known sound profiles, it can be stored for future comparisons.

79 citations
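The sound-profile comparison described above can be illustrated with a minimal spectral fingerprint. The patented system's actual profile format is not public, so everything below is an assumption-laden sketch: clips are summarized by band-energy vectors and compared by cosine similarity.

```python
import numpy as np

# Minimal sketch of "sound profile" matching for duplicate-wakeword
# suppression: summarize a clip by its band-energy spectrum and compare
# profiles by cosine similarity. Illustrative only; not the patented
# system's actual profile format.
def sound_profile(signal, n_bands=16):
    power = np.abs(np.fft.rfft(signal)) ** 2
    return np.array([b.sum() for b in np.array_split(power, n_bands)])

def profiles_match(a, b, threshold=0.95):
    pa, pb = sound_profile(a), sound_profile(b)
    cos = pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb))
    return bool(cos > threshold)

t = np.arange(4096) / 16000.0
broadcast = np.sin(2 * np.pi * 600.0 * t)  # e.g. audio from a TV ad
captured = broadcast + 0.001 * np.cos(2 * np.pi * 50.0 * t)  # same audio plus slight hum
print(profiles_match(broadcast, captured))  # → True
```

Two devices hearing the same broadcast would produce near-identical profiles, so the backend could stop speech recognition for all but one of them.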


Patent
08 Apr 2016
TL;DR: In this article, a position predictor is used to determine predicted position data based on position data and an audio processing device is configured to generate an output spatialized audio signal based on the predicted location data.
Abstract: In a particular aspect, an audio processing device includes a position predictor configured to determine predicted position data based on position data. The audio processing device further includes a processor configured to generate an output spatialized audio signal based on the predicted position data.

70 citations
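A position predictor of the kind described can be sketched as simple linear extrapolation of the listener's position to the audio render time; a real system would likely also smooth sensor noise, for example with a Kalman filter. The function below is a minimal illustration under that assumption.

```python
# Sketch of a position predictor for spatialized audio: linearly
# extrapolate the listener's 2-D position to the render time to hide
# sensor latency. A real implementation would also filter sensor noise.
def predict_position(p_prev, t_prev, p_curr, t_curr, t_render):
    # Estimate velocity from the last two position samples.
    velocity = tuple((c - p) / (t_curr - t_prev) for p, c in zip(p_prev, p_curr))
    dt = t_render - t_curr
    return tuple(c + v * dt for c, v in zip(p_curr, velocity))

# Head moved from x=0 m to x=0.1 m over 10 ms; predict 10 ms ahead.
print(predict_position((0.0, 0.0), 0.00, (0.1, 0.0), 0.01, 0.02))  # ≈ (0.2, 0.0)
```

The spatializer would then render the output audio signal for the predicted position rather than the last measured one, which is the core idea of the claim.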


Proceedings ArticleDOI
20 Mar 2016
TL;DR: This work compares multiple stimulus listening tests performed in a lab environment to multiple stimulus listening tests performed in a web environment on a population drawn from Mechanical Turk.
Abstract: Automated objective methods of audio evaluation are fast, cheap, and require little effort by the investigator. However, objective evaluation methods do not exist for the output of all audio processing algorithms, often have output that correlates poorly with human quality assessments, and require ground truth data in their calculation. Subjective human ratings of audio quality are the gold standard for many tasks, but are expensive, slow, and require a great deal of effort to recruit subjects and run listening tests. Moving listening tests from the lab to the micro-task labor market of Amazon Mechanical Turk speeds data collection and reduces investigator effort. However, it also reduces the amount of control investigators have over the testing environment, adding new variability and potential biases to the data. In this work, we compare multiple stimulus listening tests performed in a lab environment to multiple stimulus listening tests performed in a web environment on a population drawn from Mechanical Turk.

Proceedings ArticleDOI
12 Sep 2016
TL;DR: This paper explores the possibility of eavesdropping on handwriting via nearby mobile devices based on audio signal processing and machine learning, and presents a proof-of-concept system, WritingHacker, which shows the usage of mobile devices to collect the sound of victims' handwriting, and to extract handwriting-specific features for machine learning based analysis.
Abstract: When filling out privacy-related forms in public places such as hospitals or clinics, people usually are not aware that the sound of their handwriting leaks personal information. In this paper, we explore the possibility of eavesdropping on handwriting via nearby mobile devices based on audio signal processing and machine learning. By presenting a proof-of-concept system, WritingHacker, we show the use of mobile devices to collect the sound of victims' handwriting and to extract handwriting-specific features for machine learning based analysis. WritingHacker focuses on the situation where the victim's handwriting follows a certain print style. An attacker can keep a mobile device, such as a common smartphone, touching the desk used by the victim to record the audio signals of handwriting. The system can then provide a word-level estimate of the content of the handwriting. To reduce the impact of varying writing habits and writing locations, the system utilizes letter clustering and dictionary filtering. Our prototype system's experimental results show that the accuracy of word recognition reaches around 50%–60% under certain conditions, which reveals the danger of privacy leakage through the sound of handwriting.

Patent
15 Nov 2016
TL;DR: In this article, the authors propose a method to enable the substantially simultaneous output of two or more audio sources to a user, such as a telephone ringing or an alarm, using currently available headphones.
Abstract: A headphone set can enable the substantially simultaneous output of two or more audio sources to a user. The audio from the multiple sources may be mixed so that a listener can hear and understand audio received from multiple audio sources. Thus, in certain embodiments, a user can consume media and also engage in conversation with other users or listen to ambient sounds, such as a telephone ringing or an alarm. Further, a mobile device, such as a mobile phone, can mix audio from multiple sources enabling a user to consume audio from the device while also listening to ambient sound or audio received from a source other than the device, using currently-available headphones.

Patent
Lae-Hoon Kim, Erik Visser, Asif Iqbal Mohammad, Ian Ernan Liu, Ye Jiang
21 Dec 2016
TL;DR: In this article, a signal processing system is configured to update one or more processing parameters while operating in a first operational mode, and to use a static version of the processing parameters during operation in a second operational mode.
Abstract: An apparatus includes multiple microphones to generate audio signals based on sound of a far-field acoustic environment. The apparatus also includes a signal processing system to process the audio signals to generate at least one processed audio signal. The signal processing system is configured to update one or more processing parameters while operating in a first operational mode and is configured to use a static version of the one or more processing parameters while operating in a second operational mode. The apparatus further includes a keyword detection system to perform keyword detection based on the at least one processed audio signal to determine whether the sound includes an utterance corresponding to a keyword and, based on a result of the keyword detection, to send a control signal to the signal processing system to change an operational mode of the signal processing system.
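The two-mode behaviour in the abstract above (adapting parameters in one mode, using a static snapshot in the other) can be sketched as follows. Class, parameter, and adaptation-rule names are illustrative, not from the patent.

```python
# Sketch of the two-mode behaviour: adapt a processing parameter in one
# mode, freeze a static snapshot of it in the other, switching modes on
# keyword detection. All names and the toy adaptation rule are invented.
class AdaptiveProcessor:
    def __init__(self, gain=1.0):
        self.gain = gain          # live, adapting parameter
        self.frozen_gain = gain   # static snapshot
        self.adapting = True

    def set_mode(self, adapting):
        if self.adapting and not adapting:
            self.frozen_gain = self.gain  # take a snapshot when freezing
        self.adapting = adapting

    def process(self, sample):
        if self.adapting:
            self.gain = 0.9 * self.gain + 0.1 * 2.0  # toy adaptation toward 2.0
        return sample * (self.gain if self.adapting else self.frozen_gain)

p = AdaptiveProcessor()
p.process(1.0)       # adapting mode: gain moves toward 2.0
p.set_mode(False)    # keyword detected: freeze parameters
out = p.process(1.0) # uses the static snapshot
print(round(out, 2))  # → 1.1
```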

Journal ArticleDOI
TL;DR: The CNN classifier is shown to yield considerably better results compared to the logistic regression classifier, demonstrating the power of deep learning when applied to audio processing.
Abstract: Automatic detection of a baby cry in audio signals is an essential step in applications such as remote baby monitoring. It is also important for researchers, who study the relation between baby cry patterns and various health or developmental parameters. In this paper, we propose two machine-learning algorithms for automatic detection of baby cry in audio recordings. The first algorithm is a low-complexity logistic regression classifier, used as a reference. To train this classifier, we extract features such as Mel-frequency cepstrum coefficients, pitch and formants from the recordings. The second algorithm uses a dedicated convolutional neural network (CNN), operating on log Mel-filter bank representation of the recordings. Performance evaluation of the algorithms is carried out using an annotated database containing recordings of babies (0-6 months old) in domestic environments. In addition to baby cry, these recordings contain various types of domestic sounds, such as parents talking and door opening. The CNN classifier is shown to yield considerably better results compared to the logistic regression classifier, demonstrating the power of deep learning when applied to audio processing.
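The log Mel-filter bank front end mentioned in the abstract can be sketched with NumPy. The filter count, FFT size, and sample rate below are typical values chosen for illustration, not necessarily those used in the paper.

```python
import numpy as np

# Sketch of a log Mel-filter bank front end: map an FFT power spectrum
# onto triangular filters spaced evenly on the mel scale, then take the
# log. Parameter choices are illustrative.
def mel_filterbank(n_filters=40, n_fft=512, fs=16000):
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edge points, evenly spaced in mel, converted to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bin_pts[i - 1], bin_pts[i], bin_pts[i + 1]
        for b in range(left, center):          # rising slope
            fbank[i - 1, b] = (b - left) / max(center - left, 1)
        for b in range(center, right):         # falling slope
            fbank[i - 1, b] = (right - b) / max(right - center, 1)
    return fbank

def log_mel(frame, fbank):
    power = np.abs(np.fft.rfft(frame, n=2 * (fbank.shape[1] - 1))) ** 2
    return np.log(fbank @ power + 1e-10)  # floor avoids log(0)

fbank = mel_filterbank()
print(fbank.shape)  # → (40, 257)
```

Stacking `log_mel` outputs over consecutive frames yields the 2-D time-frequency representation that a CNN of the kind described would consume.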

Patent
17 May 2016
TL;DR: Embodiments disclosed herein enable detection and improvement of audio signal quality using a mobile device by determining the loss in the audio signal and enhancing the audio by streaming the remainder portion of the audio.
Abstract: Embodiments disclosed herein enable detection and improvement of the quality of the audio signal using a mobile device by determining the loss in the audio signal and enhancing the audio by streaming the remainder portion of the audio. Embodiments disclosed herein enable an improvement in the sound quality of rendering devices by emitting a test audio signal from the source device, measuring the test audio signal using microphones, detecting variations in the frequency response, loudness, and timing characteristics using impulse responses, and correcting for them. Embodiments disclosed herein also compensate for noise in the acoustic space by determining the reverberation and ambient noise levels and their frequency characteristics and changing the digital filters and volumes of the source signal to compensate for the varying noise levels.
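Measuring a playback chain's frequency response with a test signal, as described above, can be sketched as regularized spectral division of the recorded signal by the emitted one. Real systems use sine sweeps and careful windowing; this shows only the core idea, with an invented "room" that halves the amplitude and delays by three samples.

```python
import numpy as np

# Sketch of frequency-response estimation: play a known test signal,
# record it, and estimate the transfer function H(f) by regularized
# spectral division. Illustrative only; real systems use sweep signals.
def estimate_response(test_signal, recorded, eps=1e-8):
    T = np.fft.rfft(test_signal)
    R = np.fft.rfft(recorded)
    return R * np.conj(T) / (np.abs(T) ** 2 + eps)

# Simulate a "room" that halves the amplitude with a 3-sample delay.
rng = np.random.default_rng(1)
test = rng.standard_normal(1024)
recorded = 0.5 * np.roll(test, 3)
H = estimate_response(test, recorded)
print(round(float(np.abs(H).mean()), 3))  # ≈ 0.5
```

A corrective filter would then be designed to invert the measured `H`, which is the equalization step the abstract alludes to.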

Journal ArticleDOI
TL;DR: In this paper, a quantum representation of digital audio (QRDA) is proposed to represent quantum audio, using two entangled qubit sequences to store the audio amplitude and time information.
Abstract: Multimedia refers to content that uses a combination of different content forms. It includes two main media: image and audio. However, in contrast with the rapid development of quantum image processing, quantum audio has almost never been studied. In order to change this situation, a quantum representation of digital audio (QRDA) is proposed in this paper to represent quantum audio. QRDA uses two entangled qubit sequences to store the audio amplitude and time information. The two qubit sequences are both in the basis states |0〉 and |1〉. The preparation of QRDA audio from the initial state |0〉 is given, to store audio in quantum computers. Then some exemplary quantum audio processing operations are performed to demonstrate QRDA's usability.
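A QRDA-style state can be simulated classically: for each time index t, an amplitude bit-string a_t is stored in one register entangled with the time register, giving the uniform superposition (1/√2ⁿ)·Σ_t |a_t⟩|t⟩. The register widths below are illustrative, not prescribed by the paper.

```python
import numpy as np

# Classical simulation sketch of a QRDA-style state: each quantized
# sample a_t occupies the amplitude register, entangled with time index
# t in the time register. Bit widths are illustrative.
def qrda_state(samples, amp_bits=4):
    n_time = int(np.ceil(np.log2(len(samples))))
    state = np.zeros(2 ** (amp_bits + n_time))
    for t, a in enumerate(samples):
        # Basis index: amplitude bits (high) concatenated with time bits (low).
        state[(a << n_time) | t] = 1.0 / np.sqrt(2 ** n_time)
    return state

# Four 4-bit samples → 2 time qubits + 4 amplitude qubits = 64 basis states.
state = qrda_state([3, 7, 1, 0])
print(round(float(np.sum(state ** 2)), 6))  # → 1.0 (properly normalized)
```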

Patent
08 Aug 2016
TL;DR: In this article, the authors describe an approach for calibrating the speakers of a device having a microphone by inputting an original audio signal to a processing component and an output stage of the device for playback through the speakers, receiving the playback sound through the microphone to generate a microphone signal, and inputting the microphone signal into the processing component to calibrate the speakers for optimal playback of the original audio signal.
Abstract: Embodiments are described for calibrating the speakers of a device having a microphone by inputting an original audio signal to a processing component and an output stage of the device for playback through the speakers, receiving the playback sound output from the speakers through the microphone to generate a microphone signal, and inputting the microphone signal into the processing component to calibrate the speakers for optimal playback of the original audio signal, wherein the processing component is configured to compare the original audio signal to the microphone signal and correct the microphone signal by one or more audio processing functions in accordance with a refresh schedule.

Patent
30 May 2016
TL;DR: In this paper, mobile devices may capture audio signals indicative of test audio received by an audio capture device of the mobile device; and send the captured audio and the zone designation to a sound processor to determine equalization settings for speakers of the zone of the venue.
Abstract: Mobile devices may capture audio signals indicative of test audio received by an audio capture device of the mobile device; and send the captured audio and the zone designation to a sound processor to determine equalization settings for speakers of the zone of the venue. An audio filtering device may receive the captured audio signals from the mobile devices; compare each of the captured audio signals with the test signal to determine an associated reliability of each of the captured audio signals; combine the captured audio signals into zone audio data; and transmit the zone audio data and associated reliability to a sound processor configured to determine equalization settings for the zone based on the captured audio signals and the test signal.

Book
09 Aug 2016
TL;DR: A Signal Processing Perspective of Financial Engineering provides straightforward and systematic access to financial engineering for researchers in signal processing and communications so that they can understand problems in financial engineering more easily and may even apply signal processing techniques to handle some financial problems.
Abstract: Despite the different nature of financial engineering and electrical engineering, both areas are intimately connected on a mathematical level. The foundations of financial engineering lie on the statistical analysis of numerical time series and the modeling of the behavior of the financial markets in order to perform predictions and systematically optimize investment strategies. Similarly, the foundations of electrical engineering, for instance, wireless communication systems, lie on statistical signal processing and the modeling of communication channels in order to perform predictions and systematically optimize transmission strategies. Both foundations are the same in disguise. It is often the case in science that the same or very similar methodologies are developed and applied independently in different areas. A Signal Processing Perspective of Financial Engineering is about investment in financial assets treated as a signal processing and optimization problem. It explores such connections and capitalizes on the existing mathematical tools developed in wireless communications and signal processing to solve real-life problems arising in the financial markets in an unprecedented way. A Signal Processing Perspective of Financial Engineering provides straightforward and systematic access to financial engineering for researchers in signal processing and communications so that they can understand problems in financial engineering more easily and may even apply signal processing techniques to handle some financial problems.

Proceedings ArticleDOI
01 Nov 2016
TL;DR: In this article, the authors proposed two machine learning algorithms for automatic detection of baby cry in audio recordings using a low-complexity logistic regression classifier and a dedicated convolutional neural network.
Abstract: Automatic detection of a baby cry in audio signals is an essential step in applications such as remote baby monitoring. It is also important for researchers, who study the relation between baby cry patterns and various health or developmental parameters. In this paper, we propose two machine-learning algorithms for automatic detection of baby cry in audio recordings. The first algorithm is a low-complexity logistic regression classifier, used as a reference. To train this classifier, we extract features such as Mel-frequency cepstrum coefficients, pitch and formants from the recordings. The second algorithm uses a dedicated convolutional neural network (CNN), operating on log Mel-filter bank representation of the recordings. Performance evaluation of the algorithms is carried out using an annotated database containing recordings of babies (0–6 months old) in domestic environments. In addition to baby cry, these recordings contain various types of domestic sounds, such as parents talking and door opening. The CNN classifier is shown to yield considerably better results compared to the logistic regression classifier, demonstrating the power of deep learning when applied to audio processing.

Patent
17 Jun 2016
TL;DR: A device includes a housing and a plug adapter that engages a wall outlet to receive power and retain the device against the wall, along with speakers, wireless transceivers, and microphones used to detect voice commands.
Abstract: A device includes a housing and a plug adapter configured to engage a wall outlet to receive power from the wall outlet and retain the device against a wall with respect to the wall outlet. The device includes one or more speakers, one or more wireless transceivers for communicating over a wireless network, and one or more microphones. The device also includes an audio processing device and a processing unit. The audio processing device is configured to receive audio from the one or more microphones and detect voice commands. The processing unit is configured to, in response to the voice commands, trigger one or more of audio playback and a two-way voice call.

Patent
01 Apr 2016
TL;DR: An electronic device includes a decoding part for decoding an audio signal; a volume control part for controlling the volume of the audio signal; a filter part for amending the frequency characteristic of the audio signal according to the frequency response characteristic of the audio output device from which the audio signal is output; and an equalizing processing unit for equalizing the audio signal based on an equal-loudness contour according to the volume of the audio signal.
Abstract: The present invention relates to an electronic device and a method of controlling the same. The electronic device according to the present invention includes a decoding part for decoding an audio signal; a volume control part for controlling a volume of the audio signal; a filter part for amending a frequency characteristic of the audio signal according to a frequency response characteristic of an audio output device from which the audio signal is output, such that the audio signal is output with the same volume in all frequency bands; and an equalizing processing unit for equalizing the audio signal based on an equal-loudness contour according to the volume of the audio signal. Thus, the optimal equalizing is performed based on the output characteristics of the audio output device and the auditory property of a human according to each volume level, so that an audio effect and quality may be improved.


Journal ArticleDOI
08 Jul 2016-PLOS ONE
TL;DR: Although the classification accuracy achieved by the proposed method was comparable to the performance of those traditional techniques in quiet, the new feature was found to provide lower error rates of classification under noisy environments.
Abstract: Speaker identification under noisy conditions is one of the challenging topics in the field of speech processing applications. Motivated by the fact that neural responses are robust against noise, this paper proposes a new speaker identification system using 2-D neurograms constructed from the responses of a physiologically-based computational model of the auditory periphery. The responses of auditory-nerve fibers for a wide range of characteristic frequencies were simulated in response to speech signals to construct neurograms. The neurogram coefficients were trained using the well-known Gaussian mixture model-universal background model classification technique to generate an identity model for each speaker. In this study, three text-independent and one text-dependent speaker databases were employed to test the identification performance of the proposed method. Also, the robustness of the proposed method was investigated using speech signals distorted by three types of noise (white Gaussian, pink, and street noise) at different signal-to-noise ratios. The identification results of the proposed neural-response-based method were compared to the performances of traditional speaker identification methods using features such as Mel-frequency cepstral coefficients, Gammatone frequency cepstral coefficients, and frequency domain linear prediction. Although the classification accuracy achieved by the proposed method was comparable to the performance of those traditional techniques in quiet, the new feature was found to provide lower classification error rates under noisy environments.

Patent
01 Feb 2016
TL;DR: An audio signal processing apparatus performs binaural filtering on an input audio signal using a direction renderer, which localizes the direction of the sound source of the audio signal, and a distance renderer, which reflects an effect in accordance with the distance between the sound source and a listener.
Abstract: The present invention relates to an audio signal processing apparatus and an audio signal processing method which perform binaural rendering. The present invention provides an audio signal processing apparatus which performs binaural filtering on an input audio signal, including: a direction renderer which localizes a direction of a sound source of the input audio signal and a distance renderer which reflects an effect in accordance with a distance between the sound source of the input audio signal and a listener, in which the distance renderer obtains information on a distance (an ipsilateral distance) and an incident angle (an ipsilateral incident angle) of the sound source with respect to an ipsilateral ear of the listener and information on a distance (a contralateral distance) and an incident angle (a contralateral incident angle) of the sound source with respect to a contralateral ear of the listener, determines an ipsilateral distance filter based on at least one of the obtained ipsilateral distance and ipsilateral incident angle, determines a contralateral distance filter based on at least one of the obtained contralateral distance and contralateral incident angle, and filters the input audio signal with the determined ipsilateral distance filter and contralateral distance filter, respectively, to generate an ipsilateral output signal and a contralateral output signal.

Journal ArticleDOI
TL;DR: This work is the first systematic evaluation of a PCG signal quality classification algorithm and assessment of the quality of PCG recordings captured by non-experts, using both a medical-grade digital stethoscope and a mobile phone.
Abstract: Mobile phones, due to their audio processing capabilities, have the potential to facilitate the diagnosis of heart disease through automated auscultation. However, such a platform is likely to be used by non-experts, and hence, it is essential that such a device is able to automatically differentiate poor quality from diagnostically useful recordings since non-experts are more likely to make poor-quality recordings. This paper investigates the automated signal quality assessment of heart sound recordings performed using both mobile phone-based and commercial medical-grade electronic stethoscopes. The recordings, each 60 s long, were taken from 151 random adult individuals with varying diagnoses referred to a cardiac clinic and were professionally annotated by five experts. A mean voting procedure was used to compute a final quality label for each recording. Nine signal quality indices were defined and calculated for each recording. A logistic regression model for classifying binary quality was the...

01 Jan 2016
TL;DR: Thank you very much for downloading Speech and Audio Processing in Adverse Environments; people have looked numerous times for their chosen books, but end up in harmful downloads and cope with malware on their desktop computers.
Abstract: Thank you very much for downloading Speech and Audio Processing in Adverse Environments. As you may know, people have looked numerous times for their chosen books like this one, but end up in harmful downloads. Rather than enjoying a good book with a cup of coffee in the afternoon, they instead cope with malware on their desktop computers.

Book
30 Aug 2016
TL;DR: With its pragmatic, application-driven focus and concise explanations, this book is an essential resource for anyone who wants to rapidly gain a practical understanding of speech and audio processing and technology.
Abstract: With this comprehensive and accessible introduction to the field, you will gain all the skills and knowledge needed to work with current and future audio, speech, and hearing processing technologies. Topics covered include mobile telephony, human-computer interfacing through speech, medical applications of speech and hearing technology, electronic music, audio compression and reproduction, big data audio systems and the analysis of sounds in the environment. All of this is supported by numerous practical illustrations, exercises, and hands-on MATLAB® examples on topics as diverse as psychoacoustics (including some auditory illusions), voice changers, speech compression, signal analysis and visualisation, stereo processing, low-frequency ultrasonic scanning, and machine learning techniques for big data. With its pragmatic and application driven focus, and concise explanations, this is an essential resource for anyone who wants to rapidly gain a practical understanding of speech and audio processing and technology.

Patent
07 Jun 2016
TL;DR: The relative volume of background sounds in the audio stream is selectively reduced based on the real-world positioning of corresponding audio sources, including real-world and/or virtualized audio sources.
Abstract: A conferencing system includes a near-eye display device that displays video received from a remote communication device of a communication partner. An audio stream is transmitted to the remote communication device. The audio stream includes real-world sounds produced by one or more real-world audio sources captured by a spatially-diverse microphone array and virtual sounds produced by one or more virtual audio sources. A relative volume of background sounds in the audio stream is selectively reduced based, at least in part, on real-world positioning of corresponding audio sources, including real-world and/or virtualized audio sources.