
Showing papers on "Speech coding" published in 2011


Patent
13 Jun 2011
TL;DR: In this paper, a user speech model associated with the user is accessed, and a determination is made as to whether background audio in the audio signal is below a defined threshold; in response to determining that it is, the model is adapted based on the audio signal and used for noise compensation.
Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made as to whether background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
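The gating-then-adapting logic described above can be sketched in a few lines. This is a minimal illustration, assuming an energy-percentile background estimate and a running-mean "model"; the function names, threshold value, and adaptation rule are all hypothetical stand-ins for details the patent leaves unspecified.

```python
import numpy as np

def background_level_db(signal: np.ndarray, frame_len: int = 512) -> float:
    """Estimate background level as a low percentile of frame energies (dB).
    The percentile heuristic is an assumption, not the patent's method."""
    n = len(signal) // frame_len * frame_len
    energies = np.mean(signal[:n].reshape(-1, frame_len) ** 2, axis=1) + 1e-12
    return 10.0 * np.log10(np.percentile(energies, 10))

def maybe_adapt_model(model_mean: np.ndarray, features: np.ndarray,
                      signal: np.ndarray, threshold_db: float = -45.0,
                      alpha: float = 0.9) -> np.ndarray:
    """Adapt the user model only when background audio is below threshold."""
    if background_level_db(signal) < threshold_db:
        # Exponential update of feature means: a simplified stand-in for
        # adapting a full user speech model.
        model_mean = alpha * model_mean + (1.0 - alpha) * features.mean(axis=0)
    return model_mean
```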

276 citations



Journal ArticleDOI
TL;DR: This article presents a tutorial overview of models for estimating the quality experienced by users of speech transmission and communication services, serving as a guide to an appropriate usage of the multitude of current and emerging speech quality models.
Abstract: This article presents a tutorial overview of models for estimating the quality experienced by users of speech transmission and communication services. Such models can be classified as either parametric or signal based. Signal-based models use input speech signals measured at the electrical or acoustic interfaces of the transmission channel. Parametric models, on the other hand, depend on signal and system parameters estimated during network planning or at run time. This tutorial describes the underlying principles as well as advantages and limitations of existing models. It also presents new developments, thus serving as a guide to an appropriate usage of the multitude of current and emerging speech quality models.

135 citations


Journal ArticleDOI
TL;DR: It is revealed that, contrary to existing thought, the inactive frames of VoIP streams are more suitable for data embedding than the active frames of the streams; that is, steganography in the inactive audio frames attains a larger data embedding capacity than that in the active audio frames under the same imperceptibility.
Abstract: This paper describes a novel high-capacity steganography algorithm for embedding data in the inactive frames of low bit rate audio streams encoded by the G.723.1 source codec, which is used extensively in Voice over Internet Protocol (VoIP). This study reveals that, contrary to existing thought, the inactive frames of VoIP streams are more suitable for data embedding than the active frames of the streams; that is, steganography in the inactive audio frames attains a larger data embedding capacity than that in the active audio frames under the same imperceptibility. By analyzing the concealment of steganography in the inactive frames of low bit rate audio streams encoded by the G.723.1 codec at 6.3 kb/s, the authors propose a new algorithm for steganography in different speech parameters of the inactive frames. Performance evaluation shows that embedding data in various speech parameters leads to different levels of concealment. An improved voice activity detection algorithm is suggested for detecting inactive audio frames, taking packet loss into account. Experimental results show that our proposed steganography algorithm not only achieves perfect imperceptibility but also gains a high data embedding rate of up to 101 bits/frame, indicating that the data embedding capacity of the proposed algorithm is much larger than those of previously suggested algorithms.
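The embedding step itself reduces to bit substitution once the codec parameters of an inactive frame are exposed. The sketch below assumes those parameters are already unpacked as integer codewords (parsing the G.723.1 bitstream is out of scope here) and shows plain LSB substitution; which parameters can safely carry bits is exactly what the paper's concealment analysis determines.

```python
from typing import List

def embed_bits(params: List[int], bits: List[int]) -> List[int]:
    """Embed one payload bit into the LSB of each parameter codeword."""
    out = list(params)
    for i, bit in enumerate(bits[: len(out)]):
        out[i] = (out[i] & ~1) | (bit & 1)
    return out

def extract_bits(params: List[int], n_bits: int) -> List[int]:
    """Recover the payload from the parameter LSBs."""
    return [p & 1 for p in params[:n_bits]]

# Applied only to frames the (packet-loss-aware) VAD flags as inactive:
# if not frame_is_active: frame_params = embed_bits(frame_params, payload)
```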

127 citations


Journal ArticleDOI
TL;DR: The results of the numerical simulation support the effectiveness of the proposed approach for environmental audio classification with over 10% accuracy-rate improvement compared to the MFCC features.
Abstract: Audio feature extraction and classification are important tools for audio signal analysis in many applications, such as multimedia indexing and retrieval, and auditory scene analysis. However, due to the nonstationarities and discontinuities that exist in these signals, their quantification and classification remain a formidable challenge. In this paper, we develop a new approach for audio feature extraction to effectively quantify these nonstationarities in an attempt to achieve high classification accuracy for environmental audio signals. Our approach consists of three stages: first, we construct the time-frequency matrix (TFM) of audio signals using the matching-pursuit time-frequency distribution (MP-TFD) technique; we then apply the non-negative matrix factorization (NMF) technique to decompose the TFM into its significant components. Finally, we propose seven novel features from the spectral and temporal structures of the decomposed vectors such that they successfully represent the joint TF structure of the audio signal, and combine them with the Mel-frequency cepstral coefficient (MFCC) features. These features are examined using a database of 192 environmental audio signals, which includes 20 aircraft, 17 helicopter, 20 drum, 15 flute, 20 piano, 20 animal, 20 bird, and 20 insect sounds, and the speech of 20 males and 20 females. The results of the numerical simulation support the effectiveness of the proposed approach for environmental audio classification, with over 10% accuracy-rate improvement compared to the MFCC features.
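A rough outline of the three-stage pipeline follows, with two simplifications loudly flagged: an STFT magnitude stands in for the matching-pursuit TFM, and two illustrative descriptors per component stand in for the paper's seven features.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import NMF

def tf_nmf_features(signal, fs, n_components=4):
    # STFT magnitude as a stand-in for the paper's matching-pursuit TFM.
    _, _, Z = stft(signal, fs=fs, nperseg=512)
    tfm = np.abs(Z)
    # Decompose TFM ~= W @ H into spectral bases W and temporal activations H.
    model = NMF(n_components=n_components, init="nndsvda", max_iter=500)
    W = model.fit_transform(tfm)
    H = model.components_
    feats = []
    for k in range(n_components):
        # Illustrative spectral/temporal descriptors per component.
        centroid = (np.arange(W.shape[0]) * W[:, k]).sum() / (W[:, k].sum() + 1e-12)
        spread = H[k].std()
        feats.extend([centroid, spread])
    return np.array(feats)  # would be concatenated with MFCCs in practice
```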

124 citations


Journal ArticleDOI
TL;DR: The different aspects of front-end analysis for speech recognition, including sound characteristics, feature extraction techniques, and spectral representations of the speech signal, are discussed.
Abstract: Automatic speech recognition (ASR) has made great strides with the development of digital signal processing hardware and software. But despite all these advances, machines cannot match the performance of their human counterparts in terms of accuracy and speed, especially in the case of speaker-independent speech recognition. So, today a significant portion of speech recognition research is focused on the speaker-independent speech recognition problem. Before recognition, speech processing has to be carried out to obtain feature vectors of the signal, so front-end analysis plays an important role, given its wide range of applications and the limitations of available speech recognition techniques. In this report we briefly discuss the different aspects of front-end analysis for speech recognition, including sound characteristics, feature extraction techniques, and spectral representations of the speech signal. We have also discussed the various advantages and disadvantages of each feature extraction technique, along with the suitability of each method to particular applications.
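A representative MFCC front end of the kind this survey discusses, sketched with librosa; the file name and the parameter choices (16 kHz, 13 coefficients plus delta features) are typical defaults, not prescriptions from the report.

```python
import numpy as np
import librosa

# Hypothetical input file; 16 kHz is a common ASR front-end sample rate.
y, sr = librosa.load("utterance.wav", sr=16000)

# 13 MFCCs with first and second differences: a classic 39-dim ASR feature.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
delta = librosa.feature.delta(mfcc)
delta2 = librosa.feature.delta(mfcc, order=2)
features = np.vstack([mfcc, delta, delta2])  # shape: (39, n_frames)
```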

109 citations


Patent
Wei-Han Liu, Hsiao-Yu Han
25 May 2011
TL;DR: In this paper, an audio signal processing apparatus comprising a plurality of individual audio interfaces, an audio signal processing unit, and an audio channel splitting unit determines a total number of audio channels corresponding to the individual interfaces and generates a first output audio signal with a first number of channels according to an input audio signal and the total number of channels.
Abstract: An audio signal processing apparatus and an audio signal processing method are provided. The audio signal processing apparatus comprises: a plurality of individual audio interfaces, an audio signal processing unit, and an audio channel splitting unit. The audio signal processing unit is utilized for determining a total number of audio channels corresponding to the individual audio interfaces and generating a first output audio signal with a first number of audio channels according to an input audio signal and the total number of audio channels when the audio signal processing apparatus is operated under a first operational mode. The audio channel splitting unit is coupled to the audio signal processing unit and the audio interfaces. When the audio signal processing apparatus is operated under the first operational mode, the audio channel splitting unit splits the first output audio signal with the first number of audio channels to the audio interfaces, respectively.

82 citations


Patent
Thomas M. Soemo, Leo Soong, Michael H. Kim, Chad R. Heinemann, Dax Hawkins
02 Sep 2011
TL;DR: A system for integrating local speech recognition with cloud-based speech recognition in order to provide an efficient natural user interface is described in this article, where a computing device determines a direction associated with a particular person within an environment and generates an audio recording associated with the direction.
Abstract: A system for integrating local speech recognition with cloud-based speech recognition in order to provide an efficient natural user interface is described. In some embodiments, a computing device determines a direction associated with a particular person within an environment and generates an audio recording associated with the direction. The computing device then performs local speech recognition on the audio recording in order to detect a first utterance spoken by the particular person and to detect one or more keywords within the first utterance. The first utterance may be detected by applying voice activity detection techniques to the audio recording. The first utterance and the one or more keywords are subsequently transferred to a server, which may identify speech sounds within the first utterance associated with the one or more keywords and adapt one or more speech recognition techniques based on the identified speech sounds.
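The local/cloud split can be pictured as a simple gate. Everything below is a hypothetical skeleton: the energy-based VAD, the keyword-spotter stub, and the server call are placeholders for components the patent does not specify.

```python
import numpy as np

def detect_utterance(x: np.ndarray, thresh: float = 1e-4):
    """Crude energy-based voice activity detection (placeholder)."""
    frames = x[: len(x) // 160 * 160].reshape(-1, 160)
    return x if (np.mean(frames ** 2, axis=1) > thresh).any() else None

def spot_keywords(x: np.ndarray) -> list:
    """Placeholder: a real system runs a small on-device recognizer here."""
    return ["hypothetical_keyword"]

def send_to_server(x: np.ndarray, keywords: list) -> dict:
    """Placeholder for the cloud round trip."""
    return {"keywords": keywords, "transcript": "..."}

def process_audio(x: np.ndarray):
    utterance = detect_utterance(x)
    if utterance is None:
        return None                    # nothing spoken: no network cost
    keywords = spot_keywords(utterance)
    if not keywords:
        return None                    # no trigger word: stay local
    return send_to_server(utterance, keywords)
```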

75 citations


Proceedings Article
01 Aug 2011
TL;DR: A novel dual-channel algorithm is proposed which estimates the coherent-to-diffuse energy ratio (CDR) of background noise in mixed noise fields based on an estimate of the noise field coherence from a noisy speech signal and a subsequent minima tracking in order to increase the estimation accuracy even in the presence of speech.
Abstract: A novel dual-channel algorithm is proposed which estimates the coherent-to-diffuse energy ratio (CDR) of background noise in mixed noise fields. The algorithm is based on an estimate of the noise field coherence from a noisy speech signal and a subsequent minima tracking in order to increase the estimation accuracy even in the presence of speech. The obtained CDR estimate can be used, e.g., for the acoustic environment classification in hearing aids or to control speech enhancement algorithms such as noise reduction or speech dereverberation. Besides, the approach can be used to calculate an estimate of the direct-to-reverberant energy ratio (DRR) blindly from reverberant speech signals.
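A simplified version of the estimation chain, assuming the coherent component has coherence close to one and the diffuse component follows the ideal sinc coherence model; this textbook-style estimator and its constants are illustrative, not the paper's exact algorithm.

```python
import numpy as np
from scipy.signal import csd, welch

def estimate_cdr(x1, x2, fs, mic_dist, c=343.0, nperseg=512):
    """Simplified per-frequency CDR estimate from two microphone signals."""
    f, P12 = csd(x1, x2, fs=fs, nperseg=nperseg)
    _, P11 = welch(x1, fs=fs, nperseg=nperseg)
    _, P22 = welch(x2, fs=fs, nperseg=nperseg)
    gamma = np.real(P12 / np.sqrt(P11 * P22 + 1e-12))   # measured coherence
    gamma_diff = np.sinc(2.0 * f * mic_dist / c)        # diffuse-field model
    # Mixture coherence gamma = (CDR + gamma_diff) / (CDR + 1), solved for CDR:
    cdr = (gamma_diff - gamma) / np.clip(gamma - 1.0, -1.0, -1e-3)
    return f, np.maximum(cdr, 0.0)

# Minima tracking over time frames (e.g., scipy.ndimage.minimum_filter1d on a
# sequence of short-time estimates) reduces the bias from speech presence.
```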

74 citations


Patent
Dai Yang1, Daniel J. Sinder1
01 Jun 2011
TL;DR: In this paper, an excitation signal for a first frequency band of the audio signal is used to calculate the excitation signals for a second frequency band that is separated from the first band.
Abstract: Methods of audio coding are described in which an excitation signal for a first frequency band of the audio signal is used to calculate an excitation signal for a second frequency band of the audio signal that is separated from the first frequency band.
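One classical instance of this idea is spectral translation, where modulating the lowband excitation by (-1)^n shifts its spectrum toward the high band; the patent's specific mapping between bands is not reproduced here.

```python
import numpy as np

def extend_excitation(exc_low: np.ndarray) -> np.ndarray:
    """Derive a highband excitation from a lowband one by modulating with
    (-1)^n, which translates the spectrum by half the sample rate. This is
    one classical technique; the patent's mapping may differ."""
    n = np.arange(len(exc_low))
    return exc_low * (-1.0) ** n
```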

73 citations


Book
05 Jan 2011
TL;DR: Based on the fundamentals of information and rate distortion theory, the most relevant techniques used in source coding algorithms are described: entropy coding, quantization as well as predictive and transform coding.
Abstract: Digital media technologies have become an integral part of the way we create, communicate, and consume information. At the core of these technologies are source coding methods that are described in this monograph. Based on the fundamentals of information and rate distortion theory, the most relevant techniques used in source coding algorithms are described: entropy coding, quantization, as well as predictive and transform coding. The emphasis is placed on algorithms that are also used in video coding, which will be explained in the other part of this two-part monograph.
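Two of the building blocks named above, uniform quantization and an entropy estimate of the resulting symbol stream, fit in a few lines and already exhibit the rate side of the rate-distortion trade-off (a coarser step yields fewer bits per sample):

```python
import numpy as np

def uniform_quantize(x: np.ndarray, step: float) -> np.ndarray:
    """Mid-tread uniform quantizer: index = round(x / step)."""
    return np.round(x / step).astype(int)

def entropy_bits(indices: np.ndarray) -> float:
    """First-order entropy (bits/symbol): the lower bound an ideal entropy
    coder approaches for memoryless symbols."""
    _, counts = np.unique(indices, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

x = np.random.laplace(scale=1.0, size=10000)  # Laplacian: a common signal model
for step in (0.1, 0.5, 1.0):
    print(f"step={step}: {entropy_bits(uniform_quantize(x, step)):.2f} bits/sample")
```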

Patent
11 Mar 2011
TL;DR: In this paper, user profile based audio adjustment techniques are used to render various audio and audio/video content having different audio output parameter values in accordance with a user profile that characterizes a user's desired value and/or range of one or more of the output parameter levels.
Abstract: Embodiments are directed toward user profile based audio adjustment techniques. The techniques are used to render various audio and/or audio/video content having different audio output parameter values in accordance with a user profile that characterizes a user's desired value and/or range of one or more of the output parameter levels.

Patent
08 Apr 2011
TL;DR: In this article, a mobile device is presented that is capable of automatically starting and ending the recording of an audio signal captured by at least one microphone and of adjusting a number of parameters related to audio logging based on the context information of the audio input signal.
Abstract: A mobile device that is capable of automatically starting and ending the recording of an audio signal captured by at least one microphone is presented. The mobile device is capable of adjusting a number of parameters related to audio logging based on the context information of the audio input signal.

Patent
30 Sep 2011
TL;DR: In this article, the authors proposed a method to enhance noisy speech recognition accuracy by receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile devices, selecting a subset of geotaggregated audio signals and weighting each geotagated audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated.
Abstract: Enhancing noisy speech recognition accuracy by receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, selecting a subset of geotagged audio signals and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated, generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.

Journal ArticleDOI
TL;DR: Results indicated that for all subjects tested, speech intelligibility decreased exponentially with an increase in reverberation time, and the proposed channel-selection criterion reduces the temporal envelope smearing effects introduced by reverberation and also diminishes the self-masking effects responsible for flattened formants.
Abstract: Little is known about the extent to which reverberation affects speech intelligibility by cochlear implant (CI) listeners. Experiment 1 assessed CI users’ performance using Institute of Electrical and Electronics Engineers (IEEE) sentences corrupted with varying degrees of reverberation. Reverberation times of 0.30, 0.60, 0.80, and 1.0 s were used. Results indicated that for all subjects tested, speech intelligibility decreased exponentially with an increase in reverberation time. A decaying-exponential model provided an excellent fit to the data. Experiment 2 evaluated (offline) a speech coding strategy for reverberation suppression using a channel-selection criterion based on the signal-to-reverberant ratio (SRR) of individual frequency channels. The SRR reflects implicitly the ratio of the energies of the signal originating from the early (and direct) reflections and the signal originating from the late reflections. Channels with SRR larger than a preset threshold were selected, while channels with SRR smaller than the threshold were zeroed out. Results in a highly reverberant scenario indicated that the proposed strategy led to substantial gains (over 60 percentage points) in speech intelligibility over the subjects’ daily strategy. Further analysis indicated that the proposed channel-selection criterion reduces the temporal envelope smearing effects introduced by reverberation and also diminishes the self-masking effects responsible for flattened formants.
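The channel-selection rule itself is a one-line comparison once per-channel SRR values are available; in the paper's offline setting the early and late signal components are known, so the sketch below takes their envelope energies as inputs. The threshold default is illustrative, not the study's tuned value.

```python
import numpy as np

def select_channels(early_env: np.ndarray, late_env: np.ndarray,
                    srr_threshold_db: float = -5.0) -> np.ndarray:
    """Zero out frequency channels whose signal-to-reverberant ratio (SRR)
    falls below a threshold.

    early_env, late_env: per-channel, per-frame envelope energies of the
    early/direct and late-reflection components (available offline, as in
    the paper's evaluation). Returns a binary selection mask.
    """
    srr_db = 10.0 * np.log10((early_env + 1e-12) / (late_env + 1e-12))
    return (srr_db > srr_threshold_db).astype(float)

# retained = mask * mixture_env  -> only selected channels drive stimulation
```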


Patent
30 Mar 2011
TL;DR: In this article, a method for providing sound to at least one user, involves supplying audio signals from an audio signal source to a transmission unit; compressing the audio signals to generate compressed audio data; transmitting compressed audio audio data from the transmission unit to a receiver unit; and stimulating the hearing of the user(s) according to decompressed audio signals supplied from the receiver unit.
Abstract: Method for providing sound to at least one user, involving supplying audio signals from an audio signal source to a transmission unit; compressing the audio signals to generate compressed audio data; transmitting the compressed audio data from the transmission unit to at least one receiver unit; decompressing the compressed audio data to generate decompressed audio signals; and stimulating the hearing of the user(s) according to the decompressed audio signals supplied from the receiver unit. During certain time periods, transmission of compressed audio data is interrupted and, instead, at least one control data block is generated by the transmission unit in such a manner that audio data transmission is replaced by control data block transmission, thereby temporarily interrupting the flow of received compressed audio data. Each control data block includes a marker recognized by the at least one receiver unit as a control data block and a command for control of the receiver unit.

Patent
Craig L. Reding, Suzi Levas
30 Dec 2011
TL;DR: In this paper, a shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facilities via a communications channel, e.g., the Internet.
Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability. Voice dialing, telephone control and/or other services are provided by the speech processing facility in response to speech recognition results.

Journal ArticleDOI
TL;DR: The second-order derivative-based audio steganalysis method gains a considerable advantage under all categories of signal complexity, especially for audio streams with high signal complexity, which are generally the most challenging for steganalysis, and thereby significantly improves the state of the art in audio steganalysis.
Abstract: This article presents a second-order derivative-based audio steganalysis. First, Mel-cepstrum coefficients and Markov transition features from the second-order derivative of the audio signal are extracted; a support vector machine is then applied to the features for discovering the existence of hidden data in digital audio streams. Also, the relation between audio signal complexity and steganography detection accuracy, an issue relevant to audio steganalysis performance evaluation that so far has not been explored, is analyzed experimentally. Results demonstrate that, in comparison with a recently proposed signal-stream-based Mel-cepstrum method, the second-order derivative-based audio steganalysis method gains a considerable advantage under all categories of signal complexity, especially for audio streams with high signal complexity, which are generally the most challenging for steganalysis, and thereby significantly improves the state of the art in audio steganalysis.
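The feature-extraction front end can be approximated as follows. The second-order derivative and the sign-transition Markov statistics follow the description above, while the Mel-cepstral statistics use librosa as a stand-in; the exact feature definitions in the article differ in detail.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def steganalysis_features(y: np.ndarray, sr: int) -> np.ndarray:
    d2 = np.diff(y.astype(float), n=2)         # second-order derivative
    # Mel-cepstral statistics of the derivative signal.
    mfcc = librosa.feature.mfcc(y=d2, sr=sr, n_mfcc=13)
    mel_feats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    # Markov transition features over quantized derivative signs.
    s = np.sign(d2).astype(int) + 1            # states 0, 1, 2
    trans = np.zeros((3, 3))
    for a, b in zip(s[:-1], s[1:]):
        trans[a, b] += 1
    trans /= max(trans.sum(), 1.0)
    return np.concatenate([mel_feats, trans.ravel()])

# clf = SVC(kernel="rbf").fit(train_features, labels)  # cover vs. stego
```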

Patent
28 Jul 2011
TL;DR: In this article, the authors present a method for continuous monitoring of audio signals and identification of audio items within an audio signal, which utilizes predictive caching of fingerprints to improve the efficiency of audio identification.
Abstract: The present invention relates to the continuous monitoring of an audio signal and identification of audio items within an audio signal. The technology disclosed utilizes predictive caching of fingerprints to improve efficiency. Fingerprints are cached for tracking an audio signal with known alignment and for watching an audio signal without known alignment, based on already identified fingerprints extracted from the audio signal. Software running on a smart phone or other battery-powered device cooperates with software running on an audio identification server.

Patent
06 Sep 2011
TL;DR: In this paper, an audio signal of ambient audio is autonomously sampled in the vicinity of the mobile computer system to capture one or more audio samples of the audio signal, and the audio signature may be compared with multiple previously stored reference audio signatures.
Abstract: A computerized method for engaging a user of a mobile computer system. The mobile computer system may be connectible to a server over a wide area network. An audio signal of ambient audio is autonomously sampled in the vicinity of the mobile computer system to capture one or more audio samples of the audio signal. The multiple samples of the audio signal are autonomously sampled without requiring any interaction from the user, thus avoiding an input from the user to capture each of the samples. The audio sample may be processed to extract an audio signature of the audio sample. The audio signature may be compared with multiple previously stored reference audio signatures. Upon matching the audio signature with at least one reference audio signature, a matched reference audio signature may be produced.

Proceedings ArticleDOI
18 Sep 2011
TL;DR: Using compression properties of AO, this formulation extends the notion of Information Rate to individual sequences and allows an optimal estimation of the AO threshold parameter and shows that changes in IR correspond to significant musical structures such as sections in a sonata form.
Abstract: This paper presents a method for analysis of changes in information contents in music based on an audio representation called Audio Oracle (AO). Using compression properties of AO we estimate the amount of information that passes between the past and the present at every instance in a musical signal. This formulation extends the notion of Information Rate (IR) to individual sequences and allows an optimal estimation of the AO threshold parameter. We show that changes in IR correspond to significant musical structures such as sections in a sonata form. Relation to musical perception and applications for composition and improvisation are discussed in the paper.

Journal ArticleDOI
TL;DR: Experiments show that the proposed one-dimensional APD and RTPD features are able to achieve comparable accuracy with popular high-dimensional features in speech/music discrimination, and the SVM-BT approach demonstrates superior performance in multi-class audio classification.
Abstract: Audio classification is an essential task in multimedia content analysis, which is a prerequisite to a variety of tasks such as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification on broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech with environment sound. Since SVM is a binary classifier, we use the SVM-BT architecture to realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features are able to achieve comparable accuracy with popular high-dimensional features in speech/music discrimination, and the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we can achieve a high average accuracy of 94.2% in the five-class audio classification task.
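Structurally, an SVM binary tree is just one binary SVM per node applied coarse-to-fine. The two-level sketch below uses scikit-learn; the split order and class layout are illustrative choices, and the APD/RTPD features would be computed upstream.

```python
import numpy as np
from sklearn.svm import SVC

class SVMBinaryTree:
    """Coarse-to-fine classification with one binary SVM per tree node.

    The split order here (speech-like vs. non-speech first) is an
    illustrative choice; the paper defines its own hierarchy over its
    five broadcast-news classes.
    """
    def __init__(self):
        self.root = SVC()    # speech-like vs. non-speech
        self.left = SVC()    # pure speech vs. speech with background
        self.right = SVC()   # music vs. environment sound

    def fit(self, X, y_root, mask_left, y_left, mask_right, y_right):
        self.root.fit(X, y_root)
        self.left.fit(X[mask_left], y_left)
        self.right.fit(X[mask_right], y_right)

    def predict(self, x):
        x = np.atleast_2d(x)
        if self.root.predict(x)[0] == 1:
            return self.left.predict(x)[0]
        return self.right.predict(x)[0]
```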

Patent
06 Oct 2011
TL;DR: In this article, a wireless multi-channel audio system including an audio source with a wireless transceiver configured to communicate according to a standard wireless protocol and an audio controller is collectively configured to establish wireless communications with multiple audio sinks via a corresponding wireless link.
Abstract: A wireless multi-channel audio system including an audio source with a wireless transceiver configured to communicate according to a standard wireless protocol and an audio controller, which are collectively configured to establish wireless communications with multiple audio sinks via a corresponding wireless link, to assign each audio sink a corresponding audio channel, to synchronize timing with each audio sink, and to transmit audio information for each audio channel to a corresponding audio sink via a corresponding wireless link. The audio source may inquire as to supported audio parameters, such as sample rate and audio codec, and select a commonly supported configuration. The audio source may separate audio information into queues for each audio channel for each audio sink. The audio source transmits frames with timestamps and a common start time for synchronization, and the audio sinks use this information to synchronize timing and remain virtually synchronized with each other.

Patent
30 Jun 2011
TL;DR: In this paper, a speech processing engine is provided that employs Kalman filtering with a particular speaker's glottal information to clean up an audio speech signal for more efficient automatic speech recognition.
Abstract: A speech processing engine is provided that, in some embodiments, employs Kalman filtering with a particular speaker's glottal information to clean up an audio speech signal for more efficient automatic speech recognition.
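For intuition, here is a minimal scalar Kalman filter under an AR(1) speech model; the patent's use of speaker-specific glottal information would refine the state model, which is simplified away here, and all constants are illustrative.

```python
import numpy as np

def kalman_denoise(y: np.ndarray, a: float = 0.95,
                   q: float = 1e-3, r: float = 1e-2) -> np.ndarray:
    """Scalar Kalman filter for x[n] = a*x[n-1] + w, observed as y[n] = x[n] + v.

    a (AR coefficient), q (process noise) and r (measurement noise) are
    illustrative constants, not values from the patent."""
    y = np.asarray(y, dtype=float)
    x_hat, p = 0.0, 1.0
    out = np.empty_like(y)
    for n, yn in enumerate(y):
        # Predict from the AR(1) state model.
        x_pred = a * x_hat
        p_pred = a * a * p + q
        # Update with the noisy observation.
        k = p_pred / (p_pred + r)          # Kalman gain
        x_hat = x_pred + k * (yn - x_pred)
        p = (1.0 - k) * p_pred
        out[n] = x_hat
    return out
```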

Patent
28 Mar 2011
TL;DR: In this paper, the authors present a method for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition model when a speech recognizer does not have access to a speech recognition system for that domain of the interest and when available domain specific data is below a minimum desired threshold.
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.

Journal ArticleDOI
TL;DR: The evaluation of broadcast news audio segmentation systems carried out in the context of the Albayzín-2010 evaluation campaign is presented, with the aim of gaining insight into the proposed solutions and identifying promising directions.
Abstract: Recently, audio segmentation has attracted research interest because of its usefulness in several applications like audio indexing and retrieval, subtitling, monitoring of acoustic scenes, etc. Moreover, a previous audio segmentation stage may be useful to improve the robustness of speech technologies like automatic speech recognition and speaker diarization. In this article, we present the evaluation of broadcast news audio segmentation systems carried out in the context of the Albayzín-2010 evaluation campaign. That evaluation consisted of segmenting audio from the 3/24 Catalan TV channel into five acoustic classes: music, speech, speech over music, speech over noise, and others. The evaluation results demonstrated the difficulty of this segmentation task. After presenting the database and metric, as well as the feature extraction methods and segmentation techniques used by the submitted systems, the experimental results are analyzed and compared, with the aim of gaining insight into the proposed solutions and identifying promising directions.

Patent
01 Mar 2011
TL;DR: In this paper, a method for encoding audio frames by producing a first frame of coded audio samples by coding a first audio frame in a sequence of frames, producing at least a portion of a second frame of audio samples, and producing parameters for generating audio gap filler samples.
Abstract: A method for encoding audio frames by producing a first frame of coded audio samples by coding a first audio frame in a sequence of frames, producing at least a portion of a second frame of coded audio samples by coding at least a portion of a second audio frame in the sequence of frames, and producing parameters for generating audio gap filler samples, wherein the parameters are representative of either a weighted segment of the first frame of coded audio samples or a weighted segment of the portion of the second frame of coded audio samples.

Journal ArticleDOI
TL;DR: This paper applies the CS methodology to sinusoidally modeled audio signals, and proposes encoding few randomly selected samples of the time-domain description of the sinusoidal component (per signal segment).
Abstract: Compressed sensing (CS) samples signals at a much lower rate than the Nyquist rate if they are sparse in some basis. In this paper, the CS methodology is applied to sinusoidally modeled audio signals. As this model is sparse by definition in the frequency domain (being equal to the sum of a small number of sinusoids), we investigate whether CS can be used to encode audio signals at low bitrates. In contrast to encoding the sinusoidal parameters (amplitude, frequency, phase) as current state-of-the-art methods do, we propose encoding few randomly selected samples of the time-domain description of the sinusoidal component (per signal segment). The potential of applying compressed sensing both to single-channel and multi-channel audio coding is examined. The listening test results are encouraging, indicating that the proposed approach can achieve comparable performance to that of state-of-the-art methods. Given that CS can lead to novel coding systems where the sampling and compression operations are combined into one low-complexity step, the proposed methodology can be considered as an important step towards applying the CS framework to audio coding applications.
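A toy end-to-end run of the proposed encoding is easy to set up: synthesize a few sinusoids, keep a random subset of time-domain samples, and reconstruct by sparse recovery over the DFT basis. Orthogonal matching pursuit is used here as a generic decoder, standing in for whatever reconstruction the paper employs.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 64, 3                       # frame length, measurements, sinusoids

# Sparse-in-frequency test signal: K real sinusoids at distinct bins.
t = np.arange(N)
x = sum(np.cos(2 * np.pi * k * t / N + rng.uniform(0, 2 * np.pi))
        for k in rng.choice(np.arange(5, 100), K, replace=False))

# Encoder: keep M randomly chosen time-domain samples (the paper's idea).
idx = np.sort(rng.choice(N, M, replace=False))
y = x[idx].astype(complex)

# Decoder: orthogonal matching pursuit over the unitary DFT dictionary.
F = np.exp(2j * np.pi * np.outer(t, t) / N) / np.sqrt(N)
A = F[idx, :]
residual, support = y.copy(), []
for _ in range(2 * K):                     # each real sinusoid -> 2 DFT atoms
    support.append(int(np.argmax(np.abs(A.conj().T @ residual))))
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    residual = y - A[:, support] @ coef
x_hat = np.real(F[:, support] @ coef)
print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```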

Patent
Aaron M. Eppolito
23 Aug 2011
TL;DR: In this article, a method for dynamic range compression of audio content is presented; based on an analysis of the audio content, the method generates a setting for an audio compressor that compresses the dynamic range of the audio content.
Abstract: For a media clip that includes audio content, a novel method for performing dynamic range compression of the audio content is presented. The method performs an analysis of the audio content. Based on the analysis of the audio content, the method generates a setting for an audio compressor that compresses the dynamic range of the audio content. The generated setting includes a set of audio compression parameters that include a noise gating threshold parameter (“noise gate”), a dynamic range compression threshold parameter (“threshold”), and a dynamic range compression ratio parameter (“ratio”).
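The three parameters named in the abstract map directly onto a static gain curve: mute below the gate, unity in the middle, slope 1/ratio above the threshold. The defaults below are illustrative, not values the method would generate.

```python
import numpy as np

def compressor_gain_db(level_db: np.ndarray, noise_gate_db: float = -60.0,
                       threshold_db: float = -20.0, ratio: float = 4.0) -> np.ndarray:
    """Static compressor curve built from the abstract's three parameters.

    Below the noise gate the signal is muted; above the threshold the level
    is compressed by `ratio`; in between it passes unchanged."""
    out_db = np.where(
        level_db < noise_gate_db, -np.inf,                          # gated
        np.where(level_db > threshold_db,
                 threshold_db + (level_db - threshold_db) / ratio,  # compressed
                 level_db))                                         # unity
    return out_db - level_db   # gain to apply, in dB

levels = np.array([-70.0, -40.0, -10.0])
print(compressor_gain_db(levels))   # -> [-inf  0.  -7.5]
```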