Topic
Speech coding
About: Speech coding is a research topic. Over its lifetime, 14245 publications have been published on this topic, receiving 271964 citations.
Papers published on a yearly basis
Papers
IBM
TL;DR: This study modifies and combines audio features known to be effective in distinguishing speech from music, and examines their behavior on mixed audio in the CueVideo system.
Abstract: The role of audio in the context of multimedia applications involving video is becoming increasingly important. Many efforts in this area focus on audio data that contains some built-in semantic information structure such as in broadcast news, or focus on classification of audio that contains a single type of sound such as clear speech or clear music only. In the CueVideo system, we detect and classify audio that consists of mixed audio, i.e., combinations of speech and music together with other types of background sounds. Segmentation of mixed audio has applications in detection of story boundaries in video, spoken document retrieval systems, audio retrieval systems, etc. We modify and combine audio features known to be effective in distinguishing speech from music, and examine their behavior on mixed audio. Our preliminary experimental results show that we can achieve a classification accuracy of over 80% for such mixed audio. Our study also provides us with several helpful insights related to analyzing mixed audio in the context of real applications.
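The abstract does not list the exact features used, but two classic speech/music discriminators of the kind it alludes to are zero-crossing rate and short-time energy. A minimal NumPy sketch; the function names and frame parameters are illustrative, not taken from the CueVideo system:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign.

    Tends to be high for unvoiced speech and broadband noise,
    lower for tonal music or voiced speech.
    """
    signs = np.sign(frame)
    return np.mean(np.abs(np.diff(signs)) > 0)

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return np.mean(frame ** 2)

def frame_features(signal, frame_len=512, hop=256):
    """Compute per-frame (ZCR, energy) pairs for a mono signal."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append((zero_crossing_rate(frame), short_time_energy(frame)))
    return np.array(feats)
```

A classifier for mixed audio would combine several such per-frame features (the paper reports over 80% accuracy with its modified feature set).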
75 citations
01 Sep 1999
TL;DR: The second part of this paper will focus on the large number of possible choices for the quantization and coding methods for perceptual audio coding along with examples of real-world systems using these approaches.
Abstract: Perceptual audio coding has become an important key technology for many types of multimedia services these days. This paper provides a brief tutorial introduction into a number of issues as they arise in today's low bitrate audio coders. After discussing the Temporal Noise Shaping technology in the first part of this paper, the second part will focus on the large number of possible choices for the quantization and coding methods for perceptual audio coding along with examples of real-world systems using these approaches.
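As a hedged illustration of the quantization stage such coders build on, here is a minimal mid-rise uniform scalar quantizer; the perceptual coders the paper surveys refine this baseline with psychoacoustic bit allocation and noise shaping, and all names and parameter values below are illustrative:

```python
import numpy as np

def uniform_quantize(x, n_bits):
    """Mid-rise uniform quantizer over [-1, 1).

    Maps each sample to one of 2**n_bits levels and returns both
    the integer indices (what a coder would entropy-code) and the
    reconstructed values. Quantization error is bounded by step/2.
    """
    levels = 2 ** n_bits
    step = 2.0 / levels
    idx = np.floor((np.clip(x, -1.0, 1.0 - 1e-12) + 1.0) / step).astype(int)
    recon = (idx + 0.5) * step - 1.0  # reconstruct at each cell's midpoint
    return idx, recon
```

Each extra bit halves the step size, which is the source of the familiar ~6 dB-per-bit SNR rule that perceptual coders trade off against the masking thresholds.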
75 citations
TL;DR: The data from both experiments combined indicate that, in contrast to normal hearing, timing cues available from natural head-width delays do not offer binaural advantages with present methods of electrical stimulation, even when fine-timing cues are explicitly coded.
Abstract: Four adult bilateral cochlear implant users, with good open-set sentence recognition, were tested with three different sound coding strategies for binaural speech unmasking and their ability to localize 100 and 500 Hz click trains in noise. Two of the strategies tested were envelope-based strategies that are clinically widely used. The third was a research strategy that additionally preserved fine-timing cues at low frequencies. Speech reception thresholds were determined in diotic noise for diotic and interaurally time-delayed speech using direct audio input to a bilateral research processor. Localization in noise was assessed in the free field. Overall results, for both speech and localization tests, were similar with all three strategies. None provided a binaural speech unmasking advantage due to the application of 700 μs interaural time delay to the speech signal, and localization results showed similar response patterns across strategies that were well accounted for by the use of broadband interaural level cues. The data from both experiments combined indicate that, in contrast to normal hearing, timing cues available from natural head-width delays do not offer binaural advantages with present methods of electrical stimulation, even when fine-timing cues are explicitly coded.
75 citations
17 Oct 1999
TL;DR: The thresholds from this algorithm are compared to those produced from a clean speech estimate from a variety of common spectral subtraction algorithms, and the relationship between those from the corrupted speech and corrupting noise is examined.
Abstract: We propose a new method for the estimation of clean speech masking thresholds for speech enhancement. These thresholds are applied to a perceptually based spectral subtraction algorithm to enhance speech in a non-stationary noise environment. In contrast to other approaches, we do not directly use an estimate of the clean speech to obtain the masking thresholds, but instead exploit the relationship between the thresholds of the corrupted speech and those of the corrupting noise. The thresholds from this algorithm are compared to those produced from a clean speech estimate by a variety of common spectral subtraction algorithms.
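The paper's threshold estimation differs from classic spectral subtraction, but the baseline it is compared against can be sketched as plain magnitude spectral subtraction. A minimal NumPy sketch, assuming rectangular non-overlapping frames and an available noise-only segment; the parameter names and values are illustrative, not the paper's:

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame_len=256, over=2.0, floor=0.01):
    """Magnitude spectral subtraction, frame by frame (no overlap, for brevity).

    noisy     : noisy speech signal
    noise_est : noise-only segment used to estimate the noise spectrum
    over      : over-subtraction factor
    floor     : spectral floor, limits "musical noise" artifacts
    """
    # Average noise magnitude spectrum from the noise-only segment.
    n_frames = len(noise_est) // frame_len
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_est[i * frame_len:(i + 1) * frame_len]))
         for i in range(n_frames)], axis=0)

    out = np.zeros(len(noisy) // frame_len * frame_len)
    for i in range(len(noisy) // frame_len):
        frame = noisy[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        mag = np.abs(spec) - over * noise_mag        # subtract noise estimate
        mag = np.maximum(mag, floor * np.abs(spec))  # apply spectral floor
        # Reuse the noisy phase; only magnitudes are modified.
        out[i * frame_len:(i + 1) * frame_len] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), n=frame_len)
    return out
```

A perceptually based variant would shape the subtraction per frequency band according to the estimated masking thresholds rather than using a fixed over-subtraction factor.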
74 citations
02 Mar 2005
TL;DR: In this article, a method is presented for reducing noise disturbance in an audio signal received through a microphone; it begins by magnifying the noise disturbance relative to the remaining component of the audio signal.
Abstract: A method for reducing noise disturbance associated with an audio signal received through a microphone is provided. The method initiates with magnifying a noise disturbance of the audio signal relative to a remaining component of the audio signal. Then, a sampling rate of the audio signal is decreased. Next, an even order derivative is applied to the audio signal having the decreased sampling rate to define a detection signal. Then, the noise disturbance of the audio signal is adjusted according to a statistical average of the detection signal. A system capable of canceling disturbances associated with an audio signal, a video game controller, and an integrated circuit for reducing noise disturbances associated with an audio signal are included.
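The detection steps in the claim (decrease the sampling rate, apply an even-order derivative, compare against a statistical average) can be sketched roughly as follows. This is a hedged illustration only: all parameter names and values are my own assumptions, and it omits the initial "magnify the disturbance" step and the final adjustment stage:

```python
import numpy as np

def detect_noise_disturbance(signal, decim=4, threshold_scale=3.0):
    """Illustrative sketch of the patent's detection sequence.

    Decimates the signal, applies a second (i.e. even-order)
    derivative to emphasize impulsive disturbances, then flags
    samples exceeding a multiple of the detection signal's
    statistical average.
    """
    # Step 1: decrease the sampling rate (plain decimation here;
    # a real system would low-pass filter first).
    down = signal[::decim]
    # Step 2: second difference as the even-order derivative.
    detection = np.diff(down, n=2)
    # Step 3: threshold against a statistical average of the
    # detection signal (mean absolute value as the average).
    avg = np.mean(np.abs(detection))
    return np.abs(detection) > threshold_scale * avg
```

The even-order derivative responds strongly to click-like disturbances while remaining small for slowly varying signal content, which is what makes the average-based threshold usable.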
74 citations