Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Proceedings Article
TL;DR: A digital audio watermarking scheme of low complexity is proposed in this research as an effective way to deter users from misusing or illegally distributing audio data.
Abstract: Digital audio watermarking embeds inaudible information into digital audio data for the purposes of copyright protection, ownership verification, covert communication, and/or auxiliary data carrying. In this paper, we first describe the desirable characteristics of digital audio watermarks. Previous work on audio watermarking, which has primarily focused on the inaudibility of the embedded watermark and its robustness against attacks such as compression and noise, is then reviewed. In this research, special attention is paid to the synchronization attack caused by casual audio editing or malicious random cropping, which is a low-cost yet effective attack against previously developed watermarking algorithms. A digital audio watermarking scheme of low complexity is proposed in this research as an effective way to deter users from misusing or illegally distributing audio data. The proposed scheme is based on audio content analysis using the wavelet filterbank, while the watermark is embedded in the Fourier transform domain. A blind watermark detection technique is developed to identify the embedded watermark under various types of attacks. © (2000) COPYRIGHT SPIE--The International Society for Optical Engineering.

62 citations

Journal Article
TL;DR: The authors describe a novel approach to speech recognition by directly modeling the statistical characteristics of the speech waveforms, which allows them to remove the need for using speech preprocessors, which conventionally serve a role of converting speech waves into frame-based speech data subject to a subsequent modeling process.
Abstract: The authors describe a novel approach to speech recognition by directly modeling the statistical characteristics of the speech waveforms. This approach allows them to remove the need for using speech preprocessors, which conventionally serve a role of converting speech waveforms into frame-based speech data subject to a subsequent modeling process. Central to their method is the representation of the speech waveforms as the output of a time-varying filter excited by a Gaussian source time-varying in its power. In order to formulate a speech recognition algorithm based on this representation, the time variation in the characteristics of the filter and of the excitation source is described in a compact and parametric form of the Markov chain. They analyze in detail the comparative roles played by the filter modeling and by the source modeling in speech recognition performance. Based on the result of the analysis, they propose and evaluate a normalization procedure intended to remove the sensitivity of speech recognition accuracy to often uncontrollable speech power variations. The effectiveness of the proposed speech-waveform modeling approach is demonstrated in a speaker-dependent, discrete-utterance speech recognition task involving 18 highly confusable stop consonant-vowel syllables. The high accuracy obtained shows the promising potential of the proposed time-domain waveform modeling technique for speech recognition.

62 citations

Proceedings Article
20 Jun 1999
TL;DR: Novel solutions for pre-processing noisy speech prior to low bit rate speech coding using a new adaptive limiting algorithm for the a priori signal-to-noise ratio (SNR) estimate and a novel overlap/add scheme are presented.
Abstract: In this paper we present novel solutions for pre-processing noisy speech prior to low bit rate speech coding. We strive especially to improve the estimation of spectral parameters and to reduce the additional algorithmic delay caused by the enhancement pre-processor. While the former is achieved using a new adaptive limiting algorithm for the a priori signal-to-noise ratio (SNR) estimate, the latter makes use of a novel overlap/add scheme. Our enhancement techniques were evaluated in conjunction with the 2400 bps mixed excitation linear prediction (MELP) coder by means of formal and informal listening tests.

62 citations
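
The abstract above mentions limiting the a priori SNR estimate but does not give the algorithm. As a rough illustration only, the sketch below uses the widely known decision-directed estimator with a fixed lower limit as a stand-in; the paper's limit is adaptive, and the function names, the smoothing factor `alpha`, and the floor `xi_min` are all assumptions made for this example.

```python
def a_priori_snr(noisy_power, noise_power, prev_clean_power,
                 alpha=0.98, xi_min=0.1):
    """Decision-directed a priori SNR estimate for one frequency bin.

    alpha and xi_min are illustrative values, not taken from the paper.
    """
    # A posteriori SNR: observed noisy power relative to the noise estimate.
    gamma = noisy_power / noise_power
    # Instantaneous estimate, clipped at zero.
    instantaneous = max(gamma - 1.0, 0.0)
    # Blend the previous frame's clean-speech estimate with the new one.
    xi = alpha * (prev_clean_power / noise_power) + (1.0 - alpha) * instantaneous
    # Fixed lower limit (the paper adapts this limit; here it is constant).
    return max(xi, xi_min)


def wiener_gain(xi):
    """Spectral gain applied to the noisy bin for a given a priori SNR."""
    return xi / (1.0 + xi)
```

A higher floor leaves more residual noise but tends to reduce musical-noise artifacts, which is one reason a limit on this estimate matters ahead of a low bit rate coder.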

Patent
Eiji Kawahara
14 Jun 1999
TL;DR: An audio coding method is provided that creates high-quality coded data with no discontinuity in real time, regardless of the processing ability of the CPU on a personal computer or of how much CPU time other applications occupy, in a scheme in which a digital audio signal is divided into plural frequency bands and a coding process is performed for each subband.
Abstract: There is provided an audio coding method capable of creating high-quality coded data with no discontinuity in real time, without being affected by the processing ability of the CPU on a personal computer or by how much CPU time other applications occupy, in a scheme in which a digital audio signal is divided into plural frequency bands and a coding process is performed for each subband. To generate bit allocation information for each of the plural frequency subbands into which the digital audio signal is divided, two processes are employed: one that performs bit allocation with high efficiency using a signal-to-mask relationship based on a predetermined psychoacoustic model, and one that performs bit allocation with a lower load. The bit allocation means to be used is changed according to information on the amount of CPU processing occupied by the coding process.

62 citations
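
The patent abstract above describes switching between a psychoacoustic bit allocation and a lower-load one depending on CPU usage. The sketch below is one hypothetical way to express that idea; the greedy signal-to-mask allocator, the even-split fallback, the 6.02 dB-per-bit rule of thumb, and the `load_threshold` parameter are assumptions for illustration and are not taken from the patent.

```python
def allocate_bits_smr(smr_db, total_bits, max_bits_per_band=15):
    """Greedy allocation driven by signal-to-mask ratios (in dB) per subband."""
    bits = [0] * len(smr_db)
    nmr = list(smr_db)  # remaining noise-to-mask ratio per band
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits_per_band]
        if not candidates:
            break
        # Give the next bit to the band whose noise is most audible.
        i = max(candidates, key=lambda k: nmr[k])
        bits[i] += 1
        nmr[i] -= 6.02  # each bit lowers quantization noise by about 6 dB
    return bits


def allocate_bits_even(num_bands, total_bits):
    """Low-load fallback: split bits evenly, remainder to the lowest bands."""
    base, extra = divmod(total_bits, num_bands)
    return [base + (1 if i < extra else 0) for i in range(num_bands)]


def allocate(smr_db, total_bits, cpu_load, load_threshold=0.8):
    """Pick the allocator from the current CPU load (0.0 to 1.0)."""
    if cpu_load < load_threshold:
        return allocate_bits_smr(smr_db, total_bits)
    return allocate_bits_even(len(smr_db), total_bits)
```

With `cpu_load` below the threshold the psychoacoustic path is used; above it, the encoder skips the mask computation entirely and falls back to the cheap split, trading coding efficiency for uninterrupted real-time output.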

Journal Article
TL;DR: An overview of various nonlinear processing techniques applied to speech signals is presented, along with their applications to speech coding, speech synthesis, speech and speaker recognition, voice analysis and enhancement, and analyses and simulation of dysphonic voices.
Abstract: This article presents an overview of various nonlinear processing techniques applied to speech signals. Evidence relating to the existence of nonlinearities in speech is presented, and the main differences between linear and nonlinear analysis are summarized. A brief review is given of the important nonlinear speech processing techniques reported to date, and their applications to speech coding, speech synthesis, speech and speaker recognition, voice analysis and enhancement, and analyses and simulation of dysphonic voices.

62 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  38
2022  84
2021  70
2020  62
2019  77
2018  108