Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published on this topic, receiving 271,964 citations.


[Figure: Papers published on a yearly basis, 1969 to 2023]

Papers


Patent • DOI

Speech coding/decoding method having an excitation signal

Kazunori Ozawa (NEC)

20 Jul 1990, Journal of the Acoustical Society of America

TL;DR: A speech coding method is presented in which spectrum parameters representing a spectrum envelope and a pitch parameter representing the pitch are obtained from an input discrete speech signal, and a frame interval is divided into subintervals in accordance with the pitch parameter.

Abstract: A speech coding method is described in which spectrum parameters representing a spectrum envelope and a pitch parameter representing the pitch are obtained from an input discrete speech signal. A frame interval is divided into subintervals in accordance with the pitch parameter. The sound source signal in one of the subintervals is obtained by computing a multipulse sequence for a difference signal, which is produced by prediction based on the past sound source signal. Correction information for correcting at least one of the amplitude and the phase of the sound source signal is obtained and output for the other pitch intervals in the frame.
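The excitation steps in this abstract can be illustrated with a simplified sketch. It assumes a known pitch period `pitch` in samples, splits the frame into pitch-length subintervals, forms the difference between a target subinterval and a gain-scaled copy of the past excitation delayed by one pitch period, and then keeps the largest-magnitude samples as pulses. The function names, the greedy pulse selection and the omission of a weighted synthesis filter (which real multipulse coders search through) are all assumptions made for brevity; the amplitude/phase correction for the other pitch intervals is not shown.

```python
from typing import List, Tuple


def split_by_pitch(frame_len: int, pitch: int) -> List[Tuple[int, int]]:
    """Divide [0, frame_len) into consecutive subintervals one pitch period long;
    the last subinterval may be shorter."""
    if pitch <= 0:
        raise ValueError("pitch period must be positive")
    return [(s, min(s + pitch, frame_len)) for s in range(0, frame_len, pitch)]


def pitch_prediction_residual(target: List[float], past_exc: List[float],
                              pitch: int, gain: float) -> List[float]:
    """Difference between the target and gain * (past excitation delayed by one pitch period)."""
    if len(target) > pitch or len(past_exc) < pitch:
        raise ValueError("need len(target) <= pitch <= len(past_exc)")
    start = len(past_exc) - pitch  # index of the sample exactly one period back
    return [t - gain * past_exc[start + n] for n, t in enumerate(target)]


def greedy_multipulse(residual: List[float], num_pulses: int) -> List[Tuple[int, float]]:
    """Keep the num_pulses largest-magnitude samples as (position, amplitude) pulses.
    Simplification (assumption): no synthesis or perceptual weighting filter."""
    order = sorted(range(len(residual)), key=lambda n: abs(residual[n]), reverse=True)
    return sorted((n, residual[n]) for n in order[:num_pulses])


if __name__ == "__main__":
    print(split_by_pitch(160, 50))  # [(0, 50), (50, 100), (100, 150), (150, 160)]
    print(greedy_multipulse([0.1, -0.9, 0.3, 0.5], 2))  # [(1, -0.9), (3, 0.5)]
```

Picking pulses directly from the residual keeps the sketch short; an analysis-by-synthesis search would instead re-evaluate the filtered error after each pulse is placed.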

183 citations

Patent

Multi-channel audio decoder

Stephen M. Smyth, Michael H. Smyth, William Paul Smith

16 Dec 1997

TL;DR: A subband audio coder employs perfect/non-perfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean-square-error (MMSE) bit allocation over time, frequency and the multiple audio channels to encode and decode a data stream, producing high-fidelity reconstructed audio.

Abstract: A subband audio coder employs perfect/non-perfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean-square-error (MMSE) bit allocation over time, frequency and the multiple audio channels to encode/decode a data stream to generate high-fidelity reconstructed audio. The audio coder windows the multi-channel audio signal such that the frame size, i.e. the number of bytes, is constrained to lie in a desired range, and formats the encoded data so that the individual subframes can be played back as they are received, thereby reducing latency. Furthermore, the audio coder processes the baseband portion (0-24 kHz) of the audio bandwidth for sampling frequencies of 48 kHz and higher with the same encoding/decoding algorithm, so that the audio coder architecture is future-compatible.
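The MMSE bit allocation mentioned above can be illustrated with a generic textbook version, not the patent's own allocator (its psycho-acoustic mode and cross-channel handling are not reproduced). The sketch assumes the high-rate model in which subband k with variance `var_k` and `b` bits has distortion `var_k * 2**(-2*b)`, so each extra bit cuts that subband's distortion by a factor of 4 (about 6.02 dB). The function name `greedy_bit_allocation` is hypothetical.

```python
import heapq
from typing import List


def greedy_bit_allocation(variances: List[float], total_bits: int) -> List[int]:
    """Distribute integer bits over subbands to minimise total MSE under the
    assumed high-rate model D_k(b) = var_k * 2**(-2*b).

    Giving one bit to subband k reduces its distortion by 3/4 of its current
    value, so the greedy choice is always the subband with the largest
    current distortion."""
    if not variances:
        return []
    bits = [0] * len(variances)
    # heapq is a min-heap, so store negated distortions to pop the largest first.
    heap = [(-v, k) for k, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        neg_d, k = heapq.heappop(heap)
        bits[k] += 1
        # One more bit divides this subband's distortion by 4.
        heapq.heappush(heap, (neg_d / 4.0, k))
    return bits
```

For example, with subband variances `[16.0, 4.0, 1.0]` and 3 bits, the sketch returns `[2, 1, 0]`. A heap keeps each step at O(log K) for K subbands, which matters when allocating across many subbands and channels at once.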

183 citations

Proceedings Article • DOI

Audio-visual deep learning for noise robust speech recognition

Jing Huang, Brian Kingsbury (IBM)

26 May 2013

TL;DR: This work uses DBNs for audio-visual speech recognition, applying deep learning to audio and visual features for noise-robust recognition, and tests two methods for using DBNs in a multimodal setting.

Abstract: Deep belief networks (DBNs) have shown impressive improvements over Gaussian mixture models for automatic speech recognition. In this work we use DBNs for audio-visual speech recognition; in particular, we use deep learning from audio and visual features for noise-robust speech recognition. We test two methods for using DBNs in a multimodal setting: a conventional decision fusion method that combines scores from single-modality DBNs, and a novel feature fusion method that operates on mid-level features learned by the single-modality DBNs. On a continuously spoken digit recognition task, our experiments show that these methods can reduce word error rate by as much as 21% relative over a baseline multi-stream audio-visual GMM/HMM system.
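The two fusion strategies named in this abstract differ in where the modalities are combined. The sketch below is a toy illustration under stated assumptions: decision fusion is modeled as a weighted sum of per-class log-posteriors with a hypothetical `audio_weight` (the abstract does not say how the paper weights its scores), and feature fusion is reduced to concatenating mid-level feature vectors that a joint model would then be trained on. No DBN is implemented here.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores into log-posteriors (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def decision_fusion(audio_scores: Sequence[float], visual_scores: Sequence[float],
                    audio_weight: float = 0.5) -> List[float]:
    """Late fusion: weighted sum of per-class log-posteriors from two
    single-modality models (weighting scheme is an assumption)."""
    a = log_softmax(audio_scores)
    v = log_softmax(visual_scores)
    return [audio_weight * x + (1.0 - audio_weight) * y for x, y in zip(a, v)]


def feature_fusion_input(audio_hidden: Sequence[float],
                         visual_hidden: Sequence[float]) -> List[float]:
    """Mid-level fusion: concatenate hidden-layer features as the input to a joint model."""
    return list(audio_hidden) + list(visual_hidden)


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


if __name__ == "__main__":
    # Noisy audio gives nearly flat scores; the visual stream is confident about class 2.
    fused = decision_fusion([1.0, 1.1, 0.9], [0.0, 0.0, 5.0])
    print(argmax(fused))  # 2
```

In noise, a flat audio posterior contributes little to the fused scores, which is why late fusion can fall back on the visual stream; feature fusion instead lets a single model learn cross-modal interactions.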

182 citations

Journal Article • DOI

Theoretical analysis of the high-rate vector quantization of LPC parameters

William R. Gardner (Qualcomm), Bhaskar D. Rao

01 Sep 1995, IEEE Transactions on Speech and Audio Processing

TL;DR: A theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures is presented, and the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems is described.

Abstract: The paper presents a theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures, and describes the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems. First, it is shown that in many high-rate VQ systems the quantization distortion approaches a simple quadratically weighted error measure, where the weighting matrix is a "sensitivity matrix" that extends the concept of scalar sensitivity. The approximate performance of VQ systems that train and quantize using mismatched distortion measures is derived, and is used to construct better distortion measures. Second, these results are used to determine the performance of LPC vector quantizers, as measured by the log spectral distortion (LSD) measure, when they have been trained using other error measures, such as mean-squared error (MSE) or weighted mean-squared error (WMSE) measures of LPC parameters, reflection coefficients and transforms thereof, and line spectral pair (LSP) frequencies. Computationally efficient algorithms for computing the sensitivity matrices of these parameters are described. In particular, it is shown that the sensitivity matrix for the LSP frequencies is diagonal, implying that a WMSE measure on the LSP frequencies converges to the LSD measure in high-rate VQ systems. Experimental results supporting the theoretical performance estimates are provided.
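The abstract's key consequence is practical: if the high-rate distortion approaches the quadratic form (x − x̂)ᵀ W (x − x̂) and W is diagonal for LSP frequencies, then an LSP codebook can be searched with a per-coordinate weighted squared error. The sketch below shows only that search step. The weight vector `w` is supplied by the caller as a hypothetical stand-in for the diagonal of the sensitivity matrix; computing it from the LPC model, as the paper describes, is not shown, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def weighted_sq_error(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """(x - y)^T W (x - y) for a diagonal weighting matrix W = diag(w)."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))


def quantize_wmse(x: Sequence[float], codebook: List[Sequence[float]],
                  w: Sequence[float]) -> Tuple[int, float]:
    """Full-search VQ: return (index, distortion) of the codeword nearest to x
    under the diagonal WMSE. The weights w are assumed, not derived here."""
    best = min(range(len(codebook)), key=lambda i: weighted_sq_error(x, codebook[i], w))
    return best, weighted_sq_error(x, codebook[best], w)


if __name__ == "__main__":
    codebook = [[1.0, 0.0], [0.0, 1.0]]
    # Heavier weight on the first coordinate makes errors there more costly.
    print(quantize_wmse([0.0, 0.0], codebook, [10.0, 1.0]))  # (1, 1.0)
```

Because W is diagonal, each distance costs O(p) for a p-dimensional LSP vector, the same as plain MSE; a full (non-diagonal) sensitivity matrix would raise that to O(p²).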

182 citations

Proceedings Article • DOI

Robust spread-spectrum audio watermarking

Darko Kirovski, Henrique S. Malvar (Microsoft)

07 May 2001

TL;DR: In this paper, the authors present several mechanisms that enable effective spread-spectrum audio watermarking systems: prevention against detection desynchronization, cepstrum filtering, and chess watermarks.

Abstract: We present several mechanisms that enable effective spread-spectrum audio watermarking systems: prevention against detection desynchronization, cepstrum filtering, and chess watermarks. We have incorporated these techniques into a system capable of reliably detecting a watermark in an audio clip that has been modified using a composition of attacks that degrade the original audio characteristics well beyond the limit of acceptable quality. Such attacks include: fluctuating scaling in the time and frequency domain, compression, addition and multiplication of noise, resampling, requantization, normalization, filtering, and random cutting and pasting of signal samples.
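The basic spread-spectrum principle behind this work can be sketched in a few lines: a key-derived pseudo-random ±1 sequence is added to the host at low amplitude, and detection correlates the received signal with the same sequence. This is a generic illustration, not the authors' system: the desynchronization prevention, cepstrum filtering and chess watermarks described above are not implemented, the additive time-domain embedding is a simplification chosen for brevity, and the Gaussian "host" is a synthetic stand-in for audio. All names and parameters (`strength`, `threshold`, the seeds) are hypothetical.

```python
import random
from typing import List


def make_chips(n: int, key: int) -> List[int]:
    """Pseudo-random +/-1 spreading sequence generated from a secret integer key."""
    rng = random.Random(key)
    return [1 if rng.random() < 0.5 else -1 for _ in range(n)]


def embed(host: List[float], chips: List[int], strength: float) -> List[float]:
    """Add the scaled spreading sequence to the host samples."""
    return [h + strength * c for h, c in zip(host, chips)]


def correlate(signal: List[float], chips: List[int]) -> float:
    """Mean of signal * chips: near `strength` if marked with these chips, near 0 otherwise."""
    return sum(s * c for s, c in zip(signal, chips)) / len(chips)


def detect(signal: List[float], chips: List[int], threshold: float) -> bool:
    """Declare the watermark present when the correlation exceeds the threshold."""
    return correlate(signal, chips) > threshold


if __name__ == "__main__":
    n, strength = 10_000, 0.1
    host_rng = random.Random(7)
    host = [host_rng.gauss(0.0, 1.0) for _ in range(n)]  # synthetic stand-in for audio
    chips = make_chips(n, key=42)
    marked = embed(host, chips, strength)
    print(detect(marked, chips, strength / 2), detect(host, chips, strength / 2))
```

With unit-variance host samples, the host's contribution to the correlation has a standard deviation of about 1/√n, so longer sequences allow a weaker (less audible) watermark for the same detection margin.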

182 citations

Metrics

Papers: 14,368
Citations: 279,843

Number of papers on this topic in recent years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108
