Voice activity detection

About: Voice activity detection is a research topic. Over its lifetime, 12,784 publications have been published within this topic, receiving 272,632 citations. The topic is also known as speech activity detection and speech detection.

Papers published on a yearly basis (chart; horizontal axis: year, 1969 to 2023)

Papers

Posted Content

Automated Hate Speech Detection and the Problem of Offensive Language

Thomas Davidson¹, Dana Warmsley¹, Michael W. Macy¹, Ingmar Weber²

Cornell University¹, Khalifa University²

11 Mar 2017-arXiv: Computation and Language

TL;DR: This article used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and trained a multi-class classifier to distinguish hate speech from other offensive language, finding that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive.

Abstract: A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech and previous work using supervised learning has failed to distinguish between the two categories. We used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords. We use crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, only offensive language, and those with neither. We train a multi-class classifier to distinguish between these different categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.

871 citations

Journal Article

Digital Coding of Waveforms: Principles and Applications to Speech and Video

Nuggehally Sampath Jayant, P. Noll

01 Nov 1990-Signal Processing

869 citations

Posted Content

MUSAN: A Music, Speech, and Noise Corpus.

David Snyder, Guoguo Chen, Daniel Povey

28 Oct 2015-arXiv: Sound

TL;DR: This report introduces a new corpus of music, speech, and noise suitable for training models for voice activity detection (VAD) and music/speech discrimination and demonstrates use of this corpus on Broadcast news and VAD for speaker identification.

Abstract: This report introduces a new corpus of music, speech, and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. Our corpus is released under a flexible Creative Commons license. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises. We demonstrate use of this corpus for music/speech discrimination on Broadcast news and VAD for speaker identification.

855 citations

Journal Article

Speech enhancement using a soft-decision noise suppression filter

R. McAulay, M. Malpass¹

Massachusetts Institute of Technology¹

01 Apr 1980-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: In this paper, a spectral decomposition of a frame of noisy speech is used to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise.

Abstract: One way of enhancing speech in an additive acoustic noise environment is to perform a spectral decomposition of a frame of noisy speech and to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise. Using a two-state model for the speech event (speech absent or speech present) and using the maximum likelihood estimator of the magnitude of the speech spectrum results in a new class of suppression curves which permits a tradeoff of noise suppression against speech distortion. The algorithm has been implemented in real time in the time domain, exploiting the structure of the channel vocoder. Extensive testing has shown that the noise can be made imperceptible by proper choice of the suppression factor.
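
The central idea in this abstract, attenuating each spectral line according to how far the measured speech-plus-noise power exceeds the background-noise estimate, can be illustrated with a minimal sketch. The gain rule below is plain power subtraction with a floor, used here as a stand-in; it is not the maximum-likelihood suppression curve derived in the paper. The function name `suppression_gains` and the `floor` parameter are hypothetical.

```python
import math


def suppression_gains(noisy_power, noise_power, floor=0.1):
    """Per-line amplitude gains from power subtraction (illustrative stand-in).

    noisy_power: measured speech-plus-noise power for each spectral line.
    noise_power: estimated background-noise power for each spectral line.
    floor: assumed minimum gain, limiting how hard any line is attenuated.
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        if y <= 0.0:
            # No measured power: fall back to the minimum gain.
            gains.append(floor)
            continue
        # Fraction of the measured power attributed to speech, clipped at 0.
        speech_fraction = max(1.0 - n / y, 0.0)
        gains.append(max(math.sqrt(speech_fraction), floor))
    return gains


# Three spectral lines with the same noise estimate but different measured power.
gains = suppression_gains([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

In this sketch, a line whose measured power is well above the noise estimate is left nearly untouched, while lines at or below the noise estimate are held at the floor. Raising the floor gives less noise suppression but also less speech distortion, which is the kind of tradeoff the abstract describes.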

854 citations

Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging

Israel Cohen

01 Jan 2002

TL;DR: It is shown that in non-stationary noise environments and under low SNR conditions, the IMCRA approach is very effective: compared to a competitive method, it obtains a lower estimation error and, when integrated into a speech enhancement system, achieves improved speech quality and lower residual noise.

Abstract: Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper, we present an Improved Minima Controlled Recursive Averaging (IMCRA) approach, for noise estimation in adverse environments involving non-stationary noise, weak speech components, and low input signal-to-noise ratio (SNR). The noise estimate is obtained by averaging past spectral power values, using a time-varying frequency-dependent smoothing parameter that is adjusted by the signal presence probability. The speech presence probability is controlled by the minima values of a smoothed periodogram. The proposed procedure comprises two iterations of smoothing and minimum tracking. The first iteration provides a rough voice activity detection in each frequency band. Then, smoothing in the second iteration excludes relatively strong speech components, which makes the minimum tracking during speech activity robust. We show that in non-stationary noise environments and under low SNR conditions, the IMCRA approach is very effective. In particular, compared to a competitive method, it obtains a lower estimation error, and when integrated into a speech enhancement system achieves improved speech quality and lower residual noise.
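
The recursive-averaging step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's full procedure: the speech presence probability is taken as a given input, and the two iterations of smoothing and minimum tracking that produce it are not shown. The function name, the form of the smoothing parameter, and the `alpha_d` value are assumptions made for illustration.

```python
def update_noise_estimate(noise_est, noisy_power, speech_prob, alpha_d=0.85):
    """One frame of probability-controlled recursive averaging, per frequency bin.

    noise_est: previous noise power estimate for each bin.
    noisy_power: spectral power of the current frame for each bin.
    speech_prob: speech presence probability for each bin, in [0, 1].
    alpha_d: assumed base smoothing constant (illustrative value).
    """
    updated = []
    for lam, y, p in zip(noise_est, noisy_power, speech_prob):
        # The smoothing parameter moves toward 1 as speech becomes more likely,
        # so the estimate is held nearly fixed while speech is present.
        alpha = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha * lam + (1.0 - alpha) * y)
    return updated


# Two bins with identical inputs: speech is absent in the first, present in the second.
est = update_noise_estimate([1.0, 1.0], [3.0, 3.0], [0.0, 1.0], alpha_d=0.5)
```

With these inputs, the speech-absent bin moves halfway toward the new power value, while the speech-present bin keeps its previous estimate unchanged.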

834 citations

Network Information

Performance

Metrics: 13,169 papers, 294,726 citations

No. of papers in the topic in previous years
Year	Papers
2023	121
2022	266
2021	301
2020	300
2019	262
2018	238
