Topic
Voice activity detection
About: Voice activity detection is a research topic. Over its lifetime, 12784 publications have been published within this topic, receiving 272632 citations. The topic is also known as speech activity detection and speech detection.
Papers
•
TL;DR: This article used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and trained a multi-class classifier to distinguish hate speech from other offensive language, finding that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive.
Abstract: A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech and previous work using supervised learning has failed to distinguish between the two categories. We used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords. We use crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, only offensive language, and those with neither. We train a multi-class classifier to distinguish between these different categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.
871 citations
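The precision problem described in the abstract above can be illustrated with a minimal sketch. It assumes a purely lexical detector that flags every message containing a lexicon term, then measures how many flagged messages are actually labelled as hate speech. The lexicon entries, the sample messages, and the function names are hypothetical placeholders, not the authors' data or code.

```python
from typing import List, Tuple

# Hypothetical placeholder lexicon; a real one would be crowd-sourced.
LEXICON = {"term_a", "term_b"}


def lexical_flag(message: str) -> bool:
    """Flag a message if any whitespace-separated token is in the lexicon."""
    return any(token in LEXICON for token in message.lower().split())


def precision_for_hate(samples: List[Tuple[str, str]]) -> float:
    """Fraction of lexically flagged messages whose label is 'hate'."""
    flagged_labels = [label for text, label in samples if lexical_flag(text)]
    if not flagged_labels:
        return 0.0
    hits = sum(1 for label in flagged_labels if label == "hate")
    return hits / len(flagged_labels)


# Made-up toy samples using the three categories named in the abstract.
SAMPLES = [
    ("message containing term_a", "hate"),
    ("joking use of term_a", "offensive"),
    ("term_b inside a quotation", "offensive"),
    ("nothing notable here", "neither"),
]

print(precision_for_hate(SAMPLES))
```

In this toy set the detector flags three messages but only one carries the hate label, which is the kind of gap a trained multi-class classifier is meant to narrow.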
••
869 citations
•
TL;DR: This report introduces a new corpus of music, speech, and noise suitable for training models for voice activity detection (VAD) and music/speech discrimination, and demonstrates use of this corpus for music/speech discrimination on Broadcast news and VAD for speaker identification.
Abstract: This report introduces a new corpus of music, speech, and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. Our corpus is released under a flexible Creative Commons license. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises. We demonstrate use of this corpus for music/speech discrimination on Broadcast news and VAD for speaker identification.
855 citations
••
TL;DR: In this paper, a spectral decomposition of a frame of noisy speech is used to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise.
Abstract: One way of enhancing speech in an additive acoustic noise environment is to perform a spectral decomposition of a frame of noisy speech and to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise. Using a two-state model for the speech event (speech absent or speech present) and using the maximum likelihood estimator of the magnitude of the speech spectrum results in a new class of suppression curves which permits a tradeoff of noise suppression against speech distortion. The algorithm has been implemented in real time in the time domain, exploiting the structure of the channel vocoder. Extensive testing has shown that the noise can be made imperceptible by proper choice of the suppression factor.
854 citations
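The idea of attenuating each spectral line according to how far the measured power exceeds the noise estimate can be sketched as follows. This is a simplified power-subtraction rule with a gain floor, swapped in for illustration; it is not the maximum-likelihood suppression curve derived in the paper. The function name and the `floor` parameter are assumptions.

```python
import math
from typing import List


def suppression_gains(noisy_power: List[float],
                      noise_power: List[float],
                      floor: float = 0.1) -> List[float]:
    """Per-bin amplitude gain from a simple power-subtraction rule.

    gain = sqrt(max(1 - noise / noisy, 0)), clamped below by `floor`.
    `floor` (an assumed parameter) limits how hard a bin is attenuated,
    trading noise suppression against speech distortion.
    """
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        if p_noisy <= 0.0:
            # No measurable energy in this bin: apply maximum attenuation.
            gains.append(floor)
            continue
        ratio = max(1.0 - p_noise / p_noisy, 0.0)
        gains.append(max(math.sqrt(ratio), floor))
    return gains


# Bins well above the noise estimate pass almost unchanged; bins at or
# below it are pushed down to the floor.
print(suppression_gains([4.0, 1.0, 0.5], [1.0, 1.0, 1.0]))
```

Raising `floor` leaves more residual noise but distorts speech less, which mirrors the tradeoff the abstract attributes to the suppression factor.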
01 Jan 2002
TL;DR: It is shown that in nonstationary noise environments and under low SNR conditions, the IMCRA approach is very effective: compared to a competitive method, it obtains a lower estimation error, and when integrated into a speech enhancement system, it achieves improved speech quality and lower residual noise.
Abstract: Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper, we present an Improved Minima Controlled Recursive Averaging (IMCRA) approach, for noise estimation in adverse environments involving non-stationary noise, weak speech components, and low input signal-to-noise ratio (SNR). The noise estimate is obtained by averaging past spectral power values, using a time-varying frequency-dependent smoothing parameter that is adjusted by the signal presence probability. The speech presence probability is controlled by the minima values of a smoothed periodogram. The proposed procedure comprises two iterations of smoothing and minimum tracking. The first iteration provides a rough voice activity detection in each frequency band. Then, smoothing in the second iteration excludes relatively strong speech components, which makes the minimum tracking during speech activity robust. We show that in non-stationary noise environments and under low SNR conditions, the IMCRA approach is very effective. In particular, compared to a competitive method, it obtains a lower estimation error, and when integrated into a speech enhancement system achieves improved speech quality and lower residual noise.
834 citations
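The recursive averaging step described in the abstract above, where the smoothing parameter is adjusted by the speech presence probability, might look like the sketch below. It covers only that one update and assumes the per-bin speech presence probability is already available; the two iterations of smoothing and minimum tracking that produce it are omitted. The function name and the default `alpha_d` value are assumptions for illustration.

```python
from typing import List


def update_noise_estimate(noise_psd: List[float],
                          noisy_power: List[float],
                          speech_prob: List[float],
                          alpha_d: float = 0.85) -> List[float]:
    """One frame of probability-controlled recursive averaging.

    For each frequency bin:
        alpha_tilde = alpha_d + (1 - alpha_d) * p
        new_noise   = alpha_tilde * old_noise + (1 - alpha_tilde) * |Y|^2
    where p is the speech presence probability for that bin. When p is
    close to 1 the estimate is frozen; when p is close to 0 it tracks
    the noisy power with base smoothing `alpha_d` (assumed default).
    """
    updated = []
    for old_noise, power, p in zip(noise_psd, noisy_power, speech_prob):
        alpha_tilde = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha_tilde * old_noise + (1.0 - alpha_tilde) * power)
    return updated


# First bin: speech certainly present, so the estimate is left unchanged.
# Second bin: speech certainly absent, so the estimate moves toward 3.0.
print(update_noise_estimate([1.0, 1.0], [3.0, 3.0], [1.0, 0.0]))
```

Tying the smoothing parameter to the presence probability keeps strong speech frames from leaking into the noise estimate while still letting the estimate follow non-stationary noise during pauses.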