Proceedings Article
Leveraging LSTM Models for Overlap Detection in Multi-Party Meetings
Neeraj N Sajjan, Shobhana Ganesh, Neeraj Sharma, Sriram Ganapathy, Neville Ryant
pp. 5249-5253
TLDR
This paper proposes detecting overlap segments using a neural network architecture consisting of long short-term memory (LSTM) models, which learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments.
Abstract:
The detection of overlapping speech segments is of key importance in speech applications involving the analysis of multi-party conversations. The detection problem is challenging because overlapping speech segments are typically captured as short speech utterances in far-field microphone recordings. In this paper, we propose the detection of overlap segments using a neural network architecture consisting of long short-term memory (LSTM) models. The neural network learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments. To evaluate the model performance, we perform experiments on simulated overlapped speech generated from the TIMIT database and on natural multi-talker conversational speech in the Augmented Multi-party Interaction (AMI) meeting corpus. The proposed approach yields improvements over a Gaussian mixture model based overlap detection system. Furthermore, as an application of overlap detection, integrating it into a speaker diarization system is shown to improve the diarization error rate.
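The abstract does not give layer sizes, input features or training details. As a minimal sketch, assuming log-mel-like spectral frames as input, a single unidirectional LSTM layer and a per-frame sigmoid output (all sizes, the gate ordering and the random untrained weights are hypothetical, not the authors' configuration), the forward pass of a frame-level overlap detector looks like this in NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b):
    """Run one LSTM layer over a sequence.

    x: (T, D) input frames; W: (D, 4H); U: (H, 4H); b: (4H,).
    Gate order in the stacked weights (a convention of this sketch):
    input, forget, cell candidate, output. Returns hidden states (T, H).
    """
    T, H = x.shape[0], U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    hs = np.zeros((T, H))
    for t in range(T):
        z = x[t] @ W + h @ U + b
        i = sigmoid(z[:H])              # input gate
        f = sigmoid(z[H:2 * H])         # forget gate
        g = np.tanh(z[2 * H:3 * H])     # candidate cell update
        o = sigmoid(z[3 * H:])          # output gate
        c = f * c + i * g               # cell state carries long-range context
        h = o * np.tanh(c)
        hs[t] = h
    return hs

def overlap_posteriors(x, params):
    """Per-frame probability that the frame contains overlapping speech."""
    hs = lstm_forward(x, params["W"], params["U"], params["b"])
    return sigmoid(hs @ params["w_out"] + params["b_out"])

def init_params(D, H, rng):
    # Hypothetical random initialisation; a real detector would be trained,
    # e.g. with binary cross-entropy against frame-level overlap labels.
    return {
        "W": rng.normal(0, 0.1, (D, 4 * H)),
        "U": rng.normal(0, 0.1, (H, 4 * H)),
        "b": np.zeros(4 * H),
        "w_out": rng.normal(0, 0.1, H),
        "b_out": 0.0,
    }

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 40))   # 200 synthetic frames of 40-dim features
params = init_params(D=40, H=64, rng=rng)
p_overlap = overlap_posteriors(features, params)
decisions = p_overlap > 0.5             # the threshold is a tunable operating point
```

The recurrence is what lets the model use context across neighbouring frames rather than classifying each spectral frame in isolation, as a frame-wise GMM classifier would.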
Citations
Book Chapter
Detection of Overlapping Speech for the Purposes of Speaker Diarization
TL;DR: This paper employs a convolutional neural network for the detection of overlapping speech intervals and evaluates it in terms of the potential improvements to speaker diarization.
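One way to see why overlap detection can improve diarization: a system that outputs one speaker per frame necessarily misses the second speaker in every overlapped frame. The sketch below computes a simplified frame-level diarization error rate (DER = missed + false alarm + speaker confusion, divided by total reference speaker time). It assumes per-frame sets of speaker labels that are already mapped between reference and hypothesis. Real scoring tools such as NIST's md-eval script also search for the optimal speaker mapping and apply a forgiveness collar, which this sketch omits.

```python
def frame_der(reference, hypothesis):
    """Simplified frame-level DER.

    reference, hypothesis: equal-length lists with one set of speaker
    labels per frame (labels assumed already mapped between the two).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        missed += max(0, n_ref - n_hyp)          # reference speakers with no hypothesis slot
        false_alarm += max(0, n_hyp - n_ref)     # extra hypothesised speakers
        confusion += min(n_ref, n_hyp) - n_correct  # slots filled with the wrong speaker
        total += n_ref
    return (missed + false_alarm + confusion) / total

# Speaker A talks, then A and B overlap for two frames, then B talks alone.
reference = [{"A"}] * 4 + [{"A", "B"}] * 2 + [{"B"}] * 4
single_speaker_hyp = [{"A"}] * 4 + [{"A"}] * 2 + [{"B"}] * 4
overlap_aware_hyp = [{"A"}] * 4 + [{"A", "B"}] * 2 + [{"B"}] * 4

print(round(frame_der(reference, single_speaker_hyp), 3))  # 0.167
print(frame_der(reference, overlap_aware_hyp))             # 0.0
```

In this toy case the single-speaker output loses 2 of 12 reference speaker-frames to missed speech. Labelling the detected overlap frames with a second speaker recovers them. A false overlap detection would instead add false-alarm time, so detector precision matters as much as recall.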
Proceedings Article
Detecting and counting overlapping speakers in distant speech scenarios
TL;DR: A Temporal Convolutional Network (TCN) based method is designed to address the problem of detecting the activity and counting overlapping speakers in distant-microphone recordings, and it is shown that TCNs significantly outperform state-of-the-art methods on two real-world distant speech datasets.
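The TL;DR does not describe the network configuration. A core ingredient of TCNs in general is a stack of dilated causal 1-D convolutions whose receptive field grows exponentially with depth. The minimal single-channel NumPy sketch below omits residual connections, nonlinearities and learned kernels. The doubling dilation schedule (1, 2, 4, ...) is a common choice assumed here, not taken from the cited paper.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation):
    """Single-channel dilated causal convolution:
    y[t] = sum_j kernel[j] * x[t - j*dilation], treating x as zero before t = 0."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for j in range(k):
        start = pad - j * dilation  # look j*dilation samples into the past
        y += kernel[j] * xp[start:start + len(x)]
    return y

def tcn_stack(x, kernel_size=3, n_layers=4):
    """Stack of dilated causal convolutions with dilations 1, 2, 4, ...
    All-ones kernels are used so the impulse response exposes the receptive field."""
    y = np.asarray(x, dtype=float)
    for layer in range(n_layers):
        y = dilated_causal_conv1d(y, np.ones(kernel_size), dilation=2 ** layer)
    return y

def receptive_field(kernel_size, n_layers):
    # Each layer extends the past context by (kernel_size - 1) * dilation samples,
    # and the dilations 1 + 2 + ... + 2**(n_layers - 1) sum to 2**n_layers - 1.
    return 1 + (kernel_size - 1) * (2 ** n_layers - 1)

impulse = np.zeros(64)
impulse[10] = 1.0
response = tcn_stack(impulse)
print(receptive_field(3, 4))  # 31
```

The impulse at frame 10 affects only frames 10 through 40 of the output: nothing earlier (causality), and exactly 31 frames in total. Four layers with kernel size 3 therefore already cover 31 frames of context, which is the appeal of dilation for frame-level speech activity and speaker counting.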
Proceedings Article
Enhancement and Analysis of Conversational Speech: JSALT 2017
Neville Ryant, Elika Bergelson, Kenneth Church, Alejandrina Cristia, Jun Du, Sriram Ganapathy, Sanjeev Khudanpur, Diana Kowalski, Mahesh Krishnamoorthy, Rajat Kulshreshta, Mark Liberman, Yu-Ding Lu, Matthew Maciejewski, Florian Metze, Jan Profant, Lei Sun, Yu Tsao, Zhou Yu
TL;DR: During the JSALT Summer Workshop at CMU in 2017, an international team of researchers worked on several aspects of this problem, including calibration of the state of the art, detection of overlaps, enhancement of noisy recordings, and classification of shorter speech segments.
Proceedings Article
End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform
TL;DR: An end-to-end framework for overlapped speech detection and speaker counting is proposed that extracts features directly from the raw waveform; a curriculum learning strategy is applied to make better use of the training data.
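The TL;DR names a curriculum learning strategy without details. One generic form of curriculum learning sorts training examples by a difficulty score and lets the pool the model trains on grow from easy to hard over epochs. The sketch below illustrates that idea only and is not the cited paper's schedule. For speaker counting, a plausible difficulty score might be the number of simultaneous speakers. That score, the linear pacing function and the hypothetical `curriculum_pool` helper are all assumptions.

```python
import numpy as np

def curriculum_pool(difficulty, epoch, n_epochs_to_full, start_fraction=0.3):
    """Indices of training examples available at a given epoch (hypothetical helper).

    difficulty: 1-D array, higher = harder (assumed score).
    The available fraction grows linearly from start_fraction at epoch 0
    to 1.0 at epoch n_epochs_to_full, and stays at 1.0 afterwards.
    """
    order = np.argsort(difficulty, kind="stable")  # easiest examples first
    frac = start_fraction + (1.0 - start_fraction) * min(epoch / n_epochs_to_full, 1.0)
    n = max(1, int(np.ceil(frac * len(order))))
    return order[:n]

# Assumed difficulty: number of simultaneous speakers in each training segment.
difficulty = np.array([2, 0, 1, 3, 0, 2, 1, 3, 1, 0])
sizes = [len(curriculum_pool(difficulty, e, n_epochs_to_full=3)) for e in range(5)]
for epoch in range(5):
    pool = curriculum_pool(difficulty, epoch, n_epochs_to_full=3)
    # ... train one epoch on the segments indexed by `pool` (omitted) ...
```

A stable sort keeps ties in their original order, so the pool at each epoch is reproducible.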
References
Journal Article
Visualizing Data using t-SNE
TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
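The reduced crowding comes from using a heavy-tailed Student t-distribution with one degree of freedom for similarities in the map: q_ij = (1 + ||y_i - y_j||^2)^-1, normalised over all pairs k != l. Moderately distant points then keep a non-negligible similarity and can be placed farther apart. A minimal NumPy sketch of that one quantity follows (not a full t-SNE optimiser, which also needs the high-dimensional affinities p_ij and gradient descent on KL(P || Q)):

```python
import numpy as np

def tsne_low_dim_affinities(Y):
    """Student-t (1 degree of freedom) joint similarities q_ij used by t-SNE.

    Y: (n, d) map coordinates, typically d = 2 or 3.
    Returns an (n, n) symmetric matrix with zero diagonal that sums to 1.
    """
    sq_norms = np.sum(Y ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negatives from round-off
    kernel = 1.0 / (1.0 + sq_dists)       # heavy-tailed: decays like 1/distance^2
    np.fill_diagonal(kernel, 0.0)         # q_ii is defined as 0
    return kernel / kernel.sum()

Y = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
Q = tsne_low_dim_affinities(Y)
```

For these three points, the pair at distance 3 gets an unnormalised similarity of 1/(1 + 9) = 0.1. A Gaussian kernel would give exp(-9), about 1.2e-4, at the same distance.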
Proceedings Article
Speech recognition with deep recurrent neural networks
TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
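In a deep RNN, the "multiple levels of representation" come from stacking recurrent layers: the hidden sequence of layer n becomes the input sequence of layer n+1, h^n_t = tanh(W^n h^(n-1)_t + U^n h^n_(t-1) + b^n). The paper itself uses LSTM (including bidirectional) layers. The sketch below uses plain tanh cells only to keep the stacking pattern visible, with hypothetical layer sizes and random, untrained weights.

```python
import numpy as np

def rnn_layer(x, W, U, b):
    """Simple tanh RNN over a sequence x of shape (T, D_in); returns (T, H)."""
    T, H = x.shape[0], U.shape[0]
    h = np.zeros(H)
    out = np.zeros((T, H))
    for t in range(T):
        h = np.tanh(x[t] @ W + h @ U + b)  # combine current input with past context
        out[t] = h
    return out

def deep_rnn(x, layers):
    """Stack layers: the full output sequence of each layer feeds the next."""
    h = x
    for W, U, b in layers:
        h = rnn_layer(h, W, U, b)
    return h

def make_layers(sizes, rng):
    """sizes = [D_in, H1, H2, ...]; random (untrained) weights for each layer."""
    return [
        (rng.normal(0, 0.1, (d_in, d_out)),
         rng.normal(0, 0.1, (d_out, d_out)),
         np.zeros(d_out))
        for d_in, d_out in zip(sizes[:-1], sizes[1:])
    ]

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 13))                 # 50 synthetic frames of 13-dim features
layers = make_layers([13, 32, 32, 16], rng)   # three stacked recurrent layers
top = deep_rnn(x, layers)                     # top-layer sequence, shape (50, 16)
```

Swapping `rnn_layer` for an LSTM layer, or for the concatenation of a forward and a time-reversed pass, gives the deep LSTM and deep bidirectional variants without changing `deep_rnn`.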
Proceedings Article
The Kaldi Speech Recognition Toolkit
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Kumar Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, Karel Vesely
TL;DR: Describes the design of Kaldi, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Posted Content
Speech Recognition with Deep Recurrent Neural Networks
TL;DR: In this paper, deep recurrent neural networks (RNNs) are used to combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Posted Content
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
TL;DR: This paper proposes the convolutional LSTM (ConvLSTM) and uses it to build an end-to-end trainable model for the precipitation nowcasting problem, showing that it captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm.
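ConvLSTM replaces the matrix products in the LSTM gate equations with convolutions, so the input, hidden state and cell state keep their 2-D spatial layout (for example, radar maps). The minimal single-channel NumPy sketch below shows the cell update. It omits the Hadamard "peephole" terms on the cell state that the paper's formulation includes, and it assumes a 3×3 kernel size with random, untrained kernels. All of these choices are simplifications for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    """2-D cross-correlation with zero padding; output has x's shape (odd-sized k)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding keeps the map size
    out = np.zeros(x.shape)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def convlstm_step(x, h, c, K):
    """One ConvLSTM update for single-channel 2-D maps (no peephole terms).

    K maps each gate name to (kernel on x, kernel on h, scalar bias).
    """
    def gate(name):
        k_x, k_h, b = K[name]
        return conv2d_same(x, k_x) + conv2d_same(h, k_h) + b

    i = sigmoid(gate("i"))     # input gate map
    f = sigmoid(gate("f"))     # forget gate map
    g = np.tanh(gate("g"))     # candidate cell map
    o = sigmoid(gate("o"))     # output gate map
    c_next = f * c + i * g     # element-wise, so every pixel keeps its own state
    h_next = o * np.tanh(c_next)
    return h_next, c_next

rng = np.random.default_rng(2)
K = {name: (rng.normal(0, 0.1, (3, 3)), rng.normal(0, 0.1, (3, 3)), 0.0) for name in "ifgo"}
frames = rng.random((5, 16, 16))   # synthetic sequence of five 16x16 maps
h, c = np.zeros((16, 16)), np.zeros((16, 16))
for x in frames:
    h, c = convlstm_step(x, h, c, K)
```

Because each gate only sees a local neighbourhood through the convolution, stacking steps over time lets information propagate spatially. This is the spatiotemporal modelling that FC-LSTM, which flattens each map into a vector, does not exploit.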