Proceedings Article

Leveraging LSTM Models for Overlap Detection in Multi-Party Meetings

TLDR
This paper proposes detecting overlap segments with a neural network architecture built from long short-term memory (LSTM) models, which learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments.
Abstract
The detection of overlapping speech segments is of key importance in speech applications involving analysis of multi-party conversations. The detection problem is challenging because overlapping speech segments are typically captured as short speech utterances in far-field microphone recordings. In this paper, we propose detection of overlap segments using a neural network architecture consisting of long short-term memory (LSTM) models. The neural network architecture learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments. To evaluate the model performance, we perform experiments on simulated overlapped speech generated from the TIMIT database and on natural multi-talker conversational speech in the Augmented Multi-party Interaction (AMI) meeting corpus. The proposed approach yields improvements over a Gaussian mixture model based overlap detection system. Furthermore, as an application of overlap detection, integrating overlap detection into a speaker diarization task is shown to improve the diarization error rate.
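The abstract does not spell out how the simulated overlapped speech is built, so the following is only a minimal sketch of one plausible setup: two utterances are summed with a time offset, and each analysis frame is labeled as overlap when most of its samples contain two active speakers. The synthetic sine "utterances", the 25 ms / 10 ms framing at 16 kHz, the majority-vote labeling rule, and all function names are assumptions made for illustration; they are not taken from the paper or from TIMIT.

```python
import math
import random


def synth_utterance(n_samples, freq_hz, sample_rate=16000, seed=0):
    """Hypothetical stand-in for a speech utterance: a slightly noisy sine wave."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate) + 0.01 * rng.gauss(0, 1)
        for t in range(n_samples)
    ]


def mix_with_overlap(utt_a, utt_b, offset):
    """Start utt_b `offset` samples into utt_a.

    Returns the summed mixture and, per sample, the number of active speakers.
    """
    total = max(len(utt_a), offset + len(utt_b))
    mixture = [0.0] * total
    count = [0] * total
    for i, x in enumerate(utt_a):
        mixture[i] += x
        count[i] += 1
    for i, x in enumerate(utt_b):
        mixture[offset + i] += x
        count[offset + i] += 1
    return mixture, count


def frame_labels(count, frame_len=400, hop=160):
    """Label a frame 1 (overlap) if more than half of its samples have two speakers."""
    labels = []
    for start in range(0, len(count) - frame_len + 1, hop):
        frame = count[start:start + frame_len]
        n_overlap = sum(1 for c in frame if c >= 2)
        labels.append(1 if n_overlap > frame_len // 2 else 0)
    return labels


# Two 1-second "utterances" at 16 kHz; the second starts 0.5 s into the first.
utt_a = synth_utterance(16000, 220.0, seed=1)
utt_b = synth_utterance(16000, 330.0, seed=2)
mixture, count = mix_with_overlap(utt_a, utt_b, offset=8000)
labels = frame_labels(count)
```

Frame-level labels of this kind are what a sequence model such as an LSTM could be trained to predict from spectrotemporal features of `mixture`; the feature extraction and the network itself are outside the scope of this sketch.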

Citations
Book Chapter

Detection of Overlapping Speech for the Purposes of Speaker Diarization

TL;DR: This paper employs a convolutional neural network for the detection of overlapping speech intervals and evaluates it in terms of the potential improvements to speaker diarization.
Proceedings Article

Detecting and counting overlapping speakers in distant speech scenarios

TL;DR: A Temporal Convolutional Network (TCN) based method is designed to address the problem of detecting the activity and counting overlapping speakers in distant-microphone recordings, and it is shown that TCNs significantly outperform state-of-the-art methods on two real-world distant speech datasets.
Proceedings Article

End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform

TL;DR: An end-to-end framework for overlapped speech detection and speaker counting is proposed, which extracts features from the raw waveform directly and a curriculum learning strategy is applied to make better use of the training data.
References
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Proceedings Article

Speech recognition with deep recurrent neural networks

TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Proceedings Article

The Kaldi Speech Recognition Toolkit

TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Posted Content

Speech Recognition with Deep Recurrent Neural Networks

TL;DR: In this paper, deep recurrent neural networks (RNNs) are used to combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Posted Content

Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

TL;DR: This paper proposes the convolutional LSTM (ConvLSTM) and uses it to build an end-to-end trainable model for the precipitation nowcasting problem and shows that it captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm.