Leveraging LSTM Models for Overlap Detection in Multi-Party Meetings

doi:10.1109/ICASSP.2018.8462548

Proceedings Article

- pp 5249-5253

TLDR

This paper proposes detecting overlap segments with a neural network architecture built from long short-term memory (LSTM) models, which learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments.

Abstract:

The detection of overlapping speech segments is of key importance in speech applications that analyze multi-party conversations. The detection problem is challenging because overlapping speech segments are typically captured as short speech utterances in far-field microphone recordings. In this paper, we propose detecting overlap segments with a neural network architecture consisting of long short-term memory (LSTM) models. The architecture learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments. To evaluate model performance, we perform experiments on simulated overlapped speech generated from the TIMIT database and on natural multi-talker conversational speech in the Augmented Multi-party Interaction (AMI) meeting corpus. The proposed approach yields improvements over a Gaussian mixture model based overlap detection system. Furthermore, as an application of overlap detection, integrating overlap detection into the speaker diarization task is shown to improve the diarization error rate.
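The abstract mentions simulated overlapped speech without describing how it is built. The following is a minimal sketch of one way such data and its frame-level targets could be produced; it is not the authors' procedure. The function names `mix_with_overlap` and `frame_labels`, the majority-vote labeling rule, and the simplification that every sample of an utterance counts as active speech (no voice activity detection) are all assumptions made for illustration.

```python
from typing import List, Tuple


def mix_with_overlap(a: List[float], b: List[float],
                     offset: int) -> Tuple[List[float], List[int]]:
    """Add signal b to signal a, with b starting at sample `offset`.

    Returns the mixture and per-sample labels (1 where both signals
    are present, 0 elsewhere). Assumption: the whole utterance is
    treated as active speech, so no voice activity detection is used.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    length = max(len(a), offset + len(b))
    mix = [0.0] * length
    labels = [0] * length
    for i, x in enumerate(a):
        mix[i] += x
    for j, y in enumerate(b):
        mix[offset + j] += y
    # The overlap region is where both utterances are present.
    for i in range(offset, min(len(a), offset + len(b))):
        labels[i] = 1
    return mix, labels


def frame_labels(sample_labels: List[int], frame_len: int,
                 hop: int) -> List[int]:
    """Turn per-sample labels into per-frame labels.

    A frame is labeled as overlap (1) when more than half of its
    samples are overlapped (a hypothetical majority-vote rule).
    """
    frames = []
    for start in range(0, len(sample_labels) - frame_len + 1, hop):
        window = sample_labels[start:start + frame_len]
        frames.append(1 if 2 * sum(window) > frame_len else 0)
    return frames


if __name__ == "__main__":
    # Toy "utterances": constant-valued signals of 8 samples each.
    mixture, labels = mix_with_overlap([1.0] * 8, [0.5] * 8, offset=4)
    print(mixture)
    print(frame_labels(labels, frame_len=4, hop=2))
```

In practice, the frame labels would be paired with spectral features computed on the same frame grid, and a sequence model such as an LSTM would be trained to predict them.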

Citations

Book Chapter

Detection of Overlapping Speech for the Purposes of Speaker Diarization

Marie Kunešová, +3 more

TL;DR: This paper employs a convolutional neural network for the detection of overlapping speech intervals and evaluates it in terms of the potential improvements to speaker diarization.

Proceedings Article

Detecting and counting overlapping speakers in distant speech scenarios

Samuele Cornell, +3 more

TL;DR: A Temporal Convolutional Network (TCN) based method is designed to address the problem of detecting the activity and counting overlapping speakers in distant-microphone recordings, and it is shown that TCNs significantly outperform state-of-the-art methods on two real-world distant speech datasets.

Proceedings Article

Enhancement and Analysis of Conversational Speech: JSALT 2017

Neville Ryant, +17 more

TL;DR: During the JSALT Summer Workshop at CMU in 2017, an international team of researchers worked on several aspects of this problem, including calibration of the state of the art, detection of overlaps, enhancement of noisy recordings, and classification of shorter speech segments.

Proceedings Article

End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform

Wangyou Zhang, +3 more

TL;DR: An end-to-end framework for overlapped speech detection and speaker counting is proposed, which extracts features from the raw waveform directly and a curriculum learning strategy is applied to make better use of the training data.

Proceedings Article

Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings

Shiliang Zhang, +6 more

References

Journal Article

Visualizing Data using t-SNE

Laurens van der Maaten, +1 more

- 01 Jan 2008 -

Journal of Machine Learning Research

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

Proceedings Article

Speech recognition with deep recurrent neural networks

Alex Graves, +2 more

TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

Proceedings Article

The Kaldi Speech Recognition Toolkit

Daniel Povey, +12 more

TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.

Posted Content

Speech Recognition with Deep Recurrent Neural Networks

Alex Graves, +2 more

- 22 Mar 2013 -

arXiv: Neural and Evolutionary Computing

TL;DR: In this paper, deep recurrent neural networks (RNNs) are used to combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

Posted Content

Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

Xingjian Shi, +5 more

- 13 Jun 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes the convolutional LSTM (ConvLSTM) and uses it to build an end-to-end trainable model for the precipitation nowcasting problem and shows that it captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm.

Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection

Latane Bullock, +2 more

Unsupervised Learning of Overlapped Speech Model Parameters For Multichannel Speech Activity Detection in Meetings

Kornel Laskowski, +1 more

Related Papers (5)

Librispeech: An ASR corpus based on public domain audio books

Detecting Overlapped Speech on Short Timeframes Using Deep Learning.

CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning

Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection

Unsupervised Learning of Overlapped Speech Model Parameters For Multichannel Speech Activity Detection in Meetings