Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination

doi:10.1109/TASL.2012.2194283

Journal Article

Armando Muscariello, +2 more

01 Sep 2012

IEEE Transactions on Audio, Speech, and ...

Vol. 20, Iss. 7, pp. 2031-2044

TLDR

The paper describes and evaluates a computational architecture that discovers and collects occurrences of speech repetitions, or motifs, in a totally unsupervised fashion, that is, without acoustic, lexical, or pronunciation modeling and without training material.

Abstract:

This paper describes and evaluates a computational architecture to discover and collect occurrences of speech repetitions, or motifs, in a totally unsupervised fashion, that is, in the absence of acoustic, lexical, or pronunciation modeling and training material. In the last few years, this task has attracted increasing interest from the speech community because of a) its potential applicability in spoken document processing (as a preliminary step to summarization, topic clustering, etc.) and b) its novel methodology, which defines a new paradigm for speech processing that circumvents the issues common to all supervised, trained technologies. The contributions of the proposed system are twofold: 1) the design of a discovery strategy that detects repetitions by extending matches of motif fragments, called seeds; 2) the implementation of template matching techniques to detect acoustically close segments, based on dynamic time warping (DTW) and self-similarity matrix (SSM) comparison of speech templates, in contrast to the decoding procedures of model-based recognition systems. The architecture is thoroughly evaluated on several hours of French broadcast news shows under various parameter settings and acoustic features, namely mel-frequency cepstral coefficients (MFCCs) and different types of posteriorgrams: Gaussian mixture model (GMM)-based and phone-based posteriors, in both language-matched and language-mismatched conditions. The evaluation highlights a) the improved robustness of the system that jointly employs DTW and SSM and b) the notable impact of language-specific features on acoustic similarity detection based on template matching.
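The two template matching measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the Euclidean frame distance, the path-length normalization, and the nearest-neighbor resampling used to compare SSMs of different sizes are all assumptions chosen for brevity. Segments are represented as lists of feature vectors (for example, MFCC frames).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors (assumed frame distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(x, y):
    """Classic DTW cost between two frame sequences, divided by len(x) + len(y)."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            # Allowed steps: vertical, horizontal, diagonal.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def self_similarity_matrix(x):
    """Pairwise frame distances within one segment."""
    return [[euclidean(a, b) for b in x] for a in x]


def ssm_distance(x, y, size=8):
    """Compare two segments through their SSMs.

    Assumption: each SSM is resampled to size x size by nearest-neighbor
    indexing, then the two are compared with a scaled Frobenius distance.
    """
    def resample(ssm):
        n = len(ssm)
        idx = [(k * n) // size for k in range(size)]
        return [[ssm[i][j] for j in idx] for i in idx]

    a = resample(self_similarity_matrix(x))
    b = resample(self_similarity_matrix(y))
    total = sum((a[i][j] - b[i][j]) ** 2 for i in range(size) for j in range(size))
    return math.sqrt(total) / size


template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [2.0]]   # same pattern, slower start
shifted = [[10.0], [11.0], [12.0]]         # same pattern, constant offset

print(dtw_distance(template, stretched))   # DTW absorbs the time stretch
print(ssm_distance(template, shifted))     # SSM ignores the constant offset
print(dtw_distance(template, shifted))     # DTW on raw frames does not
```

The toy segments hint at why the two measures can complement each other: DTW tolerates differences in speaking rate, while an SSM depends only on distances within a segment and is therefore unaffected by a constant shift of the features. How the paper actually combines the two scores is not stated in the abstract and is not modeled here.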
