Journal Article

Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination

Abstract
This paper describes and evaluates a computational architecture to discover and collect occurrences of speech repetitions, or motifs, in a fully unsupervised fashion, that is, in the absence of acoustic, lexical, or pronunciation modeling and training material. In the last few years, this task has attracted increasing interest from the speech community because of a) its potential applicability in spoken document processing (as a preliminary step to summarization, topic clustering, etc.) and b) its novel methodology, which defines a new paradigm for speech processing that circumvents the issues common to all supervised, trained technologies. The contributions of the proposed system are twofold: 1) the design of a discovery strategy that detects repetitions by extending matches of motif fragments, called seeds; 2) the implementation of template matching techniques to detect acoustically close segments, based on dynamic time warping (DTW) and self-similarity matrix (SSM) comparison of speech templates, in contrast to the decoding procedures of model-based recognition systems. The architecture is thoroughly evaluated on several hours of French broadcast news shows under various parameter settings and acoustic features, namely mel-frequency cepstral coefficients (MFCCs) and different types of posteriorgrams: Gaussian mixture model (GMM)-based and phone-based posteriors, in both language-matched and language-mismatched conditions. The evaluation highlights a) the improved robustness of the system that jointly employs DTW and SSM and b) the notable impact of language-specific features on acoustic similarity detection based on template matching.
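To make the two template matching ingredients named in the abstract more concrete, here is a minimal Python sketch of a DTW cost between two feature sequences and of a self-similarity matrix computed for one segment. It is an illustration under stated assumptions, not the paper's implementation: the Euclidean frame distance, the unconstrained step pattern, the normalisation by path-length bound, and the function names `dtw_distance` and `self_similarity_matrix` are all hypothetical choices, and the abstract does not say how two SSMs are compared.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed frame distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, candidate):
    """Normalised DTW cost between two sequences of feature vectors.

    Uses the classic three-way recursion (insertion, deletion, match)
    with no slope or band constraint; the result is divided by
    len(template) + len(candidate) so that costs of segments of
    different lengths are roughly comparable.
    """
    n, m = len(template), len(candidate)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning the first i and j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(template[i - 1], candidate[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a template frame
                                 cost[i][j - 1],      # skip a candidate frame
                                 cost[i - 1][j - 1])  # align both frames
    return cost[n][m] / (n + m)


def self_similarity_matrix(frames):
    """Matrix of pairwise frame distances within a single segment."""
    return [[euclidean(a, b) for b in frames] for a in frames]


if __name__ == "__main__":
    # Toy one-dimensional "features"; real input would be MFCC or
    # posteriorgram frames.
    seed = [[0.0], [1.0], [2.0]]
    slower = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same pattern, time-stretched
    print(dtw_distance(seed, slower))
    print(self_similarity_matrix(seed))
```

In this sketch, a time-stretched repetition of the same pattern receives a low DTW cost because the warping path absorbs the tempo difference, while the SSM describes only the internal temporal structure of a single segment.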


Citations
Proceedings Article

The Zero Resource Speech Challenge 2015

TL;DR: The Interspeech 2015 Zero Resource Speech Challenge aims at discovering subword and word units from raw speech. The challenge provides the first unified and open source suite of evaluation metrics and data sets to compare and analyse the results of unsupervised linguistic unit discovery algorithms.
Proceedings Article

Unsupervised word discovery from speech using automatic segmentation into syllable-like units

TL;DR: A syllable-based approach to unsupervised pattern discovery from speech is presented, which limits potential word onsets and offsets to a finite number of candidate locations by first segmenting speech into syllable-like units.
Proceedings Article

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study.

TL;DR: Experimental results show that the unsupervised DPGMM posteriorgrams clearly outperform MFCCs, and perform comparably to the posteriorgrams derived from language-mismatched phoneme recognizers in terms of the error rate of the ABX discrimination test.
Proceedings Article

Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems

TL;DR: This work first transforms the speech-based output into a symbolic representation and then computes five types of evaluation metrics on this representation: the quality of acoustic matching, the quality of the clusters found, and the quality of the alignment with real words (type, token, and boundary scores).
Journal Article

Language independent search in MediaEval's spoken web search task

TL;DR: This paper presents the 2011 and 2012 MediaEval results, and compares the relative merits and weaknesses of approaches developed by participants, providing analysis and directions for future research, in order to improve voice access to spoken information in low resource settings.