Author

DG Ollason

Bio: DG Ollason is an academic researcher. The author has an h-index of 2 and has co-authored 2 publications receiving 3092 citations.

Papers
16 Sep 1995
TL;DR: The Fundamentals of HTK: General Principles of HMMs, Recognition and Viterbi Decoding, and Continuous Speech Recognition.
Abstract: 1 The Fundamentals of HTK: 1.1 General Principles of HMMs; 1.2 Isolated Word Recognition; 1.3 Output Probability Specification; 1.4 Baum-Welch Re-Estimation; 1.5 Recognition and Viterbi Decoding; 1.6 Continuous Speech Recognition; 1.7 Speaker Adaptation

2,095 citations


Cited by
Proceedings Article
01 Jan 2011
TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Abstract: We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

5,857 citations

Book
01 Jan 2000
TL;DR: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.
Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora. Methodology boxes are included in each chapter. Each chapter is built around one or more worked examples to demonstrate the main idea of the chapter. Covers the fundamental algorithms of various fields, whether originally proposed for spoken or written language, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation. Emphasis on web and other practical applications. Emphasis on scientific evaluation. Useful as a reference for professionals in any of the areas of speech and language processing.

3,794 citations

Book
09 Feb 2012
TL;DR: A new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
Abstract: Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.

2,101 citations

Proceedings Article
01 Jan 2015
TL;DR: A brief overview of the librosa library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.
Abstract: This document describes version 0.4.0 of librosa: a Python package for audio and music signal processing. At a high level, librosa provides implementations of a variety of common functions used throughout the field of music information retrieval. In this document, a brief overview of the library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.

1,793 citations

Journal Article
TL;DR: This work proposes to use the Fisher Kernel framework as an alternative patch encoding strategy: it describes patches by their deviation from a “universal” generative Gaussian mixture model, and reports experimental results showing that the FV framework is a state-of-the-art patch encoding technique.
Abstract: A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model. This representation, which we call Fisher vector, has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.

1,594 citations