Author

Trym Holter

Bio: Trym Holter is an academic researcher from SINTEF. The author has contributed to research in topics including Closed captioning and User requirements document. The author has an h-index of 3 and has co-authored 8 publications receiving 95 citations.

Papers
Journal ArticleDOI
TL;DR: A maximum-likelihood-based algorithm for fully automatic, data-driven modelling of pronunciation, given a set of subword hidden Markov models (HMMs) and acoustic tokens of a word, creating a consistent framework for the optimisation of automatic speech recognition systems.

63 citations
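For illustration, a minimal sketch of the maximum-likelihood selection idea summarised above: given trained subword HMMs and several acoustic tokens of a word, the pronunciation variant with the highest total log-likelihood over the tokens is chosen. The `log_likelihood` scorer and `candidate_pronunciations` generator are hypothetical placeholders, not the paper's actual interface.

```python
# Hypothetical sketch: pick the subword sequence that maximises the summed
# log-likelihood of the word's acoustic tokens. `log_likelihood` stands in for
# a toolkit-specific forward/Viterbi scorer; it is an assumption, not the
# paper's algorithm in detail.

from typing import Callable, Iterable, Sequence

def select_pronunciation(
    candidate_pronunciations: Iterable[Sequence[str]],   # e.g. [("t", "r", "o", "n"), ...]
    acoustic_tokens: Sequence[object],                    # feature sequences for the word
    log_likelihood: Callable[[Sequence[str], object], float],
) -> Sequence[str]:
    """Return the candidate subword sequence with the highest total log-likelihood."""
    best, best_score = None, float("-inf")
    for pron in candidate_pronunciations:
        score = sum(log_likelihood(pron, token) for token in acoustic_tokens)
        if score > best_score:
            best, best_score = pron, score
    return best
```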

Proceedings Article
01 Jan 2000
TL;DR: The experiments showed that the turn error rate was more than twice as large for the real dialogues as for the WoZ calls, i.e., 13.3% versus 5.7%.
Abstract: This paper describes the development and testing of a pilot spoken dialogue system for bus travel information in the city of Trondheim, Norway. The system-driven dialogue was designed on the basis of analyzed recordings from human-human operator dialogues, Wizard-of-Oz (WoZ) dialogues, and a text-based inquiry system for the web. The dialogue system employs a flexible speech recognizer and an utterance concatenation procedure for speech output. Even though the system is intended for research only, it has been accessible through a public phone number since October 1999. During this period all dialogues have been recorded. From these, approximately 350 dialogues were selected for annotation and comparison to 120 dialogues from the WoZ recordings. The experiments showed that the turn error rate was more than twice as large for the real dialogues as for the WoZ calls, i.e., 13.3% versus 5.7%. Thus, the WoZ results did not give a reliable estimate of the true performance. Our experiments indicate that the current flexible speech recognizer should be further optimized.

13 citations
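As a side note, the turn-error-rate comparison quoted above can be checked in a couple of lines; the turn counts in this sketch are invented placeholders, and only the 13.3% and 5.7% figures come from the paper.

```python
# Illustrative only: how a turn error rate (TER) would be computed, and how the
# reported 13.3% vs 5.7% figures compare. The counts below are placeholders,
# not the paper's annotation data.

def turn_error_rate(error_turns: int, total_turns: int) -> float:
    """Percentage of dialogue turns containing an error."""
    return 100.0 * error_turns / total_turns

print(turn_error_rate(error_turns=40, total_turns=300))  # placeholder counts -> ~13.3%
print(13.3 / 5.7)                                        # ~2.33, i.e. more than twice as large
```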

Proceedings Article
01 Jan 2000
TL;DR: This application will provide the hearing impaired with an option to read captions for live broadcast programs, i.e., when off-line captioning is not feasible.
Abstract: A system for on-line generation of closed captions (subtitles) for broadcast of live TV-programs is described. During broadcast, a commentator formulates a possibly condensed, but semantically correct version of the original speech. These compressed phrases are recognized by a continuous speech recognizer, and the resulting captions are fed into the teletext system. This application will provide the hearing impaired with an option to read captions for live broadcast programs, i.e., when off-line captioning is not feasible. The main advantage in using a speech recognizer rather than a stenography-based system (e.g., Velotype) is the relaxed requirements for commentator training. Also, the amount of text generated by a system based on stenography tends to be large, thus making it harder to read.

9 citations
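A minimal sketch of the captioning flow described above, assuming hypothetical `Recognizer` and `TeletextFeed` interfaces (the paper does not specify the broadcast components): the commentator's re-spoken, condensed audio is transcribed and each recognized phrase is pushed to the teletext caption stream.

```python
# Sketch of the live-captioning pipeline: re-spoken audio -> continuous speech
# recognizer -> teletext captions. The two Protocol interfaces are assumptions
# standing in for the actual ASR engine and broadcast teletext system.

from typing import Iterator, Protocol

class Recognizer(Protocol):
    def transcribe(self, audio_chunk: bytes) -> str: ...

class TeletextFeed(Protocol):
    def send_caption(self, text: str) -> None: ...

def caption_live(commentator_audio: Iterator[bytes],
                 asr: Recognizer,
                 teletext: TeletextFeed) -> None:
    """Feed each recognized, re-spoken phrase to the teletext caption stream."""
    for chunk in commentator_audio:
        caption = asr.transcribe(chunk)
        if caption.strip():
            teletext.send_caption(caption)
```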

Proceedings Article
01 Jan 2000
TL;DR: A statistical framework for modeling (and decoding) semantic concepts based on discrete hidden Markov models (DHMMs) is described, in which each semantic concept class is modeled as a multi-state DHMM whose observations are the recognized words.
Abstract: A key issue in a spoken dialogue system is the successful semantic interpretation of the output from the speech recognizer. Extracting the semantic concepts, i.e., the meaningful phrases, of an utterance is traditionally performed using rule-based methods. In this paper we describe a statistical framework for modeling (and decoding) semantic concepts based on discrete hidden Markov models (DHMMs). Each semantic concept class is modeled as a multi-state DHMM, where the observations are the recognized words. The proposed decoding procedure is capable of parsing an utterance into a sequence of phrases, each belonging to a different concept class. The phrase sequence corresponds to a concept segmentation and class identification, whilst the semantic entities constituting each phrase contain the semantic value. The algorithm has been tested on a dialogue system for bus route information in Norwegian. The results confirm the applicability of the procedure. Semantically relevant concepts in input inquiries could be identified with a 6.9% error rate at the sentence level. The corresponding segmentation error rate was 8.6% when concept segmentation information was available during training. Without this information, i.e., if the training was performed in an embedded mode, the segmentation error rate increased to 23.5%.

3 citations
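A simplified, hypothetical sketch of the decoding idea in the abstract above: each concept class is a small discrete HMM over recognized words, and a Viterbi search over the combined state space yields a segmentation into phrases with class labels. The toy model format and the cross-class transition handling (exit probability times entry probability) are assumptions for illustration, not the paper's topology or training procedure.

```python
# Viterbi decoding over per-class discrete HMMs: each state is a (class, state)
# pair, and the backtraced class sequence gives both the segmentation and the
# class identification of each phrase.

import math

def _log(p):
    return math.log(p) if p > 0 else -math.inf

def viterbi_concepts(words, classes):
    """Label each recognized word with a concept class.

    classes: {name: {"init": [...], "trans": [[...]], "exit": [...],
                     "emit": [{word: prob}, ...]}}  (one discrete HMM per class).
    Cross-class moves are scored as exit(prev state) * init(next state)."""
    states = [(c, s) for c in classes for s in range(len(classes[c]["init"]))]

    def emit(c, s, w):
        return _log(classes[c]["emit"][s].get(w, 1e-6))   # floor for unseen words

    V = [{(c, s): (_log(classes[c]["init"][s]) + emit(c, s, words[0]), None)
          for (c, s) in states}]
    for t in range(1, len(words)):
        col = {}
        for (c, s) in states:
            best_score, best_prev = -math.inf, None
            for (pc, ps) in states:
                if pc == c:
                    trans = _log(classes[c]["trans"][ps][s])
                else:
                    trans = _log(classes[pc]["exit"][ps]) + _log(classes[c]["init"][s])
                score = V[t - 1][(pc, ps)][0] + trans
                if score > best_score:
                    best_score, best_prev = score, (pc, ps)
            col[(c, s)] = (best_score + emit(c, s, words[t]), best_prev)
        V.append(col)

    # Backtrace: the class component of each state labels the corresponding word.
    state = max(V[-1], key=lambda k: V[-1][k][0])
    labels = []
    for t in range(len(words) - 1, -1, -1):
        labels.append(state[0])
        state = V[t][state][1]
    labels.reverse()
    return list(zip(words, labels))
```

With per-class models for, say, departure and destination phrases, the returned (word, class) pairs give the concept segmentation and class identification the abstract describes.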

Book ChapterDOI
15 Sep 2004
TL;DR: A mobile context-aware system for maintenance work, based on electronically tagged equipment and handheld wireless terminals with a multimodal user interface, is proposed, with particular attention to voice interaction in noisy industrial scenarios.
Abstract: Maintenance workers in the oil and process industry have typically had minimal IT support, relying on paper-based solutions both for the information they need to bring into the field and for data capture. This paper proposes a mobile context-aware system for maintenance work based on electronically tagged equipment and handheld wireless terminals with a multimodal user interface. Particular attention has been given to voice interaction in noisy industrial scenarios, utilising the PARAT earplug. A proof-of-concept demonstrator of the system has been developed. The paper presents the demonstrator architecture and the experiences gained through this work.

3 citations
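Purely as an illustration of the tag-driven interaction the paper describes (not the demonstrator's actual code), the sketch below resolves a scanned equipment tag to its maintenance context and presents the result both on screen and by voice; the data model and output channels are assumptions.

```python
# Hypothetical context lookup: scanning a tagged piece of equipment pulls up
# its open work orders and announces them via the multimodal UI.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Equipment:
    tag_id: str
    name: str
    open_work_orders: List[str] = field(default_factory=list)

class MaintenanceContext:
    def __init__(self, registry: Dict[str, Equipment]):
        self.registry = registry

    def on_tag_scanned(self, tag_id: str) -> None:
        item = self.registry.get(tag_id)
        if item is None:
            self.display(f"Unknown tag {tag_id}")
            return
        summary = f"{item.name}: {len(item.open_work_orders)} open work order(s)"
        self.display(summary)   # graphical output on the handheld terminal
        self.speak(summary)     # voice output, e.g. over the earplug channel

    def display(self, text: str) -> None:
        print("[screen]", text)

    def speak(self, text: str) -> None:
        print("[voice] ", text)
```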


Cited by
Journal ArticleDOI
TL;DR: This contribution provides an overview of the publications on pronunciation variation modeling in automatic speech recognition, paying particular attention to the papers in this special issue and the papers presented at 'the Rolduc workshop'.

259 citations

Journal ArticleDOI
Erik G. Nilsson
TL;DR: The structure of the patterns collection is presented: the patterns are suggested solutions to problems, which are grouped into a set of problem areas that are in turn grouped into three main problem areas. This structure is valuable both as an index for identifying which patterns to use and as a fairly comprehensive overview of the issues involved in designing user interfaces for mobile applications.

197 citations
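A purely illustrative sketch of the three-level structure described in the TL;DR above (main problem area, problem area, problems with suggested pattern solutions); all names below are placeholders, not entries from Nilsson's collection.

```python
# Placeholder example of the hierarchical index: main problem area -> problem
# area -> problems with suggested pattern solutions.

patterns_index = {
    "utilizing screen space": {                      # main problem area (placeholder)
        "handling long lists": [                     # problem area (placeholder)
            {"problem": "list exceeds one screen",
             "pattern": "paging with persistent position"},   # placeholder pattern
        ],
    },
}

def find_patterns(index, main_area, area):
    """Use the hierarchy as an index: drill down to the candidate patterns."""
    return index.get(main_area, {}).get(area, [])
```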

01 Jan 1999
TL;DR: Problems with the phoneme as the basic subword unit in speech recognition are raised, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech.
Abstract: The notion that a word is composed of a sequence of phone segments, sometimes referred to as ‘beads on a string’, has formed the basis of most speech recognition work for over 15 years. However, as more researchers tackle spontaneous speech recognition tasks, that view is being called into question. This paper raises problems with the phoneme as the basic subword unit in speech recognition, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech. We offer two different alternatives – automatically derived subword units and linguistically motivated distinctive feature systems – and discuss current work in these directions. In addition, we look at problems that arise in acoustic modeling when trying to incorporate higher-level structure with these two strategies.

151 citations

01 Jan 1999
TL;DR: This dissertation examines how pronunciations vary in spontaneous speech, and how speaking rate and word predictability can be used to predict when greater pronunciation variation can be expected; it suggests that, for spontaneous speech, it may be appropriate to build models for syllables and words that can dynamically change the pronunciations used in the speech recognizer based on the extended context.
Abstract: As of this writing, the automatic recognition of spontaneous speech by computer is fraught with errors; many systems transcribe one out of every three to five words incorrectly, whereas humans can transcribe spontaneous speech with one error in twenty words or better. This high error rate is due in part to the poor modeling of pronunciations within spontaneous speech. This dissertation examines how pronunciations vary in this speaking style, and how speaking rate and word predictability can be used to predict when greater pronunciation variation can be expected. It includes an investigation of the relationship between speaking rate, word predictability, pronunciations, and errors made by speech recognition systems. The results of these studies suggest that for spontaneous speech, it may be appropriate to build models for syllables and words that can dynamically change the pronunciations used in the speech recognizer based on the extended context (including surrounding words, phones, speaking rate, etc.). Implementation of new pronunciation models automatically derived from data within the ICSI speech recognition system has shown a 4–5% relative improvement on the Broadcast News recognition task. Roughly two thirds of these gains can be attributed to static baseform improvements; adding the ability to dynamically adjust pronunciations within the recognizer provides the other third of the improvement. The Broadcast News task also allows for comparison of performance on different styles of speech: the new pronunciation models do not help for pre-planned speech, but they provide a significant gain for spontaneous speech. Not only do the automatically learned pronunciation models capture some of the linguistic variation due to the speaking style, but they also represent variation in the acoustic model due to channel effects. The largest improvement was seen in the telephone speech condition, in which 12% of the errors produced by the baseline system were corrected.

87 citations
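A minimal, hypothetical sketch of the dissertation's central idea that pronunciation choice can be conditioned on extended context such as speaking rate and word predictability. The thresholds, variant inventory, and class interface are invented for illustration and do not reproduce the ICSI system's dynamic pronunciation models.

```python
# Illustrative context-conditioned pronunciation selection: prefer a reduced
# variant for fast, highly predictable words, otherwise the careful variant.

from typing import Dict, List, NamedTuple

class Context(NamedTuple):
    speaking_rate: float      # e.g. syllables per second
    log_word_prob: float      # word predictability from the language model

class PronunciationModel:
    def __init__(self, variants: Dict[str, Dict[str, List[str]]]):
        # variants[word] = {"careful": [...phones...], "reduced": [...phones...]}
        self.variants = variants

    def pronounce(self, word: str, ctx: Context) -> List[str]:
        """Thresholds here are placeholders, not trained values."""
        forms = self.variants[word]
        if ctx.speaking_rate > 5.0 and ctx.log_word_prob > -2.0:
            return forms.get("reduced", forms["careful"])
        return forms["careful"]

# Example: a fast, predictable "probably" gets the reduced variant.
model = PronunciationModel({"probably": {"careful": "p r aa b ax b l iy".split(),
                                         "reduced": "p r aa b iy".split()}})
print(model.pronounce("probably", Context(speaking_rate=6.2, log_word_prob=-1.1)))
```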

Journal ArticleDOI
TL;DR: This work constructs a Markov chain Monte Carlo (MCMC) sampling scheme in which sampling from all the posterior probability distributions is very easy; the scheme has been tested in extensive computer simulations on finite discrete-valued observed data.
Abstract: Hidden Markov models (HMMs) represent a very important tool for analysis of signals and systems. In the past two decades, HMMs have attracted the attention of various research communities, including the ones in statistics, engineering, and mathematics. Their extensive use in signal processing and, in particular, speech processing is well documented. A major weakness of conventional HMMs is their inflexibility in modeling state durations. This weakness can be avoided by adopting a more complicated class of HMMs known as nonstationary HMMs. We analyze nonstationary HMMs whose state transition probabilities are functions of time that indirectly model state durations by a given probability mass function and whose observation spaces are discrete. The objective of our work is to estimate all the unknowns of a nonstationary HMM, which include its parameters and the state sequence. To that end, we construct a Markov chain Monte Carlo (MCMC) sampling scheme, where sampling from all the posterior probability distributions is very easy. The proposed MCMC sampling scheme has been tested in extensive computer simulations on finite discrete-valued observed data, and some of the simulation results are presented.

83 citations
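A simplified, illustrative Gibbs sampler in the spirit of the MCMC scheme summarised above: it alternates between sampling the hidden state sequence one position at a time and sampling the parameters from Dirichlet posteriors. For brevity it uses a stationary discrete HMM with flat priors; the paper's nonstationary, duration-dependent transition probabilities are not modelled here.

```python
# Toy Gibbs sampler for a discrete HMM: alternately sample states given
# parameters and parameters given states. Stationary transitions only; this is
# a sketch of the general MCMC approach, not the paper's method.

import numpy as np

def gibbs_hmm(obs, n_states, n_symbols, n_iter=200, rng=None):
    rng = rng or np.random.default_rng(0)
    T = len(obs)
    A = np.full((n_states, n_states), 1.0 / n_states)      # transition probabilities
    B = np.full((n_states, n_symbols), 1.0 / n_symbols)    # emission probabilities
    states = rng.integers(n_states, size=T)

    for _ in range(n_iter):
        # 1) Sample each hidden state given its neighbours and its observation.
        for t in range(T):
            p = B[:, obs[t]].copy()
            if t > 0:
                p *= A[states[t - 1], :]
            if t < T - 1:
                p *= A[:, states[t + 1]]
            states[t] = rng.choice(n_states, p=p / p.sum())
        # 2) Sample parameters from Dirichlet posteriors given the counts.
        trans_counts = np.ones((n_states, n_states))        # Dirichlet(1) prior
        emit_counts = np.ones((n_states, n_symbols))
        for t in range(1, T):
            trans_counts[states[t - 1], states[t]] += 1
        for t in range(T):
            emit_counts[states[t], obs[t]] += 1
        A = np.array([rng.dirichlet(row) for row in trans_counts])
        B = np.array([rng.dirichlet(row) for row in emit_counts])
    return states, A, B

# Example on a toy symbol sequence:
# states, A, B = gibbs_hmm(obs=[0, 0, 1, 2, 2, 1, 0], n_states=2, n_symbols=3)
```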