Author

Trym Holter

Bio: Trym Holter is an academic researcher from SINTEF. The author has contributed to research on the topics of closed captioning and user requirements documents. The author has an h-index of 3 and has co-authored 8 publications receiving 95 citations.

Papers

Journal Article•DOI•

Maximum likelihood modelling of pronunciation variation

Trym Holter¹, Torbjørn Svendsen²•Institutions (2)

SINTEF¹, Norwegian University of Science and Technology²

01 Nov 1999-Speech Communication

TL;DR: Presents a maximum-likelihood-based algorithm for fully automatic, data-driven modelling of pronunciation, given a set of subword hidden Markov models (HMMs) and acoustic tokens of a word, creating a consistent framework for optimising automatic speech recognition systems.

63 citations

Proceedings Article•

TABOR - a Norwegian spoken dialogue system for bus travel information.

Magne Hallstein Johnsen, Torbjørn Svendsen, Tore Amble, Trym Holter, Erik Harborg, +1 more

01 Jan 2000

TL;DR: The experiments showed that the turn error rate was more than twice as large for the real dialogues as for the WoZ calls, i.e., 13.3% versus 5.7%.

Abstract: This paper describes the development and testing of a pilot spoken dialogue system for bus travel information in the city of Trondheim, Norway. The system-driven dialogue was designed on the basis of analyzed recordings from human-human operator dialogues, Wizard-of-Oz (WoZ) dialogues, and a text-based inquiry system for the web. The dialogue system employs a flexible speech recognizer and an utterance concatenation procedure for speech output. Even though the system is intended for research only, it has been accessible through a public phone number since October 1999. During this period all dialogues have been recorded. From these, approximately 350 dialogues were selected for annotation and comparison to 120 dialogues from the WoZ recordings. The experiments showed that the turn error rate was more than twice as large for the real dialogues as for the WoZ calls, i.e., 13.3% versus 5.7%. Thus, the WoZ results did not give a reliable estimate for the true performance. Our experiments indicate that the current flexible speech recognizer should be further optimized.

13 citations

Proceedings Article•

ASR-based subtitling of live TV-programs for the hearing impaired.

Trym Holter, Erik Harborg, Magne Hallstein Johnsen, Torbjørn Svendsen

01 Jan 2000

TL;DR: This application will provide the hearing impaired with an option to read captions for live broadcast programs, i.e., when off-line captioning is not feasible.

Abstract: A system for on-line generation of closed captions (subtitles) for broadcast of live TV-programs is described. During broadcast, a commentator formulates a possibly condensed, but semantically correct version of the original speech. These compressed phrases are recognized by a continuous speech recognizer, and the resulting captions are fed into the teletext system. This application will provide the hearing impaired with an option to read captions for live broadcast programs, i.e., when off-line captioning is not feasible. The main advantage in using a speech recognizer rather than a stenography-based system (e.g., Velotype) is the relaxed requirements for commentator training. Also, the amount of text generated by a system based on stenography tends to be large, thus making it harder to read.

9 citations

Proceedings Article•

Stochastic modeling of semantic content for use in a spoken dialogue system.

Magne Hallstein Johnsen, Trym Holter, Torbjørn Svendsen, Erik Harborg

01 Jan 2000

TL;DR: A statistical framework for modeling (and decoding) semantic concepts based on discrete hidden Markov models (DHMMs) is described; each semantic concept class is modeled as a multi-state DHMM whose observations are the recognized words.

Abstract: A key issue in a spoken dialogue system is the successful semantic interpretation of the output from the speech recognizer. Extracting the semantic concepts, i.e. the meaningful phrases, of an utterance is traditionally performed using rule based methods. In this paper we describe a statistical framework for modeling (and decoding) semantic concepts based on discrete hidden Markov models (DHMMs). Each semantic concept class is modeled as a multi-state DHMM, where the observations are the recognized words. The proposed decoding procedure is capable of parsing an utterance into a sequence of phrases, each belonging to a different concept class. The phrase sequence will correspond to a concept segmentation and class identification, whilst the semantic entities constituting each phrase contain the semantic value. The algorithm has been tested on a dialogue system for bus route information in Norwegian. The results confirm the applicability of the procedure. Semantically relevant concepts in input inquiries could be identified with 6.9% error rate on the sentence level. The corresponding segmentation error rate was 8.6% when concept segmentation information was available during training. Without this information, i.e. if the training was performed in an embedded mode, the segmentation error rate increased to 23.5%.

3 citations

Book Chapter•DOI•

A Multimodal Context Aware Mobile Maintenance Terminal for Noisy Environments

Fredrik Vraalsen¹, Trym Holter¹, Ingrid Storruste Svagård¹, Øyvind Kvennås¹•Institutions (1)

SINTEF¹

15 Sep 2004

TL;DR: A mobile context aware system for maintenance work is proposed, based on electronically tagged equipment and handheld wireless terminals with a multimodal user interface, with particular attention to voice interaction in noisy industrial scenarios.

Abstract: Maintenance workers in the oil and process industry have typically had minimal IT support, relying on paper-based solutions both for the information they need to bring into the field and for data capture. This paper proposes a mobile context aware system for maintenance work based on electronically tagged equipment and handheld wireless terminals with a multimodal user interface. Particular attention has been given to voice interaction in noisy industrial scenarios, utilising the PARAT earplug. A proof-of-concept demonstrator of the system has been developed. The paper presents the demonstrator architecture and experiences gained through this work.

3 citations

Cited by

Journal Article•DOI•

Modeling pronunciation variation for ASR

Helmer Strik¹, Catia Cucchiarini¹•Institutions (1)

Radboud University Nijmegen¹

01 Nov 1999-Speech Communication

TL;DR: This contribution provides an overview of the publications on pronunciation variation modeling in automatic speech recognition, paying particular attention to the papers in this special issue and the papers presented at 'the Rolduc workshop'.

259 citations

Journal Article•DOI•

Design patterns for user interface for mobile applications

Erik G. Nilsson¹•Institutions (1)

SINTEF¹

01 Dec 2009-Advances in Engineering Software

TL;DR: The structure of the patterns collection is presented: the patterns are suggested solutions to problems, which are grouped into a set of problem areas that are in turn grouped into three main problem areas. This structure is valuable both as an index for identifying patterns to use and as a fairly comprehensive overview of issues when designing user interfaces for mobile applications.

197 citations

Moving beyond the 'beads-on-a-string' model of speech

Mari Ostendorf

01 Jan 1999

TL;DR: Problems with the phoneme as the basic subword unit in speech recognition are raised, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech.

Abstract: The notion that a word is composed of a sequence of phone segments, sometimes referred to as ‘beads on a string’, has formed the basis of most speech recognition work for over 15 years. However, as more researchers tackle spontaneous speech recognition tasks, that view is being called into question. This paper raises problems with the phoneme as the basic subword unit in speech recognition, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech. We offer two different alternatives – automatically derived subword units and linguistically motivated distinctive feature systems – and discuss current work in these directions. In addition, we look at problems that arise in acoustic modeling when trying to incorporate higher-level structure with these two strategies.

151 citations

Dynamic pronunciation models for automatic speech recognition

John Eric Fosler-Lussier, Nelson Morgan

01 Jan 1999

TL;DR: This dissertation examines how pronunciations vary in this speaking style, and how speaking rate and word predictability can be used to predict when greater pronunciation variation can be expected, and suggests that for spontaneous speech, it may be appropriate to build models for syllables and words that can dynamically change the pronunciation used in the speech recognizer based on the extended context.

Abstract: As of this writing, the automatic recognition of spontaneous speech by computer is fraught with errors; many systems transcribe one out of every three to five words incorrectly, whereas humans can transcribe spontaneous speech with one error in twenty words or better. This high error rate is due in part to the poor modeling of pronunciations within spontaneous speech. This dissertation examines how pronunciations vary in this speaking style, and how speaking rate and word predictability can be used to predict when greater pronunciation variation can be expected. It includes an investigation of the relationship between speaking rate, word predictability, pronunciations, and errors made by speech recognition systems. The results of these studies suggest that for spontaneous speech, it may be appropriate to build models for syllables and words that can dynamically change the pronunciations used in the speech recognizer based on the extended context (including surrounding words, phones, speaking rate, etc.). Implementation of new pronunciation models automatically derived from data within the ICSI speech recognition system has shown a 4–5% relative improvement on the Broadcast News recognition task. Roughly two thirds of these gains can be attributed to static baseform improvements; adding the ability to dynamically adjust pronunciations within the recognizer provides the other third of the improvement. The Broadcast News task also allows for comparison of performance on different styles of speech: the new pronunciation models do not help for pre-planned speech, but they provide a significant gain for spontaneous speech. Not only do the automatically learned pronunciation models capture some of the linguistic variation due to the speaking style, but they also represent variation in the acoustic model due to channel effects. The largest improvement was seen in the telephone speech condition, in which 12% of the errors produced by the baseline system were corrected.

87 citations

Journal Article•DOI•

An MCMC sampling approach to estimation of nonstationary hidden Markov models

Petar M. Djuric¹, Joon-Hwa Chun¹•Institutions (1)

State University of New York System¹

01 May 2002-IEEE Transactions on Signal Processing

TL;DR: This work constructs a Markov chain Monte Carlo (MCMC) sampling scheme in which sampling from all the posterior probability distributions is very easy; the scheme has been tested in extensive computer simulations on finite discrete-valued observed data.

Abstract: Hidden Markov models (HMMs) represent a very important tool for analysis of signals and systems. In the past two decades, HMMs have attracted the attention of various research communities, including the ones in statistics, engineering, and mathematics. Their extensive use in signal processing and, in particular, speech processing is well documented. A major weakness of conventional HMMs is their inflexibility in modeling state durations. This weakness can be avoided by adopting a more complicated class of HMMs known as nonstationary HMMs. We analyze nonstationary HMMs whose state transition probabilities are functions of time that indirectly model state durations by a given probability mass function and whose observation spaces are discrete. The objective of our work is to estimate all the unknowns of a nonstationary HMM, which include its parameters and the state sequence. To that end, we construct a Markov chain Monte Carlo (MCMC) sampling scheme, where sampling from all the posterior probability distributions is very easy. The proposed MCMC sampling scheme has been tested in extensive computer simulations on finite discrete-valued observed data, and some of the simulation results are presented.

83 citations
