
Showing papers by "Timothy J. Hazen" published in 2008


Journal Article (DOI)
TL;DR: This article discusses how to handle the errorful or incomplete output that ASR systems produce for spoken audio documents, focusing on the usage case where a user enters search terms into a search engine and is returned a collection of spoken document hits.
Abstract: Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one's own hard drive. Speech search has received less attention, perhaps because large collections of spoken material have previously not been available. However, with cheaper storage and increased broadband access, there has been a subsequent increase in the availability of online spoken audio content such as news broadcasts, podcasts, and academic lectures. A variety of personal and commercial uses also exist. As data availability increases, the lack of adequate technology for processing spoken documents becomes the limiting factor to large-scale access to spoken content. In this article, we strive to discuss the technical issues involved in the development of information retrieval systems for spoken audio documents, concentrating on the issue of handling the errorful or incomplete output provided by automatic speech recognition (ASR) systems. We focus on the usage case where a user enters search terms into a search engine and is returned a collection of spoken document hits.

173 citations
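The abstract names the core problem of searching errorful ASR output but gives no mechanism. One widely used remedy, sketched here under assumptions of our own, is to index every recognition alternative together with its posterior probability and to score documents by expected term counts instead of by 1-best transcripts. The confusion-network representation (a list of slots, each mapping candidate words to posteriors), the toy data, and the function names `one_best`, `expected_counts`, and `rank` are hypothetical illustrations, not the article's own method.

```python
from collections import defaultdict

def one_best(network):
    # Keep only the highest-posterior word in each slot (a 1-best transcript).
    return [max(slot, key=slot.get) for slot in network]

def expected_counts(network):
    # Soft count: E[count(w)] is the sum of w's posteriors over all slots.
    counts = defaultdict(float)
    for slot in network:
        for word, posterior in slot.items():
            counts[word] += posterior
    return counts

def rank(query, collection, soft=True):
    # Score each document by its summed (expected or 1-best) query-term counts,
    # drop documents that score zero, and return document ids best-first.
    scores = {}
    for doc_id, network in collection.items():
        if soft:
            counts = expected_counts(network)
            scores[doc_id] = sum(counts[t] for t in query)
        else:
            words = one_best(network)
            scores[doc_id] = sum(words.count(t) for t in query)
    hits = [d for d in scores if scores[d] > 0]
    return sorted(hits, key=lambda d: scores[d], reverse=True)

# Hypothetical ASR output: each slot lists competing words with posteriors.
collection = {
    "lecture1": [{"the": 0.9, "a": 0.1},
                 {"furrier": 0.6, "fourier": 0.4},
                 {"transform": 0.95, "transforms": 0.05}],
    "lecture2": [{"fourier": 0.8, "for": 0.2},
                 {"series": 0.9, "serious": 0.1}],
    "news": [{"weather": 1.0}],
}
print(rank(["fourier"], collection, soft=False))
print(rank(["fourier"], collection))
```

In this toy collection, a query for "fourier" finds only `lecture2` when searching 1-best transcripts, because the recognizer's top choice in `lecture1` was "furrier". Expected counts also return `lecture1`, ranked below `lecture2` because its evidence is weaker.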


Proceedings Article (DOI)
12 May 2008
TL;DR: This paper investigates a discriminative approach to feature weighting for topic identification using minimum classification error (MCE) training, which learns feature weights by optimizing an objective loss function directly related to the classification error rate of the topic identification system.
Abstract: In this paper, we investigate a discriminative approach to feature weighting for topic identification using minimum classification error (MCE) training. Our approach learns feature weights by optimizing an objective loss function directly related to the classification error rate of the topic identification system. Topic identification experiments are performed on spoken conversations from the Fisher corpus. Features drawn from both word and phone lattices generated via automatic speech recognition are investigated. Under various conditions, our new feature weighting scheme reduces the classification error rate by between 9% and 23% relative to our baseline naive Bayes system using feature selection.

16 citations
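The MCE idea in this abstract can be sketched with one common formulation from the MCE literature: a naive Bayes discriminant whose per-feature log-likelihood terms are scaled by learned weights (`lam` in the code), a misclassification measure that compares the true topic's score with a soft maximum over rival topics, and a sigmoid that turns that measure into a differentiable surrogate for the error count. The parameters `eta` and `gamma`, the learning rate, the toy model, and all function names are assumptions for illustration; the paper's exact loss, lattice-derived features, and optimizer may differ.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discriminants(x, lam, log_prior, log_pwc):
    # Weighted naive Bayes score per topic c:
    # g_c(x) = log P(c) + sum_w lam_w * x_w * log P(w | c)
    return [lp + sum(l * xw * lpw for l, xw, lpw in zip(lam, x, log_pwc[c]))
            for c, lp in enumerate(log_prior)]

def mce_loss_and_grad(x, k, lam, log_prior, log_pwc, eta=1.0, gamma=1.0):
    # Smoothed 0/1 loss for one document x whose true topic is k,
    # plus its gradient with respect to the feature weights lam.
    g = discriminants(x, lam, log_prior, log_pwc)
    rivals = [j for j in range(len(g)) if j != k]
    m = max(eta * g[j] for j in rivals)
    e = {j: math.exp(eta * g[j] - m) for j in rivals}
    z = sum(e.values())
    # Misclassification measure: soft maximum of rival scores minus true score.
    d = (m + math.log(z / len(rivals))) / eta - g[k]
    loss = sigmoid(gamma * d)
    slope = gamma * loss * (1.0 - loss)  # d(loss)/d(d)
    grad = []
    for w in range(len(lam)):
        # d(d)/d(lam_w): rival-weighted log-likelihoods minus the true topic's.
        rival = sum(e[j] / z * log_pwc[j][w] for j in rivals)
        grad.append(slope * x[w] * (rival - log_pwc[k][w]))
    return loss, grad

def train(data, lam, log_prior, log_pwc, lr=0.05, epochs=20):
    # Online gradient descent on the summed MCE loss.
    for _ in range(epochs):
        for x, k in data:
            _, grad = mce_loss_and_grad(x, k, lam, log_prior, log_pwc)
            lam = [l - lr * gw for l, gw in zip(lam, grad)]
    return lam

# Hypothetical toy model: features count "the", "goal", "vote"; topics 0 and 1.
log_prior = [math.log(0.5), math.log(0.5)]
log_pwc = [[math.log(0.6), math.log(0.3), math.log(0.1)],
           [math.log(0.7), math.log(0.05), math.log(0.25)]]
data = [([10, 1, 0], 0), ([10, 0, 1], 1)]
print(train(data, [1.0, 1.0, 1.0], log_prior, log_pwc))
```

With two topics the misclassification measure is linear in the weights, so any small step against the gradient lowers the loss on that document. On this toy pair of documents, training raises the weights of both content words above their initial value of 1.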


Proceedings Article (DOI)
22 Sep 2008
TL;DR: The discriminative minimum classification error (MCE) training approach is applied to the problem of learning an appropriate feature space normalization for use with an SVM classifier, and significant error rate reductions are reported.
Abstract: The success of support vector machines (SVMs) for classification problems is often dependent on an appropriate normalization of the input feature space. This is particularly true in topic identification, where the relative contribution of the common but uninformative function words can overpower the contribution of the rare but informative content words in the SVM kernel function score if the feature space is not normalized properly. In this paper, we apply the discriminative minimum classification error (MCE) training approach to the problem of learning an appropriate feature space normalization for use with an SVM classifier. Results are presented showing significant error rate reductions for an SVM-based system on a topic identification task using the Fisher corpus of audio recordings of human-human conversations.

5 citations
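The normalization problem described in this abstract can be seen with a linear kernel in which each feature dimension w is scaled by a factor s_w, so that K(x, y) = Σ_w (s_w x_w)(s_w y_w). The sketch below uses inverse document frequency as a fixed, hand-picked stand-in for the scales; the paper instead learns the normalization with MCE training, which this code does not implement. The vocabulary, counts, and function names are hypothetical.

```python
import math

def idf_scales(docs):
    # Stand-in normalization (not MCE): s_w = log(N / df_w),
    # where df_w is the number of documents containing word w.
    n = len(docs)
    scales = []
    for w in range(len(docs[0])):
        df = sum(1 for d in docs if d[w] > 0)
        scales.append(math.log(n / df) if df else 0.0)
    return scales

def weighted_linear_kernel(x, y, scales):
    # K(x, y) = sum_w (s_w * x_w) * (s_w * y_w)
    return sum((s * a) * (s * b) for s, a, b in zip(scales, x, y))

# Feature order: "the", "of", "baseball", "election" (raw word counts).
query = [30, 20, 2, 0]
doc_a = [10, 5, 3, 0]    # sports
doc_b = [30, 20, 0, 2]   # politics
doc_c = [15, 8, 0, 3]    # politics
unit = [1.0] * 4
idf = idf_scales([doc_a, doc_b, doc_c])

print(weighted_linear_kernel(query, doc_a, unit), weighted_linear_kernel(query, doc_b, unit))
print(weighted_linear_kernel(query, doc_a, idf), weighted_linear_kernel(query, doc_b, idf))
```

With unit scales, the politics document `doc_b` scores higher because it shares many occurrences of "the" and "of" with the query. The IDF scales zero out words that occur in every document, so the shared content word "baseball" decides the match in favor of `doc_a`.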