Top 4 papers published by Timothy J. Hazen from Microsoft in 2008

Journal Article•DOI•

Retrieval and browsing of spoken content

[...]

Ciprian Chelba¹, Timothy J. Hazen², Murat Saraclar³•Institutions (3)

Johns Hopkins University¹, Bundelkhand University², Philips³

18 Apr 2008-IEEE Signal Processing Magazine

TL;DR: This article addresses how to handle the errorful or incomplete output that ASR systems provide for spoken audio documents, focusing on the use case where a user enters search terms into a search engine and is returned a collection of spoken document hits.

Abstract: Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one's own hard-drive. Speech search has received less attention perhaps because large collections of spoken material have previously not been available. However, with cheaper storage and increased broadband access, there has been a subsequent increase in the availability of online spoken audio content such as news broadcasts, podcasts, and academic lectures. A variety of personal and commercial uses also exist. As data availability increases, the lack of adequate technology for processing spoken documents becomes the limiting factor to large-scale access to spoken content. In this article, we strive to discuss the technical issues involved in the development of information retrieval systems for spoken audio documents, concentrating on the issue of handling the errorful or incomplete output provided by ASR systems. We focus on the usage case where a user enters search terms into a search engine and is returned a collection of spoken document hits.

173 citations

Proceedings Article•DOI•

Discriminative feature weighting using MCE training for topic identification of spoken audio recordings

[...]

Timothy J. Hazen¹, Anna Margolis¹•Institutions (1)

Massachusetts Institute of Technology¹

12 May 2008

TL;DR: This paper investigates a discriminative approach to feature weighting for topic identification using minimum classification error (MCE) training and learns feature weights by optimizing an objective loss function directly related to the classification error rate of the topic identification system.

Abstract: In this paper we investigate a discriminative approach to feature weighting for topic identification using minimum classification error (MCE) training. Our approach learns feature weights by optimizing an objective loss function directly related to the classification error rate of the topic identification system. Topic identification experiments are performed on spoken conversations from the Fisher corpus. Features drawn from both word and phone lattices generated via automatic speech recognition are investigated. Under various conditions, our new feature weighting scheme reduces our classification error rate by between 9% and 23% relative to our baseline naive Bayes system using feature selection.

16 citations

Proceedings Article•DOI•

A hybrid SVM/MCE training approach for vector space topic identification of spoken audio recordings.

[...]

Timothy J. Hazen¹, Fred Richardson¹•Institutions (1)

Massachusetts Institute of Technology¹

22 Sep 2008

TL;DR: The discriminative minimum classification error (MCE) training approach is applied to the problem of learning an appropriate feature space normalization for use with an SVM classifier, and results are presented showing significant error rate reductions.

Abstract: The success of support vector machines (SVMs) for classification problems is often dependent on an appropriate normalization of the input feature space. This is particularly true in topic identification, where the relative contribution of the common but uninformative function words can overpower the contribution of the rare but informative content words in the SVM kernel function score if the feature space is not normalized properly. In this paper we apply the discriminative minimum classification error (MCE) training approach to the problem of learning an appropriate feature space normalization for use with an SVM classifier. Results are presented showing significant error rate reductions for an SVM-based system on a topic identification task using the Fisher corpus of audio recordings of human-human conversations.

5 citations

Transactions associate editors

[...]

01 Jan 2008

Showing papers by "Timothy J. Hazen" published in 2008