An experimental study of an audio indexing system for the web.

Open Access Proceedings Article

Beth Logan, +3 more

- pp 676-679

TLDR

This paper describes a speech recognition based audio search engine for indexing spoken documents found on the World Wide Web, focusing on the speech recognition and retrieval aspects. Retrieval experiments demonstrate that the system can index effectively.

Abstract:

We have developed a speech recognition based audio search engine for indexing spoken documents found on the World Wide Web. Our site (http://www.compaq.com/speechbot) indexes around 20 news and talk radio shows covering a wide range of topics, speaking styles and acoustic conditions from a selection of public Web sites with multimedia archives. In this paper, we describe our system and its performance, focusing on the speech recognition and retrieval aspects. We describe our training procedure in some detail and report our historical error rate since the site launch. We also investigate the impact of out-of-vocabulary (OOV) words. Finally, we report the results of retrieval experiments which demonstrate that our system can index effectively.
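
The abstract refers to a recognition error rate and to the impact of OOV words without defining either quantity here. As a rough illustration only, the sketch below computes the conventional word error rate (word-level edit distance divided by reference length) and a simple OOV rate (fraction of words not in the recognizer vocabulary). The function names, the whitespace tokenization, and the toy inputs are assumptions made for this example; they are not taken from the paper and may differ from the authors' scoring procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def oov_rate(text: str, vocabulary: set) -> float:
    """Fraction of whitespace-separated words in `text` not in `vocabulary`."""
    words = text.split()
    if not words:
        return 0.0
    return sum(1 for word in words if word not in vocabulary) / len(words)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(oov_rate("compaq speechbot indexes radio", {"indexes", "radio"}))
```

An OOV word can never be recognized correctly by a fixed-vocabulary recognizer, so under these definitions each OOV occurrence in the reference contributes at least one error to the word error rate.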

Citations

Proceedings Article

Vocabulary independent spoken term detection

Jonathan Mamou, +2 more

TL;DR: This work presents a vocabulary independent system that can handle arbitrary queries, exploiting the information provided by having both word transcripts and phonetic transcripts, in order to retrieve information from speech data.

Journal Article

Spoken content retrieval: beyond cascading speech recognition with text retrieval

Lin-Shan Lee, +3 more

- 01 Sep 2015 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: This article is intended to provide a thorough overview of the concepts, principles, approaches, and achievements of major technical contributions along this line of investigation.

Proceedings Article

Vocabulary-independent search in spontaneous speech

F. Seide, +3 more

TL;DR: This work presents a vocabulary-independent system to rapidly index and search spontaneous speech, and introduces a new method of phonetic word-fragment lattice generation, which uses longer-span language knowledge than a phoneme recognizer.

Journal Article

Speechbot: an experimental speech-based search engine for multimedia content on the web

J.-M. Van Thong, +5 more

- 01 Mar 2002 -

IEEE Transactions on Multimedia

TL;DR: This paper uses speech recognition technology to index spoken audio and video files from the World Wide Web when no transcriptions are available, and shows that, even if the transcription is inaccurate, it can still achieve good retrieval performance for typical user queries.

Proceedings Article

Fast Vocabulary-Independent Audio Search Using Path-Based Graph Indexing

Olivier Siohan, +1 more

TL;DR: A fast vocabulary independent audio search approach that operates on phonetic lattices and is suitable for any query, inspired by a general graph indexing method that defines an automatic procedure to select a small number of paths as indexing features, keeping the index size small while allowing fast retrieval of the lattices matching a given query.

References

Book

Introduction to Modern Information Retrieval

Gerard Salton, +1 more

Analysis of a Very Large AltaVista Query Log

Craig Silverstein, +3 more

TL;DR: This paper analyzes a 280 GB AltaVista search engine query log of approximately 1 billion search requests collected over a period of six weeks. The log represents approximately 285 million user sessions, each an attempt to fill a single information need.

1998 TREC-7 Spoken Document Retrieval Track Overview and Results

John S. Garofolo, +4 more

TL;DR: This overview describes the 1998 TREC-7 Spoken Document Retrieval (SDR) Track, which evaluated the retrieval of broadcast news excerpts using a combination of automatic speech recognition and information retrieval technologies.

Patent

Method for indexing information of a database

Michael Burrows

TL;DR: An indexing method is provided for a database storing information as records at unique addresses. Pairs are generated for each record, each pair including a word representing a portion of the information of the record and an associated location.

Proceedings Article

The Cambridge University spoken document retrieval system

S.E. Johnson, +4 more

TL;DR: The retrieval performance over a wide range of speech transcription error rates is presented and a number of recognition error metrics that more accurately reflect the impact of transcription errors on retrieval accuracy are defined and computed.

Related Papers (5)

Lattice-Based Search for Spoken Utterance Retrieval

Subword-based approaches for spoken document retrieval

Vocabulary independent spoken term detection

The SRI/OGI 2006 spoken term detection system.

Rapid and accurate spoken term detection.