Author

Anna Litvinova

Bio: Anna Litvinova is an academic researcher. The author has contributed to research in the topics RealAudio and audio mining. The author has an h-index of 1 and has co-authored 1 publication receiving 39 citations.

Papers
Proceedings Article
12 Apr 2000
TL;DR: An audio search engine indexes several talk and news radio shows, covering a wide range of topics and speaking styles, from a selection of public Web sites with multimedia archives. Even when the transcription is inaccurate, it can still achieve good retrieval performance for typical user queries.
Abstract: We have developed an audio search engine incorporating speech recognition technology. This allows indexing of spoken documents from the World Wide Web when no transcription is available. This site indexes several talk and news radio shows covering a wide range of topics and speaking styles from a selection of public Web sites with multimedia archives. Our Web site is similar in spirit to normal Web search sites; it contains an index, not the actual multimedia content. The audio from these shows suffers in acoustic quality due to bandwidth limitations, coding, compression, and poor acoustic conditions. The shows are typically sampled at 8 kHz and transmitted, RealAudio compressed, at 6.5 kbps. Our word-error rate results using appropriately trained acoustic models show remarkable resilience to the high compression, though many factors combine to increase the average word-error rates over standard broadcast news benchmarks. We show that, even if the transcription is inaccurate, we can still achieve good retrieval performance for typical user queries (69%). Because the archive is large - over 5000 hours of content (and growing at a rate of 100 hours per week), totaling 47 million words and growing rapidly - we measure performance in terms of the precision of the top-ranked matches returned to the user.
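The abstract reports performance as the precision of the top-ranked matches returned to the user. A minimal sketch of that kind of measure follows, assuming a conventional precision-at-k definition; the function name `precision_at_k`, the cutoff `k`, and the document identifiers are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k returned documents that are judged relevant.

    Hypothetical helper: the paper does not give its exact formula or cutoff.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    # Count how many of the top-k results appear in the relevance judgments.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by k (not len(top_k)), so a short result list is penalized.
    return hits / k


if __name__ == "__main__":
    # Made-up ranking and relevance judgments, for illustration only.
    ranking = ["show_a", "show_b", "show_c", "show_d", "show_e"]
    relevant = {"show_a", "show_c", "show_x"}
    print(precision_at_k(ranking, relevant, 5))
```

Measuring only the top of the ranking suits a large archive, where judging every relevant document (as recall would require) is impractical.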

39 citations


Cited by
Patent
02 Jun 2006
TL;DR: A computerized method and apparatus for providing a virtual media channel based on media search is described. The steps are: obtaining a set of rules that define instructions for obtaining the media content that comprises a media channel, the set including at least one rule with instructions to include media content resulting from a search; searching for candidate media content according to a search query defined by the at least one rule; and merging one or more of the candidate media content resulting from the search into the content of the media channel.
Abstract: A computerized method and apparatus for providing a virtual media channel based on media search is featured. The method and apparatus features the steps of, or structure for, obtaining a set of rules that define instructions for obtaining media content that comprise the content for a media channel, the set including at least one rule with instructions to include media content resulting from a search; searching for candidate media content according to a search query defined by the at least one rule; and merging one or more of the candidate media content resulting from the search into the content for the media channel. The candidate media content can include segments of the media content resulting from the search. The set of rules can additionally include a rule with instructions to add media content from a predetermined location.

96 citations

Patent
31 Mar 2006
TL;DR: A computerized method and apparatus is disclosed for dynamic presentation of advertising, factual, and informational content and combinations thereof. In particular, the advertising content is dynamically presented according to the playback of corresponding segments identified within a media file or stream.
Abstract: A computerized method and apparatus is disclosed for dynamic presentation of advertising, factual, informational content and combinations thereof. In particular, the advertising content is dynamically presented according to the playback of corresponding segments identified within a media file or stream.

86 citations

Patent
27 Mar 2002
TL;DR: An electronic document searching system or word searching system which when given an input, expands the input as a function of acoustic similarity and/or word sequence occurrence frequency is described in this article.
Abstract: An electronic document searching system or word searching system which when given an input, expands the input as a function of acoustic similarity and/or word sequence occurrence frequency. Results of the system are alternative input words or phrases. The alternative input words or phrases are output from the system for further processing.

76 citations

Book Chapter
25 Aug 2008
TL;DR: Enriched transcription, that is, enhancing the automatic word transcripts with metadata derived from the audio data, is discussed, followed by some highlights of recent progress and remaining challenges in speech recognition.
Abstract: This paper addresses some of the recent trends in speech processing, with a focus on speech-to-text transcription as a means to facilitate access to multimedia information in a multilingual context. A brief overview of automatic speech recognition is given along with indicative performance measures for a range of tasks. Enriched transcription, that is, enhancing the automatic word transcripts with metadata derived from the audio data, is discussed, followed by some highlights of recent progress and remaining challenges in speech recognition.

63 citations

Journal Article
TL;DR: This work presents several novel approaches to the Out of Vocabulary (OOV) query problem for spoken audio: indexing based on syllable-like units called particles and query expansion according to acoustic confusability for a word index.
Abstract: We present several novel approaches to the Out of Vocabulary (OOV) query problem for spoken audio: indexing based on syllable-like units called particles and query expansion according to acoustic confusability for a word index. We also examine linear and OOV-based combination of indexing schemes. We experiment on 75 h of broadcast news, comparing our techniques to a word index, a phoneme index and a phoneme index queried with phoneme sequences. Our results show that our approaches are superior to both a word index and a phoneme index for OOV words, and have comparable performance to the sequence of phonemes scheme. The particle system has worse performance than the acoustic query expansion scheme. The best system uses word queries for in-vocabulary words and a linear combination of the phoneme sequence scheme and acoustic query expansion for OOV words. Using the best possible weights for linear combination, this system improves the average precision from 0.35 for a word index to 0.40, a result only obtainable if the weights could be learnt on a development query set. The next best system used a word index for in-vocabulary words and the phoneme sequence system otherwise and had average precision of 0.39.
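The best system in the abstract uses the word index for in-vocabulary query words and a linear combination of the phoneme sequence scheme and acoustic query expansion for OOV words. A minimal sketch of that routing follows, assuming each scheme has already produced a relevance score for a given document. The function name `combined_score`, the single interpolation weight, and all score values are hypothetical; the paper's actual weights and scoring functions are not given here.

```python
from typing import Set


def combined_score(
    term: str,
    vocabulary: Set[str],
    word_score: float,
    phoneme_seq_score: float,
    expansion_score: float,
    weight: float = 0.5,  # hypothetical; the abstract says weights would be learnt on a development query set
) -> float:
    """Score one document for one query term.

    In-vocabulary terms use the word-index score directly. OOV terms use a
    linear combination of the phoneme-sequence and acoustic-expansion scores.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if term in vocabulary:
        return word_score
    return weight * phoneme_seq_score + (1.0 - weight) * expansion_score


if __name__ == "__main__":
    vocab = {"news", "radio"}  # toy recognizer vocabulary, for illustration only
    # In-vocabulary term: only the word-index score is used.
    print(combined_score("news", vocab, 0.9, 0.2, 0.4))
    # OOV term: the two sub-word schemes are interpolated.
    print(combined_score("litvinova", vocab, 0.0, 0.25, 0.75, weight=0.5))
```

Routing on vocabulary membership reflects the abstract's observation that the word index is the scheme used for in-vocabulary words in both of the top-performing systems, while the sub-word schemes are reserved for OOV words.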

60 citations