Showing papers by "Pedro J. Moreno" published in 1998

Open Access

Proceedings Article

A recursive algorithm for the forced alignment of very long audio segments.

Pedro J. Moreno, Christopher Frank Joerg, Jean-Manuel Van Thong, Oren Glickman

01 Jan 1998

TL;DR: The key idea of this algorithm is to turn the forced alignment problem into a recursive speech recognition problem with a gradually restricting dictionary and language model, which is tolerant to acoustic noise and errors or gaps in the text transcript or audio tracks.

Abstract: In this paper we address the problem of aligning very long (often more than one hour) audio files to their corresponding textual transcripts in an effective manner. We present an efficient recursive technique to solve this problem that works well even on noisy speech signals. The key idea of this algorithm is to turn the forced alignment problem into a recursive speech recognition problem with a gradually restricting dictionary and language model. The algorithm is tolerant to acoustic noise and errors or gaps in the text transcript or audio tracks. We report experimental results on a 3 hour audio file containing TV and radio broadcasts. We will show accurate alignments on speech under a variety of real acoustic conditions such as speech over music and speech over telephone lines. We also report results when the same audio stream has been corrupted with white additive noise or compressed using a popular web encoding format such as RealAudio. This algorithm has been used in our internal multimedia indexing project. It has processed more than 200 hours of audio from varied sources, such as WGBH NOVA documentaries and NPR web audio files. The system aligns speech media content in about one to five times realtime, depending on the acoustic conditions of the audio signal.
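The recursive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how reliable regions are chosen, so the sketch assumes that runs of consecutive words matching between the transcript and the recognizer output serve as anchors, and that the unmatched gaps between anchors are re-recognized with a vocabulary restricted to the words of that gap. The names `align`, `fake_recognize`, and `min_run` are hypothetical, and the toy recognizer merely simulates errors that disappear once the vocabulary is small.

```python
from difflib import SequenceMatcher

def align(transcript, audio_span, recognize, min_run=3):
    """Return {transcript_index: time} for every word the sketch can anchor.

    transcript: list of (index, word) pairs that are still unaligned
    audio_span: (start, end) in seconds
    recognize:  hypothetical callable(span, vocabulary) -> list of (word, time)
    """
    if not transcript:
        return {}
    ref_words = [w for _, w in transcript]
    vocabulary = set(ref_words)              # dictionary restricted to this segment
    hyp = recognize(audio_span, vocabulary)
    hyp_words = [w for w, _ in hyp]

    # Assumption: a run of matching words counts as a trusted anchor.
    needed = min(min_run, len(ref_words))
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    anchors = [m for m in matcher.get_matching_blocks() if m.size >= needed]
    if not anchors:
        return {}                            # give up on this segment

    aligned = {}
    prev_ref, prev_time = 0, audio_span[0]
    for m in anchors:
        gap = transcript[prev_ref:m.a]       # unaligned words before this anchor
        gap_end = hyp[m.b][1]
        if gap and gap_end > prev_time:
            aligned.update(align(gap, (prev_time, gap_end), recognize, min_run))
        for k in range(m.size):
            aligned[transcript[m.a + k][0]] = hyp[m.b + k][1]
        prev_ref = m.a + m.size
        prev_time = hyp[m.b + m.size - 1][1]

    tail = transcript[prev_ref:]             # unaligned words after the last anchor
    if tail and audio_span[1] > prev_time:
        aligned.update(align(tail, (prev_time, audio_span[1]), recognize, min_run))
    return aligned

# Toy stand-in for a recognizer: one word per second, and the words "d" and
# "e" are misrecognized whenever the vocabulary has more than five entries.
TRUTH = [(w, float(t)) for t, w in enumerate("a b c d e f g h i j".split())]

def fake_recognize(span, vocabulary):
    noisy = len(vocabulary) > 5
    out = []
    for word, time in TRUTH:
        if span[0] <= time <= span[1] and word in vocabulary:
            out.append(("<unk>" if noisy and word in ("d", "e") else word, time))
    return out

words = list(enumerate(w for w, _ in TRUTH))
result = align(words, (0.0, 9.0), fake_recognize)
```

In this toy run the first pass anchors "a b c" and "f g h i j"; the gap holding "d e" is then recognized again with a two-word vocabulary, which is where the restricted dictionary helps.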

171 citations

Journal Article

Data-driven environmental compensation for speech recognition: a unified approach

Pedro J. Moreno¹, Bhiksha Raj¹, Richard M. Stern¹

Carnegie Mellon University¹

01 Jul 1998 - Speech Communication

TL;DR: This paper presents the multivaRiate gAussian-based cepsTral normaliZation (RATZ) family of algorithms, which modify incoming cepstral features, along with the STAR (STAtistical Reestimation) family of algorithms, which modify the internal statistics of the classifier.

68 citations

Patent

Environmentally compensated speech processing

Brian S. Eberman¹, Pedro J. Moreno¹

Hewlett-Packard¹

05 Jun 1998

TL;DR: First vectors representing clean speech signals are stored in a vector codebook, second vectors are determined from dirty speech signals, and third vectors are predicted from estimated noise and distortion parameters.

Abstract: In a computerized method for processing speech signals, first vectors representing clean speech signals are stored in a vector codebook. Second vectors are determined from dirty speech signals. Noise and distortion parameters are estimated from the second vectors. Third vectors are predicted, based on the estimated noise and distortion parameters. The third vectors are used to correct the first vectors. The third vectors can then be applied to the second vectors to produce corrected vectors. The corrected vectors and the first vectors can be compared to identify first vectors which resemble the corrected vectors.
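One possible reading of these steps is sketched below; it is not the patented method. The sketch assumes log-spectral feature vectors and a common additive-noise, linear-channel environment model, y = log(exp(x + q) + exp(n)), which the abstract does not state. It also takes the noise vector n and the distortion vector q as given rather than estimating them from the dirty speech. The function names `predict_noisy` and `compensate` are hypothetical.

```python
import numpy as np

def predict_noisy(codebook, noise, channel):
    """Predict how each clean codeword would look in the noisy environment.

    Assumed model (not from the abstract): y = log(exp(x + q) + exp(n)).
    """
    return np.logaddexp(codebook + channel, noise)

def compensate(observed, codebook, noise, channel):
    """Correct dirty vectors and match them against the clean codebook."""
    predicted = predict_noisy(codebook, noise, channel)   # the "third vectors"
    correction = codebook - predicted                     # per-codeword offset

    # Pick, for every observed vector, the closest predicted noisy codeword.
    dist = ((observed[:, None, :] - predicted[None, :, :]) ** 2).sum(axis=2)
    nearest = dist.argmin(axis=1)
    corrected = observed + correction[nearest]

    # Compare corrected vectors with the clean codebook (nearest neighbour).
    dist_clean = ((corrected[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return corrected, dist_clean.argmin(axis=1)

# Toy data: three clean codewords, and dirty vectors generated from the model.
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
noise = np.array([1.0, 1.0])      # assumed known here
channel = np.array([0.5, 0.5])    # assumed known here
observed = predict_noisy(codebook[[2, 0, 1]], noise, channel)
corrected, labels = compensate(observed, codebook, noise, channel)
```

Because the toy observations are generated from the same model, the corrected vectors land back on the clean codewords; with real data the correction would only be approximate.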

59 citations

Patent

Speech processing adapted to ambient noise (An Umgebungsgeräusche angepasste Sprachverarbeitung)

Brian S. Eberman, Pedro J. Moreno

05 Jun 1998

1 citation