
Showing papers by "Lawrence K. Saul published in 1998"


Proceedings Article
24 Jul 1998
TL;DR: Using methods from large deviation theory, rigorous bounds on marginal probabilities such as Pr[children] are derived and rates of convergence for the accuracy of the authors' bounds as a function of network size are proved.
Abstract: We study two-layer belief networks of binary random variables in which the conditional probabilities Pr [child|parents] depend monotonically on weighted sums of the parents. In large networks where exact probabilistic inference is intractable, we show how to compute upper and lower bounds on many probabilities of interest. In particular, using methods from large deviation theory, we derive rigorous bounds on marginal probabilities such as Pr[children] and prove rates of convergence for the accuracy of our bounds as a function of network size. Our results apply to networks with generic transfer function parameterizations of the conditional probability tables, such as sigmoid and noisy-OR. They also explicitly illustrate the types of averaging behavior that can simplify the problem of inference in large networks.
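To make the averaging argument concrete, here is a minimal sketch (not the paper's exact derivation) of how a large-deviation bound can sandwich the marginal of a single sigmoid unit. It assumes independent binary parents X_i with priors p_i and Pr[child = 1 | parents] = sigma(sum_i w_i X_i + b); Hoeffding's inequality confines the weighted input near its mean, and monotonicity of the sigmoid converts that into upper and lower bounds on the marginal. The function names and the 1/n weight scaling are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def marginal_bounds(w, p, b=0.0, delta=0.05):
    """Bound Pr[child = 1] for a sigmoid unit with independent
    binary parents X_i ~ Bernoulli(p_i).

    Hoeffding's inequality keeps the weighted input
    S = sum_i w_i X_i + b within eps of its mean mu except on a
    tail event of probability at most delta; since the sigmoid is
    monotone, sigma(mu - eps) and sigma(mu + eps) then bracket the
    marginal, up to a delta-sized correction for the tail.
    """
    w, p = np.asarray(w, float), np.asarray(p, float)
    mu = w @ p + b
    # Solve 2 * exp(-2 eps^2 / sum_i w_i^2) = delta for eps.
    eps = np.sqrt(0.5 * np.sum(w ** 2) * np.log(2.0 / delta))
    lower = (1.0 - delta) * sigmoid(mu - eps)
    upper = sigmoid(mu + eps) + delta * (1.0 - sigmoid(mu + eps))
    return lower, upper

# Example: many parents with O(1/n) weights, where averaging kicks in.
rng = np.random.default_rng(0)
n = 200
w = rng.normal(0.0, 1.0, size=n) * (4.0 / n)   # assumed 1/n weight scaling
p = rng.uniform(0.2, 0.8, size=n)
lo, hi = marginal_bounds(w, p)
print(f"{lo:.4f} <= Pr[child=1] <= {hi:.4f}")
```

With weights scaled as O(1/n), eps shrinks like 1/sqrt(n), so the interval tightens as the network grows; this mirrors the rates of convergence the paper proves.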

51 citations


Patent
Lawrence K. Saul
11 Dec 1998
TL;DR: In this patent, a method and apparatus for speech recognition using Markov processes on curves are presented, in which input speech utterances are received and represented as multidimensional curves.
Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. Each curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.
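As a rough illustration of the arc-length weighting described above, the sketch below accumulates a recognition score in which each feature-vector transition contributes in proportion to the arc length it traverses rather than its duration. The scoring rule and function names are hypothetical stand-ins, not the patent's exact formulation.

```python
import numpy as np

def arc_length_increments(x):
    """Arc-length increments along a feature trajectory x (T x d):
    ds_t = ||x_{t+1} - x_t||, one value per transition."""
    return np.linalg.norm(np.diff(x, axis=0), axis=1)

def arclength_weighted_score(x, frame_logprobs):
    """Accumulate a score in which each transition's log-probability
    is weighted by the arc length traversed there, not by duration.
    frame_logprobs: length T-1 array of per-transition log-probs
    under some acoustic model (hypothetical)."""
    ds = arc_length_increments(x)
    return float(np.sum(ds * frame_logprobs))

# Toy example: the same curve traversed at two speaking rates yields
# (nearly) the same score, because the increments ds sum to the same
# total arc length however fast the curve is traversed.
curve = lambda t: np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
for T in (50, 200):
    t = np.linspace(0.0, 1.0, T)
    score = arclength_weighted_score(curve(t), np.full(T - 1, -1.0))
    print(f"T={T:4d}  score={score:.4f}")
```

Because the score depends on the curve only through its arc length, it is insensitive to variations in speaking rate, which is the invariance the patent exploits.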

13 citations


Proceedings Article
01 Dec 1998
TL;DR: Algorithms for approximate probabilistic inference are given that exploit averaging phenomena occurring at nodes with large numbers of parents; these algorithms compute rigorous lower and upper bounds on marginal probabilities of interest, and the bounds are proved to become exact in the limit of large networks.
Abstract: We study probabilistic inference in large, layered Bayesian networks represented as directed acyclic graphs. We show that the intractability of exact inference in such networks does not preclude their effective use. We give algorithms for approximate probabilistic inference that exploit averaging phenomena occurring at nodes with large numbers of parents. We show that these algorithms compute rigorous lower and upper bounds on marginal probabilities of interest, prove that these bounds become exact in the limit of large networks, and provide rates of convergence.
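The averaging phenomenon the bounds rely on is easy to see numerically: with many parents and weights scaled inversely with fan-in, the weighted input to a node concentrates around its mean, so the node's conditional probability is nearly deterministic given the layer above. The Monte Carlo check below is an illustrative sketch, not the authors' algorithm; the 1/n weight scaling is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# With weights scaled as 1/n, the standard deviation of the weighted
# input to a node falls off like 1/sqrt(n): the averaging behavior
# that makes large layered networks amenable to rigorous bounds.
for n in (10, 100, 1000, 10000):
    w = rng.normal(0.0, 1.0, size=n) * (5.0 / n)           # assumed 1/n scaling
    parents = (rng.random((1000, n)) < 0.5).astype(float)  # Bernoulli(1/2) parents
    inputs = parents @ w
    print(f"n={n:6d}  std of weighted input = {inputs.std():.4f}")
```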

11 citations


Proceedings Article
01 Dec 1998
TL;DR: The use of Markov processes on curves (MPCs) for automatic speech recognition is described, and on two tasks (recognizing New Jersey town names and connected alpha-digits) MPCs are found to yield lower word error rates than comparably trained hidden Markov models.
Abstract: We investigate a probabilistic framework for automatic speech recognition based on the intrinsic geometric properties of curves. In particular, we analyze the setting in which two variables, one continuous (x) and one discrete (s), evolve jointly in time. We suppose that the vector x traces out a smooth multidimensional curve and that the variable s evolves stochastically as a function of the arc length traversed along this curve. Since arc length does not depend on the rate at which a curve is traversed, this gives rise to a family of Markov processes whose predictions, Pr[s|x], are invariant to nonlinear warpings of time. We describe the use of such models, known as Markov processes on curves (MPCs), for automatic speech recognition, where x are acoustic feature trajectories and s are phonetic transcriptions. On two tasks, recognizing New Jersey town names and connected alpha-digits, we find that MPCs yield lower word error rates than comparably trained hidden Markov models.
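The invariance claim is easy to verify numerically: arc length is a purely geometric property of the curve, so any quantity defined in terms of it is unchanged by a monotone reparameterization of time. A minimal check (illustrative, not from the paper):

```python
import numpy as np

def total_arc_length(x):
    """Polyline approximation of the arc length of a sampled curve."""
    return float(np.linalg.norm(np.diff(x, axis=0), axis=1).sum())

# The same smooth curve, sampled uniformly in time and under a
# monotone nonlinear warping of time.  The arc length, which is the
# quantity MPC dynamics are parameterized by, agrees in both cases
# up to discretization error.
t = np.linspace(0.0, 1.0, 500)
warped = t ** 3                                # nonlinear time warping
curve = lambda u: np.stack([u, np.sin(4 * np.pi * u)], axis=1)
print("uniform :", total_arc_length(curve(t)))
print("warped  :", total_arc_length(curve(warped)))
```

Because predictions Pr[s|x] depend on x only through arc length, they inherit this invariance, which is why MPCs are insensitive to speaking rate.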

9 citations


Proceedings Article
24 Jul 1998
TL;DR: The approach models the conditional random process that generates segments of constant s along the curve of x, and shows how to learn the parameters of these Markov processes from labeled and/or unlabeled examples of segmented curves.
Abstract: We study the classification problem that arises when two variables, one continuous (x) and one discrete (s), evolve jointly in time. We suppose that the vector x traces out a smooth multidimensional curve, to each point of which the variable s attaches a discrete label. The trace of s thus partitions the curve into different segments whose boundaries occur where s changes value. We consider how to learn the mapping between x and s from examples of segmented curves. Our approach is to model the conditional random process that generates segments of constant s along the curve of x. We suppose that the variable s evolves stochastically as a function of the arc length traversed by x. Since arc length does not depend on the rate at which a curve is traversed, this gives rise to a family of Markov processes whose predictions, Pr[s|x], are invariant to nonlinear warpings (or reparameterizations) of time. We show how to learn the parameters of these Markov processes from labeled and/or unlabeled examples of segmented curves. The resulting models are motivated for automatic speech recognition, where x are acoustic features and s are phonetic transcriptions.
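For the learning step, one natural estimator, assuming the segment process is a Markov jump process in arc length with exponential holding "times" measured in arc length, is the classical rate MLE: transitions out of a state divided by the total arc length spent in it. The sketch below uses that assumed parameterization for illustration; it is not necessarily the paper's exact estimator.

```python
import numpy as np
from collections import defaultdict

def estimate_exit_rates(curves):
    """MLE of per-state exit rates for a jump process evolving in arc
    length (an assumed MPC parameterization).

    curves: list of (x, s) pairs, where x is a (T x d) array of points
    and s is a length-T array of integer segment labels.
    Rate estimate for state k: (# exits from k) / (arc length in k)."""
    length_in = defaultdict(float)   # arc length spent in each state
    exits = defaultdict(int)         # transitions out of each state
    for x, s in curves:
        ds = np.linalg.norm(np.diff(x, axis=0), axis=1)
        for t in range(len(ds)):
            length_in[s[t]] += ds[t]
            if s[t + 1] != s[t]:
                exits[s[t]] += 1
    return {k: exits[k] / length_in[k] for k in length_in if length_in[k] > 0}

# Toy example: one labeled curve with two segments.
x = np.stack([np.linspace(0.0, 1.0, 100), np.zeros(100)], axis=1)
s = np.array([0] * 60 + [1] * 40)
print(estimate_exit_rates([(x, s)]))  # state 0 exits once over ~0.6 units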

3 citations