The power of amnesia: learning probabilistic automata with variable memory length

doi:10.1007/BF00114008

Open access journal article

Dana Ron et al.

Vol. 25, Issue 2, pp. 117-149

TLDR

The authors prove that the presented algorithm can efficiently learn distributions generated by probabilistic suffix automata (PSAs): for any target PSA, the KL divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence, in polynomial time and sample complexity.
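The guarantee above is measured in KL divergence, D_KL(P || Q) = Σ_x P(x) log(P(x) / Q(x)), taken with the target's distribution as P and the hypothesis's distribution as Q. A minimal sketch of that quantity follows, computed on two made-up distributions over length-2 strings (toy numbers, not from the paper; the function name `kl_divergence` is illustrative). It shows only the divergence itself, not how the paper normalizes it over string length.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D_KL(p || q) in nats for two distributions over a finite set of outcomes."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue  # terms with p(x) = 0 contribute 0 by convention
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf  # p gives mass to an outcome that q rules out
        total += p_x * math.log(p_x / q_x)
    return total


# Made-up distributions over length-2 strings, for illustration only.
target = {"aa": 0.25, "ab": 0.25, "ba": 0.25, "bb": 0.25}
hypothesis = {"aa": 0.4, "ab": 0.1, "ba": 0.1, "bb": 0.4}
print(kl_divergence(target, hypothesis))  # ln(1.25), about 0.2231
print(kl_divergence(hypothesis, target))  # a different value: KL is not symmetric
```

Because KL divergence is asymmetric, the order of the arguments matters; the sketch follows the order in which the guarantee is stated (target first, hypothesis second).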

Abstract:

We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata, which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first, we apply the algorithm to construct a model of the English language and use this model to correct corrupted text. In the second, we construct a simple stochastic model for E. coli DNA.
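The abstract describes variable memory length Markov processes only in words. The sketch below illustrates the core idea under stated assumptions: the next-symbol distribution is chosen by the longest suffix of the history that appears in a set of contexts, so memory is long only where it changes the prediction. The learner is a deliberately simplified, hypothetical stand-in, not the paper's algorithm. It counts next-symbol occurrences after every context up to `max_depth` and keeps a context when it occurs at least `min_count` times and some next-symbol probability differs from that of its one-symbol-shorter suffix by a factor of at least `ratio`. All names and thresholds (`learn_contexts`, `min_count`, `ratio`, `smoothing`) are illustrative assumptions, not the paper's parameters or its sample-complexity bounds.

```python
import math
from collections import Counter, defaultdict
from typing import Dict

# A model maps each context (a suffix of the history) to a next-symbol
# distribution. The empty context "" is the fallback with no memory.
Model = Dict[str, Dict[str, float]]


def longest_suffix_context(history: str, model: Model, max_depth: int) -> str:
    """Longest suffix of `history` (at most max_depth symbols) that is a context in `model`."""
    for length in range(min(len(history), max_depth), 0, -1):
        if history[-length:] in model:
            return history[-length:]
    return ""


def sequence_log_prob(seq: str, model: Model) -> float:
    """Natural-log probability of `seq`, predicting each symbol from its variable-length context."""
    max_depth = max(len(c) for c in model)
    total = 0.0
    for i, symbol in enumerate(seq):
        context = longest_suffix_context(seq[:i], model, max_depth)
        total += math.log(model[context][symbol])
    return total


def learn_contexts(sample: str, alphabet: str, max_depth: int,
                   min_count: int = 5, ratio: float = 1.5,
                   smoothing: float = 0.01) -> Model:
    """Hypothetical simplified learner: keep contexts whose statistics differ from their shorter suffix."""
    # counts[c][a] = how often symbol `a` followed context `c` in the sample
    counts: Dict[str, Counter] = defaultdict(Counter)
    for i, symbol in enumerate(sample):
        for depth in range(min(i, max_depth) + 1):
            counts[sample[i - depth:i]][symbol] += 1

    def dist(context: str) -> Dict[str, float]:
        # Additive smoothing so every symbol keeps nonzero probability
        c = counts[context]
        n = sum(c.values())
        return {a: (c[a] + smoothing) / (n + smoothing * len(alphabet)) for a in alphabet}

    model: Model = {"": dist("")}
    for context in list(counts):
        if context == "" or sum(counts[context].values()) < min_count:
            continue
        child, parent = dist(context), dist(context[1:])
        if any(max(child[a] / parent[a], parent[a] / child[a]) >= ratio for a in alphabet):
            model[context] = child
    # Keep the context set suffix-closed, as in a suffix tree
    for context in list(model):
        for k in range(1, len(context)):
            model.setdefault(context[k:], dist(context[k:]))
    return model


# Toy usage: in a strictly alternating sample, one symbol of memory suffices.
learned = learn_contexts("ab" * 50, alphabet="ab", max_depth=2)
print(sorted(learned))                               # ['', 'a', 'b']
print(math.exp(sequence_log_prob("abab", learned)))  # close to 0.5
```

On the alternating sample `"ab" * 50`, this sketch keeps only the contexts `""`, `"a"` and `"b"`: one symbol of memory already determines the next symbol, so the length-2 contexts add nothing and are pruned.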
