Variable length Markov chains
Peter Bühlmann, Abraham J. Wyner
TL;DR: In this paper, the authors study the structural properties of stationary variable length Markov chains (VLMCs) on a finite space and propose a new bootstrap scheme based on fitted VLMCs.
Abstract
We study estimation in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of high order, but with memory of variable length, yielding a much bigger and structurally richer class of models than ordinary high-order Markov chains. From an algorithmic view, the VLMC model class has attracted interest in information theory and machine learning, but its statistical properties have not yet been explored. Provided that good estimation is available, the additional structural richness of the model class enhances predictive power by finding a better trade-off between model bias and variance, and allows a better structural description, which can be of specific interest. The latter is exemplified with some DNA data. A version of the tree-structured context algorithm, proposed by Rissanen in an information-theoretic set-up, is shown to have new good asymptotic properties for estimation in the class of VLMCs. This remains true even when the underlying model increases in dimensionality. Furthermore, consistent estimation of minimal state spaces and mixing properties of fitted models are given. We also propose a new bootstrap scheme based on fitted VLMCs. We show its validity for quite general stationary categorical time series and for a broad range of statistical procedures.
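The defining idea in the abstract is that the memory needed to predict the next symbol depends on what was just observed: some pasts need only one symbol of context, others need more. A minimal sketch of this variable-length lookup (an illustration of the idea, not the paper's context algorithm; the contexts and probabilities below are made up):

```python
# A VLMC stores next-symbol distributions only for a set of "contexts"
# (suffixes of the past); prediction uses the longest stored suffix.
# Example contexts and probabilities are invented for illustration.
contexts = {
    "":   {"a": 0.5, "c": 0.5},   # fallback: empty context
    "a":  {"a": 0.9, "c": 0.1},   # memory of length 1 after an 'a'
    "ca": {"a": 0.2, "c": 0.8},   # longer memory only where it matters
}

def longest_context(past):
    """Return the longest suffix of `past` that is a stored context."""
    for k in range(len(past), -1, -1):
        suffix = past[len(past) - k:]
        if suffix in contexts:
            return suffix
    return ""

def next_distribution(past):
    """Next-symbol distribution given the observed past."""
    return contexts[longest_context(past)]

print(longest_context("cca"))         # 'ca': the length-2 context applies
print(longest_context("cc"))          # '':  falls back to the empty context
print(next_distribution("aca")["c"])  # 0.8, since 'ca' is the active context
```

The set of contexts forms a tree of variable depth, which is why the class is richer than a fixed-order Markov chain with the same maximal memory.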
Citations
Journal ArticleDOI
Improving RNA-Seq expression estimates by correcting for fragment bias
Adam B. Roberts, Cole Trapnell, Julie Donaghey, John L. Rinn, Lior Pachter
TL;DR: Improvements in expression estimates as measured by correlation with independently performed qRT-PCR are found and correction of bias leads to improved replicability of results across libraries and sequencing technologies.
Journal ArticleDOI
Computational Mechanics: Pattern and Prediction, Structure and Simplicity
TL;DR: This paper showed that the causal-state representation of the ε-machine is the minimal one consistent with accurate prediction, and established several results on ε-machine optimality and uniqueness and on how ε-machines compare to alternative representations.
Journal ArticleDOI
HIBAG—HLA genotype imputation with attribute bagging
Xiuwen Zheng, Judong Shen, Charles J. Cox, Jonathan Wakefield, Margaret G. Ehm, Matthew R. Nelson, Bruce S. Weir
TL;DR: HIBAG, HLA Imputation using attribute BAGging, is proposed, that makes predictions by averaging HLA-type posterior probabilities over an ensemble of classifiers built on bootstrap samples, providing a readily available imputation approach without the need to have access to large training data sets.
Causal architecture, complexity and self-organization in time series and cellular automata
TL;DR: This work develops computational mechanics for four increasingly sophisticated types of process: memoryless transducers, time series, transducers with memory, and cellular automata. It proves the optimality and uniqueness of the ε-machine's representation of the causal architecture, and gives reliable algorithms for pattern discovery.
Journal ArticleDOI
Bootstraps for Time Series
TL;DR: It is argued that two types of sieve bootstrap outperform the block method, each in its own important niche, namely linear and categorical processes respectively.
References
Book
Classification and regression trees
TL;DR: The methodology used to construct tree-structured rules is the focus of this monograph, which covers the use of trees as a data-analysis method and, in a more mathematical framework, proves some of their fundamental properties.
Journal ArticleDOI
Bootstrap Methods: Another Look at the Jackknife
TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.
Journal ArticleDOI
The Jackknife and the Bootstrap for General Stationary Observations
TL;DR: In this article, the authors extend the jackknife and the bootstrap method of estimating standard errors to the case where the observations form a general stationary sequence, and they show that consistency is obtained if $l = l(n) \rightarrow \infty$ and $l(n)/n \rightarrow 0$.
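The moving-block idea summarized above can be sketched in a few lines (an assumed minimal implementation for illustration, not the authors' code): overlapping blocks of length l are resampled with replacement and concatenated to form a bootstrap series.

```python
import random

def moving_block_bootstrap(x, block_len, rng=None):
    """Draw one bootstrap series of len(x) by concatenating random
    overlapping blocks of length `block_len` from the series `x`."""
    rng = rng or random.Random(0)
    n = len(x)
    blocks = [x[i:i + block_len] for i in range(n - block_len + 1)]
    out = []
    while len(out) < n:
        out.extend(rng.choice(blocks))
    return out[:n]

series = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
resampled = moving_block_bootstrap(series, block_len=3)
print(len(resampled))  # 10: same length as the original series
```

Keeping whole blocks preserves short-range dependence within each block, which is what makes the scheme consistent for stationary sequences under the growth conditions on l(n) stated above.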