
Showing papers in "Bernoulli in 2013"


Journal ArticleDOI
TL;DR: In this paper, the properties of post-l1-penalized estimators (least squares applied to the model selected by an l1-penalized first stage) in high-dimensional sparse linear regression models are studied.
Abstract: Note: new title. Former title = Post-l1-Penalized Estimators in High-Dimensional Linear Regression Models. First Version submitted March 29, 2010; Orig. date Jan 4, 2009; this revision June 14, 2011

366 citations


Journal ArticleDOI
TL;DR: It is proved that, to obtain an O(1) acceptance probability as the dimension d of the state space tends to infinity, the leapfrog step-size h should be scaled as $h = l \times d^{-1/4}$, which means that in high dimensions HMC requires $O(d^{1/4})$ steps to traverse the state space.
Abstract: We investigate the properties of the Hybrid Monte Carlo algorithm (HMC) in high dimensions. HMC develops a Markov chain reversible with respect to a given target distribution $\pi$ by using separable Hamiltonian dynamics with potential $-\log\pi$. The additional momentum variables are chosen at random from the Boltzmann distribution, and the continuous-time Hamiltonian dynamics are then discretised using the leapfrog scheme. The induced bias is removed via a Metropolis–Hastings accept/reject rule. In the simplified scenario of independent, identically distributed components, we prove that, to obtain an O(1) acceptance probability as the dimension d of the state space tends to infinity, the leapfrog step-size h should be scaled as $h = l \times d^{-1/4}$. Therefore, in high dimensions, HMC requires $O(d^{1/4})$ steps to traverse the state space. We also identify analytically the asymptotically optimal acceptance probability, which turns out to be 0.651 (to three decimal places). This is the choice which optimally balances the cost of generating a proposal, which decreases as l increases (because fewer steps are required to reach the desired final integration time), against the cost related to the average number of proposals required to obtain acceptance, which increases as l increases.
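
A minimal sketch of the scaling described above, assuming an i.i.d. standard Gaussian target (a special case of the product-form setting); the function names, the choice l = 1 and the integration time T = 1 are illustrative, not the paper's setup.

```python
import numpy as np

def leapfrog(q, p, grad_U, h, n_steps):
    """Leapfrog discretisation of Hamiltonian dynamics with potential U."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * h * grad_U(q)
    for _ in range(n_steps - 1):
        q += h * p
        p -= h * grad_U(q)
    q += h * p
    p -= 0.5 * h * grad_U(q)
    return q, p

def hmc_acceptance_rate(d, l=1.0, T=1.0, n_iter=2000, seed=0):
    """Empirical HMC acceptance rate for a standard Gaussian target in dimension d,
    with the leapfrog step-size scaled as h = l * d**(-1/4)."""
    rng = np.random.default_rng(seed)
    U = lambda q: 0.5 * np.sum(q ** 2)        # potential = -log(target), target = N(0, I_d)
    grad_U = lambda q: q
    h = l * d ** (-0.25)
    n_steps = max(1, int(T / h))              # O(d^{1/4}) steps to reach integration time T
    q = rng.standard_normal(d)
    accepts = 0
    for _ in range(n_iter):
        p = rng.standard_normal(d)            # momentum drawn from the Boltzmann (Gaussian) law
        q_new, p_new = leapfrog(q, p, grad_U, h, n_steps)
        dH = (U(q_new) + 0.5 * np.sum(p_new ** 2)) - (U(q) + 0.5 * np.sum(p ** 2))
        if np.log(rng.uniform()) < -dH:       # Metropolis-Hastings accept/reject
            q, accepts = q_new, accepts + 1
    return accepts / n_iter

# Under this scaling the acceptance rate should remain O(1) as d grows.
print([round(hmc_acceptance_rate(d), 2) for d in (10, 100, 1000)])
```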

320 citations


Journal ArticleDOI
TL;DR: In this article, the authors review characterizations of positive definite functions on spheres in terms of Gegenbauer expansions and apply them to dimension walks, where monotonicity properties of the Gegenbauer coefficients guarantee positive definiteness in higher dimensions.
Abstract: Isotropic positive definite functions on spheres play important roles in spatial statistics, where they occur as the correlation functions of homogeneous random fields and star-shaped random particles. In approximation theory, strictly positive definite functions serve as radial basis functions for interpolating scattered data on spherical domains. We review characterizations of positive definite functions on spheres in terms of Gegenbauer expansions and apply them to dimension walks, where monotonicity properties of the Gegenbauer coefficients guarantee positive definiteness in higher dimensions. Subject to a natural support condition, isotropic positive definite functions on the Euclidean space $\mathbb{R} ^{3}$, such as Askey’s and Wendland’s functions, allow for the direct substitution of the Euclidean distance by the great circle distance on a one-, two- or three-dimensional sphere, as opposed to the traditional approach, where the distances are transformed into each other. Completely monotone functions are positive definite on spheres of any dimension and provide rich parametric classes of such functions, including members of the powered exponential, Matern, generalized Cauchy and Dagum families. The sine power family permits a continuous parameterization of the roughness of the sample paths of a Gaussian process. A collection of research problems provides challenges for future work in mathematical analysis, probability theory and spatial statistics.
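
For orientation, the classical characterization being reviewed is Schoenberg's theorem (stated here in standard notation, which may differ from the paper's): a continuous function $\psi$ on $[0,\pi]$ is an isotropic positive definite function on the sphere $\mathbb{S}^{d}$, $d\ge2$, if and only if

\[
\psi(\theta)=\sum_{n=0}^{\infty} b_{n,d}\,\frac{C_n^{(d-1)/2}(\cos\theta)}{C_n^{(d-1)/2}(1)},
\qquad b_{n,d}\ge 0,\qquad \sum_{n=0}^{\infty} b_{n,d}<\infty,
\]

where $C_n^{\lambda}$ is the Gegenbauer polynomial of degree $n$; on the circle $\mathbb{S}^{1}$ the expansion reduces to a cosine series $\psi(\theta)=\sum_n b_n\cos(n\theta)$. Dimension walks relate the coefficients $b_{n,d}$ across different values of $d$.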

182 citations


Journal ArticleDOI
Peter Bühlmann1
TL;DR: In this article, a method for constructing p-values for general hypotheses in a high-dimensional linear model is proposed, where hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all parameters.
Abstract: We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all parameters. Furthermore, when considering many hypotheses, we show how to adjust for multiple testing taking dependence among the p-values into account. Our technique is based on Ridge estimation with an additional correction term due to a substantial projection bias in high dimensions. We prove strong error control for our p-values and provide sufficient conditions for detection: for the former, we do not make any assumption on the size of the true underlying regression coefficients while regarding the latter, our procedure might not be optimal in terms of power. We demonstrate the method in simulated examples and a real data application.
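
A minimal sketch of the naive building block only (a ridge estimator with Gaussian plug-in p-values, assuming a known noise level sigma); it deliberately omits the projection-bias correction that is the paper's main contribution, and all names are illustrative.

```python
import numpy as np
from scipy import stats

def ridge_pvalues(X, y, lam=1.0, sigma=1.0):
    """Naive two-sided p-values for H0: beta_j = 0 from a ridge estimator.
    Ignores the high-dimensional projection bias addressed in the paper."""
    n, p = X.shape
    A = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # ridge map: beta_hat = A @ y
    beta_hat = A @ y
    se = sigma * np.sqrt(np.sum(A ** 2, axis=1))          # std. dev. of the noise part A @ eps
    z = beta_hat / se
    return 2 * (1 - stats.norm.cdf(np.abs(z)))

# Toy example: n = 50 observations, p = 200 covariates, 3 active coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
beta = np.zeros(200); beta[:3] = 2.0
y = X @ beta + rng.standard_normal(50)
print(ridge_pvalues(X, y)[:5])
```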

146 citations


Journal ArticleDOI
TL;DR: In this article, the multivariate Bernoulli distribution is used to estimate the structure of graphs with binary nodes and its statistical properties regarding independence and uncorrelatedness of the nodes are demonstrated.
Abstract: In this paper, we consider the multivariate Bernoulli distribution as a model to estimate the structure of graphs with binary nodes. This distribution is discussed in the framework of the exponential family, and its statistical properties regarding independence of the nodes are demonstrated. Importantly, the model can estimate not only the main effects and pairwise interactions among the nodes but is also capable of modeling higher-order interactions, allowing for the existence of complex clique effects. We compare the multivariate Bernoulli model with existing graphical inference models – the Ising model and the multivariate Gaussian model, where only the pairwise interactions are considered. On the other hand, the multivariate Bernoulli distribution has an interesting property in that independence and uncorrelatedness of the component random variables are equivalent. Both the marginal and conditional distributions of a subset of variables in the multivariate Bernoulli distribution still follow the multivariate Bernoulli distribution. Furthermore, the multivariate Bernoulli logistic model is developed under generalized linear model theory by utilizing the canonical link function in order to include covariate information on the nodes, edges and cliques. We also consider variable selection techniques such as LASSO in the logistic model to impose sparsity structure on the graph. Finally, we discuss extending the smoothing spline ANOVA approach to the multivariate Bernoulli logistic model to enable estimation of non-linear effects of the predictor variables.
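
A minimal sketch (the three-node example and all names are illustrative) of the exponential-family form described above: the log-probability of a binary vector is a sum of main effects, pairwise interactions and a higher-order clique interaction, minus a log-partition term.

```python
import numpy as np
from itertools import product

def log_prob(y, f):
    """Unnormalised log-probability of a binary vector y under a multivariate
    Bernoulli model with natural parameters f indexed by subsets of nodes."""
    return sum(theta * np.prod([y[i] for i in S]) for S, theta in f.items())

def log_partition(f, d):
    """Log normalising constant obtained by summing over all 2^d configurations."""
    return np.log(sum(np.exp(log_prob(y, f)) for y in product([0, 1], repeat=d)))

# Three binary nodes: main effects, all pairwise interactions, one third-order clique effect.
d = 3
f = {(0,): 0.5, (1,): -0.2, (2,): 0.1,
     (0, 1): 1.0, (0, 2): -0.5, (1, 2): 0.3,
     (0, 1, 2): 0.8}                      # higher-order interaction beyond the Ising model
b = log_partition(f, d)
probs = {y: np.exp(log_prob(y, f) - b) for y in product([0, 1], repeat=d)}
print(sum(probs.values()))                # sums to 1
```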

136 citations


Journal ArticleDOI
TL;DR: In this article, a compact formula is given for the Parisian ruin probability of a spectrally negative Levy process, defined as the probability that the process exhibits an excursion below zero whose length exceeds a certain fixed period r.
Abstract: In this note we give, for a spectrally negative Levy process, a compact formula for the Parisian ruin probability, which is defined as the probability that the process exhibits an excursion below zero whose length exceeds a certain fixed period r. The formula involves only the scale function of the spectrally negative Levy process.

111 citations


Journal ArticleDOI
TL;DR: This paper considers testing a covariance matrix in the high dimensional setting where the dimension p can be comparable to or much larger than the sample size n, and introduces a test based on a U-statistic, which is shown to be rate optimal over this asymptotic regime.
Abstract: This paper considers testing a covariance matrix in the high dimensional setting where the dimension p can be comparable to or much larger than the sample size n. The problem of testing the hypothesis $H_0: \Sigma = \Sigma_0$ for a given covariance matrix $\Sigma_0$ is studied from a minimax point of view. We first characterize the boundary that separates the testable region from the non-testable region by the Frobenius norm when the ratio between the dimension p and the sample size n is bounded. A test based on a U-statistic is introduced and is shown to be rate optimal over this asymptotic regime. Furthermore, it is shown that the power of this test uniformly dominates that of the corrected likelihood ratio test (CLRT) over the entire asymptotic regime under which the CLRT is applicable. The power of the U-statistic based test is also analyzed when p/n is unbounded.
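
A minimal sketch of the kind of statistic such tests are built on, assuming mean-zero data and the identity null $\Sigma_0 = I$ (obtained in general by the transformation $X \mapsto \Sigma_0^{-1/2}X$): an unbiased U-statistic estimate of $\|\Sigma - I\|_F^2 = \operatorname{tr}(\Sigma^2) - 2\operatorname{tr}(\Sigma) + p$. Names are illustrative and this is not claimed to be the paper's exact statistic.

```python
import numpy as np

def ustat_frobenius(X):
    """Unbiased estimate of ||Sigma - I||_F^2 for mean-zero rows X[i] ~ (0, Sigma).
    Uses E[(X_i' X_j)^2] = tr(Sigma^2) for i != j and E[X_i' X_i] = tr(Sigma)."""
    n, p = X.shape
    G = X @ X.T                              # Gram matrix of inner products X_i' X_j
    off = G ** 2 - np.diag(np.diag(G) ** 2)  # keep only i != j terms
    tr_sigma2_hat = off.sum() / (n * (n - 1))
    tr_sigma_hat = np.trace(G) / n
    return tr_sigma2_hat - 2.0 * tr_sigma_hat + p

# Toy check: p comparable to n, data actually drawn with Sigma = I.
rng = np.random.default_rng(1)
n, p = 100, 200
print(ustat_frobenius(rng.standard_normal((n, p))))   # should be close to 0 under H0
```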

105 citations


Journal ArticleDOI
TL;DR: Some of the statistical consequences of computational perspectives on scalability, in particular divide-and-conquer methodology and hierarchies of convex relaxations, are investigated, with the goal of identifying “time-data tradeoffs.”
Abstract: How should statistical procedures be designed so as to be scalable computationally to the massive datasets that are increasingly the norm? When coupled with the requirement that an answer to an inferential question be delivered within a certain time budget, this question has significant repercussions for the field of statistics. With the goal of identifying “time-data tradeoffs,” we investigate some of the statistical consequences of computational perspectives on scalability, in particular divide-and-conquer methodology and hierarchies of convex relaxations. The fields of computer science and statistics have undergone mostly separate evolutions during their respective histories. This is changing, due in part to the phenomenon of “Big Data.” Indeed, science and technology are currently generating very large datasets and the gatherers of these data have increasingly ambitious inferential goals, trends which point towards a future in which statistics will be forced to deal with problems of scale in order to remain relevant. Currently the field seems little prepared to meet this challenge. To the key question “Can you guarantee a certain level of inferential accuracy within a certain time budget even as the data grow in size?” the field is generally silent. Many statistical procedures either have unknown runtimes or runtimes that render the procedure unusable on large-scale data. Although the field of sequential analysis provides tools to assess risk after a certain number of data points have arrived, this is different from an algorithmic analysis that predicts a relationship between time and risk. Faced with this situation, gatherers of large-scale data are often forced to turn to ad hoc procedures that perhaps do provide algorithmic guarantees but which may provide no statistical guarantees and which in fact may have poor or even disastrous statistical properties. On the other hand, the field of computer science is also currently poorly equipped to provide solutions to the inferential problems associated with Big Data. Database researchers rarely view the data in a database as noisy measurements on an underlying population about which inferential statements are desired. Theoretical computer scientists are able to provide analyses of the resource requirements of algorithms (e.g., time and space), and are often able to provide comparative analyses of different algorithms for solving a given problem, but these problems rarely refer to inferential goals. In particular, the notion that it may be possible to save on computation because of the growth of statistical power as problem instances grow in size is not (yet) a common perspective in computer science. In this paper we discuss some recent research initiatives that aim to draw computer science and statistics closer together, with particular reference to “Big Data” problems. There are two main underlying perspectives driving these initiatives, both of which present interesting conceptual challenges for statistics. The first is that large computational problems are often usefully addressed via some notion of “divide-and-conquer.” That is, the large problem is divided into subproblems that are hopefully simpler than the original problem, these subproblems are solved
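
A minimal sketch of the divide-and-conquer idea referred to above (the estimator, data and names are illustrative, not the paper's specific proposals): estimate on disjoint blocks that can be processed independently, then combine, accepting some loss of statistical efficiency in exchange for computational savings.

```python
import numpy as np

def block_median(x, n_blocks):
    """Divide-and-conquer median: compute the median on each disjoint block and
    average the block medians. Cheap and parallelisable, at some statistical cost."""
    blocks = np.array_split(x, n_blocks)
    return float(np.mean([np.median(b) for b in blocks]))

rng = np.random.default_rng(0)
x = rng.standard_cauchy(size=10**6)          # heavy-tailed: the median is the natural estimand
print(block_median(x, n_blocks=100))         # divide-and-conquer estimate
print(np.median(x))                          # full-data estimate
```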

102 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce the class of volatility modulated Levy-driven Volterra processes and their important subclass of Levy semistationary processes as a new framework for modelling energy spot prices.
Abstract: This paper introduces the class of volatility modulated Levy-driven Volterra ($\mathcal{VMLV}$) processes and their important subclass of Levy semistationary ($\mathcal{LSS}$) processes as a new framework for modelling energy spot prices. The main modelling idea consists of four principles: First, deseasonalised spot prices can be modelled directly in stationarity. Second, stochastic volatility is regarded as a key factor for modelling energy spot prices. Third, the model allows for the possibility of jumps and extreme spikes and, lastly, it features great flexibility in terms of modelling the autocorrelation structure and the Samuelson effect. We provide a detailed analysis of the probabilistic properties of $\mathcal{VMLV}$ processes and show how they can capture many stylised facts of energy markets. Further, we derive forward prices based on our new spot price models and discuss option pricing. An empirical example based on electricity spot prices from the European Energy Exchange confirms the practical relevance of our new modelling framework.
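
A minimal sketch of simulating such a process on a grid via a truncated moving-average sum, assuming an exponential kernel, a toy stochastic-volatility process and a Brownian driver; all of these choices are illustrative rather than the paper's calibrated energy-market models.

```python
import numpy as np

def simulate_lss(n, dt=0.01, lam=2.0, seed=0):
    """Simulate X(t_i) ~ sum_{j<=i} g(t_i - t_j) * sigma(t_j) * dL(t_j) on a grid,
    with kernel g(x) = exp(-lam * x), Brownian driver dL and a simple positive
    stochastic-volatility process sigma."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) * dt
    dL = rng.standard_normal(n) * np.sqrt(dt)                 # Brownian increments
    log_sigma = np.cumsum(rng.standard_normal(n) * 0.1 * np.sqrt(dt))
    sigma = np.exp(log_sigma)                                  # toy stochastic volatility
    X = np.empty(n)
    for i in range(n):
        g = np.exp(-lam * (t[i] - t[: i + 1]))                 # decaying memory kernel
        X[i] = np.sum(g * sigma[: i + 1] * dL[: i + 1])
    return t, X

t, X = simulate_lss(2000)
print(X[:5])
```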

101 citations


Journal ArticleDOI
TL;DR: In this paper, extremal quantile regression estimators of a response variable given a vector of covariates are investigated in the general setting, whether the conditional extreme-value index is positive, negative, or zero.
Abstract: Nonparametric regression quantiles obtained by inverting a kernel estimator of the conditional distribution of the response are long established in statistics. Attention has been, however, restricted to ordinary quantiles staying away from the tails of the conditional distribution. The purpose of this paper is to extend their asymptotic theory far enough into the tails. We focus on extremal quantile regression estimators of a response variable given a vector of covariates in the general setting, whether the conditional extreme-value index is positive, negative, or zero. Specifically, we elucidate their limit distributions when they are located in the range of the data or near and even beyond the sample boundary, under technical conditions that link the speed of convergence of their (intermediate or extreme) order with the oscillations of the quantile function and a von-Mises property of the conditional distribution. A simulation experiment and an illustration on real data are provided; the real data are American electricity data, where the estimation of conditional extremes is found to be of genuine interest.
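
A minimal sketch (names illustrative) of the classical construction being extended: a kernel estimator of the conditional distribution function, inverted to obtain a regression quantile. The paper's contribution concerns the behaviour of such estimators for orders close to 0 or 1, which this sketch does not address.

```python
import numpy as np

def kernel_conditional_quantile(x0, X, Y, tau, h):
    """Estimate the conditional tau-quantile of Y given X = x0 by inverting a
    Nadaraya-Watson estimate of the conditional distribution function."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)           # Gaussian kernel weights
    w = w / w.sum()
    order = np.argsort(Y)
    cdf = np.cumsum(w[order])                        # estimated F(y | x0) at the sorted Y values
    return Y[order][np.searchsorted(cdf, tau)]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 5000)
Y = X + rng.standard_normal(5000) * 0.2
print(kernel_conditional_quantile(0.5, X, Y, tau=0.9, h=0.05))   # roughly 0.5 + 0.2 * 1.28
```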

91 citations


Journal ArticleDOI
TL;DR: Under appropriate conditions, it is shown that the local solution obtained by this procedure recovers the set of nonzero coefficients without suffering from the bias of Lasso relaxation, which complements parameter estimation results of this procedure.
Abstract: A number of recent works have studied the effectiveness of feature selection using Lasso. It is known that under the restricted isometry properties (RIP), Lasso does not generally lead to the exact recovery of the set of nonzero coefficients, due to the looseness of convex relaxation. This paper considers the feature selection property of nonconvex regularization, where the solution is given by a multi-stage convex relaxation scheme. The nonconvex regularizer requires two tuning parameters (compared to one tuning parameter for Lasso). Although the method is more complex than Lasso, we show that under appropriate conditions including the dependence of a tuning parameter on the support set size, the local solution obtained by this procedure recovers the set of nonzero coefficients without suffering from the bias of Lasso relaxation, which complements parameter estimation results of this procedure in (J. Mach. Learn. Res. 11 (2011) 1087–1107).
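
A minimal sketch of a multi-stage convex relaxation of a capped-l1 penalty: each stage solves a weighted Lasso (here by a simple proximal-gradient/ISTA solver), and coefficients exceeding the second tuning parameter alpha in the previous stage are no longer penalised. Parameter names and the solver are illustrative; this is not the paper's exact algorithm.

```python
import numpy as np

def weighted_lasso(X, y, lam, w, n_iter=500):
    """ISTA solver for min (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n            # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - grad / L
        b = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)   # soft-thresholding
    return b

def multi_stage_capped_l1(X, y, lam, alpha, n_stages=3):
    """Multi-stage convex relaxation: re-solve a weighted Lasso, dropping the penalty
    on coordinates whose previous estimate exceeds alpha (capped-l1 penalty)."""
    p = X.shape[1]
    w = np.ones(p)                               # stage 1 is the ordinary Lasso
    for _ in range(n_stages):
        b = weighted_lasso(X, y, lam, w)
        w = (np.abs(b) <= alpha).astype(float)   # large coefficients become unpenalised
    return b

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 300))
beta = np.zeros(300); beta[:5] = 3.0
y = X @ beta + rng.standard_normal(100)
b = multi_stage_capped_l1(X, y, lam=0.1, alpha=0.5)
print(np.nonzero(b)[0][:10])
```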

Journal ArticleDOI
TL;DR: In this paper, the stepwise semiparametric (SSP) estimator is studied for pair-copula constructions; it is semiparametrically efficient for the Gaussian copula and computationally tractable even in high dimensions.
Abstract: We explore various estimators for the parameters of a pair-copula construction (PCC), among those the stepwise semiparametric (SSP) estimator, designed for this dependence structure. We present its asymptotic properties, as well as the estimation algorithm for the two most common types of PCCs. Compared to the considered alternatives, that is, maximum likelihood, inference functions for margins and semiparametric estimation, SSP is in general asymptotically less efficient. As we show in a few examples, this loss of efficiency may however be rather low. Furthermore, SSP is semiparametrically efficient for the Gaussian copula. More importantly, it is computationally tractable even in high dimensions, as opposed to its competitors. In any case, SSP may provide start values, required by the other estimators. It is also well suited for selecting the pair-copulae of a PCC for a given data set.

Journal ArticleDOI
TL;DR: In this paper, a technique for computing the cumulants of the Rosenblatt distribution is described, its expansion in terms of shifted chi-squared distributions is studied, and the coefficients of this expansion are used to obtain the Levy–Khintchine formula and derive asymptotic properties of the Levy measure.
Abstract: This paper studies various distributional properties of the Rosenblatt distribution. We begin by describing a technique for computing the cumulants. We then study the expansion of the Rosenblatt distribution in terms of shifted chi-squared distributions. We derive the coefficients of this expansion and use these to obtain the Levy–Khintchine formula and derive asymptotic properties of the Levy measure. This allows us to compute the cumulants, moments, coefficients in the chi-square expansion and the density and cumulative distribution functions of the Rosenblatt distribution with a high degree of precision. Tables are provided and software written to implement the methods described here is freely available by request from the authors.
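
For orientation, a hedged sketch (notation illustrative, possibly differing from the paper's) of the type of expansion referred to above: the Rosenblatt distribution admits a representation as a weighted series of centred chi-squared (squared Gaussian) variables, from which the cumulants follow,

\[
R \;\stackrel{d}{=}\; \sum_{n=1}^{\infty}\lambda_n\,(\varepsilon_n^2-1),
\qquad \varepsilon_1,\varepsilon_2,\dots\ \text{i.i.d.}\ \mathcal{N}(0,1),
\qquad
\kappa_m(R)=2^{m-1}(m-1)!\sum_{n=1}^{\infty}\lambda_n^{m},\quad m\ge 2,
\]

where the weights $\lambda_n$ are the eigenvalues of an integral operator associated with the underlying kernel.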

Journal ArticleDOI
TL;DR: In this article, a test procedure is proposed and the asymptotically sharp detection boundary $a$ is computed so that the maximal testing risk tends to $0$ as $M\to\infty$, $N\to\infty$, $p=n/N\to0$, $q=m/M\to0$, under some additional constraints.
Abstract: We observe an $N\times M$ matrix $Y_{ij}=s_{ij}+\xi_{ij}$ with $\xi_{ij}\sim\mathcal{N}(0,1)$ i.i.d. in $i,j$, and $s_{ij}\in\mathbb{R}$. We test the null hypothesis $s_{ij}=0$ for all $i,j$ against the alternative that there exists some submatrix of size $n\times m$ with significant elements in the sense that $s_{ij}\ge a>0$. We propose a test procedure and compute the asymptotic detection boundary $a$ so that the maximal testing risk tends to $0$ as $M\to\infty$, $N\to\infty$, $p=n/N\to0$, $q=m/M\to0$. We prove that this boundary is asymptotically sharp minimax under some additional constraints. Relations with other testing problems are discussed. We propose a testing procedure which adapts to unknown $(n,m)$ within some given set and compute the adaptive sharp rates. The implementation of our test procedure on synthetic data shows excellent behavior for sparse, not necessarily square matrices. We extend our sharp minimax results in different directions: first, to Gaussian matrices with unknown variance, next, to matrices of random variables having a distribution from an exponential family (non-Gaussian) and, finally, to a two-sided alternative for matrices with Gaussian elements.
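
A minimal sketch of a scan-type statistic for this detection problem (exhaustive scan, feasible only for tiny matrices; names and the standardisation are illustrative and this is not the paper's exact procedure): scan the sums over $n\times m$ submatrices and compare the maximum, standardised by $\sqrt{nm}$, with a threshold calibrated under the null.

```python
import numpy as np
from itertools import combinations

def scan_statistic(Y, n, m):
    """Maximum standardised sum over all n x m submatrices of Y (exhaustive scan)."""
    N, M = Y.shape
    best = -np.inf
    for rows in combinations(range(N), n):
        row_sum = Y[list(rows), :].sum(axis=0)
        # for a fixed row set, the best column set consists of the m largest column sums
        top = np.sort(row_sum)[-m:].sum()
        best = max(best, top / np.sqrt(n * m))
    return best

rng = np.random.default_rng(0)
N, M, n, m, a = 12, 12, 3, 3, 2.0
Y0 = rng.standard_normal((N, M))                     # null: pure noise
Y1 = Y0.copy(); Y1[:n, :m] += a                      # alternative: elevated n x m block
print(scan_statistic(Y0, n, m), scan_statistic(Y1, n, m))
```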

Journal ArticleDOI
TL;DR: In this article, the geometric median is defined as the minimizer of a simple convex functional that is differentiable everywhere when the distribution has no atoms, and it is possible to estimate it with online gradient algorithms.
Abstract: With the progress of measurement apparatus and the development of automatic sensors, it is not unusual anymore to get thousands of samples of observations taking values in high-dimensional spaces such as functional spaces. In such large samples of high-dimensional data, outlying curves may not be uncommon, and even a few individuals may corrupt simple statistical indicators such as the mean trajectory. We focus here on the estimation of the geometric median, which is a direct generalization of the real median and has nice robustness properties. The geometric median being defined as the minimizer of a simple convex functional that is differentiable everywhere when the distribution has no atoms, it is possible to estimate it with online gradient algorithms. Such algorithms are very fast and can deal with large samples. Furthermore, they can be simply updated when the data arrive sequentially. We state the almost sure consistency and the L2 rates of convergence of the stochastic gradient estimator as well as the asymptotic normality of its averaged version. We show that the asymptotic distribution of the averaged version of the algorithm is the same as that of the classic estimators, which are based on the minimization of the empirical loss function. The performance of our averaged sequential estimator, both in terms of computation speed and accuracy of the estimations, is evaluated with a small simulation study. Our approach is also illustrated on a sample of more than 5000 individual television audiences measured every second over a period of 24 hours.
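
A minimal sketch (step-size constants and names illustrative) of the averaged stochastic gradient algorithm described above: each new observation moves the current iterate by a unit vector pointing towards it, and the Polyak–Ruppert average of the iterates is returned.

```python
import numpy as np

def online_geometric_median(X, c=1.0, gamma=0.75):
    """Averaged stochastic gradient estimate of the geometric median of the rows of X.
    Update: m <- m + c * n**(-gamma) * (X_n - m) / ||X_n - m||, then running average."""
    m = X[0].astype(float).copy()
    m_bar = m.copy()
    for n, x in enumerate(X[1:], start=1):
        diff = x - m
        norm = np.linalg.norm(diff)
        if norm > 0:
            m = m + c * n ** (-gamma) * diff / norm      # stochastic gradient step
        m_bar = m_bar + (m - m_bar) / (n + 1)            # running (Polyak-Ruppert) average
    return m_bar

rng = np.random.default_rng(0)
X = rng.standard_normal((20000, 50))
X[-200:] += 100.0                                        # 1% gross outliers
m = online_geometric_median(X)
print(np.linalg.norm(m), np.linalg.norm(X.mean(axis=0)))   # median stays near 0, mean does not
```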

Journal ArticleDOI
TL;DR: In this paper, the authors set up a mathematical framework under which theoretical properties may be discussed and established sufficient conditions to ensure that the attributes required by each item are learnable from the data.
Abstract: Cognitive assessment is a growing area in psychological and educational measurement, where tests are given to assess mastery/deficiency of attributes or skills. A key issue is the correct identification of attributes associated with items in a test. In this paper, we set up a mathematical framework under which theoretical properties may be discussed. We establish sufficient conditions to ensure that the attributes required by each item are learnable from the data.

Journal ArticleDOI
TL;DR: In this article, a Lamperti type representation for real-valued self-similar Markov processes, killed at their hitting time of zero, is obtained, and examples are provided where the characteristics of the underlying processes can be computed explicitly.
Abstract: In this paper, we obtain a Lamperti type representation for real-valued self-similar Markov processes, killed at their hitting time of zero. Namely, we represent real-valued self-similar Markov processes as time changed multiplicative invariant processes. Doing so, we complete Kiu’s work [Stochastic Process. Appl. 10 (1980) 183–191], following some ideas in Chybiryakov [Stochastic Process. Appl. 116 (2006) 857–872] in order to characterize the underlying processes in this representation. We provide some examples where the characteristics of the underlying processes can be computed explicitly.
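
For context, a hedged reminder (standard notation, possibly differing from the paper's) of the classical Lamperti representation for positive self-similar Markov processes that the paper extends to the real-valued case: a pssMp $X$ with self-similarity index $\alpha>0$, started at $x>0$ and killed at its hitting time of zero, can be written as a time-changed exponential of a Lévy process $\xi$,

\[
X_t = x\exp\bigl(\xi_{\tau(t x^{-\alpha})}\bigr),
\qquad
\tau(t)=\inf\Bigl\{s\ge 0:\int_0^s \mathrm{e}^{\alpha\xi_u}\,\mathrm{d}u>t\Bigr\}.
\]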

Journal ArticleDOI
TL;DR: In this paper, the authors consider the effect of allowing the components to grow at different rates, and characterize the link between these marginal growth rates and the multivariate tail probability decay rate.
Abstract: Existing theory for multivariate extreme values focuses upon characterizations of the distributional tails when all components of a random vector, standardized to identical margins, grow at the same rate. In this paper, we consider the effect of allowing the components to grow at different rates, and characterize the link between these marginal growth rates and the multivariate tail probability decay rate. Our approach leads to a whole class of univariate regular variation conditions, in place of the single but multivariate regular variation conditions that underpin the current theories. These conditions are indexed by a homogeneous function and an angular dependence function, which, for asymptotically independent random vectors, mirror the role played by the exponent measure and Pickands’ dependence function in classical multivariate extremes. We additionally offer an inferential approach to joint survivor probability estimation. The key feature of our methodology is that extreme set probabilities can be estimated by extrapolating upon rays emanating from the origin when the margins of the variables are exponential. This offers an appreciable improvement over existing techniques where extrapolation in exponential margins is upon lines parallel to the diagonal.

Journal ArticleDOI
TL;DR: The author describes a few less familiar models (Averaging, Compulsive Gambler, Deference, Fashionista) suggested by the social network picture, as well as a few familiar ones.
Abstract: The style of mathematical models known to probabilists as Interacting Particle Systems, exemplified by the Voter, Exclusion and Contact processes, has found use in many academic disciplines. In many such disciplines the underlying conceptual picture is of a social network, where individuals meet pairwise and update their “state” (opinion, activity etc.) in a way depending on the two previous states. This picture motivates a precise general setup we call Finite Markov Information Exchange (FMIE) processes. We briefly describe a few less familiar models (Averaging, Compulsive Gambler, Deference, Fashionista) suggested by the social network picture, as well as a few familiar ones. Keywords: epidemic; interacting particle system; Markov chain; social network; voter model.
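
A minimal sketch (graph, time scale and names illustrative) of the Voter model mentioned above, one of the classical interacting particle systems: at each event a uniformly chosen individual adopts the opinion of a uniformly chosen neighbour.

```python
import numpy as np

def voter_model(adjacency, n_steps=10**5, seed=0):
    """Discrete-time Voter model on a graph: a random node copies the opinion of a
    random neighbour; returns the trajectory of the fraction holding opinion 1."""
    rng = np.random.default_rng(seed)
    n = len(adjacency)
    state = rng.integers(0, 2, size=n)            # initial opinions in {0, 1}
    fractions = np.empty(n_steps)
    for t in range(n_steps):
        i = rng.integers(n)
        j = rng.choice(adjacency[i])              # a uniformly chosen neighbour of i
        state[i] = state[j]                       # i adopts j's opinion
        fractions[t] = state.mean()
    return fractions

# Cycle graph on 200 nodes: each node's neighbours are the adjacent indices.
n = 200
adjacency = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
frac = voter_model(adjacency)
print(frac[-1])   # the fraction fluctuates and, run long enough, is absorbed at 0 or 1 (consensus)
```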

Journal ArticleDOI
TL;DR: In this article, the authors study the asymptotic mean square prediction error in the functional linear model with functional outputs and prove a central limit theorem for the predictor, using convex and exponential inequalities for the eigenvalues.
Abstract: We study prediction in the functional linear model with functional outputs. We provide the asymptotic mean square prediction error with constants. The rates we obtain are optimal in the minimax sense and generalize those found when the output is real. In contrast to previous works, our main results hold with no prior assumptions on the rate of decay of the eigenvalues of the input. This allows us to consider a class of parameters which is wider than those needed in previous papers on this topic. The methods of proof are based on convex and exponential inequalities for the eigenvalues. We also prove a central limit theorem for the predictor which improves results by Cardot, Mas and Sarda (2007) in the simpler model with scalar outputs and shows that no weak convergence result can be obtained for the bare estimate (without weak topologies or smooth norms).

Journal ArticleDOI
TL;DR: In this article, the problem of nonparametric drift estimation for one-dimensional, ergodic diffusion models from discrete-time, low-frequency data is studied and conditions for posterior consistency are given.
Abstract: We study Bayes procedures for the problem of nonparametric drift estimation for one-dimensional, ergodic diffusion models from discrete-time, low-frequency data. We give conditions for posterior consistency and verify these conditions for concrete priors, including priors based on wavelet expansions.

Journal ArticleDOI
TL;DR: In this paper, a simple continuous time model for the lead-lag effect between two financial assets is proposed, in which, for some time shift $\vartheta$, the two-dimensional process $(X_{t},Y_{t+\vartheta})$ is a semi-martingale with respect to a certain filtration.
Abstract: We propose a simple continuous time model for modeling the lead-lag effect between two financial assets. A two-dimensional process $(X_{t},Y_{t})$ reproduces a lead-lag effect if, for some time shift $\vartheta\in\mathbb{R} $, the process $(X_{t},Y_{t+\vartheta})$ is a semi-martingale with respect to a certain filtration. The value of the time shift $\vartheta$ is the lead-lag parameter. Depending on the underlying filtration, the standard no-arbitrage case is obtained for $\vartheta=0$. We study the problem of estimating the unknown parameter $\vartheta\in\mathbb{R}$, given randomly sampled non-synchronous data from $(X_{t})$ and $(Y_{t})$. By applying a certain contrast optimization based on a modified version of the Hayashi–Yoshida covariation estimator, we obtain a consistent estimator of the lead-lag parameter, together with an explicit rate of convergence governed by the sparsity of the sampling design.
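
A minimal sketch (illustrative names, a toy non-synchronous design and a grid search rather than the paper's estimator) of the contrast idea: shift one series' time stamps by a candidate value, compute the Hayashi–Yoshida covariation over overlapping observation intervals, and take the shift maximising the absolute contrast.

```python
import numpy as np

def hayashi_yoshida(tX, X, tY, Y):
    """Hayashi-Yoshida covariation: sum of increment products over pairs of
    observation intervals that overlap (no synchronisation needed)."""
    dX, dY = np.diff(X), np.diff(Y)
    hy = 0.0
    for i in range(len(dX)):
        overlap = (tY[1:] > tX[i]) & (tY[:-1] < tX[i + 1])   # Y-intervals meeting (tX[i], tX[i+1]]
        hy += dX[i] * dY[overlap].sum()
    return hy

def estimate_lead_lag(tX, X, tY, Y, candidate_shifts):
    """Contrast maximisation over candidate shifts of Y's time stamps."""
    contrasts = [abs(hayashi_yoshida(tX, X, tY - s, Y)) for s in candidate_shifts]
    return candidate_shifts[int(np.argmax(contrasts))]

# Toy data: Y is X delayed by 0.1, observed on irregular, non-synchronous grids.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 4001)
B = np.cumsum(rng.standard_normal(4001)) * np.sqrt(t[1])
theta = 0.1
tX = np.sort(rng.choice(t[t <= 10 - theta], 800, replace=False))
tY = np.sort(rng.choice(t[t >= theta], 800, replace=False))
X = np.interp(tX, t, B)
Y = np.interp(tY - theta, t, B) + 0.01 * rng.standard_normal(800)
shifts = np.linspace(0, 0.2, 41)
print(estimate_lead_lag(tX, X, tY, Y, shifts))   # should be close to 0.1
```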

Journal ArticleDOI
TL;DR: An efficient computational algorithm is developed which uses existing quantile regression code, so that bootstrap-type inference can be efficiently implemented, and the proposed estimator is shown to be consistent and asymptotically normal.
Abstract: We propose a censored quantile regression estimator motivated by unbiased estimating equations. Under the usual conditional independence assumption of the survival time and the censoring time given the covariates, we show that the proposed estimator is consistent and asymptotically normal. We develop an efficient computational algorithm which uses existing quantile regression code. As a result, bootstrap-type inference can be efficiently implemented. We illustrate the finite-sample performance of the proposed method by simulation studies and analysis of a survival data set.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of simultaneous variable selection and estimation in additive, partially linear models for longitudinal/clustered data and propose an estimation procedure via polynomial splines to estimate the nonparametric components and apply proper penalty functions to achieve sparsity in the linear part.
Abstract: We consider the problem of simultaneous variable selection and estimation in additive, partially linear models for longitudinal/clustered data. We propose an estimation procedure via polynomial splines to estimate the nonparametric components and apply proper penalty functions to achieve sparsity in the linear part. Under reasonable conditions, we obtain the asymptotic normality of the estimators for the linear components and the consistency of the estimators for the nonparametric components. We further demonstrate that, with proper choice of the regularization parameter, the penalized estimators of the non-zero coefficients achieve the asymptotic oracle property. The finite sample behavior of the penalized estimators is evaluated with simulation studies and illustrated by a longitudinal CD4 cell count data set.

Journal ArticleDOI
TL;DR: In this paper, the existence of the density associated with the exponential functional of a Levy process is studied, and it is shown that, when the process is the negative of a subordinator, the density satisfies an integral equation that generalizes the one reported by Carmona et al. [7].
Abstract: In this paper, we study the existence of the density associated with the exponential functional of the Levy process $\xi$, \[I_{\mathbf{e} _{q}}:=\int_{0}^{\mathbf{e} _{q}}\mathrm{e}^{\xi_{s}}\,\mathrm{d}s,\] where $\mathbf{e} _{q}$ is an independent exponential r.v. with parameter $q\geq0$. In the case where $\xi$ is the negative of a subordinator, we prove that the density of $I_{\mathbf{e}_{q}}$, here denoted by $k$, satisfies an integral equation that generalizes that reported by Carmona et al. [7]. Finally, when $q=0$, we describe explicitly the asymptotic behavior at $0$ of the density $k$ when $\xi$ is the negative of a subordinator and at $\infty$ when $\xi$ is a spectrally positive Levy process that drifts to $+\infty$.

Journal ArticleDOI
TL;DR: In this paper, a non-asymptotic analysis of MCMC estimators is presented for geometrically and polynomially ergodic Markov chains.
Abstract: We address the problem of upper bounding the mean square error of MCMC estimators. Our analysis is non-asymptotic. We first establish a general result valid for essentially all ergodic Markov chains encountered in Bayesian computation and a possibly unbounded target function f. The bound is sharp in the sense that the leading term is exactly $\sigma^{2}_{\mathrm{as}}(P,f)/n$, where $\sigma^{2}_{\mathrm{as}}(P,f)$ is the CLT asymptotic variance. Next, we proceed to specific assumptions and give explicit computable bounds for geometrically and polynomially ergodic Markov chains. As a corollary we provide results on confidence estimation.

Journal ArticleDOI
TL;DR: In this paper, a fully data-driven estimator for the inverse problem is proposed, where the error density φ is unknown and the additive measurement error is independent of X. The objective of this paper is the construction of a fully-data-driven estimation procedure when the estimation procedure is unknown, and it does not require any prior knowledge of the error distribution.
Abstract: We consider a circular deconvolution problem, where the density f of a circular random variable X has to be estimated nonparametrically based on an i.i.d. sample from a noisy observation Y of X. The additive measurement error is supposed to be independent of X. The objective of this paper is the construction of a fully data-driven estimation procedure when the error density φ is unknown. However, we suppose that in addition to the i.i.d. sample from Y, we have at our disposal an additional i.i.d. sample independently drawn from the error distribution. First, we develop a minimax theory in terms of both sample sizes. However, the proposed orthogonal series estimator requires an optimal choice of a dimension parameter depending on certain characteristics of f and φ, which are not known in practice. The main issue addressed in our work is the adaptive choice of this dimension parameter using a model selection approach. In a first step, we develop a penalized minimum contrast estimator supposing the degree of ill-posedness of the underlying inverse problem to be known, which amounts to assuming partial knowledge of the error distribution. We show that this data-driven estimator can attain the lower risk bound up to a constant in both sample sizes n and m over a wide range of density classes covering in particular ordinary and super smooth densities. Finally, by randomizing the penalty and the collection of models, we modify the estimator such that it does not require any prior knowledge of the error distribution anymore. Even when dispensing with any hypotheses on φ, this fully data-driven estimator still preserves minimax optimality in almost the same cases as the partially adaptive estimator.
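
A minimal sketch of the orthogonal series deconvolution estimator underlying the procedure, with the truncation level chosen by hand rather than by the paper's model-selection rule (names illustrative): empirical Fourier coefficients of the noisy sample are divided by empirical Fourier coefficients of the error sample, and the series is truncated.

```python
import numpy as np

def circular_deconvolution_density(Y, eps_sample, k_max, grid):
    """Orthogonal series estimate of the density f of X on the circle [0, 2*pi),
    from Y = X + eps (mod 2*pi) and an independent sample from the error eps.
    The Fourier coefficient of f at frequency j is estimated by phi_Y(j) / phi_eps(j)."""
    j = np.arange(-k_max, k_max + 1)
    phi_Y = np.exp(-1j * np.outer(j, Y)).mean(axis=1)        # empirical coefficients of Y
    phi_eps = np.exp(-1j * np.outer(j, eps_sample)).mean(axis=1)
    f_coef = phi_Y / phi_eps                                  # deconvolution step
    f_hat = np.real(np.exp(1j * np.outer(grid, j)) @ f_coef) / (2 * np.pi)
    return np.maximum(f_hat, 0.0)                             # crude correction of negative parts

rng = np.random.default_rng(0)
n, m = 5000, 5000
X = rng.vonmises(mu=np.pi, kappa=4.0, size=n) % (2 * np.pi)   # true circular density
eps = rng.normal(0.0, 0.3, size=n) % (2 * np.pi)              # error, wrapped onto the circle
Y = (X + eps) % (2 * np.pi)
eps_sample = rng.normal(0.0, 0.3, size=m) % (2 * np.pi)       # additional error sample
grid = np.linspace(0, 2 * np.pi, 200)
print(circular_deconvolution_density(Y, eps_sample, k_max=8, grid=grid)[:5])
```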

Journal ArticleDOI
TL;DR: In this paper, the authors develop a new formulation of Stein's method to obtain computable upper bounds on the total variation distance between the geometric distribution and a distribution of interest, which reduces the problem to the construction of a coupling between the original distribution and the discrete equilibrium distribution from renewal theory.
Abstract: We develop a new formulation of Stein's method to obtain computable upper bounds on the total variation distance between the geometric distribution and a distribution of interest. Our framework reduces the problem to the construction of a coupling between the original distribution and the "discrete equilibrium" distribution from renewal theory. We illustrate the approach in four non-trivial examples: the geometric sum of independent, non-negative, integer-valued random variables having common mean, the generation size of the critical Galton-Watson process conditioned on non-extinction, the in-degree of a randomly chosen node in the uniform attachment random graph model and the total degree of both a fixed and randomly chosen node in the preferential attachment random graph model.

Journal ArticleDOI
TL;DR: In this article, a Kolmogorov-Smirnov-type distance between the true time varying spectral density and its best approximation through a stationary spectral density is used to test the assumption of stationarity in locally stationary processes.
Abstract: In this paper we investigate the problem of testing the assumption of stationarity in locally stationary processes. The test is based on an estimate of a Kolmogorov–Smirnov type distance between the true time varying spectral density and its best approximation through a stationary spectral density. Convergence of a time varying empirical spectral process indexed by a class of certain functions is proved, and furthermore the consistency of a bootstrap procedure is shown which is used to approximate the limiting distribution of the test statistic. Compared to other methods proposed in the literature for the problem of testing for stationarity, the new approach has at least two advantages: On the one hand, the test can detect local alternatives converging to the null hypothesis at any rate $g_{T}\to0$ such that $g_{T}T^{1/2}\to\infty$, where $T$ denotes the sample size. On the other hand, the estimator is based on only one regularization parameter while most alternative procedures require two. Finite sample properties of the method are investigated by means of a simulation study, and a comparison with several other tests is provided which have been proposed in the literature.

Journal ArticleDOI
TL;DR: In this paper, the authors provide existence results and comparison principles for solutions of backward stochastic difference equations (BS$\Delta$Es) and then prove convergence of these to solutions of BSDEs when the mesh size of the time-discretizaton goes to zero.
Abstract: We provide existence results and comparison principles for solutions of backward stochastic difference equations (BS$\Delta$Es) and then prove convergence of these to solutions of backward stochastic differential equations (BSDEs) when the mesh size of the time-discretization goes to zero. The BS$\Delta$Es and BSDEs are governed by drivers $f^{N}(t,\omega,y,z)$ and $f(t,\omega,y,z)$, respectively. The new feature of this paper is that they may be non-Lipschitz in $z$. For the convergence results it is assumed that the BS$\Delta$Es are based on $d$-dimensional random walks $W^{N}$ approximating the $d$-dimensional Brownian motion $W$ underlying the BSDE and that $f^{N}$ converges to $f$. Conditions are given under which for any bounded terminal condition $\xi$ for the BSDE, there exist bounded terminal conditions $\xi^{N}$ for the sequence of BS$\Delta$Es converging to $\xi$, such that the corresponding solutions converge to the solution of the limiting BSDE. An important special case is when $f^{N}$ and $f$ are convex in $z$. We show that in this situation, the solutions of the BS$\Delta$Es converge to the solution of the BSDE for every uniformly bounded sequence $\xi^{N}$ converging to $\xi$. As a consequence, one obtains that the BSDE is robust in the sense that if $(W^{N},\xi^{N})$ is close to $(W,\xi)$ in distribution, then the solution of the $N$th BS$\Delta$E is close to the solution of the BSDE in distribution too.