
Showing papers in "Annals of the Institute of Statistical Mathematics in 2010"


Journal ArticleDOI
TL;DR: A generalised two-filter smoothing formula is proposed which only requires approximating probability distributions and applies to any state–space model, removing the need to make restrictive assumptions used in previous approaches to this problem.
Abstract: Two-filter smoothing is a principled approach for performing optimal smoothing in non-linear non-Gaussian state-space models where the smoothing distributions are computed through the combination of 'forward' and 'backward' time filters. The 'forward' filter is the standard Bayesian filter but the 'backward' filter, generally referred to as the backward information filter, is not a probability measure on the space of the hidden Markov process. In cases where the backward information filter can be computed in closed form, this technical point is not important. However, for general state-space models where there is no closed-form expression, this prohibits the use of flexible numerical techniques such as Sequential Monte Carlo (SMC) to approximate the two-filter smoothing formula. We propose here a generalised two-filter smoothing formula which only requires approximating probability distributions and applies to any state-space model, removing the need to make restrictive assumptions used in previous approaches to this problem. SMC algorithms are developed to implement this generalised recursion and we illustrate their performance on various problems.
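The 'forward' Bayesian filter referred to above can be sketched with a minimal bootstrap particle filter on a linear-Gaussian toy model. This is an illustrative SMC baseline only, not the paper's generalised two-filter recursion; the AR(0.9) model and all parameters are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian state-space model (all parameters assumed):
#   x_t = 0.9 * x_{t-1} + v_t,  v_t ~ N(0, 1)   (hidden state)
#   y_t = x_t + w_t,            w_t ~ N(0, 1)   (observation)
T = 50
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal()
y = x + rng.normal(size=T)

def bootstrap_filter(y, n_particles=500):
    """Standard 'forward' Bayesian filter approximated by SMC."""
    particles = rng.normal(size=n_particles)
    means = []
    for obs in y:
        # Propagate particles through the state transition.
        particles = 0.9 * particles + rng.normal(size=n_particles)
        # Weight by the observation likelihood N(obs; x_t, 1).
        logw = -0.5 * (obs - particles) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * particles))
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(means)

filtered = bootstrap_filter(y)
```

In the two-filter approach, an analogous SMC approximation would be run backward in time and combined with these forward particles.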

337 citations


Journal ArticleDOI
TL;DR: In this paper, a new model averaging estimator based on model selection with Akaike's AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly "best" model from a (often large) set of models employing many predictor variables.
Abstract: In situations where limited knowledge of a system exists and the ratio of data points to variables is small, variable selection methods can often be misleading. Freedman (Am Stat 37:152–155, 1983) demonstrated how common it is to select completely unrelated variables as highly “significant” when the number of data points is similar in magnitude to the number of variables. A new type of model averaging estimator based on model selection with Akaike’s AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly “best” model from a (often large) set of models employing many predictor variables. The new model averaging estimator helps reduce these problems and provides confidence interval coverage at the nominal level while traditional stepwise selection has poor inferential properties.
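The Akaike-weight machinery underlying such model averaging can be sketched with an all-subsets linear-regression example on simulated data. This is an illustrative sketch of AIC-based averaging in general, not a reproduction of the paper's specific estimator; the data-generating model is an assumption.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 40, 4
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.5, 0.0, 0.0])   # two spurious predictors
y = X @ beta + rng.normal(size=n)

def aic_linear(Xs, y):
    """AIC of a Gaussian linear model fitted by least squares."""
    n = len(y)
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ coef) ** 2)
    k = Xs.shape[1] + 1          # coefficients plus error variance
    return n * np.log(rss / n) + 2 * k, coef

# Enumerate all non-empty subsets of predictors.
models, aics = [], []
for r in range(1, p + 1):
    for subset in combinations(range(p), r):
        aic, coef = aic_linear(X[:, subset], y)
        models.append((subset, coef))
        aics.append(aic)

# Akaike weights: w_i proportional to exp(-delta_i / 2).
aics = np.array(aics)
delta = aics - aics.min()
w = np.exp(-delta / 2)
w /= w.sum()

# Model-averaged coefficients (a coefficient is zero when excluded).
beta_avg = np.zeros(p)
for weight, (subset, coef) in zip(w, models):
    beta_avg[list(subset)] += weight * coef
```

Averaging over all candidate models in this way, rather than committing to the single AIC-best model, is what mitigates model selection bias.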

288 citations


Journal ArticleDOI
TL;DR: A method for selecting variables in latent class analysis, the most common model-based clustering method for discrete data, is proposed; in simulated datasets the method selected the correct clustering variables and led to improvements in classification performance and in the accuracy of the choice of the number of classes.
Abstract: We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable’s usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.

145 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed an alternative derivation of the hazard function of T*(θ petertodd 0) under the one-parameter model, where the distribution of T * is doubly truncated to an interval (U*, V*) depending on Z*, where Z* is the redshift of a quasar and θ isEnabled 0 is the true value of evolution parameter.
Abstract: One of the principal goals of quasar investigations is to study luminosity evolution. A convenient one-parameter model for luminosity says that the expected log luminosity, T*, increases linearly as θ0 · log(1 + Z*), and T*(θ0) = T* − θ0 · log(1 + Z*) is independent of Z*, where Z* is the redshift of a quasar and θ0 is the true value of the evolution parameter. Due to experimental constraints, the distribution of T* is doubly truncated to an interval (U*, V*) depending on Z*, i.e., a quadruple (T*, Z*, U*, V*) is observable only when U* ≤ T* ≤ V*. Under the one-parameter model, T*(θ0) is independent of (U*(θ0), V*(θ0)), where U*(θ0) = U* − θ0 · log(1 + Z*) and V*(θ0) = V* − θ0 · log(1 + Z*). Under this assumption, the nonparametric maximum likelihood estimate (NPMLE) of the hazard function of T*(θ0) (denoted by ĥ) was developed by Efron and Petrosian (J Am Stat Assoc 94:824–834, 1999). In this note, we present an alternative derivation of ĥ. In addition, the NPMLE of the distribution function of T*(θ0), denoted by F̂, is derived through an inverse-probability-weighted (IPW) approach. Based on Theorem 3.1 of Van der Laan (1996), we prove the consistency and asymptotic normality of F̂ under certain conditions. For testing the null hypothesis that T*(θ0) = T* − θ0 · log(1 + Z*) is independent of Z*, Efron and Petrosian (J Am Stat Assoc 94:824–834, 1999) proposed a truncated version of Kendall's tau statistic. However, when T* is exponentially distributed, the testing procedure is futile. To circumvent this difficulty, a modified testing procedure is proposed. Simulations show that the proposed test works adequately for moderate sample sizes.

71 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied multivariate normal models that are described by linear constraints on the inverse of the covariance matrix, and examined the problems at the interface of statistics and optimization from the perspective of convex algebraic geometry.
Abstract: We study multivariate normal models that are described by linear constraints on the inverse of the covariance matrix. Maximum likelihood estimation for such models leads to the problem of maximizing the determinant function over a spectrahedron, and to the problem of characterizing the image of the positive definite cone under an arbitrary linear projection. These problems at the interface of statistics and optimization are here examined from the perspective of convex algebraic geometry.

68 citations


Journal ArticleDOI
TL;DR: In this article, two new unbiased estimators of the wavelet variance are proposed for the case where the observed time series is 'gappy', i.e., sampled at regular intervals but with certain observations missing.
Abstract: The wavelet variance is a scale-based decomposition of the process variance for a time series and has been used to analyze, for example, time deviations in atomic clocks, variations in soil properties in agricultural plots, accumulation of snow fields in the polar regions and marine atmospheric boundary layer turbulence. We propose two new unbiased estimators of the wavelet variance when the observed time series is 'gappy', i.e., is sampled at regular intervals, but certain observations are missing. We deduce the large sample properties of these estimators and discuss methods for determining an approximate confidence interval for the wavelet variance. We apply our proposed methodology to series of gappy observations related to atmospheric pressure data and Nile River minima.
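The idea of estimating a wavelet variance from a gappy series can be illustrated at unit scale with Haar differences, keeping only differences whose two endpoints are both observed. This is a simplified sketch under a white-noise assumption, not the paper's estimators; the missingness mechanism is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4000
x = rng.normal(size=n)                 # white noise, variance 1
present = rng.uniform(size=n) < 0.8    # roughly 20% of observations missing

def haar_wavelet_variance_gappy(x, present):
    """Unit-scale Haar wavelet variance from a gappy series.

    Uses only differences whose two endpoints are both observed,
    a simple variant assumed unbiased under missingness that is
    independent of the process; the paper's estimators are more refined.
    """
    ok = present[1:] & present[:-1]
    d = (x[1:][ok] - x[:-1][ok]) / 2.0
    return np.mean(d ** 2)

nu2 = haar_wavelet_variance_gappy(x, present)  # white noise: about sigma^2 / 2
```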

66 citations


Journal ArticleDOI
TL;DR: A Bayesian approach to this problem based on state-space representations of point processes is reviewed, and the way these methods are used in decoding motor cortical activity, in which the hand motion is reconstructed from neural spike trains is described.
Abstract: Perception, memory, learning, and decision making are processes carried out in the brain. The performance of such intelligent tasks is made possible by the communication of neurons through sequences of voltage pulses called spike trains. It is of great interest to have methods of extracting information from spike trains in order to learn about their relationship to behavior. In this article, we review a Bayesian approach to this problem based on state-space representations of point processes. We discuss some of the theory and we describe the way these methods are used in decoding motor cortical activity, in which the hand motion is reconstructed from neural spike trains.

62 citations


Journal ArticleDOI
Thomas Kahle1
TL;DR: Binomials, as mentioned in this paper, is a package for the computer algebra system Macaulay 2, which specializes well-known algorithms, such as primary decomposition, to binomial ideals.
Abstract: We present Binomials, a package for the computer algebra system Macaulay 2, which specializes well-known algorithms to binomial ideals. These come up frequently in algebraic statistics and commutative algebra, and it is shown that significant speedup of computations like primary decomposition is possible. While central parts of the implemented algorithms go back to a paper of Eisenbud and Sturmfels, we also discuss a new algorithm for computing the minimal primes of a binomial ideal. All decompositions make significant use of combinatorial structure found in binomial ideals, and to demonstrate the power of this approach we show how Binomials was used to compute primary decompositions of commuting birth and death ideals of Evans et al., yielding a counterexample for their conjectures.

52 citations


Journal ArticleDOI
TL;DR: A Bayesian information criterion type approach is proposed and used to obtain a data-driven procedure which is proved to automatically select asymptotically optimal tuning parameters and achieves the so-called oracle property.
Abstract: We consider median regression with a LASSO-type penalty term for variable selection. With a fixed number of variables in the regression model, a two-stage method is proposed for simultaneous estimation and variable selection in which the degree of penalty is adaptively chosen. A Bayesian information criterion type approach is proposed and used to obtain a data-driven procedure which is proved to automatically select asymptotically optimal tuning parameters. It is shown that the resultant estimator achieves the so-called oracle property. The combination of median regression and the LASSO penalty is computationally easy to implement via standard linear programming. A random perturbation scheme can be used to obtain a simple estimator of the standard error. Simulation studies are conducted to assess the finite-sample performance of the proposed method. We illustrate the methodology with a real example.
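The linear-programming formulation of median (L1) regression with a LASSO penalty can be sketched as follows. This is a basic LAD-LASSO with a fixed penalty level; the paper's adaptive two-stage choice of the penalty is not implemented, and the simulated data are an assumption.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.laplace(size=n)

def lad_lasso(X, y, lam):
    """L1 (median) regression with a LASSO penalty, as a linear program.

    Split variables: beta = bp - bm, residual = up - um, all nonnegative.
    Objective: sum(up + um) + lam * sum(bp + bm).
    Equality constraint: X (bp - bm) + (up - um) = y.
    """
    n, p = X.shape
    c = np.concatenate([lam * np.ones(2 * p), np.ones(2 * n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]

beta_hat = lad_lasso(X, y, lam=1.0)
```

Since both the L1 loss and the L1 penalty are piecewise linear, the whole problem reduces to one LP, which is the computational convenience the abstract points to.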

52 citations


Journal ArticleDOI
Masayuki Uchida1
TL;DR: The model selection problem for discretely observed ergodic multi-dimensional diffusion processes is considered; a contrast function based on a locally Gaussian approximation of the transition density is introduced, and a contrast-based information criterion is proposed.
Abstract: In this paper, we consider the model selection problem for discretely observed ergodic multi-dimensional diffusion processes. In order to evaluate the statistical models, Akaike’s information criterion (AIC) is a useful tool. Since AIC is constructed by the maximum log likelihood and the dimension of the parameter space, it may look easy to get AIC even for discretely observed diffusion processes. However, there is a serious problem that a transition density of a diffusion process does not generally have an explicit form. Instead of the exact log-likelihood, we use a contrast function based on a locally Gaussian approximation of the transition density and we propose the contrast-based information criterion.
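A locally Gaussian contrast of this kind can be sketched for a one-dimensional Ornstein-Uhlenbeck process with unit diffusion coefficient. This is an illustrative special case only (the paper treats general multi-dimensional ergodic diffusions); the model and discretisation settings are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(8)

# Euler-simulated Ornstein-Uhlenbeck: dX = -theta * X dt + dW, theta = 1.
dt, n, theta_true = 0.01, 5000, 1.0
x = np.zeros(n)
for i in range(1, n):
    x[i] = x[i - 1] - theta_true * x[i - 1] * dt + np.sqrt(dt) * rng.normal()

def contrast(theta, x, dt):
    """Locally Gaussian approximation of the transition density:
    X_i | X_{i-1} ~ N(X_{i-1} - theta * X_{i-1} * dt, dt)."""
    mean = x[:-1] - theta * x[:-1] * dt
    return np.sum((x[1:] - mean) ** 2) / dt

theta_hat = minimize_scalar(contrast, bounds=(0, 5), args=(x, dt),
                            method="bounded").x
```

A contrast-based criterion in the spirit of AIC then penalises the minimised contrast by twice the parameter dimension when comparing candidate drift specifications.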

46 citations


Journal ArticleDOI
TL;DR: In this paper, an infinite-dimensional information manifold based on exponential Orlicz spaces is constructed without using the notion of exponential convergence; it is shown that convex mixtures of probability densities lie on the same connected component of this manifold, and the class of densities for which this mixture can be extended to an open segment containing the extreme points is characterized.
Abstract: We construct an infinite-dimensional information manifold based on exponential Orlicz spaces without using the notion of exponential convergence. We then show that convex mixtures of probability densities lie on the same connected component of this manifold, and characterize the class of densities for which this mixture can be extended to an open segment containing the extreme points. For this class, we define an infinite-dimensional analogue of the mixture parallel transport and prove that it is dual to the exponential parallel transport with respect to the Fisher information. We also define α-derivatives and prove that they are convex mixtures of the extremal (±1)-derivatives.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed new estimators for the tail index of a heavy tailed distribution when only a few largest values are observed within blocks, which are proved to be asymptotically normal under suitable conditions, and their Edgeworth expansions are obtained.
Abstract: This paper proposes some new estimators for the tail index of a heavy tailed distribution when only a few largest values are observed within blocks. These estimators are proved to be asymptotically normal under suitable conditions, and their Edgeworth expansions are obtained. Empirical likelihood method is also employed to construct confidence intervals for the tail index. The comparison for the confidence intervals based on the normal approximation and the empirical likelihood method is made in terms of coverage probability and length of the confidence intervals. The simulation study shows that the empirical likelihood method outperforms the normal approximation method.
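As a point of comparison, the classical Hill estimator of the tail index from the k largest order statistics can be sketched as follows. This is the standard full-sample estimator, not the paper's block-based estimators; the Pareto sample and choice of k are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
# Pareto sample with shape alpha = 2, so tail index gamma = 1/alpha = 0.5.
alpha = 2.0
x = rng.pareto(alpha, size=5000) + 1.0

def hill(x, k):
    """Hill estimator of the tail index gamma from the k largest values."""
    xs = np.sort(x)[::-1]
    return np.mean(np.log(xs[:k])) - np.log(xs[k])

gamma_hat = hill(x, k=200)   # should be close to 0.5
```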

Journal ArticleDOI
TL;DR: In this paper, various bias and variance reduction methods are presented for improving the bootstrap bias correction term in computing the information criterion, and the properties of these methods are investigated both in theoretical and numerical aspects, for which they use a statistical functional approach.
Abstract: We discuss the problem of constructing information criteria by applying the bootstrap methods. Various bias and variance reduction methods are presented for improving the bootstrap bias correction term in computing the bootstrap information criterion. The properties of these methods are investigated both in theoretical and numerical aspects, for which we use a statistical functional approach. It is shown that the bootstrap method automatically achieves the second-order bias correction if the bias of the first-order bias correction term is properly removed. We also show that the variance associated with bootstrapping can be considerably reduced for various model estimation procedures without any analytical argument. Monte Carlo experiments are conducted to investigate the performance of the bootstrap bias and variance reduction techniques.

Journal ArticleDOI
TL;DR: In this article, a nonparametric plug-in estimator of the underlying summand distribution is proposed, which is extended to general (but known) distributions for N. The authors show how recursion formulae can be inverted for the Panjer class in general.
Abstract: A compound distribution is the distribution of a random sum, which consists of a random number N of independent identically distributed summands, independent of N. Buchmann and Grubel (Ann Stat 31:1054–1074, 2003) considered decompounding a compound Poisson distribution, i.e. given observations on a random sum when N has a Poisson distribution, they constructed a nonparametric plug-in estimator of the underlying summand distribution. This approach is extended here to that of general (but known) distributions for N. Asymptotic normality of the proposed estimator is established, and bootstrap methods are used to provide confidence bounds. Finally, practical implementation is discussed, and tested on simulated data. In particular we show how recursion formulae can be inverted for the Panjer class in general, as well as for an example drawn from the Willmot class.
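The forward direction of the recursion being inverted here is the Panjer recursion for a compound Poisson pmf, which can be sketched as follows; decompounding then solves these equations for the summand pmf f given an estimate of the compound pmf g. The summand distribution and λ below are assumptions of the sketch.

```python
import numpy as np

def panjer_poisson(f, lam, nmax):
    """Compound Poisson pmf g on {0, ..., nmax} via the Panjer recursion.

    f[j] is the pmf of the integer summands (f[0] assumed 0), so
    g[0] = exp(-lam) and g[n] = (lam / n) * sum_j j * f[j] * g[n - j].
    """
    g = np.zeros(nmax + 1)
    g[0] = np.exp(-lam)
    for n in range(1, nmax + 1):
        j = np.arange(1, min(n, len(f) - 1) + 1)
        g[n] = (lam / n) * np.sum(j * f[j] * g[n - j])
    return g

# Summands uniform on {1, 2, 3}; N ~ Poisson(2), so E[S] = 2 * 2 = 4.
f = np.array([0.0, 1 / 3, 1 / 3, 1 / 3])
g = panjer_poisson(f, lam=2.0, nmax=30)
```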

Journal ArticleDOI
TL;DR: In this article, the authors proposed an alternative estimator of m(x) that, under certain conditions, does not share the inconsistency problems in the right tail (Beran 1981, Technical Report, University of California, Berkeley).
Abstract: Consider the random vector (X, Y), where X is completely observed and Y is subject to random right censoring. It is well known that the completely nonparametric kernel estimator of the conditional distribution F(·|x) of Y given X = x suffers from inconsistency problems in the right tail (Beran 1981, Technical Report, University of California, Berkeley), and hence any location function m(x) that involves the right tail of F(·|x) (like the conditional mean) cannot be estimated consistently in a completely nonparametric way. In this paper, we propose an alternative estimator of m(x) that, under certain conditions, does not share the above inconsistency problems. The estimator is constructed under the model Y = m(X) + σ(X)ε, where σ(·) is an unknown scale function and ε (with location zero and scale one) is independent of X. We obtain the asymptotic properties of the proposed estimator of m(x), compare it with the completely nonparametric estimator via simulations and apply it to a study of quasars in astronomy.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric data-driven estimator is proposed that matches the performance of an oracle: it adapts to an unknown design of predictors, performs a dimension reduction if the response does not depend on the predictor, and is minimax over a vast set of anisotropic bivariate function classes.
Abstract: Conditional density estimation in a parametric regression setting, where the problem is to estimate a parametric density of the response given the predictor, is a classical and prominent topic in regression analysis. This article explores this problem in a nonparametric setting where no assumption about shape of an underlying conditional density is made. For the first time in the literature, it is proved that there exists a nonparametric data-driven estimator that matches performance of an oracle which: (i) knows the underlying conditional density, (ii) adapts to an unknown design of predictors, (iii) performs a dimension reduction if the response does not depend on the predictor, (iv) is minimax over a vast set of anisotropic bivariate function classes. All these results are established via an oracle inequality which is on par with ones known in the univariate density estimation literature. Further, the asymptotically optimal estimator is tested on an interesting actuarial example which explores a relationship between credit scoring and premium for basic auto-insurance for 54 undergraduate college students.

Journal ArticleDOI
TL;DR: In this article, the authors studied the problem of computing universal Grobner bases for bounded two-way contingency tables with an upper bound on cells and showed that when these bounds on cells are positive, the set of basic moves of all 2 × 2 minors connected all incomplete contingency tables under independence model.
Abstract: In this paper we study the computation of Markov bases for contingency tables whose cell entries have an upper bound. It is known that in this case one has to compute universal Grobner bases, and this is often infeasible also in small- and medium-sized problems. Here we focus on bounded two-way contingency tables under the independence model. We show that when these bounds on cells are positive, the set of basic moves of all 2 × 2 minors connects all tables with given margins. We also give some results about bounded incomplete tables and we conclude with an open problem on the necessary and sufficient condition on the set of structural zeros so that the set of basic moves of all 2 × 2 minors connects all incomplete contingency tables with given positive margins.
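The basic moves of all 2 × 2 minors can be illustrated with a random walk over bounded tables with fixed margins. This is a toy sketch of how such moves are applied; the connectivity result itself, not this sampler, is the paper's contribution, and the starting table and bound are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def basic_move(table, i1, i2, j1, j2, sign):
    """Apply a 2x2 minor move: +sign at (i1,j1),(i2,j2), -sign at (i1,j2),(i2,j1).

    Such a move leaves all row and column margins unchanged.
    """
    move = np.zeros_like(table)
    move[i1, j1] = move[i2, j2] = sign
    move[i1, j2] = move[i2, j1] = -sign
    return table + move

def random_walk(table, bound, steps=1000):
    """Walk over bounded nonnegative tables with the same margins."""
    r, c = table.shape
    for _ in range(steps):
        i1, i2 = rng.choice(r, 2, replace=False)
        j1, j2 = rng.choice(c, 2, replace=False)
        cand = basic_move(table, i1, i2, j1, j2, rng.choice([-1, 1]))
        if cand.min() >= 0 and cand.max() <= bound:   # respect cell bounds
            table = cand
    return table

t0 = np.array([[3, 1], [2, 4]])
t1 = random_walk(t0, bound=5)
```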

Journal ArticleDOI
TL;DR: In this paper, a general model selection procedure based on arbitrary projective estimates, which does not require knowledge of the noise correlation function, is proposed, and a non-asymptotic upper bound for the L2-risk (oracle inequality) is derived under mild conditions on the noise.
Abstract: This paper considers the problem of estimating a periodic function in a continuous time regression model with an additive stationary Gaussian noise having unknown correlation function. A general model selection procedure on the basis of arbitrary projective estimates, which does not need the knowledge of the noise correlation function, is proposed. A non-asymptotic upper bound for the L2-risk (oracle inequality) has been derived under mild conditions on the noise. For the Ornstein–Uhlenbeck noise the risk upper bound is shown to be uniform in the nuisance parameter. In the case of Gaussian white noise the constructed procedure has some advantages as compared with the procedure based on the least squares estimates (LSE). The asymptotic minimaxity of the estimates has been proved. The proposed model selection scheme is also extended to the estimation problem based on discrete data, applicable to the situation when high frequency sampling cannot be provided.

Journal ArticleDOI
TL;DR: In this article, the authors proposed smoothing the occurrence patterns in a clustered space-time process, in particular the data from an earthquake catalogue, by fitting a temporal version of the ETAS model, and the occurrence times are transformed by using the cumulative form of the fitted model.
Abstract: The following steps are suggested for smoothing the occurrence patterns in a clustered space–time process, in particular the data from an earthquake catalogue. First, the original data is fitted by a temporal version of the ETAS model, and the occurrence times are transformed by using the cumulative form of the fitted ETAS model. Then the transformed data (transformed times and original locations) is smoothed by a space–time kernel with bandwidth obtained by optimizing a naive likelihood cross-validation. Finally, the estimated intensity for the original data is obtained by back-transforming the estimated intensity for the transformed data. This technique is used to estimate the intensity for earthquake occurrence data associated with complex sequences of events off the East Coast of the Tohoku district, northern Japan. The intensity so obtained is compared to the conditional intensity estimated from a full space–time ETAS model for the same data.
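The time transformation in the first step rests on the time-rescaling theorem: mapping event times through the cumulative intensity turns them into a unit-rate Poisson process. A sketch with a simple known intensity in place of the fitted ETAS model (the intensity and time window are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate an inhomogeneous Poisson process with intensity
# lam(t) = 2 + 2t on [0, 10] by thinning against the maximum rate.
lam = lambda t: 2 + 2 * t
lam_max = lam(10.0)
cand = np.sort(rng.uniform(0, 10, rng.poisson(lam_max * 10)))
times = cand[rng.uniform(0, lam_max, len(cand)) < lam(cand)]

# Time-rescaling: tau_i = Lambda(t_i) with Lambda(t) = 2t + t^2.
# The transformed gaps should then be (approximately) Exp(1).
tau = 2 * times + times ** 2
gaps = np.diff(tau)
```

In the paper's setting, Lambda is the cumulative form of the fitted temporal ETAS model, and the kernel smoothing is carried out on the transformed times.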

Journal ArticleDOI
TL;DR: In this paper, the asymptotic behavior of recursive estimation procedures is studied and the results of the analysis can be used to determine the form of the recursive procedure which is expected to have the same properties as the corresponding non-recursive one defined as a solution of the corresponding estimating equation.
Abstract: This paper is concerned with the asymptotic behaviour of estimation procedures which are recursive in the sense that each successive estimator is obtained from the previous one by a simple adjustment. The results of the paper can be used to determine the form of the recursive procedure which is expected to have the same asymptotic properties as the corresponding non-recursive one defined as a solution of the corresponding estimating equation. Several examples are given to illustrate the theory, including an application to estimation of parameters in exponential families of Markov processes.

Journal ArticleDOI
TL;DR: In this article, a simple algorithm is given and bounds are derived for the criteria, which may be used to give asymptotic Nyquist-like estimability rates as model and sample sizes increase.
Abstract: For a particular experimental design, there is interest in finding which polynomial models can be identified in the usual regression set up. The algebraic methods based on Grobner bases provide a systematic way of doing this. The algebraic method does not, in general, produce all estimable models but it can be shown that it yields models which have minimal average degree in a well-defined sense and in both a weighted and unweighted version. This provides an alternative measure to that based on “aberration” and moreover is applicable to any experimental design. A simple algorithm is given and bounds are derived for the criteria, which may be used to give asymptotic Nyquist-like estimability rates as model and sample sizes increase.

Journal ArticleDOI
TL;DR: In this article, it was shown that the marginal semigroup of a binary graph model is normal if and only if the graph is free of K4 minors, where K4 denotes the complete graph on four vertices; the technique is based on the interplay of normality and the geometry of the marginal cone.
Abstract: We show that the marginal semigroup of a binary graph model is normal if and only if the graph is free of K4 minors. The technique, based on the interplay of normality and the geometry of the marginal cone, has potential applications to other normality questions in algebraic statistics.

Journal ArticleDOI
TL;DR: In this article, conditions under which an invariance property holds for the class of selection distributions arising from two uncorrelated random vectors were studied and applied to sample variogram and covariogram estimators used in spatial statistics.
Abstract: We study conditions under which an invariance property holds for the class of selection distributions. First, we consider selection distributions arising from two uncorrelated random vectors. In that setting, the invariance holds for the so-called C-class and for elliptical distributions. Second, we describe the invariance property for selection distributions arising from two correlated random vectors. The particular case of the distribution of quadratic forms and its invariance, under various selection distributions, is investigated in more detail. We describe the application of our invariance results to sample variogram and covariogram estimators used in spatial statistics and provide a small simulation study for illustration. We end with a discussion about other applications, for example linear models and indices of temporal/spatial dependence.

Journal ArticleDOI
TL;DR: In this paper, formulas for the moments of the real and complex noncentral Wishart distributions of general degrees are provided, described in terms of undirected and directed graphs, respectively.
Abstract: We provide formulas for the moments of the real and complex noncentral Wishart distributions of general degrees. The obtained formulas for the real and complex cases are described in terms of the undirected and directed graphs, respectively. By considering degenerate cases, we give explicit formulas for the moments of bivariate chi-square distributions and 2 × 2 Wishart distributions by enumerating the graphs. Noting that the Laguerre polynomials can formally be considered to be moments of a noncentral chi-square distribution, we demonstrate a combinatorial interpretation of the coefficients of the Laguerre polynomials.
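The Laguerre connection can be checked numerically. Under one common normalization (an assumption of this sketch, which may differ from the paper's formal statement), the n-th moment of a noncentral chi-square with k degrees of freedom and noncentrality λ equals 2^n · n! · L_n^{(k/2−1)}(−λ/2):

```python
import math
from scipy.special import eval_genlaguerre

k, lam = 5.0, 3.0

def moment_via_laguerre(n, k, lam):
    """E[X^n] for X ~ noncentral chi-square(k, lam), via a generalized
    Laguerre polynomial evaluated at -lam/2."""
    return 2 ** n * math.factorial(n) * eval_genlaguerre(n, k / 2 - 1, -lam / 2)

m1 = moment_via_laguerre(1, k, lam)   # closed form: k + lam
m2 = moment_via_laguerre(2, k, lam)   # closed form: (k + lam)^2 + 2 (k + 2 lam)
```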

Journal ArticleDOI
TL;DR: In this article, a short, overlapping series (SOS) estimator is proposed to deal with α-mixing models in the presence of rounding errors, and the asymptotic properties of SOS estimators are established when the innovations are normally distributed.
Abstract: Observations on continuous populations are often rounded when recorded due to the precision of the recording mechanism. However, classical statistical approaches have ignored the effect caused by rounding errors. When the observations are independent and identically distributed, exact maximum likelihood estimation (MLE) can be employed. However, if rounded data come from a dependent structure, the MLE of the parameters is difficult to calculate since the integral involved in the likelihood equation is intractable. This paper presents and examines a new approach to parameter estimation, named "short, overlapping series" (SOS), to deal with α-mixing models in the presence of rounding errors. We establish the asymptotic properties of the SOS estimators when the innovations are normally distributed. Comparisons of this new approach with other existing techniques in the literature are also made by simulation with samples of moderate sizes.
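The exact likelihood of rounded data replaces the density with interval probabilities. For i.i.d. normal data this can be sketched as follows (an illustrative i.i.d. example only; the SOS approach for dependent data is not reproduced, and the true parameters are assumptions):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(6)
# True N(1.0, 2.0) data, rounded to the nearest integer when recorded.
x = np.round(rng.normal(1.0, 2.0, size=2000))

def negloglik(theta, x, half_width=0.5):
    """Exact likelihood of rounded data: P(obs = x) = F(x + h) - F(x - h)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    p = norm.cdf(x + half_width, mu, sigma) - norm.cdf(x - half_width, mu, sigma)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

res = minimize(negloglik, x0=[x.mean(), np.log(x.std())], args=(x,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
```

For dependent data the analogous interval probabilities become a high-dimensional integral, which is the intractability the SOS construction works around.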

Journal ArticleDOI
TL;DR: This paper considers the maximal rank problem of 3-tensors and extends Atkinson and Stephens’ and Atkinson and Lloyd’s results over the real number field.
Abstract: Tensor data are becoming important recently in various application fields. In this paper, we consider the maximal rank problem of 3-tensors and extend Atkinson and Stephens’ and Atkinson and Lloyd’s results over the real number field. We also prove the assertion of Atkinson and Stephens: \({{\rm max.rank}_{\mathbb{R}}(m,n,p) \leq m+\lfloor p/2\rfloor n}\), \({{\rm max.rank}_{\mathbb{R}}(n,n,p) \leq (p+1)n/2}\) if p is even, \({{\rm max.rank}_{\mathbb{F}}(n,n,3)\leq 2n-1}\) if \({\mathbb{F}=\mathbb{C}}\) or n is odd, and \({{\rm max.rank}_{\mathbb{F}}(m,n,3)\leq m+n-1}\) if m < n where \({\mathbb{F}}\) stands for \({\mathbb{R}}\) or \({\mathbb{C}}\).

Journal ArticleDOI
TL;DR: In this paper, a fixed-size estimation for a linear function of means from independent and normally distributed populations having unknown and respective variances is considered, and a fixed width confidence interval with required accuracy about the magnitude of the length and the confidence coefficient is constructed.
Abstract: We consider fixed-size estimation for a linear function of means from independent and normally distributed populations having unknown and respective variances. We construct a fixed-width confidence interval with required accuracy about the magnitude of the length and the confidence coefficient. We propose a two-stage estimation methodology having the asymptotic second-order consistency with the required accuracy. The key is the asymptotic second-order analysis about the risk function. We give a variety of asymptotic characteristics about the estimation methodology, such as asymptotic sample size and asymptotic Fisher information. With the help of the asymptotic second-order analysis, we also explore a number of generalizations and extensions of the two-stage methodology, such as bounded risk point estimation, multiple comparisons among components between the populations, and power analysis in equivalence tests to plan the appropriate sample size for a study.
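The flavor of two-stage fixed-width estimation can be conveyed by Stein's classical rule for a single normal mean: a pilot sample estimates the unknown variance, which then determines the total sample size needed for an interval of half-width d. This is a simplified sketch; the paper treats linear functions of several means with second-order refinements, and the pilot data are an assumption.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(9)

def two_stage_n(first_stage, d, alpha=0.05):
    """Stein-type two-stage sample size for a fixed-width (2d) interval."""
    n0 = len(first_stage)
    s2 = np.var(first_stage, ddof=1)          # pilot variance estimate
    tq = t_dist.ppf(1 - alpha / 2, n0 - 1)    # t quantile from the pilot
    return max(n0, int(np.ceil(tq ** 2 * s2 / d ** 2)))

pilot = rng.normal(10.0, 2.0, size=20)
n_total = two_stage_n(pilot, d=0.5)
```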

Journal ArticleDOI
TL;DR: It is proved that if a toric ideal possesses a fundamental binomial none of whose monomials is squarefree, then the corresponding semigroup ring is not very ample; very ample semigroup rings of Lawrence type are discussed, and very ampleness of configurations arising from contingency tables is studied as an application.
Abstract: In this paper, it is proved that, if a toric ideal possesses a fundamental binomial none of whose monomials is squarefree, then the corresponding semigroup ring is not very ample. Moreover, very ample semigroup rings of Lawrence type are discussed. As an application, we study very ampleness of configurations arising from contingency tables.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of estimating a conditional distribution function in a nonparametric way, when the response variable is nonnegative, and the observational procedure is length-biased.
Abstract: In this paper we consider the problem of estimating a conditional distribution function in a nonparametric way, when the response variable is nonnegative and the observational procedure is length-biased. We propose a proper adaptation of the estimate to right-censoring caused by limited follow-up. Large sample analysis of the introduced estimator is given, including rates of convergence, limiting distribution, and efficiency results. We show that the length-bias model results in less variance in estimation, when compared to methods based on observed truncation times. Practical performance of the proposed estimator is explored through simulations. An application to unemployment data analysis is provided.

Journal ArticleDOI
TL;DR: In this article, a multivariate time series model for sales count data is proposed based on the fact that setting an independent Poisson distribution to each brand's sales produces the Poisson distributions for their total number, characterized as market sales, and then, conditional on market sales the brand sales follow a multinomial distribution.
Abstract: In this paper, we propose a multivariate time series model for sales count data. Based on the fact that setting an independent Poisson distribution to each brand’s sales produces the Poisson distribution for their total number, characterized as market sales, and then, conditional on market sales, the brand sales follow a multinomial distribution, we first extend this Poisson–multinomial modeling to a dynamic model in terms of a generalized linear model. We further extend the model to contain nesting hierarchical structures in order to apply it to find the market structure in the field of marketing. As an application using point of sales time series in a store, we compare several possible hypotheses on market structure and choose the most plausible structure by using several model selection criteria, including in-sample fit, out-of-sample forecasting errors, and information criterion.