
Showing papers in "Journal of The Royal Statistical Society Series B-statistical Methodology in 2005"


Journal ArticleDOI
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS‐EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
Abstract: Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case.
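For readers who want to see the penalty in action, here is a minimal sketch of elastic net fitting using scikit-learn's coordinate-descent solver; it is not the paper's LARS-EN algorithm, and the data, alpha and l1_ratio values are made up for illustration.

```python
# Minimal sketch of elastic-net regression (not the paper's LARS-EN algorithm).
# scikit-learn objective: (1/2n)||y - Xb||^2 + alpha*(l1_ratio*||b||_1 + 0.5*(1 - l1_ratio)*||b||_2^2)
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 50, 200                       # p >> n, the setting where the elastic net is most useful
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                       # sparse truth: a handful of active predictors
y = X @ beta + rng.normal(scale=0.5, size=n)

model = ElasticNet(alpha=0.1, l1_ratio=0.7, max_iter=10_000)  # l1_ratio mixes lasso (1) and ridge (0)
model.fit(X, y)
print("non-zero coefficients:", int(np.sum(model.coef_ != 0)))
```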

16,538 citations


Journal ArticleDOI
TL;DR: The fused lasso is proposed, a generalization that is designed for problems with features that can be ordered in some meaningful way, and is especially useful when the number of features p is much greater than N, the sample size.
Abstract: Summary. The lasso penalizes a least squares regression by the sum of the absolute values (L1-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to 0). We propose the ‘fused lasso’, a generalization that is designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes the L1-norm of both the coefficients and their successive differences. Thus it encourages sparsity of the coefficients and also sparsity of their differences—i.e. local constancy of the coefficient profile. The fused lasso is especially useful when the number of features p is much greater than N, the sample size. The technique is also extended to the ‘hinge’ loss function that underlies the support vector classifier. We illustrate the methods on examples from protein mass spectroscopy and gene expression data.
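The criterion described in the summary can be written, in its equivalent Lagrangian form with generic tuning parameters λ1 and λ2, as:

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \tfrac{1}{2}\sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2}
  \;+\; \lambda_1 \sum_{j=1}^{p} |\beta_j|
  \;+\; \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| .
```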

2,760 citations


Journal ArticleDOI
TL;DR: In this paper, a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration is proposed, which is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest.
Abstract: Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
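As a rough illustration of two of the tools listed above, the sketch below computes probability integral transform values and the continuous ranked probability score, assuming Gaussian predictive distributions and simulated observations (the wind speed application itself is not reproduced here).

```python
# Sketch of two forecast-evaluation tools from the summary: the PIT histogram and a proper
# scoring rule (CRPS), assuming Gaussian predictive distributions and simulated data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu = rng.normal(size=500)                 # forecast means
sigma = np.full(500, 1.0)                 # forecast standard deviations
y = rng.normal(loc=mu, scale=1.0)         # observations that materialize

# Probability integral transform: F(y) should look uniform if the forecasts are calibrated.
pit = norm.cdf(y, loc=mu, scale=sigma)
hist, _ = np.histogram(pit, bins=10, range=(0, 1))
print("PIT histogram counts:", hist)

# Continuous ranked probability score for a N(mu, sigma^2) predictive distribution.
z = (y - mu) / sigma
crps = sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
print("mean CRPS:", crps.mean())          # smaller is better; rewards sharp, calibrated forecasts
```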

1,537 citations


Journal ArticleDOI
TL;DR: The piecewise linearity of the lasso solution path was first proved by Osborne et al. (2000), who also described an efficient algorithm for calculating the complete lasso solution path.
Abstract: We missed an important reference in Section 3.4. On page 309 we stated that ‘. . . which is based on the recently proposed algorithm LARS of Efron et al. (2004). They proved that, starting from zero, the lasso solution paths grow piecewise linearly in a predictable way. They proposed a new algorithm called LARS to solve the entire lasso solution path efficiently by using the same order of computations as a single OLS fit. . . .’ The following sentence should have been included. The piecewise linearity of the lasso solution path was first proved by Osborne et al. (2000), who also described an efficient algorithm for calculating the complete lasso solution path. Reference Osborne, M. R., Presnell, B. and Turlach, B. A. (2000) A new approach to variable selection in least squares problems. IMA J. Numer. Anal., 20, 389–403.

499 citations


Journal ArticleDOI
TL;DR: In this article, the authors find a common structure underlying many such data sets by using a non-standard type of asymptotics: the dimension tends to ∞ while the sample size is fixed.
Abstract: Summary. High dimension, low sample size data are emerging in various areas of science. We find a common structure underlying many such data sets by using a non-standard type of asymptotics: the dimension tends to ∞ while the sample size is fixed. Our analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex. Essentially all the randomness in the data appears only as a random rotation of this simplex. This geometric representation is used to obtain several new statistical insights.
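A quick simulation makes the geometric representation concrete: for independent standard Gaussian coordinates, pairwise distances scaled by √(2d) concentrate near 1 as the dimension grows with the sample size fixed, so the data sit near the vertices of a regular simplex. The sketch below is illustrative only; the paper's result covers much more general settings.

```python
# Quick illustration of the HDLSS geometry described above: for fixed n and growing dimension d,
# pairwise distances between independent N(0, I_d) points concentrate around sqrt(2d),
# i.e. the points approach the vertices of a regular simplex.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
n = 5                                    # sample size stays fixed
for d in (10, 1_000, 100_000):           # dimension grows
    X = rng.normal(size=(n, d))
    dists = pdist(X) / np.sqrt(2 * d)    # rescaled pairwise distances
    print(f"d={d:>6}: rescaled distances in [{dists.min():.3f}, {dists.max():.3f}]")
```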

439 citations


Journal ArticleDOI
TL;DR: In this paper, the authors define residuals for point process models fitted to spatial point pattern data, and propose diagnostic plots based on them, which apply to any point process model that has a conditional intensity and may exhibit spatial heterogeneity, interpoint interaction and dependence on spatial covariates.
Abstract: Summary. We define residuals for point process models fitted to spatial point pattern data, and we propose diagnostic plots based on them. The residuals apply to any point process model that has a conditional intensity; the model may exhibit spatial heterogeneity, interpoint interaction and dependence on spatial covariates. Some existing ad hoc methods for model checking (quadrat counts, scan statistic, kernel smoothed intensity and Berman's diagnostic) are recovered as special cases. Diagnostic tools are developed systematically, by using an analogy between our spatial residuals and the usual residuals for (non-spatial) generalized linear models. The conditional intensity λ plays the role of the mean response. This makes it possible to adapt existing knowledge about model validation for generalized linear models to the spatial point process context, giving recommendations for diagnostic plots. A plot of smoothed residuals against spatial location, or against a spatial covariate, is effective in diagnosing spatial trend or covariate effects. Q–Q plots of the residuals are effective in diagnosing interpoint interaction.
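A minimal sketch of the simplest ingredient, raw residuals on quadrats (observed count minus the integral of the fitted intensity), is given below for a fitted homogeneous Poisson model; the paper's residuals apply to general conditional intensities and come with smoothed residual and Q–Q plots that are not reproduced here.

```python
# Sketch of raw residuals for a spatial point pattern on quadrats: observed count in a quadrat
# minus the integrated fitted intensity. The fitted model here is simply a homogeneous Poisson
# process (lambda_hat = n / |W|); the paper treats general fitted models.
import numpy as np

rng = np.random.default_rng(3)
pts = rng.uniform(size=(200, 2))               # point pattern in the unit square W = [0, 1]^2
lam_hat = len(pts) / 1.0                       # fitted (constant) intensity

k = 4                                          # k x k grid of quadrats
counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=k, range=[[0, 1], [0, 1]])
expected = lam_hat * (1.0 / k) ** 2            # integral of lambda_hat over each quadrat
raw_residuals = counts - expected              # large |residuals| flag lack of fit
print(raw_residuals.round(2))
```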

318 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the proportion of true null hypotheses, π0, in a multiple-hypothesis set-up, is considered and the tests are based on observed p-values.
Abstract: Summary. We consider the problem of estimating the proportion of true null hypotheses, π0, in a multiple-hypothesis set-up. The tests are based on observed p-values. We first review published estimators based on the estimator that was suggested by Schweder and Spjotvoll. Then we derive new estimators based on nonparametric maximum likelihood estimation of the p-value density, restricting to decreasing and convex decreasing densities. The estimators of π0 are all derived under the assumption of independent test statistics. Their performance under dependence is investigated in a simulation study. We find that the estimators are relatively robust with respect to the assumption of independence and work well also for test statistics with moderate dependence.
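The estimator of Schweder and Spjotvoll that the paper takes as its starting point has a one-line form, sketched below with simulated p-values and a generic threshold λ; the paper's nonparametric maximum likelihood estimators are more involved and are not reproduced here.

```python
# Schweder-Spjotvoll-type estimator of pi0, the proportion of true nulls: p-values above a
# threshold lambda come (mostly) from the uniform null component, so
#   pi0_hat(lambda) = #{p_i > lambda} / (m * (1 - lambda)).
import numpy as np

rng = np.random.default_rng(4)
m, m0 = 1000, 800                                       # 80% true nulls in this simulation
p = np.concatenate([rng.uniform(size=m0),               # nulls: uniform p-values
                    rng.beta(0.2, 5.0, size=m - m0)])   # alternatives: mostly small p-values

lam = 0.5
pi0_hat = np.mean(p > lam) / (1 - lam)
print(f"estimated pi0 = {pi0_hat:.3f} (truth 0.8)")
```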

302 citations


Journal ArticleDOI
TL;DR: An exact distribution-free cross-match test based on interpoint distances is proposed for comparing two multivariate distributions, and is applied to brain activation measured by functional magnetic resonance imaging during two linguistic tasks, comparing brains that are impaired by arteriovenous abnormalities with normal controls.
Abstract: Summary. A new test is proposed comparing two multivariate distributions by using distances between observations. Unlike earlier tests using interpoint distances, the new test statistic has a known exact distribution and is exactly distribution free. The interpoint distances are used to construct an optimal non-bipartite matching, i.e. a matching of the observations into disjoint pairs to minimize the total distance within pairs. The cross-match statistic is the number of pairs containing one observation from the first distribution and one from the second. Distributions that are very different will exhibit few cross-matches. When comparing two discrete distributions with finite support, the test is consistent against all alternatives. The test is applied to a study of brain activation measured by functional magnetic resonance imaging during two linguistic tasks, comparing brains that are impaired by arteriovenous abnormalities with normal controls. A second exact distribution-free test is also discussed: it ranks the pairs and sums the ranks of the cross-matched pairs.
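A sketch of the cross-match statistic itself is given below, with the optimal non-bipartite matching computed via networkx (maximum-weight matching on flipped distances); the exact null distribution and the rank-based variant are not implemented, and the data are simulated.

```python
# Sketch of the cross-match statistic: pool the two samples, form an optimal non-bipartite
# (minimum total distance) matching into pairs, and count pairs with one point from each sample.
# Minimum-weight perfect matching is obtained here by running max_weight_matching on flipped weights.
import numpy as np
import networkx as nx
from scipy.spatial.distance import cdist

rng = np.random.default_rng(5)
X = rng.normal(size=(10, 3))             # sample from the first distribution
Y = rng.normal(loc=0.5, size=(10, 3))    # sample from the second distribution
Z = np.vstack([X, Y])
labels = np.array([0] * len(X) + [1] * len(Y))

D = cdist(Z, Z)
big = D.max() + 1.0
G = nx.Graph()
for i in range(len(Z)):
    for j in range(i + 1, len(Z)):
        G.add_edge(i, j, weight=big - D[i, j])            # flip weights so max-weight = min-distance
match = nx.max_weight_matching(G, maxcardinality=True)    # optimal non-bipartite perfect matching

cross_matches = sum(labels[i] != labels[j] for i, j in match)
print("cross-match statistic:", cross_matches)            # few cross-matches suggest different distributions
```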

276 citations


Journal ArticleDOI
TL;DR: In this paper, the existence and uniqueness of the maximum likelihood estimator of the intensity matrix is investigated for discretely observed Markov jump processes with finite state space, and it is demonstrated that the estimator can be found either by the EM algorithm or by a Markov chain Monte Carlo procedure.
Abstract: Summary. Likelihood inference for discretely observed Markov jump processes with finite state space is investigated. The existence and uniqueness of the maximum likelihood estimator of the intensity matrix are investigated. This topic is closely related to the imbedding problem for Markov chains. It is demonstrated that the maximum likelihood estimator can be found either by the EM algorithm or by a Markov chain Monte Carlo procedure. When the maximum likelihood estimator does not exist, an estimator can be obtained by using a penalized likelihood function or by the Markov chain Monte Carlo procedure with a suitable prior. The methodology and its implementation are illustrated by examples and simulation studies.
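The likelihood at the heart of the problem has a compact form: with observations at equally spaced times, the transition probabilities over one interval are expm(Q·Δt). The sketch below evaluates this discrete-observation log-likelihood for a made-up intensity matrix and state sequence; the paper's EM and Markov chain Monte Carlo machinery for maximizing it is not reproduced.

```python
# The log-likelihood being maximized: for a Markov jump process with intensity matrix Q observed
# at equally spaced times, transitions have probabilities expm(Q * dt), so
#   loglik(Q) = sum_t log P[x_{t-1}, x_t]  with  P = expm(Q * dt).
import numpy as np
from scipy.linalg import expm

def loglik(Q, states, dt):
    P = expm(Q * dt)                           # transition matrix over one observation interval
    return sum(np.log(P[a, b]) for a, b in zip(states[:-1], states[1:]))

Q = np.array([[-1.0, 0.7, 0.3],
              [0.4, -0.9, 0.5],
              [0.2, 0.6, -0.8]])               # rows sum to zero: a valid intensity matrix
rng = np.random.default_rng(6)
states = rng.integers(0, 3, size=50)           # placeholder observed state sequence
print("log-likelihood:", loglik(Q, states, dt=0.5))
```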

161 citations


Journal ArticleDOI
Art B. Owen
TL;DR: In this article, the authors present a variance formula that takes account of the correlations between test statistics, and a method based on sampling pairs of tests allows the variance to be approximated at a cost that is independent of d.
Abstract: Summary. In high throughput genomic work, a very large number d of hypotheses are tested based on n≪d data samples. The large number of tests necessitates an adjustment for false discoveries in which a true null hypothesis was rejected. The expected number of false discoveries is easy to obtain. Dependences between the hypothesis tests greatly affect the variance of the number of false discoveries. Assuming that the tests are independent gives an inadequate variance formula. The paper presents a variance formula that takes account of the correlations between test statistics. That formula involves O(d²) correlations, and so a naive implementation has cost O(nd²). A method based on sampling pairs of tests allows the variance to be approximated at a cost that is independent of d.
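A hedged sketch of the sampling idea: given a B × d matrix of null rejection indicators (obtained here from simulation as a stand-in, not the paper's exact construction), the variance of the number of false discoveries is the sum of the per-test variances plus d(d − 1) times the average pairwise covariance, and that average is approximated by sampling pairs so the cost does not grow with d².

```python
# Variance of the number of false discoveries V = sum_i 1{test i rejects}:
#   Var(V) = sum_i Var(I_i) + sum_{i != j} Cov(I_i, I_j).
# The O(d^2) pairwise covariances are approximated by averaging over randomly sampled pairs.
# `indicators` is assumed to be a B x d matrix of null rejection indicators (e.g. from permutations).
import numpy as np

def var_false_discoveries(indicators, n_pairs=10_000, rng=None):
    rng = rng or np.random.default_rng()
    B, d = indicators.shape
    var_sum = indicators.var(axis=0, ddof=1).sum()
    i = rng.integers(0, d, size=n_pairs)
    j = rng.integers(0, d, size=n_pairs)
    keep = i != j
    i, j = i[keep], j[keep]
    # sample covariance between the indicator columns of each sampled pair
    covs = ((indicators[:, i] - indicators[:, i].mean(axis=0)) *
            (indicators[:, j] - indicators[:, j].mean(axis=0))).sum(axis=0) / (B - 1)
    return var_sum + d * (d - 1) * covs.mean()

rng = np.random.default_rng(7)
indicators = (rng.normal(size=(500, 500)) > 1.64).astype(float)   # independent toy example
print("approximate Var(V):", var_false_discoveries(indicators, rng=rng))
```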

157 citations


Journal ArticleDOI
TL;DR: The algorithm proposed improves on similar existing methods by recovering EM's ascent property with high probability, being more robust to the effect of user‐defined inputs and handling classical Monte Carlo and Markov chain Monte Carlo methods within a common framework.
Abstract: Summary. The expectation–maximization (EM) algorithm is a popular tool for maximizing likelihood functions in the presence of missing data. Unfortunately, EM often requires the evaluation of analytically intractable and high dimensional integrals. The Monte Carlo EM (MCEM) algorithm is the natural extension of EM that employs Monte Carlo methods to estimate the relevant integrals. Typically, a very large Monte Carlo sample size is required to estimate these integrals within an acceptable tolerance when the algorithm is near convergence. Even if this sample size were known at the onset of implementation of MCEM, its use throughout all iterations is wasteful, especially when accurate starting values are not available. We propose a data-driven strategy for controlling Monte Carlo resources in MCEM. The algorithm proposed improves on similar existing methods by recovering EM’s ascent (i.e. likelihood increasing) property with high probability, being more robust to the effect of user-defined inputs and handling classical Monte Carlo and Markov chain Monte Carlo methods within a common framework. Because of the first of these properties we refer to the algorithm as ‘ascent-based MCEM’. We apply ascent-based MCEM to a variety of examples, including one where it is used to accelerate the convergence of deterministic EM dramatically.

Journal ArticleDOI
TL;DR: In this paper, the authors consider high dimensional Metropolis and Langevin algorithms in their initial transient phase and give weak convergence results which explain both of these types of behaviour and practical guidance on implementation based on their theory.
Abstract: Summary. The paper considers high dimensional Metropolis and Langevin algorithms in their initial transient phase. In stationarity, these algorithms are well understood and it is now well known how to scale their proposal distribution variances. For the random-walk Metropolis algorithm, convergence during the transient phase is extremely regular—to the extent that the algorithm's sample path actually resembles a deterministic trajectory. In contrast, the Langevin algorithm with variance scaled to be optimal for stationarity performs rather erratically. We give weak convergence results which explain both of these types of behaviour and practical guidance on implementation based on our theory.
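For context, the stationary-phase prescription mentioned in the summary corresponds, for a d-dimensional product target, to a random-walk Metropolis proposal standard deviation of about 2.38/√d. The sketch below runs such a sampler on a standard Gaussian target started far from stationarity; it illustrates the vanilla algorithm only, not the paper's transient-phase analysis.

```python
# Random-walk Metropolis on a d-dimensional standard Gaussian target, with the proposal standard
# deviation set to the usual stationary-phase scaling 2.38 / sqrt(d). The chain is started far
# from stationarity to mimic the transient phase that the paper studies.
import numpy as np

rng = np.random.default_rng(8)
d, n_iter = 50, 5000
x = np.full(d, 10.0)                      # deliberately far from stationarity
step = 2.38 / np.sqrt(d)
accept = 0

def log_target(z):                        # standard Gaussian log-density up to a constant
    return -0.5 * np.sum(z ** 2)

for _ in range(n_iter):
    prop = x + step * rng.normal(size=d)
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x, accept = prop, accept + 1

print(f"acceptance rate: {accept / n_iter:.2f}")   # near 0.234 once the chain is in stationarity
```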

Journal ArticleDOI
TL;DR: In this paper, an approach to defining Bayes factors based on modelling test statistics is described; because the distributions of test statistics do not depend on unknown model parameters, this approach eliminates much of the subjectivity that is normally associated with the definition of Bayes factors.
Abstract: Summary. Traditionally, the use of Bayes factors has required the specification of proper prior distributions on model parameters that are implicit to both null and alternative hypotheses. I describe an approach to defining Bayes factors based on modelling test statistics. Because the distributions of test statistics do not depend on unknown model parameters, this approach eliminates much of the subjectivity that is normally associated with the definition of Bayes factors. For standard test statistics, including the χ2-, F-, t- and z-statistics, the values of Bayes factors that result from this approach have simple, closed form expressions.

Journal ArticleDOI
TL;DR: This work develops a graphical model for sequences of Gaussian random vectors when changes in the underlying graph occur at random times, and a new block of data is created with the addition or deletion of an edge.
Abstract: Summary. When modelling multivariate financial data, the problem of structural learning is compounded by the fact that the covariance structure changes with time. Previous work has focused on modelling those changes by using multivariate stochastic volatility models. We present an alternative to these models that focuses instead on the latent graphical structure that is related to the precision matrix. We develop a graphical model for sequences of Gaussian random vectors when changes in the underlying graph occur at random times, and a new block of data is created with the addition or deletion of an edge. We show how a Bayesian hierarchical model incorporates both the uncertainty about that graph and the time variation thereof.

Journal ArticleDOI
TL;DR: In this article, the authors describe quantum tomography as an inverse statistical problem in which the quantum state of a light beam is the unknown parameter and the data are given by results of measurements performed on identical quantum systems.
Abstract: We describe quantum tomography as an inverse statistical problem in which the quantum state of a light beam is the unknown parameter and the data are given by results of measurements performed on identical quantum systems. The state can be represented as an infinite dimensional density matrix or equivalently as a density on the plane called the Wigner function. We present consistency results for pattern function projection estimators and for sieve maximum likelihood estimators for both the density matrix of the quantum state and its Wigner function. We illustrate the performance of the estimators on simulated data. An EM algorithm is proposed for practical implementation. There remain many open problems, e.g. rates of convergence, adaptation and studying other estimators; a main purpose of the paper is to bring these to the attention of the statistical community.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of estimating the noise variance in homoscedastic nonparametric regression models and show that for finite sample sizes, the performance of these estimators may be deficient owing to a large finite sample bias.
Abstract: Summary. We consider the problem of estimating the noise variance in homoscedastic nonparametric regression models. For low dimensional covariates t ∈ ℝ^d, d = 1, 2, difference-based estimators have been investigated in a series of papers. For a given length of such an estimator, difference schemes which minimize the asymptotic mean-squared error can be computed for d=1 and d=2. However, from numerical studies it is known that for finite sample sizes the performance of these estimators may be deficient owing to a large finite sample bias. We provide theoretical support for these findings. In particular, we show that with increasing dimension d this becomes more drastic. If d ≥ 4, these estimators even fail to be consistent. A different class of estimators is discussed which allows better control of the bias and remains consistent when d ≥ 4. These estimators are compared numerically with kernel-type estimators (which are asymptotically efficient), and some guidance is given about when their use becomes necessary.
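For d = 1 the simplest first-order difference scheme gives the familiar estimator below (a sketch on simulated data); the paper's analysis concerns longer difference schemes and higher-dimensional designs.

```python
# First-order difference-based estimator of the noise variance in y_i = f(t_i) + e_i
# (d = 1, design points sorted): successive differences nearly cancel the smooth trend, so
#   sigma2_hat = sum (y_{i+1} - y_i)^2 / (2 (n - 1)).
import numpy as np

rng = np.random.default_rng(9)
t = np.sort(rng.uniform(size=500))
y = np.sin(4 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

sigma2_hat = np.sum(np.diff(y) ** 2) / (2 * (y.size - 1))
print(f"estimated variance: {sigma2_hat:.4f} (truth 0.09)")
```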

Journal ArticleDOI
TL;DR: In this paper, a self-weighted least absolute deviation estimator is proposed and shown to be asymptotically normal if the density of errors and its derivative are uniformly bounded.
Abstract: Summary. How to undertake statistical inference for infinite variance autoregressive models has been a long-standing open problem. To solve this problem, we propose a self-weighted least absolute deviation estimator and show that this estimator is asymptotically normal if the density of errors and its derivative are uniformly bounded. Furthermore, a Wald test statistic is developed for the linear restriction on the parameters, and it is shown to have non-trivial local power. Simulation experiments are carried out to assess the performance of the theory and method in finite samples and a real data example is given. The results are entirely different from other published results and should provide new insights for future research on heavy-tailed time series.

Journal ArticleDOI
TL;DR: A new class of model‐free variable selection approaches is proposed on the basis of the theory of sufficient dimension reduction; the methods assume no model of any form, require no nonparametric smoothing and allow for general predictor effects.
Abstract: …computing power has encouraged the modelling of data sets of ever-increasing size. Data mining applications in finance, marketing and bioinformatics are obvious examples. A limitation of nearly all existing variable selection methods is the need to specify the correct model before selection. When the number of predictors is large, model formulation and validation can be difficult or even infeasible. On the basis of the theory of sufficient dimension reduction, we propose a new class of model-free variable selection approaches. The methods proposed assume no model of any form, require no nonparametric smoothing and allow for general predictor effects. The efficacy of the methods proposed is demonstrated via simulation, and an empirical example is given.

Journal ArticleDOI
TL;DR: The necessary conditions for the design and the additional parameters of the experiment to be optimum are given, the algorithm for the numerical optimization procedure is presented and the relevance of these methods to dynamic systems, especially to chemical kinetic models is shown.
Abstract: Summary. The paper is concerned with a problem of finding an optimum experimental design for discriminating between two rival multiresponse models. The criterion of optimality that we use is based on the sum of squares of deviations between the models and picks up the design points for which the divergence is maximum. An important part of our criterion is an additional vector of experimental conditions, which may affect the design. We give the necessary conditions for the design and the additional parameters of the experiment to be optimum, we present the algorithm for the numerical optimization procedure and we show the relevance of these methods to dynamic systems, especially to chemical kinetic models.

Journal ArticleDOI
TL;DR: A feasible cross‐validation procedure is introduced and applied to the problem of data‐driven bandwidth choice for the smooth backfitting estimator (SBE), showing that the SBE is less affected by sparseness of data in high dimensional regression problems or strongly correlated designs.
Abstract: Summary. Compared with the classical backfitting of Buja, Hastie and Tibshirani, the smooth backfitting estimator (SBE) of Mammen, Linton and Nielsen not only provides complete asymptotic theory under weaker conditions but is also more efficient, robust and easier to calculate. However, the original paper describing the SBE method is complex and the practical as well as the theoretical advantages of the method have still neither been recognized nor accepted by the statistical community. We focus on a clear presentation of the idea, the main theoretical results and practical aspects like implementation and simplification of the algorithm. We introduce a feasible cross-validation procedure and apply it to the problem of data-driven bandwidth choice for the SBE. By simulations it is shown that the SBE and our cross-validation work very well indeed. In particular, the SBE is less affected by sparseness of data in high dimensional regression problems or strongly correlated designs. The SBE has reasonable performance even in 100-dimensional additive regression problems.

Journal ArticleDOI
TL;DR: It is shown through simulation and examples that support vector machine models with multiple shrinkage parameters produce fewer misclassification errors than several existing classical methods as well as Bayesian methods based on the logistic likelihood or those involving only one shrinkage parameter.
Abstract: Summary. Precise classification of tumours is critical for the diagnosis and treatment of cancer. Diagnostic pathology has traditionally relied on macroscopic and microscopic histology and tumour morphology as the basis for the classification of tumours. Current classification frameworks, however, cannot discriminate between tumours with similar histopathologic features, which vary in clinical course and in response to treatment. In recent years, there has been a move towards the use of complementary deoxyribonucleic acid microarrays for the classification of tumours. These high throughput assays provide relative messenger ribonucleic acid expression measurements simultaneously for thousands of genes. A key statistical task is to perform classification via different expression patterns. Gene expression profiles may offer more information than classical morphology and may provide an alternative to classical tumour diagnosis schemes. The paper considers several Bayesian classification methods based on reproducing kernel Hilbert spaces for the analysis of microarray data. We consider the logistic likelihood as well as likelihoods related to support vector machine models. It is shown through simulation and examples that support vector machine models with multiple shrinkage parameters produce fewer misclassification errors than several existing classical methods as well as Bayesian methods based on the logistic likelihood or those involving only one shrinkage parameter.

Journal ArticleDOI
TL;DR: In this paper, the spectral-in-time model is applied to a data set of daily winds at 11 sites in Ireland over 18 years, and spectral and space-time domain diagnostic procedures are used to assess the quality of the fits.
Abstract: Summary. Meteorological and environmental data that are collected at regular time intervals on a fixed monitoring network can be usefully studied combining ideas from multiple time series and spatial statistics, particularly when there are little or no missing data. This work investigates methods for modelling such data and ways of approximating the associated likelihood functions. Models for processes on the sphere crossed with time are emphasized, especially models that are not fully symmetric in space–time. Two approaches to obtaining such models are described. The first is to consider a rotated version of fully symmetric models for which we have explicit expressions for the covariance function. The second is based on a representation of space–time covariance functions that is spectral in just the time domain and is shown to lead to natural partially nonparametric asymmetric models on the sphere crossed with time. Various models are applied to a data set of daily winds at 11 sites in Ireland over 18 years. Spectral and space–time domain diagnostic procedures are used to assess the quality of the fits. The spectral-in-time modelling approach is shown to yield a good fit to many properties of the data and can be applied in a routine fashion relative to finding elaborate parametric models that describe the space–time dependences of the data about as well.

Journal ArticleDOI
TL;DR: In this paper, a new methodology for statistical inference for final outcome infectious disease data using certain structured population stochastic epidemic models is proposed, which imputes missing information in the form of a random graph that describes the potential infectious contacts between individuals.
Abstract: Summary. The paper is concerned with new methodology for statistical inference for final outcome infectious disease data using certain structured population stochastic epidemic models. A major obstacle to inference for such models is that the likelihood is both analytically and numerically intractable. The approach that is taken here is to impute missing information in the form of a random graph that describes the potential infectious contacts between individuals. This level of imputation overcomes various constraints of existing methodologies and yields more detailed information about the spread of disease. The methods are illustrated with both real and test data.

Journal ArticleDOI
TL;DR: In this article, it is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size, in the case of with-replacement bagging, or less than 50% of the sample size, for without-replacement bagging.
Abstract: Summary. It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size, in the case of with-replacement bagging, or less than 50% of the sample size, for without-replacement bagging. However, for larger sampling fractions there is no asymptotic difference between the risk of the regular nearest neighbour classifier and its bagged version. In particular, neither achieves the large sample performance of the Bayes classifier. In contrast, when the sampling fractions converge to 0, but the resample sizes diverge to ∞, the bagged classifier converges to the optimal Bayes rule and its risk converges to the risk of the latter. These results are most readily seen when the two populations have well-defined densities, but they may also be derived in other cases, where densities exist in only a relative sense. Cross-validation can be used effectively to choose the sampling fraction. Numerical calculation is used to illustrate these theoretical properties.
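Below is a sketch of the procedure being analysed, with a with-replacement resample fraction below the 69% threshold, majority voting over bagged 1-nearest-neighbour classifiers, and toy two-class Gaussian data (scikit-learn's 1-NN classifier is used for convenience; the asymptotic theory itself is not reproduced).

```python
# Bagged 1-nearest-neighbour classifier: each base classifier is fitted to a with-replacement
# resample whose size is a fraction of n (here below the 69% threshold discussed above), and
# predictions are combined by majority vote.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(10)
n = 200
X = np.vstack([rng.normal(size=(n // 2, 2)), rng.normal(loc=1.5, size=(n // 2, 2))])
y = np.repeat([0, 1], n // 2)
X_test = np.vstack([rng.normal(size=(100, 2)), rng.normal(loc=1.5, size=(100, 2))])
y_test = np.repeat([0, 1], 100)

frac, B = 0.5, 100                        # resample fraction < 0.69, number of bagged replicates
votes = np.zeros((len(X_test), 2))
for _ in range(B):
    idx = rng.integers(0, n, size=int(frac * n))        # with-replacement resample
    clf = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])
    votes[np.arange(len(X_test)), clf.predict(X_test)] += 1

y_hat = votes.argmax(axis=1)
print("test error:", np.mean(y_hat != y_test))
```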

Journal ArticleDOI
TL;DR: In this article, it is shown that it is possible, using randomization, to make sequential probability forecasts that will pass any given battery of statistical tests; this result, an easy consequence of von Neumann's minimax theorem, simplifies and generalizes work by earlier researchers.
Abstract: Summary. Building on the game theoretic framework for probability, we show that it is possible, using randomization, to make sequential probability forecasts that will pass any given battery of statistical tests. This result, an easy consequence of von Neumann's minimax theorem, simplifies and generalizes work by earlier researchers.

Journal ArticleDOI
TL;DR: This article proposes a general formulation for the analysis of data with incomplete observations and develops approximations to the resulting bias of maximum likelihood estimates on the assumption that model departures are small; doubling variances before calculating confidence intervals or test statistics is suggested as a crude way of addressing the possibility of undetectably small departures from the model.
Abstract: Problems of the analysis of data with incomplete observations are all too familiar in statistics. They are doubly difficult if we are also uncertain about the choice of model. We propose a general formulation for the discussion of such problems and develop approximations to the resulting bias of maximum likelihood estimates on the assumption that model departures are small. Loss of efficiency in parameter estimation due to incompleteness in the data has a dual interpretation: the increase in variance when an assumed model is correct; the bias in estimation when the model is incorrect. Examples include non-ignorable missing data, hidden confounders in observational studies and publication bias in meta-analysis. Doubling variances before calculating confidence intervals or test statistics is suggested as a crude way of addressing the possibility of undetectably small departures from the model. The problem of assessing the risk of lung cancer from passive smoking is used as a motivating example.
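The 'doubling variances' suggestion amounts to inflating the usual interval by a factor of √2; a brief sketch with made-up numbers:

```python
# The crude adjustment suggested above: double the estimated variance before forming a confidence
# interval, to allow for undetectably small departures from the assumed model.
import numpy as np

theta_hat, var_hat = 1.25, 0.04           # made-up point estimate and model-based variance
z = 1.96
naive_ci = (theta_hat - z * np.sqrt(var_hat), theta_hat + z * np.sqrt(var_hat))
inflated_ci = (theta_hat - z * np.sqrt(2 * var_hat), theta_hat + z * np.sqrt(2 * var_hat))
print("naive 95% CI:       ", np.round(naive_ci, 3))
print("doubled-variance CI:", np.round(inflated_ci, 3))
```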

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a jackknife variance estimator which is defined for any without-replacement unequal probability sampling design and demonstrate design consistency of this estimator for a broad class of point estimators.
Abstract: The jackknife method is often used for variance estimation in sample surveys but has only been developed for a limited class of sampling designs. We propose a jackknife variance estimator which is defined for any without-replacement unequal probability sampling design. We demonstrate design consistency of this estimator for a broad class of point estimators. A Monte Carlo study shows how the proposed estimator may improve on existing estimators.
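For orientation, the sketch below shows only the standard delete-one jackknife variance estimator for a ratio estimator under an equal-probability design with simulated data; the paper's contribution, extending the jackknife to arbitrary without-replacement unequal probability designs, is not reproduced here.

```python
# Standard delete-one jackknife variance estimator for a point estimator theta_hat (here a ratio
# of means), under an equal-probability design. The paper generalizes the jackknife to arbitrary
# without-replacement unequal probability sampling designs; that extension is not shown.
import numpy as np

def theta(y, x):
    return y.mean() / x.mean()             # example point estimator: a ratio of means

rng = np.random.default_rng(11)
x = rng.uniform(1, 2, size=100)
y = 3 * x + rng.normal(scale=0.2, size=100)

n = len(y)
leave_one_out = np.array([theta(np.delete(y, i), np.delete(x, i)) for i in range(n)])
v_jack = (n - 1) / n * np.sum((leave_one_out - leave_one_out.mean()) ** 2)
print(f"theta_hat = {theta(y, x):.3f}, jackknife variance = {v_jack:.5f}")
```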

Journal ArticleDOI
TL;DR: A new method called ‘stopping‐time resampling’ is developed, which allows us to compare partially simulated samples at different stages to terminate unpromising partial samples and to multiply promising samples early on.
Abstract: Summary. Motivated by the statistical inference problem in population genetics, we present a new sequential importance sampling with resampling strategy. The idea of resampling is key to the recent surge of popularity of sequential Monte Carlo methods in the statistics and engineering communities, but existing resampling techniques do not work well for coalescent-based inference problems in population genetics. We develop a new method called ‘stopping-time resampling’, which allows us to compare partially simulated samples at different stages to terminate unpromising partial samples and to multiply promising samples early on. To illustrate the idea, we first apply the new method to approximate the solution of a Dirichlet problem and the likelihood function of a non-Markovian process. Then we focus on its application in population genetics. All our examples show that the new resampling method can significantly improve the computational efficiency of existing sequential importance sampling methods.

Journal ArticleDOI
TL;DR: In this paper, the authors developed a new class of time continuous autoregressive fractionally integrated moving average (CARFIMA) models which are useful for modelling regularly spaced and irregularly spaced discrete time long memory data.
Abstract: Summary. We develop a new class of time continuous autoregressive fractionally integrated moving average (CARFIMA) models which are useful for modelling regularly spaced and irregularly spaced discrete time long memory data. We derive the autocovariance function of a stationary CARFIMA model and study maximum likelihood estimation of a regression model with CARFIMA errors, based on discrete time data and via the innovations algorithm. It is shown that the maximum likelihood estimator is asymptotically normal, and its finite sample properties are studied through simulation. The efficacy of the approach proposed is demonstrated with a data set from an environmental study.

Journal ArticleDOI
TL;DR: In this article, a semiparametric estimated estimating equations approach is proposed to estimate the nonparametric link and variance-covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equation.
Abstract: Summary. We introduce a flexible marginal modelling approach for statistical inference for clustered and longitudinal data under minimal assumptions. This estimated estimating equations approach is semiparametric and the proposed models are fitted by quasi-likelihood regression, where the unknown marginal means are a function of the fixed effects linear predictor with unknown smooth link, and variance-covariance is an unknown smooth function of the marginal means. We propose to estimate the nonparametric link and variance-covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equations. These are score equations that contain nonparametric function estimates. The proposed estimated estimating equations approach is motivated by its flexibility and easy implementation. Moreover, if data follow a generalized linear mixed model, with either a specified or an unspecified distribution of random effects and link function, the model proposed emerges as the corresponding marginal (population-average) version and can be used to obtain inference for the fixed effects in the underlying generalized linear mixed model, without the need to specify any other components of this generalized linear mixed model. Among marginal models, the estimated estimating equations approach provides a flexible alternative to modelling with generalized estimating equations. Applications of estimated estimating equations include diagnostics and link selection. The asymptotic distribution of the proposed estimators for the model parameters is derived, enabling statistical inference. Practical illustrations include Poisson modelling of repeated epileptic seizure counts and simulations for clustered binomial responses.