
Showing papers in "Statistica Neerlandica in 2016"


Journal ArticleDOI
TL;DR: In this paper, a stepwise local influence approach is used to handle data with possible masking effects, and the established results are shown to be effective by analyzing a stock transactions data set.
Abstract: In statistical diagnostics and sensitivity analysis, the local influence method plays an important role and has certain advantages over other methods in several situations. In this paper, we use this method to study time series of count data when employing a Poisson autoregressive model. We consider case-weights, scale, data, and additive perturbation schemes to obtain their corresponding vectors and matrices of derivatives for the measures of slope and normal curvatures. Based on the curvature diagnostics, we take a stepwise local influence approach to deal with data with possible masking effects. Finally, the effectiveness of the established results is illustrated by analyzing a stock transactions data set.

19 citations
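The Poisson autoregressive model underlying these diagnostics is easy to simulate. Below is a minimal sketch assuming an INGARCH-type specification in which the conditional mean is linear in the previous count; this is one common form of Poisson autoregression, not necessarily the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_poisson_ar(n, omega, alpha):
    """INGARCH-type Poisson autoregression (assumed form):
    y_t | past ~ Poisson(omega + alpha * y_{t-1})."""
    y = np.zeros(n, dtype=int)
    for t in range(1, n):
        y[t] = rng.poisson(omega + alpha * y[t - 1])
    return y

# stationary mean is omega / (1 - alpha) = 2 for these values
y = simulate_poisson_ar(500, omega=1.0, alpha=0.5)
```

Perturbation-scheme curvatures would then be computed from the derivatives of the Poisson log-likelihood of such a series.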


Journal ArticleDOI
TL;DR: In this article, the authors consider the location-scale quantile autoregression in which the location and scale parameters are subject to regime shifts and the regime changes are determined by the outcome of a latent, discrete-state Markov process.
Abstract: This paper considers the location-scale quantile autoregression in which the location and scale parameters are subject to regime shifts. The regime changes are determined by the outcome of a latent, discrete-state Markov process. The new method provides direct inference and estimation for different parts of a nonstationary time series distribution. Bayesian inference for switching regimes within a quantile, via a three-parameter asymmetric-Laplace distribution, is adapted and designed for parameter estimation. The simulation study shows reasonable accuracy and precision in model estimation. From a distribution point of view, rather than from a mean point of view, the potential of this new approach is illustrated in the empirical applications to reveal the countercyclical risk pattern of stock markets and the asymmetric persistence of real GDP growth rates and real trade-weighted exchange rates.

17 citations
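The link the paper exploits, that maximizing an asymmetric-Laplace likelihood is equivalent to minimizing the pinball (check) loss, can be illustrated on a single-regime quantile autoregression. A rough sketch with hypothetical simulated data, omitting the regime switching and the Bayesian machinery:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# hypothetical AR(1) data: y_t = 0.5 * y_{t-1} + N(0, 1) noise
n = 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + rng.normal()

def pinball(beta, tau):
    """Check loss for the tau-th conditional quantile of y_t given y_{t-1};
    minimizing it matches maximizing an asymmetric-Laplace likelihood."""
    u = y[1:] - (beta[0] + beta[1] * y[:-1])
    return np.sum(np.maximum(tau * u, (tau - 1.0) * u))

# median (tau = 0.5) autoregression; b1 should land near 0.5
res = minimize(pinball, x0=[0.0, 0.0], args=(0.5,), method='Nelder-Mead')
b0, b1 = res.x
```

Repeating the fit across several tau values traces out different parts of the conditional distribution, which is the "distribution point of view" the abstract refers to.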


Journal ArticleDOI
TL;DR: In this article, the authors explore the usefulness of the standard AR(p) model for obtaining coherent forecasting from integer-valued time series data and carry out some simulation experiments, which show the adequacy of the proposed method over the available alternatives.
Abstract: During the last three decades, integer-valued autoregressive processes of order p [or INAR(p)] based on different operators have been proposed as natural, intuitive and perhaps efficient models for integer-valued time-series data. However, this literature is surprisingly mute on the usefulness of the standard AR(p) process, which is otherwise meant for continuous-valued time-series data. In this paper, we attempt to explore the usefulness of the standard AR(p) model for obtaining coherent forecasts from integer-valued time series. First, some advantages of this standard Box–Jenkins-type AR(p) process are discussed. We then carry out some simulation experiments, which show the adequacy of the proposed method over the available alternatives. Our simulation results indicate that even when samples are generated from an INAR(p) process, the Box–Jenkins model performs as well as the INAR(p) processes, especially with respect to the mean forecast. Two real data sets have been employed to study the expediency of the standard AR(p) model for integer-valued time-series data.

14 citations
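The basic recipe, fitting a standard AR(p) by least squares on lagged counts and then rounding the mean forecast to an integer, can be sketched as follows. The data are hypothetical, and rounding is one simple coherence device rather than necessarily the paper's exact rule:

```python
import numpy as np

rng = np.random.default_rng(2)
# toy integer-valued series (hypothetical; iid Poisson with mean 5)
y = rng.poisson(5, size=300).astype(float)

p = 2  # AR order
# lagged design matrix: regress y[t] on 1, y[t-1], y[t-2]
X = np.column_stack([np.ones(len(y) - p)] +
                    [y[p - k - 1:len(y) - k - 1] for k in range(p)])
beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)

# one-step-ahead mean forecast, rounded so the forecast is a valid count
mean_fc = beta[0] + beta[1] * y[-1] + beta[2] * y[-2]
coherent_fc = int(round(mean_fc))
```

For these iid data the forecast should sit near the series mean of 5; with genuinely autocorrelated counts the lag coefficients carry the predictive weight.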


Journal ArticleDOI
TL;DR: In this article, the estimation and hypothesis testing problems for the partial linear regression models when some variables are distorted with errors by some unknown functions of a commonly observable confounding variable are considered, and a wild bootstrap procedure is proposed to calculate critical values.
Abstract: We consider the estimation and hypothesis testing problems for the partial linear regression models when some variables are distorted with errors by some unknown functions of a commonly observable confounding variable. The proposed estimation procedure is designed to accommodate undistorted as well as distorted variables. To test a hypothesis on the parametric components, a restricted least squares estimator is proposed under the null hypothesis. Asymptotic properties for the estimators are established. A test statistic based on the difference between the residual sums of squares under the null and alternative hypotheses is proposed, and we also obtain the asymptotic properties of the test statistic. A wild bootstrap procedure is proposed to calculate critical values. Simulation studies are conducted to demonstrate the performance of the proposed procedure, and a real example is analyzed for an illustration.

12 citations
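The wild bootstrap idea for calibrating an RSS-difference statistic can be sketched in a plain linear model with Rademacher multipliers. This omits the distortion and confounding structure of the paper's model and is only meant to show the resampling mechanics:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + rng.normal(size=n)        # generated under H0: slope = 0

def rss(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X1 = np.column_stack([np.ones(n), x])   # alternative: intercept + slope
X0 = np.ones((n, 1))                    # null: intercept only
T_obs = rss(y, X0) - rss(y, X1)         # RSS-difference statistic

# wild bootstrap under the null: keep fitted values, flip residual signs
fit0 = X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]
resid0 = y - fit0
B = 500
T_boot = np.empty(B)
for b in range(B):
    v = rng.choice([-1.0, 1.0], size=n)   # Rademacher multipliers
    y_star = fit0 + v * resid0
    T_boot[b] = rss(y_star, X0) - rss(y_star, X1)

crit = np.quantile(T_boot, 0.95)          # bootstrap critical value
reject = T_obs > crit
```

The Rademacher weights preserve heteroscedasticity in the residuals, which is the usual motivation for a wild rather than an iid residual bootstrap.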


Journal ArticleDOI
TL;DR: In this article, a new randomized response model is proposed, which is shown to have a Cramer-Rao lower bound of variance that is lower than the Cramer -Rao upper bound suggested by Singh and Sedory at equal protection or greater protection of respondents.
Abstract: In this paper, a new randomized response model is proposed, which is shown to have a Cramer–Rao lower bound of variance that is lower than the Cramer–Rao lower bound of variance suggested by Singh and Sedory at equal protection or greater protection of respondents. A new measure of protection of respondents in the setup of the efficient use of two decks of cards, due to Odumade and Singh, is also suggested. The developed Cramer–Rao lower bounds of variances are compared under different situations through exact numerical illustrations. Survey data to estimate the proportion of students who have sometimes driven a vehicle after drinking alcohol and feeling over the legal limit are collected by using the proposed randomization device and then analyzed. The proposed randomized response technique is also compared with a black box technique within the same survey. A method to determine minimum sample size in randomized response sampling based on a small pilot survey is also given.

9 citations
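For intuition, Warner's classic randomized response estimator, a simpler device than the two-deck designs compared in this paper, recovers the sensitive prevalence from the randomized "yes" rate:

```python
import numpy as np

rng = np.random.default_rng(4)

def warner_estimate(yes_count, n, P):
    """Warner (1965) estimator: with probability P the respondent answers
    about the sensitive trait, with probability 1-P about its complement.
    Then E[yes rate] = P*pi + (1-P)*(1-pi), which inverts to pi."""
    lam_hat = yes_count / n
    return (lam_hat + P - 1.0) / (2.0 * P - 1.0)

# hypothetical survey: true prevalence 0.3, card probability P = 0.7
pi_true, P, n = 0.3, 0.7, 10000
has_trait = rng.random(n) < pi_true
asks_trait = rng.random(n) < P
yes = int(np.sum(np.where(asks_trait, has_trait, ~has_trait)))
pi_hat = warner_estimate(yes, n, P)
```

No individual answer reveals the trait, because the interviewer never knows which question the card selected; that trade-off between respondent protection and estimator variance is exactly what the paper's Cramer–Rao comparisons quantify.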


Journal ArticleDOI
TL;DR: In this article, a two-stage procedure is proposed for segmentation of a zebrafish brain calcium image using a mixture of autoregressions (MoAR) model and a Markov random field (MRF) model.
Abstract: Time series data arise in many medical and biological imaging scenarios. In such images, a time series is obtained at each of a large number of spatially dependent data units. It is interesting to organize these data into model-based clusters. A two-stage procedure is proposed. In stage 1, a mixture of autoregressions (MoAR) model is used to marginally cluster the data. The MoAR model is fitted using maximum marginal likelihood (MMaL) estimation via a minorization–maximization (MM) algorithm. In stage 2, a Markov random field (MRF) model induces a spatial structure onto the stage 1 clustering. The MRF model is fitted using maximum pseudolikelihood (MPL) estimation via an MM algorithm. Both the MMaL and MPL estimators are proved to be consistent. Numerical properties are established for both MM algorithms. A simulation study demonstrates the performance of the two-stage procedure. An application to the segmentation of a zebrafish brain calcium image is presented.

9 citations
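The flavor of stage 1, clustering many time series by their autoregressive structure, can be mimicked crudely by estimating an AR(1) coefficient per series and splitting on it. This is only a stand-in for the MoAR model fitted by the MM algorithm, using hypothetical simulated regimes:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_ar1(phi, n=200):
    """One hypothetical unit's time series: y_t = phi * y_{t-1} + noise."""
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + rng.normal()
    return y

def ar1_coef(y):
    """Least-squares AR(1) coefficient of a single series."""
    return np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])

# 30 units drawn from two dynamic regimes (phi = 0.2 vs. phi = 0.8)
series = [simulate_ar1(0.2) for _ in range(15)] + \
         [simulate_ar1(0.8) for _ in range(15)]
phis = np.array([ar1_coef(s) for s in series])
labels = (phis > 0.5).astype(int)   # crude one-dimensional split
```

Stage 2 of the paper would then smooth such labels spatially via the MRF model rather than leaving each unit's cluster assignment independent.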


Journal ArticleDOI
TL;DR: This paper adopts the generalised Poisson difference distribution (GPDD) to model the goal difference of football matches and carries out the analysis in a Bayesian framework in order to incorporate external information, such as historical knowledge or data, through the prior distributions.
Abstract: The analysis of sports data, in particular football match outcomes, has long generated immense interest among statisticians. In this paper, we adopt the generalised Poisson difference distribution (GPDD) to model the goal difference of football matches. We discuss the advantages of the proposed model over the Poisson difference (PD) model which was also used for the same purpose. The GPDD model, like the PD model, is based on the goal difference in each game, which allows us to account for the correlation without explicitly modelling it. The main advantage of the GPDD model is its flexibility in the tails by considering shorter as well as longer tails than the PD distribution. We carry out the analysis in a Bayesian framework in order to incorporate external information, such as historical knowledge or data, through the prior distributions. We model both the mean and the variance of the goal difference and show that such a model performs considerably better than a model with a fixed variance. Finally, the proposed model is fitted to the 2012–13 Italian Serie A football data, and various model diagnostics are carried out to evaluate the performance of the model.

9 citations
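The ordinary Poisson difference (Skellam) baseline that the GPDD generalises is easy to simulate; a sketch with hypothetical scoring rates (the GPDD itself adds a dispersion parameter that lengthens or shortens the tails relative to this):

```python
import numpy as np

rng = np.random.default_rng(6)

# goal difference as the difference of two independent Poisson counts
lam_home, lam_away = 1.6, 1.1        # hypothetical scoring rates
n_matches = 100000
diff = rng.poisson(lam_home, n_matches) - rng.poisson(lam_away, n_matches)
# Skellam moments: mean = lam_home - lam_away, variance = lam_home + lam_away
```

Working directly with the difference avoids modelling the within-match correlation between the two scores, which is the point the abstract makes for both the PD and GPDD models.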


Journal ArticleDOI
TL;DR: In this paper, a new first-order non-negative integer-valued autoregressive [INAR(1)] process with Poisson-geometric marginals based on binomial thinning was proposed for modeling time series with overdispersion.
Abstract: In this paper, we propose a new first-order non-negative integer-valued autoregressive [INAR(1)] process with Poisson–geometric marginals based on binomial thinning for modeling integer-valued time series with overdispersion. The new process also has, as particular cases, the Poisson INAR(1) and geometric INAR(1) processes. The main properties of the model are derived, such as probability generating function, moments, conditional distribution, higher-order moments, and jumps. Estimators for the parameters of the process are proposed, and their asymptotic properties are established. Some numerical results of the estimators are presented with a discussion of the obtained results. Applications to two real data sets are given to show the potentiality of the new process.

7 citations
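Binomial thinning, the operator at the heart of INAR(1) models, is straightforward to simulate. A sketch of the classical Poisson INAR(1) special case, with the paper's Poisson–geometric innovations swapped for plain Poisson ones:

```python
import numpy as np

rng = np.random.default_rng(7)

def inar1(n, alpha, lam):
    """Poisson INAR(1): X_t = alpha ∘ X_{t-1} + eps_t, where ∘ is
    binomial thinning (each of the X_{t-1} units survives with
    probability alpha) and eps_t ~ Poisson(lam)."""
    x = np.zeros(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))   # start near the stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)   # binomial thinning
        x[t] = survivors + rng.poisson(lam)
    return x

# stationary mean is lam / (1 - alpha) = 5 for these values
x = inar1(2000, alpha=0.4, lam=3.0)
```

Replacing the Poisson innovation draw with an overdispersed innovation distribution is what lets the marginal accommodate overdispersion, which is the paper's contribution.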


Journal ArticleDOI
TL;DR: In this paper, a finite mixture of censored Poisson regressions is proposed to accommodate heterogeneity and also identify clusters in right-censored count data, and an expectation maximization algorithm is developed to facilitate the estimation of such models and discuss the computational aspects of the proposed algorithm.
Abstract: While right-censored data are very common in survival analysis, they may also occur in the case of count data. The literature contains models to treat such right-censored count data. In this paper, we want to address issues of heterogeneity and clustering in this context. We propose a finite mixture of censored Poisson regressions to accommodate heterogeneity and also identify clusters in right-censored count data. We also develop an expectation maximization algorithm to facilitate the estimation of such models and discuss the computational aspects of the proposed algorithm. We then present results based on simulated data to show the effect of censoring in estimation. We also present a marketing application of the proposed approach involving the number of renewals of magazine subscriptions.

5 citations
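The EM mechanics for a finite Poisson mixture can be sketched without the censoring layer; the censored version replaces the pmf with a survival term for censored observations. Hypothetical two-component data:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(8)

# simulate a two-component Poisson mixture (no censoring, for brevity)
z = rng.random(1000) < 0.4
y = np.where(z, rng.poisson(2.0, 1000), rng.poisson(9.0, 1000))

# EM for a two-component Poisson mixture
pi_, lam = 0.5, np.array([1.0, 10.0])
for _ in range(200):
    # E-step: posterior responsibility of component 1 for each count
    p1 = pi_ * poisson.pmf(y, lam[0])
    p2 = (1 - pi_) * poisson.pmf(y, lam[1])
    r = p1 / (p1 + p2)
    # M-step: responsibility-weighted updates of the mixture parameters
    pi_ = r.mean()
    lam = np.array([np.sum(r * y) / np.sum(r),
                    np.sum((1 - r) * y) / np.sum(1 - r)])
```

With covariates, each component's rate becomes a Poisson regression and the M-step turns into weighted GLM fits, which is the structure the paper's algorithm handles together with right censoring.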


Journal ArticleDOI
TL;DR: In this article, the authors proposed three methods for merging homogeneous clusters of observations that are grouped according to a pre-existing classification, which can simplify the analysis of the market without affecting the representativeness of the data and highlight commercial anomalies.
Abstract: This article proposes three methods for merging homogeneous clusters of observations that are grouped according to a pre-existing (known) classification. This clusterwise regression problem is particularly compelling in the analysis of international trade data, where transaction prices can be grouped according to the corresponding origin–destination combination. A proper merging of these prices could simplify the analysis of the market without affecting the representativeness of the data and highlight commercial anomalies that may hide frauds. The three algorithms proposed are based on an iterative application of the F-test and have the advantage of being extremely flexible, as they do not require the number of final clusters to be predetermined, and their output depends only on a tuning parameter. Monte Carlo results show very good performances of all the procedures, whereas the application to a couple of empirical data sets proves the practical utility of the methods proposed for reducing the dimension of the market and isolating suspicious commercial behaviors.

3 citations
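The building block of such merging algorithms, an F-test for whether two clusters share one regression, can be sketched as a Chow-type test. The data are hypothetical, and the paper's procedures iterate comparisons of this kind over many group pairs under a tuning parameter:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(9)

def rss(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# two clusters that here share the same price relationship
n = 80
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y1 = 2.0 + 1.5 * x1 + rng.normal(size=n)
y2 = 2.0 + 1.5 * x2 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x2])
Xp = np.vstack([X1, X2])                # pooled design
yp = np.concatenate([y1, y2])

k = 2                                    # parameters per regression
rss_sep = rss(y1, X1) + rss(y2, X2)      # separate fits
rss_pool = rss(yp, Xp)                   # single pooled fit
F = ((rss_pool - rss_sep) / k) / (rss_sep / (2 * n - 2 * k))
p_value = 1 - f_dist.cdf(F, k, 2 * n - 2 * k)
merge = p_value > 0.05                   # no evidence of difference: merge
```

Repeatedly merging the most similar pair until every remaining F-test rejects yields a final clustering without fixing its size in advance.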


Journal ArticleDOI
TL;DR: In this paper, the authors studied locally D-optimal designs for such experiments with discrete-time event occurrence data by using a sequential construction algorithm and showed that the optimal designs for a linear effect of the predictor have two points that coincide with the design region's boundaries, but the design weights highly depend on the predictor effect size and its direction.
Abstract: In designing an experiment with a single continuous predictor, the questions are how many values of the predictor to use, what these values should be, and how many subjects to assign to each of them. In this study, locally D-optimal designs for such experiments with discrete-time event occurrence data are studied by using a sequential construction algorithm. Using the Weibull survival function for modeling the underlying time-to-event function, it is shown that the optimal designs for a linear effect of the predictor have two points that coincide with the design region's boundaries, but the design weights depend strongly on the predictor effect size and its direction, the survival pattern, and the number of time points. For a quadratic effect of the predictor, three or four design points are needed.
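D-optimality itself is simple to evaluate numerically. A sketch for the simplest linear mean model on [-1, 1]; the paper's locally D-optimal designs replace this with a Weibull discrete-time survival model, which is what makes the optimal weights unequal:

```python
import numpy as np

def info_det(points, weights):
    """Determinant of the information matrix M(xi) = sum_i w_i f(x_i) f(x_i)'
    for the linear model E[y] = b0 + b1 * x, with f(x) = (1, x)'."""
    F = np.column_stack([np.ones(len(points)), points])
    M = F.T @ (np.asarray(weights)[:, None] * F)
    return np.linalg.det(M)

# two-point design at the boundaries with equal weights
d_boundary = info_det([-1.0, 1.0], [0.5, 0.5])
# an interior two-point design for comparison
d_interior = info_det([-0.5, 0.5], [0.5, 0.5])
# the boundary design has the larger determinant, hence is D-better
```

A sequential construction algorithm, as used in the paper, greedily adds the candidate point that most increases this determinant until the design converges.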

Journal ArticleDOI
TL;DR: In this article, the skew-normal copula is used to capture the dependence structure within units, while the fixed and random effects coefficients are estimated through the mean of the copula.
Abstract: This paper presents a method for fitting copula-driven generalized linear mixed models. For added flexibility, the skew-normal copula is adopted for fitting. The correlation matrix of the skew-normal copula is used to capture the dependence structure within units, while the fixed and random effects coefficients are estimated through the mean of the copula. For estimation, a Monte Carlo expectation-maximization algorithm is developed. Simulations are shown alongside a real data example from the Framingham Heart Study.

Journal ArticleDOI
TL;DR: In this paper, the estimation of linear models subject to inequality constraints with a special focus on new variance approximations for the estimated parameters is examined, for models with one inequality restriction, the proposed variance formulas are exact.
Abstract: In this paper, we examine the estimation of linear models subject to inequality constraints with a special focus on new variance approximations for the estimated parameters. For models with one inequality restriction, the proposed variance formulas are exact. The variance approximations proposed in this paper can be used in regression analysis, Kalman filtering, and balancing national accounts, when inequality constraints are to be incorporated in the estimation procedure.
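Inequality-constrained least squares estimates can be computed directly, for example with SciPy's bounded least squares solver. A sketch with one sign constraint; note the paper's contribution concerns variance approximations for such estimators, which this snippet does not compute:

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(10)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# hypothetical data whose true slope (-0.2) violates the constraint
y = X @ np.array([0.5, -0.2]) + 0.1 * rng.normal(size=n)

# least squares subject to the inequality constraint beta_1 >= 0
res = lsq_linear(X, y, bounds=([-np.inf, 0.0], [np.inf, np.inf]))
beta_c = res.x   # the slope estimate is pushed onto the boundary at 0
```

When the constraint binds, as here, the estimator's distribution piles mass on the boundary, which is exactly why naive unconstrained variance formulas fail and dedicated approximations are needed.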

Journal ArticleDOI
TL;DR: In this paper, the authors show that cusum theory is easily adapted when the target is not the mean but some other aspect of the distribution.
Abstract: Cusum charts are widely used for detecting deviations of a process about a target value and also for finding evidence of change in the mean of a process. The testing theory approximates the process by a Wiener process or a Brownian bridge, respectively. For quality control, it is important that other aspects are monitored in addition to or instead of the mean. Here, we show that cusum theory is easily adapted when the target is not the mean but some other aspect of the distribution.
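A standard one-sided cusum for the mean illustrates the chart being adapted; monitoring another aspect of the distribution amounts to replacing the observation by a transformed score, e.g. the squared deviation from target when the variance is of interest. A sketch on hypothetical data with an upward mean shift:

```python
import numpy as np

rng = np.random.default_rng(11)

def cusum(x, target, k):
    """One-sided upper cusum: C_t = max(0, C_{t-1} + x_t - target - k),
    where k is the reference value (allowance)."""
    c = np.zeros(len(x))
    for t in range(1, len(x)):
        c[t] = max(0.0, c[t - 1] + x[t] - target - k)
    return c

# in control for 100 points, then the mean shifts upward by 1
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100)])
c = cusum(x, target=0.0, k=0.5)
h = 5.0   # decision interval; an alarm fires once c exceeds h
alarm = int(np.argmax(c > h)) if np.any(c > h) else None
```

The Wiener-process and Brownian-bridge approximations mentioned in the abstract are what calibrate the choice of h against a desired in-control run length.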

Journal ArticleDOI
TL;DR: In this paper, the authors show that when applied to biased estimation equations, it results in the estimates that would come from solving a bias-corrected estimation equation, making it a consistent estimator if regularity conditions hold.
Abstract: In a seminal paper, Mak, Journal of the Royal Statistical Society B, 55, 1993, 945, derived an efficient algorithm for solving non-linear unbiased estimation equations. In this paper, we show that when Mak's algorithm is applied to biased estimation equations, it results in estimates that would come from solving a bias-corrected estimation equation, making it a consistent estimator if regularity conditions hold. In addition, the properties that Mak established for his algorithm also apply in the case of biased estimation equations, but for estimates from the bias-corrected equations. The marginal likelihood estimator is obtained when the approach is applied to both maximum likelihood and least squares estimation of the covariance matrix parameters in the general linear regression model. The new approach results in two new estimators when applied to the profile and marginal likelihood functions for estimating the lagged dependent variable coefficient in the dynamic linear regression model. Monte Carlo simulation results show the new approach leads to a better estimator when applied to the standard profile likelihood. It is therefore recommended for situations in which standard estimators are known to be biased.

Journal ArticleDOI
TL;DR: In this article, the asymptotic bias and variance of the penalized spline estimator with general convex loss functions are analyzed, and smoothing parameter selection for minimizing the mean integrated squared error is discussed.
Abstract: Penalized splines are used in various types of regression analyses, including non-parametric quantile, robust and the usual mean regression. In this paper, we focus on the penalized spline estimator with general convex loss functions. By specifying the loss function, we can obtain the mean estimator, quantile estimator and robust estimator. We will first study the asymptotic properties of penalized splines. Specifically, we will show the asymptotic bias and variance as well as the asymptotic normality of the estimator. Next, we will discuss smoothing parameter selection for the minimization of the mean integrated squared error. The new smoothing parameter can be expressed uniquely using the asymptotic bias and variance of the penalized spline estimator. To validate the new smoothing parameter selection method, we provide a simulation. The simulation results confirm the consistency of the estimator with the proposed smoothing parameter selection method and show that the proposed estimator behaves better than the estimator with generalized approximate cross-validation. A real data example is also addressed.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a beta spatial linear mixed model with variable dispersion using Monte Carlo maximum likelihood, which is useful for those situations where the response variable is a rate or a proportion.
Abstract: We propose a beta spatial linear mixed model with variable dispersion using Monte Carlo maximum likelihood. The proposed method is useful for those situations where the response variable is a rate or a proportion. An approach to the spatial generalized linear mixed models using the Box–Cox transformation in the precision model is presented. Thus, the parameter optimization process is developed for both the spatial mean model and the spatial variable dispersion model. All the parameters are estimated using Markov chain Monte Carlo maximum likelihood. Statistical inference over the parameters is performed using approximations obtained from the asymptotic normality of the maximum likelihood estimator. Diagnosis and prediction of a new observation are also developed. The method is illustrated with the analysis of one simulated case and two studies: clay and magnesium contents. In the clay study, 147 soil profile observations were taken from the research area of the Tropenbos Cameroon Programme, with explanatory variables: elevation in metres above sea level, agro-ecological zone, reference soil group and land cover type. In the magnesium study, soil samples were taken from the 0- to 20-cm-depth layer at each of 178 locations, and the response variable is related to the spatial locations, altitude and sub-region.