
Showing papers in "Statistica Neerlandica in 2007"


Journal ArticleDOI
TL;DR: In this article, a general class of fluctuation tests for parameter instability in an M-estimation framework is suggested, based on functional central limit theorems which are derived under the null hypothesis of parameter stability and local alternatives.
Abstract: A general class of fluctuation tests for parameter instability in an M-estimation framework is suggested. Tests from this framework can be constructed by first choosing an appropriate estimation technique, deriving a partial sum process of the estimation scores which captures instabilities over time, and aggregating this process to a test statistic by using a suitable scalar functional. Inference for these tests is based on functional central limit theorems which are derived under the null hypothesis of parameter stability and local alternatives. For (generalized) linear regression models, concrete tests are derived which cover several known tests for (approximately) normal data but also allow for testing for parameter instability in regressions with binary or count data. The usefulness of the test procedures, complemented by powerful visualizations derived from these, is illustrated using Dow Jones industrial average stock returns, youth homicides in Boston, USA, and illegitimate births in Großarl, Austria.
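
As a concrete illustration of the framework, the sketch below implements its simplest member, an OLS-based CUSUM test for a mean shift: the estimation scores are the centred observations, their scaled partial sums form the fluctuation process, and the sup-norm is the scalar functional. The 5% critical value 1.358 (sup-norm of a Brownian bridge) is standard; everything else here is an illustrative assumption, not the paper's full class.

```python
import numpy as np

def cusum_test(y):
    """Sup-norm of the empirical fluctuation process for the sample mean."""
    n = len(y)
    scores = y - y.mean()                               # estimation scores for the mean
    sigma = scores.std(ddof=1)
    process = np.cumsum(scores) / (sigma * np.sqrt(n))  # partial sum process
    return np.max(np.abs(process))                      # aggregate with a scalar functional

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100)])  # mean shift at t=100
stat = cusum_test(y)
print(f"sup|W| = {stat:.2f}; reject stability at 5% level: {stat > 1.358}")
```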

154 citations


Journal ArticleDOI
TL;DR: The results of this analysis would appear to support the notion that playing the ‘beautiful game’ is an effective strategy—more passes and crosses contribute to more effective play and more shots on goal.
Abstract: In this paper copulas are used to generate novel bivariate discrete distributions. These distributions are fitted to soccer data from the English Premier League. An interesting aspect of these data is that the primary variable of interest, the discrete pair shots-for and shots-against, exhibits negative dependence; thus in particular we develop bivariate Poisson-related distributions that allow such dependence. The paper focuses on Archimedean copulas, for which the dependence structure is fully determined by a 1-dimensional projection that is invariant under marginal transformations. Diagnostic plots for copula fit based on this projection are adapted to deal with discrete variables. Covariates relating to within-match contributions such as numbers of passes and tackles are introduced to explain variability in shot outcomes. The results of this analysis would appear to support the notion that playing the “beautiful game” is an effective strategy—more passes and crosses contribute to more effective play and more shots on goal.
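
The construction behind such bivariate discrete distributions can be sketched directly: the joint pmf follows from rectangle differences of the copula evaluated at the marginal cdfs. The sketch below uses Poisson margins and a Frank copula (an Archimedean family that permits the negative dependence seen in shots-for/shots-against); the parameter values are illustrative, not fitted to the Premier League data.

```python
import numpy as np
from scipy.stats import poisson

def frank_copula(u, v, theta):
    """Frank copula C(u, v); theta < 0 gives negative dependence."""
    if theta == 0.0:
        return u * v
    num = (np.exp(-theta * u) - 1) * (np.exp(-theta * v) - 1)
    return -np.log(1 + num / (np.exp(-theta) - 1)) / theta

def bivariate_pmf(x, y, mu1, mu2, theta):
    """P(X = x, Y = y) via rectangle differences of the copula."""
    F = lambda k: poisson.cdf(k, mu1) if k >= 0 else 0.0
    G = lambda k: poisson.cdf(k, mu2) if k >= 0 else 0.0
    C = lambda a, b: frank_copula(a, b, theta)
    return (C(F(x), G(y)) - C(F(x - 1), G(y))
            - C(F(x), G(y - 1)) + C(F(x - 1), G(y - 1)))

# Poisson margins with negative dependence (theta < 0)
print(bivariate_pmf(2, 1, mu1=1.5, mu2=1.2, theta=-2.0))
```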

65 citations


Journal ArticleDOI
TL;DR: This paper introduces models of robustness in flight gate assignments at airports, presenting a non-robust flight gate assignment model and incorporating two approaches to robustness.
Abstract: This paper introduces models of robustness in flight gate assignments at airports. We briefly review the general flight gate assignment problem and the disruptions occurring in airline scheduling. Recovery strategies and robust scheduling are surveyed as the main methods in disruption management. We present a non-robust flight gate assignment model and incorporate two approaches to robustness.

37 citations


Journal ArticleDOI
TL;DR: A new class of stationary nonseparable covariance functions that can be used for both geometrically and zonally anisotropic data is proposed, and some desirable mathematical features of this class are shown.
Abstract: There is a great demand for statistical modelling of phenomena that evolve in both space and time, and thus, there is a growing literature on covariance function models for spatio-temporal processes. Although several nonseparable space–time covariance models are available in the literature, very few of them can be used for spatially anisotropic data. In this paper, we propose a new class of stationary nonseparable covariance functions that can be used for both geometrically and zonally anisotropic data. In addition, we show some desirable mathematical features of this class. Another important aspect, only partially covered by the literature, is that of spatial nonstationarity. We show a very simple criterion allowing for the construction of space–time covariance functions that are nonseparable, nonstationary in space and stationary in time. Part of the theoretical results proposed in the paper will then be used for the analysis of Irish wind speed data as in HASLETT and RAFTERY (Applied Statistics, 38, 1989, 1).
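
As a point of reference for what nonseparability means here, the sketch below evaluates a well-known nonseparable space-time covariance from Gneiting's class. It is not the class proposed in the paper, just an illustration of how spatial and temporal lags interact through a shared function psi.

```python
import numpy as np

def gneiting_cov(h, u, sigma2=1.0, a=1.0, c=1.0, beta=1.0):
    """C(h, u) = sigma2/psi(u) * exp(-c*||h||^2/psi(u)), psi(u) = (a*u^2 + 1)^beta."""
    psi = (a * u ** 2 + 1.0) ** beta
    return sigma2 / psi * np.exp(-c * np.linalg.norm(h) ** 2 / psi)

h = np.array([0.5, 0.5])          # spatial lag
print(gneiting_cov(h, u=0.0))     # purely spatial correlation
print(gneiting_cov(h, u=2.0))     # same spatial lag, nonzero time lag
```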

31 citations


Journal ArticleDOI
TL;DR: In this paper, several cumulative sum (CUSUM) charts for the mean of a multivariate time series are introduced, and conditions under which these charts are directionally invariant are analyzed.
Abstract: In this paper several cumulative sum (CUSUM) charts for the mean of a multivariate time series are introduced. We extend the control schemes for independent multivariate observations of Crosier [Technometrics (1988) Vol. 30, pp. 187–194], Pignatiello and Runger [Journal of Quality Technology (1990) Vol. 22, pp. 173–186], and Ngai and Zhang [Statistica Sinica (2001) Vol. 11, pp. 747–766] to multivariate time series by taking into account the probability structure of the underlying stochastic process. We consider modified charts and residual schemes as well. We analyze under which conditions these charts are directionally invariant. In an extensive Monte Carlo study these charts are compared with the CUSUM scheme of Theodossiou [Journal of the American Statistical Association (1993) Vol. 88, pp. 441–448], the multivariate exponentially weighted moving-average (EWMA) chart of Kramer and Schmid [Sequential Analysis (1997) Vol. 16, pp. 131–154], and the control procedures of Bodnar and Schmid [Frontiers of Statistical Process Control (2006) Physica, Heidelberg]. As a measure of performance, the maximum expected delay is used.
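
A minimal sketch of Crosier's multivariate CUSUM recursion for independent observations, the baseline scheme the paper extends to time series; the reference value k and control limit h below are illustrative and would in practice be calibrated to a target average run length.

```python
import numpy as np

def crosier_mcusum(X, mu, Sigma, k=0.5, h=5.0):
    """Return the index of the first out-of-control signal, or None."""
    Sinv = np.linalg.inv(Sigma)
    S = np.zeros(X.shape[1])
    for t, x in enumerate(X):
        d = S + x - mu
        C = np.sqrt(d @ Sinv @ d)
        S = np.zeros_like(S) if C <= k else d * (1 - k / C)  # shrink toward 0
        if np.sqrt(S @ Sinv @ S) > h:
            return t
    return None

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1, (50, 2)),    # in-control segment
               rng.normal(0.8, 1, (50, 2))])   # mean shift at t=50
print(crosier_mcusum(X, mu=np.zeros(2), Sigma=np.eye(2)))
```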

30 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of disconnecting a graph by removing as few vertices as possible, such that no component of the disconnected graph has more than a given number of vertices.
Abstract: In this paper, we consider the problem of disconnecting a graph by removing as few vertices as possible, such that no component of the disconnected graph has more than a given number of vertices. We give applications of this problem, present a formulation for it, and describe some polyhedral results. Furthermore, we establish ties with other polytopes and show how these relations can be used to obtain facets of our polytope. Finally, we give some computational results.
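
For intuition, the sketch below solves the problem by brute force on a toy graph: remove the fewest vertices so that every remaining component has at most b vertices. This stands in for the paper's polyhedral approach and is only feasible for very small instances, since the problem is NP-hard.

```python
from itertools import combinations

def components(vertices, edges):
    """Connected components of the subgraph induced by `vertices`."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        if u in adj and w in adj:
            adj[u].add(w); adj[w].add(u)
    seen, comps = set(), []
    for v in vertices:
        if v not in seen:
            stack, comp = [v], set()
            while stack:
                u = stack.pop()
                if u not in comp:
                    comp.add(u)
                    stack.extend(adj[u] - comp)
            seen |= comp
            comps.append(comp)
    return comps

def min_separator(vertices, edges, b):
    """Smallest vertex set whose removal leaves components of size <= b."""
    for r in range(len(vertices) + 1):
        for removed in combinations(vertices, r):
            rest = [v for v in vertices if v not in removed]
            if all(len(c) <= b for c in components(rest, edges)):
                return set(removed)

# a 6-cycle: removing two opposite vertices leaves two components of size 2
V = list(range(6))
E = [(i, (i + 1) % 6) for i in range(6)]
print(min_separator(V, E, b=2))
```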

29 citations


Journal ArticleDOI
TL;DR: In this article, a generalized jackknife estimator based on any two members of these two classes was proposed, and compared with the Hill estimator and other reduced-bias estimators available in the literature, asymptotically, and for finite samples, through the use of Monte Carlo simulation.
Abstract: In the context of regularly varying tails, we first analyze a generalization of the classical Hill estimator of a positive tail index, with members that are not asymptotically more efficient than the original one. This has led us to propose alternative classical tail index estimators, that may perform asymptotically better than the Hill estimator. As the improvement is not really significant, we also propose generalized jackknife estimators based on any two members of these two classes. These generalized jackknife estimators are compared with the Hill estimator and other reduced-bias estimators available in the literature, asymptotically, and for finite samples, through the use of Monte Carlo simulation. The finite-sample behaviour of the new reduced-bias estimators is also illustrated through a practical example in the field of finance.
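
A minimal sketch of the classical Hill estimator together with one simple generalized jackknife combination of two Hill estimators (at k and k/2). This combination cancels the dominant bias term under the illustrative assumption that the second-order parameter equals -1; it is not the paper's full class of estimators.

```python
import numpy as np

def hill(x, k):
    """Hill estimator of the tail index from the top k order statistics."""
    xs = np.sort(x)
    n = len(xs)
    return np.mean(np.log(xs[n - k:])) - np.log(xs[n - k - 1])

def jackknife_hill(x, k):
    """Generalized jackknife of Hill at k/2 and k (assumes rho = -1)."""
    return 2 * hill(x, k // 2) - hill(x, k)

rng = np.random.default_rng(2)
x = rng.pareto(2.0, 5000) + 1          # Pareto tail, true index gamma = 0.5
print(hill(x, 500), jackknife_hill(x, 500))
```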

26 citations


Journal ArticleDOI
TL;DR: The extreme value analysis for Archimedean copulas is generalized to the non-Archimedean case, and the constant qd describes the asymptotic dependence structure.
Abstract: We generalize the extreme value analysis for Archimedean copulas (see Alink, Löwe and Wüthrich, 2003) to the non-Archimedean case: Assume we have d≥2 exchangeable and continuously distributed risks X1,…,Xd. Under appropriate assumptions there is a constant qd such that, for all large u, we have P(X1 + … + Xd < -u) ≈ qd P(X1 < -u). The constant qd describes the asymptotic dependence structure. Typically, qd will depend on more aspects of this dependence structure than the well-known tail dependence coefficient.
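
A rough Monte Carlo illustration of the constant qd in the simplest case: for independent regularly varying risks, the single-big-jump principle gives qd = d. Dependence changes this constant, which is precisely what the paper quantifies.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 2_000_000
X = -(rng.pareto(2.0, (n, d)) + 1)      # independent heavy-tailed losses
for u in (10.0, 20.0, 40.0):
    ratio = np.mean(X.sum(axis=1) < -u) / np.mean(X[:, 0] < -u)
    print(u, round(ratio, 2))           # approaches q_d = d = 3 as u grows
```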

23 citations


Journal ArticleDOI
TL;DR: In this paper, an alternative distribution-free estimator based on nearest-neighbour estimation with a non-constant smoothing field that is better able to adapt to spatially varying features of the data pattern is presented.
Abstract: Variogram estimation plays an important role in many areas of spatial statistics. Potential areas of application include biology, ecology, economics and meteorology. However, it is common that, for example under highly correlated patterns, traditional estimators cannot reflect all the spatial features or dependencies. In this paper, we present an alternative distribution-free estimator based on nearest-neighbour estimation with a non-constant smoothing field that is better able to adapt to spatially varying features of the data pattern. We present a simulation study to compare our new estimator to a nearest-neighbour estimator built with a constant smoothing parameter and to the classical variogram estimator. We apply our method to analyze two ecological data sets.
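
For comparison, the sketch below implements the classical (Matheron) variogram estimator that serves as one of the paper's benchmarks; the distance bins and the simulated data are assumptions of this illustration.

```python
import numpy as np

def classical_variogram(coords, z, bins):
    """Matheron estimator: gamma(h) = mean of (z_i - z_j)^2 / 2 per distance bin."""
    diff2, dist = [], []
    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):
            dist.append(np.linalg.norm(coords[i] - coords[j]))
            diff2.append((z[i] - z[j]) ** 2)
    dist, diff2 = np.array(dist), np.array(diff2)
    idx = np.digitize(dist, bins)
    return np.array([0.5 * diff2[idx == b].mean() if np.any(idx == b) else np.nan
                     for b in range(1, len(bins))])

rng = np.random.default_rng(4)
coords = rng.uniform(0, 10, (200, 2))
z = np.sin(coords[:, 0]) + 0.1 * rng.normal(size=200)   # spatially structured field
print(classical_variogram(coords, z, bins=np.linspace(0, 5, 6)))
```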

21 citations


Journal ArticleDOI
TL;DR: The paper extends the framework presented in Brauner et al. for single machine, non-preemptive high multiplicity scheduling problems, to more general classes of problems.
Abstract: High multiplicity scheduling problems arise naturally in contemporary production settings where manufacturers combine economies of scale with high product variety. Despite their frequent occurrence in practice, the complexity of high multiplicity problems – as opposed to classical, single multiplicity problems – is in many cases not well understood. In this paper, we discuss various concepts and results that enable a better understanding of the nature and complexity of high multiplicity scheduling problems. The paper extends the framework presented in Brauner et al. [Journal of Combinatorial Optimization (2005) Vol. 9, pp. 313–323] for single machine, non-preemptive high multiplicity scheduling problems, to more general classes of problems.

20 citations


Journal ArticleDOI
TL;DR: The ring(2)-exponential, a new continuous distribution derived from the exponential, leads to two closely related new families of continuous distributions, the mirror-exponential and the ring-exponential, both of which contain the standard exponential and the ring(2)-exponential as special cases.
Abstract: Recent work on social status led to derivation of a new continuous distribution based on the exponential. The new variate, termed the ring(2)-exponential, in turn leads to derivation of two closely related new families of continuous distributions, the mirror-exponential and the ring-exponential. Both the standard exponential and the ring(2)-exponential are special cases of both the new families. In this paper, we first focus on the ring(2)-exponential, describing its derivation and examining its properties, and next introduce the two new families, describing their derivation and initiating exploration of their properties. The mirror-exponential arises naturally in the study of status; the ring-exponential arises from the mathematical structure of the ring(2)-exponential. Both have the potential for broad application in diverse contexts across science and engineering. Within sociobehavioral contexts, the new mirror-exponential may have application to the problem of approximating the form and inequality of the wage distribution.

Journal ArticleDOI
TL;DR: In this paper, a genetic algorithm for the partial constraint satisfaction problem is described, where the typical elements of a GA, selection, mutation and cross-over are filled in with combinatorial ideas.
Abstract: We describe a genetic algorithm for the partial constraint satisfaction problem. The typical elements of a genetic algorithm, selection, mutation and cross-over, are filled in with combinatorial ideas. For instance, cross-over of two solutions is performed by taking the one or two domain elements in the solutions of each of the variables as the complete domain of the variable. Then a branch-and-bound method is used for solving this small instance. When tested on a class of frequency assignment problems, this genetic algorithm produced the best known solutions for all test problems. This feeds the idea that combinatorial ideas may well be useful in genetic algorithms.
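
A minimal sketch of the cross-over idea described above: the parents' values form a tiny domain for each variable, and the best child within that restricted space is found exactly (plain exhaustive search stands in for branch-and-bound here). The toy penalty function counting violated adjacency constraints is an assumption of this illustration.

```python
from itertools import product

def crossover(parent1, parent2, penalty):
    """Solve the small instance whose domains are the parents' values."""
    domains = [sorted({a, b}) for a, b in zip(parent1, parent2)]
    return min(product(*domains), key=penalty)

# toy partial constraint satisfaction: penalize equal adjacent values
def penalty(sol):
    return sum(sol[i] == sol[i + 1] for i in range(len(sol) - 1))

p1, p2 = (1, 1, 2, 2), (2, 1, 1, 2)
child = crossover(p1, p2, penalty)
print(child, penalty(child))   # child is at least as good as either parent
```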

Journal ArticleDOI
TL;DR: An exact non-iterative sampling algorithm is proposed to obtain independently and identically distributed samples from the posterior distribution in discrete MDPs, completely avoiding the problems of convergence and slow convergence in iterative algorithms such as Markov chain Monte Carlo.
Abstract: Many statistical problems can be formulated as discrete missing data problems (MDPs). Examples include change-point problems, capture and recapture models, sample surveys with non-response, zero-inflated Poisson models, medical screening/diagnostic tests and bioassay. This paper proposes an exact non-iterative sampling algorithm to obtain independently and identically distributed (i.i.d.) samples from the posterior distribution in discrete MDPs. The new algorithm is essentially a conditional sampling scheme, and thus completely avoids the problems of convergence and slow convergence in iterative algorithms such as Markov chain Monte Carlo. Different from the general inverse Bayes formulae (IBF) sampler of Tan, Tian and Ng (Statistica Sinica, 13, 2003, 625), the implementation of the new algorithm requires neither the expectation-maximization nor the sampling importance resampling algorithm. The key idea is to first utilize the sampling-wise IBF to derive the conditional distribution of the missing data given the observed data, and then to draw i.i.d. samples from the complete-data posterior distribution. We first illustrate the method with an illustrative example and then apply it to contingency tables with one supplemental margin for a human immunodeficiency virus study.

Journal ArticleDOI
TL;DR: In this article, the minimum converter wavelength assignment problem is studied and three integer programming formulations are developed to minimize the number of converters and study their properties, where the first two formulations lack the power to provide non-trivial lower bounds, tight lower bounds can be computed by solving the linear relaxation of the third formulation by delayed column generation.
Abstract: With the introduction of optical switching technology in telecommunication networks, all-optical connections, so-called lightpaths, can be established. Lightpaths have to be assigned a wavelength in such a way that no two lightpaths sharing a fiber use the same wavelength. The wavelength of operation can only be exchanged by the deployment of a wavelength converter. In this paper, we study the minimum converter wavelength assignment problem. We develop three integer programming formulations to minimize the number of converters and study their properties. Where the first two formulations lack the power to provide non-trivial lower bounds, tight lower bounds can be computed by solving the linear relaxation of the third formulation by delayed column generation. In fact, the lower bound equals the best known solution value for all realistic instances at our disposal. In a computational study, we compare different strategies to enhance the column generation algorithm.

Journal ArticleDOI
TL;DR: In this article, a general model formulation for the valuation of limited callable mortgages, based on binomial trees, is introduced for determining both the optimal prepayment strategy and the value of embedded pre-payment options.
Abstract: Valuation of the prepayment option in dutch mortgages is complicated. In the netherlands, mortgagors are not allowed to prepay the full mortgage loan without a compensating penalty. Only a limited amount of the initial mortgage loan can be prepaid penalty-free. We introduce a general model formulation for the valuation of limited callable mortgages, based on binomial trees. This model can be used for determining both the optimal prepayment strategy and the value of embedded prepayment options. For some mortgage types the prepayment option can be valued exactly, whereas other types require approximative methods for efficient valuation. The heuristic we propose here determines the prepayment option value efficiently and accurately for general mortgage types.
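
The underlying machinery is backward induction on a binomial tree. The sketch below shows it for a standard American put rather than the paper's mortgage cash flows, where the penalty-free prepayment limit would instead enter through the exercise decision at each node.

```python
import numpy as np

def american_put(S0, K, r, sigma, T, n):
    """Price an American put by backward induction on a CRR binomial tree."""
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)          # risk-neutral up probability
    disc = np.exp(-r * dt)
    S = S0 * u ** np.arange(n, -1, -1) * d ** np.arange(0, n + 1)
    V = np.maximum(K - S, 0.0)                  # payoff at maturity
    for m in range(n - 1, -1, -1):
        S = S0 * u ** np.arange(m, -1, -1) * d ** np.arange(0, m + 1)
        cont = disc * (p * V[:-1] + (1 - p) * V[1:])   # continuation value
        V = np.maximum(K - S, cont)                    # early-exercise decision
    return V[0]

print(american_put(S0=100, K=100, r=0.05, sigma=0.2, T=1.0, n=500))
```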

Journal ArticleDOI
TL;DR: In this paper, two kinds of models are proposed that have similar multiplicative forms for the cumulative probabilities that an observation will fall in row (column) category i or below and column (row) category j (>i) or above.
Abstract: For square contingency tables with ordered categories, CAUSSINUS [Annales de la Faculte des Sciences de l'Universite de Toulouse (1965) Vol. 29, pp. 77–182] and AGRESTI [Statistics and Probability Letters (1983) Vol. 1, pp. 313–316] considered the quasi-symmetry and the linear diagonal-parameter symmetry models, respectively, which have multiplicative forms for cell probabilities. This paper proposes two kinds of models that have similar multiplicative forms for the cumulative probabilities that an observation will fall in row (column) category i or below and column (row) category j (>i) or above. The endometrial cancer data are analyzed using these models.

Journal ArticleDOI
TL;DR: In this paper, the authors used some adaptive designs in a two-treatment two-period crossover design where the treatment responses are binary and calculated the allocation proportions to the possible treatment combinations and their standard deviations.
Abstract: Adaptive designs are sometimes used in a phase III clinical trial with the aim of allocating a larger number of patients to the better treatment. In the present paper, we use some adaptive designs in a two-treatment two-period crossover design where the treatment responses are binary. We use some simple designs to choose between the possible treatment combinations AA, AB, BA or BB. The goal is to use the better treatment a larger proportion of times. We calculate the allocation proportions to the possible treatment combinations and their standard deviations. We also study related inferential problems. Related asymptotics are derived. The proposed procedure is compared with some possible competitors. Finally, we use real data to illustrate the applicability of our proposed design.
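
A minimal sketch of response-adaptive allocation in the same spirit: a randomized play-the-winner urn that skews allocation toward the better treatment. The crossover structure and inferential details of the paper are not modelled here, and the success probabilities are assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
p_success = {"A": 0.7, "B": 0.4}        # assumed true success probabilities
urn = ["A", "B"]                        # start with one ball per treatment
alloc = {"A": 0, "B": 0}

for _ in range(500):
    arm = rng.choice(urn)               # draw a ball to allocate the next patient
    alloc[arm] += 1
    success = rng.random() < p_success[arm]
    # a success adds a ball of the same arm, a failure one of the other arm
    urn.append(arm if success else ("B" if arm == "A" else "A"))

print(alloc)                            # most patients end up on treatment A
```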

Journal ArticleDOI
TL;DR: In this article, the impact of microaggregation on a linear model in continuous variables is examined; parameter estimates are biased if the dependent variable is used to form the groups, and a consistent estimator is developed to remove the aggregation bias.
Abstract: Microaggregation is one of the most frequently applied statistical disclosure control techniques for continuous data. The basic principle of microaggregation is to group the observations in a data set and to replace them by their corresponding group means. However, while reducing the disclosure risk of data files, the technique also affects the results of statistical analyses. The paper deals with the impact of microaggregation on a linear model in continuous variables. We show that parameter estimates are biased if the dependent variable is used to form the groups. Using this result, we develop a consistent estimator that removes the aggregation bias. Moreover, we derive the asymptotic covariance matrix of the corrected least squares estimator.
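
The core finding is easy to reproduce in a small simulation: microaggregating by sorting on the dependent variable and replacing groups of three observations by their means biases the OLS slope away from its true value (here beta = 1). Group size 3 is a common choice in practice; the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3000
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)         # true slope beta = 1

order = np.argsort(y)                    # groups formed on the dependent variable
xa = x[order].reshape(-1, 3).mean(axis=1)
ya = y[order].reshape(-1, 3).mean(axis=1)

slope = lambda a, b: np.polyfit(a, b, 1)[0]
print("original slope:       ", round(slope(x, y), 2))    # close to 1
print("microaggregated slope:", round(slope(xa, ya), 2))  # biased away from 1
```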

Journal ArticleDOI
TL;DR: In this article, the well-known shortest path algorithms that exploit heuristic estimates are put into one framework, and an interesting application of binary numbers in shortest path theory is presented.
Abstract: Shortest path problems occupy an important position in operations research as well as in artificial intelligence. In this paper we study shortest path algorithms that exploit heuristic estimates. The well-known algorithms are put into one framework. In addition, we present an interesting application of binary numbers in shortest path theory.
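
The best-known algorithm exploiting heuristic estimates is A*, sketched below on a small grid graph with the Manhattan-distance heuristic (both the graph and the heuristic are assumptions of this illustration, not the paper's framework itself).

```python
import heapq

def astar(adj, h, start, goal):
    """adj: node -> list of (neighbour, cost); h: admissible heuristic estimate."""
    pq = [(h(start), 0, start)]          # entries are (f = g + h, g, node)
    best = {start: 0}
    while pq:
        f, g, u = heapq.heappop(pq)
        if u == goal:
            return g
        if g > best.get(u, float("inf")):
            continue                     # stale queue entry
        for v, c in adj[u]:
            if g + c < best.get(v, float("inf")):
                best[v] = g + c
                heapq.heappush(pq, (g + c + h(v), g + c, v))
    return None

# 4x4 grid with unit step costs and Manhattan-distance heuristic
N = 4
adj = {(i, j): [((i + di, j + dj), 1)
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < N and 0 <= j + dj < N]
       for i in range(N) for j in range(N)}
h = lambda u: abs(u[0] - (N - 1)) + abs(u[1] - (N - 1))
print(astar(adj, h, (0, 0), (N - 1, N - 1)))   # shortest path length: 6
```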


Journal ArticleDOI
TL;DR: In this paper, a bootstrap scheme is proposed to generate critical values for the Breusch-Pagan (BP) statistic allowing robust inference under misspecification of the adjacency matrix.
Abstract: In the empirical analysis of panel data the Breusch–Pagan (BP) statistic has become a standard tool to infer on unobserved heterogeneity over the cross-section. Put differently, the test statistic is central to discriminate between the pooled regression and the random effects model. Conditional versions of the test statistic have been provided to immunize inference on unobserved heterogeneity against random time effects or patterns of spatial error correlation. Panel data models with spatially correlated error terms are typically set out under the presumption of some known adjacency matrix parameterizing the correlation structure up to a scaling factor. This paper delivers a bootstrap scheme to generate critical values for the BP statistic allowing robust inference under misspecification of the adjacency matrix. Moreover, asymptotic results are derived for the case of a finite cross-section and infinite time dimension. Finite sample simulations show that misspecification of spatial covariance features could lead to large size distortions, while the robust bootstrap procedure retains asymptotic validity.
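
A minimal sketch of the BP statistic itself, with a naive i.i.d. residual bootstrap of its null distribution. This is a simplified stand-in: the paper's scheme is specifically designed to remain valid under spatially correlated errors, which plain resampling ignores.

```python
import numpy as np

def bp_statistic(e, N, T):
    """Breusch-Pagan LM statistic for random effects from (stacked) residuals."""
    e = e.reshape(N, T)
    num = (e.sum(axis=1) ** 2).sum()
    den = (e ** 2).sum()
    return N * T / (2 * (T - 1)) * (num / den - 1) ** 2

rng = np.random.default_rng(6)
N, T = 50, 10
e = rng.normal(size=N * T)               # stand-in for pooled OLS residuals
stat = bp_statistic(e, N, T)

boot = np.array([bp_statistic(rng.choice(e, e.size, replace=True), N, T)
                 for _ in range(999)])   # naive bootstrap null distribution
print(stat, np.mean(boot >= stat))       # statistic and bootstrap p-value
```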

Journal ArticleDOI
TL;DR: In this article, the problem of procuring truthful responses to estimate the proportion of qualitative characteristics is considered; various techniques of generating randomized rather than direct responses are available in the literature, but the theory concerning them is generally developed with no attention to the required level of privacy protection.
Abstract: This paper considers the problem of procuring truthful responses to estimate the proportion of qualitative characteristics. In order to protect the respondent's privacy, various techniques of generating randomized response rather than direct response are available in the literature. But the theory concerning them is generally developed with no attention to the required level of privacy protection. Illustrating two randomization devices, we show how optimal randomized response designs may be achieved. The optimal designs of the forced response procedure as well as the BHARGAVA and SINGH [Statistica (2000) Vol. 6, pp. 315–321] procedure are shown to be special cases. In addition, the equivalent designs of the optimal WARNER [Journal of the American Statistical Association (1965) Vol. 60, pp. 63–69] procedure are considered as well. It is also shown that stratification with proportional allocation will be helpful for improving the estimation efficiency.
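
For reference, the sketch below simulates Warner's original randomized response design, one of the procedures whose optimal variants the paper derives: with probability p the respondent answers the sensitive question and otherwise its complement, and the sensitive proportion is recovered from the observed "yes" rate. All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
pi, p, n = 0.3, 0.7, 10_000             # true proportion, design probability, sample size
sensitive = rng.random(n) < pi          # true (unobserved) sensitive status
truthful = rng.random(n) < p            # which question the randomization device selects
yes = np.where(truthful, sensitive, ~sensitive)

lam = yes.mean()                        # P(yes) = p*pi + (1 - p)*(1 - pi)
pi_hat = (lam - (1 - p)) / (2 * p - 1)  # Warner's unbiased estimator
se = np.sqrt(lam * (1 - lam) / n) / abs(2 * p - 1)
print(f"pi_hat = {pi_hat:.3f} (se {se:.3f})")
```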

Journal ArticleDOI
TL;DR: In this paper, the authors proposed several confidence upper bounds for linear combinations based on Hoeffding-type inequalities and showed how they can be applied to the actual auditing problems.
Abstract: In auditing practice it often occurs that a statement regarding the accounting error in a population consisting of several subpopulations has to be made. As the relative proportion of errors can differ dramatically across these subpopulations, it is desirable to take independent fixed-size dollar-unit samples from each of them, as this often leads to lower variability compared with dollar-unit sampling from the whole population. It also occurs that the results of the separate investigations of, e.g. different branches of one company need to be combined to make a statement on the bookkeeping quality in general. The problem of estimating the total accounting error is thus related to the problem of estimating linear combinations of the mean values corresponding to several families of identically distributed independent random variables. In this article, we propose several confidence upper bounds for such linear combinations based on Hoeffding-type inequalities and show how they can be applied to the actual auditing problems. Simulation results comparing these modifications to the Hoeffding-based bounds for the one-sample case are also provided. It must be emphasized that the technique that we propose in this paper is fully justified from a mathematical point of view. Although the simulations show the proposed bounds to be highly conservative, they still present great interest, since we are not aware of any other method for estimation of the total accounting error in the multisample setting. Moreover, it is shown that significant improvements are hardly possible given the present conditions.
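
A minimal sketch of the textbook Hoeffding-type upper confidence bound for a linear combination of means of bounded samples, the starting point for the paper's refined multisample bounds; the weights, ranges and simulated "error fraction" data are assumptions.

```python
import numpy as np

def hoeffding_upper_bound(samples, weights, ranges, alpha=0.05):
    """(1 - alpha) upper bound for sum_k w_k * mean_k, with X_kj in [a_k, b_k]."""
    est = sum(w * np.mean(x) for w, x in zip(weights, samples))
    slack2 = sum(w ** 2 * (b - a) ** 2 / len(x)
                 for w, (a, b), x in zip(weights, ranges, samples))
    return est + np.sqrt(np.log(1 / alpha) * slack2 / 2)

rng = np.random.default_rng(8)
branches = [rng.beta(1, 9, 400), rng.beta(1, 4, 250)]   # error fractions per unit
print(hoeffding_upper_bound(branches, weights=[1.0, 1.0],
                            ranges=[(0, 1), (0, 1)]))
```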

Journal ArticleDOI
TL;DR: In this article, simple approximate relations between the distributions of the interarrival times and the total time of a stopped counting process, under the assumption of small intensities, were determined.
Abstract: We determine simple approximate relations between the distributions of the interarrival times and the total time of a stopped counting process, under the assumption of small intensities. These relations suggest applying recent results for nonparametric estimation of monotone and convex densities. The results are applied to estimating the distribution of the period of stay of migrating birds.

Journal ArticleDOI
TL;DR: In this article, the design for supporting the optimal decision on tree cutting in a Portuguese eucalyptus production forest is addressed, with the aim of maximizing the long-term yearly volume yield reduced by harvest costs.
Abstract: This paper addresses the design for supporting the optimal decision on tree cutting in a Portuguese eucalyptus production forest. Trees are usually cut at the biological rotation age, i.e. the age which maximizes the yearly volume production. Here we aim at maximizing the long-term yearly volume yield reduced by harvest costs. We consider different growth curves, with a known prior distribution, that can occur in each rotation. The optimal cutting time at each rotation depends both on the current growth curve and on the prior distribution. Different priors and strategies are compared with respect to the long-term production. Optimization of the cutting time allows an improvement of 16% of the long-term volume production. We conclude that the use of optimal designs can be beneficial for tree cutting in modern production forests.
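
A minimal sketch of the objective for a single known growth curve: choose the cutting age T that maximizes the long-run yearly volume yield net of a fixed harvest cost. The logistic growth curve and the cost value are assumptions; the paper's treatment of uncertain growth curves with a prior distribution is not modelled here.

```python
import numpy as np

V = lambda t: 300 / (1 + np.exp(-0.5 * (t - 10)))   # assumed stand volume at age t
c = 20.0                                            # assumed harvest cost (volume units)

ages = np.linspace(1, 30, 2901)
rate = (V(ages) - c) / ages                         # long-run net yield per year
T_opt = ages[np.argmax(rate)]
print(f"optimal cutting age: {T_opt:.1f} years, yield {rate.max():.1f} per year")
```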

Journal ArticleDOI
TL;DR: It is shown that concavity, and hence the buyers-are-substitutes condition, holds for the TU-game of the assignment problem with general capacities and that the VCG mechanism is supported by a pricing equilibrium which can also be achieved by an ascending price auction.
Abstract: For the allocation of heterogeneous items, it is known that the buyers-are-substitutes condition is necessary and sufficient to ensure that a pricing equilibrium can yield the same allocation and payments as the VCG mechanism. Furthermore, concavity of the corresponding transferable utility (TU) game guarantees that this VCG outcome can also be achieved by an ascending price auction. We show that concavity, and hence the buyers-are-substitutes condition, holds for the TU-game of the assignment problem with general capacities. Therefore, the VCG mechanism is supported by a pricing equilibrium which can also be achieved by an ascending auction. We also show that the buyers-are-substitutes condition, and hence concavity, does not hold anymore for very natural and straightforward extensions of this problem. This shows that the necessity of the substitutes property is a considerable restriction on the applicability of the VCG mechanism.
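
A minimal brute-force VCG computation for a tiny assignment instance (three unit-demand buyers, two items), illustrating the mechanism whose supporting price equilibria the paper studies; the valuations are assumptions of this illustration.

```python
from itertools import permutations

values = {"b1": {"x": 8, "y": 5}, "b2": {"x": 6, "y": 4}, "b3": {"x": 3, "y": 7}}
items = ["x", "y"]

def optimal(buyers):
    """Return (max welfare, welfare-maximizing allocation) by enumeration."""
    best, best_alloc = 0, {}
    k = min(len(buyers), len(items))
    for subset in permutations(buyers, k):
        for alloc in permutations(items, k):
            w = sum(values[b][i] for b, i in zip(subset, alloc))
            if w > best:
                best, best_alloc = w, dict(zip(subset, alloc))
    return best, best_alloc

buyers = list(values)
W, alloc = optimal(buyers)
for b, item in alloc.items():
    W_minus = optimal([o for o in buyers if o != b])[0]
    pay = W_minus - (W - values[b][item])   # VCG payment: externality on the others
    print(b, "gets", item, "and pays", pay)
```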

Journal ArticleDOI
TL;DR: A joint model consisting of a marginal mean model and a cluster-specific conditional mean model is considered, which outperforms the approach based on the normality assumption with respect to some important features of ‘between-cluster variation’.
Abstract: Generalized linear mixed models are widely used for analyzing clustered data. If the primary interest is in regression parameters, one can proceed alternatively, through the marginal mean model approach. In the present study, a joint model consisting of a marginal mean model and a cluster-specific conditional mean model is considered. This model is useful when both time-independent and time-dependent covariates are available. Furthermore, our model is semi-parametric, as we assume a flexible, smooth semi-nonparametric density of the cluster-specific effects. This semi-nonparametric density-based approach outperforms the approach based on the normality assumption with respect to some important features of ‘between-cluster variation’. We employ a full likelihood-based approach and apply the Monte Carlo EM algorithm to analyze the model. A simulation study is carried out to demonstrate the consistency of the approach. Finally, we apply this to a study of long-term illness data.