
Showing papers in "Journal of Statistical Software in 2005"


Journal ArticleDOI
TL;DR: This paper is a general description of spatstat and an introduction for new users.
Abstract: spatstat is a package for analyzing spatial point pattern data. Its functionality includes exploratory data analysis, model-fitting, and simulation. It is designed to handle realistic datasets, including inhomogeneous point patterns, spatial sampling regions of arbitrary shape, extra covariate data, and "marks" attached to the points of the point pattern. A unique feature of spatstat is its generic algorithm for fitting point process models to point pattern data. The interface to this algorithm is a function ppm that is strongly analogous to lm and glm. This paper is a general description of spatstat and an introduction for new users.

2,268 citations
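The lm/glm analogy above can be illustrated with a short R sketch. This is a minimal example assuming the classic three-argument ppm interface (pattern, trend formula, interaction); exact syntax has varied across spatstat versions, and cells is a point pattern dataset shipped with the package.

```r
library(spatstat)

# "cells" is a point pattern dataset distributed with spatstat
data(cells)
summary(cells)          # exploratory summary of the pattern

# Fit a homogeneous Poisson process, analogously to lm()/glm()
fit0 <- ppm(cells, ~1, Poisson())

# Fit an inhomogeneous Poisson process with log-linear intensity
# in the Cartesian coordinates x and y
fit1 <- ppm(cells, ~x + y, Poisson())
fit1
```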


Journal ArticleDOI
TL;DR: The R2WinBUGS package provides convenient functions to call WinBUGS from R and automatically writes the data and scripts in a format readable by WinBUGS for processing in batch mode, which is possible since version 1.4.
Abstract: The R2WinBUGS package provides convenient functions to call WinBUGS from R. It automatically writes the data and scripts in a format readable by WinBUGS for processing in batch mode, which is possible since version 1.4. After the WinBUGS process has finished, it is possible either to read the resulting data into R by the package itself—which gives a compact graphical summary of inference and convergence diagnostics—or to use the facilities of the coda package for further analyses of the output. Examples are given to demonstrate the usage of this package.

1,633 citations
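A hedged sketch of the workflow described above, built around the package's bugs() function. The model file name ("schools.txt"), the data objects y and sigma.y, and the parameter names are illustrative, and WinBUGS itself must be installed separately.

```r
library(R2WinBUGS)

# Illustrative data for a model defined in the file "schools.txt";
# the objects y and sigma.y are assumed to exist in the workspace
schools.data  <- list(J = 8, y = y, sigma.y = sigma.y)
schools.inits <- function() list(theta = rnorm(8, 0, 10))

sim <- bugs(schools.data, schools.inits,
            parameters.to.save = "theta",
            model.file = "schools.txt",
            n.chains = 3, n.iter = 1000)

print(sim)   # compact summary with convergence diagnostics
plot(sim)    # graphical summary of inference
# Alternatively, pass the output to the coda package for further analysis
```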


Journal ArticleDOI
TL;DR: An add-on package for the language and environment R which allows simultaneous fitting of several non-linear regression models. The focus is on analysis of dose-response curves, but the functionality is applicable to arbitrary non-linear regression models.
Abstract: We describe an add-on package for the language and environment R which allows simultaneous fitting of several non-linear regression models. The focus is on analysis of dose-response curves, but the functionality is applicable to arbitrary non-linear regression models. Features of the package are illustrated with examples.

1,439 citations
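A brief sketch of fitting a dose-response curve with this package. Here drm() and LL.4() are the fitting function and four-parameter log-logistic model of later versions of drc (the function name in the version described in the paper may differ), and ryegrass is a dataset distributed with the package.

```r
library(drc)

# Dose-response data shipped with the package: root length vs. concentration
data(ryegrass)

# Fit a four-parameter log-logistic model
fit <- drm(rootl ~ conc, data = ryegrass, fct = LL.4())

summary(fit)
plot(fit)   # fitted curve overlaid on the data
```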


Journal ArticleDOI
TL;DR: The R Commander as discussed by the authors is a graphical user interface for R. It uses a simple and familiar menu/dialog-box interface whose top-level menus include File, Edit, Data, Statistics, Graphs, Models, Distributions, Tools, and Help.
Abstract: Unlike S-PLUS, R does not incorporate a statistical graphical user interface (GUI), but it does include tools for building GUIs. Based on the tcltk package (which furnishes an interface to the Tcl/Tk GUI toolkit), the Rcmdr package provides a basic-statistics graphical user interface to R called the "R Commander." The design objectives of the R Commander were as follows: to support, through an easy-to-use, extensible, cross-platform GUI, the statistical functionality required for a basic-statistics course (though its current functionality has grown to include support for linear and generalized-linear models, and other more advanced features); to make it relatively difficult to do unreasonable things; and to render visible the relationship between choices made in the GUI and the R commands that they generate. The R Commander uses a simple and familiar menu/dialog-box interface. Top-level menus include File, Edit, Data, Statistics, Graphs, Models, Distributions, Tools, and Help, with the complete menu tree given in the paper. Each dialog box includes a Help button, which leads to a relevant help page. Menu and dialog-box selections generate R commands, which are recorded in a script window and are echoed, along with output, to an output window. The script window also provides the ability to edit, enter, and re-execute commands. Error messages, warnings, and some other information appear in a separate messages window. Data sets in the R Commander are simply R data frames, and can be read from attached packages or imported from files. Although several data frames may reside in memory, only one is "active" at any given time. There may also be an active statistical model (e.g., an R lm or glm object). The purpose of this paper is to introduce and describe the use of the R Commander GUI; to describe the design and development of the R Commander; and to explain how the R Commander GUI can be extended.
The second part of the paper (following a brief introduction) can serve as an introductory guide for students who will use the R Commander.

862 citations


Journal ArticleDOI
TL;DR: zoo, as discussed by the authors, is an R package providing an S3 class with methods for indexed totally ordered observations, such as discrete irregular time series; its key design goals are independence of a particular index/time/date class and consistency with base R and the "ts" class for regular time series.
Abstract: zoo is an R package providing an S3 class with methods for indexed totally ordered observations, such as discrete irregular time series. Its key design goals are independence of a particular index/time/date class and consistency with base R and the "ts" class for regular time series. This paper describes how these are achieved within zoo and provides several illustrations of the available methods for "zoo" objects which include plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling. A subclass "zooreg" embeds regular time series into the "zoo" framework and thus bridges the gap between regular and irregular time series classes in R.

750 citations
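The indexing, merging and NA-handling methods described above can be sketched as follows; the values and dates are arbitrary.

```r
library(zoo)

# Two irregular series indexed by Date
z1 <- zoo(c(1.2, 2.5, 3.1),
          as.Date(c("2005-01-03", "2005-01-05", "2005-01-10")))
z2 <- zoo(c(0.5, 1.5),
          as.Date(c("2005-01-05", "2005-01-10")))

m <- merge(z1, z2)   # union of the two indexes, NA where a series has no value
na.locf(m)           # one NA-handling method: last observation carried forward

# zooreg embeds a regular series (here quarterly) in the zoo framework
zr <- zooreg(1:4, start = 2005, frequency = 4)
```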


Journal ArticleDOI
TL;DR: The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules.
Abstract: Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.

470 citations
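A minimal sketch of the mining workflow. Adult is a transaction dataset distributed with arules (an assumption for the version at hand), and the support and confidence thresholds below are arbitrary.

```r
library(arules)

data(Adult)   # example transactions derived from the UCI Adult data
summary(Adult)

# Mine association rules with Borgelt's Apriori implementation
rules <- apriori(Adult,
                 parameter = list(support = 0.5, confidence = 0.9))

summary(rules)
inspect(rules[1:3])   # look at a few of the mined rules
```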



Journal ArticleDOI
TL;DR: This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS, where good mixing properties of the MCMC chains are obtained by using low-rank thin-plate splines, while simulation times per iteration are reduced employing WinBUGs specific computational tricks.
Abstract: Penalized splines can be viewed as BLUPs in a mixed model framework, which allows the use of mixed model software for smoothing. Thus, software originally developed for Bayesian analysis of mixed models can be used for penalized spline regression. Bayesian inference for nonparametric models enjoys the flexibility of nonparametric models and the exact inference provided by the Bayesian inferential machinery. This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS. Good mixing properties of the MCMC chains are obtained by using low-rank thin-plate splines, while simulation times per iteration are reduced employing WinBUGS specific computational tricks.

346 citations


Journal ArticleDOI
TL;DR: The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including methods for measuring proximity and obtaining consensus and "secondary" clusterings.
Abstract: Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including methods for measuring proximity and obtaining consensus and "secondary" clusterings.

275 citations


Journal ArticleDOI
TL;DR: The capabilities of the free software package BayesX for estimating regression models with structured additive predictors based on MCMC inference are described; the package extends the capabilities of existing software for semiparametric regression in S-PLUS, SAS, R or Stata.
Abstract: There has been much recent interest in Bayesian inference for generalized additive and related models. The increasing popularity of Bayesian methods for these and other model classes is mainly caused by the introduction of Markov chain Monte Carlo (MCMC) simulation techniques which allow realistic modeling of complex problems. This paper describes the capabilities of the free software package BayesX for estimating regression models with structured additive predictor based on MCMC inference. The program extends the capabilities of existing software for semiparametric regression included in S-PLUS, SAS, R or Stata. Many model classes well known from the literature are special cases of the models supported by BayesX. Examples are generalized additive (mixed) models, dynamic models, varying coefficient models, geoadditive models, geographically weighted regression and models for space-time regression. BayesX supports the most common distributions for the response variable. For univariate responses these are Gaussian, Binomial, Poisson, Gamma, negative Binomial, zero inflated Poisson and zero inflated negative binomial. For multicategorical responses, both multinomial logit and probit models for unordered categories of the response as well as cumulative threshold models for ordered categories can be estimated. Moreover, BayesX allows the estimation of complex continuous time survival and hazard rate models.

241 citations



Journal ArticleDOI
TL;DR: In this article, an R package called bivpois is presented for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models, and an Expectation-Maximization (EM) algorithm is implemented.
Abstract: In this paper we present an R package called bivpois for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models. An Expectation-Maximization (EM) algorithm is implemented. Inflated models allow for modelling both over-dispersion (or under-dispersion) and negative correlation and thus they are appropriate for a wide range of applications. Extensions of the algorithms for several other models are also discussed. Detailed guidance and an implementation on simulated and real data sets using the bivpois package are provided.

Journal ArticleDOI
TL;DR: In this article, a collection of functions that can be used to implement a robust analysis of a linear model based on weighted Wilcoxon (WW) estimations is presented, for instance, estimation, regression model, designed experiment, and autoregressive time series model for the sake of illustration.
Abstract: It is well-known that Wilcoxon procedures outperform least squares procedures when the data deviate from normality and/or contain outliers. These procedures can be generalized by introducing weights; yielding so-called weighted Wilcoxon (WW) techniques. In this paper we demonstrate how WW-estimates can be calculated using an L1 regression routine. More importantly, we present a collection of functions that can be used to implement a robust analysis of a linear model based on WW-estimates. For instance, estimation, tests of linear hypotheses, residual analyses, and diagnostics to detect differences in fits for various weighting schemes are discussed. We analyze a regression model, designed experiment, and autoregressive time series model for the sake of illustration. We have chosen to implement the suite of functions using the R statistical software package. Because R is freely available and runs on multiple platforms, WW-estimation and associated inference is now universally accessible.

Journal ArticleDOI
TL;DR: The EbayesThresh package in the S language implements a class of Empirical Bayes thresholding methods that can take advantage of possible sparsity in the sequence, to improve the quality of estimation.
Abstract: Suppose that a sequence of unknown parameters is observed subject to independent Gaussian noise. The EbayesThresh package in the S language implements a class of Empirical Bayes thresholding methods that can take advantage of possible sparsity in the sequence, to improve the quality of estimation. The prior for each parameter in the sequence is a mixture of an atom of probability at zero and a heavy-tailed density. Within the package, this can be either a Laplace (double exponential) density or else a mixture of normal distributions with tail behavior similar to the Cauchy distribution. The mixing weight, or sparsity parameter, is chosen automatically by marginal maximum likelihood. If estimation is carried out using the posterior median, this is a random thresholding procedure; the estimation can also be carried out using other thresholding rules with the same threshold, and the package provides the posterior mean, and hard and soft thresholding, as additional options. This paper reviews the method, and gives details (far beyond those previously published) of the calculations needed for implementing the procedures. It explains and motivates both the general methodology, and the use of the EbayesThresh package, through simulated and real data examples. When estimating the wavelet transform of an unknown function, it is appropriate to apply the method level by level to the transform of the observed data. The package can carry out these calculations for wavelet transforms obtained using various packages in R and S-PLUS. Details, including a motivating example, are presented, and the application of the method to image estimation is also explored. The final topic considered is the estimation of a single sequence that may become progressively sparser along the sequence. An iterated least squares isotone regression method allows for the choice of a threshold that depends monotonically on the order in which the observations are made.
An alternative possibility, also discussed in detail, is a particular parametric dependence of the sparsity parameter on the position in the sequence.
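The core of the method can be sketched in a few lines; ebayesthresh() is the package's main function, and the sparse signal below is simulated.

```r
library(EbayesThresh)

# A sparse sequence observed with unit-variance Gaussian noise
set.seed(1)
theta <- c(rep(0, 90), rnorm(10, mean = 5))   # mostly zeros, a few signals
x <- theta + rnorm(100)

# Empirical Bayes thresholding: Laplace prior, posterior-median rule,
# sparsity parameter chosen by marginal maximum likelihood
muhat <- ebayesthresh(x, prior = "laplace", sdev = 1)

# Most of the pure-noise coordinates are thresholded exactly to zero
sum(muhat == 0)
```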

Journal ArticleDOI
TL;DR: The normalp package is presented, a package for the statistical environment R that has a set of tools for dealing with the exponential power distribution and methods concerning the estimation of the distribution parameters are described and implemented.
Abstract: In this paper we present the normalp package, a package for the statistical environment R that has a set of tools for dealing with the exponential power distribution. In this package there are functions to compute the density function, the distribution function and the quantiles from an exponential power distribution and to generate pseudo-random numbers from the same distribution. Moreover, methods concerning the estimation of the distribution parameters are described and implemented. It is also possible to estimate linear regression models when we assume the random errors distributed according to an exponential power distribution. A set of functions is designed to perform simulation studies to see the suitability of the estimators used. Some examples of use of this package are provided.
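The package's d/p/q/r functions follow the usual R naming convention. A hedged sketch (paramp() is assumed to be the package's function for estimating location, scale and shape from a sample):

```r
library(normalp)

# Density, distribution function, quantile and random generation for an
# exponential power distribution with shape parameter p
dnormp(0, p = 1.5)
pnormp(1, p = 1.5)
qnormp(0.975, p = 1.5)

set.seed(1)
x <- rnormp(200, mu = 0, sigmap = 1, p = 1.5)

# Estimate the distribution parameters from the sample
paramp(x)
```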

Journal ArticleDOI
TL;DR: The R add-on package BradleyTerry as discussed by the authors facilitates the specification and fitting of Bradley-Terry logit models to pair-comparison data, including the standard "unstructured" model, structured versions in which the parameters are related through a linear predictor to explanatory variables, and the possibility of an order or "home advantage" effect.
Abstract: This paper describes the R add-on package BradleyTerry, which facilitates the specification and fitting of Bradley-Terry logit models to pair-comparison data. Included are the standard "unstructured" Bradley-Terry model, structured versions in which the parameters are related through a linear predictor to explanatory variables, and the possibility of an order or "home advantage" effect. Model fitting is either by maximum likelihood or by bias-reduced maximum likelihood in which the first-order asymptotic bias of parameter estimates is eliminated. Also provided are a simple and efficient approach to handling missing covariate data, and suitably-defined residuals for diagnostic checking of the linear predictor; these are new methodological contributions which will be discussed in greater detail elsewhere.

Journal ArticleDOI
TL;DR: The BACCO bundle of R routines for carrying out Bayesian analysis of computer code output is introduced, which is self-contained and fully documented R code, and includes a toy dataset that furnishes a working example of the functions.
Abstract: This paper introduces the BACCO bundle of R routines for carrying out Bayesian analysis of computer code output. The bundle comprises packages emulator and calibrator, computerized implementations of the ideas of Oakley and O’Hagan (2002) and Kennedy and O’Hagan (2001a) respectively. The bundle is self-contained and fully documented R code, and includes a toy dataset that furnishes a working example of the functions. Package emulator carries out Bayesian emulation of computer code output; package calibrator allows the incorporation of observational data into model calibration using Bayesian techniques. The package is then applied to a dataset taken from climate science.

Journal ArticleDOI
TL;DR: MNP is a publicly available R package that fits the Bayesian multinomial probit model via Markov chain Monte Carlo through the efficient marginal data augmentation algorithm developed by Imai and van Dyk (2005).
Abstract: MNP is a publicly available R package that fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP software can also fit the model with different choice sets for each individual, and complete or partial individual choice orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm that is developed by Imai and van Dyk (2005).
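A hedged sketch of fitting the model; the data frame and covariates here (survey, choice, income, age) are hypothetical.

```r
library(MNP)

# Fit a Bayesian multinomial probit model by MCMC; "choice" is the
# discrete response, with individual-level covariates on the right
fit <- mnp(choice ~ income + age, data = survey,
           n.draws = 5000, verbose = TRUE)

summary(fit)   # posterior summaries of the coefficients
```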

Journal ArticleDOI
TL;DR: It is shown how the gRbase building blocks can be combined and integrated with inference engines in the special cases of hierarchical loglinear models and how to extend the package to deal with other types of graphical models, in this case the graphical Gaussian models.
Abstract: The gRbase package is intended to set the framework for computer packages for data analysis using graphical models. The gRbase package is developed for the open source language, R, and is available for several platforms. The package is intended to be widely extendible and flexible so that package developers may implement further types of graphical models using the available methods. The gRbase package consists of a set of S version 3 classes and associated methods for representing data and models. The package is linked to the dynamicGraph package (Badsberg 2005), an interactive graphical user interface for manipulating graphs. In this paper, we show how these building blocks can be combined and integrated with inference engines in the special cases of hierarchical loglinear models. We also illustrate how to extend the package to deal with other types of graphical models, in this case the graphical Gaussian models.


Journal ArticleDOI
TL;DR: WhatIf is an R package that implements the methods for evaluating counterfactuals introduced in King and Zeng (2006a), and can be used to approximate the common support of the treatment and control groups in causal inference.
Abstract: WhatIf is an R package that implements the methods for evaluating counterfactuals introduced in King and Zeng (2006a) and King and Zeng (2006b). It offers easy-to-use techniques for assessing a counterfactual's model dependence without having to conduct sensitivity testing over specified classes of models. These same methods can be used to approximate the common support of the treatment and control groups in causal inference.

Journal ArticleDOI
TL;DR: In this article, an S-PLUS function is proposed for calculating the noniterative, closed-form first order jackknife estimator of species richness, along with some associated plots and statistics.
Abstract: An estimate of the number of species, S, usually called species richness by ecologists, in an area is one of the basic statistics used to ascertain biological diversity. Traditionally ecologists have used the number of species observed in a sample, S0, to estimate S, realizing that S0 is a lower bound for S. One alternative to S0 is to use a nonparametric procedure such as jackknife resampling. For species richness, a closed form of the jackknife estimator is available. Typically statistical software contains only the traditional iterative form of the jackknife estimator. The purpose of this article is to propose an S-PLUS function for calculating the noniterative first order jackknife estimator of species richness and some associated plots and statistics.
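The closed form referred to above is S0 + f1(n - 1)/n, where S0 is the observed richness and f1 the number of species occurring in exactly one of the n sampling units. A base-R (equally valid S-PLUS) sketch, with an illustrative function name:

```r
# First order jackknife estimator of species richness (closed form)
jack1 <- function(X) {
  # X: sites-by-species incidence matrix (0/1)
  n  <- nrow(X)                 # number of sampling units
  S0 <- sum(colSums(X) > 0)     # observed species richness
  f1 <- sum(colSums(X) == 1)    # species seen in exactly one unit
  S0 + f1 * (n - 1) / n
}

X <- rbind(c(1, 1, 0, 0),
           c(1, 0, 1, 0),
           c(1, 0, 0, 1))
jack1(X)   # 4 + 3 * (2/3) = 6
```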

Journal ArticleDOI
TL;DR: It is shown that the short period of the uniform random number generator in the published implementation of Marsaglia and Tsang's Ziggurat method for generating random deviates can lead to poor distributions.
Abstract: We show that the short period of the uniform random number generator in the published implementation of Marsaglia and Tsang's Ziggurat method for generating random deviates can lead to poor distributions. Changing the uniform random number generator used in its implementation fixes this issue.

Journal ArticleDOI
TL;DR: This link allows R, S-PLUS and Excel to call the functions in the lp_solve system and allows Excel users to handle substantially larger problems at no extra cost.
Abstract: We present a link that allows R, S-PLUS and Excel to call the functions in the lp_solve system. lp_solve is free software (licensed under the GNU Lesser GPL) that solves linear and mixed integer linear programs of moderate size (on the order of 10,000 variables and 50,000 constraints). R does not include this ability (though two add-on packages offer linear programs without integer variables), while S-PLUS users need to pay extra for the NuOPT library in order to solve these problems. Our link manages the interface between these statistical packages and lp_solve. Excel has a built-in add-in named Solver that is capable of solving mixed integer programs, but only with fewer than 200 variables. This link allows Excel users to handle substantially larger problems at no extra cost. While our primary concern has been the Windows operating system, the package has been tested on some Unix-type systems as well.
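On the R side, this kind of link is exposed through an lp() function (as in the lpSolve package); a small sketch with an arbitrary two-variable problem:

```r
library(lpSolve)

# Maximize 3x + 2y subject to x + y <= 4 and x + 3y <= 6, with x, y >= 0
res <- lp(direction    = "max",
          objective.in = c(3, 2),
          const.mat    = rbind(c(1, 1),
                               c(1, 3)),
          const.dir    = c("<=", "<="),
          const.rhs    = c(4, 6))

res$objval     # optimal objective value (12, attained at x = 4, y = 0)
res$solution   # the optimizing values of x and y

# Adding int.vec = 1:2 would constrain both variables to be integers,
# turning this into a mixed integer program
```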

Journal ArticleDOI
TL;DR: In this paper, the authors present a way to adjust or tamper with the classical test to make it test for independence as well as identical distribution, using monkey tests similar to those in the Diehard Battery of Tests of Randomness (Marsaglia 1995).
Abstract: The familiar Σ(OBS − EXP)²/EXP goodness-of-fit measure is commonly used to test whether an observed sequence came from the realization of n independent identically distributed (iid) discrete random variables. It can be quite effective for testing for identical distribution, but is not suited for assessing independence, as it pays no attention to the order in which output values are received. This note reviews a way to adjust or tamper with (that is, monkey with) the classical test to make it test for independence as well as identical distribution; in short, to test for both i's in iid, using monkey tests similar to those in the Diehard Battery of Tests of Randomness (Marsaglia 1995).
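The classical statistic from the abstract is straightforward to compute in base R; the simulated digits below stand in for any claimed-iid discrete sequence.

```r
# Sum((OBS - EXP)^2 / EXP) for a sequence of supposedly iid uniform digits
set.seed(2)
x <- sample(0:9, 1000, replace = TRUE)

OBS <- tabulate(x + 1, nbins = 10)   # observed counts of each digit
EXP <- rep(1000 / 10, 10)            # expected counts under uniformity

chisq <- sum((OBS - EXP)^2 / EXP)
pchisq(chisq, df = 9, lower.tail = FALSE)   # classical p-value

# A "monkey" variant counts overlapping k-letter words in the sequence
# instead of single symbols, so the statistic also reacts to the order of
# the output and hence to departures from independence.
```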

Journal ArticleDOI
TL;DR: This work analyzes and discusses how generic software for producing biplot graphs should be designed, describing a data structure appropriate for the biplot description and the algorithm(s) to be used for several biplot types.
Abstract: We analyze and discuss how generic software for producing biplot graphs should be designed. We describe a data structure appropriate for the biplot description and we specify the algorithm(s) to be used for several biplot types. We discuss the options the software should offer to the user in two different environments. In a highly interactive environment the user should be able to specify many graphical options and also to change them using the usual interactive tools. The resulting graph needs to be available in several formats, including high quality format for printing. In a web-based environment, the user submits a data file or listing together with some options specified either in a file or using a form. Then the graphic is sent back to the user in one of several possible formats according to the specifications. We review some of the already available software and we present an implementation based on XLISP-STAT. It can be run under Unix or Windows, and it is also part of a service that provides biplot graphs through the web.

Journal ArticleDOI
TL;DR: A graphical user interface is described, programmed by the author, which facilitates the specification of a wide class of generalised linear mixed models for analysis using WinBUGS, a Bayesian software package truly useable by the average data analyst.
Abstract: The absence of user-friendly software has long been a major obstacle to the routine application of Bayesian methods in business and industry. It will only be through widespread application of the Bayesian approach to real problems that issues, such as the use of prior distributions, can be practically resolved in the same way that the choice of significance levels has been in the classical approach; although most Bayesians would hope for a much more satisfactory resolution. It is only relatively recently that any general purpose Bayesian software has been available; by far the most widely used such package is WinBUGS. Although this software has been designed to enable an extremely wide variety of models to be coded relatively easily, it is unlikely that many will bother to learn the language and its nuances unless they are already highly motivated to try Bayesian methods. This paper describes a graphical user interface, programmed by the author, which facilitates the specification of a wide class of generalised linear mixed models for analysis using WinBUGS. The program, BugsXLA (v2.1), is an Excel Add-In that not only allows the user to specify a model as one would in a package such as SAS or S-PLUS, but also aids the specification of priors and control of the MCMC run itself. Inevitably, developing a program such as this forces one to think again about such issues as choice of default priors, parameterisation and assessing convergence. I have tried to adopt currently perceived good practices, but mainly share my approach so that others can apply it and, through constructive criticism, play a small part in the ultimate development of the first Bayesian software package truly useable by the average data analyst.

Journal ArticleDOI
TL;DR: SimReg as described in this paper is software for multiple comparison of several regression models; predictors can be constrained to their natural boundaries, if known, which results in narrower confidence bands than in the unconstrained case.
Abstract: The problem of simultaneous inference and multiple comparison for comparing means of k( ≥ 3) populations has long been studied in the statistics literature, and methods for it are widely available in statistical software. To date, however, the problem of multiple comparison of regression models has not found its way into software. It is only recently that the computational aspects of this problem have been resolved in a general setting. SimReg employs this new methodology and provides users with software for multiple comparison of several regression models. The comparisons can be among any set of pairs, and moreover any number of predictors can be included in the model. More importantly, predictors can be constrained to their natural boundaries, if known. Computational methods for the problem of simultaneous confidence bands when predictors are constrained to intervals have also recently been addressed. SimReg utilizes this recent development to offer simultaneous confidence bands for regression models with any number of predictor variables. Again, the predictors can be constrained to their natural boundaries, which results in narrower bands, as compared to the case where no restriction is imposed. A by-product of these confidence bands is a new method for comparing two regression surfaces that is more informative than the usual partial F test.

Journal ArticleDOI
TL;DR: In 1998 the UCLA Department of Statistics decided to switch to S/R; this paper discusses why, and what the pros and cons were.
Abstract: In 1998 the UCLA Department of Statistics, which had been one of the major users of Lisp-Stat, and one of the main producers of Lisp-Stat code, decided to switch to S/R. This paper discusses why this decision was made, and what the pros and the cons were.

Journal ArticleDOI
TL;DR: A customized SAS program is described which accomplishes an analysis on survey data with jackknifed replicate weights for which the primary sampling unit information has been suppressed for respondent confidentiality.
Abstract: Packaged statistical software for analyzing categorical, repeated measures marginal models on sample survey data with binary covariates does not appear to be available. Consequently, this report describes a customized SAS program which accomplishes such an analysis on survey data with jackknifed replicate weights for which the primary sampling unit information has been suppressed for respondent confidentiality. First, the program employs the Macro Language and the Output Delivery System (ODS) to estimate the means and covariances of indicator variables for the response variables, taking the design into account. Then, it uses PROC CATMOD and ODS, ignoring the survey design, to obtain the design matrix and hypothesis test specifications. Finally, it enters these results into another run of CATMOD, which performs automated direct input of the survey design specifications and accomplishes the appropriate analysis. This customized SAS program can be employed, with minor editing, to analyze general categorical, repeated measures marginal models on sample surveys with replicate weights. Finally, the results of our analysis accounting for the survey design are compared to the results of two alternate analyses of the same data. This comparison confirms that such alternate analyses, which do not properly account for the design, do not produce useful results.