Showing papers in "Journal of Statistical Software in 2015"

Fitting Linear Mixed-Effects Models Using lme4

TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms, and the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.

Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represent such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.

50,607 citations

Journal Article•DOI•

fitdistrplus: An R Package for Fitting Distributions

[...]

Marie Laure Delignette-Muller, Christophe Dutang

TL;DR: fitdistrplus provides functions for fitting univariate distributions to different types of data (continuous censored or non-censored data and discrete data) and allows different estimation methods (maximum likelihood, moment matching, quantile matching and maximum goodness-of-fit estimation).

Abstract: The package fitdistrplus provides functions for fitting univariate distributions to different types of data (continuous censored or non-censored data and discrete data) and allows different estimation methods (maximum likelihood, moment matching, quantile matching and maximum goodness-of-fit estimation). Outputs of fitdist and fitdistcens functions are S3 objects, for which specific methods are provided, including summary, plot and quantile. This package also provides various functions to compare the fit of several distributions to the same data set and can bootstrap parameter estimates. Detailed examples are given in food risk assessment, ecotoxicology and insurance contexts.

1,433 citations

Journal Article•DOI•

Bayesian Spatial Modelling with R-INLA

[...]

Finn Lindgren¹, Håvard Rue²•Institutions (2)

Engineering and Physical Sciences Research Council¹, Norwegian University of Science and Technology²

TL;DR: The principles behind the interface to continuous domain spatial models in the R-INLA software package for R are described; the integrated nested Laplace approximation (INLA) approach proposed by Rue, Martino, and Chopin (2009) is a computationally effective alternative to MCMC for Bayesian inference.

Abstract: The principles behind the interface to continuous domain spatial models in the R-INLA software package for R are described. The integrated nested Laplace approximation (INLA) approach proposed by Rue, Martino, and Chopin (2009) is a computationally effective alternative to MCMC for Bayesian inference. INLA is designed for latent Gaussian models, a very wide and flexible class of models ranging from (generalized) linear mixed to spatial and spatio-temporal models. Combined with the stochastic partial differential equation approach (SPDE, Lindgren, Rue, and Lindstrom 2011), one can accommodate all kinds of geographically referenced data, including areal and geostatistical ones, as well as spatial point process data. The implementation interface covers stationary spatial models, non-stationary spatial models, and also spatio-temporal models, and is applicable in epidemiology, ecology, environmental risk assessment, as well as general geostatistics.

829 citations

Journal Article•DOI•

Comparing Implementations of Estimation Methods for Spatial Econometrics

[...]

Roger Bivand, Gianfranco Piras

TL;DR: This review provides an up-to-date comparison of the generalized method of moments and maximum likelihood implementations now available, using the cross-sectional US county data set provided by Drukker, Prucha, and Raciborski (2013d).

Abstract: Recent advances in the implementation of spatial econometrics model estimation techniques have made it desirable to compare results, which should correspond between implementations across software applications for the same data. These model estimation techniques are associated with methods for estimating impacts (emanating effects), which are also presented and compared. This review constitutes an up-to-date comparison of generalized method of moments and maximum likelihood implementations now available. The comparison uses the cross-sectional US county data set provided by Drukker, Prucha, and Raciborski (2013d). The comparisons are cast in the context of alternatives using the MATLAB Spatial Econometrics toolbox, Stata's user-written sppack commands, Python with PySAL, and R packages including spdep, sphet and McSpatial.

828 citations

Journal Article•DOI•

A Toolbox for Nonlinear Regression in R: The Package nlstools

[...]

Florent Baty, Christian Ritz¹, Sandrine Charles, Martin Brutsche, Jean-Pierre Flandrois, Marie Laure Delignette-Muller•Institutions (1)

University of Copenhagen¹

27 Aug 2015-Journal of Statistical Software

TL;DR: A unified diagnostic framework with the R package nlstools is introduced and the various features of the package are presented and exemplified using a worked example from pulmonary medicine.

Abstract: Nonlinear regression models are applied in a broad variety of scientific fields. Various R functions are already dedicated to fitting such models, among which the function nls() has a prominent position. Unlike linear regression, fitting nonlinear models relies on non-trivial assumptions, and therefore users are required to carefully ensure and validate the entire modeling. Parameter estimation is carried out using some variant of the least-squares criterion involving an iterative process that ideally leads to the determination of the optimal parameter estimates. Therefore, users need to have a clear understanding of the model and its parameterization in the context of the application and data considered, an a priori idea about plausible values for parameter estimates, knowledge of model diagnostics procedures available for checking crucial assumptions, and, finally, an understanding of the limitations in the validity of the underlying hypotheses of the fitted model and its implication for the precision of parameter estimates. Current nonlinear regression modules lack dedicated diagnostic functionality, so there is a need to provide users with an extended toolbox of functions enabling a careful evaluation of nonlinear regression fits. To this end, we introduce a unified diagnostic framework with the R package nlstools. In this paper, the various features of the package are presented and exemplified using a worked example from pulmonary medicine.
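
As a rough illustration of the iterative least-squares process described above (R's nls() uses a Gauss-Newton algorithm by default), here is a minimal Python sketch. It is not nlstools code: the exponential model y = a·exp(-b·x), the function name and the starting values are assumptions chosen for illustration. It also returns the residual standard error, one of the basic quantities a diagnostic toolbox inspects.

```python
import math


def gauss_newton_exp(x, y, a, b, max_iter=50, tol=1e-10):
    """Fit the assumed model y = a * exp(-b * x) by Gauss-Newton least squares.

    Returns (a, b, residual standard error).
    """
    for _ in range(max_iter):
        e = [math.exp(-b * xi) for xi in x]
        r = [yi - a * ei for yi, ei in zip(y, e)]        # residuals
        ja = e                                           # d f / d a
        jb = [-a * xi * ei for xi, ei in zip(x, e)]      # d f / d b
        # Normal equations (J^T J) delta = J^T r, solved by Cramer's rule
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * ri for u, ri in zip(ja, r))
        gb = sum(v * ri for v, ri in zip(jb, r))
        det = saa * sbb - sab * sab
        da = (sbb * ga - sab * gb) / det
        db = (saa * gb - sab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    rss = sum((yi - a * math.exp(-b * xi)) ** 2 for xi, yi in zip(x, y))
    # Residual standard error on n - p degrees of freedom (p = 2 parameters)
    return a, b, math.sqrt(rss / (len(x) - 2))


# Noise-free synthetic data generated with a = 5, b = 0.3
xs = list(range(10))
ys = [5 * math.exp(-0.3 * xi) for xi in xs]
a_hat, b_hat, rse = gauss_newton_exp(xs, ys, a=4.8, b=0.28)
```

With poor starting values Gauss-Newton can diverge, which is exactly why the abstract stresses having an a priori idea of plausible parameter values.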

491 citations

Journal Article•DOI•

GWmodel: An R Package for Exploring Spatial Heterogeneity Using Geographically Weighted Models

[...]

Isabella Gollini, Binbin Lu¹, Martin Charlton², Chris Brunsdon², Paul Harris³•Institutions (3)

Wuhan University of Technology¹, Maynooth University², Rothamsted Research³

TL;DR: Geographically weighted (GW) models use a moving window weighting technique, where localized models are found at target locations, and outputs are mapped to provide a useful exploratory tool into the nature of the spatial heterogeneity of the data.

Abstract: Spatial statistics is a growing discipline providing important analytical techniques in a wide range of disciplines in the natural and social sciences. In the R package GWmodel we present techniques from a particular branch of spatial statistics, termed geographically weighted (GW) models. GW models suit situations when data are not described well by some global model, but where there are spatial regions where a suitably localized calibration provides a better description. The approach uses a moving window weighting technique, where localized models are found at target locations. Outputs are mapped to provide a useful exploratory tool into the nature of the spatial heterogeneity of the data. Currently, GWmodel includes functions for GW summary statistics, GW principal components analysis, GW regression, and GW discriminant analysis, some of which are provided in basic and robust forms.

376 citations

Journal Article•DOI•

TSclust: An R Package for Time Series Clustering

[...]

Pablo Montero, José A. Vilar

01 Jan 2015-Journal of Statistical Software

TL;DR: The R package TSclust aims to implement a large set of well-established, peer-reviewed time series dissimilarity measures, including measures based on raw data, extracted features, underlying parametric models, complexity levels, and forecast behaviors.

Abstract: Time series clustering is an active research area with applications in a wide range of fields. One key component in cluster analysis is determining a proper dissimilarity measure between two data objects, and many criteria have been proposed in the literature to assess dissimilarity between two time series. The R package TSclust aims to implement a large set of well-established, peer-reviewed time series dissimilarity measures, including measures based on raw data, extracted features, underlying parametric models, complexity levels, and forecast behaviors. Computation of these measures allows the user to perform clustering by using conventional clustering algorithms. TSclust also includes a clustering procedure based on p values from checking the equality of generating models, and some utilities to evaluate cluster solutions. The implemented dissimilarity functions are accessible individually for easier extension and possible use outside the clustering context. The main features of TSclust are described and examples of its use are presented.
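
One simple example of a raw-data dissimilarity is the correlation-based distance d(x, y) = sqrt(2(1 - ρ)), where ρ is the Pearson correlation of the two series. The Python sketch below computes it and builds a pairwise matrix of the kind one would hand to a conventional clustering algorithm. The function names are hypothetical, and whether TSclust uses exactly this form should be checked in its documentation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def diss_cor(x, y):
    """Correlation-based dissimilarity sqrt(2 * (1 - rho)), in [0, 2]."""
    rho = pearson(x, y)
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))  # clamp tiny negative rounding


def dissimilarity_matrix(series):
    """Pairwise dissimilarities, ready for a conventional clustering routine."""
    return [[diss_cor(s, t) for t in series] for s in series]


up = [1.0, 2.0, 3.0, 4.0]
up_shifted = [11.0, 12.0, 13.0, 14.0]   # same shape, different level
down = [4.0, 3.0, 2.0, 1.0]             # opposite shape
D = dissimilarity_matrix([up, up_shifted, down])
```

Because it depends only on correlation, this measure treats series with the same shape but different levels as identical, which is a design choice to weigh against level-sensitive measures.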

362 citations

Journal Article•DOI•

Fitting Heavy Tailed Distributions: The poweRlaw Package

[...]

Colin S. Gillespie

TL;DR: The poweRlaw R package is described, which makes fitting power laws and other heavy-tailed distributions straightforward and provides a principled approach to power law fitting.

Abstract: Over the last few years, the power law distribution has been used as the data generating mechanism in many disparate fields. However, at times the techniques used to fit the power law distribution have been inappropriate. This paper describes the poweRlaw R package, which makes fitting power laws and other heavy-tailed distributions straightforward. This package contains R functions for fitting, comparing and visualizing heavy-tailed distributions. Overall, it provides a principled approach to power law fitting.

320 citations

Journal Article•DOI•

Structured Additive Regression Models: An R Interface to BayesX

[...]

Nikolaus Umlauf¹, Daniel Adler, Thomas Kneib, Stefan Lang, Achim Zeileis•Institutions (1)

University of Innsbruck¹

TL;DR: A new fully interactive R interface to BayesX is presented: the R package R2BayesX, which complements the already impressive capabilities for semiparametric regression in R with a comprehensive toolbox comprising, in particular, more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.

Abstract: Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well-established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R’s formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R with a comprehensive toolbox comprising, in particular, more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.

262 citations

Journal Article•DOI•

ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data

[...]

Nicholas A. James¹, David S. Matteson¹•Institutions (1)

Cornell University¹

21 Jan 2015-Journal of Statistical Software

TL;DR: The ecp package is designed to perform multiple change point analysis while making as few assumptions as possible, and is suitable for both univariate and multivariate observations.

Abstract: There are many different ways in which change point analysis can be performed, from purely parametric methods to those that are distribution free. The ecp package is designed to perform multiple change point analysis while making as few assumptions as possible. While many other change point methods are applicable only for univariate data, this R package is suitable for both univariate and multivariate observations. Hierarchical estimation can be based upon either a divisive or agglomerative algorithm. Divisive estimation sequentially identifies change points via a bisection algorithm. The agglomerative algorithm estimates change point locations by determining an optimal segmentation. Both approaches are able to detect any type of distributional change within the data. This provides an advantage over many existing change point algorithms which are only able to detect changes within the marginal distributions.
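
The divisive approach rests on a divergence between the distributions on either side of a candidate split. The Python sketch below, which is not the ecp API, computes an empirical energy-statistic divergence for univariate samples (exponent alpha assumed in (0, 2), scaled by nm/(n+m); the scaling is an assumption) and picks the single split that maximizes it. It omits the significance test and the recursive bisection that would locate further change points.

```python
def energy_q(x, y, alpha=1.0):
    """Scaled empirical energy divergence between samples x and y (univariate).

    Requires len(x) >= 2 and len(y) >= 2; alpha should lie in (0, 2).
    """
    n, m = len(x), len(y)
    between = 2.0 * sum(abs(a - b) ** alpha for a in x for b in y) / (n * m)
    within_x = sum(abs(x[i] - x[j]) ** alpha
                   for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)
    within_y = sum(abs(y[i] - y[j]) ** alpha
                   for i in range(m) for j in range(i + 1, m)) / (m * (m - 1) / 2)
    return (n * m / (n + m)) * (between - within_x - within_y)


def best_single_split(z, min_size=2, alpha=1.0):
    """Index tau maximizing the divergence between z[:tau] and z[tau:]."""
    candidates = range(min_size, len(z) - min_size + 1)
    return max(candidates, key=lambda tau: energy_q(z[:tau], z[tau:], alpha))


series = [0.0] * 10 + [5.0] * 10   # one distributional shift after index 10
tau = best_single_split(series)
```

Because the statistic compares whole distributions rather than means, the same search also reacts to changes in spread or shape.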

259 citations

Journal Article•DOI•

nparcomp: An R Software Package for Nonparametric Multiple Comparisons and Simultaneous Confidence Intervals

[...]

Frank Konietschke, Marius Placzek¹, Frank Schaarschmidt, Ludwig A. Hothorn•Institutions (1)

University of Göttingen¹

TL;DR: A new R package nparcomp is introduced, which provides easy and user-friendly access to rank-based methods for the analysis of unbalanced one-way layouts, including procedures for multiple comparisons and simultaneous confidence intervals for the estimated effects that can be easily visualized.

Abstract: One-way layouts, i.e., a single factor with several levels and multiple observations at each level, frequently arise in various fields. Usually not only a global hypothesis is of interest but also multiple comparisons between the different treatment levels. In most practical situations, the distribution of observed data is unknown and there may exist a number of atypical measurements and outliers. Hence, use of parametric and semiparametric procedures that impose restrictive distributional assumptions on observed samples becomes questionable. This, in turn, emphasizes the demand for statistical procedures that enable us to accurately and reliably analyze one-way layouts with minimal conditions on available data. Nonparametric methods offer such a possibility and thus become of particular practical importance. In this article, we introduce a new R package nparcomp which provides easy and user-friendly access to rank-based methods for the analysis of unbalanced one-way layouts. It provides procedures performing multiple comparisons and computing simultaneous confidence intervals for the estimated effects which can be easily visualized. The special case of two samples, the nonparametric Behrens-Fisher problem, is included. We illustrate the implemented procedures by examples from biology and medicine.

Journal Article•DOI•

entropart: An R Package to Measure and Partition Diversity

[...]

Eric Marcon, Bruno Hérault¹•Institutions (1)

Centre de coopération internationale en recherche agronomique pour le développement¹

TL;DR: entropart is a package for R designed to estimate diversity based on HCDT entropy or similarity-based entropy; it allows calculating species-neutral, phylogenetic and functional entropy and diversity, partitioning them and correcting them for estimation bias.

Abstract: entropart is a package for R designed to estimate diversity based on HCDT entropy or similarity-based entropy. It allows calculating species-neutral, phylogenetic and functional entropy and diversity, partitioning them and correcting them for estimation bias.

Journal Article•DOI•

kml and kml3d: R Packages to Cluster Longitudinal Data

[...]

Christophe Genolini, Xavier Alacoque, Mariane Sentenac, Catherine Arnaud

01 Jun 2015-Journal of Statistical Software

TL;DR: kml and kml3d are R packages providing an implementation of k-means designed to work specifically on trajectories (kml) or on joint trajectories (kml3d), and they offer graphic facilities to “visualize” the trajectories, either in 2D or in 3D (joint trajectories).

Abstract: Longitudinal studies are essential tools in medical research. In these studies, variables are not restricted to single measurements but can be seen as variable-trajectories, either single or joint. Thus, an important question concerns the identification of homogeneous patient trajectories. kml and kml3d are R packages providing an implementation of k-means designed to work specifically on trajectories (kml) or on joint trajectories (kml3d). They provide various tools to work on longitudinal data: imputation methods for trajectories (nine classic and one original), methods to define starting conditions in k-means (four classic and three original) and quality criteria to choose the best number of clusters (four classic and one original). In addition, they offer graphic facilities to “visualize” the trajectories, either in 2D (single trajectory) or 3D (joint-trajectories). The 3D graph representing the mean joint-trajectories of each cluster can be exported through LaTeX in a 3D dynamic rotating PDF graph (Figures 1 and 9).
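
At its core, the approach applies k-means to whole trajectories. A minimal Python sketch of that idea follows, assuming complete trajectories measured at common time points, Euclidean distance and user-supplied starting centers. It omits kml's imputation methods, starting-condition strategies and quality criteria, and the function name is hypothetical.

```python
import math


def kmeans_trajectories(trajs, centers, max_iter=100):
    """Lloyd's k-means where each individual is a trajectory (a list of
    measurements at the same time points), compared by Euclidean distance."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every trajectory
        new_labels = [min(range(len(centers)), key=lambda k: math.dist(t, centers[k]))
                      for t in trajs]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each center becomes the mean trajectory of its cluster
        for k in range(len(centers)):
            members = [t for t, lab in zip(trajs, labels) if lab == k]
            if members:  # keep the previous center if a cluster empties
                centers[k] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centers


trajs = [[1, 2, 3], [1, 2, 4], [10, 9, 8], [11, 9, 7]]
labels, centers = kmeans_trajectories(trajs, centers=[[0, 0, 0], [10, 10, 10]])
```

The cluster centers are themselves mean trajectories, which is what makes plotting one representative curve per cluster natural.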

Journal Article•DOI•

Analysis, Simulation and Prediction of Multivariate Random Fields with Package RandomFields

[...]

Martin Schlather, Alexander Malinowski, Peter J. Menck, Marco Oesting, Kirstin Strokorb

10 Feb 2015-Journal of Statistical Software

TL;DR: The R package RandomFields supports simulation, parameter estimation and prediction, in particular for the linear model of coregionalization, the multivariate Matérn models, the delay model, and a spectrum of physically motivated vector-valued models.

Abstract: Modeling of and inference on multivariate data that have been measured in space, such as temperature and pressure, are challenging tasks in environmental sciences, physics and materials science. We give an overview of, and some background on, modeling with cross-covariance models. The R package RandomFields supports simulation, parameter estimation and prediction, in particular for the linear model of coregionalization, the multivariate Matérn models, the delay model, and a spectrum of physically motivated vector-valued models. An example on weather data is considered, illustrating the use of RandomFields for parameter estimation and prediction.
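
The linear model of coregionalization builds a multivariate field by mixing independent univariate fields, Z(s) = A W(s), so that the cross-covariance is C_ij(h) = Σ_k A_ik A_jk ρ_k(h). The Python sketch below simulates such a field by Cholesky factorization. It does not use RandomFields' model syntax; the exponential correlation, the mixing matrix and the ranges are illustrative assumptions.

```python
import math
import random


def exp_corr(h, scale):
    """Exponential correlation function rho(h) = exp(-|h| / scale)."""
    return math.exp(-abs(h) / scale)


def cholesky(m):
    """Lower-triangular L with L L^T = m for a symmetric positive definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def lmc_cross_cov(h, A, scales, i, j):
    """C_ij(h) = sum_k A[i][k] * A[j][k] * rho_k(h) under the LMC."""
    return sum(A[i][k] * A[j][k] * exp_corr(h, s) for k, s in enumerate(scales))


def simulate_lmc(locs, A, scales, rng):
    """Simulate Z(s) = A W(s) at 1-D locations, where the components of W are
    independent zero-mean, unit-variance Gaussian processes."""
    n = len(locs)
    latent = []
    for scale in scales:
        L = cholesky([[exp_corr(s - t, scale) for t in locs] for s in locs])
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        latent.append([sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n)])
    # Mix the latent processes: Z_j(s) = sum_k A[j][k] * W_k(s)
    return [[sum(row[k] * latent[k][i] for k in range(len(scales))) for i in range(n)]
            for row in A]


A = [[1.0, 0.0], [0.5, 1.0]]   # hypothetical mixing matrix
scales = [1.0, 2.0]            # hypothetical correlation ranges
z = simulate_lmc([0.0, 1.0, 2.0, 3.0], A, scales, random.Random(0))
```

Any mixing matrix yields a valid (positive semidefinite) cross-covariance, which is the main appeal of this construction.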

Journal Article•DOI•

Parametric and Nonparametric Sequential Change Detection in R: The cpm Package

[...]

Gordon J. Ross¹•Institutions (1)

Imperial College London¹

27 Aug 2015-Journal of Statistical Software

TL;DR: The R package cpm is described, which provides a fast implementation of all the above change point models in both batch (Phase I) and sequential (Phase II) settings, where the sequences may contain either a single or multiple change points.

Abstract: The change point model framework introduced in Hawkins, Qiu, and Kang (2003) and Hawkins and Zamba (2005a) provides an effective and computationally efficient method for detecting multiple mean or variance change points in sequences of Gaussian random variables, when no prior information is available regarding the parameters of the distribution in the various segments. It has since been extended in various ways by Hawkins and Deng (2010), Ross, Tasoulis, and Adams (2011), and Ross and Adams (2012) to allow for fully nonparametric change detection in non-Gaussian sequences, when no knowledge is available regarding even the distributional form of the sequence. Another extension comes from Ross and Adams (2011) and Ross (2014), which allow change detection in streams of Bernoulli and exponential random variables, respectively, again when the values of the parameters are unknown. This paper describes the R package cpm, which provides a fast implementation of all the above change point models in both batch (Phase I) and sequential (Phase II) settings, where the sequences may contain either a single or multiple change points.
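
In the batch (Phase I) setting, a change point model evaluates a two-sample statistic at every candidate split and takes the maximum. A minimal Python sketch for a single Gaussian mean shift, using the pooled two-sample t statistic, is below. It is not the cpm package's code, the function names are hypothetical, and the decision thresholds and sequential (Phase II) updating that cpm supplies are not reproduced.

```python
import math


def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic for a mean shift between x and y."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s2 = ss / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(s2 * (1.0 / n1 + 1.0 / n2))


def max_t_statistic(z, min_seg=2):
    """Phase I style statistic: the maximum over split points k of |t| comparing
    z[:k] with z[k:]. Returns (k, D)."""
    best_k, best_d = None, -1.0
    for k in range(min_seg, len(z) - min_seg + 1):
        d = abs(two_sample_t(z[:k], z[k:]))
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d


z = [0, 1, 0, 1, 0, 1, 5, 6, 5, 6, 5, 6]
k, d = max_t_statistic(z)
```

A change would be declared only if D exceeds a threshold calibrated for the sequence length; that calibration is the part a dedicated package handles.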

Journal Article•DOI•

Exploring Diallelic Genetic Markers: The HardyWeinberg Package

[...]

Jan Graffelman

TL;DR: The HardyWeinberg package offers the classical tests for equilibrium, functions for power computation and for the simulation of marker data under equilibrium and disequilibrium, and various graphical tools for exploring the equilibrium status of a large set of diallelic markers.

Abstract: Testing genetic markers for Hardy-Weinberg equilibrium is an important issue in genetic association studies. The HardyWeinberg package offers the classical tests for equilibrium, functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Functions for testing equilibrium in the presence of missing data by using multiple imputation are provided. The package also supplies various graphical tools such as ternary plots with acceptance regions, log-ratio plots and Q-Q plots for exploring the equilibrium status of a large set of diallelic markers. Classical tests for equilibrium and graphical representations for diallelic marker data are reviewed. Several data sets illustrate the use of the package.

Journal Article•DOI•

frbs: Fuzzy Rule-Based Systems for Classification and Regression in R

[...]

Lala Septem Riza¹, Christoph Bergmeir¹, Francisco Herrera¹, José Manuel Benítez¹•Institutions (1)

University of Granada¹

01 Jun 2015-Journal of Statistical Software

TL;DR: This paper presents the R package frbs, which implements the most widely used FRBS models, namely, Mamdani and Takagi Sugeno Kang (TSK) ones, as well as some common variants.

Abstract: Fuzzy rule-based systems (FRBSs) are a well-known method family within soft computing. They are based on fuzzy concepts to address complex real-world problems. We present the R package frbs, which implements the most widely used FRBS models, namely, Mamdani and Takagi Sugeno Kang (TSK) ones, as well as some common variants. In addition, a host of learning methods for FRBSs, where the models are constructed from data, is implemented. In this way, accurate and interpretable systems can be built for data analysis and modeling tasks. In this paper, we also provide some examples on the usage of the package and a comparison with other common classification and regression methods available in R.
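
For readers unfamiliar with the Mamdani model named above, here is a minimal hand-written Python sketch of one inference step: min implication, max aggregation and centroid defuzzification over a discretized output range. The two rules and all fuzzy sets are hypothetical; frbs learns rule bases from data and may use other operators, none of which is shown here.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets: input "temperature", output "fan speed" (0 to 100)
TEMP = {"low": (-20, 0, 30), "high": (10, 40, 70)}
SPEED = {"slow": (-50, 0, 50), "fast": (50, 100, 150)}
RULES = [("low", "slow"), ("high", "fast")]   # IF temp is A THEN speed is B


def mamdani(temp, step=1):
    """Mamdani inference: min implication, max aggregation, centroid output."""
    strengths = [(tri(temp, *TEMP[a]), b) for a, b in RULES]
    num = den = 0.0
    for y in range(0, 101, step):
        # Clip each consequent at its rule strength, then take the union (max)
        mu = max(min(w, tri(y, *SPEED[b])) for w, b in strengths)
        num += y * mu
        den += mu
    return num / den
```

The rules stay human-readable ("IF temperature is high THEN fan speed is fast"), which is the interpretability the abstract refers to.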

Journal Article•DOI•

Bayesian Model Averaging Employing Fixed and Flexible Priors: The BMS Package for R

[...]

Stefan Zeugner, Martin Feldkircher

24 Nov 2015-Journal of Statistical Software

TL;DR: The BMS (Bayesian model sampling) package for R that implements Bayesian model averaging for linear regression models excels in allowing for a variety of prior structures, among them the "binomial-beta" prior on the model space and the so-called "hyper-g" specifications for Zellner's g prior.

Abstract: This article describes the BMS (Bayesian model sampling) package for R that implements Bayesian model averaging for linear regression models. The package excels in allowing for a variety of prior structures, among them the "binomial-beta" prior on the model space and the so-called "hyper-g" specifications for Zellner's g prior. Furthermore, the BMS package allows the user to specify her own model priors and offers a possibility of subjective inference by setting "prior inclusion probabilities" according to the researcher's beliefs. In addition, graphical analysis of results is provided by numerous built-in plot functions for posterior densities, predictive densities and graphical illustrations that compare results under different prior settings. Finally, the package provides full enumeration of the model space for small scale problems as well as two efficient MCMC (Markov chain Monte Carlo) samplers that sort through the model space when the number of potential covariates is large.
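
A minimal Python sketch of Bayesian model averaging by full enumeration follows. It assumes Zellner's g prior with a log marginal likelihood of (n-1-k)/2·log(1+g) - (n-1)/2·log(1+g(1-R²)) up to a constant shared by all models, and a uniform prior over models. The R² values are made-up inputs and the function names are hypothetical, not the BMS interface; the binomial-beta model prior and hyper-g settings described above would replace the uniform weights and the fixed g.

```python
import math


def log_marglik(r2, k, n, g):
    """Log marginal likelihood, up to a constant shared by all models, of a
    linear model with k regressors and coefficient of determination r2 under
    Zellner's g prior."""
    return 0.5 * (n - 1 - k) * math.log(1 + g) - 0.5 * (n - 1) * math.log(1 + g * (1 - r2))


def posterior_model_probs(models, n, g):
    """models maps a frozenset of regressor names to its OLS R^2.
    Assumes a uniform prior over models."""
    logs = {m: log_marglik(r2, len(m), n, g) for m, r2 in models.items()}
    top = max(logs.values())                        # stabilize the exponentials
    w = {m: math.exp(v - top) for m, v in logs.items()}
    total = sum(w.values())
    return {m: v / total for m, v in w.items()}


def inclusion_probs(post, names):
    """Posterior inclusion probability: total mass of models containing x."""
    return {x: sum(p for m, p in post.items() if x in m) for x in names}


# Hypothetical R^2 values for full enumeration over two candidate regressors
models = {frozenset(): 0.0, frozenset({"x1"}): 0.50,
          frozenset({"x2"}): 0.10, frozenset({"x1", "x2"}): 0.52}
n = 50
post = posterior_model_probs(models, n=n, g=n)   # g = n: unit information prior
pip = inclusion_probs(post, ["x1", "x2"])
```

Full enumeration visits 2^K models for K candidate regressors, which is why MCMC samplers take over when K is large.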

Journal Article•DOI•

Spatial Data Analysis with R-INLA with Some Extensions

[...]

Roger Bivand, Virgilio Gómez-Rubio, Håvard Rue

TL;DR: This paper shows how to fit a number of spatial models with R-INLA, including its interaction with other R packages for data analysis, and describes a novel method to extend the number of latent models available for the model parameters.

Abstract: The integrated nested Laplace approximation (INLA) provides an interesting way of approximating the posterior marginals of a wide range of Bayesian hierarchical models. This approximation is based on conducting a Laplace approximation of certain functions, and numerical integration is extensively used to integrate some of the model parameters out. The R-INLA package offers an interface to INLA, providing a suitable framework for data analysis. Although the INLA methodology can deal with a large number of models, only the most relevant have been implemented within R-INLA, and many other important models are not yet available. In this paper we show how to fit a number of spatial models with R-INLA, including its interaction with other R packages for data analysis. Secondly, we describe a novel method to extend the number of latent models available for the model parameters. Our approach is based on conditioning on one or several model parameters and fitting these conditioned models with R-INLA. Then these models are combined using Bayesian model averaging to provide the final approximations to the posterior marginals of the model. Finally, we show some examples of the application of this technique in spatial statistics. It is worth noting that our approach can be extended to a number of other fields, not only spatial statistics.

Journal Article•DOI•

DiceDesign and DiceEval: Two R Packages for Design and Analysis of Computer Experiments

[...]

Delphine Dupuy¹, Céline Helbert, Jessica Franco²•Institutions (2)

Schneider Electric¹, Total S.A.²

21 Jun 2015-Journal of Statistical Software

TL;DR: This paper introduces two R packages available on the Comprehensive R Archive Network, DiceDesign and DiceEval, dedicated to numerical design of experiments and to the fit, validation and comparison of metamodels.

Abstract: This paper introduces two R packages available on the Comprehensive R Archive Network. The main application concerns the study of computer code output. Package DiceDesign is dedicated to numerical design of experiments, from the construction to the study of the design properties. Package DiceEval deals with the fit, the validation and the comparison of metamodels. After a brief presentation of the context, we focus on the architecture of these two packages. A two-dimensional test function serves as a running example to illustrate the main functionalities of these packages, and an industrial case study in five dimensions is also detailed.
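
Design construction and the study of design properties can be illustrated together. The Python sketch below, which does not use DiceDesign's functions, builds random Latin hypercube designs on [0, 1]^d and scores them with the maximin criterion (the smallest pairwise distance, larger being better), keeping the best of several random draws. The best-of-N search is an assumption made for brevity; dedicated design optimizers are more sophisticated.

```python
import math
import random


def lhs(n, d, rng):
    """Random Latin hypercube design: n points in [0, 1]^d, with exactly one
    point in each of the n equal-width slices of every coordinate."""
    cols = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        cols.append([(p + rng.random()) / n for p in perm])
    return [list(pt) for pt in zip(*cols)]


def mindist(design):
    """Maximin criterion: smallest pairwise Euclidean distance (larger is better)."""
    return min(math.dist(a, b) for i, a in enumerate(design) for b in design[i + 1:])


def best_of_random_lhs(n, d, tries, seed=0):
    """Crude space-filling search: keep the random LHS with the largest mindist."""
    rng = random.Random(seed)
    return max((lhs(n, d, rng) for _ in range(tries)), key=mindist)


design = best_of_random_lhs(n=10, d=2, tries=50)
```

The Latin hypercube property guarantees good one-dimensional projections, while the maximin score guards against points clustering in the full space.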

Journal Article•DOI•

spBayes for Large Univariate and Multivariate Point-Referenced Spatio-Temporal Data Models

[...]

Andrew O. Finley, Sudipto Banerjee¹, Alan E. Gelfand²•Institutions (2)

University of California, Los Angeles¹, Duke University²

13 Feb 2015-Journal of Statistical Software

TL;DR: The core functions of the spBayes R package have been reformulated and rewritten to improve computational efficiency, flexibility, and usability for point-referenced data models. The changes improve sampler convergence rate and efficiency by reducing the parameter space, decrease sampler run-time by avoiding expensive matrix computations, and increase scalability to large datasets by implementing a class of predictive process models that attempt to overcome computational hurdles by representing spatial processes in terms of lower-dimensional realizations.

Abstract: In this paper we detail the reformulation and rewrite of core functions in the spBayes R package. These efforts have focused on improving computational efficiency, flexibility, and usability for point-referenced data models. Attention is given to algorithm and computing developments that result in improved sampler convergence rate and efficiency by reducing the parameter space; decreased sampler run-time by avoiding expensive matrix computations; and increased scalability to large datasets by implementing a class of predictive process models that attempt to overcome computational hurdles by representing spatial processes in terms of lower-dimensional realizations. Beyond these general computational improvements for existing model functions, we detail new functions for modeling data indexed in both space and time. These new functions implement a class of dynamic spatio-temporal models for settings where space is viewed as continuous and time is taken as discrete.
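
The predictive process idea projects the parent spatial process w(s) onto a small set of knots, w̃(s) = c(s)ᵀ C*⁻¹ w*, so that work on a large n × n covariance matrix is replaced by work on the knot-by-knot matrix C*. The Python sketch below assumes an exponential covariance in one dimension and uses hypothetical function names rather than spBayes' interface. It computes the predictive-process variance, which equals the parent variance at the knots and is smaller between them.

```python
import math


def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance sigma2 * exp(-phi * |a - b|) for 1-D locations."""
    return sigma2 * math.exp(-phi * abs(a - b))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def pp_variance(s, knots, sigma2=1.0, phi=1.0):
    """Predictive-process variance at s: c(s)^T C*^{-1} c(s), where C* is the
    covariance among the knots and c(s) the covariance between s and the knots."""
    c_star = [[exp_cov(a, b, sigma2, phi) for b in knots] for a in knots]
    c = [exp_cov(s, k, sigma2, phi) for k in knots]
    return sum(ci * wi for ci, wi in zip(c, solve(c_star, c)))


knots = [0.0, 1.0, 2.0, 3.0]
at_knot = pp_variance(1.0, knots)
between_knots = pp_variance(1.5, knots)
```

The variance shortfall away from the knots is a known property of the plain predictive process; this sketch does not reproduce any correction spBayes may apply.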

Journal Article•DOI•

GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs

[...]

Blake MacDonald¹, Pritam Ranjan¹, Hugh A. Chipman•Institutions (1)

Indian Institute of Management Indore¹

08 Apr 2015, Journal of Statistical Software

TL;DR: This paper implements a slightly modified version of the model proposed by Ranjan et al. (2011) in the R package GPfit. A novel parameterization of the spatial correlation function and a clustering-based, multi-start, gradient-based optimization algorithm yield robust optimization that is typically faster than the genetic-algorithm-based approach.


Abstract: Gaussian process (GP) models are commonly used statistical metamodels for emulating expensive computer simulators. Fitting a GP model can be numerically unstable if any pair of design points in the input space is close together. Ranjan, Haynes, and Karsten (2011) proposed a computationally stable approach for fitting GP models to deterministic computer simulators. They used a genetic-algorithm-based approach that is robust but computationally intensive for maximizing the likelihood. This paper implements a slightly modified version of the model proposed by Ranjan et al. (2011) in the R package GPfit. A novel parameterization of the spatial correlation function and a clustering-based multi-start gradient-based optimization algorithm yield robust optimization that is typically faster than the genetic-algorithm-based approach. We present two examples with R code to illustrate the usage of the main functions in GPfit. Several test functions are used for performance comparison with the popular R package mlegp. We also use GPfit for a real application, namely emulating the tidal kinetic energy model for the Bay of Fundy, Nova Scotia, Canada. GPfit is free software, distributed under the General Public License and available from the Comprehensive R Archive Network.
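The instability described above is easy to reproduce. The sketch below, written in Python for illustration and not taken from GPfit, builds a one-dimensional correlation matrix exp(-theta |x_i - x_j|^power) with theta = 10^beta (our assumed reading of a log-scale parameterization) and attempts a Cholesky factorization. Two design points 1e-9 apart make the matrix singular in double precision. Adding a small nugget delta to the diagonal, which as we understand it is the device the Ranjan et al. stable approach builds on, restores positive definiteness. The nugget value 1e-6 and the function names are hypothetical.

```python
import math

def corr_matrix(xs, beta=0.0, power=2.0, nugget=0.0):
    """Power-exponential correlation matrix for 1-D design points.

    theta = 10**beta is an assumed log-scale parameterization; power=2
    gives the Gaussian correlation. The nugget is added to the diagonal.
    """
    theta = 10.0 ** beta
    n = len(xs)
    return [
        [math.exp(-theta * abs(xs[i] - xs[j]) ** power) + (nugget if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]

def cholesky(a):
    """Return lower-triangular L with a = L L^T.

    Raises ValueError if a is not numerically positive definite.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not numerically positive definite")
                low[i][i] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low
```

With xs = [0.0, 1e-9, 1.0], cholesky(corr_matrix(xs)) raises ValueError because the first two rows are identical in floating point, while cholesky(corr_matrix(xs, nugget=1e-6)) succeeds. The trade-off is that a nonzero nugget makes the fitted surface only approximately interpolate the simulator outputs, so the nugget should be kept as small as the required numerical stability allows.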


Journal Article•DOI•

PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes


Silvia Liverani¹, David I. Hastie², Lamiae Azizi, Michail Papathomas³, Sylvia Richardson

Brunel University London¹, Imperial College London², University of St Andrews³

TL;DR: PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. It allows binary, categorical, count and continuous responses, as well as continuous and discrete covariates.

Abstract: PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous responses, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label-switching moves are implemented, along with diagnostic tools to assess convergence. A number of R functions for post-processing the output are also provided. In addition to fitting mixtures, it may be of interest to determine which covariates actively drive the mixture components; this is implemented in the package as variable selection.

Journal Article•DOI•

spTimer: Spatio-Temporal Bayesian Modeling Using R

Khandoker Shuvo Bakar, Sujit K. Sahu

TL;DR: spTimer is a contributed R package for hierarchical Bayesian modeling of stylized environmental space-time monitoring data.

Abstract: Hierarchical Bayesian modeling of large point-referenced space-time data is increasingly becoming feasible in many environmental applications due to recent advances in both statistical methodology and computing power. Implementing these methods using Markov chain Monte Carlo (MCMC) techniques, however, requires problem-specific, user-written computer code, possibly in a low-level language. This programming requirement is hindering the widespread use of Bayesian model-based methods among practitioners, and hence there is an urgent need to develop high-level software that can analyze large data sets rich in both space and time. This paper develops the package spTimer for hierarchical Bayesian modeling of stylized environmental space-time monitoring data as a contributed software package in the R language, which is fast becoming a very popular statistical computing platform. The package can fit large amounts of space-time data and predict them spatially and temporally using three recently developed Bayesian models. The user is given control over many options regarding covariance function selection, distance calculation, prior selection and tuning of the implemented MCMC algorithms, although suitable defaults are provided. The package has many other attractive features, such as on-the-fly transformations and the ability to spatially predict temporally aggregated summaries on the original scale, which avoids the storage problem when using MCMC methods for large datasets. A simulation example with more than a million observations and a real-life data example are used to validate the underlying code and to illustrate the software capabilities.

Journal Article•DOI•

Rmixmod: The R Package of the Model-Based Unsupervised, Supervised and Semi-Supervised Classification Mixmod Library

Rémi Lebret, Serge Iovleff, Florent Langrognet, Christophe Biernacki, Gilles Celeux, Gérard Govaert

TL;DR: An overview of the model-based clustering and classification methods implemented in Mixmod is given, and the paper shows how the R package Rmixmod can be used for clustering and discriminant analysis.

Abstract: Mixmod is a well-established software package for fitting a mixture model of multivariate Gaussian or multinomial probability distribution functions to a given data set for clustering, density estimation, or discriminant analysis. The Rmixmod S4 package provides a bridge between the C++ core library of Mixmod (mixmodLib) and the R statistical computing environment. In this article, we give an overview of the model-based clustering and classification methods, and we show how the R package Rmixmod can be used for clustering and discriminant analysis.

Journal Article•DOI•

Bayesian Inference and Data Augmentation Schemes for Spatial, Spatiotemporal and Multivariate Log-Gaussian Cox Processes in R

Benjamin M. Taylor¹, Tilman M. Davies², Barry Rowlingson¹, Peter J. Diggle¹

Lancaster University¹, University of Otago²

10 Feb 2015, Journal of Statistical Software

TL;DR: A suite of R functions provides an extensible framework for inferring covariate effects as well as the parameters of the latent field in log-Gaussian Cox processes; the paper also presents methods for Bayesian inference in two further classes of model based on the log-Gaussian Cox process.


Abstract: Log-Gaussian Cox processes are an important class of models for spatial and spatiotemporal point-pattern data. Delivering robust Bayesian inference for this class of models presents a substantial challenge, since Markov chain Monte Carlo (MCMC) algorithms require careful tuning in order to work well. To address this issue, we describe recent advances in MCMC methods for these models and their implementation in the R package lgcp. Our suite of R functions provides an extensible framework for inferring covariate effects as well as the parameters of the latent field. We also present methods for Bayesian inference in two further classes of model based on the log-Gaussian Cox process. The first of these concerns the case where we wish to fit a point process model to data consisting of event counts aggregated to a set of spatial regions; we demonstrate how this can be achieved using data augmentation. The second concerns Bayesian inference for a class of marked point processes specified via a multivariate log-Gaussian Cox process model. For both of these extensions, we give details of their implementation in R.
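The data-augmentation step for aggregated counts can rest on a standard fact: if the counts in the grid cells of a region are independent Poisson variables with means lambda_i, then, given their total N, they are Multinomial(N, p) with p_i = lambda_i / sum(lambda). A minimal Python sketch of that conditional draw follows, assuming the cell intensities come from the current state of the latent field as exp(Y). The function name is hypothetical, and this illustrates the general idea rather than lgcp's implementation.

```python
import math
import random

def allocate_region_count(total, log_intensity, rng=random):
    """Split an observed region total across its cells.

    If the cell counts are independent Poisson with means
    exp(log_intensity[i]), then conditional on their sum they are
    Multinomial(total, p) with p_i proportional to exp(log_intensity[i]).
    """
    top = max(log_intensity)  # subtract the max so exp() cannot overflow
    weights = [math.exp(y - top) for y in log_intensity]
    counts = [0] * len(weights)
    # Assign each of the `total` events to a cell with probability p_i.
    for cell in rng.choices(range(len(weights)), weights=weights, k=total):
        counts[cell] += 1
    return counts
```

For example, allocate_region_count(25, [0.0, 1.0, -2.0], random.Random(1)) returns three non-negative cell counts summing to 25, with the second cell favoured. In an MCMC scheme such a draw would alternate with updates of the latent field, so that the imputed cell-level counts and the field are sampled in turn.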


Journal Article•DOI•

Statistics: An Introduction Using R (2nd Edition)


James E. Helmreich

TL;DR: Crawley presents a short introduction to statistical analysis with R. The first half covers standard introductory material (descriptive statistics, simple regression and inferential statistics) in a somewhat idiosyncratic way; the second half is a good introduction to modeling (multiple regression, ANOVA and ANCOVA, various general linear models, survival analysis).

Abstract: Michael J. Crawley’s text Statistics: An Introduction Using R is indeed an introduction to statistical analysis as well as an excellent introduction to working in and with R. However, it is not for the faint of heart. The first half covers standard introductory material (descriptive statistics, simple regression and inferential statistics) in a somewhat idiosyncratic way. The second half of the text is a good introduction to modeling (multiple regression, ANOVA and ANCOVA, various general linear models, survival analysis). The treatments are thorough and yet short: only 300-odd pages. As you can imagine, he is extraordinarily succinct. There are many useful and thoughtful insights; it is well written and clear. It contains no exercises, but there is an excellent appendix that serves well as a basic R tutorial. If you teach this material without a text or think you might like to try, you might find that it (paradoxically) works as the structural backbone for a course that you fill out as you see fit. The text will serve well for an introductory course or for a second course in modeling.

Journal Article•DOI•

Model-Based Geostatistics the Easy Way

Patrick Brown

13 Feb 2015, Journal of Statistical Software

TL;DR: This paper briefly describes geostatistical models for Gaussian and non-Gaussian data and demonstrates the geostatsp and diseasemapping packages for performing inference using these models.


Abstract: This paper briefly describes geostatistical models for Gaussian and non-Gaussian data and demonstrates the geostatsp and diseasemapping packages for performing inference using these models. Making use of R’s spatial data types, and raster objects in particular, makes spatial analyses using geostatistical models simple and convenient. Examples using real data are shown for Gaussian spatial data, binomially distributed spatial data, a log-Gaussian Cox process, and an area-level model for case counts.


Journal Article•DOI•

The glarma Package for Observation-Driven Time Series Regression of Counts


William T. M. Dunsmuir, David J. Scott