Showing papers in "Journal of Statistical Software in 2012"

PDF

Open Access

Journal Article•DOI•

lavaan: An R Package for Structural Equation Modeling

[...]

24 May 2012-Journal of Statistical Software

TL;DR: The aims behind the development of the lavaan package are explained, an overview of its most important features are given, and some examples to illustrate how lavaan works in practice are provided.

...read moreread less

Abstract: Structural equation modeling (SEM) is a vast field and widely used by many applied researchers in the social and behavioral sciences. Over the years, many software packages for structural equation modeling have been developed, both free and commercial. However, perhaps the best state-of-the-art software packages in this field are still closed-source and/or commercial. The R package lavaan has been developed to provide applied researchers, teachers, and statisticians, a free, fully open-source, but commercial-quality package for latent variable modeling. This paper explains the aims behind the development of the package, gives an overview of its most important features, and provides some examples to illustrate how lavaan works in practice.

...read moreread less

14,401 citations

Journal Article•DOI•

Qgraph: Network visualizations of relationships in psychometric data

[...]

Sacha Epskamp, Angélique O. J. Cramer¹, Lourens J. Waldorp, Verena D. Schmittmann, Denny Borsboom - Show less +1 more•Institutions (1)

VU University Amsterdam¹

24 May 2012-Journal of Statistical Software

TL;DR: The qgraph package for R is presented, which provides an interface to visualize data through network modeling techniques, and is introduced by applying the package functions to data from the NEO-PI-R, a widely used personality questionnaire.

...read moreread less

Abstract: We present the qgraph package for R, which provides an interface to visualize data through network modeling techniques. For instance, a correlation matrix can be represented as a network in which each variable is a node and each correlation an edge; by varying the width of the edges according to the magnitude of the correlation, the structure of the correlation matrix can be visualized. A wide variety of matrices that are used in statistics can be represented in this fashion, for example matrices that contain (implied) covariances, factor loadings, regression parameters and p values. qgraph can also be used as a psychometric tool, as it performs exploratory and confirmatory factor analysis, using sem and lavaan; the output of these packages is automatically visualized in qgraph ,w hich may aid the interpretation of results. In this article, we introduce qgraph by applying the package functions to data from the NEO-PI-R, a widely used personality questionnaire.

...read moreread less

2,338 citations

Journal Article•DOI•

mirt: A Multidimensional Item Response Theory Package for the R Environment

[...]

R. Philip Chalmers

24 May 2012-Journal of Statistical Software

TL;DR: The mirt package was created for estimating multidimensional item response theory parameters for exploratory and confirmatory models by using maximum-likelihood meth- ods.

...read moreread less

Abstract: Item response theory (IRT) is widely used in assessment and evaluation research to explain how participants respond to item level stimuli. Several R packages can be used to estimate the parameters in various IRT models, the most flexible being the ltm (Rizopoulos 2006), eRm (Mair and Hatzinger 2007), and MCMCpack (Martin, Quinn, and Park 2011) packages. However these packages have limitations in that ltm and eRm can only analyze unidimensional IRT models effectively and the exploratory multidimensional extensions available in MCMCpack requires prior understanding of Bayesian estimation convergence diagnostics and are computationally intensive. Most importantly, multidimensional confirmatory item factor analysis methods have not been implemented in any R package. The mirt package was created for estimating multidimensional item response theory parameters for exploratory and confirmatory models by using maximum-likelihood meth- ods. The Gauss-Hermite quadrature method used in traditional EM estimation (e.g., Bock and Aitkin 1981) is presented for exploratory item response models as well as for confirmatory bifactor models (Gibbons and Hedeker 1992). Exploratory and confirmatory models are estimated by a stochastic algorithm described by Cai (2010a,b). Various program comparisons are presented and future directions for the package are discussed.

...read moreread less

1,420 citations

Journal Article•DOI•

nparLD: An R Software Package for the Nonparametric Analysis of Longitudinal Data in Factorial Experiments

[...]

Kimihiro Noguchi, Yulia R. Gel, Edgar Brunner, Frank Konietschke

18 Sep 2012-Journal of Statistical Software

TL;DR: In this paper, the authors introduce a new R package nparLD which provides statisticians and researchers from other disciplines an easy and user-friendly access to the most up-to-date robust rank-based methods for the analysis of longitudinal data in factorial settings.

...read moreread less

Abstract: Longitudinal data from factorial experiments frequently arise in various fields of study, ranging from medicine and biology to public policy and sociology In most practical situations, the distribution of observed data is unknown and there may exist a number of atypical measurements and outliers Hence, use of parametric and semi-parametric procedures that impose restrictive distributional assumptions on observed longitudinal samples becomes questionable This, in turn, has led to a substantial demand for statistical procedures that enable us to accurately and reliably analyze longitudinal measurements in factorial experiments with minimal conditions on available data, and robust nonparametric methodology offering such a possibility becomes of particular practical importance In this article, we introduce a new R package nparLD which provides statisticians and researchers from other disciplines an easy and user-friendly access to the most up-to-date robust rank-based methods for the analysis of longitudinal data in factorial settings We illustrate the implemented procedures by case studies from dentistry, biology, and medicine

...read moreread less

1,181 citations

Journal Article•DOI•

Glotaran: A Java-Based Graphical User Interface for the R Package TIMP

[...]

Joris J. Snellenburg, Sergey P. Laptenok, Ralf Seger, Katharine M. Mullen, Ivo H. M. van Stokkum - Show less +1 more

30 Jun 2012-Journal of Statistical Software

TL;DR: Glotaran is introduced as a Java-based graphical user interface to the R package TIMP, a problem solving environment for fitting superposition models to multi-dimensional data which features interactive and dynamic data inspection and interactive viewing of results.

...read moreread less

Abstract: In this work the software application called Glotaran is introduced as a Java-based graphical user interface to the R package TIMP, a problem solving environment for fitting superposition models to multi-dimensional data. TIMP uses a command-line user interface for the interaction with data, the specification of models and viewing of analysis results. Instead, Glotaran provides a graphical user interface which features interactive and dynamic data inspection, easier -- assisted by the user interface -- model specification and interactive viewing of results. The interactivity component is especially helpful when working with large, multi-dimensional datasets as often result from time-resolved spectroscopy measurements, allowing the user to easily pre-select and manipulate data before analysis and to quickly zoom in to regions of interest in the analysis results. Glotaran has been developed on top of the NetBeans rich client platform and communicates with R through the Java-to-R interface Rserve. The background and the functionality of the application are described here. In addition, the design, development and implementation process of Glotaran is documented in a generic way.

...read moreread less

994 citations

Journal Article•DOI•

Closing the Gap between Methodologists and End-Users: R as a Computational Back-End

[...]

Byron C. Wallace, Issa J Dahabreh¹, Thomas A Trikalinos¹, Joseph Lau¹, Paul Trow¹, Christopher H. Schmid¹ - Show less +2 more•Institutions (1)

Tufts University¹

30 Jun 2012-Journal of Statistical Software

TL;DR: This paper presents open-source meta-analysis software that uses R as the underlying statistical engine, and Python for the GUI, and a framework that allows methodologists to implement new methods in R that are then automatically integrated into the GUI for use by end-users, so long as the programmer conforms to the interface.

...read moreread less

Abstract: The R environment provides a natural platform for developing new statistical methods due to the mathematical expressiveness of the language, the large number of existing libraries, and the active developer community. One drawback to R, however, is the learning curve; programming is a deterrent to non-technical users, who typically prefer graphical user interfaces (GUIs) to command line environments. Thus, while statisticians develop new methods in R, practitioners are often behind in terms of the statistical techniques they use as they rely on GUI applications. Meta-analysis is an instructive example; cutting-edge meta-analysis methods are often ignored by the overwhelming majority of practitioners, in part because they have no easy way of applying them. This paper proposes a strategy to close the gap between the statistical state-of-the-science and what is applied in practice. We present open-source meta-analysis software that uses R as the underlying statistical engine, and Python for the GUI. We present a framework that allows methodologists to implement new methods in R that are then automatically integrated into the GUI for use by end-users, so long as the programmer conforms to our interface. Such an approach allows an intuitive interface for non-technical users while leveraging the latest advanced statistical methods implemented by methodologists.

...read moreread less

848 citations

Journal Article•DOI•

Fast R Functions for Robust Correlations and Hierarchical Clustering

[...]

Peter Langfelder¹, Steve Horvath•Institutions (1)

University of California, Los Angeles¹

07 Mar 2012-Journal of Statistical Software

TL;DR: An implementation of Pearson correlation calculation that can lead to substantial speedup on data with relatively small number of missing entries is presented, and the package flashClust that implements the original algorithm which in practice achieves order approximately n(2), leading to substantial time savings when clustering large data sets.

...read moreread less

Abstract: Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation can handle calculations without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing data. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA.The hierarchical clustering algorithm implemented in R function hclust is an order n(3) (n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust that implements the original algorithm which in practice achieves order approximately n(2), leading to substantial time savings when clustering large data sets.

...read moreread less

839 citations

Journal Article•DOI•

DiceKriging, DiceOptim: Two R Packages for the Analysis of Computer Experiments by Kriging-Based Metamodeling and Optimization

[...]

Olivier Roustant, David Ginsbourger, Yves Deville

22 Oct 2012-Journal of Statistical Software

TL;DR: The versatility of DiceKriging with respect to trend and noise specifications, covariance parameter estimation, as well as conditional and unconditional simulations are illustrated on the basis of several reproducible numerical experiments.

...read moreread less

Abstract: We present two recently released R packages, DiceKriging and DiceOptim, for the approximation and the optimization of expensive-to-evaluate deterministic functions. Following a self-contained mini tutorial on Kriging-based approximation and optimization, the functionalities of both packages are detailed and demonstrated in two distinct sections. In particular, the versatility of DiceKriging with respect to trend and noise specifications, covariance parameter estimation, as well as conditional and unconditional simulations are illustrated on the basis of several reproducible numerical experiments. We then put to the fore the implementation of sequential and parallel optimization strategies relying on the expected improvement criterion on the occasion of DiceOptim’s presentation. An appendix is dedicated to complementary mathematical and computational details.

...read moreread less

622 citations

Journal Article•DOI•

semPLS: Structural Equation Modeling Using Partial Least Squares

[...]

Armin Monecke, Friedrich Leisch¹•Institutions (1)

University of Wollongong¹

24 May 2012-Journal of Statistical Software

TL;DR: The semPLS package provides the capability to estimate PLS path models within the R programming environment and contains modular methods for computation of bootstrap confidence intervals, model parameters and several quality indices.

...read moreread less

Abstract: Structural equation models (SEM) are very popular in many disciplines. The partial least squares (PLS) approach to SEM offers an alternative to covariance-based SEM, which is especially suited for situations when data is not normally distributed. PLS path modelling is referred to as soft-modeling-technique with minimum demands regarding mea- surement scales, sample sizes and residual distributions. The semPLS package provides the capability to estimate PLS path models within the R programming environment. Different setups for the estimation of factor scores can be used. Furthermore it contains modular methods for computation of bootstrap confidence intervals, model parameters and several quality indices. Various plot functions help to evaluate the model. The well known mobile phone dataset from marketing research is used to demonstrate the features of the package.

...read moreread less

596 citations

Journal Article•DOI•

Causal Inference using Graphical Models with the R Package pcalg

[...]

Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, Peter Bühlmann - Show less +1 more

17 May 2012-Journal of Statistical Software

TL;DR: The pcalg package for R can be used for the following two purposes: Causal structure learning and estimation of causal effects from observational data.

...read moreread less

Abstract: The pcalg package for R can be used for the following two purposes: Causal structure learning and estimation of causal effects from observational data. In this document, we give a brief overview of the methodology, and demonstrate the package’s functionality in both toy examples and applications.

...read moreread less

576 citations

Journal Article•DOI•

Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned

[...]

Bettina Grün, Ioannis Kosmidis, Achim Zeileis

24 May 2012-Journal of Statistical Software

TL;DR: These extensions make beta regression not only “a better lemon squeezer” but a full-fledged modern juicer offering lemon-based drinks: shaken and stirred, mixed, mixed (finite mixture model), or partitioned (tree model).

...read moreread less

Abstract: Beta regression – an increasingly popular approach for modeling rates and proportions – is extended in various directions: (a) bias correction/reduction of the maximum likelihood estimator, (b) beta regression tree models by means of recursive partitioning, (c) latent class beta regression by means of finite mixture models. All three extensions may be of importance for enhancing the beta regression toolbox in practice to provide more reliable inference and capture both observed and unobserved/latent heterogeneity in the data. Using the analogy of Smithson and Verkuilen (2006), these extensions make beta regression not only “a better lemon squeezer” (compared to classical least squares regression) but a full-fledged modern juicer offering lemon-based drinks: shaken and stirred (bias correction and reduction), mixed (finite mixture model), or partitioned (tree model). All three extensions are provided in the R package betareg (at least 2.4-0), building on generic algorithms and implementations for bias correction/reduction, model-based recursive partioning, and finite mixture models, respectively. Specifically, the new functions betatree() and betamix() reuse the object-oriented flexible implementation from the R packages party and flexmix, respectively.

...read moreread less

Journal Article•DOI•

Evaluating Random Forests for Survival Analysis Using Prediction Error Curves

[...]

Ulla B Mogensen¹, Hemant Ishwaran², Thomas A. Gerds¹•Institutions (2)

University of Copenhagen¹, University of Miami²

18 Sep 2012-Journal of Statistical Software

TL;DR: The R package pec is surveyed, showing how the functionality of pec can be extended to yet unsupported prediction models, and implemented support for random forest prediction models based on the R-packages randomSurvivalForest and party.

...read moreread less

Abstract: Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, like Cox regression or additive hazard regression, as well as state of the art machine learning methods such as random forests, a nonparametric method which provides promising alternatives to traditional strategies in low and high-dimensional settings. We show how the functionality of pec can be extended to yet unsupported prediction models. As an example, we implement support for random forest prediction models based on the R-packages randomSurvivalForest and party. Using data of the Copenhagen Stroke Study we use pec to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results on the user level are given for publicly available data from the German breast cancer study group.

...read moreread less

Journal Article•DOI•

splm: Spatial Panel Data Models in R

[...]

Giovanni Millo, Gianfranco Piras

17 Apr 2012-Journal of Statistical Software

TL;DR: This paper considers the implementation of both maximum likelihood and generalized moments estimators in the context of fixed as well as random effects spatial panel data models and performs comparisons with other available software.

...read moreread less

Abstract: splm is an R package for the estimation and testing of various spatial panel data specifications. We consider the implementation of both maximum likelihood and generalized moments estimators in the context of fixed as well as random effects spatial panel data models. This paper is a general description of splm and all functionalities are illustrated using a well-known example taken from Munnell (1990) with productivity data on 48 US states observed over 17 years. We perform comparisons with other available software; and, when this is not possible, Monte Carlo results support our original implementation.

...read moreread less

Journal Article•DOI•

Statistical Computing in Functional Data Analysis: The R Package fda.usc

[...]

Manuel Febrero-Bande, Manuel Oviedo de la Fuente

22 Oct 2012-Journal of Statistical Software

TL;DR: This paper is devoted to the R package fda.usc which includes some utilities for functional data analysis which carries out exploratory and descriptive analysis of functional data analyzing its most important features such as depth measurements or functional outliers detection, among others.

...read moreread less

Abstract: This paper is devoted to the R package fda.usc which includes some utilities for functional data analysis. This package carries out exploratory and descriptive analysis of functional data analyzing its most important features such as depth measurements or functional outliers detection, among others. The R package fda.usc also includes functions to compute functional regression models, with a scalar response and a functional explanatory data via non-parametric functional regression, basis representation or functional principal components analysis. There are natural extensions such as functional linear models and semi-functional partial linear models, which allow non-functional covariates and factors and make predictions. The functions of this package complement and incorporate the two main references of functional data analysis: The R package fda and the functions implemented by Ferraty and Vieu (2006).

...read moreread less

Journal Article•DOI•

Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS

[...]

Christoph Bergmeir¹, José Manuel Benítez¹•Institutions (1)

University of Granada¹

30 Jan 2012-Journal of Statistical Software

TL;DR: The R package RSNNS is described that provides a convenient interface to the popular Stuttgart Neural Network Simulator SNNS, and encapsulation of the relevant SNNs parts in a C++ class for sequential and parallel usage of different networks.

...read moreread less

Abstract: Neural networks are important standard machine learning procedures for classication and regression. We describe the R package RSNNS that provides a convenient interface to the popular Stuttgart Neural Network Simulator SNNS. The main features are (a) encapsulation of the relevant SNNS parts in a C++ class, for sequential and parallel usage of dierent networks, (b) accessibility of all of the SNNS algorithmic functionality from R using a low-level interface, and (c) a high-level interface for convenient, R-style usage of many standard neural network procedures. The package also includes functions for visualization and analysis of the models and the training procedures, as well as functions for data input/output from/to the original SNNS le formats.

...read moreread less

Journal Article•DOI•

New Developments in Mokken Scale Analysis in R

[...]

L.A. van der Ark

24 May 2012-Journal of Statistical Software

TL;DR: The new developments in MSA since 2000 that have been implemented in mokken since its first release in 2007 are described and all new applications are demonstrated using data obtained with a transitive reasoning test and a personality test.

...read moreread less

Abstract: Mokken (1971) developed a scaling procedure for both dichotomous and polytomous items that was later coined Mokken scale analysis (MSA). MSA has been developed ever since, and the developments until 2000 have been implemented in the software package MSP (Molenaar and Sijtsma 2000) and the R package mokken (Van der Ark 2007). This paper describes the new developments in MSA since 2000 that have been implemented in mokken since its first release in 2007. These new developments pertain to invariant item ordering, a new automated item selection procedure based on a genetic algorithm, inclusion of reliability coefficients, and the computation of standard errors for the scalability coefficients. We demonstrate all new applications using data obtained with a transitive reasoning test and a personality test.

...read moreread less

Journal Article•DOI•

ClustOfVar: An R Package for the Clustering of Variables

[...]

Marie Chavent¹, Vanessa Kuentz-Simonet, Benoit Liquet, Jérôme Saracco•Institutions (1)

University of Bordeaux¹

22 Sep 2012-Journal of Statistical Software

TL;DR: In this article, two clustering algorithms are proposed to optimize the homogeneity criterion: iterative relocation algorithm and ascendant hierarchical clustering, and a bootstrap approach is also proposed to determine suitable numbers of clusters.

...read moreread less

Abstract: Clustering of variables is as a way to arrange variables into homogeneous clusters, i.e., groups of variables which are strongly related to each other and thus bring the same information. These approaches can then be useful for dimension reduction and variable selection. Several specific methods have been developed for the clustering of numerical variables. However concerning qualitative variables or mixtures of quantitative and qualitative variables, far fewer methods have been proposed. The R package ClustOfVar was specifically developed for this purpose. The homogeneity criterion of a cluster is defined as the sum of correlation ratios (for qualitative variables) and squared correlations (for quantitative variables) to a synthetic quantitative variable, summarizing ``as good as possible'' the variables in the cluster. This synthetic variable is the first principal component obtained with the PCAMIX method. Two clustering algorithms are proposed to optimize the homogeneity criterion: iterative relocation algorithm and ascendant hierarchical clustering. We also propose a bootstrap approach in order to determine suitable numbers of clusters. We illustrate the methodologies and the associated package on small datasets.

...read moreread less

Journal Article•DOI•

Spherical k-Means Clustering

[...]

Kurt Hornik, Ingo Feinerer, Martin Kober

18 Sep 2012-Journal of Statistical Software

TL;DR: The R extension package skmeans is introduced which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point and genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans).

...read moreread less

Abstract: Clustering text documents is a fundamental task in modern data analysis, requiring approaches which perform well both in terms of solution quality and computational efficiency. Spherical k-means clustering is one approach to address both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point and genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). Performance of these solvers is investigated by means of a large scale benchmark experiment.

...read moreread less

Journal Article•DOI•

An SPSS R-Menu for Ordinal Factor Analysis

[...]

Manuel Pereira

30 Jan 2012-Journal of Statistical Software

TL;DR: This paper offers an SPSS dialog written in the R programming language with the help of some packages, so that researchers with little or no knowledge in programming, or those who are accustomed to making their calculations based on statistical dialogs, have more options when applying factor analysis to their data and hence can adopt a better approach when dealing with ordinal, Likert-type data.

...read moreread less

Abstract: Exploratory factor analysis is a widely used statistical technique in the social sciences. It attempts to identify underlying factors that explain the pattern of correlations within a set of observed variables. A statistical software package is needed to perform the calculations. However, there are some limitations with popular statistical software packages, like SPSS. The R programming language is a free software package for statistical and graphical computing. It oers many packages written by contributors from all over the world and programming resources that allow it to overcome the dialog limitations of SPSS. This paper oers an SPSS dialog written in the R programming language with the help of some packages, so that researchers with little or no knowledge in programming, or those who are accustomed to making their calculations based on statistical dialogs, have more options when applying factor analysis to their data and hence can adopt a better approach when dealing with ordinal, Likert-type data.

...read moreread less

Journal Article•DOI•

frailtypack: An R Package for the Analysis of Correlated Survival Data with Frailty Models Using Penalized Likelihood Estimation or Parametrical Estimation

[...]

Virginie Rondeau, Yassin Mazroui, Juan R. González

17 Apr 2012-Journal of Statistical Software

TL;DR: The aim of this article is to present the new version of an R package called frailtypack, which allows to fit Cox models and four types of frailty models (shared, nested, joint, additive) that could be useful for several issues within biomedical research.

...read moreread less

Abstract: Frailty models are very useful for analysing correlated survival data, when observations are clustered into groups or for recurrent events. The aim of this article is to present the new version of an R package called frailtypack. This package allows to fit Cox models and four types of frailty models (shared, nested, joint, additive) that could be useful for several issues within biomedical research. It is well adapted to the analysis of recurrent events such as cancer relapses and/or terminal events (death or lost to follow-up). The approach uses maximum penalized likelihood estimation. Right-censored or left-truncated data are considered. It also allows stratification and time-dependent covariates during analysis.

...read moreread less

Journal Article•DOI•

Bradley-Terry Models in R: The BradleyTerry2 Package

[...]

Heather Turner, David Firth

24 May 2012-Journal of Statistical Software

TL;DR: This is a short overview of the R add-on package BradleyTerry2, which facilitates the specification and fitting of Bradley-Terry logit, probit or cauchit models to paircomparison data.

...read moreread less

Abstract: This is a short overview of the R add-on package BradleyTerry2, which facilitates the specification and fitting of Bradley-Terry logit, probit or cauchit models to pair-comparison data. Included are the standard 'unstructured' Bradley-Terry model, structured versions in which the parameters are related through a linear predictor to explanatory variables, and the possibility of an order or 'home advantage' effect or other 'contest-specific' effects. Model fitting is either by maximum likelihood, by penalized quasi-likelihood (for models which involve a random effect), or by bias-reduced maximum likelihood in which the first-order asymptotic bias of parameter estimates is eliminated. Also provided are a simple and efficient approach to handling missing covariate data, and suitably-defined residuals for diagnostic checking of the linear predictor.

...read moreread less

Journal Article•DOI•

tmle : An R Package for Targeted Maximum Likelihood Estimation

[...]

Susan Gruber¹, Mark J. van der Laan²•Institutions (2)

Harvard University¹, University of California, Berkeley²

16 Nov 2012-Journal of Statistical Software

TL;DR: Tmle is a recently developed R package that implements TMLE of the effect of a binary treatment at a single point in time on an outcome of interest, controlling for user supplied covariates, including an additive treatment effect, relative risk, odds ratio, and the controlled direct effect.

...read moreread less

Abstract: Targeted maximum likelihood estimation (TMLE) is a general approach for constructing an efficient double-robust semi-parametric substitution estimator of a causal effect parameter or statistical association measure. tmle is a recently developed R package that implements TMLE of the effect of a binary treatment at a single point in time on an outcome of interest, controlling for user supplied covariates, including an additive treatment effect, relative risk, odds ratio, and the controlled direct effect of a binary treatment controlling for a binary intermediate variable on the pathway from treatment to the out- come. Estimation of the parameters of a marginal structural model is also available. The package allows outcome data with missingness, and experimental units that contribute repeated records of the point-treatment data structure, thereby allowing the analysis of longitudinal data structures. Relevant factors of the likelihood may be modeled or fit data-adaptively according to user specifications, or passed in from an external estimation procedure. Effect estimates, variances, p values, and 95% confidence intervals are provided by the software.

...read moreread less

Journal Article•DOI•

spacetime: Spatio-Temporal Data in R

[...]

Edzer Pebesma

13 Nov 2012-Journal of Statistical Software

TL;DR: To explore how spatio-temporal data can be sensibly represented in classes, and to find out which analysis and visualisation methods are useful and feasible, the time series convention of representing time intervals by their starting time only is discussed.

...read moreread less

Abstract: This document describes classes and methods designed to deal with different types of spatio-temporal data in R implemented in the R package spacetime, and provides examples for analyzing them. It builds upon the classes and methods for spatial data from package sp, and for time series data from package xts. The goal is to cover a number of useful representations for spatio-temporal sensor data, and results from predicting (spatial and/or temporal interpolation or smoothing), aggregating, or subsetting them, and to represent trajectories. The goals of this paper is to explore how spatio-temporal data can be sensibly represented in classes, and to find out which analysis and visualisation methods are useful and feasible. We discuss the time series convention of representing time intervals by their starting time only. This document is the main reference for the R package spacetime, and is available (in updated form) as a vignette in this package.

...read moreread less

Journal Article•DOI•

IRTrees: Tree-Based Item Response Models of the GLMM Family

[...]

Paul De Boeck¹, Ivailo Partchev•Institutions (1)

Katholieke Universiteit Leuven¹

24 May 2012-Journal of Statistical Software

TL;DR: The aim of the article is to present four subcategories of models, the first two of which are based on a tree representation for response categories: 1. linear response tree models, 2. nested responseTree models, 3. linear latent-variable tree models (e.g., models for change processes), and 4. bi-factor models, which are members of the family of generalized linear mixed models (GLMM).

...read moreread less

Abstract: A category of item response models is presented with two defining features: they all (i) have a tree representation, and (ii) are members of the family of generalized linear mixed models (GLMM). Because the models are based on trees, they are denoted as IRTree models. The GLMM nature of the models implies that they can all be estimated with the glmer function of the lme4 package in R. The aim of the article is to present four subcategories of models, the first two of which are based on a tree representation for response categories: 1. linear response tree models (e.g., missing response models), 2. nested response tree models (e.g., models for parallel observations regarding item responses such as agreement and certainty), while the last two are based on a tree representation for latent variables: 3. linear latent-variable tree models (e.g., models for change processes), and 4. nested latent-variable tree models (e.g., bi-factor models). The use of the glmer function is illustrated for all four subcategories. Simulated example data sets and two service functions useful in preparing the data for IRTree modeling with glmer are provided in the form of an R package, irtrees. For all four subcategories also a real data application is discussed.

...read moreread less

Journal Article•DOI•

Graphical Independence Networks with the gRain Package for R

[...]

Søren Højsgaard

28 Feb 2012-Journal of Statistical Software

TL;DR: The main part of the paper is an illustration of how to use the R package gRain for propagation in graphical independence networks (for which Bayesian networks is a special instance).

...read moreread less

Abstract: In this paper we present the R package gRain for propagation in graphical independence networks (for which Bayesian networks is a special instance). The paper includes a description of the theory behind the computations. The main part of the paper is an illustration of how to use the package. The paper also illustrates how to turn a graphical model and data into an independence network.

...read moreread less

Journal Article•DOI•

parfm: Parametric Frailty Models in R

[...]

Marco Munda, Federico Rotolo, Catherine Legrand

13 Nov 2012-Journal of Statistical Software

TL;DR: The new parfm package remedies that lack by providing a wide range of parametric frailty models in R by maximising the marginal log-likelihood, with right-censored and possibly left-truncated data.

...read moreread less

Abstract: Frailty models are getting more and more popular to account for overdispersion and/or clustering in survival data. When the form of the baseline hazard is somehow known in advance, the parametric estimation approach can be used advantageously. Nonetheless, there is no unified widely available software that deals with the parametric frailty model. The new parfm package remedies that lack by providing a wide range of parametric frailty models in R. The gamma, inverse Gaussian, and positive stable frailty distributions can be specified, together with five different baseline hazards. Parameter estimation is done by maximising the marginal log-likelihood, with right-censored and possibly left-truncated data. In the multivariate setting, the inverse Gaussian may encounter numerical difficulties with a huge number of events in at least one cluster. The positive stable model shows analogous difficulties but an ad-hoc solution is implemented, whereas the gamma model is very resistant due to the simplicity of its Laplace transform.

...read moreread less

Journal Article•DOI•

Deducer: A Data Analysis GUI for R

[...]

Ian Fellows

30 Jun 2012-Journal of Statistical Software

TL;DR: Deducer is a graphical user interface for R that presents dialogs that are understandable for the beginner, and yet contain all (or most) of the options that an experienced statistician, performing the same task, would want.

...read moreread less

Abstract: While R has proven itself to be a powerful and flexible tool for data exploration and analysis, it lacks the ease of use present in other software such as SPSS and Minitab. An easy to use graphical user interface (GUI) can help new users accomplish tasks that would otherwise be out of their reach, and improves the efficiency of expert users by replacing fifty key strokes with five mouse clicks. With this in mind, Deducer presents dialogs that are understandable for the beginner, and yet contain all (or most) of the options that an experienced statistician, performing the same task, would want. An Excel-like spreadsheet is included for easy data viewing and editing. Deducer is based on Java's Swing GUI library and can be used on any common operating system. The GUI is independent of the specific R console and can easily be used by calling a text-based menu system. Graphical menus are provided for the JGR console and the Windows R GUI.

...read moreread less

Journal Article•DOI•

MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms

[...]

Volodymyr Melnykov, Wei-Chen Chen, Ranjan Maitra

13 Nov 2012-Journal of Statistical Software

TL;DR: The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components, and can be readily employed to control the clustering complexity of datasets simulated from mixtures.

...read moreread less

Abstract: The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of MixSim, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.

...read moreread less

Journal Article•DOI•

MortalitySmooth: An R Package for Smoothing Poisson Counts with P-Splines

[...]

Carlo G. Camarda

20 Jul 2012-Journal of Statistical Software

TL;DR: The MortalitySmooth package provides a framework for smoothing count data in both one- and two-dimensional settings and is specifically tailored to demographers, actuaries, epidemiologists, and geneticists who may be interested in using a practical tool for smoothed mortality data over ages and/or years.

...read moreread less

Abstract: The MortalitySmooth package provides a framework for smoothing count data in both one- and two-dimensional settings Although general in its purposes, the package is specifically tailored to demographers, actuaries, epidemiologists, and geneticists who may be interested in using a practical tool for smoothing mortality data over ages and/or years The total number of deaths over a specified age- and year-interval is assumed to be Poisson-distributed, and P-splines and generalized linear array models are employed as a suitable regression methodology Extra-Poisson variation can also be accommodated

...read moreread less

Journal Article•DOI•

Basic Functions for Supporting an Implementation of Choice Experiments in R

[...]

Hideo Aizaki

22 Sep 2012-Journal of Statistical Software

TL;DR: The package support.CEs provides seven basic functions that support the implementation of choice experiments (CEs) in R: two functions for creating a CE design, which is based on orthogonal main-effect arrays and a function for calculating the goodness-of-fit measures of an estimated model.

...read moreread less

Abstract: The package support.CEs provides seven basic functions that support the implementation of choice experiments (CEs) in R: two functions for creating a CE design, which is based on orthogonal main-effect arrays; a function for converting a CE design into questionnaire format; a function for converting a CE design into a design matrix; a function for making the data set suitable for the implementation of a conditional logit model; a function for calculating the goodness-of-fit measures of an estimated model; and a function for calculating the marginal willingness to pay for the attributes and/or levels of the estimated model.

...read moreread less