
Showing papers in "Journal of Statistical Software in 2011"


Journal ArticleDOI
TL;DR: Mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Abstract: The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.

10,234 citations
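The regression step at the heart of chained-equations imputation can be sketched in a few lines of base R. This is a conceptual illustration only, not the mice API: mice iterates such steps across all incomplete variables and draws the regression parameters properly; the data and the helper `impute_once` here are invented.

```r
# One regression-imputation step in the spirit of chained equations.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 * x + rnorm(n)
y[sample(n, 20)] <- NA                       # make 20 values of y missing

impute_once <- function(x, y) {
  miss <- is.na(y)
  fit  <- lm(y ~ x, subset = !miss)          # model y on the observed cases
  pred <- predict(fit, newdata = data.frame(x = x[miss]))
  # add residual noise so imputed values carry uncertainty, not just the mean
  y[miss] <- pred + rnorm(sum(miss), sd = summary(fit)$sigma)
  y
}

y_imp <- impute_once(x, y)
```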


Journal ArticleDOI
TL;DR: MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions.
Abstract: MatchIt implements the suggestions of Ho, Imai, King, and Stuart (2007) for improving parametric statistical models by preprocessing data with nonparametric matching methods. MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions. The software also easily fits into existing research practices since, after preprocessing data with MatchIt, researchers can use whatever parametric model they would have used without MatchIt, but produce inferences with substantially more robustness and less sensitivity to modeling assumptions. MatchIt is an R program, and also works seamlessly with Zelig.

3,012 citations


Journal ArticleDOI
TL;DR: The Amelia II package implements a new expectation-maximization with bootstrapping algorithm that works faster, with larger numbers of variables, and is far easier to use, than various Markov chain Monte Carlo approaches, but gives essentially the same answers.
Abstract: Amelia II is a complete R package for multiple imputation of missing data. The package implements a new expectation-maximization with bootstrapping algorithm that works faster, with larger numbers of variables, and is far easier to use, than various Markov chain Monte Carlo approaches, but gives essentially the same answers. The program also improves imputation models by allowing researchers to put Bayesian priors on individual cell values, thereby including a great deal of potentially valuable and extensive information. It also includes features to accurately impute cross-sectional datasets, individual time series, or sets of time series for different cross-sections. A full set of graphical diagnostics are also available. The program is easy to use, and the simplicity of the algorithm makes it far more robust; both a simple command line and extensive graphical user interface are included.

2,404 citations


Journal ArticleDOI
TL;DR: This paper introduces a new R package that lets you smoothly apply a split-apply-combine strategy without having to worry about the type of structure in which your data is stored.
Abstract: Many data analysis problems involve the application of a split-apply-combine strategy, where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together. This insight gives rise to a new R package that allows you to smoothly apply this strategy, without having to worry about the type of structure in which your data is stored. The paper includes two case studies showing how these insights make it easier to work with batting records for veteran baseball players and a large 3d array of spatio-temporal ozone measurements.

2,243 citations
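The split-apply-combine strategy itself can be demonstrated with base R alone (a generic sketch, not the package's interface; the toy batting data below are invented):

```r
# Split rows by a grouping factor, apply a summary to each piece,
# and combine the results back into a single data frame.
df <- data.frame(
  player = rep(c("a", "b", "c"), each = 4),
  hits   = c(1, 2, 0, 3,  4, 4, 5, 2,  0, 1, 1, 0)
)

pieces   <- split(df$hits, df$player)            # split
means    <- lapply(pieces, mean)                 # apply
combined <- data.frame(player    = names(means), # combine
                       mean_hits = unlist(means))
combined
```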


Journal ArticleDOI
TL;DR: The R package unmarked provides a unified modeling framework for ecological research, including tools for data exploration, model fitting, model criticism, post-hoc analysis, and model comparison.
Abstract: Ecological research uses data collection techniques that are prone to substantial and unique types of measurement error to address scientific questions about species abundance and distribution. These data collection schemes include a number of survey methods in which unmarked individuals are counted, or determined to be present, at spatially referenced sites. Examples include site occupancy sampling, repeated counts, distance sampling, removal sampling, and double observer sampling. To appropriately analyze these data, hierarchical models have been developed to separately model explanatory variables of both a latent abundance or occurrence process and a conditional detection process. Because these models have a straightforward interpretation paralleling mechanisms under which the data arose, they have recently gained immense popularity. The common hierarchical structure of these models is well-suited for a unified modeling interface. The R package unmarked provides such a unified modeling framework, including tools for data exploration, model fitting, model criticism, post-hoc analysis, and model comparison.

1,675 citations


Journal ArticleDOI
TL;DR: This work introduces a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of ℓ1 and ℓ2 penalties (elastic net), and employs warm starts to find a solution along a regularization path.
Abstract: We introduce a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of l1 and l2 penalties (elastic net). Our algorithm fits via cyclical coordinate descent, and employs warm starts to find a solution along a regularization path. We demonstrate the efficacy of our algorithm on real and simulated data sets, and find considerable speedup between our algorithm and competing methods.

1,579 citations
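For reference, the elastic-net penalty is a convex combination of the two norms. A sketch of one common parameterization in base R (the penalty only; the coordinate-descent fitting of the Cox model is not shown, and the function name is invented):

```r
# Elastic-net penalty: lambda * ( alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2 )
# alpha = 1 gives the lasso, alpha = 0 gives ridge.
elastic_net_penalty <- function(beta, lambda, alpha) {
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

elastic_net_penalty(c(1, -2), lambda = 0.5, alpha = 1)   # 1.5  (pure lasso)
elastic_net_penalty(c(1, -2), lambda = 0.5, alpha = 0)   # 1.25 (pure ridge)
```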


Journal ArticleDOI
TL;DR: The Rcpp package simplifies integrating C++ code with R by providing a consistent C++ class hierarchy that maps various types of R objects to dedicated C++ classes.
Abstract: The Rcpp package simplifies integrating C++ code with R. It provides a consistent C++ class hierarchy that maps various types of R objects (vectors, matrices, functions, environments, ...) to dedicated C++ classes. Object interchange between R and C++ is managed by simple, flexible and extensible concepts which include broad support for C++ Standard Template Library idioms. C++ code can either be compiled, linked and loaded on the fly, or added via packages. Flexible error and exception code handling is provided. Rcpp substantially lowers the barrier for programmers wanting to combine C++ code with R.

1,322 citations


Journal ArticleDOI
TL;DR: Matching is an R package which provides functions for multivariate and propensity score matching and for finding optimal covariate balance based on a genetic search algorithm, along with a variety of univariate and multivariate metrics to determine whether balance has actually been obtained.
Abstract: Matching is an R package which provides functions for multivariate and propensity score matching and for finding optimal covariate balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance actually has been obtained are provided. The underlying matching algorithm is written in C++, makes extensive use of system BLAS and scales efficiently with dataset size. The genetic algorithm which finds optimal balance is parallelized and can make use of multiple CPUs or a cluster of computers. A large number of options are provided which control exactly how the matching is conducted and how balance is evaluated.

1,184 citations
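The basic idea of propensity-score matching can be sketched in base R with a greedy nearest-neighbor match with replacement (illustrative only; Matching's genetic balance search and C++ core are far more elaborate, and all names and data below are invented):

```r
# Greedy 1:1 nearest-neighbor matching on an estimated propensity score.
set.seed(3)
n  <- 200
x  <- rnorm(n)                                     # a confounder
tr <- rbinom(n, 1, plogis(x))                      # treatment depends on x
ps <- fitted(glm(tr ~ x, family = binomial))       # estimated propensity score

treated  <- which(tr == 1)
controls <- which(tr == 0)
matched  <- sapply(treated, function(i)
  controls[which.min(abs(ps[controls] - ps[i]))])  # closest control on ps

# covariate imbalance before vs. after matching
before <- abs(mean(x[treated]) - mean(x[controls]))
after  <- abs(mean(x[treated]) - mean(x[matched]))
```

Matching on the propensity score shrinks the covariate gap between groups, which is exactly what the balance metrics in the package are designed to check.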


Journal ArticleDOI
TL;DR: An overview of the capabilities of the DLNM package is offered, describing the conceptual and practical steps to specify and interpret DLNMs with an example of application to real data.
Abstract: Distributed lag non-linear models (DLNMs) represent a modeling framework to flexibly describe associations showing potentially non-linear and delayed effects in time series data. This methodology rests on the definition of a crossbasis, a bi-dimensional functional space expressed by the combination of two sets of basis functions, which specify the relationships in the dimensions of predictor and lags, respectively. This framework is implemented in the R package dlnm, which provides functions to perform the broad range of models within the DLNM family and then to help interpret the results, with an emphasis on graphical representation. This paper offers an overview of the capabilities of the package, describing the conceptual and practical steps to specify and interpret DLNMs with an example of application to real data.

1,028 citations


Journal ArticleDOI
TL;DR: poLCA is a software package for the estimation of latent class and latent class regression models for polytomous outcome variables, implemented in the R statistical computing environment using expectation-maximization and Newton-Raphson algorithms to find maximum likelihood estimates of the model parameters.
Abstract: poLCA is a software package for the estimation of latent class and latent class regression models for polytomous outcome variables, implemented in the R statistical computing environment. Both models can be called using a single simple command line. The basic latent class model is a finite mixture model in which the component distributions are assumed to be multi-way cross-classification tables with all variables mutually independent. The latent class regression model further enables the researcher to estimate the effects of covariates on predicting latent class membership. poLCA uses expectation-maximization and Newton-Raphson algorithms to find maximum likelihood estimates of the model parameters.

991 citations


Journal ArticleDOI
TL;DR: The R package topicmodels provides basic infrastructure for fitting topic models based on data structures from the text mining package tm; fitted models estimate the similarity between documents, and between a set of specified keywords, using an additional layer of latent variables.
Abstract: Topic models allow the probabilistic modeling of term frequency occurrences in documents. The fitted model can be used to estimate the similarity between documents as well as between a set of specified keywords using an additional layer of latent variables which are referred to as topics. The R package topicmodels provides basic infrastructure for fitting topic models based on data structures from the text mining package tm. The package includes interfaces to two algorithms for fitting topic models: the variational expectation-maximization algorithm provided by David M. Blei and co-authors and an algorithm using Gibbs sampling by Xuan-Hieu Phan and co-authors.

Journal ArticleDOI
TL;DR: Ice is described, an implementation in Stata of the MICE approach to multiple imputation, and real data from an observational study in ovarian cancer is used to illustrate the most important of the many options available with ice.
Abstract: Missing data are a common occurrence in real datasets. For epidemiological and prognostic factors studies in medicine, multiple imputation is becoming the standard route to estimating models with missing covariate data under a missing-at-random assumption. We describe ice, an implementation in Stata of the MICE approach to multiple imputation. Real data from an observational study in ovarian cancer are used to illustrate the most important of the many options available with ice. We remark briefly on the new database architecture and procedures for multiple imputation introduced in releases 11 and 12 of Stata.

Journal ArticleDOI
TL;DR: This article reviews the range of Markov models and their extensions which can be fitted to panel-observed data, and their implementation in the msm package for R, which is intended to be straightforward to use, flexible and comprehensively documented.
Abstract: Panel data are observations of a continuous-time process at arbitrary times, for example, visits to a hospital to diagnose disease status. Multi-state models for such data are generally based on the Markov assumption. This article reviews the range of Markov models and their extensions which can be fitted to panel-observed data, and their implementation in the msm package for R. Transition intensities may vary between individuals, or with piecewise-constant time-dependent covariates, giving an inhomogeneous Markov model. Hidden Markov models can be used for multi-state processes which are misclassified or observed only through a noisy marker. The package is intended to be straightforward to use, flexible and comprehensively documented. Worked examples are given of the use of msm to model chronic disease progression and screening. Assessment of model fit, and potential future developments of the software, are also discussed.

Journal ArticleDOI
TL;DR: This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data and focuses more specifically on the analysis and rendering of state sequences.
Abstract: This article describes the many capabilities offered by the TraMineR toolbox for categorical sequence data. It focuses more specifically on the analysis and rendering of state sequences. Addressed features include the description of sets of sequences by means of transversal aggregated views, the computation of longitudinal characteristics of individual sequences and the measure of pairwise dissimilarities. Special emphasis is put on the multiple ways of visualizing sequences. The core element of the package is the state sequence object in which we store the set of sequences together with attributes such as the alphabet, state labels and the color palette. The functions can then easily retrieve this information to ensure presentation homogeneity across all printed and graphical displays. The article also demonstrates how TraMineR’s outcomes give access to advanced analyses such as clustering and statistical modeling of sequence data.

Journal ArticleDOI
TL;DR: The lubridate package for R is presented, which facilitates working with dates and times and introduces a conceptual framework for arithmetic with date-times in R.
Abstract: This paper presents the lubridate package for R, which facilitates working with dates and times. Date-times create various technical problems for the data analyst. The paper highlights these problems and offers practical advice on how to solve them using lubridate. The paper also introduces a conceptual framework for arithmetic with date-times in R.
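One of the problems the paper has in mind shows up already in base R's own date arithmetic: stepping by "one month" from January 31 silently rolls past the end of February (a base R illustration of the pitfall; lubridate's own functions are not shown here):

```r
# Base R "add a month" pitfall: 2011-01-31 plus one month is not a
# well-defined date, and seq() rolls it over into March.
d <- seq(as.Date("2011-01-31"), by = "month", length.out = 2)
d
format(d[2], "%m")   # the second element lands in March, not February
```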

Journal ArticleDOI
TL;DR: This introduction to the R package rgenoud is a modified version of Mebane and Sekhon (2011), published in the Journal of Statistical Software; that version contains higher resolution figures.
Abstract: This introduction to the R package rgenoud is a modified version of Mebane and Sekhon (2011), published in the Journal of Statistical Software. That version of the introduction contains higher resolution figures. genoud is an R function that combines evolutionary algorithm methods with a derivative-based (quasi-Newton) method to solve difficult optimization problems. genoud may also be used for optimization problems for which derivatives do not exist. genoud solves problems that are nonlinear or perhaps even discontinuous in the parameters of the function to be optimized. When the function to be optimized (for example, a log-likelihood) is nonlinear in the model’s parameters, the function will generally not be globally concave and may have irregularities such as saddlepoints or discontinuities. Optimization methods that rely on derivatives of the objective function may be unable to find any optimum at all. Multiple local optima may exist, so that there is no guarantee that a derivative-based method will converge to the global optimum. On the other hand, algorithms that do not use derivative information (such as pure genetic algorithms) are for many problems needlessly poor at local hill climbing. Most statistical problems are regular in a neighborhood of the solution. Therefore, for some portion of the search space, derivative information is useful. The function supports parallel processing on multiple CPUs on a single machine or a cluster of computers.

Journal ArticleDOI
TL;DR: The mi package in R has features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations, and uses Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models.
Abstract: Our mi package in R has several features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations. These features include: choice of predictors, models, and transformations for chained imputation models; standard and binned residual plots for checking the fit of the conditional distributions used for imputation; and plots for comparing the distributions of observed and imputed data. In addition, we use Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models. Our goal is to have a demonstration package that (a) avoids many of the practical problems that arise with existing multivariate imputation programs, and (b) demonstrates state-of-the-art diagnostics that can be applied more generally and can be incorporated into the software of others.

Journal ArticleDOI
TL;DR: MCMCpack is introduced, an R package that contains functions to perform Bayesian inference using posterior simulation for a number of statistical models, along with some useful utility functions.
Abstract: We introduce MCMCpack, an R package that contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. In addition to code that can be used to fit commonly used models, MCMCpack also contains some useful utility functions, including some additional density functions and pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization.

Journal ArticleDOI
TL;DR: The current investigation advances the technique by developing a computational platform integrating both statistical and IRT procedures into a single program, and a Monte Carlo simulation approach was incorporated to derive empirical criteria for various DIF statistics and effect size measures.
Abstract: Logistic regression provides a flexible framework for detecting various types of differential item functioning (DIF). Previous efforts extended the framework by using item response theory (IRT) based trait scores, and by employing an iterative process using group-specific item parameters to account for DIF in the trait scores, analogous to purification approaches used in other DIF detection frameworks. The current investigation advances the technique by developing a computational platform integrating both statistical and IRT procedures into a single program. Furthermore, a Monte Carlo simulation approach was incorporated to derive empirical criteria for various DIF statistics and effect size measures. For purposes of illustration, the procedure was applied to data from a questionnaire of anxiety symptoms for detecting DIF associated with age from the Patient-Reported Outcomes Measurement Information System.

Journal ArticleDOI
TL;DR: The R package DEoptim is described which implements the differential evolution algorithm for the global optimization of a real-valued function of areal-valued parameter vector and interfaces with C code for efficiency.
Abstract: This article describes the R package DEoptim which implements the differential evolution algorithm for the global optimization of a real-valued function of a real-valued parameter vector. The implementation of differential evolution in DEoptim interfaces with C code for efficiency. The utility of the package is illustrated via case studies in fitting a Parratt model for X-ray reflectometry data and a Markov-Switching Generalized AutoRegressive Conditional Heteroskedasticity (MSGARCH) model for the returns of the Swiss Market Index.
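The differential evolution loop itself is compact enough to sketch in pure R. This is a toy version under simplifying assumptions (one mutation strategy, fixed control parameters); DEoptim's C implementation adds many strategies and refinements, and every name below is invented:

```r
# Minimal differential evolution: mutate with a scaled difference vector,
# apply binomial crossover, and keep the trial point when it improves.
de_sketch <- function(f, lower, upper, np = 40, iters = 200,
                      cr = 0.9, scale = 0.8) {
  d   <- length(lower)
  pop <- matrix(runif(np * d, lower, upper), nrow = np, ncol = d, byrow = TRUE)
  fit <- apply(pop, 1, f)
  for (g in seq_len(iters)) {
    for (i in seq_len(np)) {
      idx   <- sample(setdiff(seq_len(np), i), 3)     # three distinct partners
      trial <- pop[idx[1], ] + scale * (pop[idx[2], ] - pop[idx[3], ])
      trial <- ifelse(runif(d) < cr, trial, pop[i, ]) # binomial crossover
      trial <- pmin(pmax(trial, lower), upper)        # respect the box bounds
      ftr   <- f(trial)
      if (ftr <= fit[i]) { pop[i, ] <- trial; fit[i] <- ftr }
    }
  }
  list(par = pop[which.min(fit), ], value = min(fit))
}

set.seed(7)
sphere <- function(x) sum(x^2)
out <- de_sketch(sphere, lower = c(-5, -5), upper = c(5, 5))
```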

Journal ArticleDOI
TL;DR: The mstate package in R is developed, which covers all steps of the analysis of multi-state models, from model building and data preparation to estimation and graphical representation of the results, and is suitable for non- and semi-parametric (Cox) models.
Abstract: Multi-state models are a very useful tool to answer a wide range of questions in survival analysis that cannot, or only in a more complicated way, be answered by classical models. They are suitable for both biomedical and other applications in which time-to-event variables are analyzed. However, they are still not frequently applied. So far, an important reason for this has been the lack of available software. To overcome this problem, we have developed the mstate package in R for the analysis of multi-state models. The package covers all steps of the analysis of multi-state models, from model building and data preparation to estimation and graphical representation of the results. It can be applied to non- and semi-parametric (Cox) models. The package is also suitable for competing risks models, as they are a special category of multi-state models. This article offers guidelines for the actual use of the software by means of an elaborate multi-state analysis of data describing post-transplant events of patients with blood cancer. The data have been provided by the EBMT (the European Group for Blood and Marrow Transplantation). Special attention will be paid to the modeling of different covariate effects (the same for all transitions or transition-specific) and different baseline hazard assumptions (different for all transitions or equal for some).

Journal ArticleDOI
TL;DR: The REALCOM-IMPUTE software performs multilevel multiple imputation, handles ordinal and unordered categorical data appropriately, and may be used either as a standalone package or in conjunction with the multilevel software MLwiN or Stata.
Abstract: Multiple imputation is becoming increasingly established as the leading practical approach to modelling partially observed data, under the assumption that the data are missing at random. However, many medical and social datasets are multilevel, and this structure should be reflected not only in the model of interest, but also in the imputation model. In particular, the imputation model should reflect the differences between level 1 variables and level 2 variables (which are constant across level 1 units). This led us to develop the REALCOM-IMPUTE software, which we describe in this article. This software performs multilevel multiple imputation, and handles ordinal and unordered categorical data appropriately. It is freely available on-line, and may be used either as a standalone package, or in conjunction with the multilevel software MLwiN or Stata.

Journal ArticleDOI
TL;DR: This work attempts to provide some diagnostic information about the function, its scaling and parameter bounds, and the solution characteristics of optimx, a wrapper to consolidate many of these choices for the optimization of functions that are mostly smooth with parameters at most bounds-constrained.
Abstract: R users can often solve optimization tasks easily using the tools in the optim function in the stats package provided by default on R installations. However, there are many other optimization and nonlinear modelling tools in R or in easily installed add-on packages. These present users with a bewildering array of choices. optimx is a wrapper to consolidate many of these choices for the optimization of functions that are mostly smooth with parameters at most bounds-constrained. We attempt to provide some diagnostic information about the function, its scaling and parameter bounds, and the solution characteristics. optimx runs a battery of methods on a given problem, thus facilitating comparative studies of optimization algorithms for the problem at hand. optimx can also be a useful pedagogical tool for demonstrating the strengths and pitfalls of different classes of optimization approaches including Newton, gradient, and derivative-free methods.
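optimx's comparative idea can be mimicked with base R's optim() alone: run the same problem through several methods and compare what each reports. A sketch of the comparison, not the optimx interface; the test function is the standard Rosenbrock banana:

```r
# Run several optim() methods on one problem and collect the best
# objective value each method reaches.
rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2

methods <- c("Nelder-Mead", "BFGS", "CG")
fits    <- lapply(methods, function(m) optim(c(-1.2, 1), rosenbrock, method = m))
values  <- setNames(sapply(fits, function(fit) fit$value), methods)
values   # gradient-based BFGS gets essentially to the minimum of 0
```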


Journal ArticleDOI
TL;DR: The R package Synth implements synthetic control methods for comparative case studies designed to estimate the causal effects of policy interventions and other events of interest (Abadie and Gardeazabal 2003; Abadie, Diamond, and Hainmueller 2010).
Abstract: The R package Synth implements synthetic control methods for comparative case studies designed to estimate the causal effects of policy interventions and other events of interest (Abadie and Gardeazabal 2003; Abadie, Diamond, and Hainmueller 2010). These techniques are particularly well-suited to investigate events occurring at an aggregate level (i.e., countries, cities, regions, etc.) and affecting a relatively small number of units. Benefits and features of the Synth package are illustrated using data from Abadie and Gardeazabal (2003), which examined the economic impact of the terrorist conflict in the Basque Country.

Journal ArticleDOI
TL;DR: This paper presents the SAS/STAT MI and MIANALYZE procedures, which perform inference by multiple imputation under numerous settings, and implements popular methods for creating imputations under monotone and nonmonotone (arbitrary) patterns of missing data.
Abstract: Multiple imputation provides a useful strategy for dealing with data sets that have missing values. Instead of filling in a single value for each missing value, a multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed by using standard procedures for complete data and combining the results from these analyses. No matter which complete-data analysis is used, the process of combining results of parameter estimates and their associated standard errors from different imputed data sets is essentially the same. This process results in valid statistical inferences that properly reflect the uncertainty due to missing values. This paper reviews methods for analyzing missing data and applications of multiple imputation techniques. This paper presents the SAS/STAT MI and MIANALYZE procedures, which perform inference by multiple imputation under numerous settings. PROC MI implements popular methods for creating imputations under monotone and nonmonotone (arbitrary) patterns of missing data, and PROC MIANALYZE analyzes results from multiply imputed data sets.
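The combining step described above, Rubin's rules, is simple enough to state directly. Here is a base-R sketch of the formulas themselves (not the SAS interface; the function name is invented): the pooled estimate averages the m completed-data estimates, and the total variance adds an inflated between-imputation variance to the average within-imputation variance.

```r
# Rubin's rules for combining m completed-data analyses.
pool_rubin <- function(est, var_within) {
  m    <- length(est)
  qbar <- mean(est)                    # pooled point estimate
  ubar <- mean(var_within)             # average within-imputation variance
  b    <- var(est)                     # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b       # total variance
  c(estimate = qbar, se = sqrt(tvar))
}

pool_rubin(est = c(1.1, 0.9, 1.0), var_within = c(0.04, 0.05, 0.045))
```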

Journal ArticleDOI
TL;DR: In this article, the authors describe the R package ipw for estimating inverse probability weights, which can be used with binomial, categorical, ordinal and continuous exposure variables, and show how to use the package to fit marginal structural models through inverse probability weighting, to estimate causal effects.
Abstract: We describe the R package ipw for estimating inverse probability weights. We show how to use the package to fit marginal structural models through inverse probability weighting, to estimate causal effects. Our package can be used with data from a point treatment situation as well as with a time-varying exposure and time-varying confounders. It can be used with binomial, categorical, ordinal and continuous exposure variables.
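For a point-treatment situation, the weighting scheme can be sketched in base R with simulated data (a conceptual illustration, not the ipw API; all names and the data-generating model are invented): model the exposure, invert the fitted probabilities into weights, then fit a weighted marginal outcome model.

```r
# Point-treatment inverse probability weighting, from scratch.
set.seed(42)
n    <- 500
conf <- rnorm(n)                                  # a confounder
expo <- rbinom(n, 1, plogis(conf))                # exposure depends on conf
yout <- 2 * expo + conf + rnorm(n)                # true effect of expo is 2

ps <- fitted(glm(expo ~ conf, family = binomial)) # probability of exposure
w  <- ifelse(expo == 1, 1 / ps, 1 / (1 - ps))     # inverse probability weights

fit_msm <- lm(yout ~ expo, weights = w)           # weighted marginal model
coef(fit_msm)["expo"]                             # close to 2
```

Weighting removes the confounding that a naive unweighted comparison of exposed and unexposed subjects would suffer from.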

Journal ArticleDOI
TL;DR: In this article, the potential of the lmer function from the lme4 package in R for item response (IRT) modeling is discussed, and three broad categories of models are described: item covariate models, person covariate model, and person-by-item model.
Abstract: In this paper we elaborate on the potential of the lmer function from the lme4 package in R for item response (IRT) modeling. In line with the package, an IRT framework is described based on generalized linear mixed modeling. The aspects of the framework refer to (a) the kind of covariates -- their mode (person, item, person-by-item), and their being external vs. internal to responses, and (b) the kind of effects the covariates have -- fixed vs. random, and if random, the mode across which the effects are random (persons, items). Based on this framework, three broad categories of models are described: item covariate models, person covariate models, and person-by-item covariate models, and within each category three types of more specific models are discussed. The models in question are explained and the associated lmer code is given. Examples of models are the linear logistic test model with an error term, differential item functioning models, and local item dependency models. Because the lme4 package is for univariate generalized linear mixed models, neither the two-parameter and three-parameter models, nor the item response models for polytomous response data, can be estimated with the lmer function.

Journal ArticleDOI
TL;DR: This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage.
Abstract: Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.

Journal ArticleDOI
TL;DR: This paper presents a new software package decon for R, which contains a collection of functions that use the deconvolution kernel methods to deal with the measurement error problems, and adapts the fast Fourier transform algorithm for density estimation with error-free data to the deconvolution kernel estimation.
Abstract: Data from many scientific areas often come with measurement error. Density or distribution function estimation from contaminated data and nonparametric regression with errors-in-variables are two important topics in measurement error models. In this paper, we present a new software package decon for R, which contains a collection of functions that use the deconvolution kernel methods to deal with the measurement error problems. The functions allow the errors to be either homoscedastic or heteroscedastic. To make the deconvolution estimators computationally more efficient in R, we adapt the fast Fourier transform algorithm for density estimation with error-free data to the deconvolution kernel estimation. We discuss the practical selection of the smoothing parameter in deconvolution methods and illustrate the use of the package through both simulated and real examples.