
Showing papers in "Journal of Statistical Software in 2017"


Journal ArticleDOI
TL;DR: The lmerTest package extends the 'lmerMod' class of the lme4 package by overloading the anova and summary functions to provide p values for tests of fixed effects, and implements Satterthwaite's method for approximating degrees of freedom for the t and F tests.
Abstract: One of the frequent questions by users of the mixed model function lmer of the lme4 package has been: How can I get p values for the F and t tests for objects returned by lmer? The lmerTest package extends the 'lmerMod' class of the lme4 package by overloading the anova and summary functions to provide p values for tests of fixed effects. We have implemented Satterthwaite's method for approximating degrees of freedom for the t and F tests. We have also implemented the construction of Type I-III ANOVA tables. Furthermore, one may also obtain the summary as well as the anova table using the Kenward-Roger approximation for denominator degrees of freedom (based on the KRmodcomp function from the pbkrtest package). Some other convenient mixed model analysis tools, such as a step method that performs backward elimination of nonsignificant effects (both random and fixed), calculation of population means and multiple comparison tests together with plot facilities, are provided by the package as well.
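A minimal sketch of the workflow the abstract describes, using the sleepstudy example data from lme4:

```r
library(lmerTest)  # masks lme4::lmer with a version whose summary/anova report p values
data("sleepstudy", package = "lme4")

m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(m)                       # t tests with Satterthwaite degrees of freedom
anova(m)                         # ANOVA table with Satterthwaite denominator df
anova(m, ddf = "Kenward-Roger")  # Kenward-Roger df via pbkrtest
step(m)                          # backward elimination of random and fixed effects
```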

12,305 citations


Journal ArticleDOI
TL;DR: Stan, as discussed by the authors, is a probabilistic programming language for specifying statistical models, in which a program imperatively defines a log probability function over parameters conditioned on specified data and constants; the resulting densities, gradients, and Hessians can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration.
Abstract: Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
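A hedged sketch of calling Stan from R through rstan; the model below (a normal mean and scale) and the simulated data are illustrative only:

```r
library(rstan)

code <- "
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}
"
fit <- stan(model_code = code,
            data = list(N = 100, y = rnorm(100, mean = 2, sd = 1)),
            iter = 2000, chains = 4)  # NUTS sampling by default
print(fit)
```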

4,947 citations


Journal ArticleDOI
TL;DR: The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan, allowing users to fit linear, robust linear, binomial, Poisson, survival, ordinal, zero-inflated, hurdle, and even non-linear models, all in a multilevel context.
Abstract: The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan. A wide range of distributions and link functions are supported, allowing users to fit - among others - linear, robust linear, binomial, Poisson, survival, ordinal, zero-inflated, hurdle, and even non-linear models all in a multilevel context. Further modeling options include autocorrelation of the response variable, user defined covariance structures, censored data, as well as meta-analytic standard errors. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fit can easily be assessed and compared with the Watanabe-Akaike information criterion and leave-one-out cross-validation.
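A minimal sketch using the epilepsy data shipped with brms:

```r
library(brms)

fit <- brm(count ~ zAge + zBase * Trt + (1 | patient),
           data = epilepsy, family = poisson())
summary(fit)
loo(fit)    # leave-one-out cross-validation
waic(fit)   # Watanabe-Akaike information criterion
```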

4,353 citations


Journal ArticleDOI
TL;DR: It is shown that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.
Abstract: We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.
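A minimal sketch on the built-in iris data:

```r
library(ranger)

rf <- ranger(Species ~ ., data = iris, num.trees = 500,
             importance = "impurity")
rf$prediction.error                         # out-of-bag classification error
head(predict(rf, data = iris)$predictions)  # fitted class labels
```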

1,512 citations


Journal ArticleDOI
TL;DR: This paper is a companion to the R package lcmm, introducing each family of models, the estimation technique and some implementation details, and giving examples through a dataset on cognitive aging.
Abstract: The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event outcome that can be possibly left-truncated, right-censored and defined in a competing risks setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the parameters and likelihood stability, and on the negativity of the second derivatives. The package also provides various post-fit functions including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment. This paper constitutes a companion paper to the package by introducing each family of models, the estimation technique, some implementation details and giving examples through a dataset on cognitive aging.
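A hedged sketch of fitting and comparing latent class mixed models; the long-format data frame d, with outcome Y, time variable Time and subject identifier ID, is hypothetical:

```r
library(lcmm)

m1 <- hlme(Y ~ Time, random = ~ Time, subject = "ID", ng = 1, data = d)
m2 <- hlme(Y ~ Time, mixture = ~ Time, random = ~ Time,
           subject = "ID", ng = 2, data = d)
summarytable(m1, m2)  # compare log-likelihood, BIC and class sizes
postprob(m2)          # posterior classification quality
```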

470 citations



Journal ArticleDOI
TL;DR: The gdistance package, as discussed by the authors, provides classes and functions to calculate various distance measures and routes in heterogeneous geographic spaces represented as grids, including least-cost distances as well as more complex distances based on (constrained) random walks.
Abstract: The R package gdistance provides classes and functions to calculate various distance measures and routes in heterogeneous geographic spaces represented as grids. Least-cost distances as well as more complex distances based on (constrained) random walks can be calculated. Also the corresponding routes or probabilities of passing each cell can be determined. The package implements classes to store the data about the probability or cost of transitioning from one cell to another on a grid in a memory-efficient sparse format. These classes make it possible to manipulate the values of cell-to-cell movement directly, which offers flexibility and the possibility to use asymmetric values. The novel distances implemented in the package are used in geographical genetics (applying circuit theory), but also have applications in other fields of geospatial analysis.
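A hedged sketch of a least-cost computation; the random conductance surface and the coordinates are illustrative only:

```r
library(raster)
library(gdistance)

r <- raster(nrows = 50, ncols = 50, xmn = 0, xmx = 50, ymn = 0, ymx = 50)
values(r) <- runif(ncell(r))                  # conductance values per cell
tr <- transition(r, transitionFunction = mean, directions = 8)
tr <- geoCorrection(tr, type = "c")           # correct for diagonal distances
pts <- rbind(c(5, 5), c(45, 45))
costDistance(tr, pts)                         # least-cost distance matrix
shortestPath(tr, pts[1, ], pts[2, ], output = "SpatialLines")
```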

341 citations


Journal ArticleDOI
TL;DR: The tree-based TVCM algorithm and its implementation in the R package vcrpart are introduced for generalized linear models to learn whether and how the coefficients of a regression model vary by moderating variables.
Abstract: The tree-based TVCM algorithm and its implementation in the R package vcrpart are introduced for generalized linear models. The purpose of TVCM is to learn whether and how the coefficients of a regression model vary by moderating variables. A separate partition is built for each potentially varying coefficient, allowing the user to specify coefficient-specific sets of potential moderators, and allowing the algorithm to select moderators individually by coefficient. In addition to describing the algorithm, the TVCM is evaluated using a benchmark comparison and a simulation study and the R commands are demonstrated by means of empirical applications.
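A hedged sketch of the vc() term syntax; the data frame d and all variable names are hypothetical, so treat this as an illustration of the interface rather than a reproducible example:

```r
library(vcrpart)

## let the intercept and the coefficient of x vary by moderators z1 and z2
m <- tvcglm(y ~ -1 + vc(z1, z2) + vc(z1, z2, by = x),
            data = d, family = binomial())
plot(m, type = "coef")  # partition-wise coefficient plots
```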

264 citations


Journal ArticleDOI
TL;DR: An arm-based network meta-analysis method has been proposed, and the R package pcnetmeta provides user-friendly functions for its implementation, which estimates both absolute and relative effects, and can handle binary, continuous, and count outcomes.
Abstract: Network meta-analysis is a powerful approach for synthesizing direct and indirect evidence about multiple treatment comparisons from a collection of independent studies. At present, the most widely used method in network meta-analysis is contrast-based, in which a baseline treatment needs to be specified in each study, and the analysis focuses on modeling relative treatment effects (typically log odds ratios). However, population-averaged treatment-specific parameters, such as absolute risks, cannot be estimated by this method without an external data source or a separate model for a reference treatment. Recently, an arm-based network meta-analysis method has been proposed, and the R package pcnetmeta provides user-friendly functions for its implementation. This package estimates both absolute and relative effects, and can handle binary, continuous, and count outcomes.

189 citations


Journal ArticleDOI
TL;DR: The ASMap linkage map construction R package is introduced which contains functions that use the efficient MSTmap algorithm for clustering and optimally ordering large sets of markers.
Abstract: Although various forms of linkage map construction software are widely available, there is a distinct lack of packages for use in the R statistical computing environment (R Core Team 2017). This article introduces the ASMap linkage map construction R package, which contains functions that use the efficient MSTmap algorithm (Wu, Bhat, Close, and Lonardi 2008) for clustering and optimally ordering large sets of markers. In addition to the construction functions, the package also contains a suite of tools to assist in the rapid diagnosis and repair of a constructed linkage map. The package functions can also be used for post-construction techniques such as fine mapping or combining maps of the same population. To showcase the efficiency and functionality of ASMap, the complete linkage map construction process is demonstrated with a high density barley backcross marker data set.
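A hedged sketch, assuming the unconstructed barley backcross marker set distributed with the package as mapBCu:

```r
library(ASMap)

data(mapBCu)                          # unconstructed backcross markers
map <- mstmap(mapBCu, dist.fun = "kosambi", trace = TRUE)
profileMark(map, stat.type = c("seg.dist", "miss"))  # marker diagnostics
```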

185 citations


Journal ArticleDOI
TL;DR: A new package in R, hnp, is described that may be used to generate the half-normal plot with a simulated envelope for residuals from different types of models; its use is illustrated on a range of examples, including continuous and discrete responses, showing how it can be used to inform model selection and diagnose overdispersion.
Abstract: Count and proportion data may present overdispersion, i.e., greater variability than expected by the Poisson and binomial models, respectively. Different extended generalized linear models that allow for overdispersion may be used to analyze this type of data, such as models that use a generalized variance function, random-effects models, zero-inflated models and compound distribution models. Assessing goodness-of-fit and verifying assumptions of these models is not an easy task, and the use of half-normal plots with a simulated envelope is a possible solution for this problem. These plots are a useful indicator of goodness-of-fit that may be used with any generalized linear model and extensions. For GLIM users, functions that generated these plots were widely used; however, in the open-source software R, these functions were not yet available on the Comprehensive R Archive Network (CRAN). We describe a new package in R, hnp, that may be used to generate the half-normal plot with a simulated envelope for residuals from different types of models. The function hnp() can be used together with a range of different model fitting packages in R that extend the basic generalized linear model fitting in glm() and is written so that it is relatively easy to extend it to new model classes and different diagnostics. We illustrate its use on a range of examples, including continuous and discrete responses, and show how it can be used to inform model selection and diagnose overdispersion.
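A minimal sketch: a half-normal plot with simulated envelope for a Poisson GLM fitted to simulated data:

```r
library(hnp)

d <- data.frame(x = runif(100))
d$y <- rpois(100, lambda = exp(1 + 2 * d$x))
fit <- glm(y ~ x, family = poisson, data = d)
hnp(fit, sim = 99)  # many points outside the envelope would suggest lack of fit
```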

Journal ArticleDOI
TL;DR: Package httk enables the inclusion of toxicokinetics in the statistical analysis of chemicals undergoing high-throughput screening, and contains tools for Monte Carlo sampling and reverse dosimetry along with functions for the analysis of concentration vs. time simulations.
Abstract: Thousands of chemicals have been profiled by high-throughput screening programs such as ToxCast and Tox21; these chemicals are tested in part because most of them have limited or no data on hazard, exposure, or toxicokinetics. Toxicokinetic models aid in predicting tissue concentrations resulting from chemical exposure, and a "reverse dosimetry" approach can be used to predict exposure doses sufficient to cause tissue concentrations that have been identified as bioactive by high-throughput screening. We have created four toxicokinetic models within the R software package httk. These models are designed to be parameterized using high-throughput in vitro data (plasma protein binding and hepatic clearance), as well as structure-derived physicochemical properties and species-specific physiological data. The package contains tools for Monte Carlo sampling and reverse dosimetry along with functions for the analysis of concentration vs. time simulations. The package can currently use human in vitro data to make predictions for 553 chemicals in humans, rats, mice, dogs, and rabbits, including 94 pharmaceuticals and 415 ToxCast chemicals. For 67 of these chemicals, the package includes rat-specific in vitro data. This package is structured to be augmented with additional chemical data as they become available. Package httk enables the inclusion of toxicokinetics in the statistical analysis of chemicals undergoing high-throughput screening.
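A hedged sketch of forward simulation and reverse dosimetry; "Bisphenol A" is assumed here to be among the chemicals with in vitro data in httk:

```r
library(httk)

# plasma concentration vs. time for repeated oral dosing
out <- solve_pbtk(chem.name = "Bisphenol A", days = 5, doses.per.day = 1)
head(out)

# Monte Carlo steady-state concentration and reverse dosimetry
calc_mc_css(chem.name = "Bisphenol A", which.quantile = 0.95)
calc_mc_oral_equiv(conc = 1, chem.name = "Bisphenol A")  # dose giving a 1 uM Css
```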

Journal ArticleDOI
TL;DR: The medflex package offers a set of ready-made functions for fitting natural effect models, a novel class of causal models which directly parameterize the path-specific effects of interest, thereby adding flexibility to existing software packages for mediation analysis, in particular with respect to hypothesis testing and parsimony.
Abstract: Mediation analysis is routinely adopted by researchers from a wide range of applied disciplines as a statistical tool to disentangle the causal pathways by which an exposure or treatment affects an outcome. The counterfactual framework provides a language for clearly defining path-specific effects of interest and has fostered a principled extension of mediation analysis beyond the context of linear models. This paper describes medflex, an R package that implements some recent developments in mediation analysis embedded within the counterfactual framework. The medflex package offers a set of ready-made functions for fitting natural effect models, a novel class of causal models which directly parameterize the path-specific effects of interest, thereby adding flexibility to existing software packages for mediation analysis, in particular with respect to hypothesis testing and parsimony. In this paper, we give a comprehensive overview of the functionalities of the medflex package.
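A hedged sketch following the UPBdata example shipped with medflex (exposure att, mediator negaff, outcome UPB):

```r
library(medflex)

# expand the data by imputing counterfactual (nested) outcomes
expData <- neImpute(UPB ~ att + negaff + gender + educ + age,
                    family = binomial, data = UPBdata)
# fit the natural effect model on the expanded data
neMod <- neModel(UPB ~ att0 + att1 + gender + educ + age,
                 family = binomial, expData = expData, se = "robust")
summary(neMod)
neEffdecomp(neMod)  # natural direct and indirect effects
```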

Journal ArticleDOI
TL;DR: gmnl, as mentioned in this paper, is an R package for estimating multinomial logit models with unobserved heterogeneity across individuals for cross-sectional and panel (longitudinal) data, allowing the parameters to vary randomly over individuals according to a continuous, discrete, or discrete-continuous mixture distribution, which must be chosen a priori by the researcher.
Abstract: This paper introduces the package gmnl in R for estimation of multinomial logit models with unobserved heterogeneity across individuals for cross-sectional and panel (longitudinal) data. Unobserved heterogeneity is modeled by allowing the parameters to vary randomly over individuals according to a continuous, discrete, or discrete-continuous mixture distribution, which must be chosen a priori by the researcher. In particular, the models supported by gmnl are the multinomial or conditional logit, the mixed multinomial logit, the scale heterogeneity multinomial logit, the generalized multinomial logit, the latent class logit, and the mixed-mixed multinomial logit. These models are estimated using either the maximum likelihood estimator or the maximum simulated likelihood estimator. This article describes and illustrates with real databases all functionalities of gmnl, including the derivation of individual conditional estimates of both the random parameters and willingness-to-pay measures.
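A hedged sketch of a mixed logit fit; the Electricity data come from package mlogit, and gmnl expects data prepared with mlogit.data():

```r
library(gmnl)
library(mlogit)

data("Electricity", package = "mlogit")
Electr <- mlogit.data(Electricity, id.var = "id", choice = "choice",
                      varying = 3:26, shape = "wide", sep = "")
mixl <- gmnl(choice ~ pf + cl + loc + wk | 0, data = Electr,
             model = "mixl", ranp = c(cl = "n", loc = "n", wk = "n"),
             R = 50, panel = TRUE)  # normally distributed random coefficients
summary(mixl)
```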

Journal ArticleDOI
TL;DR: This paper is a guide to the spatio-temporal modeling of epidemic phenomena, exemplified by analyses of public health surveillance data on measles and invasive meningococcal disease.
Abstract: The availability of geocoded health data and the inherent temporal structure of communicable diseases have led to an increased interest in statistical models and software for spatio-temporal data with epidemic features. The open source R package surveillance can handle various levels of aggregation at which infective events have been recorded: individual-level time-stamped geo-referenced data (case reports) in either continuous space or discrete space, as well as counts aggregated by period and region. For each of these data types, the surveillance package implements tools for visualization, likelihood inference and simulation from recently developed statistical regression frameworks capturing endemic and epidemic dynamics. Altogether, this paper is a guide to the spatio-temporal modeling of epidemic phenomena, exemplified by analyses of public health surveillance data on measles and invasive meningococcal disease.
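A hedged sketch of an endemic-epidemic hhh4() count model on the measlesWeserEms data (an 'sts' object of weekly counts by district) shipped with surveillance:

```r
library(surveillance)

data("measlesWeserEms")
fit <- hhh4(measlesWeserEms,
            control = list(ar  = list(f = ~ 1),   # epidemic, within district
                           ne  = list(f = ~ 1),   # epidemic, between districts
                           end = list(f = addSeason2formula(~ 1, period = 52)),
                           family = "NegBin1"))
summary(fit)
plot(fit, type = "fitted")
```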

Journal ArticleDOI
TL;DR: The R package tscount provides likelihood-based estimation methods for analysis and modeling of count time series following generalized linear models, a flexible class of models which can describe serial correlation in a parsimonious way.
Abstract: The R package tscount provides likelihood-based estimation methods for analysis and modeling of count time series following generalized linear models. This is a flexible class of models which can describe serial correlation in a parsimonious way. The conditional mean of the process is linked to its past values, to past observations and to potential covariate effects. The package allows for models with the identity and with the logarithmic link function. The conditional distribution can be Poisson or negative binomial. An important special case of this class is the so-called INGARCH model and its log-linear extension. The package includes methods for model fitting and assessment, prediction and intervention analysis. This paper summarizes the theoretical background of these methods. It gives details on the implementation of the package and provides simulation results for models which have not been studied theoretically before. The usage of the package is illustrated by two data examples. Additionally, we provide a review of R packages which can be used for count time series analysis. This includes a detailed comparison of tscount to those packages.
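A minimal sketch on the campylobacter infection counts shipped with tscount:

```r
library(tscount)

data("campy")
fit <- tsglm(campy, model = list(past_obs = 1, past_mean = 13),
             distr = "nbinom")  # an INGARCH-type model with feedback
summary(fit)
predict(fit, n.ahead = 4)
```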

Journal ArticleDOI
TL;DR: In this article, an R package for continuous time structural equation modeling of panel (N > 1) and time series (N = 1) data, using full information maximum likelihood, is presented.
Abstract: We introduce ctsem, an R package for continuous time structural equation modeling of panel (N > 1) and time series (N = 1) data, using full information maximum likelihood. Most dynamic models (e.g., cross-lagged panel models) in the social and behavioural sciences are discrete time models. An assumption of discrete time models is that time intervals between measurements are equal, and that all subjects were assessed at the same intervals. Violations of this assumption are often ignored due to the difficulty of accounting for varying time intervals, therefore parameter estimates can be biased and the time course of effects becomes ambiguous. By using stochastic differential equations to estimate an underlying continuous process, continuous time models allow for any pattern of measurement occasions. By interfacing to OpenMx, ctsem combines the flexible specification of structural equation models with the enhanced data gathering opportunities and improved estimation of continuous time models. ctsem can estimate relationships over time for multiple latent processes, measured by multiple noisy indicators with varying time intervals between observations. Within and between effects are estimated simultaneously by modeling both observed covariates and unobserved heterogeneity. Exogenous shocks with different shapes, group differences, higher order diffusion effects and oscillating processes can all be simply modeled. We first introduce and define continuous time models, then show how to specify and estimate a range of continuous time models using ctsem.
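A hedged sketch of specifying and fitting a bivariate model; datawide stands for a hypothetical wide-format data set laid out as ctsem expects (Tpoints observation columns per process plus time interval columns):

```r
library(ctsem)

model <- ctModel(n.latent = 2, n.manifest = 2, Tpoints = 5,
                 LAMBDA = diag(2),              # identity measurement model
                 manifestNames = c("y1", "y2"),
                 latentNames = c("eta1", "eta2"))
fit <- ctFit(dat = datawide, ctmodelobj = model)
summary(fit)
```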

Journal ArticleDOI
TL;DR: The new R package that was created to address the shortcomings of existing tools has functions to create informative ROC curve plots, with sensible defaults and a simple interface, for use in print or as an interactive web-based plot.
Abstract: Plots of the receiver operating characteristic (ROC) curve are ubiquitous in medical research. Designed to simultaneously display the operating characteristics at every possible value of a continuous diagnostic test, ROC curves are used in oncology to evaluate screening, diagnostic, prognostic and predictive biomarkers. I reviewed a sample of ROC curve plots from the major oncology journals in order to assess current trends in usage and design elements. My review suggests that ROC curve plots are often ineffective as statistical charts and that poor design obscures the relevant information the chart is intended to display. I describe my new R package that was created to address the shortcomings of existing tools. The package has functions to create informative ROC curve plots, with sensible defaults and a simple interface, for use in print or as an interactive web-based plot. A web application was developed to reach a broader audience of scientists who do not use R.
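The package described here is plotROC; a hedged sketch of its ggplot2 interface, with a simulated binary outcome D and continuous marker M:

```r
library(plotROC)
library(ggplot2)

d <- data.frame(D = rbinom(200, 1, 0.5))
d$M <- rnorm(200, mean = d$D)
p <- ggplot(d, aes(d = D, m = M)) + geom_roc() + style_roc()
p                         # print-quality ROC curve
plot_interactive_roc(p)   # interactive web-based version
```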

Journal ArticleDOI
TL;DR: The SES algorithm, its implementation, and examples of use of the SES function in R are presented, and initial evidence is provided that SES and LASSO perform comparably well in terms of predictive accuracy.
Abstract: The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian networks. Most of the currently available feature selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple, predictive feature subsets whose performances are statistically equivalent. In that respect the SES algorithm subsumes and extends previous feature selection algorithms, like the max-min parents and children algorithm. The SES algorithm is implemented in a homonymous function included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm and its implementation, and provide examples of use of the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real world data.
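A minimal sketch on simulated data; only the first two of 50 candidate features are truly predictive:

```r
library(MXM)

x <- matrix(rnorm(100 * 50), nrow = 100)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)
res <- SES(target = y, dataset = x, max_k = 3, threshold = 0.05,
           test = "testIndFisher")
res@selectedVars  # one selected signature
res@signatures    # all statistically equivalent signatures
```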

Journal ArticleDOI
TL;DR: The R package npmv, as discussed by the authors, performs nonparametric inference for the comparison of multivariate data samples and provides the results in easy-to-understand, but statistically correct, language; it can be used for low- or high-dimensional data with small or large sample sizes and many or few factor levels.
Abstract: We introduce the R package npmv that performs nonparametric inference for the comparison of multivariate data samples and provides the results in easy-to-understand, but statistically correct, language. Unlike in classical multivariate analysis of variance, multivariate normality is not required for the data. In fact, the different response variables may even be measured on different scales (binary, ordinal, quantitative). p values are calculated for overall tests (permutation tests and F approximations), and, using multiple testing algorithms which control the familywise error rate, significant subsets of response variables and factor levels are identified. The package may be used for low- or high-dimensional data with small or large sample sizes and many or few factor levels.
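A hedged sketch, assuming the strawberry data set distributed with npmv as sberry; the response variables are separated by '|' on the left-hand side of the formula:

```r
library(npmv)

data(sberry)
nonpartest(weight | bot | fungi | rating ~ treatment, data = sberry,
           permreps = 1000)
ssnonpartest(weight | bot | fungi | rating ~ treatment, data = sberry,
             alpha = 0.05, factors.and.variables = TRUE)
```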

Journal ArticleDOI
TL;DR: The R package icenReg is introduced, which contains fast, reliable algorithms for fitting the non-parametric maximum likelihood estimator and semi-parametric regression models, the fundamental estimators for interval censored data.
Abstract: The non-parametric maximum likelihood estimator and semi-parametric regression models are fundamental estimators for interval censored data, along with standard fully parametric regression models. The R package icenReg is introduced, which contains fast, reliable algorithms for fitting these models. In addition, the package contains functions for imputation of the censored response variables and diagnostics of both regression effects and the baseline distribution.
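A hedged sketch using the miceData interval censored data shipped with icenReg (interval bounds l and u, group grp):

```r
library(icenReg)

data(miceData)
np <- ic_np(cbind(l, u) ~ grp, data = miceData)   # NPMLE per group
sp <- ic_sp(cbind(l, u) ~ grp, data = miceData,
            model = "ph", bs_samples = 50)        # semi-parametric PH model
summary(sp)
```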

Journal ArticleDOI
TL;DR: This paper illustrates how backtracking is implemented in recent versions of the bnlearn R package and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed; it also describes a software architecture and framework that can be used to parallelize constraint-based structure learning algorithms.
Abstract: It is well known in the literature that the problem of learning the structure of Bayesian networks is very hard to tackle: Its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios. Efficient implementations of score-based structure learning benefit from past and current research in optimization theory, which can be adapted to the task by using the network score as the objective function to maximize. This is not true for approaches based on conditional independence tests, called constraint-based learning algorithms. The only optimization in widespread use, backtracking, leverages the symmetries implied by the definitions of neighborhood and Markov blanket. In this paper we illustrate how backtracking is implemented in recent versions of the bnlearn R package, and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed. As an alternative, we describe a software architecture and framework that can be used to parallelize constraint-based structure learning algorithms (also implemented in bnlearn) and we demonstrate its performance using four reference networks and two real-world data sets from genetics and systems biology. We show that on modern multi-core or multiprocessor hardware parallel implementations are preferable over backtracking, which was developed when single-processor machines were the norm.
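A minimal sketch of constraint-based learning in parallel, using the learning.test data shipped with bnlearn:

```r
library(bnlearn)
library(parallel)

data(learning.test)
cl <- makeCluster(2)                      # two worker processes
dag <- gs(learning.test, cluster = cl)    # Grow-Shrink, tests run in parallel
stopCluster(cl)
dag
```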

Journal ArticleDOI
TL;DR: This paper presents the R packages JADE and BSSasymp; the latter contains functions for computing the asymptotic covariance matrices, as well as their data-based estimates, for most of the BSS estimators included in package JADE.
Abstract: Blind source separation (BSS) is a well-known signal processing tool which is used to solve practical data analysis problems in various fields of science. In BSS, we assume that the observed data consists of linear mixtures of latent variables. The mixing system and the distributions of the latent variables are unknown. The aim is to find an estimate of an unmixing matrix which then transforms the observed data back to latent sources. In this paper we present the R packages JADE and BSSasymp. The package JADE offers several BSS methods which are based on joint diagonalization. Package BSSasymp contains functions for computing the asymptotic covariance matrices as well as their data-based estimates for most of the BSS estimators included in package JADE. Several simulated and real datasets are used to illustrate the functions in these two packages.
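A minimal sketch: recover three independent non-Gaussian sources from a random linear mixture and score the result with the minimum distance index:

```r
library(JADE)

S <- cbind(rexp(1000), runif(1000), rt(1000, df = 5))  # latent sources
A <- matrix(rnorm(9), 3, 3)                            # unknown mixing matrix
X <- S %*% t(A)                                        # observed mixtures
res <- JADE(X, n.comp = 3)
MD(coef(res), A)  # 0 = perfect separation up to sign, scale and permutation
```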

Journal ArticleDOI
TL;DR: The methodology and implementation of functions to estimate the main econometric methods of balanced and unbalanced panel data analysis are described, and their use is illustrated with well-known examples.
Abstract: Panel Data Toolbox is a new package for MATLAB that includes functions to estimate the main econometric methods of balanced and unbalanced panel data analysis. The package includes code for the standard fixed, between and random effects estimation methods, as well as for the existing instrumental panels and a wide array of spatial panels. A full set of relevant tests is also included. This paper describes the methodology and implementation of the functions and illustrates their use with well-known examples. We perform numerical checks against other popular commercial and free software to show the validity of the results.

Journal ArticleDOI
TL;DR: The saemix package for R provides maximum likelihood estimates of parameters in nonlinear mixed effect models, using a modern and efficient estimation algorithm, the stochastic approximation expectation-maximisation (SAEM) algorithm, bringing a new estimation tool to the R community.
Abstract: The saemix package for R provides maximum likelihood estimates of parameters in nonlinear mixed effect models, using a modern and efficient estimation algorithm, the stochastic approximation expectation-maximisation (SAEM) algorithm. In the present paper we describe the main features of the package, and apply it to several examples to illustrate its use. Making use of S4 classes and methods to provide user-friendly interaction, this package provides a new estimation tool to the R community.
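A hedged sketch of the theophylline pharmacokinetic example distributed with saemix (a one-compartment model with first-order absorption):

```r
library(saemix)

data(theo.saemix)
smx.data <- saemixData(name.data = theo.saemix, name.group = "Id",
                       name.predictors = c("Dose", "Time"),
                       name.response = "Concentration")
model1cpt <- function(psi, id, xidep) {
  dose <- xidep[, 1]; tim <- xidep[, 2]
  ka <- psi[id, 1]; V <- psi[id, 2]; CL <- psi[id, 3]
  k <- CL / V
  dose * ka / (V * (ka - k)) * (exp(-k * tim) - exp(-ka * tim))
}
smx.model <- saemixModel(model = model1cpt,
                         description = "One-compartment model, first-order absorption",
                         psi0 = matrix(c(1, 20, 0.5), ncol = 3,
                                       dimnames = list(NULL, c("ka", "V", "CL"))))
fit <- saemix(smx.model, smx.data)
plot(fit)
```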

Journal ArticleDOI
TL;DR: A new R package, PrevMap, is introduced for the analysis of spatially referenced prevalence data, including both classical maximum likelihood and Bayesian approaches to parameter estimation and plug-in or Bayesian prediction, with fitting of geostatistical models for binomial data based on two distinct approaches.
Abstract: In this paper we introduce a new R package, PrevMap, for the analysis of spatially referenced prevalence data, including both classical maximum likelihood and Bayesian approaches to parameter estimation and plug-in or Bayesian prediction. More specifically, the new package implements fitting of geostatistical models for binomial data, based on two distinct approaches. The first approach uses a generalized linear mixed model with logistic link function, binomial error distribution and a Gaussian spatial process as a stochastic component in the linear predictor. A simpler, but approximate, alternative approach consists of fitting a linear Gaussian model to empirical-logit-transformed data. The package also includes implementations of convolution-based low-rank approximations to the Gaussian spatial process to enable computationally efficient analysis of large spatial datasets. We illustrate the use of the package through the analysis of Loa loa prevalence data from Cameroon and Nigeria. We illustrate the use of the low rank approximation using a simulated geostatistical dataset.

Journal ArticleDOI
TL;DR: The ltmle package provides methods to estimate intervention-specific means and measures of association including the average treatment effect, causal odds ratio and causal risk ratio and parameters of a longitudinal working marginal structural model.
Abstract: In recent years, targeted minimum loss-based estimation methodology has been used to develop estimators of parameters in longitudinal data structures (Gruber and van der Laan 2012; Petersen, Schwab, Gruber, Blaser, Schomaker, and van der Laan 2014; Schnitzer, Moodie, van der Laan, Platt, and Klein 2013). These methods are implemented in the ltmle package for R. The ltmle package provides methods to estimate intervention-specific means and measures of association, including the average treatment effect, causal odds ratio and causal risk ratio, and parameters of a longitudinal working marginal structural model. The package allows for multiple time point treatments, time-varying covariates and right censoring of the outcome. In this paper we describe the usage of the ltmle package and provide examples.
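A hedged sketch with simulated single-time-point data (confounder W, treatment A, outcome Y; the columns must follow the time ordering):

```r
library(ltmle)

n <- 500
W <- rnorm(n)
A <- rbinom(n, 1, plogis(W))
Y <- rbinom(n, 1, plogis(W + A))
d <- data.frame(W, A, Y)
fit <- ltmle(d, Anodes = "A", Ynodes = "Y", abar = list(1, 0))
summary(fit)  # treatment-specific means and average treatment effect
```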

Journal ArticleDOI
TL;DR: A framework based on high-level wrapper functions for most common usage and basic computational elements to be combined at will, coupling user-friendliness with flexibility, is integrated in the plm package for panel data econometrics in R.
Abstract: The different robust estimators for the standard errors of panel models used in applied econometric practice can all be written and computed as combinations of the same simple building blocks. A framework based on high-level wrapper functions for most common usage and basic computational elements to be combined at will, coupling user-friendliness with flexibility, is integrated in the plm package for panel data econometrics in R. Statistical motivation and computational approach are reviewed, and applied examples are provided.
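A minimal sketch: a within (fixed effects) model on the Grunfeld data with Arellano-type cluster-robust standard errors:

```r
library(plm)
library(lmtest)

data("Grunfeld", package = "plm")
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")
coeftest(fe, vcov = vcovHC(fe, method = "arellano", cluster = "group"))
```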

ReportDOI
TL;DR: An implementation contained in the R package REBayes is described with applications to a wide variety of mixture settings: Gaussian location and scale, Poisson and binomial mixtures for discrete data, Weibull and Gompertz models for survival data, and several Gaussian models intended for longitudinal data.
Abstract: Models of unobserved heterogeneity, or frailty as it is commonly known in survival analysis, can often be formulated as semiparametric mixture models and estimated by maximum likelihood as proposed by Robbins (1950) and elaborated by Kiefer and Wolfowitz (1956). Recent developments in convex optimization, as noted by Koenker and Mizera (2014b), have led to dramatic improvements in computational methods for such models. In this vignette we describe an implementation contained in the R package REBayes with applications to a wide variety of mixture settings: Gaussian location and scale, Poisson and binomial mixtures for discrete data, Weibull and Gompertz models for survival data, and several Gaussian models intended for longitudinal data. While the dimension of the nonparametric heterogeneity of these models is inherently limited by our present gridding strategy, we describe how additional fixed parameters can be relatively easily accommodated via profile likelihood. We also describe some nonparametric maximum likelihood methods for shape and norm constrained density estimation that employ related computational methods.
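A hedged sketch of the Kiefer-Wolfowitz NPMLE for a Gaussian location mixture; note that REBayes relies on the Rmosek interface to the MOSEK solver:

```r
library(REBayes)

x <- c(rnorm(500, mean = -2), rnorm(500, mean = 2))  # two-component mixture
f <- GLmix(x)            # NPMLE of the mixing distribution
plot(f, xlab = "mean")   # estimated (near-discrete) mixing density
```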

Journal ArticleDOI
TL;DR: The R package KFAS is described for state space modeling with the observations from an exponential family, namely Gaussian, Poisson, binomial, negative binomial and gamma distributions.
Abstract: State space modeling is an efficient and flexible method for statistical inference of a broad class of time series and other data. This paper describes the R package KFAS for state space modeling with the observations from an exponential family, namely Gaussian, Poisson, binomial, negative binomial and gamma distributions.
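A minimal sketch: a Gaussian local level model for the Nile data, with both variances estimated by maximum likelihood:

```r
library(KFAS)

model <- SSModel(Nile ~ SSMtrend(degree = 1, Q = list(matrix(NA))),
                 H = matrix(NA))
fit <- fitSSM(model, inits = c(log(var(Nile)), log(var(Nile))),
              method = "BFGS")
out <- KFS(fit$model)                   # Kalman filtering and smoothing
ts.plot(Nile, out$alphahat, col = 1:2)  # data and smoothed level
```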