
Showing papers in "Journal of Statistical Software in 2014"


Journal ArticleDOI
TL;DR: The mediation package implements a comprehensive suite of statistical tools for conducting causal mediation analysis in applied empirical research and implements a statistical method for dealing with multiple (causally dependent) mediators, which are often encountered in practice.
Abstract: In this paper, we describe the R package mediation for conducting causal mediation analysis in applied empirical research. In many scientific disciplines, the goal of researchers is not only estimating causal effects of a treatment but also understanding the process by which the treatment causally affects the outcome. Causal mediation analysis is frequently used to assess potential causal mechanisms. The mediation package implements a comprehensive suite of statistical tools for conducting such an analysis. The package is organized into two distinct approaches. Using the model-based approach, researchers can estimate causal mediation effects and conduct sensitivity analysis under the standard research design. Furthermore, the design-based approach provides several analysis tools that are applicable under different experimental designs. This approach requires weaker assumptions than the model-based approach. We also implement a statistical method for dealing with multiple (causally dependent) mediators, which are often encountered in practice. Finally, the package also offers a methodology for assessing causal mediation in the presence of treatment noncompliance, a common problem in randomized trials.

2,417 citations
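
A minimal sketch of the model-based workflow, using the package's mediate() and medsens() functions and its bundled framing data (details follow the package's own examples):

    library(mediation)
    data(framing)

    # Stage 1: model for the mediator (emotional response to a media cue)
    med.fit <- lm(emo ~ treat + age + educ + gender + income, data = framing)

    # Stage 2: model for the outcome (messaging Congress), given the mediator
    out.fit <- glm(cong_mesg ~ emo + treat + age + educ + gender + income,
                   family = binomial("probit"), data = framing)

    # Average causal mediation effects via quasi-Bayesian Monte Carlo
    med.out <- mediate(med.fit, out.fit, treat = "treat", mediator = "emo",
                       sims = 1000)
    summary(med.out)

    # Sensitivity analysis for unmeasured mediator-outcome confounding
    summary(medsens(med.out, rho.by = 0.1))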


Journal ArticleDOI
TL;DR: The R package NbClust provides 30 indices which determine the number of clusters in a data set and also offers the user the best clustering scheme from the different results.
Abstract: Clustering is the partitioning of a set of objects into groups (clusters) so that objects within a group are more similar to each other than objects in different groups. Most of the clustering algorithms depend on some assumptions in order to define the subgroups present in a data set. As a consequence, the resulting clustering scheme requires some sort of evaluation as regards its validity. The evaluation procedure has to tackle difficult problems such as the quality of clusters, the degree to which a clustering scheme fits a specific data set and the optimal number of clusters in a partitioning. In the literature, a wide variety of indices have been proposed to find the optimal number of clusters in a partitioning of a data set during the clustering process. However, for most of the indices proposed in the literature, programs are unavailable to test these indices and compare them. The R package NbClust has been developed for that purpose. It provides 30 indices which determine the number of clusters in a data set and also offers the user the best clustering scheme from the different results. In addition, it provides a function to perform k-means and hierarchical clustering with different distance measures and aggregation methods. Any combination of validation indices and clustering methods can be requested in a single function call. This enables the user to simultaneously evaluate several clustering schemes while varying the number of clusters, to help determine the most appropriate number of clusters for the data set of interest.

1,912 citations
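
A minimal sketch of a typical call (here on the iris measurements), using the package's single entry point NbClust():

    library(NbClust)

    # Evaluate 2..8 clusters under k-means, computing all available indices;
    # a majority vote across indices suggests the number of clusters.
    res <- NbClust(iris[, 1:4], distance = "euclidean",
                   min.nc = 2, max.nc = 8,
                   method = "kmeans", index = "all")
    res$Best.nc        # number of clusters proposed by each index
    res$Best.partition # partition under the majority-rule choice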


Journal ArticleDOI
TL;DR: The pbkrtest package implements two alternatives to the asymptotic χ² test: (1) a Kenward-Roger approximation for performing F tests for reduction of the mean structure and (2) parametric bootstrap methods for achieving the same goal.
Abstract: When testing for reduction of the mean value structure in linear mixed models, it is common to use an asymptotic χ² test. Such tests can, however, be very poor for small and moderate sample sizes. The pbkrtest package implements two alternatives to such approximate χ² tests: (1) a Kenward-Roger approximation for performing F tests for reduction of the mean structure and (2) parametric bootstrap methods for achieving the same goal. The implementation is focused on linear mixed models with independent residual errors. In addition to describing the methods and aspects of their implementation, the paper also contains several examples and a comparison of the various methods.

1,072 citations
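
A minimal sketch of both tests on nested lme4 models, using the sleepstudy data that ships with lme4:

    library(lme4)
    library(pbkrtest)

    large <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    small <- lmer(Reaction ~ 1    + (Days | Subject), data = sleepstudy)

    # (1) Kenward-Roger approximate F test for the mean-structure reduction
    KRmodcomp(large, small)

    # (2) Parametric bootstrap test of the same hypothesis
    PBmodcomp(large, small, nsim = 500)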


Journal ArticleDOI
TL;DR: The changepoint package has been developed to provide users with a choice of multiple changepoint search methods to use in conjunction with a given changepoint method and in particular provides an implementation of the recently proposed PELT algorithm.
Abstract: One of the key challenges in changepoint analysis is the ability to detect multiple changes within a given time series or sequence. The changepoint package has been developed to provide users with a choice of multiple changepoint search methods to use in conjunction with a given changepoint method and in particular provides an implementation of the recently proposed PELT algorithm. This article describes the search methods which are implemented in the package as well as some of the available test statistics whilst highlighting their application with simulated and practical examples. Particular emphasis is placed on the PELT algorithm and how results differ from the binary segmentation approach.

1,068 citations
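
A minimal sketch contrasting PELT with binary segmentation on a simulated series with two known mean changes:

    library(changepoint)

    set.seed(1)
    x <- c(rnorm(100, mean = 0), rnorm(100, mean = 3), rnorm(100, mean = 0))

    fit.pelt <- cpt.mean(x, method = "PELT")    # exact search, linear average cost
    fit.bs   <- cpt.mean(x, method = "BinSeg")  # approximate binary segmentation

    cpts(fit.pelt)   # estimated changepoint locations
    cpts(fit.bs)
    plot(fit.pelt)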


Journal ArticleDOI
TL;DR: This paper introduces an R package, known as OptimalCutpoints, for selecting optimal cutpoints in diagnostic tests, which incorporates criteria that take the costs of the different diagnostic decisions into account, as well as the prevalence of the target disease and several methods based on measures of diagnostic test accuracy.
Abstract: Continuous diagnostic tests are often used for discriminating between healthy and diseased populations. For the clinical application of such tests, it is useful to select a cutpoint or discrimination value c that defines positive and negative test results. In general, individuals with a diagnostic test value of c or higher are classified as diseased. Several search strategies have been proposed for choosing optimal cutpoints in diagnostic tests, depending on the underlying reason for this choice. This paper introduces an R package, known as OptimalCutpoints, for selecting optimal cutpoints in diagnostic tests. It incorporates criteria that take the costs of the different diagnostic decisions into account, as well as the prevalence of the target disease and several methods based on measures of diagnostic test accuracy. Moreover, it enables optimal levels to be calculated according to levels of given (categorical) covariates. While the numerical output includes the optimal cutpoint values and associated accuracy measures with their confidence intervals, the graphical output includes the receiver operating characteristic (ROC) and predictive ROC curves. An illustration of the use of OptimalCutpoints is provided, using a real biomedical dataset.

467 citations
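
A minimal sketch using the elas data bundled with the package (leukocyte elastase as a marker of coronary artery disease), applying the Youden index criterion within levels of a categorical covariate:

    library(OptimalCutpoints)
    data(elas)

    res <- optimal.cutpoints(X = "elas", status = "status", tag.healthy = 0,
                             methods = "Youden", data = elas,
                             categorical.cov = "gender")
    summary(res)   # optimal cutpoints plus accuracy measures with CIs
    plot(res)      # ROC and predictive ROC curves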


Journal ArticleDOI
TL;DR: Momocs, a package intended to ease and popularize modern morphometrics with R, and particularly outline analysis, which aims to extract quantitative variables from shapes, is introduced.
Abstract: We introduce here Momocs, a package intended to ease and popularize modern morphometrics with R, and particularly outline analysis, which aims to extract quantitative variables from shapes. It mostly hinges on the functions published in the book entitled Modern Morphometrics Using R by Claude (2008). From outline extraction from raw data to multivariate analysis, Momocs provides an integrated and convenient toolkit to students and researchers who are, or may become, interested in describing the shape and its variation. The methods implemented so far in Momocs are introduced through a simplistic case study that aims to test if two sets of bottles have different shapes.

458 citations
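
The package's interface has evolved since the paper; a minimal sketch against a recent Momocs release, assuming its bundled bot outlines of beer and whisky bottles:

    library(Momocs)

    bot.f <- efourier(bot, nb.h = 10)  # elliptical Fourier analysis, 10 harmonics
    bot.p <- PCA(bot.f)                # PCA on the harmonic coefficients
    plot(bot.p, "type")                # do the two sets of bottles separate?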


Journal ArticleDOI
TL;DR: The R package lavaan.survey provides several features such as SEMs with replicate weights, a variety of resampling techniques for complex samples, and finite population corrections, features that should prove useful for SEM practitioners faced with the common situation of a sample that is not iid.
Abstract: This paper introduces the R package lavaan.survey, a user-friendly interface to design-based complex survey analysis of structural equation models (SEMs). By leveraging existing code in the lavaan and survey packages, the lavaan.survey package allows for SEM analyses of stratified, clustered, and weighted data, as well as multiply imputed complex survey data. lavaan.survey provides several features such as SEMs with replicate weights, a variety of resampling techniques for complex samples, and finite population corrections, features that should prove useful for SEM practitioners faced with the common situation of a sample that is not iid.

306 citations
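
A minimal sketch of the intended workflow (the data set and all variable names here are hypothetical): fit the SEM with lavaan, describe the design with survey, then combine:

    library(lavaan)
    library(survey)
    library(lavaan.survey)

    # Hypothetical one-factor CFA on complex survey data 'mydata'
    model <- "f1 =~ y1 + y2 + y3 + y4"
    fit <- cfa(model, data = mydata)

    des <- svydesign(ids = ~cluster, strata = ~stratum,
                     weights = ~wt, data = mydata)

    # Re-estimate the SEM accounting for clustering, strata, and weights
    fit.surv <- lavaan.survey(lavaan.fit = fit, survey.design = des)
    summary(fit.surv)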


Journal ArticleDOI
TL;DR: The capabilities of the package exams for automatic generation of (statistical) exams in R are extended by adding support for learning management systems by creating a new modular and extensible design that allows for reading all weaved exercises into R and managing associated supplementary files.
Abstract: The capabilities of the package exams for automatic generation of (statistical) exams in R are extended by adding support for learning management systems: As in earlier versions of the package, exam generation is still based on separate Sweave files for each exercise – but rather than just producing different types of PDF output files, the package can now render the same exercises into a wide variety of output formats. These include HTML (with various options for displaying mathematical content) and XML specifications for online exams in learning management systems such as Moodle or OLAT. This flexibility is accomplished by a new modular and extensible design of the package that allows for reading all weaved exercises into R and managing associated supplementary files (such as graphics or data files). The manuscript discusses the readily available user interfaces, the design of the underlying infrastructure, and how new functionality can be built on top of the existing tools.

205 citations
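
A minimal sketch of rendering one exercise pool into several formats, using exercise files that ship with the package:

    library(exams)

    exercises <- c("tstat.Rnw", "boxplots.Rnw", "confint.Rnw")

    exams2pdf(exercises, n = 1)       # classical PDF exam
    exams2html(exercises, n = 1)      # HTML with mathematical markup
    exams2moodle(exercises, n = 10)   # Moodle XML with 10 random replications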


Journal ArticleDOI
TL;DR: This paper shows how to estimate conditional quantile functions with random effects using the R package lqmm and describes the optimization algorithms used.
Abstract: Inference in quantile analysis has received considerable attention in recent years. Linear quantile mixed models (Geraci and Bottai 2014) represent a flexible statistical tool to analyze data from sampling designs such as multilevel, spatial, panel or longitudinal, which induce some form of clustering. In this paper, I will show how to estimate conditional quantile functions with random effects using the R package lqmm. Modeling, estimation and inference are discussed in detail using a real data example. A thorough description of the optimization algorithms is also provided.

199 citations
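
A minimal sketch fitting quartile and median regressions with a subject-level random intercept, using the Orthodont growth data from nlme:

    library(lqmm)
    data(Orthodont, package = "nlme")

    fit <- lqmm(fixed = distance ~ age, random = ~ 1, group = Subject,
                tau = c(0.25, 0.50, 0.75), data = Orthodont)
    summary(fit)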


Journal ArticleDOI
TL;DR: This work discusses issues with reference to the tools in R for nonlinear parameter estimation (NLPE) and optimization, though for the present article `optimization` will be limited to function minimization of essentially smooth functions with at most bounds constraints on the parameters.
Abstract: R (R Core Team 2014) provides a powerful and flexible system for statistical computations. It has a default-install set of functionality that can be expanded by the use of several thousand add-in packages as well as user-written scripts. While R is itself a programming language, it has proven relatively easy to incorporate programs in other languages, particularly Fortran and C. Success, however, can lead to its own costs: Users face a confusion of choice when trying to select packages in approaching a problem. A need to maintain workable examples using early methods may mean some tools offered as a default may be dated. In an open-source project like R, how to decide what tools offer "best practice" choices, and how to implement such a policy, present a serious challenge. We discuss these issues with reference to the tools in R for nonlinear parameter estimation (NLPE) and optimization, though for the present article `optimization` will be limited to function minimization of essentially smooth functions with at most bounds constraints on the parameters. We will abbreviate this class of problems as NLPE. We believe that the concepts proposed are transferable to other classes of problems seen by R users.

188 citations
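
One consolidated interface in this problem class is the optimx package; a minimal sketch comparing several bounded minimizers on the Rosenbrock test function (shown as an illustration of the tool space, not necessarily the paper's recommendation):

    library(optimx)

    # Rosenbrock banana function: a standard smooth test problem
    fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

    res <- optimx(par = c(-1.2, 1), fn = fr,
                  lower = c(-2, -2), upper = c(2, 2),
                  method = c("L-BFGS-B", "nlminb", "bobyqa"))
    summary(res, order = value)   # rank methods by objective value attained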


Journal ArticleDOI
TL;DR: The package MissMech implements two tests of MCAR that can be run using a function called TestMCARNormality: one test is valid if data are normally distributed, and the other does not require any distributional assumptions for the data.
Abstract: Researchers are often faced with analyzing data sets that are not complete. To properly analyze such data sets requires the knowledge of the missing data mechanism. If data are missing completely at random (MCAR), then many missing data analysis techniques lead to valid inference. Thus, tests of MCAR are desirable. The package MissMech implements two tests developed by Jamshidian and Jalal (2010) for this purpose. These tests can be run using a function called TestMCARNormality. One of the tests is valid if data are normally distributed, and another test does not require any distributional assumptions for the data. In addition to testing MCAR, in some special cases, the function TestMCARNormality is also able to test whether data have a multivariate normal distribution. As a bonus, the functions in MissMech can also be used for the following additional tasks: (i) test of homoscedasticity for several groups when data are completely observed, (ii) perform the k-sample test of Anderson-Darling to determine whether k groups of univariate data come from the same distribution, (iii) impute incomplete data sets using two methods, one where normality is assumed and one where no specific distributional assumptions are made, (iv) obtain normal-theory maximum likelihood estimates for mean and covariance matrix when data are incomplete, along with their standard errors, and finally (v) perform Neyman’s test of uniformity. All of these features are explained in the paper, including examples.
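
A minimal sketch of the main entry point, using the agingdata set that ships with the package:

    library(MissMech)
    data(agingdata)

    # Tests MCAR via homogeneity of covariances among missing-data patterns;
    # both the normal-theory and the distribution-free versions are reported.
    out <- TestMCARNormality(data = agingdata)
    out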

Journal ArticleDOI
TL;DR: A review of so-called spectral projected gradient methods for convex constrained optimization is presented; these low-cost schemes rely on choosing the step lengths according to novel ideas related to the spectrum of the underlying local Hessian.
Abstract: Over the last two decades, it has been observed that using the gradient vector as a search direction in large-scale optimization may lead to efficient algorithms. The effectiveness relies on choosing the step lengths according to novel ideas that are related to the spectrum of the underlying local Hessian rather than related to the standard decrease in the objective function. A review of these so-called spectral projected gradient methods for convex constrained optimization is presented. To illustrate the performance of these low-cost schemes, an optimization problem on the set of positive definite matrices is described.
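
One accessible R implementation of this family is spg() in the BB package; a minimal sketch minimizing a convex quadratic over the unit simplex, where the user supplies the projection onto the feasible set:

    library(BB)

    f <- function(x) sum((x - c(0.1, 0.7, 0.9))^2)

    # Euclidean projection onto {x : x >= 0, sum(x) = 1}
    proj.simplex <- function(x) {
      u <- sort(x, decreasing = TRUE)
      rho <- max(which(u + (1 - cumsum(u)) / seq_along(u) > 0))
      pmax(x + (1 - sum(u[1:rho])) / rho, 0)
    }

    res <- spg(par = rep(1/3, 3), fn = f, project = proj.simplex)
    res$par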

Journal ArticleDOI
TL;DR: The evtree package is described, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R, providing unified infrastructure for summaries, visualizations, and predictions.
Abstract: Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. This paper describes the evtree package, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. Computationally intensive tasks are fully computed in C++ while the partykit package is leveraged for representing the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions. evtree is compared to the open-source CART implementation rpart, conditional inference trees (ctree), and the open-source C4.5 implementation J48. A benchmark study of predictive accuracy and complexity is carried out in which evtree achieved results at least similar to, and most of the time better than, rpart, ctree, and J48. Furthermore, the usefulness of evtree in practice is illustrated in a textbook customer classification task.
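
A minimal sketch on iris, with the evolutionary search restricted to shallow trees:

    library(evtree)

    set.seed(1)
    fit <- evtree(Species ~ ., data = iris,
                  control = evtree.control(maxdepth = 3, niterations = 1000))
    plot(fit)                            # partykit-based tree display
    table(predict(fit), iris$Species)    # in-sample confusion matrix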

Journal ArticleDOI
TL;DR: Diverging stacked bar charts are recommended as the primary graphical display technique for Likert and related scales and the perceptual and programming issues in constructing these graphs are discussed.
Abstract: Rating scales, such as Likert scales, are very common in marketing research, customer satisfaction studies, psychometrics, opinion surveys, population studies, and numerous other fields. We recommend diverging stacked bar charts as the primary graphical display technique for Likert and related scales. We also show other applications where diverging stacked bar charts are useful. Many examples of plots of Likert scales are given. We discuss the perceptual and programming issues in constructing these graphs. We present two implementations for diverging stacked bar charts. Most examples in this paper were drawn with the likert function included in the HH package in R. We also have a dashboard in Tableau.
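
A minimal sketch with hypothetical satisfaction counts; rows are items, columns are response levels, and the bars diverge around the neutral midpoint:

    library(HH)

    sat <- data.frame(
      Question = c("Service was prompt", "Staff were courteous"),
      Disagree = c(10, 5),
      Neutral  = c(20, 15),
      Agree    = c(70, 80))

    likert(Question ~ ., sat,
           main = "Customer satisfaction",
           xlab = "Number of responses")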

Journal ArticleDOI
TL;DR: The R package compareGroups provides functions meant to facilitate the construction of bivariate tables (descriptives of several variables for comparison between groups) and generates reports in several formats (LaTeX, HTML or plain text CSV); a graphical user interface (GUI) has also been implemented.
Abstract: The R package compareGroups provides functions meant to facilitate the construction of bivariate tables (descriptives of several variables for comparison between groups) and generates reports in several formats (LaTeX, HTML or plain text CSV). Moreover, bivariate tables can be viewed directly on the R console in a nice format. A graphical user interface (GUI) has been implemented to build the bivariate tables more easily for those users who are not familiar with the R software. Some new functions and methods have been incorporated in the newest version of the compareGroups package (version 1.x) to deal with time-to-event variables, stratifying tables, merging several tables, and revising the statistical methods used. The GUI also has been improved, making it much easier and more intuitive to set the inputs for building the bivariate tables. The first version (version 0.x) and this version were presented at the 2010 useR! conference (Sanz, Subirana, and Vila 2010) and the 2011 useR! conference (Sanz, Subirana, and Vila 2011), respectively. Package compareGroups is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=compareGroups.
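
A minimal sketch using the regicor survey data bundled with the package (variable names as in the package's examples; treat them as illustrative):

    library(compareGroups)
    data(regicor)

    # Descriptives of selected variables, compared across survey years
    res <- compareGroups(year ~ age + sex + smoker + sbp, data = regicor)
    tab <- createTable(res, show.p.overall = TRUE)
    tab                 # formatted bivariate table on the console
    export2latex(tab)   # or export2html(tab), export2csv(tab)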

Journal ArticleDOI
TL;DR: The SSN package for R provides a set of functions for importing, simulating, and modeling stream network data, including diagnostics and prediction; traditional models that use Euclidean distance and simple random effects are also included.
Abstract: The SSN package for R provides a set of functions for modeling stream network data. The package can import geographic information systems data or simulate new data as a ‘SpatialStreamNetwork’, a new object class that builds on the spatial sp classes. Functions are provided that fit spatial linear models (SLMs) for the ‘SpatialStreamNetwork’ object. The covariance matrix of the SLMs uses distance metrics and geostatistical models that are unique to stream networks; these models account for the distances and topological configuration of stream networks, including the volume and direction of flowing water. In addition, traditional models that use Euclidean distance and simple random effects are included, along with Poisson and binomial families, for a generalized linear mixed model framework. Plotting and diagnostic functions are provided. Prediction (kriging) can be performed for missing data or for a separate set of unobserved locations, or block prediction (block kriging) can be used over sets of stream segments. This article summarizes the SSN package for importing, simulating, and modeling stream network data, including diagnostics and prediction.
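
A minimal sketch using the MiddleFork example data installed with the package (data-set, variable, and column names follow the package's own examples; treat the details as illustrative):

    library(SSN)

    mf <- importSSN(system.file("lsndata/MiddleFork04.ssn", package = "SSN"),
                    predpts = "pred1km")

    # Spatial linear model with a tail-up covariance component; 'afvArea'
    # is the additive-function column used to weight converging segments.
    fit <- glmssn(Summer_mn ~ ELEV_DEM, ssn.object = mf,
                  CorModels = c("Exponential.tailup"),
                  addfunccol = "afvArea")
    summary(fit)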

Journal ArticleDOI
TL;DR: This document provides a brief introduction to the R package gss for nonparametric statistical modeling in a variety of problem settings including regression, density estimation, and hazard estimation.
Abstract: This document provides a brief introduction to the R package gss for nonparametric statistical modeling in a variety of problem settings including regression, density estimation, and hazard estimation. Functional ANOVA (analysis of variance) decompositions are built into models on product domains, and modeling and inferential tools are provided for tasks such as interval estimates, the “testing” of negligible model terms, the handling of correlated data, etc. The methodological background is outlined, and data analysis is illustrated using real-data examples.
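
A minimal sketch of a functional-ANOVA regression fit on the nox engine-exhaust data bundled with gss:

    library(gss)
    data(nox)

    # Main effects and interaction of compression ratio and equivalence ratio
    fit <- ssanova(log10(nox) ~ comp * equi, data = nox)
    summary(fit)

    # Fitted values with Bayesian standard errors at new design points
    new <- data.frame(comp = 9, equi = seq(0.6, 1.2, by = 0.1))
    predict(fit, new, se.fit = TRUE)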

Journal ArticleDOI
TL;DR: The Natter, a statistical software toolbox for natural image models that can provide consistency in the development and comparison of probabilistic models for natural image data, is presented.
Abstract: The statistical analysis and modeling of natural images is an important branch of statistics with applications in image signaling, image compression, computer vision, and human perception. Because the space of all possible images is too large to be sampled exhaustively, natural image models must inevitably make assumptions in order to stay tractable. Subsequent model comparison can then filter out those models that best capture the statistical regularities in natural images. Proper model comparison, however, often requires that the models and the preprocessing of the data match down to the implementation details. Here we present the Natter, a statistical software toolbox for natural image models that can provide such consistency. The Natter includes powerful but tractable baseline models as well as standardized data preprocessing steps. It has an extensive test suite to ensure correctness of its algorithms, it interfaces to the Modular toolkit for Data Processing (MDP), and it provides simple ways to log the results of numerical experiments. Most importantly, its modular structure can be extended by new models with minimal coding effort, thereby providing a platform for the development and comparison of probabilistic models for natural image data.

Journal ArticleDOI
TL;DR: The main fitting function of the R package movMF, which contains functionality to draw samples from finite mixtures of von Mises-Fisher distributions and to fit these models using the expectation-maximization algorithm for maximum likelihood estimation, is described and illustrated.
Abstract: Finite mixtures of von Mises-Fisher distributions make it possible to apply model-based clustering methods to data which are of standardized length, i.e., all data points lie on the unit sphere. The R package movMF contains functionality to draw samples from finite mixtures of von Mises-Fisher distributions and to fit these models using the expectation-maximization algorithm for maximum likelihood estimation. Special features are the possibility to use sparse matrix representations for the input data, different variants of the expectation-maximization algorithm, different methods for determining the concentration parameters in the M-step, and the ability to impose constraints on the concentration parameters across components. In this paper we describe the main fitting function of the package and illustrate its application. In addition we compare the clustering performance of finite mixtures of von Mises-Fisher distributions to spherical k-means. We also discuss the resolution of several numerical issues which occur in estimating the concentration parameters and in determining the normalizing constant of the von Mises-Fisher distribution.
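
A minimal sketch that samples from a two-component mixture on the unit circle and refits it by EM (argument names as in the package's examples):

    library(movMF)

    set.seed(2)
    theta <- rbind(c(5, 0), c(-2, 4))   # rows are kappa * mu per component
    x <- rmovMF(500, theta = theta, alpha = c(0.4, 0.6))

    fit <- movMF(x, k = 2, nruns = 10)  # EM with 10 random restarts
    fit$theta                           # estimated kappa * mu
    head(predict(fit))                  # hard component assignments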

Journal ArticleDOI
TL;DR: This paper offers a tutorial in survival estimation for the time-varying coefficient model, implemented in SAS and R, and provides a macro coxtvc to facilitate estimation in SAS where the current functionality is more limited.
Abstract: Survival estimates are an essential complement to multivariable regression models for time-to-event data, both for prediction and illustration of covariate effects. They are easily obtained under the Cox proportional-hazards model. In populations defined by an initial, acute event, like myocardial infarction, or in studies with long-term followup, the proportional-hazards assumption of constant hazard ratios is frequently violated. One alternative is to fit an interaction between covariates and a prespecified function of time, implemented as a time-dependent covariate. This effectively creates a time-varying coefficient that is easily estimated in software such as SAS and R. However, the usual programming statements for survival estimation are not directly applicable. Unique data manipulation and syntax are required, but are not well documented for either software. This paper offers a tutorial in survival estimation for the time-varying coefficient model, implemented in SAS and R. We provide a macro coxtvc to facilitate estimation in SAS, where the current functionality is more limited. The macro is validated in simulated data and illustrated in an application.
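
In R, such a time-dependent interaction can be specified through coxph()'s tt argument; a minimal sketch with the lung data from the survival package, letting the effect of age vary with log(time):

    library(survival)

    fit <- coxph(Surv(time, status) ~ ph.ecog + age + tt(age), data = lung,
                 tt = function(x, t, ...) x * log(t))
    summary(fit)   # tt(age) is the time-varying part of the age effect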

Journal ArticleDOI
TL;DR: The main features of the Bergm package for the open-source R software, which provides a comprehensive framework for Bayesian analysis of exponential random graph models (tools for parameter estimation, model selection and goodness-of-fit diagnostics), are described.
Abstract: In this paper we describe the main features of the Bergm package for the open-source R software, which provides a comprehensive framework for Bayesian analysis of exponential random graph models: tools for parameter estimation, model selection and goodness-of-fit diagnostics. We illustrate the capabilities of this package by describing the algorithms through a tutorial analysis of three network datasets.
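
A minimal sketch on the classic Florentine marriage network (available via the ergm suite); function names follow the package, though the output helpers have been renamed across versions:

    library(Bergm)
    data(florentine)   # loads flomarriage and flobusiness networks

    post <- bergm(flomarriage ~ edges + kstar(2),
                  burn.in = 100, main.iters = 1000)
    summary(post)   # posterior summaries (bergm.output() in older versions)
    bgof(post)      # Bayesian goodness-of-fit diagnostics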

Journal ArticleDOI
TL;DR: The STARS toolset as discussed by the authors is designed for use with a landscape network (LSN), which is a topological data model produced by the FLoWS ArcGIS geoprocessing toolset.
Abstract: This paper describes the STARS ArcGIS geoprocessing toolset, which is used to calculate the spatial information needed to fit spatial statistical models to stream network data using the SSN package. The STARS toolset is designed for use with a landscape network (LSN), which is a topological data model produced by the FLoWS ArcGIS geoprocessing toolset. An overview of the FLoWS LSN structure and a few particularly useful tools is also provided so that users will have a clear understanding of the underlying data structure that the STARS toolset depends on. This document may be used as an introduction to new users. The methods used to calculate the spatial information and format the final .ssn object are also explicitly described so that users may create their own .ssn object using other data models and software.

Journal ArticleDOI
TL;DR: The design of the yuima package is explained and some examples of applications are provided; the package offers the basic infrastructure on which complex models and inference procedures can be built.
Abstract: The YUIMA Project is an open source and collaborative effort aimed at developing the R package yuima for simulation and inference of stochastic differential equations. In the yuima package stochastic differential equations can be of very abstract type, multidimensional, driven by a Wiener process or fractional Brownian motion with general Hurst parameter, with or without jumps specified as Lévy noise. The yuima package is intended to offer the basic infrastructure on which complex models and inference procedures can be built. This paper explains the design of the yuima package and provides some examples of applications.
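
A minimal sketch of the simulate-then-estimate cycle for an Ornstein-Uhlenbeck model, following the package's setModel()/setSampling()/qmle() workflow:

    library(yuima)

    # dX_t = -theta * X_t dt + sigma dW_t
    mod  <- setModel(drift = "-theta * x", diffusion = "sigma",
                     state.variable = "x", solve.variable = "x")
    samp <- setSampling(Terminal = 1, n = 1000)
    yui  <- setYuima(model = mod, sampling = samp)

    set.seed(3)
    sim <- simulate(yui, true.parameter = list(theta = 1, sigma = 0.5),
                    xinit = 1)

    # Quasi-maximum likelihood estimation from the simulated path
    qmle(sim, start = list(theta = 0.5, sigma = 0.3),
         lower = list(theta = 0.01, sigma = 0.01), method = "L-BFGS-B")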

Journal ArticleDOI
TL;DR: The RNetLogo package delivers an interface to embed the agent-based modeling platform NetLogo into the R environment with headless (no graphical user interface) or interactive GUI mode, which enables the modeler to design simulation experiments, store simulation results, and analyze simulation output in a more systematic way.
Abstract: The RNetLogo package delivers an interface to embed the agent-based modeling platform NetLogo into the R environment with headless (no graphical user interface) or interactive GUI mode. It provides functions to load models, execute commands, push values, and to get values from NetLogo reporters. Such a seamless integration of a widely used agent-based modeling platform with a well-known statistical computing and graphics environment opens up various possibilities. For example, it enables the modeler to design simulation experiments, store simulation results, and analyze simulation output in a more systematic way. It can therefore help close the gaps in agent-based modeling regarding standards of description and analysis. After a short overview of the agent-based modeling approach and the software used here, the paper delivers a step-by-step introduction to the usage of the RNetLogo package by examples.
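
A minimal headless session with NetLogo's built-in Fire model (the installation path is machine-specific and shown only as a placeholder):

    library(RNetLogo)

    nl.path <- "C:/Program Files/NetLogo 5.0.5"   # adjust to your installation
    NLStart(nl.path, gui = FALSE)
    NLLoadModel(file.path(nl.path,
                "models/Sample Models/Earth Science/Fire.nlogo"))

    NLCommand("set density 70")        # push a value into the model
    NLCommand("setup")
    NLDoCommand(100, "go")             # run 100 ticks
    NLReport("burned-trees")           # pull a reporter value back into R
    NLQuit()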

Journal ArticleDOI
TL;DR: The R package ThreeWay is presented and its main features are illustrated; the most relevant available functions are T3 and CP, which implement, respectively, the Tucker3 and Candecomp/Parafac methods.
Abstract: The R package ThreeWay is presented and its main features are illustrated. The aim of ThreeWay is to offer a suite of functions for handling three-way arrays. In particular, the most relevant available functions are T3 and CP, which implement, respectively, the Tucker3 and Candecomp/Parafac methods. They are the two most popular tools for summarizing three-way arrays in terms of components. After briefly recalling both techniques from a theoretical point of view, the functions T3 and CP are described by considering three real-life examples.

Journal ArticleDOI
TL;DR: The self-correcting nature of science has been questioned repeatedly in the decade since Ioannidis (2005), "Why Most Published Research Findings Are False", which called into question whether scientific research was truly contributing to human knowledge and whether public faith in science was misplaced.
Abstract: The self-correcting nature of science has been questioned repeatedly in the decade since Ioannidis (2005), “Why Most Published Research Findings Are False”. The title of the paper, if not the entirety of its content, was widely publicized and called into question whether scientific research was truly contributing to human knowledge and whether public faith in science was misplaced. The existential crisis that ensued within the scientific community identified a lack of replication and reproducibility as a major problem that needed to be addressed.

Journal ArticleDOI
TL;DR: The R package HAC, which provides user-friendly methods for dealing with hierarchical Archimedean copulae (HAC), is presented; computationally efficient estimation procedures make it possible to recover the structure and the parameters of HAC from data.
Abstract: This paper presents the R package HAC, which provides user-friendly methods for dealing with hierarchical Archimedean copulae (HAC). Computationally efficient estimation procedures make it possible to recover the structure and the parameters of HAC from data. In addition, arbitrary HAC can be constructed to sample random vectors and to compute the values of the corresponding cumulative distribution and density functions. Accurate graphics of the HAC structure can be produced by the plot method implemented for these objects.
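
A minimal sketch of the construct/sample/estimate cycle (the nested-list tree specification follows the convention of the package's examples; treat the details as illustrative):

    library(HAC)

    # 3-dimensional Gumbel-type HAC: (X1, X2) nested at parameter 3,
    # joined with X3 at parameter 1.5
    model <- hac(type = 1, tree = list(list("X1", "X2", 3), "X3", 1.5))
    plot(model)                # draw the hierarchy

    X   <- rHAC(500, model)    # sample random vectors from the copula
    est <- estimate.copula(X)  # recover structure and parameters from data
    plot(est)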

Journal ArticleDOI
TL;DR: The R package pdfCluster performs cluster analysis based on a nonparametric kernel estimate of the density of the observed variables; its application is illustrated with the aid of two data sets.
Abstract: The R package pdfCluster performs cluster analysis based on a nonparametric estimate of the density of the observed variables. Functions are provided to encompass the whole process of clustering, from kernel density estimation, to clustering itself and subsequent graphical diagnostics. After summarizing the main aspects of the methodology, we describe the features and the usage of the package, and finally illustrate its application with the aid of two data sets.
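
A minimal sketch on the oliveoil data bundled with the package, clustering on the fatty-acid composition:

    library(pdfCluster)
    data(oliveoil)

    acids <- scale(oliveoil[, 3:10])   # fatty-acid measurements
    cl <- pdfCluster(acids)
    cl         # prints the clustering obtained from the density estimate
    plot(cl)   # graphical diagnostics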

Journal ArticleDOI
TL;DR: The R package structSSI provides an accessible implementation of two recently developed simultaneous and selective inference techniques, the group Benjamini-Hochberg and hierarchical false discovery rate procedures, and explains their applicability to general data sets.
Abstract: The R package structSSI provides an accessible implementation of two recently developed simultaneous and selective inference techniques: the group Benjamini-Hochberg and hierarchical false discovery rate procedures. Unlike many multiple testing schemes, these methods specifically incorporate existing information about the grouped or hierarchical dependence between hypotheses under consideration while controlling the false discovery rate. Doing so increases statistical power and interpretability. Furthermore, these procedures provide novel approaches to the central problem of encoding complex dependency between hypotheses. We briefly describe the group Benjamini-Hochberg and hierarchical false discovery rate procedures and then illustrate them using two examples, one a measure of ecological microbial abundances and the other a global temperature time series. For both procedures, we detail the steps associated with the analysis of these particular data sets, including establishing the dependence structures, performing the test, and interpreting the results. These steps are encapsulated by R functions, and we explain their applicability to general data sets.

Journal ArticleDOI
TL;DR: This work focuses primarily on implementations in the R environment that rely on solution methods linked to R, such as MOSEK via the package Rmosek; other applications are available in the R package REBayes, dealing with empirical Bayes estimation of nonparametric mixture models.
Abstract: Convex optimization now plays an essential role in many facets of statistics. We briefly survey some recent developments and describe some implementations of these methods in R. Applications of linear and quadratic programming are introduced, including quantile regression, the Huber M-estimator and various penalized regression methods. Applications to additively separable convex problems subject to linear equality and inequality constraints, such as nonparametric density estimation and maximum likelihood estimation of general nonparametric mixture models, are described, as are several cone programming problems. We focus throughout primarily on implementations in the R environment that rely on solution methods linked to R, such as MOSEK via the package Rmosek. Code is provided in R to illustrate several of these problems. Other applications are available in the R package REBayes, dealing with empirical Bayes estimation of nonparametric mixture models.
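
A minimal sketch of the empirical-Bayes application mentioned at the end, using REBayes' GLmix() (this requires a working MOSEK installation reachable through Rmosek):

    library(REBayes)

    # Kiefer-Wolfowitz NPMLE of a Gaussian location mixture: observations
    # are theta_i + N(0, 1) noise with theta_i drawn from {-2, 0, 2}.
    set.seed(4)
    theta <- sample(c(-2, 0, 2), 500, replace = TRUE)
    y <- theta + rnorm(500)

    fit <- GLmix(y)
    plot(fit)   # estimated mixing distribution should concentrate near -2, 0, 2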