Showing papers in "Journal of Statistical Software in 2009"



Journal ArticleDOI
TL;DR: The CircStat toolbox for MATLAB provides methods for the descriptive and inferential statistical analysis of directional data; a dataset from neurophysiology is analyzed to demonstrate the capabilities of the CircStat toolbox.
Abstract: Directional data is ubiquitous in science. Due to its circular nature, such data cannot be analyzed with commonly used statistical techniques. Despite the rapid development of specialized methods for directional statistics over the last fifty years, little software is available that makes such methods easy for practitioners to use. Most importantly, MATLAB, one of the most commonly used programming languages in the biosciences, currently does not support directional statistics. To remedy this situation, we have implemented the CircStat toolbox for MATLAB, which provides methods for the descriptive and inferential statistical analysis of directional data. We cover the statistical background of the available methods and describe how to apply them to data. Finally, we analyze a dataset from neurophysiology to demonstrate the capabilities of the CircStat toolbox.

2,557 citations


Journal ArticleDOI
TL;DR: The mixtools package for R provides a set of functions for analyzing a variety of finite mixture models, including both traditional methods, such as EM algorithms for univariate and multivariate normal mixtures, and newer methods that reflect some recent research in finite mixture models.
Abstract: The mixtools package for R provides a set of functions for analyzing a variety of finite mixture models. These functions include both traditional methods, such as EM algorithms for univariate and multivariate normal mixtures, and newer methods that reflect some recent research in finite mixture models. In the latter category, mixtools provides algorithms for estimating parameters in a wide range of different mixture-of-regressions contexts, in multinomial mixtures such as those arising from discretizing continuous multivariate data, in nonparametric situations where the multivariate component densities are completely unspecified, and in semiparametric situations such as a univariate location mixture of symmetric but otherwise unspecified densities. Many of the algorithms of the mixtools package are EM algorithms or are based on EM-like ideas, so this article includes an overview of EM algorithms for finite mixture models.

1,079 citations
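
A minimal R sketch of the traditional EM route described above, assuming the normalmixEM() interface of current mixtools versions:

## Fit a two-component univariate normal mixture by EM.
library(mixtools)

set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 4))  # simulated two-component data

fit <- normalmixEM(x, k = 2)  # EM algorithm for a normal mixture
fit$lambda                    # estimated mixing proportions
fit$mu                        # estimated component means
fit$sigma                     # estimated component standard deviations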


Journal ArticleDOI
TL;DR: The dtw package allows R users to compute time series alignments mixing freely a variety of continuity constraints, restriction windows, endpoints, local distance definitions, and so on.
Abstract: Dynamic time warping is a popular technique for comparing time series, providing both a distance measure that is insensitive to local compression and stretches and the warping which optimally deforms one of the two input series onto the other. A variety of algorithms and constraints have been discussed in the literature. The dtw package provides a unification of them; it allows R users to compute time series alignments mixing freely a variety of continuity constraints, restriction windows, endpoints, local distance definitions, and so on. The package also provides functions for visualizing alignments and constraints using several classic diagram types.

833 citations
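
A minimal sketch of the package's central call; the window arguments below illustrate the freely mixable constraints the abstract mentions (defaults assumed for everything else):

## Align a short query to a longer reference under a Sakoe-Chiba band.
library(dtw)

query     <- sin(seq(0, 2 * pi, length.out = 80))
reference <- sin(seq(0, 2 * pi, length.out = 100))

alignment <- dtw(query, reference, keep.internals = TRUE,
                 window.type = "sakoechiba", window.size = 20)
alignment$distance                  # minimum cumulative alignment cost
plot(alignment, type = "threeway")  # query, reference, and warping curve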


Journal ArticleDOI
TL;DR: In this paper, the authors present the methodology of multidimensional scaling (MDS) problems solved by means of the majorization algorithm, where the objective function to be minimized is known as stress, and functions which majorize stress are elaborated.
Abstract: In this paper we present the methodology of multidimensional scaling (MDS) problems solved by means of the majorization algorithm. The objective function to be minimized is known as stress, and functions which majorize stress are elaborated. This strategy for solving MDS problems is called SMACOF; it is implemented in an R package of the same name, which is presented in this article. We extend the basic SMACOF theory in terms of configuration constraints, three-way data, unfolding models, and projection of the resulting configurations onto spheres and other quadratic surfaces. Various examples are presented to show the possibilities of the SMACOF approach offered by the corresponding package.

476 citations
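
A minimal sketch of the basic symmetric SMACOF fit, assuming the smacofSym() interface:

## Metric MDS by majorization on a distance matrix.
library(smacof)

d   <- dist(swiss)             # dissimilarities between Swiss provinces
fit <- smacofSym(d, ndim = 2)  # minimize stress by majorization
fit$stress                     # final stress value
plot(fit)                      # resulting two-dimensional configuration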


Journal ArticleDOI
TL;DR: The program implements the coarsened exact matching (CEM) algorithm, described below, which may be used alone or in combination with any existing matching method.
Abstract: This program is designed to improve causal inference via a method of matching that is widely applicable in observational data and easy to understand and use (if you understand how to draw a histogram, you will understand this method). The program implements the coarsened exact matching (CEM) algorithm, described below. CEM may be used alone or in combination with any existing matching method. This algorithm, and its statistical properties, are described in Iacus, King, and Porro (2008).

395 citations
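
A minimal sketch of the matching-then-estimation workflow; the data frame mydata and its columns treated and y are hypothetical placeholders:

## Coarsened exact matching followed by an effect estimate on the matched data.
library(cem)

## 'mydata' holds a binary treatment indicator 'treated', an outcome 'y',
## and pre-treatment covariates in the remaining columns (all hypothetical).
mat <- cem(treatment = "treated", data = mydata, drop = "y")
mat                                          # matched/unmatched counts by group
est <- att(mat, y ~ treated, data = mydata)  # treatment effect on the matched set
est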


Journal ArticleDOI
TL;DR: Substantial extensions to the effects package for R are described for constructing effect displays for multinomial and proportional-odds logit models; the package was previously limited to linear and generalized linear models.
Abstract: Based on recent work by Fox and Andersen (2006), this paper describes substantial extensions to the effects package for R to construct effect displays for multinomial and proportional-odds logit models. The package previously was limited to linear and generalized linear models. Effect displays are tabular and graphical representations of terms — typically high-order terms — in a statistical model. For polytomous logit models, effect displays depict fitted category probabilities under the model, and can include point-wise confidence envelopes for the effects. The construction of effect displays by functions in the effects package is essentially automatic. The package provides several kinds of displays for polytomous logit models.

267 citations
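
A minimal sketch of an effect display for a polytomous logit model, assuming the BEPS data used in the Fox and Andersen work still ships with the package (its location may vary by version):

## Effect display for a multinomial logit model of vote choice.
library(effects)
library(nnet)

data(BEPS)  # British Election Panel Study
mod <- multinom(vote ~ age + gender + economic.cond.national, data = BEPS)
plot(effect("economic.cond.national", mod))  # fitted category probabilities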



Journal ArticleDOI
TL;DR: In this article, a generalized version of the pool-adjacent-violators algorithm (PAVA) is proposed to minimize a separable convex function with simple chain constraints.
Abstract: In this paper we give a general framework for isotone optimization. First we discuss a generalized version of the pool-adjacent-violators algorithm (PAVA) to minimize a separable convex function with simple chain constraints. Beyond general convex functions, we extend existing PAVA implementations in terms of observation weights, approaches for tie handling, and responses from repeated measurement designs. Since isotone optimization problems can be formulated as convex programming problems with linear constraints, we then develop a primal active set method to solve such problems. This methodology is applied to specific loss functions relevant in statistics. Both approaches are implemented in the R package isotone.

211 citations
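
A minimal sketch of the generalized PAVA entry point, assuming the gpava() interface:

## Least-squares isotone regression under simple chain constraints.
library(isotone)

x <- 1:10
y <- c(1, 3, 2, 4, 3, 5, 6, 5, 8, 9)  # noisy, roughly increasing response

fit <- gpava(x, y)  # generalized pool-adjacent-violators
fit$x               # fitted monotone values
plot(fit)           # step-function fit over the data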


Journal ArticleDOI
TL;DR: The R package BB is discussed, in particular its capabilities for solving a nonlinear system of equations, and the utility of its functions is demonstrated for solving large systems of nonlinear equations, smooth nonlinear estimating equations in statistical modeling, and non-smooth estimating equations arising in rank-based regression modeling of censored failure time data.
Abstract: This introduction to the R package BB is a (slightly) modified version of Varadhan and Gilbert (2009), published in the Journal of Statistical Software. We discuss the R package BB, in particular its capabilities for solving a nonlinear system of equations. The function BBsolve in BB can be used for this purpose. We demonstrate the utility of these functions for solving: (a) large systems of nonlinear equations, (b) smooth, nonlinear estimating equations in statistical modeling, and (c) non-smooth estimating equations arising in rank-based regression modeling of censored failure time data. The function BBoptim can be used to solve smooth, box-constrained optimization problems. A main strength of BB is that, due to its low memory and storage requirements, it is ideally suited for solving high-dimensional problems with thousands of variables.

194 citations
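
A minimal sketch of the advertised use of BBsolve on a small nonlinear system (the system itself is an arbitrary illustration):

## Solve F(x) = 0 for two equations in two unknowns; the root is near (1, 1).
library(BB)

f <- function(x) {
  c(x[1]^2 + x[2]^2 - 2,        # first residual
    exp(x[1] - 1) + x[2]^3 - 2) # second residual
}

sol <- BBsolve(par = c(2, 0.5), fn = f)  # derivative-free spectral method
sol$par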


Journal ArticleDOI
TL;DR: anacor is an R package for the computation of simple and canonical correspondence analysis with missing values; the canonical correspondence analysis is specified in a rather general way by imposing covariates on the rows and/or the columns of the two-dimensional frequency table.
Abstract: This paper presents the R package anacor for the computation of simple and canonical correspondence analysis with missing values. The canonical correspondence analysis is specified in a rather general way by imposing covariates on the rows and/or the columns of the two-dimensional frequency table. The package allows for scaling methods such as standard, Benzecri, centroid, and Goodman scaling. In addition, along with well-known two- and three-dimensional joint plots including confidence ellipsoids, it offers alternative plotting possibilities in terms of transformation plots, Benzecri plots, and regression plots.
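
A minimal sketch, assuming the tocher eye-colour by hair-colour table used in the package documentation:

## Simple correspondence analysis with Benzecri scaling on both margins.
library(anacor)

data(tocher)  # eye colour by hair colour frequency table
fit <- anacor(tocher, ndim = 2, scaling = c("Benzecri", "Benzecri"))
fit                                 # singular values and chi-square decomposition
plot(fit, plot.type = "jointplot")  # joint plot of row and column scores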

Journal ArticleDOI
TL;DR: The maximum entropy bootstrap is an algorithm that creates an ensemble for time series inference and its scope is illustrated by means of several guided applications.
Abstract: The maximum entropy bootstrap is an algorithm that creates an ensemble for time series inference. Stationarity is not required, and the ensemble satisfies the ergodic theorem and the central limit theorem. The meboot R package implements this algorithm. This document introduces the procedure and illustrates its scope by means of several guided applications.
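
A minimal sketch of the core call; since meboot() does not require stationarity, a random-walk series is a fair input:

## Build a maximum entropy bootstrap ensemble for a nonstationary series.
library(meboot)

set.seed(1)
x   <- ts(cumsum(rnorm(60)))          # a random walk
ens <- meboot(x, reps = 99)$ensemble  # 99 replicates shaped like the original

matplot(ens[, 1:10], type = "l", col = "grey")  # a few replicates
lines(as.numeric(x), lwd = 2)                   # original series for comparison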

Journal ArticleDOI
TL;DR: An overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing is presented, reviewing sixteen different packages and comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance.
Abstract: R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly suited to general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems five different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.
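
A minimal sketch using snow, one of the two packages the review singles out for cluster work:

## Run an embarrassingly parallel task on four local socket workers.
library(snow)

cl  <- makeCluster(4, type = "SOCK")
res <- parLapply(cl, 1:100, function(i) mean(rnorm(1e5)))
stopCluster(cl)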

Journal ArticleDOI
TL;DR: An R function is implemented that uses Markov chain Monte Carlo (MCMC) algorithms to uniformly sample the feasible region of constrained linear problems and a new algorithm where an MCMC step reflects on the inequality constraints.
Abstract: An R function is implemented that uses Markov chain Monte Carlo (MCMC) algorithms to uniformly sample the feasible region of constrained linear problems. Two existing hit-and-run sampling algorithms are implemented, together with a new algorithm where an MCMC step reflects on the inequality constraints. The new algorithm is more robust compared to the hit-and-run methods, at a small cost of increased calculation time.
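
The abstract does not name the function, but it matches xsample() from the limSolve package; on that assumption, a sketch of uniform sampling over a simplex, with type = "mirror" selecting the new reflective algorithm:

## Sample x uniformly subject to sum(x) = 1 (equality) and x >= 0 (inequality).
library(limSolve)

E <- matrix(1, nrow = 1, ncol = 3)  # equality constraints:   E x = F
F <- 1
G <- diag(3)                        # inequality constraints: G x >= H
H <- rep(0, 3)

xs <- xsample(E = E, F = F, G = G, H = H,
              iter = 3000, type = "mirror")$X
colMeans(xs)  # approximately c(1/3, 1/3, 1/3)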

Journal ArticleDOI
TL;DR: In this article, the authors present methodological and practical issues of the R package homals which performs homogeneity analysis and various extensions, such as nonlinear principal component analysis, nonlinear canonical correlation analysis, and predictive models which emulate discriminant analysis and regression models.
Abstract: Homogeneity analysis combines the idea of maximizing the correlations between variables of a multivariate data set with that of optimal scaling. In this article we present methodological and practical issues of the R package homals which performs homogeneity analysis and various extensions. By setting rank constraints nonlinear principal component analysis can be performed. The variables can be partitioned into sets such that homogeneity analysis is extended to nonlinear canonical correlation analysis or to predictive models which emulate discriminant analysis and regression models. For each model the scale level of the variables can be taken into account by setting level constraints. All algorithms allow for missing values.
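
A minimal sketch of a plain homogeneity analysis (multiple correspondence analysis), assuming the homals() defaults:

## Homogeneity analysis of three categorical variables.
library(homals)

df  <- data.frame(lapply(mtcars[, c("cyl", "gear", "carb")], factor))
fit <- homals(df, ndim = 2)        # object scores and category quantifications
plot(fit, plot.type = "objplot")   # plot of the object scores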

Journal ArticleDOI
TL;DR: This introduction to the R package sets is a (slightly) modified version of Meyer and Hornik (2009a), published in the Journal of Statistical Software.
Abstract: This introduction to the R package sets is a (slightly) modified version of Meyer and Hornik (2009a), published in the Journal of Statistical Software. We present data structures and algorithms for sets and some generalizations thereof (fuzzy sets, multisets, and fuzzy multisets) available for R through the sets package. Fuzzy (multi-)sets are based on dynamically bound fuzzy logic families. Further extensions include user-definable iterators and matching functions.
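
A minimal sketch of the basic set and fuzzy-set operations:

## Ordinary sets, set operations, and a fuzzy set with membership degrees.
library(sets)

s <- set("a", "b", "c")
t <- set("b", "c", "d")
set_union(s, t)         # {"a", "b", "c", "d"}
set_intersection(s, t)  # {"b", "c"}

f <- gset(c("low", "high"), memberships = c(0.2, 0.9))  # a fuzzy set
gset_memberships(f)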

Journal ArticleDOI
TL;DR: The R package archetypes is presented, which provides an implementation of the archetypal analysis algorithm within R and different exploratory tools to analyze the algorithm during its execution and its final result.
Abstract: Archetypal analysis aims to represent observations in a multivariate data set as convex combinations of extremal points. This approach was introduced by Cutler and Breiman (1994); they defined the concrete problem, laid out the theoretical foundations, and presented an algorithm written in Fortran. In this paper we present the R package archetypes, which is available on the Comprehensive R Archive Network. The package provides an implementation of the archetypal analysis algorithm within R and different exploratory tools to analyze the algorithm during its execution and its final result. The application of the package is demonstrated on two examples.
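
A minimal sketch using the toy data set that the package's own examples rely on:

## Fit three archetypes and inspect their coordinates.
library(archetypes)

data(toy, package = "archetypes")
set.seed(1)
a <- archetypes(toy, k = 3)  # alternating least-squares estimation
parameters(a)                # coordinates of the three archetypes
xyplot(a, toy)               # data cloud with the archetype hull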

Journal ArticleDOI
TL;DR: In this paper, the authors present a package that implements and extends the method of Cobb (Cobb and Watson 1980; Cobb, Koppstein, and Chen 1983), making it easy to quantitatively fit and compare different cusp catastrophe models in a statistically principled way.
Abstract: Of the seven elementary catastrophes in catastrophe theory, the "cusp" model is the most widely applied. Most applications are however qualitative. Quantitative techniques for catastrophe modeling have been developed, but so far the limited availability of flexible software has hindered quantitative assessment. We present a package that implements and extends the method of Cobb (Cobb and Watson 1980; Cobb, Koppstein, and Chen 1983), and makes it easy to quantitatively fit and compare different cusp catastrophe models in a statistically principled way. After a short introduction to the cusp catastrophe, we demonstrate the package with two instructive examples.
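
A hedged sketch of the fitting interface, assuming the cusp() formula syntax and the rcusp() simulator shown in the package documentation:

## Fit a cusp model: state y, normal factor alpha, splitting factor beta.
library(cusp)

set.seed(1)
x1 <- runif(100, -2, 2)
x2 <- runif(100, -2, 2)
y  <- rcusp(100, alpha = x1, beta = x2)  # simulated cusp-distributed states
df <- data.frame(y, x1, x2)

fit <- cusp(y ~ y - 1, alpha ~ x1, beta ~ x2, data = df)
summary(fit)  # fit statistics, including comparison with a linear model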

Journal ArticleDOI
TL;DR: The MCPMod package provides tools for the analysis of dose finding trials, as well as a variety of tools necessary to plan an experiment to be analyzed using the MCP-Mod methodology.
Abstract: In this article the MCPMod package for the R programming environment will be introduced. It implements a recently developed methodology for the design and analysis of dose-response studies that combines aspects of multiple comparison procedures and modeling approaches (Bretz et al. 2005). The MCPMod package provides tools for the analysis of dose finding trials, as well as a variety of tools necessary to plan an experiment to be analyzed using the MCP-Mod methodology.

Journal ArticleDOI
TL;DR: The book, now in its second edition, provides an overview of this active area of research in time series econometrics and manages to be thorough (using formal notation), yet remains applied in its focus.
Abstract: The book, now in its second edition, provides an overview of this active area of research in time series econometrics. It manages to be thorough (using formal notation), yet remains applied in its focus. A number of examples are discussed, often by using datasets from the original publications. Code examples are provided throughout, frequently using the contributed packages urca and vars by the same author.

Journal ArticleDOI
TL;DR: The BiplotGUI package provides a graphical user interface for the construction of, interaction with, and manipulation of biplots in R that requires almost no knowledge of R syntax.
Abstract: Biplots simultaneously provide information on both the samples and the variables of a data matrix in two- or three-dimensional representations. The BiplotGUI package provides a graphical user interface for the construction of, interaction with, and manipulation of biplots in R. The samples are represented as points, with coordinates determined either by the choice of biplot, principal coordinate analysis or multidimensional scaling. Various transformations and dissimilarity metrics are available. Information on the original variables is incorporated by linear or non-linear calibrated axes. Goodness-of-fit measures are provided. Additional descriptors can be superimposed, including convex hulls, alpha-bags, point densities and classification regions. Amongst the interactive features are dynamic variable value prediction, zooming and point and axis drag-and-drop. Output can easily be exported to the R workspace for further manipulation. Three-dimensional biplots are incorporated via the rgl package. The user requires almost no knowledge of R syntax.

Journal ArticleDOI
TL;DR: In this article, the authors implemented the meta-analysis methodology in a Microsoft Excel add-in which is freely available and incorporates more meta-analysis models (including the iterative maximum likelihood and profile likelihood) than are usually available, while paying particular attention to the user-friendliness of the package.
Abstract: Meta-analysis is a statistical methodology that combines or integrates the results of several independent clinical trials considered by the analyst to be 'combinable' (Huque 1988). However, completeness and user-friendliness are uncommon both in specialised meta-analysis software packages and in mainstream statistical packages that have to rely on user-written commands. We implemented the meta-analysis methodology in a Microsoft Excel add-in which is freely available and incorporates more meta-analysis models (including the iterative maximum likelihood and profile likelihood) than are usually available, while paying particular attention to the user-friendliness of the package.

Journal ArticleDOI
TL;DR: The present paper provides a general SAS program for the random construction of a Williams design and the relevant procedure for randomization, and meets the practical needs of researchers in the application of Williams designs.
Abstract: A Williams design is a special and useful type of cross-over design. Balance is achieved by using only one particular Latin square if there is an even number of treatments, and by using only two appropriate squares if there is an odd number of treatments. PROC PLAN of SAS/STAT is a practical tool, not only for random construction of the Williams square, but also for randomly assigning treatment sequences to the subjects, which makes integration of the two procedures possible. The present paper provides a general SAS program for the random construction of a Williams design and the relevant procedure for randomization. Examples of three-treatment, three-period (3 x 3) and four-treatment, four-period (4 x 4) cross-over designs are given to illustrate the function of the SAS program. The results can be regenerated and replicated with the same random number seed. The general SAS program meets the practical needs of researchers in the application of Williams designs.

Journal ArticleDOI
TL;DR: A C++ template class library for the efficient and convenient implementation of very general Sequential Monte Carlo algorithms is presented and two example applications are provided: a simple particle filter for illustrative purposes and a state-of-the-art algorithm for rare event estimation.
Abstract: Sequential Monte Carlo methods are a very general class of Monte Carlo methods for sampling from sequences of distributions. Simple examples of these algorithms are used very widely in the tracking and signal processing literature. Recent developments illustrate that these techniques have much more general applicability, and can be applied very effectively to statistical inference problems. Unfortunately, these methods are often perceived as being computationally expensive and difficult to implement. This article seeks to address both of these problems. A C++ template class library for the efficient and convenient implementation of very general Sequential Monte Carlo algorithms is presented. Two example applications are provided: a simple particle filter for illustrative purposes and a state-of-the-art algorithm for rare event estimation.

Journal ArticleDOI
TL;DR: The R package LogConcDEAD (log-concave density estimation in arbitrary dimensions) is introduced; its main function computes the nonparametric maximum likelihood estimator of a log-concave density.
Abstract: In this article we introduce the R package LogConcDEAD (Log-concave density estimation in arbitrary dimensions). Its main function is to compute the nonparametric maximum likelihood estimator of a log-concave density. Functions for plotting, sampling from the density estimate and evaluating the density estimate are provided. All of the functions available in the package are illustrated using simple, reproducible examples with simulated data.
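
A minimal sketch of the package's main function on simulated bivariate data:

## Nonparametric MLE of a log-concave density in two dimensions.
library(LogConcDEAD)

set.seed(1)
x   <- matrix(rnorm(200), ncol = 2)  # 100 bivariate normal observations
fit <- mlelcd(x)                     # the log-concave maximum likelihood estimator
plot(fit)                            # contour plot of the fitted density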

Journal ArticleDOI
TL;DR: For survival modeling with microarray data, a software program is developed which can be used conveniently and interactively in the R environment and can discover multiple sets of genes by iterative forward selection rather than one large set of genes.
Abstract: Gene expression data can be associated with various clinical outcomes. In particular, these data can be of importance in discovering survival-associated genes for medical applications. As alternatives to traditional statistical methods, sophisticated methods and software programs have been developed to overcome the high-dimensional difficulty of microarray data. Nevertheless, new algorithms and software programs are needed to include practical functions such as the discovery of multiple sets of survival-associated genes and the incorporation of risk factors, and to be usable in the R environment, with which many statisticians are familiar. For survival modeling with microarray data, we have developed a software program (called rbsurv) which can be used conveniently and interactively in the R environment. This program selects survival-associated genes based on the partial likelihood of the Cox model and separates training and validation sets of samples for robustness. It can discover multiple sets of genes by iterative forward selection rather than one large set of genes. It can also allow adjustment for risk factors in microarray survival modeling. This software package, the rbsurv package, can be used to discover survival-associated genes with microarray data conveniently.

Journal ArticleDOI
TL;DR: The design framework, computational implementation and the utilization of SOCR Analyses are presented, which will help facilitate statistical learning for high school and undergraduate students.
Abstract: The web-based, Java-written SOCR (Statistical Online Computational Resource) tools have been utilized in many undergraduate and graduate level statistics courses for seven years now (Dinov 2006; Dinov et al. 2008b). It has been shown that these resources can successfully improve students' learning (Dinov et al. 2008b). First published online in 2005, SOCR Analyses is a relatively new component that concentrates on data modeling for both parametric and non-parametric data analyses with graphical model diagnostics. One of the main purposes of SOCR Analyses is to facilitate statistical learning for high school and undergraduate students. With SOCR Distributions and Experiments already implemented, SOCR Analyses and Charts complete a standard statistics curriculum. Currently, there are four core components of SOCR Analyses. Linear models included in SOCR Analyses are simple linear regression, multiple linear regression, and one-way and two-way ANOVA. Tests for sample comparisons include the t-test in the parametric category. Examples in the non-parametric category are the Wilcoxon rank sum test, the Kruskal-Wallis test, Friedman's test, the Kolmogorov-Smirnov test, and the Fligner-Killeen test. Hypothesis testing models include the contingency table, Friedman's test, and Fisher's exact test. The last component of Analyses is a utility for computing sample sizes for the normal distribution. In this article, we present the design framework, computational implementation, and utilization of SOCR Analyses.

Journal ArticleDOI
TL;DR: The main innovation is the interactive feature extraction from color images, which is then coupled with the statistical learning algorithms and intensive feedback from the user over many classification-correction iterations, resulting in a highly accurate and user-friendly solution.
Abstract: Supervised learning can be used to segment/identify regions of interest in images using both color and morphological information. A novel object identification algorithm was developed in Java to locate immune and cancer cells in images of immunohistochemically-stained lymph node tissue from a recent study published by Kohrt et al. (2005). The algorithms are also showing promise in other domains. The success of the method depends heavily on the use of color, the relative homogeneity of object appearance and on interactivity. As is often the case in segmentation, an algorithm specifically tailored to the application works better than using broader methods that work passably well on any problem. Our main innovation is the interactive feature extraction from color images. We also enable the user to improve the classification with an interactive visualization system. This is then coupled with the statistical learning algorithms and intensive feedback from the user over many classification-correction iterations, resulting in a highly accurate and user-friendly solution. The system ultimately provides the locations of every cell recognized in the entire tissue in a text file tailored to be easily imported into R (Ihaka and Gentleman 1996; R Development Core Team 2009) for further statistical analyses. This data is invaluable in the study of spatial and multidimensional relationships between cell populations and tumor structure. This system is available at http://www.GemIdent.com/ together with three demonstration videos and a manual.

Journal ArticleDOI
TL;DR: Hands-on illustrations---based on example exercises and control files provided in the package---are presented to get new users started easily.
Abstract: Package exams provides a framework for automatic generation of standardized statistical exams which is especially useful for large-scale exams. To employ the tools, users just need to supply a pool of exercises and a master file controlling the layout of the final PDF document. The exercises are specified in separate Sweave files (containing R code for data generation and LaTeX code for problem and solution description) and the master file is a LaTeX document with some additional control commands. This paper gives an overview of the main design aims and principles as well as strategies for adaptation and extension. Hands-on illustrations---based on example exercises and control files provided in the package---are presented to get new users started easily.
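
A minimal sketch, assuming the tstat.Rnw and boxplots.Rnw exercise templates that ship with the package:

## Generate three randomized PDF copies of a two-exercise exam.
library(exams)

myexam <- c("tstat.Rnw", "boxplots.Rnw")  # Sweave exercise files
odir   <- tempdir()
exams(myexam, n = 3, dir = odir)          # three exams with freshly drawn data
list.files(odir)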