Book

A Handbook of Statistical Analyses Using R

17 Feb 2006
TL;DR: A chapter-by-chapter introduction to data handling and statistical analysis with R, spanning basic inference through regression, survival, longitudinal, and multivariate methods.
Abstract: Chapter-by-chapter contents:

  • An Introduction to R: What Is R? Installing R; Help and Documentation; Data Objects in R; Data Import and Export; Basic Data Manipulation; Computing with Data; Organizing an Analysis
  • Data Analysis Using Graphical Displays: Introduction; Initial Data Analysis; Analysis Using R
  • Simple Inference: Introduction; Statistical Tests; Analysis Using R
  • Conditional Inference: Introduction; Conditional Test Procedures; Analysis Using R
  • Analysis of Variance: Introduction; Analysis of Variance; Analysis Using R
  • Simple and Multiple Linear Regression: Introduction; Simple Linear Regression; Multiple Linear Regression; Analysis Using R
  • Logistic Regression and Generalized Linear Models: Introduction; Logistic Regression and Generalized Linear Models; Analysis Using R
  • Density Estimation: Introduction; Density Estimation; Analysis Using R
  • Recursive Partitioning: Introduction; Recursive Partitioning; Analysis Using R
  • Scatterplot Smoothers and Generalized Additive Models: Introduction; Scatterplot Smoothers and Generalized Additive Models; Analysis Using R
  • Survival Analysis: Introduction; Survival Analysis; Analysis Using R
  • Analyzing Longitudinal Data I: Introduction; Analyzing Longitudinal Data; Linear Mixed Effects Models; Analysis Using R; Prediction of Random Effects; The Problem of Dropouts
  • Analyzing Longitudinal Data II: Introduction; Methods for Nonnormal Distributions; Analysis Using R: GEE; Analysis Using R: Random Effects
  • Simultaneous Inference and Multiple Comparisons: Introduction; Simultaneous Inference and Multiple Comparisons; Analysis Using R
  • Meta-Analysis: Introduction; Systematic Reviews and Meta-Analysis; Statistics of Meta-Analysis; Analysis Using R; Meta-Regression; Publication Bias
  • Principal Component Analysis: Introduction; Principal Component Analysis; Analysis Using R
  • Multidimensional Scaling: Introduction; Multidimensional Scaling; Analysis Using R
  • Cluster Analysis: Introduction; Cluster Analysis; Analysis Using R
  • Bibliography; Index

A Summary appears at the end of each chapter.


Citations
Book
01 Jan 2008
TL;DR: A step-by-step, non-mathematical introduction to the statistical analysis of language data using R, covering data exploration, visualization, and a range of models for real linguistic data sets.
Abstract: Statistical analysis is a useful skill for linguists and psycholinguists, allowing them to understand the quantitative structure of their data. This textbook provides a straightforward introduction to the statistical analysis of language. Designed for linguists with a non-mathematical background, it clearly introduces the basic principles and methods of statistical analysis, using 'R', the leading computational statistics programme. The reader is guided step-by-step through a range of real data sets, allowing them to analyse acoustic data, construct grammatical trees for a variety of languages, quantify register variation in corpus linguistics, and measure experimental data using state-of-the-art models. The visualization of data plays a key role, both in the initial stages of data exploration and later on when the reader is encouraged to criticize various models. Containing over 40 exercises with model answers, this book will be welcomed by all linguists wishing to learn more about working with and presenting quantitative data.

2,146 citations


Cites methods from "A Handbook of Statistical Analyses ..."

  • ...A short introduction to the more recent package (lme4 ) used in this chapter is Bates [2005], Everitt and Hothorn [2006] provide some introductory discussion as well. More comprehensive discussion is available in Faraway [2006] and Wood [2006]. A technical overview of the mathematics underlying the implementation of mixed effect models in the lme4 package is Bates [2006]....

    [...]

Journal ArticleDOI
TL;DR: The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high-dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application.
Abstract: Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured because of time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications and to provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high-dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated with freely available implementations in the R system for statistical computing.

2,001 citations
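The recursive-partitioning principle summarized above, repeatedly splitting the data at the cutpoint that most reduces within-node variability, can be sketched compactly. The paper's analyses use R implementations; the pure-Python regression tree below, with illustrative helper names, only demonstrates the core algorithm.

```python
# Minimal sketch of recursive partitioning for regression on one predictor:
# repeatedly split at the threshold that most reduces the residual sum of
# squares, stopping when a node becomes too small to split.
def rss(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (threshold, rss_reduction) for the best binary split."""
    pairs = sorted(zip(xs, ys))
    total = rss([y for _, y in pairs])
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = total - rss(left) - rss(right)
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or gain > best[1]:
            best = (thr, gain)
    return best

def grow(xs, ys, min_node=3):
    """Grow a tree: nested (threshold, left, right) tuples, leaf = mean."""
    if len(ys) < 2 * min_node:
        return sum(ys) / len(ys)            # leaf: mean response
    thr, gain = best_split(xs, ys)
    if gain <= 0:
        return sum(ys) / len(ys)
    lx = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    rx = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return (thr,
            grow([x for x, _ in lx], [y for _, y in lx], min_node),
            grow([x for x, _ in rx], [y for _, y in rx], min_node))

def predict(tree, x):
    while isinstance(tree, tuple):
        thr, left, right = tree
        tree = left if x <= thr else right
    return tree
```

`grow` returns the fitted partition and `predict` walks it; real implementations add pruning, multiple predictors, and the resampling that turns single trees into random forests.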

Book
21 Jul 2008
TL;DR: In step-by-step detail, Benjamin Bolker teaches ecology graduate students and researchers everything they need to know in order to use maximum likelihood, information-theoretic, and Bayesian techniques to analyze their own data using the programming language R.
Abstract: Ecological Models and Data in R is the first truly practical introduction to modern statistical methods for ecology. In step-by-step detail, the book teaches ecology graduate students and researchers everything they need to know in order to use maximum likelihood, information-theoretic, and Bayesian techniques to analyze their own data using the programming language R. Drawing on extensive experience teaching these techniques to graduate students in ecology, Benjamin Bolker shows how to choose among and construct statistical models for data, estimate their parameters and confidence limits, and interpret the results. The book also covers statistical frameworks, the philosophy of statistical modeling, and critical mathematical functions and probability distributions. It requires no programming background--only basic calculus and statistics.

1,626 citations

Journal ArticleDOI
TL;DR: A review of A. J. Miller's Subset Selection in Regression (Monographs on Statistics and Applied Probability, no. 40, Chapman and Hall, 1990).
Abstract: 8. Subset Selection in Regression (Monographs on Statistics and Applied Probability, no. 40). By A. J. Miller. ISBN 0 412 35380 6. Chapman and Hall, London, 1990. 240 pp. £25.00.

1,154 citations

Proceedings ArticleDOI
08 Apr 2013
TL;DR: A simple, scalable, and informative classification method is presented that identifies a small number of longitudinal engagement trajectories in MOOCs and compares learners in each trajectory and course across demographics, forum participation, video access, and reports of overall experience.
Abstract: As MOOCs grow in popularity, the relatively low completion rates of learners have been a central criticism. This focus on completion rates, however, reflects a monolithic view of disengagement that does not allow MOOC designers to target interventions or develop adaptive course features for particular subpopulations of learners. To address this, we present a simple, scalable, and informative classification method that identifies a small number of longitudinal engagement trajectories in MOOCs. Learners are classified based on their patterns of interaction with video lectures and assessments, the primary features of most MOOCs to date. In an analysis of three computer science MOOCs, the classifier consistently identifies four prototypical trajectories of engagement. The most notable of these is the learners who stay engaged through the course without taking assessments. These trajectories are also a useful framework for the comparison of learner engagement between different course structures or instructional approaches. We compare learners in each trajectory and course across demographics, forum participation, video access, and reports of overall experience. These results inform a discussion of future interventions, research, and design directions for MOOCs. Potential improvements to the classification mechanism are also discussed, including the introduction of more fine-grained analytics.

1,011 citations


Cites methods from "A Handbook of Statistical Analyses ..."

  • ...A oneway analysis of variance (ANOVA) is performed on each dimension (Table 4) and Tukey Honest Significant Differences (HSD) adjustments (pHSD) are used for post hoc pair-wise cluster comparisons (Table 6) [11]....

    [...]
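The one-way ANOVA in the excerpt compares between-group to within-group variability via an F statistic. A minimal pure-Python sketch of that computation follows (the cited analysis was done in R; the Tukey HSD post hoc step, which requires the studentized range distribution, is omitted here):

```python
# One-way ANOVA F statistic for k groups: ratio of the between-group
# mean square to the within-group mean square.
def anova_f(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

The resulting F is compared against the F distribution with (k - 1, n - k) degrees of freedom to obtain a p-value.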

References
Book ChapterDOI
TL;DR: In this article, the product-limit (PL) estimator was proposed to estimate the proportion of items in the population whose lifetimes would exceed t (in the absence of such losses), without making any assumption about the form of the function P(t).
Abstract: In lifetesting, medical follow-up, and other fields the observation of the time of occurrence of the event of interest (called a death) may be prevented for some of the items of the sample by the previous occurrence of some other event (called a loss). Losses may be either accidental or controlled, the latter resulting from a decision to terminate certain observations. In either case it is usually assumed in this paper that the lifetime (age at death) is independent of the potential loss time; in practice this assumption deserves careful scrutiny. Despite the resulting incompleteness of the data, it is desired to estimate the proportion P(t) of items in the population whose lifetimes would exceed t (in the absence of such losses), without making any assumption about the form of the function P(t). The observation for each item of a suitable initial event, marking the beginning of its lifetime, is presupposed. For random samples of size N the product-limit (PL) estimate can be defined as follows: L...

52,450 citations
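The product-limit recipe in the abstract can be stated directly: at each observed death time, multiply the running survival estimate by one minus the fraction of at-risk subjects who died, while losses (censored observations) leave the risk set without contributing a factor. A pure-Python sketch (the function name is illustrative):

```python
# Product-limit (Kaplan-Meier) estimate of S(t).  At each distinct death
# time t with d deaths among n subjects still at risk, the survival
# estimate is multiplied by (1 - d / n); censored subjects only shrink
# the risk set.
def product_limit(times, events):
    """times: observation times; events: 1 = death observed, 0 = loss."""
    data = list(zip(times, events))
    n_at_risk = len(data)
    surv, curve = 1.0, []
    for t in sorted(set(times)):
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        at_t = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= at_t
    return curve
```

For example, with times (1, 2, 3, 4) and a loss at time 2, the estimate steps down only at the three death times.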

Book
01 Jan 1993
TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
Abstract: This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.

37,183 citations
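The basic bootstrap recipe the book popularized can be illustrated with a percentile confidence interval for a mean: resample the data with replacement many times, recompute the statistic on each resample, and read off empirical quantiles. A pure-Python sketch (not the book's Minitab macros; names are illustrative):

```python
import random

# Percentile-bootstrap confidence interval for the sample mean.
def bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    stats = sorted(
        sum(rng.choices(data, k=len(data))) / len(data)
        for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]          # empirical 2.5% quantile
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]  # empirical 97.5% quantile
    return lo, hi
```

The same resampling loop works for any statistic; only the expression inside `sorted(...)` changes.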

Journal ArticleDOI
TL;DR: This paper examines eight published reviews each reporting results from several related trials in order to evaluate the efficacy of a certain treatment for a specified medical condition and suggests a simple noniterative procedure for characterizing the distribution of treatment effects in a series of studies.

33,234 citations


"A Handbook of Statistical Analyses ..." refers background in this paper

  • ...DerSimonian and Laird (1986) derive a suitable estimator for τ̂(2), which is as follows;...

    [...]
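The DerSimonian and Laird (1986) estimator quoted above is a moment estimator of the between-study variance: compute Cochran's Q from inverse-variance weights, subtract its degrees of freedom, scale by a function of the weights, and truncate at zero. A pure-Python sketch (argument names are illustrative):

```python
# DerSimonian-Laird moment estimator of the between-study variance tau^2:
# tau^2 = max(0, (Q - (k - 1)) / c), where Q is Cochran's heterogeneity
# statistic and c = sum(w) - sum(w^2) / sum(w) with w_i = 1 / v_i.
def dersimonian_laird(effects, variances):
    """effects: study effect estimates; variances: within-study variances."""
    w = [1 / v for v in variances]
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (k - 1)) / c)
```

When the studies are homogeneous, Q falls below k - 1 and the estimate truncates to zero.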

BookDOI
01 Dec 2010
TL;DR: A guide to using S environments to perform statistical analyses, providing both an introduction to the use of S and a course in modern statistical methods.
Abstract: A guide to using S environments to perform statistical analyses, providing both an introduction to the use of S and a course in modern statistical methods. The emphasis is on presenting practical problems and full analyses of real data sets.

18,346 citations

Journal ArticleDOI
TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.
Abstract: This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations, yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.

17,111 citations


"A Handbook of Statistical Analyses ..." refers methods in this paper

  • ...A suitable procedure was first suggested by Liang and Zeger (1986) and is known as generalised estimating equations (GEE). In essence GEE is a multivariate extension of the generalised linear model and quasi-likelihood methods outlined in Chapter 6. The use of the latter leads to consistent inferences about mean responses without requiring specific assumptions to be made about second and higher order moments, thus avoiding intractable likelihood functions with possibly many nuisance parameters. Full details of the method are given in Liang and Zeger (1986) and Zeger and Liang (1986) but the primary idea behind the GEE approach is that since the parameters specifying the structure of the correlation matrix are rarely of great practical interest, simple structures are used for the within-subject correlations giving rise to the so-called working correlation matrix. Liang and Zeger (1986) show that the estimates of the parameters of most interest, i....

    [...]

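The working correlation matrix described in these excerpts is simply an assumed, deliberately simple within-subject correlation structure. A sketch of the exchangeable structure, where every pair of observations on a subject shares one correlation alpha (names are illustrative, not an actual GEE package API):

```python
# Exchangeable working correlation: 1 on the diagonal, a single shared
# correlation alpha everywhere else.  Setting alpha = 0 gives the
# independence working structure.
def exchangeable_working_correlation(n_obs, alpha):
    return [[1.0 if i == j else alpha for j in range(n_obs)]
            for i in range(n_obs)]

def independence_working_correlation(n_obs):
    return exchangeable_working_correlation(n_obs, 0.0)
```

GEE then plugs such a matrix into the estimating equations; the key result of Liang and Zeger (1986) is that the regression estimates remain consistent even when this working structure is misspecified.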