
Douglas M. Bates

Other affiliations: Kansas State University, University of Alberta

Bio: Douglas M. Bates is an academic researcher from the University of Wisconsin-Madison. The author has contributed to research on topics including generalized linear mixed models and random effects models. The author has an h-index of 36 and has co-authored 80 publications receiving 88,022 citations. Previous affiliations of Douglas M. Bates include Kansas State University and the University of Alberta.

Papers published on a yearly basis: [chart of publications per year, 1980 to 2021]

Papers


Journal Article

Balancing Type I Error and Power in Linear Mixed Models


Hannes Matuschek¹, Reinhold Kliegl¹, Shravan Vasishth¹, R. Harald Baayen², Douglas M. Bates³

University of Potsdam¹, University of Tübingen², University of Wisconsin-Madison³

05 Nov 2015, arXiv: Applications

TL;DR: The authors showed that for typical psychological and psycholinguistic data, higher power is achieved without inflating Type I error rate if a model selection criterion is used to select a random effect structure that is supported by the data.


Abstract: Linear mixed-effects models (LMMs) have increasingly replaced mixed-model analyses of variance (ANOVA) for statistical inference in factorial psycholinguistic experiments. Although LMMs have many advantages over ANOVA, they, like ANOVAs, require some care when set up for data analysis. One simple option, when numerically possible, is to fit the full variance-covariance structure of the random effects (the maximal model; Barr et al. 2013), presumably to keep the Type I error rate down to the nominal alpha in the presence of random effects. Although fitting a model with only random intercepts may indeed lead to a higher Type I error rate, fitting a maximal model also has a cost: it can lead to a significant loss of power. We demonstrate this with simulations and suggest that for typical psychological and psycholinguistic data, higher power is achieved without inflating the Type I error rate if a model selection criterion is used to select a random-effects structure that is supported by the data.


330 citations

Linear and Nonlinear Mixed Effects Models [R package nlme version 3.1-149]


José C. Pinheiro, Douglas M. Bates, R-core

23 Aug 2020

279 citations

Journal Article

Estimating the Multilevel Rasch Model: With the lme4 Package


Harold Doran, Douglas M. Bates, Paul Bliese, Maritza Dowling

22 Feb 2007, Journal of Statistical Software

TL;DR: In this article, the multilevel Rasch model with crossed or partially crossed random effects is used to estimate the teacher × content strand interaction in an educational testing scenario in which students are grouped into classrooms and many test items share a common grouping structure, such as a content strand or a reading passage.


Abstract: Traditional Rasch estimation of the item and student parameters, via marginal maximum likelihood, joint maximum likelihood or conditional maximum likelihood, assumes that individuals in clustered settings are uncorrelated and that items within a test that share a grouping structure are also uncorrelated. These assumptions are often violated, particularly in educational testing situations, in which students are grouped into classrooms and many test items share a common grouping structure, such as a content strand or a reading passage. Consequently, one possible approach is to explicitly recognize the clustered nature of the data and directly incorporate random effects to account for the various dependencies. This article demonstrates how the multilevel Rasch model can be estimated using the functions in R for mixed-effects models with crossed or partially crossed random effects. We demonstrate how to model the following hierarchical data structures: a) individuals clustered in similar settings (e.g., classrooms, schools), b) items nested within a particular group (such as a content strand or a reading passage), and c) a teacher × content strand interaction.


144 citations

Linear Mixed-Effects Models using 'Eigen' and S4 [R package lme4 version 1.1-27.1]


Douglas M. Bates, Martin Maechler, Ben Bolker, Steven C. Walker

22 Jun 2021

140 citations

Journal Article

Technical note: an R package for fitting generalized linear mixed models in animal breeding.


Ana I. Vazquez¹, Douglas M. Bates¹, Guilherme J. M. Rosa¹, Daniel Gianola¹, Kent A. Weigel¹

University of Wisconsin-Madison¹

01 Feb 2010, Journal of Animal Science

TL;DR: The pedigreemm package of R was developed as an extension of the lme4 package, and allows mixed models with correlated random effects to be fitted for Gaussian, binary, and count responses.


Abstract: Mixed models have been used extensively in quantitative genetics to study continuous and discrete traits. A standard quantitative genetic model proposes that the effects of levels of some random factor (e.g., sire) are correlated according to their relationships. For this reason, routines for mixed models available in standard packages cannot be used for genetic analysis. The pedigreemm package of R was developed as an extension of the lme4 package, and allows mixed models with correlated random effects to be fitted for Gaussian, binary, and count responses. Following the method of Harville and Callanan (1989), a correlation between levels of the grouping factor (e.g., sire) is induced by post-multiplying the incidence matrix of the levels of this random factor by the Cholesky factor of the corresponding (co)variance matrix (e.g., the numerator relationship matrix between sires). Estimation methods available in pedigreemm include approximations to maximum likelihood and REML. This note describes the classes of models that can be fitted using pedigreemm and presents examples that illustrate its use.


136 citations


Cited by


Journal Article

Fitting Linear Mixed-Effects Models Using lme4


Douglas M. Bates, Martin Mächler, Benjamin M. Bolker, Steven C. Walker

07 Oct 2015, Journal of Statistical Software

TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms; the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.


Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represent such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.


50,607 citations

Journal Article

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2


Michael I. Love¹˒², Wolfgang Huber, Simon Anders

Max Planck Society¹, Harvard University²

05 Dec 2014, Genome Biology

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.


Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .


47,038 citations

Journal Article

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.


Mark D. Robinson¹, Davis J. McCarthy¹, Gordon K. Smyth¹

Walter and Eliza Hall Institute of Medical Research¹

01 Jan 2010, Bioinformatics

TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data; it uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.


Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).


29,413 citations

Journal Article

limma powers differential expression analyses for RNA-sequencing and microarray studies


Matthew E. Ritchie¹, Belinda Phipson², Di Wu³, Yifang Hu¹, Charity W. Law⁴, Wei Shi¹, Gordon K. Smyth¹˒⁵

Walter and Eliza Hall Institute of Medical Research¹, Royal Children's Hospital², Harvard University³, University of Zurich⁴, University of Melbourne⁵

20 Apr 2015, Nucleic Acids Research

TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.


Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.


22,147 citations

Posted Content

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2


Michael I. Love¹, Wolfgang Huber, Simon Anders

Harvard University¹

17 Nov 2014, bioRxiv


Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.


17,014 citations
