limma: Linear Models for Microarray Data

Gordon K. Smyth
pp. 397-420
Abstract
A survey is given of differential expression analyses using the linear modeling features of the limma package. The chapter starts with the simplest replicated designs and progresses through experiments with two or more groups, direct designs, factorial designs and time course experiments. Experiments with technical as well as biological replication are considered. Empirical Bayes test statistics are explained. The use of quality weights, adaptive background correction and control spots in conjunction with linear modelling is illustrated on the β7 data.
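
The empirical Bayes test statistic mentioned in the abstract can be sketched in a few lines of base R. The sketch assumes the moderated t-statistic of Smyth (2004). Each gene's residual variance s² (on d residual degrees of freedom) is shrunk towards a prior variance s0² (on d0 prior degrees of freedom), and the resulting statistic is referred to a t-distribution on d0 + d degrees of freedom. In limma, eBayes() estimates d0 and s0² from all genes. Here they are passed in directly as an assumption, and the function name moderated_t is hypothetical. The final Benjamini-Hochberg step uses base R's p.adjust().

```r
# Sketch of an empirical Bayes moderated t-statistic (not limma's eBayes()).
# Assumption: prior df d0 and prior variance s0sq are given, not estimated.
moderated_t <- function(coef, stdev_unscaled, sigma, df_residual, d0, s0sq) {
  # Posterior variance: df-weighted average of prior and gene-wise variances
  s2_post <- (d0 * s0sq + df_residual * sigma^2) / (d0 + df_residual)
  # Moderated t uses the shrunken standard deviation in the denominator
  t <- coef / (stdev_unscaled * sqrt(s2_post))
  # Reference distribution has augmented degrees of freedom d0 + d
  df_total <- d0 + df_residual
  p <- 2 * pt(-abs(t), df = df_total)
  data.frame(t = t, df = df_total, p.value = p,
             adj.P.Val = p.adjust(p, method = "BH"))  # Benjamini-Hochberg FDR
}

# Three hypothetical genes, each fitted on 4 residual df
res <- moderated_t(coef = c(1.5, 0.2, -2),
                   stdev_unscaled = rep(0.5, 3),
                   sigma = c(0.4, 0.3, 1.2),
                   df_residual = 4, d0 = 4, s0sq = 0.25)
print(res)
```

Shrinkage matters most when d is small. A gene with an unusually small sample variance can no longer produce a huge t-statistic by chance, because its variance is pulled towards the prior value.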

limma: Linear Models for Microarray and RNA-Seq Data
User’s Guide
Gordon K. Smyth, Matthew Ritchie, Natalie Thorne,
James Wettenhall, Wei Shi and Yifang Hu
Bioinformatics Division, The Walter and Eliza Hall Institute
of Medical Research, Melbourne, Australia
First edition 2 December 2002
Last revised 14 November 2021
This free open-source software implements academic research by the authors and co-workers. If you use it, please support the project by citing the appropriate journal articles listed in Section 2.1.

Contents
1 Introduction
2 Preliminaries
2.1 Citing limma
2.2 Installation
2.3 How to get help
3 Quick Start
3.1 A brief introduction to R
3.2 Sample limma Session
3.3 Data Objects
4 Reading Microarray Data
4.1 Scope of this Chapter
4.2 Recommended Files
4.3 The Targets Frame
4.4 Reading Two-Color Intensity Data
4.5 Reading Single-Channel Agilent Intensity Data
4.6 Reading Illumina BeadChip Data
4.7 Image-derived Spot Quality Weights
4.8 Reading Probe Annotation
4.9 Printer Layout
4.10 The Spot Types File
5 Quality Assessment
6 Pre-Processing Two-Color Data
6.1 Background Correction
6.2 Within-Array Normalization
6.3 Between-Array Normalization
6.4 Using Objects from the marray Package
7 Filtering unexpressed probes
8 Linear Models Overview
8.1 Introduction
8.2 Single-Channel Designs
8.3 Common Reference Designs
8.4 Direct Two-Color Designs
9 Single-Channel Experimental Designs
9.1 Introduction
9.2 Two Groups
9.3 Several Groups
9.4 Additive Models and Blocking
9.4.1 Paired Samples
9.4.2 Blocking
9.5 Interaction Models: 2 × 2 Factorial Designs
9.5.1 Questions of Interest
9.5.2 Analysing as for a Single Factor
9.5.3 A Nested Interaction Formula
9.5.4 Classic Interaction Models
9.6 Time Course Experiments
9.6.1 Replicated time points
9.6.2 Many time points
9.7 Multi-level Experiments
10 Two-Color Experiments with a Common Reference
10.1 Introduction
10.2 Two Groups
10.3 Several Groups
11 Direct Two-Color Experimental Designs
11.1 Introduction
11.2 Simple Comparisons
11.2.1 Replicate Arrays
11.2.2 Dye Swaps
11.3 A Correlation Approach to Technical Replication
12 Separate Channel Analysis of Two-Color Data
13 Statistics for Differential Expression
13.1 Summary Top-Tables
13.2 Fitted Model Objects
13.3 Multiple Testing Across Contrasts
14 Array Quality Weights
14.1 Introduction
14.2 Example 1
14.3 Example 2
14.4 When to Use Array Weights
15 RNA-Seq Data
15.1 Introduction
15.2 Making a count matrix
15.3 Normalization and filtering
15.4 Differential expression: limma-trend
15.5 Differential expression: voom
15.6 Voom with sample quality weights
15.7 Differential splicing
16 Two-Color Case Studies
16.1 Swirl Zebrafish: A Single-Group Experiment
16.2 Apoa1 Knockout Mice: A Two-Group Common-Reference Experiment
16.3 Weaver Mutant Mice: A Composite 2x2 Factorial Experiment
16.3.1 Background
16.3.2 Sample Preparation and Hybridizations
16.3.3 Data input
16.3.4 Annotation
16.3.5 Quality Assessment and Normalization
16.3.6 Setting Up the Linear Model
16.3.7 Probe Filtering and Array Quality Weights
16.3.8 Differential expression
16.4 Bob1 Mutant Mice: Arrays With Duplicate Spots
17 Single-Channel Case Studies
17.1 Lrp Mutant E. Coli Strain with Affymetrix Arrays
17.1.1 Background
17.1.2 Downloading the data
17.1.3 Background correction and normalization
17.1.4 Gene annotation
17.1.5 Differential expression
17.2 Effect of Estrogen on Breast Cancer Tumor Cells: A 2x2 Factorial Experiment with Affymetrix Arrays
17.3 Comparing Mammary Progenitor Cell Populations with Illumina BeadChips
17.3.1 Introduction
17.3.2 The target RNA samples
17.3.3 The expression profiles
17.3.4 How many probes are truly expressed?
17.3.5 Normalization and filtering
17.3.6 Within-patient correlations
17.3.7 Differential expression between cell types
17.3.8 Signature genes for luminal progenitor cells
17.4 Time Course Effects of Corn Oil on Rat Thymus with Agilent 4x44K Arrays
17.4.1 Introduction
17.4.2 Data availability
17.4.3 Reading the data
17.4.4 Gene annotation
17.4.5 Background correction and normalize
17.4.6 Gene filtering
17.4.7 Differential expression
17.4.8 Gene ontology analysis
18 RNA-Seq Case Studies
18.1 Profiles of Yoruba HapMap Individuals
18.1.1 Background
18.1.2 Data availability
18.1.3 Yoruba Individuals and FASTQ Files
18.1.4 Mapping reads to the reference genome
18.1.5 Annotation
18.1.6 DGEList object
18.1.7 Filtering
18.1.8 Scale normalization
18.1.9 Linear modeling
18.1.10 Gene set testing
18.1.11 Session information
18.1.12 Acknowledgements
18.2 Differential Splicing after Pasilla Knockdown
18.2.1 Background
18.2.2 GEO samples and SRA Files
18.2.3 Mapping reads to the reference genome
18.2.4 Read counts for exons
18.2.5 Assemble DGEList and sum counts for technical replicates
18.2.6 Gene annotation
18.2.7 Filtering
18.2.8 Scale normalization
18.2.9 Linear modelling
18.2.10 Alternate splicing
18.2.11 Session information
18.2.12 Acknowledgements

Frequently Asked Questions
Q1. What is the way to estimate the residual variability of a gene?

Including the dye-effect in the model in this way uses up one degree of freedom which might otherwise be used to estimate the residual variability, but it is valuable if many genes show non-negligible dye-effects. 
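
The degree-of-freedom cost of a dye-effect term can be checked directly. The sketch below assumes a hypothetical four-array dye-swap experiment (mutant vs wild type, two arrays in each dye orientation) and a single noise-free probe. The column names DyeEffect and MUvsWT are illustrative. An intercept column absorbs the probe-specific dye bias, and a ±1 column carries the mutant vs wild-type log-fold-change. Base R's lm.fit() stands in for limma's lmFit() so that the example runs on its own.

```r
# Hypothetical dye-swap layout: +1 = mutant in red (Cy5), -1 = dyes swapped
orientation <- c(1, -1, 1, -1)

# Without a dye-effect term: a single coefficient, the MU vs WT log-fold-change
design_nodye <- cbind(MUvsWT = orientation)
# With a dye-effect term: the intercept estimates any probe-specific dye bias
design_dye <- cbind(DyeEffect = 1, MUvsWT = orientation)

# Residual degrees of freedom = number of arrays - number of estimable coefficients
resid_df <- function(design) nrow(design) - qr(design)$rank

# Noise-free log-ratios for one probe: dye bias 0.3, true log-fold-change 1
M <- 0.3 + 1 * orientation
fit <- lm.fit(design_dye, M)

resid_df(design_nodye)  # 3 df left to estimate the residual variance
resid_df(design_dye)    # 2 df: one was spent on the dye effect
fit$coefficients        # recovers DyeEffect = 0.3, MUvsWT = 1
```

The dye effect is estimable here only because both orientations occur. With every array in the same orientation, the intercept and the treatment column would be confounded.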

With Affymetrix or single-channel data, or with two-color with a common reference, you will need as many coefficients as you have distinct RNA sources, no more and no less. 
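
The "one coefficient per distinct RNA source" rule can be illustrated with base R's model.matrix(). The sketch assumes a hypothetical single-channel experiment with six arrays and three RNA sources (WT, MU1, MU2). A no-intercept formula gives one column per source, so each coefficient is that source's mean log-expression. Comparisons such as MU1 - WT are then formed as contrasts, which limma's makeContrasts() handles.

```r
# Hypothetical targets: 6 arrays, 3 distinct RNA sources, 2 replicates each
targets <- data.frame(
  Array  = paste0("A", 1:6),
  Source = c("WT", "WT", "MU1", "MU1", "MU2", "MU2")
)
src <- factor(targets$Source, levels = c("WT", "MU1", "MU2"))

# No intercept: one indicator column per RNA source, no more and no less
design <- model.matrix(~ 0 + src)
colnames(design) <- levels(src)
print(design)

ncol(design)        # 3 coefficients for 3 RNA sources
qr(design)$rank     # full rank, so every coefficient is estimable
```

Adding a fourth column would make the design rank-deficient. Dropping one would force two sources to share a mean.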

Oshlack et al. [23] show that loess normalization can tolerate up to about 30% asymmetric differential expression while still giving good results.

Spatial heterogeneity on individual arrays can be highlighted by examining imageplots of the background intensities. For example,

> imageplot(log2(RG$Gb[,1]), RG$printer)

plots the green background for the first array.

The marray package provides some normalization methods that are not in limma, including 2-D loess normalization and print-tip-scale normalization.

If there are at least two arrays with each dye-orientation, then it is possible to estimate and adjust for any probe-specific dye effects. 

In these cases one should either use global "loess" normalization or else use robust spline normalization,

> MA <- normalizeWithinArrays(RG, method="robustspline")

which is an empirical Bayes compromise between print-tip and global loess normalization, with 5-parameter regression splines used in place of the loess curves.
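
As a rough illustration of what global loess normalization does, the base-R sketch below fits a single loess curve of M (log-ratio) against A (average log-intensity) for one array and subtracts it. This is not limma's implementation: normalizeWithinArrays() has its own weighting and fitting details, and the robust spline method is not reproduced here. The simulated dye bias, span = 0.3, degree = 1 and the function name global_loess_normalize are all assumptions for illustration.

```r
# Simulated two-color data for one array with a curved, intensity-dependent dye bias
set.seed(1)
n <- 2000
A <- runif(n, 6, 14)                  # average log2 intensity
bias <- 0.5 * sin((A - 6) / 2)        # hypothetical dye bias trend
M <- bias + rnorm(n, sd = 0.2)        # log-ratios with no true differential expression

# Hypothetical helper: subtract a global loess trend of M on A
global_loess_normalize <- function(M, A, span = 0.3) {
  # family = "symmetric" uses robustness iterations that down-weight outlying spots
  fit <- loess(M ~ A, span = span, degree = 1, family = "symmetric")
  M - fitted(fit)
}

M_norm <- global_loess_normalize(M, A)
c(sd_before = sd(M), sd_after = sd(M_norm))
```

A print-tip version would repeat the same fit separately within each print-tip group. That is why it needs enough spots per group to be stable.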

If you use the read.ilmn, nec or neqc functions to process Illumina BeadChip data, please cite: Shi, W, Oshlack, A, and Smyth, GK (2010).

If the same channel has been used for the common reference throughout the experiment, then the expression log-ratios may be analysed exactly as if they were log-expression values from a single channel experiment. 
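
The point about a common reference can be seen in simulation. When the reference is always in the same channel, M = log2(R/G) is just sample log-expression minus a reference level shared by every array. A two-group design therefore applies unchanged. The sketch assumes hypothetical data (100 genes, 6 arrays, reference in green, genes 1 to 10 up by 2 log2 units in the mutant). It uses base R's lm.fit() on all genes at once. In limma the corresponding step would be lmFit(MA, design) followed by eBayes().

```r
# Hypothetical common-reference experiment: reference always in the green channel
set.seed(2)
n_genes <- 100
n_arrays <- 6
group <- factor(c("WT", "WT", "WT", "MU", "MU", "MU"), levels = c("WT", "MU"))

# Simulated background-corrected intensities
G <- matrix(2^rnorm(n_genes * n_arrays, mean = 10), n_genes)      # reference channel
R <- G * 2^matrix(rnorm(n_genes * n_arrays, sd = 0.1), n_genes)   # sample channel
R[1:10, group == "MU"] <- R[1:10, group == "MU"] * 4              # genes 1-10 up 2 log2 units

# Log-ratios against the common reference
M <- log2(R / G)

# Same two-group design as for single-channel log-expression data
design <- model.matrix(~ group)
fit <- lm.fit(design, t(M))            # t(M): arrays in rows, one column per gene
logFC <- fit$coefficients["groupMU", ] # MU vs WT log-fold-change for every gene
round(head(logFC, 12), 2)
```

The reference intensities cancel from the group comparison. This is why the reference must stay in the same channel throughout: otherwise dye effects would enter the comparison.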

The TIFF images have then been processed using an image analysis program such as ArrayVision, ImaGene, GenePix, QuantArray or SPOT to acquire the red and green foreground and background intensities for each spot.