Book Chapter

limma: Linear Models for Microarray Data

Gordon K. Smyth

01 Jan 2005, pp. 397-420

TL;DR: This chapter surveys differential expression analysis with limma, starting with the simplest replicated designs and progressing through experiments with two or more groups, direct designs, factorial designs and time course experiments. Technical as well as biological replication is considered.

Abstract: A survey is given of differential expression analyses using the linear modeling features of the limma package. The chapter starts with the simplest replicated designs and progresses through experiments with two or more groups, direct designs, factorial designs and time course experiments. Experiments with technical as well as biological replication are considered. Empirical Bayes test statistics are explained. The use of quality weights, adaptive background correction and control spots in conjunction with linear modelling is illustrated on the β7 data.

limma: Linear Models for Microarray and RNA-Seq Data

User's Guide

Gordon K. Smyth, Matthew Ritchie, Natalie Thorne, James Wettenhall, Wei Shi and Yifang Hu

Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia

First edition 2 December 2002

Last revised 14 November 2021

This free open-source software implements academic research by the authors and co-workers. If you use it, please support the project by citing the appropriate journal articles listed in Section 2.1.

Contents

1 Introduction
2 Preliminaries
2.1 Citing limma
2.2 Installation
2.3 How to get help
3 Quick Start
3.1 A brief introduction to R
3.2 Sample limma Session
3.3 Data Objects
4 Reading Microarray Data
4.1 Scope of this Chapter
4.2 Recommended Files
4.3 The Targets Frame
4.4 Reading Two-Color Intensity Data
4.5 Reading Single-Channel Agilent Intensity Data
4.6 Reading Illumina BeadChip Data
4.7 Image-derived Spot Quality Weights
4.8 Reading Probe Annotation
4.9 Printer Layout
4.10 The Spot Types File
5 Quality Assessment
6 Pre-Processing Two-Color Data
6.1 Background Correction
6.2 Within-Array Normalization
6.3 Between-Array Normalization
6.4 Using Objects from the marray Package
7 Filtering unexpressed probes
8 Linear Models Overview
8.1 Introduction
8.2 Single-Channel Designs
8.3 Common Reference Designs
8.4 Direct Two-Color Designs
9 Single-Channel Experimental Designs
9.1 Introduction
9.2 Two Groups
9.3 Several Groups
9.4 Additive Models and Blocking
9.4.1 Paired Samples
9.4.2 Blocking
9.5 Interaction Models: 2 × 2 Factorial Designs
9.5.1 Questions of Interest
9.5.2 Analysing as for a Single Factor
9.5.3 A Nested Interaction Formula
9.5.4 Classic Interaction Models
9.6 Time Course Experiments
9.6.1 Replicated time points
9.6.2 Many time points
9.7 Multi-level Experiments
10 Two-Color Experiments with a Common Reference
10.1 Introduction
10.2 Two Groups
10.3 Several Groups
11 Direct Two-Color Experimental Designs
11.1 Introduction
11.2 Simple Comparisons
11.2.1 Replicate Arrays
11.2.2 Dye Swaps
11.3 A Correlation Approach to Technical Replication
12 Separate Channel Analysis of Two-Color Data
13 Statistics for Differential Expression
13.1 Summary Top-Tables
13.2 Fitted Model Objects
13.3 Multiple Testing Across Contrasts
14 Array Quality Weights
14.1 Introduction
14.2 Example 1
14.3 Example 2
14.4 When to Use Array Weights
15 RNA-Seq Data
15.1 Introduction
15.2 Making a count matrix
15.3 Normalization and filtering
15.4 Differential expression: limma-trend
15.5 Differential expression: voom
15.6 Voom with sample quality weights
15.7 Differential splicing
16 Two-Color Case Studies
16.1 Swirl Zebrafish: A Single-Group Experiment
16.2 Apoa1 Knockout Mice: A Two-Group Common-Reference Experiment
16.3 Weaver Mutant Mice: A Composite 2x2 Factorial Experiment
16.3.1 Background
16.3.2 Sample Preparation and Hybridizations
16.3.3 Data input
16.3.4 Annotation
16.3.5 Quality Assessment and Normalization
16.3.6 Setting Up the Linear Model
16.3.7 Probe Filtering and Array Quality Weights
16.3.8 Differential expression
16.4 Bob1 Mutant Mice: Arrays With Duplicate Spots
17 Single-Channel Case Studies
17.1 Lrp Mutant E. Coli Strain with Affymetrix Arrays
17.1.1 Background
17.1.2 Downloading the data
17.1.3 Background correction and normalization
17.1.4 Gene annotation
17.1.5 Differential expression
17.2 Effect of Estrogen on Breast Cancer Tumor Cells: A 2x2 Factorial Experiment with Affymetrix Arrays
17.3 Comparing Mammary Progenitor Cell Populations with Illumina BeadChips
17.3.1 Introduction
17.3.2 The target RNA samples
17.3.3 The expression profiles
17.3.4 How many probes are truly expressed?
17.3.5 Normalization and filtering
17.3.6 Within-patient correlations
17.3.7 Differential expression between cell types
17.3.8 Signature genes for luminal progenitor cells
17.4 Time Course Effects of Corn Oil on Rat Thymus with Agilent 4x44K Arrays
17.4.1 Introduction
17.4.2 Data availability
17.4.3 Reading the data
17.4.4 Gene annotation
17.4.5 Background correction and normalize
17.4.6 Gene filtering
17.4.7 Differential expression
17.4.8 Gene ontology analysis
18 RNA-Seq Case Studies
18.1 Profiles of Yoruba HapMap Individuals
18.1.1 Background
18.1.2 Data availability
18.1.3 Yoruba Individuals and FASTQ Files
18.1.4 Mapping reads to the reference genome
18.1.5 Annotation
18.1.6 DGEList object
18.1.7 Filtering
18.1.8 Scale normalization
18.1.9 Linear modeling
18.1.10 Gene set testing
18.1.11 Session information
18.1.12 Acknowledgements
18.2 Differential Splicing after Pasilla Knockdown
18.2.1 Background
18.2.2 GEO samples and SRA Files
18.2.3 Mapping reads to the reference genome
18.2.4 Read counts for exons
18.2.5 Assemble DGEList and sum counts for technical replicates
18.2.6 Gene annotation
18.2.7 Filtering
18.2.8 Scale normalization
18.2.9 Linear modelling
18.2.10 Alternate splicing
18.2.11 Session information
18.2.12 Acknowledgements

Frequently Asked Questions (10)

Q1. What is the cost of including a dye-effect in the linear model?

Including the dye-effect in the model in this way uses up one degree of freedom which might otherwise be used to estimate the residual variability, but it is valuable if many genes show non-negligible dye-effects.

Q2. How many coefficients do you need to model the systematic part of your data?

With Affymetrix or single-channel data, or with two-color with a common reference, you will need as many coefficients as you have distinct RNA sources, no more and no less.
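
As a minimal sketch of this rule, the base R code below builds a design matrix for a hypothetical single-channel experiment with three RNA sources. The sample labels (WT, Mu1, Mu2) and the number of arrays are illustrative assumptions and do not come from the guide.

```r
# Hypothetical targets: six single-channel arrays from three RNA sources.
rna <- factor(c("WT", "WT", "Mu1", "Mu1", "Mu2", "Mu2"),
              levels = c("WT", "Mu1", "Mu2"))

# One coefficient (column) per distinct RNA source, no intercept.
design <- model.matrix(~ 0 + rna)
colnames(design) <- levels(rna)

n_coef    <- ncol(design)   # number of coefficients in the model
n_sources <- nlevels(rna)   # number of distinct RNA sources
```

The design matrix has exactly one column per RNA source and is of full column rank, so every coefficient is estimable. A matrix of this kind is what would be passed as the design argument to limma's lmFit function.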

Q3. How much asymmetric differential expression can loess normalization tolerate?

Oshlack et al. [23] show that loess normalization can tolerate up to about 30% asymmetric differential expression while still giving good results.

Q4. How can spatial heterogeneity on an array be highlighted?

Spatial heterogeneity on individual arrays can be highlighted by examining imageplots of the background intensities, for example

> imageplot(log2(RG$Gb[,1]),RG$printer)

plots the green background for the first array.

Q5. Which normalization methods does marray provide that limma does not?

The marray package provides some normalization methods which are not in limma, including 2-D loess normalization and print-tip-scale normalization.

Q6. When can probe-specific dye effects be estimated?

If there are at least two arrays with each dye-orientation, then it is possible to estimate and adjust for any probe-specific dye effects.
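
The following base R sketch illustrates the idea for a hypothetical dye-swap experiment with two arrays in each dye orientation. The coefficient names, the array layout and the log-ratio values are made up for illustration. The design has one column for the dye effect and one for the comparison of interest, and a least-squares fit separates the two.

```r
# Hypothetical dye-swap experiment: four two-color arrays comparing Mu with WT.
# Arrays 1 and 3 have Mu in the red channel; arrays 2 and 4 are dye-swapped.
design <- cbind(DyeEffect = 1, MUvsWT = c(1, -1, 1, -1))

# Noise-free log-ratios for one probe (made-up values):
# dye effect 0.3 and true Mu-versus-WT log-fold-change 1.0.
M <- as.vector(design %*% c(0.3, 1.0))

# Ordinary least squares for this single probe.
fit <- stats::lm.fit(design, M)
est <- unname(fit$coefficients)   # c(dye effect, Mu vs WT)
resid_df <- fit$df.residual       # arrays minus coefficients
```

With four arrays and two coefficients, two residual degrees of freedom remain, one fewer than if the dye-effect column were left out of the model.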

Q7. How should arrays be normalized when print-tip loess normalization is not suitable?

In these cases one should either use global "loess" normalization or else use robust spline normalization

> MA <- normalizeWithinArrays(RG, method="robustspline")

which is an empirical Bayes compromise between print-tip and global loess normalization, with 5-parameter regression splines used in place of the loess curves.

Q8. What functions are used to process Illumina BeadChip data?

If you use the read.ilmn, nec or neqc functions to process Illumina BeadChip data, please cite: Shi, W, Oshlack, A, and Smyth, GK (2010).

Q9. How are two-color microarray experiments with a common reference analysed?

If the same channel has been used for the common reference throughout the experiment, then the expression log-ratios may be analysed exactly as if they were log-expression values from a single channel experiment.
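
A minimal base R sketch of this equivalence, using invented intensities for a single probe with the reference always in the green channel: the log-ratios are computed once and then fitted with an ordinary two-group design, exactly as single-channel log-expression values would be. The group labels and intensity values are assumptions for illustration only.

```r
# Hypothetical intensities for one probe on four arrays.
R <- c(1200, 1100, 2500, 2300)   # red channel: samples of interest
G <- c(1000, 1000, 1000, 1000)   # green channel: common reference
M <- log2(R / G)                 # log-ratios versus the reference

# Treat M like single-channel log-expression: one coefficient per group.
group  <- factor(c("A", "A", "B", "B"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Least-squares group means and the B-versus-A log-fold-change.
coefs  <- unname(stats::lm.fit(design, M)$coefficients)
B_vs_A <- coefs[2] - coefs[1]
```

Because the reference appears in every log-ratio, it cancels in the difference between the two group coefficients, which is why the comparison behaves like a single-channel contrast.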

Q10. What program is used to obtain the red and green foreground and background intensities for each spot?

The TIFF images have then been processed using an image analysis program such as ArrayVision, ImaGene, GenePix, QuantArray or SPOT to acquire the red and green foreground and background intensities for each spot.
