Journal ArticleDOI

Microarray data normalization and transformation

01 Dec 2002-Nature Genetics (Nature Publishing Group)-Vol. 32, Iss: 4, pp 496-501
TL;DR: This review focuses on the much more mundane but indispensable tasks of 'normalizing' data from individual hybridizations to make meaningful comparisons of expression levels, and of 'transforming' them to select genes for further analysis and data mining.
Abstract: Underlying every microarray experiment is an experimental question that one would like to address. Finding a useful and satisfactory answer relies on careful experimental design and the use of a variety of data-mining tools to explore the relationships between genes or reveal patterns of expression. While other sections of this issue deal with these lofty issues, this review focuses on the much more mundane but indispensable tasks of 'normalizing' data from individual hybridizations to make meaningful comparisons of expression levels, and of 'transforming' them to select genes for further analysis and data mining.


Citations
Journal ArticleDOI
TL;DR: The authors' data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase.
Abstract: The major cell classes of the brain differ in their developmental processes, metabolism, signaling, and function. To better understand the functions and interactions of the cell types that comprise these classes, we acutely purified representative populations of neurons, astrocytes, oligodendrocyte precursor cells, newly formed oligodendrocytes, myelinating oligodendrocytes, microglia, endothelial cells, and pericytes from mouse cerebral cortex. We generated a transcriptome database for these eight cell types by RNA sequencing and used a sensitive algorithm to detect alternative splicing events in each cell type. Bioinformatic analyses identified thousands of new cell type-enriched genes and splicing isoforms that will provide novel markers for cell identification, tools for genetic manipulation, and insights into the biology of the brain. For example, our data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase. This dataset will provide a powerful new resource for understanding the development and function of the brain. To ensure the widespread distribution of these datasets, we have created a user-friendly website (http://web.stanford.edu/group/barres_lab/brain_rnaseq.html) that provides a platform for analyzing and comparing transcription and alternative splicing profiles for various cell classes in the brain.

3,891 citations


Cites background or methods from "Microarray data normalization and t..."

  • ...Any FPKM that is <0.1 were set to 0.1 for fold enrichment calculations to avoid ratio inflation (Quackenbush, 2002)....

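The flooring step quoted in the snippet above can be illustrated with a minimal sketch. It assumes the floor is 0.1, as in the quoted sentence; the function names `floor_fpkm` and `fold_enrichment` are hypothetical and are not taken from either paper.

```python
def floor_fpkm(values, floor=0.1):
    """Raise every FPKM value below `floor` up to `floor`."""
    return [max(v, floor) for v in values]


def fold_enrichment(target, background, floor=0.1):
    """Ratio of two FPKM values after flooring both.

    Flooring keeps a near-zero denominator from inflating the ratio:
    without it, 5.0 / 0.001 would report a 5000-fold enrichment.
    """
    return max(target, floor) / max(background, floor)
```

With the floor in place, a gene at 5.0 FPKM compared against a background of 0.001 FPKM is reported as roughly 50-fold enriched rather than 5000-fold, and two undetected genes compare as 1-fold.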

Journal ArticleDOI
TL;DR: The definition, characteristics, and classification of big data along with some discussions on cloud computing are introduced, and research challenges are investigated, with focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.

2,141 citations


Cites methods from "Microarray data normalization and t..."

  • ...Normalization Normalization is the method of structuring database schema to minimize redundancy [26]....


Journal ArticleDOI
TL;DR: EASE is a customizable software application for rapid biological interpretation of gene lists that result from the analysis of microarray, proteomics, SAGE and other high-throughput genomic data and is robust to varying methods of normalization, intensity calculation and statistical selection of genes.
Abstract: EASE is a customizable software application for rapid biological interpretation of gene lists that result from the analysis of microarray, proteomics, SAGE and other high-throughput genomic data. The biological themes returned by EASE recapitulate manually determined themes in previously published gene lists and are robust to varying methods of normalization, intensity calculation and statistical selection of genes. EASE is a powerful tool for rapidly converting the results of functional genomics studies from 'genes' to 'themes'.

1,985 citations


Cites background from "Microarray data normalization and t..."

  • ...Much work has addressed the issues of data normalization and statistical selection of the genes that are significantly modulated or clustered on the basis of expression profiles [2]....


Book ChapterDOI
TL;DR: This chapter describes each component of the TM4 suite of open‐source tools for data management and reporting, image analysis, normalization and pipeline control, and data mining and visualization and includes a sample analysis walk‐through.
Abstract: Powerful specialized software is essential for managing, quantifying, and ultimately deriving scientific insight from results of a microarray experiment. We have developed a suite of software applications, known as TM4, to support such gene expression studies. The suite consists of open‐source tools for data management and reporting, image analysis, normalization and pipeline control, and data mining and visualization. An integrated MIAME‐compliant MySQL database is included. This chapter describes each component of the suite and includes a sample analysis walk‐through.

1,931 citations

Journal ArticleDOI
TL;DR: In just a few years, microarrays have gone from obscurity to being almost ubiquitous in biological research, and points of consensus are emerging about the general approaches that warrant use and elaboration.
Abstract: In just a few years, microarrays have gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to a weekly deluge of papers that describe purportedly novel algorithms for analysing changes in gene expression. Although the many procedures that are available might be bewildering to biologists who wish to apply them, statistical geneticists are recognizing commonalities among the different methods. Many are special cases of more general models, and points of consensus are emerging about the general approaches that warrant use and elaboration.

1,349 citations

References
Journal ArticleDOI
TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Abstract: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.

16,371 citations

Book
01 Jun 1969
TL;DR: Covers uncertainties in measurements, probability distributions, error analysis, Monte Carlo techniques, least-squares fits to a polynomial and to an arbitrary function, fitting composite peaks, and direct application of the maximum likelihood.
Abstract: Uncertainties in measurements; probability distributions; error analysis; estimates of means and errors; Monte Carlo techniques; dependent and independent variables; least-squares fit to a polynomial; least-squares fit to an arbitrary function; fitting composite peaks; direct application of the maximum likelihood. Appendices: numerical methods; matrices; graphs and tables; histograms and graphs; computer routines in Pascal.

12,737 citations

Journal ArticleDOI
TL;DR: Covers uncertainties in measurements, probability distributions, error analysis, Monte Carlo techniques, least-squares fits to a polynomial and to an arbitrary function, fitting composite peaks, and direct application of the maximum likelihood, with appendices that include computer routines in Pascal.
Abstract: Uncertainties in measurements; probability distributions; error analysis; estimates of means and errors; Monte Carlo techniques; dependent and independent variables; least-squares fit to a polynomial; least-squares fit to an arbitrary function; fitting composite peaks; direct application of the maximum likelihood. Appendices: numerical methods; matrices; graphs and tables; histograms and graphs; computer routines in Pascal.

10,546 citations

Journal ArticleDOI
William S. Cleveland1
TL;DR: Robust locally weighted regression is a method for smoothing a scatterplot in which the fitted value at x_k is the value of a polynomial fit to the data using weighted least squares, where the weight for (x_i, y_i) is large if x_i is close to x_k and small if it is not.
Abstract: The visual information on a scatterplot can be greatly enhanced, with little additional cost, by computing and plotting smoothed points. Robust locally weighted regression is a method for smoothing a scatterplot, (x_i, y_i), i = 1, …, n, in which the fitted value at x_k is the value of a polynomial fit to the data using weighted least squares, where the weight for (x_i, y_i) is large if x_i is close to x_k and small if it is not. A robust fitting procedure is used that guards against deviant points distorting the smoothed points. Visual, computational, and statistical issues of robust locally weighted regression are discussed. Several examples, including data on lead intoxication, are used to illustrate the methodology.
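The smoothing procedure summarized in this abstract can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the author's reference implementation: it assumes a degree-1 local polynomial, tricube neighbourhood weights, and bisquare robustness weights scaled by six times the median absolute residual. The names `lowess`, `frac`, and `robust_iters`, the default values, and the two fallback branches for degenerate windows are choices made here for illustration.

```python
from statistics import median


def tricube(u):
    """Neighbourhood weight: 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    """Robustness weight: near 1 for small residuals, 0 for |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0


def weighted_line_at(xs, ys, ws, x0):
    """Value at x0 of the weighted least-squares line through (xs, ys)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0.0:
        return my  # assumption: degenerate window falls back to the weighted mean
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def lowess(xs, ys, frac=0.5, robust_iters=2):
    """Smoothed values at each x: local linear fits, then robustness passes."""
    n = len(xs)
    k = min(n, max(2, round(frac * n)))     # neighbours used in each local fit
    delta = [1.0] * n                       # robustness weights, all 1 at first
    fitted = [0.0] * n
    tol = 1e-12 * max(abs(y) for y in ys)   # below this, residuals count as zero
    for it in range(robust_iters + 1):
        for i, x0 in enumerate(xs):
            # Bandwidth: distance from x0 to its k-th nearest point.
            h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
            base = [tricube((x - x0) / h) for x in xs]
            ws = [b * d for b, d in zip(base, delta)]
            if sum(ws) == 0.0:
                ws = base  # assumption: ignore robustness if it removes every point
            fitted[i] = weighted_line_at(xs, ys, ws, x0)
        if it == robust_iters:
            break
        resid = [y - f for y, f in zip(ys, fitted)]
        s = median(abs(r) for r in resid)
        if s <= tol:
            break  # fit is already essentially exact
        # Points with large residuals get little or no weight in the next pass.
        delta = [bisquare(r / (6.0 * s)) for r in resid]
    return fitted
```

Calling `lowess(xs, ys, frac=0.8, robust_iters=0)` gives the plain locally weighted fit, while a positive `robust_iters` lets a single deviant point lose its influence on the smoothed curve, which is the guarding behaviour the abstract describes.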

10,225 citations

Book
01 Jan 1977
TL;DR: Covers simple linear regression; multiple linear regression; regression diagnostics (detection of model violations); qualitative variables as predictors; transformation of variables; weighted least squares; the problem of correlated errors; analysis of collinear data; biased estimation of regression coefficients; variable selection procedures; and logistic regression.
Abstract: Simple Linear Regression; Multiple Linear Regression; Regression Diagnostics: Detection of Model Violations; Qualitative Variables as Predictors; Transformation of Variables; Weighted Least Squares; The Problem of Correlated Errors; Analysis of Collinear Data; Biased Estimation of Regression Coefficients; Variable Selection Procedures; Logistic Regression; Appendix; References; Index.

3,721 citations