Exploration and Analysis of DNA Microarray and Protein Array Data

Open Access Book

TLDR

This book presents a meta-modelling framework for estimating the level of uncertainty in the results of cDNA microarray experiments, as well as some of the techniques used to assess the quality of these experiments.

Abstract:

Preface
1 A Brief Introduction
  1.1 A Note on Exploratory Data Analysis
  1.2 Computing Considerations and Software
  1.3 A Brief Outline of the Book
2 Genomics Basics
  2.1 Genes
  2.2 DNA
  2.3 Gene Expression
  2.4 Hybridization Assays and Other Laboratory Techniques
  2.5 The Human Genome
  2.6 Genome Variations and Their Consequences
  2.7 Genomics
  2.8 The Role of Genomics in Pharmaceutical Research
  2.9 Proteins
  2.10 Bioinformatics
  Supplementary Reading
  Exercises
3 Microarrays
  3.1 Types of Microarray Experiments
    3.1.1 Experiment Type 1: Tissue-Specific Gene Expression
    3.1.2 Experiment Type 2: Developmental Genetics
    3.1.3 Experiment Type 3: Genetic Diseases
    3.1.4 Experiment Type 4: Complex Diseases
    3.1.5 Experiment Type 5: Pharmacological Agents
    3.1.6 Experiment Type 6: Plant Breeding
    3.1.7 Experiment Type 7: Environmental Monitoring
  3.2 A Very Simple Hypothetical Microarray Experiment
  3.3 A Typical Microarray Experiment
    3.3.1 Microarray Preparation
    3.3.2 Sample Preparation
    3.3.3 The Hybridization Step
    3.3.4 Scanning the Microarray
    3.3.5 Interpreting the Scanned Image
  3.4 Multichannel cDNA Microarrays
  3.5 Oligonucleotide Arrays
  3.6 Bead-Based Arrays
  3.7 Confirmation of Microarray Results
  Supplementary Reading and Electronic References
  Exercises
4 Processing the Scanned Image
  4.1 Converting the Scanned Image to the Spotted Image
    4.1.1 Gridding
    4.1.2 Segmentation
    4.1.3 Quantification
  4.2 Quality Assessment
    4.2.1 Visualizing the Spotted Image
    4.2.2 Numerical Evaluation of Array Quality
    4.2.3 Spatial Problems
    4.2.4 Spatial Randomness
    4.2.5 Quality Control of Arrays
    4.2.6 Assessment of Spot Quality
  4.3 Adjusting for Background
    4.3.1 Estimating the Background
    4.3.2 Adjusting for the Estimated Background
  4.4 Expression Level Calculation for Two-Channel cDNA Microarrays
  4.5 Expression Level Calculation for Oligonucleotide Arrays
    4.5.1 The Average Difference
    4.5.2 A Weighted Average Difference
    4.5.3 Perfect Matches Only
    4.5.4 Background Adjustment Approach
    4.5.5 Model-Based Approach
    4.5.6 Absent-Present Calls
  Supplementary Reading
  Exercises
5 Preprocessing Microarray Data
  5.1 Logarithmic Transformation
  5.2 Variance Stabilizing Transformations
  5.3 Sources of Bias
  5.4 Normalization
  5.5 Intensity-Dependent Normalization
    5.5.1 Smooth Function Normalization
    5.5.2 Quantile Normalization
    5.5.3 Normalization of Oligonucleotide Arrays
    5.5.4 Normalization of Two-Channel Arrays
    5.5.5 Spatial Normalization
    5.5.6 Stagewise Normalization
  5.6 Judging the Success of a Normalization
  5.7 Outlier Identification
    5.7.1 Nonresistant Rules for Outlier Identification
    5.7.2 Resistant Rules for Outlier Identification
  5.8 Assessing Replicate Array Quality
  Exercises
6 Summarization
  6.1 Replication
  6.2 Technical Replicates
  6.3 Biological Replicates
  6.4 Experiments with Both Technical and Biological Replicates
  6.5 Multiple Oligonucleotide Arrays
  6.6 Estimating Fold Change in Two-Channel Experiments
  6.7 Bayes Estimation of Fold Change
  Exercises
7 Two-Group Comparative Experiments
  7.1 Basics of Statistical Hypothesis Testing
  7.2 Fold Changes
  7.3 The Two-Sample t Test
  7.4 Diagnostic Checks
  7.5 Robust t Tests
  7.6 Randomization Tests
  7.7 The Mann-Whitney-Wilcoxon Rank Sum Test
  7.8 Multiplicity
    7.8.1 A Pragmatic Approach to the Issue of Multiplicity
    7.8.2 Simple Multiplicity Adjustments
    7.8.3 Sequential Multiplicity Adjustments
  7.9 The False Discovery Rate
    7.9.1 The Positive False Discovery Rate
  7.10 Small Variance-Adjusted t Tests and SAM
    7.10.1 Modifying the t Statistic
    7.10.2 Assessing Significance with the SAM t Statistic
    7.10.3 Strategies for Using SAM
    7.10.4 An Empirical Bayes Framework
    7.10.5 Understanding the SAM Adjustment
  7.11 Conditional t
  7.12 Borrowing Strength across Genes
    7.12.1 Simple Methods
    7.12.2 A Bayesian Model
  7.13 Two-Channel Experiments
    7.13.1 The Paired Sample t Test and SAM
    7.13.2 Borrowing Strength via Hierarchical Modeling
  Supplementary Reading
  Exercises
8 Model-Based Inference and Experimental Design Considerations
  8.1 The F Test
  8.2 The Basic Linear Model
  8.3 Fitting the Model in Two Stages
  8.4 Multichannel Experiments
  8.5 Experimental Design Considerations
    8.5.1 Comparing Two Varieties with Two-Channel Microarrays
    8.5.2 Comparing Multiple Varieties with Two-Channel Microarrays
    8.5.3 Single-Channel Microarray Experiments
  8.6 Miscellaneous Issues
  Supplementary Reading
  Exercises
9 Pattern Discovery
  9.1 Initial Considerations
  9.2 Cluster Analysis
    9.2.1 Dissimilarity Measures and Similarity Measures
    9.2.2 Guilt by Association
    9.2.3 Hierarchical Clustering
    9.2.4 Partitioning Methods
    9.2.5 Model-Based Clustering
    9.2.6 Chinese Restaurant Clustering
    9.2.7 Discussion
  9.3 Seeking Patterns Visually
    9.3.1 Principal Components Analysis
    9.3.2 Factor Analysis
    9.3.3 Biplots
    9.3.4 Spectral Map Analysis
    9.3.5 Multidimensional Scaling
    9.3.6 Projection Pursuit
    9.3.7 Data Visualization with the Grand Tour and Projection Pursuit
  9.4 Two-Way Clustering
    9.4.1 Block Clustering
    9.4.2 Gene Shaving
    9.4.3 The Plaid Model
  Software Notes
  Supplementary Reading
  Exercises
10 Class Prediction
  10.1 Initial Considerations
    10.1.1 Misclassification Rates
    10.1.2 Reducing the Number of Classifiers
  10.2 Linear Discriminant Analysis
  10.3 Extensions of Fisher's LDA
  10.4 Nearest Neighbors
  10.5 Recursive Partitioning
    10.5.1 Classification Trees
    10.5.2 Activity Region Finding
  10.6 Neural Networks
  10.7 Support Vector Machines
  10.8 Integration of Genomic Information
    10.8.1 Integration of Gene Expression Data and Molecular Structure Data
    10.8.2 Pathway Inference
  Software Notes
  Supplementary Reading
  Exercises
11 Protein Arrays
  11.1 Introduction
  11.2 Protein Array Experiments
  11.3 Special Issues with Protein Arrays
  11.4 Analysis
  11.5 Using Antibody Antigen Arrays to Measure Protein Concentrations
  Exercises
References
Author Index
Subject Index

Citations

Journal Article

Variable selection with error control: another look at stability selection

Rajen D. Shah, +1 more

- 01 Jan 2013 -

Journal of The Royal Statistical Society...

TL;DR: In this article, a variant of stability selection, called complementary pairs stability selection (CPSS), is introduced, and bounds are derived on the expected number of variables included by CPSS that have low selection probability under the original procedure.

Journal Article

A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion

Alessio Farcomeni

- 01 Aug 2008 -

Statistical Methods in Medical Research

TL;DR: This paper reviews research in modern multiple hypothesis testing, with particular attention to the false discovery proportion, loosely defined as the number of false rejections divided by the number of rejections.

Journal Article

Enriched random forests

Dhammika Amaratunga, +2 more

- 15 Sep 2008 -

Bioinformatics

TL;DR: This work proposes a novel, yet simple, adjustment that has demonstrably superior performance: choose the eligible subsets at each node by weighted random sampling instead of simple random sampling, with the weights tilted in favor of the informative features.

Journal Article

Some Results on the Control of the False Discovery Rate under Dependence.

Alessio Farcomeni

- 01 Jun 2007 -

Scandinavian Journal of Statistics

TL;DR: This article shows that, when the number of tests is large, a certain degree of dependence among the test statistics is allowed with no need for any correction. It also presents a way to conservatively estimate the proportion of false nulls, both under dependence and independence, and discusses the advantages of using such estimators when controlling the false discovery rate.

Journal Article

Evaluation of Normalization Methods to Pave the Way Towards Large-Scale LC-MS-Based Metabolomics Profiling Experiments

Bedilu Alamirie Ejigu, +7 more

- 03 Sep 2013 -

Omics A Journal of Integrative Biology

TL;DR: Data-driven normalization methods are the best option for normalizing datasets from untargeted LC-MS experiments and increase the statistical power of the analysis, hence paving the way to increase the scale of LC-MS metabolomics experiments.
