Journal Article

Analyzing time series gene expression data

01 Nov 2004-Bioinformatics (Oxford University Press)-Vol. 20, Iss: 16, pp 2493-2503
TL;DR: This review is intended to serve as both a point of reference for experimental biologists looking for practical solutions for analyzing their data and a starting point for computer scientists interested in working on the computational problems related to time series expression analysis.
Abstract: Motivation: Time series expression experiments are an increasingly popular method for studying a wide range of biological systems. However, when analyzing these experiments researchers face many new computational challenges. Algorithms that are specifically designed for time series experiments are required so that we can take advantage of their unique features (such as the ability to infer causality from the temporal response pattern) and address the unique problems they raise (e.g. handling the different non-uniform sampling rates). Results: We present a comprehensive review of the current research in time series expression data analysis. We divide the computational challenges into four analysis levels: experimental design, data analysis, pattern recognition and networks. For each of these levels, we discuss computational and biological problems at that level and point out some of the methods that have been proposed to deal with these issues. Many open problems at all of these levels are discussed. This review is intended to serve as both a point of reference for experimental biologists looking for practical solutions for analyzing their data and a starting point for computer scientists interested in working on the computational problems related to time series expression analysis.
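
The abstract names non-uniform sampling rates as one of the problems specific to time series data. As a rough illustration only, the sketch below resamples two series measured at different rates (every 7 and every 15 min, echoing the sampling intervals quoted further down this page) onto a shared time grid. Linear interpolation is an assumption chosen for brevity, not the method the review recommends, and the function names and expression values are hypothetical.

```python
from bisect import bisect_right


def interpolate(times, values, t):
    """Linearly interpolate a series sampled at strictly increasing `times`."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled range")
    # Index of the last sample at or before t, capped so that i + 1 is valid.
    i = min(bisect_right(times, t) - 1, len(times) - 2)
    t0, t1 = times[i], times[i + 1]
    w = (t - t0) / (t1 - t0)
    return (1 - w) * values[i] + w * values[i + 1]


def resample(times, values, grid):
    """Evaluate the interpolated series at every time point in `grid`."""
    return [interpolate(times, values, t) for t in grid]


# Hypothetical expression values for one gene in two experiments.
times_a = [0, 7, 14, 21, 28]   # sampled every 7 min
expr_a = [0.0, 0.7, 1.4, 0.7, 0.0]
times_b = [0, 15, 30]          # sampled every 15 min
expr_b = [0.0, 1.5, 0.0]

# Shared grid on which the two profiles become directly comparable.
grid = [0, 5, 10, 15, 20, 25]
on_grid_a = resample(times_a, expr_a, grid)
on_grid_b = resample(times_b, expr_b, grid)
```

Once both profiles live on the same grid, point-wise comparisons (differences, correlations) are well defined. With only a handful of noisy time points, a smoother interpolant may be preferable to straight line segments.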


Citations
01 Mar 2001
TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Abstract: We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.

1,815 citations
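
To make the "eigengene" idea in the abstract above concrete, the following sketch approximates only the leading eigengene of a small genes × arrays matrix. It uses power iteration on AᵀA as a dependency-free stand-in for a full singular value decomposition, so it is not the procedure of the cited paper. The function name and the toy matrix are hypothetical.

```python
import math


def leading_eigengene(matrix, iterations=200):
    """Approximate the dominant right singular vector of a genes x arrays matrix.

    Assumption: power iteration on A^T A replaces a full SVD here.
    """
    n_arrays = len(matrix[0])
    v = [1.0 / math.sqrt(n_arrays)] * n_arrays  # deterministic start vector
    for _ in range(iterations):
        # u = A v: one coefficient per gene.
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        # w = A^T u: one value per array.
        w = [sum(row[j] * u[i] for i, row in enumerate(matrix))
             for j in range(n_arrays)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data")
        v = [x / norm for x in w]
    return v


# Toy data: three genes that all follow the same pattern across three arrays,
# with different amplitudes and signs (hypothetical values).
expression = [
    [3.0, 4.0, 0.0],
    [6.0, 8.0, 0.0],
    [-3.0, -4.0, 0.0],
]

eigengene = leading_eigengene(expression)
# Contribution of the leading eigengene to each gene's profile.
weights = [sum(a * b for a, b in zip(row, eigengene)) for row in expression]
```

Here every gene is a multiple of a single pattern, so one eigengene captures the whole matrix. On real data the remaining eigengenes would need a full SVD routine from a numerical library.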

Journal Article
TL;DR: In this paper, a two-regression step approach is proposed to identify genes that show different gene expression profiles across analytical groups in time-course experiments, where the experimental groups are identified by dummy variables and a variable selection strategy is applied to study differences between groups and to find statistically significant different profiles.
Abstract: Motivation: Multi-series time-course microarray experiments are useful approaches for exploring biological processes. In this type of experiment, the researcher is frequently interested in studying gene expression changes along time and in evaluating trend differences between the various experimental groups. The large amount of data, multiplicity of experimental conditions and the dynamic nature of the experiments pose great challenges to data analysis. Results: In this work, we propose a statistical procedure to identify genes that show different gene expression profiles across analytical groups in time-course experiments. The method is a two-regression step approach where the experimental groups are identified by dummy variables. The procedure first adjusts a global regression model with all the defined variables to identify differentially expressed genes, and second, a variable selection strategy is applied to study differences between groups and to find statistically significant different profiles. The methodology is illustrated on both a real and a simulated microarray dataset. Availability: The method has been implemented in the statistical language R and is freely available from the Bioconductor contributed packages repository and from http://www.ivia.es/centrogenomica/bioinformatics.htm

365 citations

Proceedings Article
14 Jun 2005
TL;DR: A novel algorithm, TRICLUSTER, for mining coherent clusters in three-dimensional (3D) gene expression datasets, which can mine arbitrarily positioned and overlapping clusters, and depending on different parameter values, it can mine different types of clusters.
Abstract: In this paper we introduce a novel algorithm called TRICLUSTER, for mining coherent clusters in three-dimensional (3D) gene expression datasets. TRICLUSTER can mine arbitrarily positioned and overlapping clusters, and depending on different parameter values, it can mine different types of clusters, including those with constant or similar values along each dimension, as well as scaling and shifting expression patterns. TRICLUSTER relies on a graph-based approach to mine all valid clusters. For each time slice, i.e., a gene×sample matrix, it constructs the range multigraph, a compact representation of all similar value ranges between any two sample columns. It then searches for constrained maximal cliques in this multigraph to yield the set of biclusters for this time slice. Then TRICLUSTER constructs another graph using the biclusters (as vertices) from each time slice; mining cliques from this graph yields the final set of triclusters. Optionally, TRICLUSTER merges/deletes some clusters having large overlaps. We present a useful set of metrics to evaluate the clustering quality, and we show that TRICLUSTER can find significant triclusters in the real microarray datasets.

235 citations


Cites background from "Analyzing time series gene expressi..."

  • ...Besides biclustering along the gene-sample dimensions, there has been a lot of interest in mining gene expression patterns across time [4]....

    [...]

  • ...[4] Z. Bar-Joseph....

    [...]

  • ...For a more comprehensive look at time-series gene expression analysis, see the recent paper by Bar-Joseph [4]....

    [...]

Journal Article
TL;DR: The extended LSA analysis technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond that of ordinary correlation analysis.
Abstract: Background: The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA, as originally developed, does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval.

207 citations
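
The notion of a time-delayed association mentioned in the abstract above can be illustrated with a lagged Pearson correlation. This is a deliberately simpler stand-in and not the LSA algorithm itself: it scores whole shifted series rather than local subsequences and handles no replicates. The function names and toy series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def best_lag(x, y, max_lag):
    """Return (lag, r) with the largest |r|, pairing x[t] with y[t + lag]."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        if len(xs) < 3:  # too few overlapping points to be meaningful
            continue
        r = pearson(xs, ys)
        if best is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Hypothetical series: y repeats x two time steps later.
x = [0, 1, 3, 2, 0, 1, 4, 1]
y = [5, 2, 0, 1, 3, 2, 0, 1]

lag, r = best_lag(x, y, max_lag=3)
```

A positive `lag` means the second series trails the first. An ordinary correlation at lag 0 would miss this relationship, which is the motivation the abstract gives for delay-aware methods.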


Cites methods from "Analyzing time series gene expressi..."

  • ...Although LSA had its roots grounded in microbial community analysis, the technique can be readily applied to other biological time series data, such as replicated gene expression time series data from microarray and RNASeq experiments [27-29]....

    [...]

References
Journal Article
TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Abstract: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.

16,371 citations
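
Arranging genes by similarity of expression pattern, as the abstract above describes, can be sketched with a small agglomerative clustering routine. The choices below (one minus Pearson correlation as the distance, average linkage as the merge rule) are assumptions made for illustration; the abstract only speaks of "standard statistical algorithms". Gene names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge gene profiles bottom-up until `n_clusters` (>= 1) groups remain."""
    names = list(profiles)
    # Pairwise distance: 0 for identical patterns, 2 for opposite patterns.
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average distance over all cross-cluster gene pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical profiles: two rising genes and two falling genes.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 1.0],
}

clusters = average_linkage(profiles, n_clusters=2)
groups = sorted(sorted(c) for c in clusters)
```

Because the distance is correlation based, genes with the same shape but different amplitudes (such as `g1` and `g2`) group together. This brute-force version recomputes linkages at every step and is only meant for toy inputs.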

Journal Article
15 Oct 1999-Science
TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Abstract: Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

12,530 citations


"Analyzing time series gene expressi..." refers background in this paper

  • ...Many algorithms have been introduced for identifying genes differentially expressed between two experiments in the static expression case (Golub et al., 1999; Dudoit et al., 2004)....

    [...]

Journal Article
TL;DR: A comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle is created, and it is found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins.
Abstract: We sought to create a comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle. To this end, we used DNA microarrays and samples from yeast cultures sync...

5,176 citations


"Analyzing time series gene expressi..." refers background or methods in this paper

  • ...Some of the time series experiments are performed to detect periodic genes (Spellman et al., 1998)....

    [...]

  • ...…clustering was used to determine function for unknown genes (Eisen et al., 1998), to look at expression programs for different systems in the cell (Spellman et al., 1998) and for identifying sets of genes that are specifically involved in a certain type of cancer or other diseases (Alon et al.,…...

    [...]

  • ...the three alpha cell cycle experiments (Spellman et al., 1998; Zhu et al., 2000; Pramila et al., 2002) were sampled every 7, 15 and 10 min, respectively]....

    [...]

  • ...WT alpha (Spellman et al., 1998) Alpha mating factor 0–119 64 Every 7 min...

    [...]

Journal Article
TL;DR: Analysis of genomic expression patterns in the yeast Saccharomyces cerevisiae implicated the transcription factors Yap1p, as well as Msn2p and Msn4p, in mediating specific features of the transcriptional response, while the identification of novel sequence elements provided clues to novel regulators.
Abstract: We explored genomic expression patterns in the yeast Saccharomyces cerevisiae responding to diverse environmental transitions. DNA microarrays were used to measure changes in transcript levels over time for almost every yeast gene, as cells responded to temperature shocks, hydrogen peroxide, the superoxide-generating drug menadione, the sulfhydryl-oxidizing agent diamide, the disulfide-reducing agent dithiothreitol, hyper- and hypo-osmotic shock, amino acid starvation, nitrogen source depletion, and progression into stationary phase. A large set of genes (approximately 900) showed a similar drastic response to almost all of these environmental changes. Additional features of the genomic responses were specialized for specific conditions. Promoter analysis and subsequent characterization of the responses of mutant strains implicated the transcription factors Yap1p, as well as Msn2p and Msn4p, in mediating specific features of the transcriptional response, while the identification of novel sequence elements provided clues to novel regulators. Physiological themes in the genomic responses to specific environmental stresses provided insights into the effects of those stresses on the cell.

4,836 citations


"Analyzing time series gene expressi..." refers background or methods in this paper

  • ...These include cell cycle double knockouts (Zhu et al., 2000; Pramila et al., 2002) and knockouts under stress conditions (Gasch et al., 2000)....

    [...]

  • ...These methods include cluster analysis (Zhu et al., 2000; Gasch et al., 2000), in which clusters of genes are compared across the two experiments and generalized singular value decomposition (SVD) [presented by Alter et al. (2003)], which are also used to detect differences between sets of genes,…...

    [...]

  • ..., 2002) and stress (Gasch et al., 2000)], they react by activating a new expression program....

    [...]

Journal Article
TL;DR: In this paper, a two-way clustering algorithm was applied to both the genes and the tissues, revealing broad coherent patterns that suggest a high degree of organization underlying gene expression in these tissues.
Abstract: Oligonucleotide arrays can provide a broad picture of the state of the cell, by monitoring the expression level of thousands of genes at the same time. It is of interest to develop techniques for extracting useful information from the resulting data sets. Here we report the application of a two-way clustering method for analyzing a data set consisting of the expression patterns of different cell types. Gene expression in 40 tumor and 22 normal colon tissue samples was analyzed with an Affymetrix oligonucleotide array complementary to more than 6,500 human genes. An efficient two-way clustering algorithm was applied to both the genes and the tissues, revealing broad coherent patterns that suggest a high degree of organization underlying gene expression in these tissues. Coregulated families of genes clustered together, as demonstrated for the ribosomal proteins. Clustering also separated cancerous from noncancerous tissue and cell lines from in vivo tissues on the basis of subtle distributed patterns of genes even when expression of individual genes varied only slightly between the tissues. Two-way clustering thus may be of use both in classifying genes into functional groups and in classifying tissues based on gene expression.

4,131 citations