Author

Roland Somogyi

Other affiliations: Ames Research Center
Bio: Roland Somogyi is an academic researcher from the National Institutes of Health. The author has contributed to research in topics: Boolean network & Gene expression. The author has an h-index of 14, co-authored 19 publications receiving 3489 citations. Previous affiliations of Roland Somogyi include Ames Research Center.

Papers
Proceedings Article
01 Jan 1998
TL;DR: This study investigates the possibility of completely inferring a complex regulatory network architecture from input/output patterns of its variables using binary models of genetic networks, and finds the problem to be tractable within the conditions tested so far.
Abstract: Given the imminent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10^15) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
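
The mutual-information criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' C program: the function names (`entropy`, `infer_inputs`, `toy_step`) and the three-element toy network are assumptions made up for illustration. The sketch searches input sets of increasing size and accepts the first set whose mutual information with an element's next state equals that next state's entropy, meaning the inputs fully determine the output.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def infer_inputs(inputs, outputs, target, max_k):
    """Return the smallest input set X with M(target'; X) = H(target'), or None."""
    out_col = [row[target] for row in outputs]
    h_out = entropy(out_col)
    if h_out == 0:
        return ()  # a constant output needs no inputs
    n = len(inputs[0])
    for k in range(1, max_k + 1):
        for subset in combinations(range(n), k):
            in_col = [tuple(row[j] for j in subset) for row in inputs]
            h_joint = entropy(list(zip(in_col, out_col)))
            mutual = entropy(in_col) + h_out - h_joint
            if abs(mutual - h_out) < 1e-9:  # inputs determine the output
                return subset
    return None


def toy_step(state):
    """Hypothetical synchronous update rules for a 3-element toy network."""
    x0, x1, x2 = state
    return (x1, x0 | x2, x0 & x1)


# Complete state transition table: every input state and its successor.
states = list(product((0, 1), repeat=3))
next_states = [toy_step(s) for s in states]

wiring = [infer_inputs(states, next_states, i, max_k=2) for i in range(3)]
print(wiring)  # [(1,), (0, 2), (0, 1)]
```

With a complete table the recovered wiring matches the toy rules exactly. With an incomplete table the same test can accept an input set that only appears sufficient on the sampled transitions, so this sketch says nothing about the sample sizes reported in the abstract.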

1,031 citations

Journal ArticleDOI
TL;DR: A high-resolution temporal map of fluctuations in mRNA expression of 112 genes during rat central nervous system development, focusing on the cervical spinal cord, found that genes belonging to distinct functional classes and gene families clearly map to particular expression profiles.
Abstract: We used reverse transcription–coupled PCR to produce a high-resolution temporal map of fluctuations in mRNA expression of 112 genes during rat central nervous system development, focusing on the cervical spinal cord. The data provide a temporal gene expression “fingerprint” of spinal cord development based on major families of inter- and intracellular signaling genes. By using distance matrices for the pair-wise comparison of these 112 temporal gene expression patterns as the basis for a cluster analysis, we found five basic “waves” of expression that characterize distinct phases of development. The results suggest functional relationships among the genes fluctuating in parallel. We found that genes belonging to distinct functional classes and gene families clearly map to particular expression profiles. The concepts and data analysis discussed herein may be useful in objectively identifying coherent patterns and sequences of events in the complex genetic signaling network of development. Functional genomics approaches such as this may have applications in the elucidation of complex developmental and degenerative disorders.
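
The idea of clustering temporal expression patterns from a pairwise distance matrix can be sketched as follows. This is only an illustration under stated assumptions: Euclidean distance and average-linkage agglomerative clustering are choices made here, not necessarily the method of the paper, and the gene names and expression values are invented.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def distance_matrix(profiles):
    """Pairwise Euclidean distances between named expression time series."""
    return {(a, b): dist(profiles[a], profiles[b])
            for a in profiles for b in profiles}


def cluster(profiles, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    d = distance_matrix(profiles)
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                avg = sum(d[p] for p in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Invented, normalized expression levels at four time points.
toy_profiles = {
    "geneA": [0.1, 0.4, 0.8, 1.0],  # rising
    "geneB": [0.0, 0.5, 0.9, 1.0],  # rising
    "geneC": [1.0, 0.7, 0.3, 0.0],  # falling
    "geneD": [0.9, 0.8, 0.2, 0.1],  # falling
}
groups = cluster(toy_profiles, n_clusters=2)
print(groups)  # [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Genes whose levels fluctuate in parallel end up in the same group, which is the sense in which such clusters correspond to the "waves" of expression mentioned in the abstract.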

669 citations

Proceedings ArticleDOI
01 Dec 1998
TL;DR: This work presents a linear modeling approach that allows one to infer interactions between all the genes included in the data set and can be used to generate interesting hypotheses to direct further experiments.
Abstract: Large-scale gene expression data sets are revolutionizing the field of functional genomics. However, few data analysis techniques fully exploit this entirely new class of data. We present a linear modeling approach that allows one to infer interactions between all the genes included in the data set. The resulting model can be used to generate interesting hypotheses to direct further experiments.

545 citations

Journal ArticleDOI
TL;DR: An introduction to Boolean networks and their relevance to present-day experimental research is provided, bringing us closer to an understanding of complex molecular physiological processes like brain development and intractable medical problems of immediate importance.
Abstract: Molecular genetics presents an increasingly complex picture of the genome and biological function. Evidence is mounting for distributed function, redundancy, and combinatorial coding in the regulation of genes. Satisfactory explanation will require the concept of a parallel processing signaling network. Here we provide an introduction to Boolean networks and their relevance to present-day experimental research. Boolean network models exhibit global complex behavior, self-organization, stability, redundancy and periodicity, properties that deeply characterize biological systems. While the life sciences must inevitably face the issue of complexity, we may well look to cybernetics for a modeling language such as Boolean networks which can manageably describe parallel processing biological systems and provide a framework for the growing accumulation of data. We finally discuss experimental strategies and database systems that will enable mapping of genetic networks. The synthesis of these approaches holds an immense potential for new discoveries on the intimate nature of genetic networks, bringing us closer to an understanding of complex molecular physiological processes like brain development, and intractable medical problems of immediate importance, such as neurodegenerative disorders, cancer, and a variety of genetic diseases.
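
A small synchronous Boolean network shows how properties such as stability and periodicity arise from simple rules. The sketch below is an assumption-laden illustration: the three update rules in `toy_step` and the helper `find_attractor` are hypothetical and are not taken from the paper.

```python
def toy_step(state):
    """Hypothetical rules: every element updates at once from the previous state."""
    x0, x1, x2 = state
    return (x1, x0 | x2, x0 & x1)


def find_attractor(state):
    """Follow a trajectory until a state repeats; return (transient, cycle)."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = toy_step(state)
    start = seen[state]  # index at which the repeating cycle begins
    return trajectory[:start], trajectory[start:]


transient, cycle = find_attractor((0, 0, 1))
print(transient)  # [(0, 0, 1)]
print(cycle)      # [(0, 1, 0), (1, 0, 0)]

fixed_transient, fixed_cycle = find_attractor((1, 1, 1))
print(fixed_cycle)  # [(1, 1, 1)]
```

Because the state space is finite and the update is deterministic, every trajectory must eventually enter a cycle: here a period-2 oscillation from one starting state and a fixed point from another.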

365 citations

Proceedings Article
01 Jan 1998
TL;DR: This work presents a strategy for the analysis of large-scale quantitative gene expression measurement data from time course experiments that takes advantage of cluster analysis and graphical visualization methods to reveal correlated patterns of gene expression from time series data.
Abstract: The discovery of any new gene requires an analysis of the expression context for that gene. Now that the cDNA and genomic sequencing projects are progressing at such a rapid rate, high throughput gene expression screening approaches are beginning to appear to take advantage of that data. We present a strategy for the analysis of large-scale quantitative gene expression measurement data from time course experiments. Our approach takes advantage of cluster analysis and graphical visualization methods to reveal correlated patterns of gene expression from time series data. The coherence of these patterns suggests an order that conforms to a notion of shared pathways and control processes that can be experimentally verified.

221 citations


Cited by
Journal ArticleDOI
TL;DR: In this paper, a two-way clustering algorithm was applied to both the genes and the tissues, revealing broad coherent patterns that suggest a high degree of organization underlying gene expression in these tissues.
Abstract: Oligonucleotide arrays can provide a broad picture of the state of the cell, by monitoring the expression level of thousands of genes at the same time. It is of interest to develop techniques for extracting useful information from the resulting data sets. Here we report the application of a two-way clustering method for analyzing a data set consisting of the expression patterns of different cell types. Gene expression in 40 tumor and 22 normal colon tissue samples was analyzed with an Affymetrix oligonucleotide array complementary to more than 6,500 human genes. An efficient two-way clustering algorithm was applied to both the genes and the tissues, revealing broad coherent patterns that suggest a high degree of organization underlying gene expression in these tissues. Coregulated families of genes clustered together, as demonstrated for the ribosomal proteins. Clustering also separated cancerous from noncancerous tissue and cell lines from in vivo tissues on the basis of subtle distributed patterns of genes even when expression of individual genes varied only slightly between the tissues. Two-way clustering thus may be of use both in classifying genes into functional groups and in classifying tissues based on gene expression.

4,131 citations

Journal ArticleDOI
TL;DR: This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
Abstract: Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled. We review a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology and discuss recent development...

4,123 citations

Journal ArticleDOI
TL;DR: A new framework for discovering interactions between genes based on multiple expression measurements is proposed and a method for recovering gene interactions from microarray data is described using tools for learning Bayesian networks.
Abstract: DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).

3,507 citations

Journal ArticleDOI
TL;DR: In this article, the application of self-organizing maps, a type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data, is described.
Abstract: Array technologies have made it straightforward to monitor simultaneously the expression pattern of thousands of genes. The challenge now is to interpret such massive data sets. The first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of self-organizing maps, a type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data. The method has been implemented in a publicly available computer package, GENECLUSTER, that performs the analytical calculations and provides easy data visualization. To illustrate the value of such analysis, the approach is applied to hematopoietic differentiation in four well studied models (HL-60, U937, Jurkat, and NB4 cells). Expression patterns of some 6,000 human genes were assayed, and an online database was created. GENECLUSTER was used to organize the genes into biologically relevant clusters that suggest novel hypotheses about hematopoietic differentiation, for example by highlighting certain genes and pathways involved in "differentiation therapy" used in the treatment of acute promyelocytic leukemia.

3,186 citations

01 Aug 2001
TL;DR: This work concentrates on the study of distributed systems that bring to life the vision of ubiquitous computing systems, also known as ambient intelligence.
Abstract: With digital equipment becoming increasingly networked, either on wired or wireless networks, for personal and professional use alike, distributed software systems have become a crucial element in information and communications technologies. The study of these systems forms the core of the ARLES' work, which is specifically concerned with defining new system software architectures, based on the use of emerging networking technologies. In this context, we concentrate on the study of distributed systems which bring to life the vision of ubiquitous computing systems, also known as ambient intelligence.

2,774 citations