
Showing papers by "Satoru Miyano" published in 2010


Journal ArticleDOI
Thomas J. Hudson, Warwick Anderson, Axel Aretz, +270 more. Institutions (92)
15 Apr 2010
TL;DR: Systematic studies of more than 25,000 cancer genomes will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
Abstract: The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

2,041 citations


Journal ArticleDOI
TL;DR: The analysis of a Japanese male using high-throughput sequencing to ×40 coverage suggests that considerable variation remains undiscovered in the human genome and that whole-genome sequencing is an invaluable tool for obtaining a complete understanding of human genetic variation.
Abstract: Tatsuhiko Tsunoda and Hidewaki Nakagawa report the genome sequence of a Japanese male individual generated with high-throughput sequencing technology. Their analyses reveal a number of novel single nucleotide variants, insertions, deletions and other variants.

118 citations


Journal ArticleDOI
TL;DR: Cell Illustrator 4.0 uses Java Web Start technology and adds new capabilities, including automatic graph grid layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; a parameter search module; a high-performance simulation module; a CSML database management system; conversion from CSML models to programming languages (FORTRAN, C, C++, Java, Python and Perl); import from SBML, CellML, and BioPAX; and export to SVG and HTML.
Abstract: Cell Illustrator is a software platform for Systems Biology that uses the concept of Petri net for modeling and simulating biopathways. It is intended for biological scientists working at bench. The latest version of Cell Illustrator 4.0 uses Java Web Start technology and is enhanced with new capabilities, including: automatic graph grid layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; parameter search module; high-performance simulation module; CSML database management system; conversion from CSML model to programming languages (FORTRAN, C, C++, Java, Python and Perl); import from SBML, CellML, and BioPAX; and, export to SVG and HTML. Cell Illustrator employs an extension of hybrid Petri net in an object-oriented style so that biopathway models can include objects such as DNA sequence, molecular density, 3D localization information, transcription with frame-shift, translation with codon table, as well as biochemical reactions.

66 citations



Journal Article
TL;DR: Empirical evaluations demonstrate that the proposed algorithm can learn optimal Bayesian networks for some graphs containing several hundreds of vertices, and even for super-structures having a high average degree, which is a drastic improvement in feasibility over the previous optimal algorithm.
Abstract: We study the problem of learning an optimal Bayesian network in a constrained search space; skeletons are compelled to be subgraphs of a given undirected graph called the super-structure. The previously derived constrained optimal search (COS) remains limited even for sparse super-structures. To extend its feasibility, we propose to divide the super-structure into several clusters and perform an optimal search on each of them. Further, to ensure acyclicity, we introduce the concept of ancestral constraints (ACs) and derive an optimal algorithm satisfying a given set of ACs. Finally, we theoretically derive the necessary and sufficient sets of ACs to be considered for finding an optimal constrained graph. Empirical evaluations demonstrate that our algorithm can learn optimal Bayesian networks for some graphs containing several hundreds of vertices, and even for super-structures having a high average degree (up to four), which is a drastic improvement in feasibility over the previous optimal algorithm. Learnt networks are shown to largely outperform state-of-the-art heuristic algorithms both in terms of score and structural hamming distance.

45 citations


Proceedings ArticleDOI
01 Jan 2010
TL;DR: A new calculation technique for EM algorithm that does not require the calculation of inverse matrices is introduced, which is applied to time course microarray data of lung cells treated by stimulating EGF receptors and dosing an anticancer drug, Gefitinib.
Abstract: We propose a state space representation of the vector autoregressive model and its sparse learning based on L1 regularization to achieve efficient estimation of dynamic gene networks from time course microarray data. The proposed method overcomes drawbacks of both the vector autoregressive model and the state space model: the assumption of equal time intervals and the inability to separate observation noise from system noise in the former, and the assumption of a modular network structure in the latter. However, in a simple implementation the proposed model requires large inverse matrices to be calculated many times during the EM-based parameter estimation process. This limits the applicability of the proposed method to relatively small gene sets. We thus introduce a new calculation technique for the EM algorithm that does not require the calculation of inverse matrices. The proposed method is applied to time course microarray data of lung cells treated by stimulating EGF receptors and dosing an anticancer drug, Gefitinib. By comparing the estimated network with the control network estimated using non-treated lung cells, genes perturbed by the anticancer drug could be found, whose up- and down-stream genes in the estimated networks may be related to side effects of the anticancer drug.

29 citations
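
The L1-regularized estimation mentioned in the abstract above can be illustrated with a much simpler setting. The following is a minimal sketch, assuming a plain VAR(1) model fitted row by row with coordinate-descent lasso; it does not include the state space representation, the EM algorithm, or the inverse-free calculation technique of the paper. The function names (`soft_threshold`, `lasso_cd`, `sparse_var1`) are hypothetical and chosen only for this illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(rows, targets, lam, sweeps=100):
    """Minimise 0.5 * ||targets - rows @ beta||^2 + lam * ||beta||_1."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(sweeps):
        for j in range(k):
            # Correlation of column j with the residual that excludes beta[j].
            rho = sum(
                r[j] * (t - sum(r[m] * beta[m] for m in range(k) if m != j))
                for r, t in zip(rows, targets)
            )
            z = sum(r[j] ** 2 for r in rows)
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


def sparse_var1(series, lam):
    """Estimate a sparse VAR(1) matrix A in x[t+1] ~ A x[t], one row per gene.

    `series` is a list of time points, each a list of expression values.
    """
    past, future = series[:-1], series[1:]
    return [lasso_cd(past, [f[i] for f in future], lam)
            for i in range(len(series[0]))]
```

The L1 penalty sets weak coefficients exactly to zero, which is what makes the estimated network sparse: each nonzero entry `A[i][j]` would be read as an edge from gene `j` to gene `i`.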


Book ChapterDOI
31 Aug 2010
TL;DR: The concept of Granger causality is described and recent advances and applications in gene expression regulatory networks are explored by using extensions of Vector Autoregressive models.
Abstract: Understanding the molecular biological processes underlying disease onset requires a detailed description of which genes are expressed at which time points and how their products interact in so-called cellular networks. High-throughput technologies, such as gene expression analysis using DNA microarrays, have been extensively used with this purpose. As a consequence, mathematical methods aiming to infer the structure of gene networks have been proposed in the last few years. Granger causality-based models are among them, presenting well established mathematical interpretations to directionality at the edges of the regulatory network. Here, we describe the concept of Granger causality and explore recent advances and applications in gene expression regulatory networks by using extensions of Vector Autoregressive models.

27 citations


Journal ArticleDOI
10 Nov 2010-PLOS ONE
TL;DR: A phosphoproteomics-based methodology for characterizing the regulatory mechanism underlying aberrant EGFR signaling using computational network modeling is reported, which could facilitate the development of a systematic strategy toward controlling disease-related cell signaling.
Abstract: Background Mutation of the epidermal growth factor receptor (EGFR) results in a discordant cell signaling, leading to the development of various diseases. However, the mechanism underlying the alteration of downstream signaling due to such mutation has not yet been completely understood at the system level. Here, we report a phosphoproteomics-based methodology for characterizing the regulatory mechanism underlying aberrant EGFR signaling using computational network modeling. Methodology/Principal Findings Our phosphoproteomic analysis of the mutation at tyrosine 992 (Y992), one of the multifunctional docking sites of EGFR, revealed network-wide effects of the mutation on EGF signaling in a time-resolved manner. Computational modeling based on the temporal activation profiles enabled us to not only rediscover already-known protein interactions with Y992 and internalization property of mutated EGFR but also further gain model-driven insights into the effect of cellular content and the regulation of EGFR degradation. Our kinetic model also suggested critical reactions facilitating the reconstruction of the diverse effects of the mutation on phosphoproteome dynamics. Conclusions/Significance Our integrative approach provided a mechanistic description of the disorders of mutated EGFR signaling networks, which could facilitate the development of a systematic strategy toward controlling disease-related cell signaling.

19 citations


Journal ArticleDOI
TL;DR: This work proposes an integrated approach for inferring multiple gene networks from time series expression data under varying conditions and proposes a state-of-the-art parameter estimation method, relevance-weighted recursive elastic net, for providing higher precision and recall than existing reverse-engineering methods.
Abstract: Motivation: Elucidating the differences between cellular responses to various biological conditions or external stimuli is an important challenge in systems biology. Many approaches have been developed to reverse engineer a cellular system, called gene network, from time series microarray data in order to understand a transcriptomic response under a condition of interest. Comparative topological analysis has also been applied based on the gene networks inferred independently from each of the multiple time series datasets under varying conditions to find critical differences between these networks. However, these comparisons often lead to misleading results, because each network contains considerable noise due to the limited length of the time series. Results: We propose an integrated approach for inferring multiple gene networks from time series expression data under varying conditions. To the best of our knowledge, our approach is the first reverse-engineering method that is intended for transcriptomic network comparison between varying conditions. Furthermore, we propose a state-of-the-art parameter estimation method, relevance-weighted recursive elastic net, for providing higher precision and recall than existing reverse-engineering methods. We analyze experimental data of MCF-7 human breast cancer cells stimulated by epidermal growth factor or heregulin with several doses and provide novel biological hypotheses through network comparison. Availability: The software NETCOMP is available at http://bonsai.ims.u-tokyo.ac.jp/~shima/NETCOMP/. Contact: shima@ims.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.

19 citations


Journal ArticleDOI
TL;DR: It is shown that the detection rate of border quarantine was low and the timing of the intervention was the most important factor involved in the control of the pandemic, with the maximum reduction in daily cases obtained after interventions started on day 6 or 11.
Abstract: We simulated the early phase of the 2009 influenza A(H1N1) pandemic and assessed the effectiveness of public health interventions in Japan. We show that the detection rate of border quarantine was low and the timing of the intervention was the most important factor involved in the control of the pandemic, with the maximum reduction in daily cases obtained after interventions started on day 6 or 11. Early interventions were not always effective.

18 citations


Journal ArticleDOI
TL;DR: A likelihood ratio test with Bartlett correction is proposed in order to identify Granger causality between sets of time series gene expression data and is shown to be significantly faster and statistically powerful even within non-Normal distributions.
Abstract: Summary: We propose a likelihood ratio test (LRT) with Bartlett correction in order to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared to a previously published bootstrap-based approach. LRT is shown to be significantly faster and statistically powerful even within non-Normal distributions. An R package named gGranger containing an implementation for both Granger causality identification tests is also provided. Availability: http://dnagarden.ims.u-tokyo.ac.jp/afujita/en/doku.php?id=ggranger. Contact: andrefujita@riken.jp Supplementary information: Supplementary data are available at Bioinformatics online.
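
The idea of testing Granger causality with a likelihood ratio can be sketched in its simplest form. The example below is a minimal illustration, assuming two single time series, one lag, and Gaussian errors; it omits the Bartlett correction and the extension to sets of genes described in the abstract, and it is not the gGranger API. All function names are hypothetical. The statistic compares an autoregression of the effect on its own past with one that also includes the past of the candidate cause; under the null hypothesis it is asymptotically chi-square distributed with one degree of freedom.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_lrt(cause, effect):
    """LRT statistic for 'cause Granger-causes effect' with one lag."""
    targets = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    full = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    return len(targets) * math.log(rss(restricted, targets) / rss(full, targets))


# Simulated data in which x drives y with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.5) for t in range(1, 200)]
stat_xy = granger_lrt(x, y)  # tests x -> y
stat_yx = granger_lrt(y, x)  # tests y -> x
```

A statistic above 3.84 rejects the null hypothesis of no Granger causality at the 5% level for one degree of freedom.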

Journal ArticleDOI
TL;DR: The multivariate Granger causality concept is generalized in order to identify Granger causalities between sets of gene expressions, i.e. whether a set of n genes Granger-causes another set of m genes, aiming at identifying the flow of information between gene networks (or pathways).
Abstract: Wiener and Granger have introduced an intuitive concept of causality (Granger causality) between two variables which is based on the idea that an effect never occurs before its cause. Later, Geweke generalized this concept to a multivariate Granger causality, i.e. n variables Granger-cause another variable. Although Granger causality is not "effective causality" in the Aristotelian sense, this concept is useful to infer directionality and information flow in observational data. Granger causality is usually identified by using VAR (Vector Autoregressive) models due to their simplicity. In the last few years, several VAR-based models were presented in order to model gene regulatory networks. Here, we generalize the multivariate Granger causality concept in order to identify Granger causalities between sets of gene expressions, i.e. whether a set of n genes Granger-causes another set of m genes, aiming at identifying the flow of information between gene networks (or pathways). The concept of Granger causality for sets of variables is presented. Moreover, a method for its identification with a bootstrap test is proposed. This method is applied to simulated and also to actual biological gene expression data in order to model regulatory networks. This concept may be useful for the understanding of the complete information flow from one network or pathway to the other, mainly in regulatory networks. Linking this concept to graph theory, sink and source can be generalized to node sets. Moreover, hub and centrality for sets of genes can be defined based on total information flow. Another application is in annotation, when the functionality of a set of genes is unknown, but this set is Granger-caused by another set of genes which is well studied. Therefore, this information may be useful to infer or construct some hypothesis about the unknown set of genes.

Journal ArticleDOI

TL;DR: This article reports the release of Java-based software (DA 1.0) with an intuitive and user-friendly interface that allows users to carry out parameter estimation using DA.
Abstract: Summary: Data assimilation (DA) is a computational approach that estimates unknown parameters in a pathway model using time-course information. Particle filtering, the underlying method used, is a well-established statistical method that approximates the joint posterior distributions of parameters by using sequentially generated Monte Carlo samples. In this article, we report the release of Java-based software (DA 1.0) with an intuitive and user-friendly interface to allow users to carry out parameter estimation using DA. Availability and Implementation: DA 1.0 was developed using Java and thus would be executable on any platform installed with JDK 6.0 (not JRE 6.0) or later. DA 1.0 is freely available for academic users and can be launched or downloaded from http://da.csml.org. Contact: masao@ims.u-tokyo.ac.jp
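
The particle filtering approach described in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, assuming a one-variable decay model x[t+1] = (1 - k) * x[t] with Gaussian observation noise and a uniform prior on the unknown rate k. It is written in Python for brevity and does not reflect the internals or interface of DA 1.0; the model, the function name `particle_filter`, and all numeric settings are assumptions made for illustration.

```python
import math
import random


def particle_filter(observations, x0, n_particles, obs_sd, rng):
    """Estimate the decay rate k of x[t+1] = (1 - k) * x[t] from noisy data."""
    # Each particle is a (k, x) pair; k is drawn from a uniform prior.
    particles = [(rng.uniform(0.0, 0.5), x0) for _ in range(n_particles)]
    for y in observations:
        # Propagate: jitter k slightly to limit sample impoverishment,
        # then advance the state with the model equation.
        moved = []
        for k, x in particles:
            k = min(0.5, max(0.0, k + rng.gauss(0.0, 0.002)))
            moved.append((k, (1.0 - k) * x))
        # Weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for _, x in moved]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # fall back to uniform weights
        # Resample with replacement in proportion to the weights.
        particles = rng.choices(moved, weights=weights, k=n_particles)
    # Posterior mean of k over the final particle set.
    return sum(k for k, _ in particles) / n_particles


# Simulate a noisy time course with a known rate, then try to recover it.
rng = random.Random(1)
true_k, x, data = 0.2, 10.0, []
for _ in range(20):
    x = (1.0 - true_k) * x
    data.append(x + rng.gauss(0.0, 0.3))
estimate = particle_filter(data, 10.0, 1000, 0.3, rng)
```

The final particle set approximates the posterior distribution of k, so its spread can also be read as a rough measure of uncertainty in the estimate.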

Journal ArticleDOI
TL;DR: A new grid-layout algorithm based on the spring embedder algorithm that can handle location information and provide layouts with harmonized appearance is proposed and applied to three biological pathways; endothelial cell model, Fas-induced apoptosis model, and C. elegans cell fate simulation model.
Abstract: Graph drawing is one of the important techniques for understanding biological regulations in a cell or among cells at the pathway level. Among many available layout algorithms, the spring embedder algorithm is widely used not only for pathway drawing but also for circuit placement and www visualization and so on because of the harmonized appearance of its results. For pathway drawing, location information is essential for its comprehension. However, complex shapes need to be taken into account when torus-shaped location information such as nuclear inner membrane, nuclear outer membrane, and plasma membrane is considered. Unfortunately, the spring embedder algorithm cannot easily handle such information. In addition, crossings between edges and nodes are usually not considered explicitly. We proposed a new grid-layout algorithm based on the spring embedder algorithm that can handle location information and provide layouts with harmonized appearance. In grid-layout algorithms, the mapping of nodes to grid points that minimizes a cost function is searched. By imposing positional constraints on grid points, location information including complex shapes can be easily considered. Our layout algorithm includes the spring embedder cost as a component of the cost function. We further extend the layout algorithm to enable dynamic update of the positions and sizes of compartments at each step. The new spring embedder-based grid-layout algorithm and a spring embedder algorithm are applied to three biological pathways; endothelial cell model, Fas-induced apoptosis model, and C. elegans cell fate simulation model. From the positional constraints, all the results of our algorithm satisfy location information, and hence, more comprehensible layouts are obtained as compared to the spring embedder algorithm. 
Comparing the number of crossings, the results of the grid-layout-based algorithm tend to contain more crossings than those of the spring embedder algorithm due to the positional constraints. For a fair comparison, we also apply our proposed method without positional constraints. This comparison shows that these results contain fewer crossings than those of the spring embedder algorithm. We also compared layouts of the proposed algorithm with and without compartment update and verified that the latter can reach better local optima.

Proceedings ArticleDOI
01 Jan 2010
TL;DR: This EEM-based meta-analysis successfully revealed a prevailing cancer transcriptional network that functions in a large fraction of cancer transcriptomes; it includes cell-cycle and immune-related sub-networks.
Abstract: Although microarray technology has revealed transcriptomic diversities underlying various cancer phenotypes, transcriptional programs controlling them have not been well elucidated. To decode transcriptional programs governing cancer transcriptomes, we have recently developed a computational method termed EEM, which searches for expression modules from prescribed gene sets defined by prior biological knowledge like TF binding motifs. In this paper, we extend our EEM approach to predict cancer transcriptional networks. Starting from functional TF binding motifs and expression modules identified by EEM, we predict cancer transcriptional networks containing regulatory TFs, associated GO terms, and interactions between TF binding motifs. To systematically analyze transcriptional programs in broad types of cancer, we applied our EEM-based network prediction method to 122 microarray datasets collected from public databases. The data sets contain about 15,000 experiments for tumor samples of various tissue origins including breast, colon, lung etc. This EEM-based meta-analysis successfully revealed a prevailing cancer transcriptional network that functions in a large fraction of cancer transcriptomes; it includes cell-cycle and immune-related sub-networks. This study demonstrates the broad applicability of EEM, and opens the way to a comprehensive understanding of transcriptional networks in cancer cells.

Journal ArticleDOI
TL;DR: A method to incorporate the concept of time for the inclusion of dynamics of signaling pathway in a Petri net model, i.e., to use timed Petri nets, and the suitability of this algorithm has been confirmed by the results of an application to the IL-1 signaling pathway.
Abstract: This paper proposes a method to incorporate the concept of time for the inclusion of dynamics of signaling pathway in a Petri net model, i.e., to use timed Petri nets. Incorporation of delay times into a Petri net model makes it possible to conduct quantitative evaluation on a target signaling pathway. However, experimental data describing detailed reactions are not available in most cases. An algorithm given in this paper determines delay times of a timed Petri net only from the structural information of it. The suitability of this algorithm has been confirmed by the results of an application to the IL-1 signaling pathway.
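
To make the notion of a timed Petri net concrete, the sketch below simulates a net in which each transition consumes its input tokens when it starts and releases output tokens after a fixed delay. It is a minimal illustration only: it does not implement the paper's algorithm for deriving delay times from the net structure, the delays are supplied by hand, and the place names (`ligand`, `receptor`, `complex`, `signal`) are hypothetical rather than taken from the IL-1 model. It assumes every transition has at least one input place.

```python
import heapq


def simulate(marking, transitions, t_end):
    """Run a timed Petri net; each transition holds its tokens for `delay`.

    `transitions` is a list of (input places, output places, delay) tuples.
    Returns the marking reached once no completion is due by `t_end`.
    """
    marking = dict(marking)
    pending = []  # heap of (completion time, sequence number, output places)
    now, seq = 0.0, 0
    while True:
        # Start every transition that is enabled at the current time.
        started = True
        while started:
            started = False
            for inputs, outputs, delay in transitions:
                if all(marking.get(p, 0) >= 1 for p in inputs):
                    for p in inputs:
                        marking[p] -= 1
                    heapq.heappush(pending, (now + delay, seq, outputs))
                    seq += 1
                    started = True
        if not pending or pending[0][0] > t_end:
            return marking
        # Advance the clock to the next completion and release its tokens.
        now, _, outputs = heapq.heappop(pending)
        for p in outputs:
            marking[p] = marking.get(p, 0) + 1


# A two-step pathway: binding takes 2 time units, signalling takes 3.
net = [
    (("ligand", "receptor"), ("complex",), 2.0),
    (("complex",), ("signal",), 3.0),
]
start = {"ligand": 1, "receptor": 1}
final = simulate(start, net, 10.0)
early = simulate(start, net, 4.0)
```

With these delays the token reaches `signal` at time 5, so it is present in `final` but not yet in `early`. Changing the delays changes when downstream places receive tokens, which is the quantity a delay-determination method would tune.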

Journal ArticleDOI
TL;DR: This paper proposes a method to determine firing delay times of transitions for Petri net models of signaling pathways by introducing stochastic decision rules, which makes it possible to determine the range of firing delay times that realizes smooth token flows in the Petri net model of a signaling pathway.
Abstract: Parameter determination is important in modeling and simulating biological pathways including signaling pathways. Parameters are determined according to biological facts obtained from biological experiments and scientific publications. However, such reliable data describing detailed reactions are not reported in most cases. This prompted us to develop a general methodology for determining the parameters of a model when no information on the underlying biological facts is provided. In this study, we use the Petri net approach for modeling signaling pathways, and propose a method to determine firing delay times of transitions for Petri net models of signaling pathways by introducing stochastic decision rules. Petri net technology provides a powerful approach to modeling and simulating various concurrent systems, and has recently been widely accepted as a description method for biological pathways. Our method makes it possible to determine the range of firing delay times that realizes smooth token flows in the Petri net model of a signaling pathway. The applicability of this method has been confirmed by the results of an application to the interleukin-1 induced signaling pathway.

Journal ArticleDOI
TL;DR: It is suggested for the first time that the sigma(F) activation in the prespore might be switched off by the decrease in the ratio of AA to AB after the transient genetic asymmetry comes to an end upon completion of DNA translocation into the prespore.
Abstract: The prespore-specific activation of sigma factor SigF (sigma(F)) in Bacillus subtilis has been explained mainly by two factors, i.e., the transient genetic asymmetry and the volume difference between the mother cell and the prespore. Here, we systematically surveyed the effect of these two factors on sporulation using a quantitative modeling and simulation architecture named hybrid functional Petri net with extension (HFPNe). Considering the fact that the transient genetic asymmetry and the volume difference in sporulation of B. subtilis finally bring about the concentration difference in two proteins SpoIIAB (AB) and SpoIIAA (AA) between the mother cell and the prespore, we have surveyed the effect of AB and AA concentration on the prespore-specific activation of sigma(F) occurring in the early stage of sporulation. Our results show that the prespore-specific activation of sigma(F) could be governed by the ratio of AA to AB rather than their concentrations themselves. Our model also suggests that B. subtilis could maximize the ratio of AA to AB in the prespore and minimize it in the mother cell by employing both the transient genetic asymmetry and the volume difference simultaneously. This might give a good explanation for the co-occurrence of the transient asymmetry and the volume difference during sporulation of B. subtilis. In addition, we suggest for the first time that the sigma(F) activation in the prespore might be switched off by the decrease in the ratio of AA to AB after the transient genetic asymmetry comes to an end upon completion of DNA translocation into the prespore.

Journal ArticleDOI
09 Jun 2010-PLOS ONE
TL;DR: BEEM is introduced as a powerful tool for decoding regulatory programs from a compendium of gene expression profiles by showing that, when applied to expression profiles of multiple human tissues, BEEM finds expression modules missed by two existing approaches that are based on coherent expression and single tissue-specific differential expression.
Abstract: Decoding transcriptional programs governing transcriptomic diversity across multiple human tissues is a major challenge in bioinformatics. To address this problem, a number of computational methods have focused on cis-regulatory codes driving overexpression or underexpression in a single tissue as compared to others. On the other hand, we recently proposed a different approach to mine cis-regulatory codes: starting from gene sets sharing common cis-regulatory motifs, the method screens for expression modules based on expression coherence. However, both approaches seem to be insufficient to capture transcriptional programs that control gene expression in a subset of all samples. This limitation would be especially serious when analyzing multiple tissue data. To overcome this limitation, we developed a new module discovery method termed BEEM (Biclustering-based Extraction of Expression Modules) in order to discover expression modules that are functional in a subset of tissues. We showed that, when applied to expression profiles of multiple human tissues, BEEM finds expression modules missed by two existing approaches that are based on coherent expression and single tissue-specific differential expression. From the BEEM results, we obtained new insights into transcriptional programs controlling transcriptomic diversity across various types of tissues. This study introduces BEEM as a powerful tool for decoding regulatory programs from a compendium of gene expression profiles.

Proceedings ArticleDOI
01 Jan 2010
TL;DR: Both quantitative and qualitative comparisons among three major gene expression quantification techniques (CAGE, Illumina microarray and Real Time RT-PCR) are presented, showing that the quantitative values of each method are not interchangeable but that each has unique characteristics which render all of them essential and complementary.
Abstract: Several technologies are currently used for gene expression profiling, such as Real Time RT-PCR, microarray and CAGE (Cap Analysis of Gene Expression). CAGE is a recently developed method for constructing transcriptome maps and it has been successfully applied to analyzing gene expression in diverse biological studies. The principle of CAGE has been developed to address specific issues such as determination of transcriptional starting sites, the study of promoter regions and identification of new transcripts. Here, we present both quantitative and qualitative comparisons among three major gene expression quantification techniques, namely CAGE, Illumina microarray and Real Time RT-PCR. We show that the quantitative values of each method are not interchangeable; however, each of them has unique characteristics which render all of them essential and complementary. Understanding the advantages and disadvantages of each technology will be useful in selecting the most appropriate technique for a given purpose.

Journal ArticleDOI
TL;DR: Building ASTD is a useful means to convert a hybrid model dealing with discrete, continuous and more complicated events to finite time-dependent states and various analytical approaches can be applied to obtain new insights into not only systematic mechanisms but also dynamics.
Abstract: Background With the accumulation of in silico data obtained by simulating large-scale biological networks, a new research interest is emerging in elucidating how living organisms function over time in cells. Investigating the dynamic features of current computational models promises a deeper understanding of complex cellular processes. This led us to develop a method that utilizes structural properties of the model over all simulation time steps. Further, user-friendly overviews of dynamic behaviors can be of great help in understanding the variations of system mechanisms.

Proceedings ArticleDOI
31 May 2010
TL;DR: This research is the first study to theoretically characterize missing genes in gene networks and practically utilize this information to refine network estimation.
Abstract: In the estimation of gene networks from microarray gene expression data, we propose a statistical method for quantification of the hidden confounders in gene networks, which were possibly removed from the set of genes on the gene networks or are novel biological elements that are not measured by microarrays. Due to high computational cost of the structural learning of Bayesian networks and the limited source of the microarray data, it is usual to perform gene selection prior to the estimation of gene networks. Therefore, there exist missing genes that decrease accuracy and interpretability of the estimated gene networks. The proposed method can identify hidden confounders based on the conflicts of the estimated local Bayesian network structures and estimate their ideal profiles based on the proposed Bayesian networks with hidden variables with an EM algorithm. From the estimated ideal profiles, we can identify genes which are missing in the network or suggest the existence of the novel biological elements if the ideal profiles are not significantly correlated with any expression profiles of genes. To the best of our knowledge, this research is the first study to theoretically characterize missing genes in gene networks and practically utilize this information to refine network estimation.

Journal ArticleDOI
TL;DR: This study introduces MIEA as a broadly applicable gene set screening tool for mining regulatory programs from transcriptome data because it can detect singular expression profiles that the other methods fail to find, and performs broadly well for various types of input data.
Abstract: Motivation: A number of unsupervised gene set screening methods have recently been developed for search of putative functional gene sets based on their expression profiles. Most of the methods statistically evaluate whether the expression profiles of each gene set are fit to assumed models: e.g. co-expression across all samples or a subgroup of samples. However, it is possible that they fail to capture informative gene sets whose expression profiles are not fit to the assumed models. Results: To overcome this limitation, we propose a model-free unsupervised gene set screening method, Matrix Information Enrichment Analysis (MIEA). Without assuming any specific models, MIEA screens gene sets based on information richness of their expression profiles. We extensively compared the performance of MIEA to those of other unsupervised gene set screening methods, using various types of simulated and real data. The benchmark tests demonstrated that MIEA can detect singular expression profiles that the other methods fail to find, and performs broadly well for various types of input data. Taken together, this study introduces MIEA as a broadly applicable gene set screening tool for mining regulatory programs from transcriptome data. Contact: aniida@ims.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: Through the applications, the whole strategy of biological state space model analysis, involving experimental design of time-course data, model building and analysis of the estimated networks, is shown, including discovery of drug mode of action.
Abstract: Since time-course microarray data are short but contain a large number of genes, most statistical models should be extended so that they can handle such statistically irregular situations. We introduce biological state space models that are established as suitable computational models for constructing gene networks from microarray gene expression data. This chapter elucidates the theory and methodology of our biological state space models together with some representative analyses, including discovery of drug mode of action. Through the applications, we show the whole strategy of biological state space model analysis, involving experimental design of time-course data, model building and analysis of the estimated networks.

Proceedings ArticleDOI
01 Jan 2010
TL;DR: A statistical model realizing simultaneous estimation of a gene regulatory network and identification of gene modules from time series gene expression data from microarray experiments is proposed, and its effectiveness is verified.
Abstract: We propose a statistical model realizing simultaneous estimation of a gene regulatory network and identification of gene modules from time series gene expression data from microarray experiments. Under the assumption that genes in the same module are densely connected, the proposed method detects gene modules based on the variational Bayesian technique. The model can also incorporate existing biological prior knowledge such as protein subcellular localization. We applied the proposed model to time series data from a synthetically generated network and verified its effectiveness. The proposed model was also applied to time series microarray data from HeLa cells. The detected gene module information greatly helps in drawing the estimated gene network.


Proceedings ArticleDOI
01 Dec 2010
TL;DR: This work proposes a statistical method for uncovering gene pathways that characterize cancer heterogeneity; the method can also reverse-engineer gene networks based on the identified pathways, which enables the discovery of novel gene-gene associations related to cancer phenotypes.
Abstract: We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of pathway activities from microarray gene expression data based on sparse probabilistic principal component analysis. A pathway activity logistic regression model is then formulated for the cancer phenotype. To select pathway activities related to binary cancer phenotypes, we use the elastic net for parameter estimation and derive a model selection criterion for selecting the tuning parameters included in the model estimation. Our proposed method can also reverse-engineer gene networks based on the multiple identified pathways, which enables us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process of the proposed method through the analysis of breast cancer gene expression data.
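
The elastic net step described in this abstract can be illustrated with a minimal sketch. It assumes the pathway activities have already been computed (the sparse probabilistic principal component analysis step is omitted) and fits a logistic regression by proximal gradient descent. The function names, the toy activity values, and the hand-picked tuning parameters `lam` and `alpha` are all hypothetical; the paper derives a model selection criterion for the tuning parameters, which is not reproduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_elastic_net_logistic(X, y, lam=0.3, alpha=0.5, lr=0.1, n_iter=2000):
    """Logistic regression with penalty lam * (alpha * L1 + (1 - alpha) / 2 * L2).

    Hypothetical helper: X holds one row of pathway activities per sample,
    y holds binary phenotype labels. The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(p)]
        b -= lr * grad_b
        # Gradient step on the smooth part (loss + ridge), then L1 shrinkage.
        w = [soft_threshold(wj - lr * (gj + lam * (1 - alpha) * wj),
                            lr * lam * alpha)
             for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data (made up): activity 0 tracks the phenotype, activity 1 is noise-like.
activity_0 = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
activity_1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
X = [list(pair) for pair in zip(activity_0, activity_1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, intercept = fit_elastic_net_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("coefficients:", weights, "selected pathway activities:", selected)
```

The L1 part of the penalty is what sets coefficients exactly to zero, so the indices of the nonzero coefficients can be read as the selected pathway activities.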

Proceedings ArticleDOI
01 Dec 2010
TL;DR: A novel statistical method, NetworkProfiler, is developed, which aims to infer modulator-dependent gene regulatory networks from a collection of gene expression data and builds a sequence of gene networks with the change of a specific cancer characteristic such as cancer progression by reordering the samples according to the specified cancer characteristic.
Abstract: Construction and analysis of molecular networks for each cancer patient is a promising strategy for making individual risk predictions and treatment decisions in cancer therapy. Systems biology enables us to reconstruct a gene network within a cell from gene expression data. However, from a collection of gene expression data for clinical samples, traditional methods including Bayesian networks can only provide an “averaged” network across the samples. Therefore, these methods do not find patient-specific varying structures of molecular networks during a change of cancer characteristic. Here we develop a novel statistical method, NetworkProfiler, which aims to infer modulator-dependent gene regulatory networks from a collection of gene expression data. NetworkProfiler builds a sequence of gene networks with the change of a specific cancer characteristic such as cancer progression by reordering the samples according to the specified cancer characteristic. We applied NetworkProfiler to microarray gene expression data of 762 cancer cell lines and extracted system changes related to epithelial-mesenchymal transition (EMT). NetworkProfiler identified 25 regulators for E-cadherin, an EMT marker, from 1732 candidate regulators, and about half of them are supported by the literature. Also, EMT-dependent regulations for gene sets of adhesion, migration, and metastasis were predicted by NetworkProfiler.
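
The idea of ordering samples by a modulator and tracking how a regulatory relationship changes along it can be sketched as follows. This is not the NetworkProfiler model itself: as a stand-in, the sketch uses a Gaussian-kernel-weighted Pearson correlation between one regulator and one target, evaluated at each sample's modulator value. All names, the bandwidth, and the toy data are assumptions made for illustration.

```python
import math

def gaussian_weights(modulator, center, bandwidth):
    """Weight each sample by how close its modulator value is to `center`."""
    return [math.exp(-0.5 * ((m - center) / bandwidth) ** 2) for m in modulator]

def weighted_correlation(x, y, w):
    """Pearson correlation of x and y under non-negative sample weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(vx * vy)

def edge_strength_profile(regulator, target, modulator, bandwidth=0.2):
    """Return (modulator value, local correlation) pairs, ordered by modulator."""
    order = sorted(range(len(modulator)), key=lambda i: modulator[i])
    profile = []
    for i in order:
        w = gaussian_weights(modulator, modulator[i], bandwidth)
        profile.append((modulator[i], weighted_correlation(regulator, target, w)))
    return profile

# Toy data (made up): the regulator's effect on the target flips sign
# as the modulator goes from 0 to 1.
modulator = [i / 20 for i in range(21)]
regulator = [1.0 if i % 2 == 0 else -1.0 for i in range(21)]
target = [(1.0 - 2.0 * m) * x for m, x in zip(modulator, regulator)]

profile = edge_strength_profile(regulator, target, modulator)
for m, r in profile[::5]:
    print(f"modulator={m:.2f}  local correlation={r:+.3f}")
```

A single correlation over all samples would average out the sign flip in this toy example, which mirrors the "averaged network" limitation the abstract attributes to traditional methods.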

Book ChapterDOI
18 Nov 2010
TL;DR: Numerical experiments indicate that constraint optimal search outperforms state-of-the-art heuristic algorithms in terms of accuracy, even if the super-structure is also learned by data.
Abstract: Optimal search on Bayesian network structure is known to be an NP-hard problem, and the applicability of existing optimal algorithms is limited to small Bayesian networks with 30 nodes or so. To learn larger Bayesian networks from observational data, some heuristic algorithms were used, but only a local optimal structure is found and its accuracy is not high in many cases. In this paper, we review optimal search algorithms in a constraint search space: the skeleton of the learned Bayesian network is a sub-graph of the given undirected graph called super-structure. The introduced optimal search algorithm can learn Bayesian networks with several hundreds of nodes when the degree of super-structure is around four. Numerical experiments indicate that constraint optimal search outperforms state-of-the-art heuristic algorithms in terms of accuracy, even if the super-structure is also learned by data.
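
As a rough illustration of why a super-structure shrinks the search space, the sketch below enumerates candidate parent sets for each node only among its neighbours in a given undirected graph. The toy graph, the node labels, and the function name `candidate_parent_sets` are hypothetical and not taken from the chapter; scoring of parent sets and the optimal search over acyclic structures are omitted.

```python
from itertools import combinations

def candidate_parent_sets(super_structure, node):
    """Return every subset of the node's neighbours in the super-structure."""
    neighbours = sorted(super_structure[node])
    subsets = []
    for size in range(len(neighbours) + 1):
        subsets.extend(combinations(neighbours, size))
    return subsets

# Hypothetical undirected super-structure, stored as an adjacency mapping.
super_structure = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}

# Number of candidate parent sets per node under the constraint.
constrained = {v: len(candidate_parent_sets(super_structure, v))
               for v in super_structure}
# Without the constraint, any subset of the other nodes is a candidate.
unconstrained = 2 ** (len(super_structure) - 1)

print("constrained counts:", constrained)
print("unconstrained count per node:", unconstrained)
```

With a super-structure of degree around four, each node has at most 2^4 = 16 candidate parent sets regardless of the total number of nodes, which is consistent with the abstract's claim that networks with several hundred nodes become tractable.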