Showing papers by "Satoru Miyano" published in 2002

Journal Article•DOI•

Michiel J. L. de Hoon¹, Seiya Imoto¹, Satoru Miyano¹•Institutions (1)

01 Jan 2002-Genome Informatics

TL;DR: An improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix is created, and a Python and a Perl interface to the C Clustering Library is generated, thereby combining the flexibility of a scripting language with the speed of C.

Abstract: Summary: We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C. Availability: The C Clustering Library and the corresponding Python C extension module Pycluster were released under the Python License, while the Perl module Algorithm::Cluster was released under the Artistic License. The GUI code Cluster 3.0 for Windows, Macintosh and Linux/Unix, as well as the corresponding command-line program, were released under the same license as the original Cluster code. The complete source code is available at http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster. Alternatively, Algorithm::Cluster can be downloaded from CPAN, while Pycluster is also available as part of the Biopython distribution.
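
As a rough illustration of the first of the three methods named in this abstract, the following is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the C Clustering Library's implementation or API; the function names `sq_dist` and `kmeans`, the random initialisation from data points, and the toy data are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Illustrative Lloyd's k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers


# Toy data: two well-separated groups of three points each.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = kmeans(pts, 2)
print(labels)
```

In practice a compiled implementation is preferable for real expression matrices; the sketch only shows the assignment and update steps that any k-means routine alternates between.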

1,493 citations

Journal Article•DOI•

Extensive feature detection of N-terminal protein sorting signals.

Hideo Bannai¹, Yoshinori Tamada², Osamu Maruyama, Kenta Nakai¹, Satoru Miyano¹•Institutions (2)

University of Tokyo¹, Tokai University²

01 Feb 2002-Bioinformatics

TL;DR: This work has succeeded in finding rules whose prediction accuracies come close to that of TargetP, while still retaining a very simple and interpretable form.

Abstract: Motivation: The prediction of localization sites of various proteins is an important and challenging problem in the field of molecular biology. TargetP, by Emanuelsson et al. (J. Mol. Biol., 300, 1005‐1016, 2000) is a neural network based system which is currently the best predictor in the literature for N-terminal sorting signals. One drawback of neural networks, however, is that it is generally difficult to understand and interpret how and why they make such predictions. In this paper, we aim to generate simple and interpretable rules as predictors, and still achieve a practical prediction accuracy. We adopt an approach which consists of an extensive search for simple rules and various attributes which is partially guided by human intuition. Results: We have succeeded in finding rules whose prediction accuracies come close to that of TargetP, while still retaining a very simple and interpretable form. We also discuss and interpret the discovered rules. Availability: An (experimental) web service using rules obtained by our method is provided at http:

721 citations

Proceedings Article•DOI•

Inferring gene regulatory networks from time-ordered gene expression data of Bacillus subtilis using differential equations.

Michiel J. L. de Hoon¹, Seiya Imoto, Kazuo Kobayashi, Naotake Ogasawara, Satoru Miyano•Institutions (1)

University of Tokyo¹

01 Dec 2002

TL;DR: This work proposes to infer the degree of sparseness of the gene regulatory network from the data, where Akaike's Information Criterion is used to determine which coefficients are nonzero in a linear system of differential equations.

Abstract: We describe a new method to infer a gene regulatory network, in terms of a linear system of differential equations, from time course gene expression data. As biologically the gene regulatory network is known to be sparse, we expect most coefficients in such a linear system of differential equations to be zero. In previously proposed methods, the number of nonzero coefficients in the system was limited based on ad hoc assumptions. Instead, we propose to infer the degree of sparseness of the gene regulatory network from the data, where we use Akaike's Information Criterion to determine which coefficients are nonzero. We apply our method to MMGE time course data of Bacillus subtilis.

210 citations

Book Chapter•DOI•

Inferring Gene Regulatory Networks from Time-Ordered Gene Expression Data Using Differential Equations

Michiel J. L. de Hoon¹, Seiya Imoto¹, Satoru Miyano¹•Institutions (1)

University of Tokyo¹

24 Nov 2002

TL;DR: This work proposes to infer the degree of sparseness of the gene regulatory network from the data, where it determines which coefficients are nonzero by using Akaike's Information Criterion.

Abstract: Spurred by advances in cDNA microarray technology, gene expression data are increasingly becoming available. In time-ordered data, the expression levels are measured at several points in time following some experimental manipulation. A gene regulatory network can be inferred by fitting a linear system of differential equations to the gene expression data. As biologically the gene regulatory network is known to be sparse, we expect most coefficients in such a linear system of differential equations to be zero. In previously proposed methods to infer such a linear system, ad hoc assumptions were made to limit the number of nonzero coefficients in the system. Instead, we propose to infer the degree of sparseness of the gene regulatory network from the data, where we determine which coefficients are nonzero by using Akaike's Information Criterion.
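
The idea of letting Akaike's Information Criterion decide which coefficients are nonzero can be sketched as a small subset-selection problem. The code below is a simplified illustration, not the authors' procedure: it assumes the time derivative of one target gene has already been estimated (for example by finite differences), fits it by ordinary least squares on every subset of candidate regulators, and keeps the subset with the smallest AIC = n ln(RSS/n) + 2k. The function names and the synthetic data are hypothetical.

```python
import math
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_rss(X, y, cols):
    """Least-squares fit of y on the chosen columns; returns (coef, RSS)."""
    if not cols:
        return [], sum(v * v for v in y)
    A = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    b = [sum(row[i] * v for row, v in zip(X, y)) for i in cols]
    coef = solve(A, b)
    rss = sum((v - sum(c * row[i] for c, i in zip(coef, cols))) ** 2
              for row, v in zip(X, y))
    return coef, rss


def select_by_aic(X, y):
    """Try every subset of regulators and return (aic, subset, coef)."""
    n, p = len(y), len(X[0])
    best = None
    for size in range(p + 1):
        for cols in combinations(range(p), size):
            coef, rss = fit_rss(X, y, cols)
            # Gaussian-error AIC up to an additive constant.
            aic = n * math.log(max(rss, 1e-300) / n) + 2 * len(cols)
            if best is None or aic < best[0]:
                best = (aic, cols, coef)
    return best


# Synthetic demo: 20 time points, 3 candidate regulators,
# and only regulator 0 actually drives the target's derivative.
X = [[math.sin(0.3 * t), math.cos(0.5 * t), (t % 5) / 5] for t in range(20)]
y = [-1.0 * row[0] + 0.01 * math.sin(7 * t) for t, row in enumerate(X)]
aic, regulators, coefs = select_by_aic(X, y)
print(regulators, coefs)
```

Exhaustive enumeration grows as 2^p, so it is only workable for a handful of candidate regulators; it is used here purely to make the role of the penalty term 2k visible.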

76 citations

Journal Article•DOI•

Association between single-nucleotide polymorphisms in selectin genes and immunoglobulin A nephropathy.

Takashi Takei¹, Aritoshi Iida, Kosaku Nitta, Toshihiro Tanaka, Yozo Ohnishi, Ryo Yamada, Shiro Maeda, Tatsuhiko Tsunoda, Sachiyo Takeoka, Kyoko Ito, Kazuho Honda, Keiko Uchida, Ken Tsuchiya, Yasushi Suzuki², Tomoaki Fujioka², Takashi Ujiie, Yutaka Nagane, Satoru Miyano¹, Ichiei Narita³, Fumitake Gejyo³, Hiroshi Nihei, Yusuke Nakamura¹•Institutions (3)

University of Tokyo¹, Iwate Medical University², Niigata University³

01 Mar 2002-American Journal of Human Genetics

TL;DR: The results suggest that these eight SNPs in selectin genes may be useful for screening populations susceptible to the IgAN phenotype that involves interstitial infiltration.

Abstract: Although intensive efforts have been undertaken to elucidate the genetic background of immunoglobulin A nephropathy (IgAN), genetic factors associated with the pathogenesis of this disease are still not well understood. We designed a case-control association study that was based on linkage disequilibrium among single-nucleotide polymorphisms (SNPs) in the selectin gene cluster on chromosome 1q24-25, and we found two SNPs in the E-selectin gene (SELE8 and SELE13) and six SNPs in the L-selectin gene (SELL1, SELL4, SELL5, SELL6, SELL10, and SELL11) that were significantly associated with IgAN in Japanese patients. All eight SNPs were in almost complete linkage disequilibrium. SELE8 and SELL10 caused amino acid substitutions from His to Tyr and from Pro to Ser (χ²=9.02, P=.0026, odds ratio = 2.73 [95% confidence interval {CI} 1.38–5.38] for His-to-Tyr substitutions; χ²=17.4, P=.000031, odds ratio = 3.61 [95% CI 1.91–6.83] for Pro-to-Ser substitutions), and SELL1 could affect promoter activity of the L-selectin gene (χ²=19.5, P=.000010, odds ratio = 3.77 [95% CI 2.02–7.05]). The TGT haplotype at these three loci was associated significantly with IgAN (χ²=18.67, P=.000016, odds ratio = 1.88 [95% CI 1.41–2.51]). Our results suggest that these eight SNPs in selectin genes may be useful for screening populations susceptible to the IgAN phenotype that involves interstitial infiltration.

72 citations

Journal Article•DOI•

Statistical analysis of a small set of time-ordered gene expression data using linear splines.

M J L De Hoon¹, Seiya Imoto¹, Satoru Miyano¹•Institutions (1)

University of Tokyo¹

01 Nov 2002-Bioinformatics

TL;DR: This work uses the maximum likelihood method together with Akaike's Information Criterion to fit linear splines to a small set of time-ordered gene expression data in order to infer statistically meaningful information from the measurements.

Abstract: Motivation: Recently, the temporal response of genes to changes in their environment has been investigated using cDNA microarray technology by measuring the gene expression levels at a small number of time points. Conventional techniques for time series analysis are not suitable for such a short series of time-ordered data. The analysis of gene expression data has therefore usually been limited to a fold-change analysis, instead of a systematic statistical approach. Methods: We use the maximum likelihood method together with Akaike’s Information Criterion to fit linear splines to a small set of time-ordered gene expression data in order to infer statistically meaningful information from the measurements. The significance of measured gene expression data is assessed using Student’s t-test. Results: Previous gene expression measurements of the cyanobacterium Synechocystis sp. PCC6803 were reanalyzed using linear splines. The temporal response of many genes that had been missed by a fold-change analysis was identified. Based on our statistical analysis, we found that about four gene expression measurements or more are needed at each time point. Availability: An extension module for Python to calculate linear spline functions is available at http://bonsai.ims.u-tokyo.ac.jp/~mdehoon. This software package (with patent pending) is free of charge for academic use only.

69 citations

Proceedings Article•DOI•

Boundary Formation by Notch Signaling in Drosophila Multicellular Systems: Experimental Observations and Gene Network Modeling by Genomic Object Net

Hiroshi Matsuno¹, Ryutaro Murakami, Rie Yamane, Naoyuki Yamasaki, Sachie Fujita, Haruka Yoshimori, Satoru Miyano•Institutions (1)

Yamaguchi University¹

01 Dec 2002

TL;DR: Simulation results suggest that parameter values representing the strength of cell-autonomous suppression of Notch signaling by Delta are essential for generating two different modes of patterning: lateral inhibition and boundary formation, which could explain how a common gene regulatory network results in two different patterning modes in vivo.

Abstract: The Delta-Notch signaling system plays an essential role in various morphogenetic systems of multicellular animal development. Here we analyzed the mechanism of Notch-dependent boundary formation in the Drosophila large intestine, by experimental manipulation of Delta expression and computational modeling and simulation by Genomic Object Net. Boundary formation representing the situation in normal large intestine was shown by the simulation. By manipulating Delta expression in the large intestine, a few types of disorder in boundary cell differentiation were observed, and similar abnormal patterns were generated by the simulation. Simulation results suggest that parameter values representing the strength of cell-autonomous suppression of Notch signaling by Delta are essential for generating two different modes of patterning: lateral inhibition and boundary formation, which could explain how a common gene regulatory network results in two different patterning modes in vivo. Genomic Object Net proved to be a useful and flexible biosimulation system that is suitable for analyzing complex biological phenomena such as patternings of multicellular systems as well as intracellular changes in cell states including metabolic activities, gene regulation, and enzyme reactions.

46 citations

Patent•

Biological discovery using gene regulatory networks generated from multiple-disruption expression libraries

Seiya Imoto, Takao Goto, Satoru Miyano, Kousuke Tashiro, Michiel J. L. de Hoon, Christopher J. Savoie, Satoru Kuhara

26 Sep 2002

TL;DR: In this article, the authors proposed a method for the analysis of complex biological information, including gene networks, using Boolean inferential methods and Bayesian methods, for determining cause and effect relationships between expressed genes, and for determining upstream effectors of regulated genes.

Abstract: Embodiments of this invention include application of new inferential methods to analysis of complex biological information, including gene networks. In some embodiments, disruptant data and/or drug induction/inhibition data are obtained simultaneously for a number of genes in an organism. New methods include modifications of Boolean inferential methods and application of those methods to determining relationships between expressed genes in organisms. Additional new methods include modifications of Bayesian inferential methods and application of those methods to determining cause and effect relationships between expressed genes, and in some embodiments, for determining upstream effectors of regulated genes. Additional modifications of Bayesian methods include use of heterogeneous variance and different curve fitting methods, including spline functions, to improve estimation of graphs of networks of expressed genes. Other embodiments include the use of bootstrapping methods and determination of edge effects to more accurately provide network information between expressed genes. Methods of this invention were validated using information obtained from prior studies, as well as from newly carried out studies of gene expression.

35 citations

Proceedings Article•DOI•

Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network

Seiya Imoto¹, Kim Sunyong¹, Takahiro Goto¹, Sachiyo Aburatani², Kosuke Tashiro², Satoru Kuhara², Satoru Miyano¹•Institutions (2)

University of Tokyo¹, Kyushu University²

14 Aug 2002

TL;DR: A new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network is proposed and a new graph selection criterion from Bayesian approach in general situations is theoretically derived.

Abstract: We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is in the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. A problem still remains to be solved in selecting an optimal graph, which gives the best representation of the system among genes. We theoretically derive a new graph selection criterion from Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.

32 citations

Journal Article•DOI•

Single-nucleotide polymorphisms in the class II region of the major histocompatibility complex in Japanese patients with immunoglobulin A nephropathy

Fumihiro Akiyama¹, Toshihiro Tanaka, Ryo Yamada, Yozo Ohnishi, Tatsuhiko Tsunoda, Shiro Maeda, Takashi Takei, Wataru Obara¹, Kyoko Ito, Kazuho Honda, Keiko Uchida, Ken Tsuchiya, Kosaku Nitta, Wako Yumura, Hiroshi Nihei, Takashi Ujiie, Yutaka Nagane, Satoru Miyano¹, Yasushi Suzuki², Tomoaki Fujioka², Ichiei Narita³, Fumitake Gejyo³, Yusuke Nakamura¹•Institutions (3)

University of Tokyo¹, Iwate Medical University², Niigata University³

01 Oct 2002-Journal of Human Genetics

TL;DR: The data imply that some haplotype of the HLA-DRA locus has an important role in the development of IgAN in Japanese patients, and this relationship between HLA class II genes and IgAN is investigated.

Abstract: Immunoglobulin A nephropathy (IgAN) is a form of chronic glomerulonephritis of unknown etiology and pathogenesis. Immunogenetic studies have not conclusively indicated that human leukocyte antigen (HLA) is involved. As a first step in investigating a possible relationship between HLA class II genes and IgAN, we analyzed the extent of linkage disequilibrium (LD) in this region of chromosome 6p21.3 in a Japanese test population and found extended LD blocks within the class II locus. We designed a case-control association study of single-nucleotide polymorphisms (SNPs) in each of those LD blocks, and determined that SNPs located in the HLA-DRA gene were significantly associated with an increased risk of IgAN (P = 0.000001, odds ratio = 1.91 [95% confidence interval 1.46–2.49]); SNPs in other LD blocks were not. Our data imply that some haplotype of the HLA-DRA locus has an important role in the development of IgAN in Japanese patients.

31 citations

Journal Article•DOI•

Bootstrap Analysis of Gene Networks Based on Bayesian Networks and Nonparametric Regression

Seiya Imoto¹, Sun Yong Kim¹, Hidetoshi Shimodaira², Sachiyo Aburatani³, Kousuke Tashiro³, Satoru Kuhara³, Satoru Miyano¹•Institutions (3)

University of Tokyo¹, Tokyo Institute of Technology², Kyushu University³

01 Jan 2002-Genome Informatics

TL;DR: The method for measuring the reliability of the estimated gene network by using the bootstrap method is proposed, which shows good results in both the accuracy and the efficiency of the estimation.

Abstract: The development of microarray technology provides us with a huge amount of gene expression profiles. The estimation of a gene network has received considerable attention in the field of bioinformatics, and several methodologies have been proposed, such as the Boolean network [1] and the Bayesian network [3, 4, 5]. In this paper, we propose a method for measuring the reliability of the estimated gene network by using the bootstrap method [2].

Journal Article•DOI•

A string pattern regression algorithm and its application to pattern discovery in long introns.

Hideo Bannai¹, Shunsuke Inenaga¹, Ayumi Shinohara¹, Masayuki Takeda¹, Satoru Miyano¹•Institutions (1)

University of Tokyo¹

01 Jan 2002-Genome Informatics

TL;DR: A new approach to pattern discovery called string pattern regression is presented, where a data set is given that consists of a string attribute and an objective numerical attribute, and an exact but efficient branch-and-bound algorithm is presented which is applicable to various pattern classes.

Abstract: We present a new approach to pattern discovery called string pattern regression, where we are given a data set that consists of a string attribute and an objective numerical attribute. The problem is to find the best string pattern that divides the data set in such a way that the distribution of the numerical attribute values of the set for which the pattern matches the string attribute is most distinct, with respect to some appropriate measure, from the distribution of the numerical attribute values of the set for which the pattern does not match the string attribute. By solving this problem, we are able to discover, at the same time, a subset of the data whose objective numerical attributes are significantly different from the rest of the data, as well as the splitting rule in the form of a string pattern that is conserved in the subset. Although the problem can be solved in linear time for the substring pattern class, the problem is NP-hard in the general case (i.e. more complex patterns), and we present an exact but efficient branch-and-bound algorithm which is applicable to various pattern classes. We apply our algorithm to intron sequences of human, mouse, fly, and zebrafish, and show the practicality of our approach and algorithm. We also discuss possible extensions of our algorithm, as well as promising applications, such as microarray gene expression data.
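
The problem statement can be made concrete with a naive brute-force search over the substring pattern class. This is only a sketch: it is neither the linear-time method nor the branch-and-bound algorithm mentioned in the abstract, and the scoring measure (between-group sum of squares of the numerical attribute) is an assumption standing in for "some appropriate measure". The function name, `max_len` cap, and the toy sequences are hypothetical.

```python
def best_substring_pattern(data, max_len=4):
    """data: list of (string, value) pairs. Returns (score, pattern).

    Every substring of length <= max_len is tried as a splitting rule;
    the score is the between-group sum of squares of the values.
    """
    candidates = {s[i:j]
                  for s, _ in data
                  for i in range(len(s))
                  for j in range(i + 1, min(i + max_len, len(s)) + 1)}
    best = (0.0, None)
    for pat in sorted(candidates):  # sorted for deterministic tie-breaking
        hit = [v for s, v in data if pat in s]
        miss = [v for s, v in data if pat not in s]
        if not hit or not miss:  # a pattern must actually split the data
            continue
        mean_hit = sum(hit) / len(hit)
        mean_miss = sum(miss) / len(miss)
        score = len(hit) * len(miss) / len(data) * (mean_hit - mean_miss) ** 2
        if score > best[0]:
            best = (score, pat)
    return best


# Toy data: sequences paired with a numerical attribute.
toy = [("ACGTGA", 5.0), ("TTGTGC", 5.2), ("GGTGAA", 4.8),
       ("ACCCAT", 1.0), ("TTACCA", 1.2), ("CATCAT", 0.9)]
score, pattern = best_substring_pattern(toy)
print(pattern, round(score, 2))
```

Several patterns can induce the same split of the data and therefore tie on score; the sketch simply keeps the first one in sorted order.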

Journal Article•DOI•

Dynamic Bayesian Network and Nonparametric Regression Model for Inferring Gene Networks

Sun Yong Kim¹, Seiya Imoto¹, Satoru Miyano¹•Institutions (1)

University of Tokyo¹

01 Jan 2002-Genome Informatics

TL;DR: A dynamic Bayesian network and nonparametric regression model for estimating a gene network with cyclic regulations from time series microarray data is proposed and a criterion for selecting a network from Bayes approach is derived.

Abstract: A Bayesian network is a powerful tool for modeling relations among a large number of random variables. Therefore the Bayesian network has received considerable attention in studies of gene network estimation using microarray gene expression data. Imoto et al. [1, 2] proposed a Bayesian network and nonparametric regression model for capturing nonlinear relations between genes from continuous gene expression data. However, a Bayesian network still has the problem that it cannot represent cyclic regulations, while real gene networks have cyclic regulations. As a solution to this problem, in this paper we propose a dynamic Bayesian network and nonparametric regression model for estimating a gene network with cyclic regulations from time series microarray data. We also derive a criterion for selecting a network from a Bayes approach. The effectiveness of our method is displayed through the analysis of Saccharomyces cerevisiae gene expression data.

Proceedings Article•DOI•

Intrasplicing--analysis of long intron sequences.

Sascha Ott¹, Yoshinori Tamada¹, Hideo Bannai, Kenta Nakai¹, Satoru Miyano¹•Institutions (1)

University of Tokyo¹

01 Dec 2002

TL;DR: A new computational method for the analysis of DNA sequences with respect to splicing is developed and several results are derived indicating that intrasplicing may be an appropriate model for the splicing of at least part of the long intron sequences.

Abstract: We propose a new model for the splicing of long introns, which we call intrasplicing. The basic idea of this model is that the splicing of long introns may be facilitated by the splicing of inner parts of the intron prior to the splicing of the long intron itself. Since long introns have up to about 100,000 bases, this model seems to be a likely explanation of their splicing. To investigate the possibility of this model, we develop a new computational method for the analysis of DNA sequences with respect to splicing. We analyze the genomic sequence of four species with our method and derive several results indicating that intrasplicing may be an appropriate model for the splicing of at least part of the long intron sequences.

Journal Article•

Inferring gene regulatory networks from time-ordered gene expression data using differential equations

Michiel J. L. de Hoon, Seiya Imoto, Satoru Miyano

01 Jan 2002-Lecture Notes in Computer Science

TL;DR: In this article, the authors propose to infer the degree of sparseness of the gene regulatory network from the data, determining which coefficients are nonzero by using Akaike's Information Criterion.

Journal Article•DOI•

A Visualization Tool for Gene Network Discovery-G.NET

Ken Aoshima, Masayuki Ikawa, Satoshi Tanaka, Koji Yanagisawa, Sun Yong Kim¹, Naoki Nariai¹, Seiya Imoto¹, Satoru Miyano¹•Institutions (1)

University of Tokyo¹

01 Jan 2002-Genome Informatics

TL;DR: This work has developed a computer software, named G.NET, for visualizing and analyzing the large-scale gene network, and developed the gene network layout algorithms, named GNL algorithm, in order to display the big gene network in 2 and 3 dimensional spaces effectively.

Abstract: In recent years, the analysis of gene networks has attracted considerable attention in molecular biology and bioinformatics as a way to understand the gene regulation mechanism as a whole. Various methodologies [1, 3, 4] have been developed for inferring a gene network from cDNA microarray gene expression data. However, after constructing a gene network, there are still some problems to be solved in how to extract valuable information from such a large-scale network, for example, finding the complex interactions among genes and evaluating the estimated gene pathways. To address these problems, we have developed software, named G.NET, for visualizing and analyzing large-scale gene networks. We have also developed gene network layout algorithms, named the GNL algorithm, in order to display a large-scale gene network effectively in two- and three-dimensional spaces.

Journal Article•DOI•

XML Pathway File Conversion between Genomic Object Net and SBML

Masafumi Nakano¹, Hironori Kitakaze¹, Hiroshi Matsuno², Satoru Miyano³•Institutions (3)

Oshima National College of Maritime Technology¹, Yamaguchi University², University of Tokyo³

01 Jan 2002-Genome Informatics

TL;DR: This paper compares the XML description of GON with the XML description of SBML, examines whether files can be converted from GON Assembler to SBML or vice versa, and investigates automatic conversion between GON and SBML.

Abstract: Recently, the importance of biosimulation software in systems biology has been emphasized and has received considerable attention. Since the information conventionally created with most biosimulation software does not have a common format for modeling, it has been very difficult to exchange pathway models among them. At present, the Systems Biology group of the ERATO Kitano Symbiosis System Project takes the lead, and has proposed the standard language Systems Biology Markup Language (SBML) [1], which can give a common infrastructure for several biosimulation tools such as Bio/SPISE, E-Cell, DBSolve, Gepasi, Stochsim, and Virtual Cell. On the other hand, Genomic Object Net (GON) [2, 3] is a biosimulation system which uses the hybrid functional Petri net (HFPN) and extensible markup language (XML) as basic mechanisms. With GON, even researchers who are not familiar with mathematical modeling techniques such as differential equations and programming can easily perform modeling and simulation of a biological phenomenon. This paper compares the XML description of GON with the XML description of SBML, examines whether files can be converted from GON Assembler to SBML or vice versa, and investigates automatic conversion between GON and SBML.

Proceedings Article•

Fast Algorithm for Extracting Multiple Unordered Short Motifs Using Bit Operations

Osamu Maruyama¹, Hideo Bannai², Yoshinori Tamada³, Satoru Kuhara¹, Satoru Miyano²•Institutions (3)

Kyushu University¹, University of Tokyo², Tokai University³

01 Dec 2002

TL;DR: In this paper, the problem of extracting multiple unordered short motifs in upstream regions of given genes is considered, and a fast method is developed to exhaustively search collections of short motifs over given short motifs for a particular set of genes and to rank the collections using multiple objective functions.

Abstract: In this paper, we consider the problem of extracting multiple unordered short motifs in upstream regions of given genes. Multiple unordered short motifs can be considered as a set of short motifs, say M = {m1, m2, ..., mk}. For a gene g, if each of the motifs m1, ..., mk occurs in either the upstream region of g or its complement, the gene g is said to be consistent with M. We have developed a fast method to exhaustively search collections of short motifs over given short motifs for a particular set of genes, and to rank the collections using multiple objective functions. This method is implemented by employing bit operations in the process of matching short motifs with upstream regions, and in identifying the genes which are consistent with short motifs. On various putatively co-regulated genes of Saccharomyces cerevisiae, determined by gene expression profiles, our computational experiments show biologically interesting results.
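
One way to picture the role of bit operations here is to store, for each motif, an integer whose bit g is set when the motif occurs for gene g; the genes consistent with a motif set M are then the bitwise AND of the members' masks. The sketch below is an illustration under assumptions, not the paper's implementation: "complement" is interpreted as the reverse-complement strand, and all function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (assumed meaning of 'complement')."""
    return seq.translate(COMPLEMENT)[::-1]


def occurrence_masks(upstream, motifs):
    """Bit g of masks[m] is set iff motif m occurs for gene g."""
    masks = {}
    for m in motifs:
        bits = 0
        for g, seq in enumerate(upstream):
            if m in seq or m in revcomp(seq):
                bits |= 1 << g
        masks[m] = bits
    return masks


def consistent_genes(masks, motif_set, n_genes):
    """Genes consistent with every motif in motif_set, via bitwise AND."""
    bits = (1 << n_genes) - 1  # start with all genes set
    for m in motif_set:
        bits &= masks[m]
    return [g for g in range(n_genes) if (bits >> g) & 1]


# Toy upstream regions for three genes.
upstream = ["ACGTTTGA", "TTTTACGA", "GGGGCCCC"]
masks = occurrence_masks(upstream, ["ACG", "TGA", "CCC"])
print(consistent_genes(masks, {"ACG", "TGA"}, len(upstream)))
```

Once the masks are built, testing any collection of motifs costs one AND per motif regardless of sequence length, which is what makes an exhaustive search over collections affordable.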

Journal Article•DOI•

Fast algorithm for extracting multiple unordered short motifs using bit operations

Osamu Maruyama¹, Hideo Bannai², Yoshinori Tamada³, Satoru Kuhara¹, Satoru Miyano²•Institutions (3)

Kyushu University¹, University of Tokyo², Tokai University³

01 Oct 2002-Information Sciences

TL;DR: A fast method is developed to exhaustively search collections of short motifs over given short motifs for a particular set of genes and to rank the collections using multiple objective functions.

On the Complexity of Deriving Position Specific Score Matrices

Tatsuya Akutsu, Hideo Bannai, Satoru Miyano, Sascha Ott

24 Jan 2002

Book Chapter•DOI•

On the Complexity of Deriving Position Specific Score Matrices from Examples

Tatsuya Akutsu¹, Hideo Bannai², Satoru Miyano¹, Satoru Miyano², Sascha Ott²•Institutions (2)

Kyoto University¹, University of Tokyo²

03 Jul 2002

TL;DR: In this paper, the authors studied the problem of finding a position-specific score matrix (PSSM) which correctly discriminates between positive and negative examples, and proved that this problem is solved in polynomial time if the size of a PSSM is bounded by a constant.

Abstract: PSSMs (Position-Specific Score Matrices) have been applied to various problems in Bioinformatics. We study the following problem: given positive examples (sequences) and negative examples (sequences), find a PSSM which correctly discriminates between positive and negative examples. We prove that this problem is solved in polynomial time if the size of a PSSM is bounded by a constant. On the other hand, we prove that this problem is NP-hard if the size is not bounded. We also prove similar results on deriving a mixture of PSSMs.
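
To make the decision problem concrete, the sketch below checks whether a given PSSM and threshold discriminate a set of positive sequences from a set of negative ones. The acceptance rule used here (a sequence is accepted if its best-scoring window reaches the threshold) is an assumption for illustration and may differ from the formal definition in the chapter; the function names and example matrix are hypothetical. It verifies a candidate and does not search for one.

```python
def pssm_score(pssm, window):
    """Sum of per-position scores for a window as long as the PSSM."""
    return sum(column[ch] for column, ch in zip(pssm, window))


def best_window_score(pssm, seq):
    """Highest score over all windows; assumes len(seq) >= len(pssm)."""
    width = len(pssm)
    return max(pssm_score(pssm, seq[i:i + width])
               for i in range(len(seq) - width + 1))


def discriminates(pssm, threshold, positives, negatives):
    """True iff every positive reaches the threshold and no negative does."""
    return (all(best_window_score(pssm, s) >= threshold for s in positives)
            and all(best_window_score(pssm, s) < threshold for s in negatives))


# A width-2 PSSM that favours the dinucleotide "AG".
pssm = [{"A": 2, "C": 0, "G": 0, "T": -1},
        {"A": 0, "C": 0, "G": 2, "T": -1}]
positives = ["TAGT", "CCAG"]
negatives = ["TTTT", "CCCC"]
print(discriminates(pssm, 3, positives, negatives))
```

Checking a candidate is cheap; the hardness result in the abstract concerns finding such a matrix when its size is not bounded.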

Book Chapter•DOI•

Toward Drawing an Atlas of Hypothesis Classes: Approximating a Hypothesis via Another Hypothesis Model

Osamu Maruyama¹, Takayoshi Shoudai¹, Satoru Miyano², Satoru Miyano³•Institutions (3)

Kyushu University¹, Kyoto University², University of Tokyo³

24 Nov 2002

TL;DR: This paper defines a measure of approximation of a hypothesis class C1 to another class C2 and discusses lower bounds of the approximation ratios among representative classes of hypotheses like decision lists, decision trees, linear discriminant functions and so on.

Abstract: Computational knowledge discovery can be considered to be a complicated human activity concerned with searching for something new from data with computer systems. The optimization of the entire process of computational knowledge discovery is a big challenge in computer science. If we had an atlas of hypothesis classes which describes prior and basic knowledge on the relative relationships between the hypothesis classes, it would be helpful in selecting hypothesis classes to be searched in discovery processes. In this paper, to give a foundation for an atlas of various classes of hypotheses, we have defined a measure of approximation of a hypothesis class C1 to another class C2. The hypotheses we consider here are restricted to m-ary Boolean functions. For 0 ≤ ε ≤ 1, we say that C1 is (1-ε)-approximated to C2 if, for every distribution D over {0, 1}^m, and for each hypothesis h1 ∈ C1, there exists a hypothesis h2 ∈ C2 such that, with probability at most ε, we have h1(x) ≠ h2(x), where x ∈ {0, 1}^m is drawn randomly and independently according to D. Thus, we can use the approximation ratio of C1 to C2 as an index of how similar C1 is to C2. We discuss lower bounds of the approximation ratios among representative classes of hypotheses like decision lists, decision trees, linear discriminant functions and so on. This prior knowledge would come in useful when selecting hypothesis classes in the initial stage and the sequential stages involved in the entire discovery process.
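
The quantity inside this definition, the probability that two Boolean hypotheses disagree on an input drawn from D, is easy to compute for small m. The sketch below evaluates it for one fixed distribution and a small hand-written target class; the definition itself quantifies over every distribution, which this illustration does not attempt. The function names and the example class are hypothetical.

```python
from itertools import product


def disagreement(h1, h2, dist):
    """Probability under dist that h1(x) != h2(x)."""
    return sum(p for x, p in dist.items() if h1(x) != h2(x))


def best_approximation_error(h1, target_class, dist):
    """Smallest disagreement achievable by any hypothesis in target_class."""
    return min(disagreement(h1, h2, dist) for h2 in target_class)


# Uniform distribution over {0, 1}^2.
m = 2
uniform = {x: 1 / 2 ** m for x in product((0, 1), repeat=m)}


def xor(x):
    return x[0] ^ x[1]


# A small illustrative target class: literals, AND, OR and the constants.
target_class = [
    lambda x: x[0],
    lambda x: x[1],
    lambda x: x[0] & x[1],
    lambda x: x[0] | x[1],
    lambda x: 0,
    lambda x: 1,
]

print(best_approximation_error(xor, target_class, uniform))
```

Under the uniform distribution, XOR and OR differ only on the input (1, 1), so OR is the closest member of this toy class.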

Journal Article•DOI•

Experimental Observations and Simulations by Genomic Object Net of Notch Signaling in Drosophila Multicellular Systems

[...]

Hiroshi Matsuno¹, Ryutaro Murakami¹, Rie Yamane¹, Naoyuki Yamasaki¹, Sachie Fujita¹, Haruka Yoshimori¹, Satoru Miyano²•Institutions (2)

Yamaguchi University¹, University of Tokyo²

01 Jan 2002-Genome Informatics

TL;DR: GON proved to be a useful and flexible biosimulation system that is suitable for analyzing complex biological phenomena such as the patterning of multicellular systems as well as intracellular changes in cell states, including metabolic activities, gene regulation, and enzyme reactions.

Abstract: The Delta-Notch signaling system plays an essential role in various morphogenetic systems of multicellular animal development [1]. Here we analyzed the mechanism of Notch-dependent boundary formation in the Drosophila large intestine by experimental manipulation of Delta expression and by computational modeling and simulation with Genomic Object Net (GON) [3]. GON proved to be a useful and flexible biosimulation system that is suitable for analyzing complex biological phenomena such as the patterning of multicellular systems as well as intracellular changes in cell states, including metabolic activities, gene regulation, and enzyme reactions.

Journal Article•

Toward drawing an atlas of hypothesis classes: Approximating a hypothesis via another hypothesis model

[...]

Osamu Maruyama¹, Takayoshi Shoudai¹, Satoru Miyano², Satoru Miyano³•Institutions (3)

Kyushu University¹, Kyoto University², University of Tokyo³

01 Jan 2002-Lecture Notes in Computer Science

TL;DR: In this article, a measure of approximation of a hypothesis class $C_1$ to another class $C_2$ is defined, where $C_1$ is $(1-\varepsilon)$-approximated to $C_2$ if, for every distribution $D$ over $\{0,1\}^m$ and for each hypothesis $h_1 \in C_1$, there exists a hypothesis $h_2 \in C_2$ such that, with probability at most $\varepsilon$, we have $h_1(x) \ne h_2(x)$, where $x \in \{0,1\}^m$ is drawn randomly and independently according to $D$.

Abstract: Computational knowledge discovery can be considered to be a complicated human activity concerned with searching for something new from data with computer systems. The optimization of the entire process of computational knowledge discovery is a big challenge in computer science. If we had an atlas of hypothesis classes which describes prior and basic knowledge on the relative relationships between the hypothesis classes, it would be helpful in selecting hypothesis classes to be searched in discovery processes. In this paper, to give a foundation for an atlas of various classes of hypotheses, we have defined a measure of approximation of a hypothesis class $C_1$ to another class $C_2$. The hypotheses we consider here are restricted to $m$-ary Boolean functions. For $0 < \varepsilon < 1$, we say that $C_1$ is $(1-\varepsilon)$-approximated to $C_2$ if, for every distribution $D$ over $\{0,1\}^m$ and for each hypothesis $h_1 \in C_1$, there exists a hypothesis $h_2 \in C_2$ such that, with probability at most $\varepsilon$, we have $h_1(x) \ne h_2(x)$, where $x \in \{0,1\}^m$ is drawn randomly and independently according to $D$. Thus, we can use the approximation ratio of $C_1$ to $C_2$ as an index of how similar $C_1$ is to $C_2$. We discuss lower bounds of the approximation ratios among representative classes of hypotheses like decision lists, decision trees, linear discriminant functions and so on. This prior knowledge would come in useful when selecting hypothesis classes in the initial stage and the sequential stages involved in the entire discovery process.
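
The disagreement probability at the heart of this definition can be illustrated with a small sketch. This is not code from the paper: the function names `disagreement` and `best_approximation`, the tiny candidate class, and the uniform distribution are all assumptions chosen for illustration. The sketch evaluates a single distribution only, whereas the definition quantifies over every distribution D, so it shows how one term of the condition is computed rather than checking the full property.

```python
from fractions import Fraction
from itertools import product


def disagreement(h1, h2, dist):
    """Pr_{x ~ D}[h1(x) != h2(x)] for a finite distribution given as {x: probability}."""
    return sum(p for x, p in dist.items() if h1(x) != h2(x))


def best_approximation(h1, candidates, dist):
    """Return (name, error) of the candidate hypothesis that disagrees least with h1 under dist."""
    name = min(candidates, key=lambda n: disagreement(h1, candidates[n], dist))
    return name, disagreement(h1, candidates[name], dist)


# Hypothetical setting: m = 2 and the uniform distribution over {0,1}^2.
m = 2
points = list(product((0, 1), repeat=m))
uniform = {x: Fraction(1, len(points)) for x in points}

# h1 is the conjunction x0 AND x1; the candidate class holds constants and single variables.
h_and = lambda x: x[0] & x[1]
c2 = {
    "const0": lambda x: 0,
    "const1": lambda x: 1,
    "x0": lambda x: x[0],
    "x1": lambda x: x[1],
}

best_name, best_err = best_approximation(h_and, c2, uniform)
print(best_name, best_err)
```

Under the uniform distribution, the conjunction differs from the constant-0 hypothesis only on the input (1, 1), so the smallest disagreement in this toy class is 1/4. A different distribution, for example one concentrated on (1, 0) and (1, 1), would give a different value, which is why the definition requires the bound to hold for every D.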

Journal Article•DOI•

Development of Genomic Object Net Builder for Supporting XML Design for Visualization

[...]

Toshinori Tanaka¹, Hironori Kitakaze¹, Hiroshi Matsuno², Satoru Miyano³•Institutions (3)

Oshima National College of Maritime Technology¹, Yamaguchi University², University of Tokyo³

01 Jan 2002-Genome Informatics

TL;DR: In GON, a biopathway is modeled as an extension of the hybrid Petri net called the hybrid functional Petri net (HFPN), for which XML documentation is defined, and the user has to write XML documents for personalized visualization.

Abstract: Genomic Object Net (GON) is a biosimulation tool that allows us to model various kinds of biopathways, including gene regulatory networks, metabolic pathways, and signal transduction pathways, in a biologically intuitive way [4]. In GON, a biopathway is modeled as an extension of the hybrid Petri net called the hybrid functional Petri net (HFPN), for which XML documentation is defined. As a graphical editor for biopathways, GON is equipped with a tool, GON Assembler, for drawing and simulating the biopathway. Of course, the user need not touch any XML definitions. Furthermore, neither ordinary differential equations with messy coefficients nor programming labor is explicitly required for the user to model biopathways in GON. Because GON is designed to be so powerful and flexible, E-CELL can also be realized as a subsystem of GON [1]. GON also has a visualization tool for simulation called GON Visualizer. By writing an XML document for GON Visualizer, the user can animate and visually inspect the interactions in a biopathway, and can thereby create and test hypotheses for biological phenomena [2, 3]. However, the user has to write XML documents for personalized visualization, although for biologists this is not as disastrous an obstacle as ODE design and C++ programming.

Journal Article•DOI•

Genomic Object Net in JAVA: A Platform for Biopathway Modeling and Simulation

[...]

Masao Nagasaki¹, Atsushi Doi², Makiko Sasaki¹, Christopher J. Savoie², Hiroshi Matsuno, Satoru Miyano¹•Institutions (2)

University of Tokyo¹, Yamaguchi University²

01 Jan 2002-Genome Informatics

TL;DR: A new version of Genomic Object Net in JAVA (JAVA GON for short) is developed, which inherits the basic ideas and concepts of Genomic Object Net while enhancing the ability to handle not only biopathways but also localization information and multicellular processes.

Abstract: In the post-genome era, biopathway information processing will be one of the most important issues in Bioinformatics. Development of Genomic Object Net [6] is our approach to this issue. This software aims at describing and simulating structurally complex dynamic causal interactions and processes such as metabolic pathways, signal transduction cascades, and gene regulation. We released Genomic Object Net (ver. 0.919) in 2001. With this system, we have shown that we can reorganize and represent various biopathway information so that biopathways can be modeled and simulated for new hypothesis generation and testing (see [3, 4, 5, 6]). Although we have succeeded in modeling and simulating various biopathways without much effort, we have identified further inconveniences through our biopathway modeling activities. This motivated us to develop a new version, Genomic Object Net in JAVA (JAVA GON for short), from scratch. It inherits the basic ideas and concepts of Genomic Object Net (ver. 0.919) while enhancing the ability to handle not only biopathways but also localization information and multicellular processes.

Journal Article•DOI•

Analysis of Aberrant Splicing Data

[...]

Yoshinori Tamada¹, Sascha Ott², Hideo Bannai², Sun Yong Kim², Kenta Nakai², Satoru Miyano²•Institutions (2)

Tokai University¹, University of Tokyo²

01 Jan 2002-Genome Informatics

TL;DR: This work analyzed a dataset of several hundred cases in which aberrant splicing is caused by mutations and made use of the human genomic sequence and studied the sequences in the regions of these mutations.

Abstract: Splicing is a process that removes introns from the pre-mRNA transcript of genes and thereby connects the exons to form the mature mRNA. It takes place in the cell nucleus and is known to play a major role in the expression of genetic information in eukaryotes. Alternative splicing, splicing enhancers, and splicing inhibitors are fields of active research [1, 2, 3]. In this work, we focus on aberrant splicing. Aberrant splicing refers to abnormal variations in the splicing process that can cause diseases. We analyze a dataset of several hundred cases in which aberrant splicing is caused by mutations. The mutations are substitutions, insertions, deletions and duplications. In about 95% of these cases, the splicing was affected and changed in one of the following ways: at least one exon was skipped, an intron was retained, or the length of one exon was changed. In order to understand the differential effect of mutations on splicing, we made use of the human genomic sequence and studied the sequences in the regions of these mutations.

Journal Article•

On the complexity of deriving position specific score matrices from examples

[...]

Tatsuya Akutsu¹, Hideo Bannai², Satoru Miyano¹, Satoru Miyano², Sascha Ott²•Institutions (2)

Kyoto University¹, University of Tokyo²

01 Jan 2002-Lecture Notes in Computer Science

TL;DR: This work proves that the problem of finding a PSSM which correctly discriminates between positive and negative examples can be solved in polynomial time if the size of the PSSM is bounded by a constant, and that the problem is NP-hard if the size is not bounded.

Abstract: PSSMs (Position-Specific Score Matrices) have been applied to various problems in Bioinformatics. We study the following problem: given positive examples (sequences) and negative examples (sequences), find a PSSM which correctly discriminates between positive and negative examples. We prove that this problem can be solved in polynomial time if the size of a PSSM is bounded by a constant. On the other hand, we prove that this problem is NP-hard if the size is not bounded. We also prove similar results on deriving a mixture of PSSMs.
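
To make the discrimination condition concrete, the following sketch checks whether a given PSSM separates two example sets. It is an illustration under stated assumptions, not the paper's formulation or algorithm: it assumes a sequence is scored by its best-scoring window of the PSSM's length, that a sequence is accepted when this score reaches a threshold, and that every sequence is at least as long as the PSSM. The function names, the matrix, and the sequences are hypothetical. The sketch only verifies a candidate; finding one is the problem whose complexity the abstract discusses.

```python
def window_score(pssm, window):
    """Sum the per-position scores of one window; pssm is a list of {letter: score} columns."""
    return sum(column[ch] for column, ch in zip(pssm, window))


def best_score(pssm, seq):
    """Best window score over all windows of length len(pssm); assumes len(seq) >= len(pssm)."""
    width = len(pssm)
    return max(window_score(pssm, seq[i:i + width]) for i in range(len(seq) - width + 1))


def discriminates(pssm, threshold, positives, negatives):
    """True if every positive scores at least threshold and every negative scores below it."""
    return (all(best_score(pssm, s) >= threshold for s in positives)
            and all(best_score(pssm, s) < threshold for s in negatives))


# Hypothetical width-2 PSSM over the DNA alphabet that rewards the motif "AT".
pssm = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 2},
]
positives = ["GATC", "ATAT"]
negatives = ["GGCC", "ACGA"]

print(discriminates(pssm, 3, positives, negatives))
print(discriminates(pssm, 2, positives, negatives))
```

With threshold 3 both positives reach a score of 4 and both negatives stay at 2 or below, so the examples are separated. Lowering the threshold to 2 admits the negative "ACGA", whose window "AC" scores 2.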

Book Chapter•DOI•

Foundations of Designing Computational Knowledge Discovery Processes

[...]

Yoshinori Tamada¹, Hideo Bannai², Osamu Maruyama³, Satoru Miyano²•Institutions (3)

Tokai University¹, University of Tokyo², Kyushu University³

01 Jan 2002

TL;DR: A new paradigm for computational knowledge discovery, called VOX (View Oriented eXploration), is proposed; the aim is to construct a principle of computational knowledge discovery that can be used for building actual applications or discovery systems and for accelerating the entire discovery process.

Abstract: We propose a new paradigm for computational knowledge discovery, called VOX (View Oriented eXploration). Recent research has revealed that actual discoveries cannot be achieved using only component technologies such as machine learning theory or data mining algorithms. Recognizing how the computer can assist the actual discovery tasks, we developed a solution to this problem. Our aim is to construct a principle of computational knowledge discovery, which will be used for building actual applications or discovery systems, and for accelerating the entire discovery process. VOX is a mathematical abstraction of knowledge discovery processes, and provides a unified description method for the discovery processes. We present the advantages obtained by using VOX. Through an actual computational experiment, we show the usefulness of this new paradigm. We also designed a programming language based on this concept. The language is called VML (View Modeling Language), which is defined as an extension of the functional language ML. Finally, we present the future plans and directions in this research.
