Author

Takayoshi Shoudai

Bio: Takayoshi Shoudai is an academic researcher from International University, Cambodia. The author has contributed to research in topics: K-ary tree & Tree (data structure). The author has an h-index of 14 and has co-authored 97 publications receiving 752 citations. Previous affiliations of Takayoshi Shoudai include Yamaguchi University & Kyushu University.


Papers
Book Chapter
06 May 2002
TL;DR: This work proposes a new method for discovering frequent tree structured patterns in semistructured Web documents by using a tag tree pattern as a hypothesis, and presents an algorithm for generating all maximally frequent tag tree patterns.
Abstract: Many Web documents such as HTML files and XML files have no rigid structure and are called semistructured data. In general, such semistructured Web documents are represented by rooted trees with ordered children. We propose a new method for discovering frequent tree structured patterns in semistructured Web documents by using a tag tree pattern as a hypothesis. A tag tree pattern is an edge labeled tree with ordered children which has structured variables. An edge label is a tag or a keyword in such Web documents, and a variable can be substituted by an arbitrary tree. So a tag tree pattern is suited for representing tree structured patterns in such Web documents. First we show that it is hard to compute the optimum frequent tag tree pattern. So we present an algorithm for generating all maximally frequent tag tree patterns and prove its correctness. Finally, we report some experimental results on our algorithm. Although this algorithm is not efficient, experiments show that we can extract characteristic tree structured patterns in those data.

59 citations

Book Chapter
16 Apr 2001
TL;DR: An algorithm for finding all maximally frequent tag tree patterns in semistructured data such as Web documents is presented and some experimental results on XML documents are reported by using the algorithm.
Abstract: Many documents such as Web documents or XML files have no rigid structure. Such semistructured documents have been rapidly increasing. We propose a new method for discovering frequent tree structured patterns in semistructured Web documents. We consider the data mining problem of finding all maximally frequent tag tree patterns in semistructured data such as Web documents. A tag tree pattern is an edge labeled tree which has hyperedges as variables. An edge label is a tag or a keyword in Web documents, and a variable can be substituted by any tree. So a tag tree pattern is suited for representing tree structured patterns in semistructured Web documents. We present an algorithm for finding all maximally frequent tag tree patterns. We also report some experimental results on XML documents obtained with our algorithm.

58 citations

Journal Article
24 Nov 2002
TL;DR: In this paper, a linear ordered term tree (LOMT) is proposed to represent structural features common to semistructured data, which is a rooted tree pattern consisting of ordered tree structures and internal structured variables with distinct variable labels.
Abstract: In the fields of data mining and knowledge discovery, many semistructured data such as HTML/XML files are represented by rooted trees t such that all children of each internal vertex of t are ordered and t has edge labels. In order to represent structural features common to such semistructured data, we propose a linear ordered term tree, which is a rooted tree pattern consisting of ordered tree structures and internal structured variables with distinct variable labels. For a set of edge labels Λ, let OTT_Λ be the set of all linear ordered term trees. For a linear ordered term tree t in OTT_Λ, the term tree language of t, denoted by L_Λ(t), is the set of all ordered trees obtained from t by substituting arbitrary ordered trees for all variables in t. Given a set of ordered trees S, the minimal language problem for OTTL_Λ = {L_Λ(t) | t ∈ OTT_Λ} is to find a linear ordered term tree t in OTT_Λ such that L_Λ(t) is minimal among all term tree languages which contain all ordered trees in S. We show that the class OTTL_Λ is polynomial time inductively inferable from positive data, by giving a polynomial time algorithm for solving the minimal language problem for OTTL_Λ.

30 citations

Book Chapter
08 Jul 2002
TL;DR: In this paper, polynomial time algorithms for the membership problem and the minimal language problem are proposed for two fundamental classes of term trees, and it is shown that the two classes of term trees are polynomial time inductively inferable from positive data.
Abstract: Tree structured data such as HTML/XML files are represented by rooted trees with ordered children and edge labels. As a representation of a tree structured pattern in such tree structured data, we propose an ordered tree pattern, called a term tree, which is a rooted tree pattern consisting of ordered children and internal structured variables. A term tree is a generalization of standard tree patterns representing first order terms in formal logic. For a set of edge labels Λ and a term tree t, the term tree language of t, denoted by L_Λ(t), is the set of all labeled trees which are obtained from a term tree t by substituting arbitrary labeled trees for all variables in t. In this paper, we propose polynomial time algorithms for the following two problems for two fundamental classes of term trees. The membership problem is, given a term tree t and a tree T, to decide whether or not L_Λ(t) includes T. The minimal language problem is, given a set of labeled trees S, to find a term tree t such that L_Λ(t) is minimal among all term tree languages which contain all trees in S. Then, by using these two algorithms, we show that the two classes of term trees are polynomial time inductively inferable from positive data.
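Illustration: the membership problem described in this abstract can be sketched on a toy model. The code below is not the authors' algorithm and does not follow the paper's definition of term trees; it assumes a deliberately simplified setting in which an ordered tree is a list of (edge label, subtree) pairs and a variable is a single edge that may be replaced by any one edge together with the subtree below it. The names `VAR` and `matches` are hypothetical.

```python
# Toy membership check for ordered, edge-labeled tree patterns.
# Assumption: a tree is a list of (edge_label, subtree) pairs in child order.
# Assumption: a variable is an edge whose label is VAR; it stands for exactly
# one edge plus an arbitrary subtree (a simplification of real term trees).

VAR = None  # hypothetical marker for a variable edge


def matches(pattern, tree):
    """Return True if `tree` can be obtained from `pattern` by replacing
    every variable edge with one arbitrary labeled edge and subtree."""
    # Children are ordered, so both nodes need the same number of edges.
    if len(pattern) != len(tree):
        return False
    for (p_label, p_sub), (t_label, t_sub) in zip(pattern, tree):
        if p_label is VAR:
            # A variable accepts any label and any subtree below it.
            continue
        if p_label != t_label or not matches(p_sub, t_sub):
            return False
    return True


# Pattern: html -> (head -> x, body -> y), where x and y are variables.
pattern = [("html", [("head", [(VAR, [])]), ("body", [(VAR, [])])])]

doc1 = [("html", [("head", [("title", [])]),
                  ("body", [("p", [("b", [])])])])]
doc2 = [("html", [("body", [("p", [])])])]

print(matches(pattern, doc1))  # True: both variables can be substituted
print(matches(pattern, doc2))  # False: the head edge is missing
```

In this toy model each node is visited at most once, so the check runs in time linear in the size of the pattern. The classes of term trees studied in the paper are more general, so this sketch only conveys the shape of the problem.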

28 citations


Cited by
01 Jan 2002

9,314 citations

Journal Article
TL;DR: This work has succeeded in finding rules whose prediction accuracies come close to that of TargetP, while still retaining a very simple and interpretable form.
Abstract: Motivation: The prediction of localization sites of various proteins is an important and challenging problem in the field of molecular biology. TargetP, by Emanuelsson et al. (J. Mol. Biol., 300, 1005‐1016, 2000) is a neural network based system which is currently the best predictor in the literature for N-terminal sorting signals. One drawback of neural networks, however, is that it is generally difficult to understand and interpret how and why they make such predictions. In this paper, we aim to generate simple and interpretable rules as predictors, and still achieve a practical prediction accuracy. We adopt an approach which consists of an extensive search for simple rules and various attributes which is partially guided by human intuition. Results: We have succeeded in finding rules whose prediction accuracies come close to that of TargetP, while still retaining a very simple and interpretable form. We also discuss and interpret the discovered rules. Availability: An (experimental) web service using rules obtained by our method is provided at http:

721 citations

Book
06 Apr 1995
TL;DR: In providing an up-to-date survey of parallel computing research from 1994, Topics in Parallel Computing will prove invaluable to researchers and professionals with an interest in the super computers of the future.
Abstract: This volume provides an ideal introduction to key topics in parallel computing. With its cogent overview of the essentials of the subject as well as lists of P-complete and open problems, extensive remarks corresponding to each problem, a thorough index, and extensive references, the book will prove invaluable to programmers stuck on problems that are particularly difficult to parallelize. In providing an up-to-date survey of parallel computing research from 1994, Topics in Parallel Computing will prove invaluable to researchers and professionals with an interest in the super computers of the future.

533 citations

Journal Article
TL;DR: A history of cellular automata from their beginnings with von Neumann to the present day is traced, mainly on topics closer to computer science and mathematics rather than physics, biology or other applications.
Abstract: Cellular automata are simple models of computation which exhibit fascinatingly complex behavior. They have captured the attention of several generations of researchers, leading to an extensive body of work. Here we trace a history of cellular automata from their beginnings with von Neumann to the present day. The emphasis is mainly on topics closer to computer science and mathematics rather than physics, biology or other applications. The work should be of interest to both new entrants into the field as well as researchers working on particular aspects of cellular automata.

353 citations

Journal Article
TL;DR: A review of a number of existing methods developed to solve the discovery of patterns in biosequences and how these relate to each other, focusing on the algorithms underlying the approaches.
Abstract: This paper surveys approaches to the discovery of patterns in biosequences and places these approaches within a formal framework that systematises the types of patterns and the discovery algorithms. Patterns with expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering the patterns that are the most frequently used in molecular bioinformatics. A formulation is given of the problem of the automatic discovery of such patterns from a set of sequences, and an analysis is presented of the ways in which an assessment can be made of the significance of the discovered patterns. It is shown that the problem is related to problems studied in the field of machine learning. The major part of this paper comprises a review of a number of existing methods developed to solve the problem and how these relate to each other, focusing on the algorithms underlying the approaches. A comparison is given of the algorithms, and examp...

351 citations