scispace - formally typeset
Author

Setsuo Arikawa

Bio: Setsuo Arikawa is an academic researcher from Kyushu University. The author has contributed to research on pattern matching and compressed pattern matching, has an h-index of 30, and has co-authored 110 publications that have received 2,972 citations. Previous affiliations of Setsuo Arikawa include the International Institute of Minnesota.


Papers
Book Chapter
01 Jul 2001
TL;DR: A linear-time algorithm for computing longest-common-prefix (LCP) information in suffix arrays is presented. It is shown to be crucial to the effective use of block-sorting compression, and, as a second application, a linear-time algorithm that simulates the bottom-up traversal of a suffix tree using a suffix array combined with the LCP information is presented.
Abstract: We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.
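As an illustration of the two ideas in this abstract, here is a minimal sketch of one standard linear-time LCP computation (it walks suffixes in text order and relies on the LCP value dropping by at most one between consecutive text positions), followed by a stack-based scan of the LCP array that reports lcp-intervals, which correspond to internal nodes of the suffix tree, children before parents. The function names, the naive suffix-array construction, and the `(depth, left, right)` interval encoding are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

def suffix_array(text: str) -> List[int]:
    # Naive construction (sorts whole suffixes); fine for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])

def lcp_array(text: str, sa: List[int]) -> List[int]:
    """lcp[r] = longest common prefix length of suffixes sa[r-1] and sa[r]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order, not sorted order
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix just before text[i:] in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the LCP can shrink by at most 1 when moving from i to i + 1
    return lcp

def lcp_intervals(lcp: List[int]) -> List[Tuple[int, int, int]]:
    """Simulate a bottom-up suffix-tree traversal: report (depth, left, right)
    lcp-intervals of the suffix array, each child before its parent."""
    n = len(lcp)
    stack = [(0, 0)]  # (depth, left boundary); the bottom entry is the root
    out = []
    for i in range(1, n):
        left = i - 1
        while lcp[i] < stack[-1][0]:  # close intervals deeper than lcp[i]
            depth, left = stack.pop()
            out.append((depth, left, i - 1))
        if lcp[i] > stack[-1][0]:  # open a new, deeper interval
            stack.append((lcp[i], left))
    while stack:  # close everything still open, root last
        depth, left = stack.pop()
        out.append((depth, left, n - 1))
    return out
```

For `"banana"`, the sorted suffixes are `a, ana, anana, banana, na, nana`, and the interval `(3, 1, 2)` groups `ana` and `anana` under their shared prefix `ana`.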

520 citations

Kenji Abe, Shinji Kawasoe, Tatsuya Asai, Hiroki Arimura, Setsuo Arikawa
01 Dec 2002
TL;DR: In this paper, the problem of discovering interesting substructures from a large collection of semi-structured data in the framework of optimized pattern discovery is considered, and an efficient algorithm that discovers the best labeled ordered trees that optimize a given statistical measure, such as the information entropy and the classification accuracy, is presented.
Abstract: In this paper, we consider the problem of discovering interesting substructures from a large collection of semi-structured data in the framework of optimized pattern discovery. We model semi-structured data and patterns with labeled ordered trees, and present an efficient algorithm that discovers the best labeled ordered trees that optimize a given statistical measure, such as the information entropy and the classification accuracy, in a collection of semi-structured data. We give theoretical analyses of the computational complexity of the algorithm for patterns with bounded and unbounded size. Experiments show that the algorithm performs well and discovers interesting patterns on real datasets.
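To make "a given statistical measure, such as the information entropy" concrete, the sketch below scores one candidate pattern by the entropy-based information gain it yields on a two-class collection. It assumes the tree-matching step has already produced a boolean `matches` vector (does the pattern occur in document i?); that vector, the binary classes, and the function names are hypothetical and only illustrate the kind of measure the search would optimize, not the paper's exact definition.

```python
from math import log2
from typing import List

def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a group with pos positive and neg negative documents."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * log2(p)
    return h

def information_gain(matches: List[bool], labels: List[bool]) -> float:
    """Entropy reduction obtained by splitting the documents on pattern occurrence.
    matches[i]: hypothetical result of testing the pattern tree against document i."""
    n = len(labels)
    pos = sum(labels)
    base = entropy(pos, n - pos)
    conditional = 0.0
    for side in (True, False):
        group = [y for m, y in zip(matches, labels) if m == side]
        if group:
            gp = sum(group)
            conditional += len(group) / n * entropy(gp, len(group) - gp)
    return base - conditional
```

A pattern that occurs in exactly the positive documents scores the full 1 bit, while one that occurs everywhere scores 0, since it does not separate the classes.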

152 citations

Book Chapter
Kenji Abe, Shinji Kawasoe, Tatsuya Asai, Hiroki Arimura, Setsuo Arikawa
19 Aug 2002
TL;DR: An efficient algorithm is presented that discovers the best labeled ordered trees that optimize a given statistical measure, such as the information entropy and the classification accuracy, in a collection of semi-structured data.
Abstract: In this paper, we consider the problem of discovering interesting substructures from a large collection of semi-structured data in the framework of optimized pattern discovery. We model semi-structured data and patterns with labeled ordered trees, and present an efficient algorithm that discovers the best labeled ordered trees that optimize a given statistical measure, such as the information entropy and the classification accuracy, in a collection of semi-structured data. We give theoretical analyses of the computational complexity of the algorithm for patterns with bounded and unbounded size. Experiments show that the algorithm performs well and discovers interesting patterns on real datasets.

116 citations

Journal Article
TL;DR: A general framework that captures the essence of compressed pattern matching for various dictionary-based compression methods is introduced; it covers the Lempel-Ziv family, RE-PAIR, SEQUITUR, and the static dictionary-based method.
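The core idea behind pattern matching on dictionary-compressed text can be sketched on a simple case. The example below assumes the text is given as a straight-line program, where each variable is either a single character or the concatenation of two earlier variables (pair rules of this kind are what RE-PAIR-style compressors produce, and longer grammar rules can be binarized into them). It counts pattern occurrences by keeping only a constant-size summary per variable instead of decompressing the text. This is a simplified stand-in chosen for illustration, not the framework from the paper; the rule encoding and names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Rule = Union[str, Tuple[int, int]]  # 'c' (a character) or (i, j): X_k = X_i X_j, i, j < k

@dataclass
class Summary:
    prefix: str  # first m-1 characters of the variable's expansion (all of it if shorter)
    suffix: str  # last m-1 characters of the expansion
    count: int   # pattern occurrences lying entirely inside the expansion

def _tail(s: str, k: int) -> str:
    return s[max(0, len(s) - k):]

def count_in_slp(rules: List[Rule], pattern: str) -> int:
    """Count occurrences of pattern in the expansion of the last variable."""
    m = len(pattern)
    k = m - 1
    table: List[Summary] = []
    for rule in rules:
        if isinstance(rule, str):
            table.append(Summary(rule[:k], _tail(rule, k), int(rule == pattern)))
            continue
        left, right = table[rule[0]], table[rule[1]]
        # Any occurrence spanning the concatenation point lies inside this window.
        window = left.suffix + right.prefix
        boundary = len(left.suffix)
        crossing = sum(1 for p in range(len(window) - m + 1)
                       if p < boundary and window[p:p + m] == pattern)
        table.append(Summary((left.prefix + right.prefix)[:k],
                             _tail(left.suffix + right.suffix, k),
                             left.count + right.count + crossing))
    return table[-1].count
```

For example, the rules `['a', 'b', (0, 1), (2, 2), (3, 2), (4, 0)]` expand to `abababa`, and `count_in_slp(rules, "aba")` counts its three overlapping occurrences.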

109 citations


Cited by
Proceedings Article
09 Dec 2002
TL;DR: gSpan (graph-based substructure pattern mining) is a novel algorithm that discovers frequent substructures without candidate generation; it builds a new lexicographic order among graphs and maps each graph to a unique minimum DFS code as its canonical label.
Abstract: We investigate new approaches for frequent graph-based pattern mining in graph datasets and propose a novel algorithm called gSpan (graph-based substructure pattern mining), which discovers frequent substructures without candidate generation. gSpan builds a new lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order gSpan adopts the depth-first search strategy to mine frequent connected subgraphs efficiently. Our performance study shows that gSpan substantially outperforms previous algorithms, sometimes by an order of magnitude.
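The canonical-labeling idea can be sketched by brute force for small graphs: enumerate every DFS traversal, write each one as a sequence of edge tuples `(i, j, label_i, label_ij, label_j)` with backward edges listed right after the vertex that closes them, and keep the minimum. As a simplification, this sketch compares codes with Python's plain tuple ordering rather than gSpan's own DFS lexicographic order; any fixed total order still yields a canonical label, because isomorphic graphs have the same set of DFS codes. It assumes a connected, undirected graph with comparable labels, at least one edge, and no self-loops, and it does none of gSpan's pruning or mining.

```python
from typing import Dict, FrozenSet, Hashable, Iterator, List, Tuple

Edge = Tuple[int, int, Hashable, Hashable, Hashable]  # (i, j, l_i, l_ij, l_j)

def dfs_codes(vlabels: Dict[int, Hashable],
              elabels: Dict[FrozenSet[int], Hashable]) -> Iterator[Tuple[Edge, ...]]:
    """Yield the DFS code of every DFS traversal of a connected labeled graph."""
    adj: Dict[int, List[int]] = {v: [] for v in vlabels}
    for e in elabels:
        a, b = tuple(e)
        adj[a].append(b)
        adj[b].append(a)

    def extend(idx: Dict[int, int], stack: List[int], code: List[Edge]):
        if len(idx) == len(vlabels):  # all vertices discovered => all edges emitted
            yield tuple(code)
            return
        while stack and all(w in idx for w in adj[stack[-1]]):
            stack = stack[:-1]  # backtrack: the top of the DFS stack is exhausted
        u = stack[-1]
        for w in adj[u]:
            if w in idx:
                continue
            new_idx = {**idx, w: len(idx)}
            j = new_idx[w]
            # Forward edge to the newly discovered vertex w.
            new_code = code + [(idx[u], j, vlabels[u], elabels[frozenset((u, w))], vlabels[w])]
            # Backward edges from w to earlier vertices (other than its parent), by index.
            for k, x in sorted((idx[x], x) for x in adj[w] if x in idx and x != u):
                new_code.append((j, k, vlabels[w], elabels[frozenset((w, x))], vlabels[x]))
            yield from extend(new_idx, stack + [w], new_code)

    for s in vlabels:
        yield from extend({s: 0}, [s], [])

def min_dfs_code(vlabels, elabels) -> Tuple[Edge, ...]:
    """Canonical label: the smallest DFS code under plain tuple ordering."""
    return min(dfs_codes(vlabels, elabels))
```

Two relabeled copies of the same labeled triangle receive identical minimum codes, so comparing canonical labels replaces an explicit isomorphism test. The enumeration is exponential, which is why gSpan instead grows codes incrementally and prunes non-minimal ones.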

2,282 citations

Book
02 Jan 1991

1,377 citations

Book
01 Dec 1988
TL;DR: The basic processes of atomization and the drop size distributions of sprays are discussed.
Abstract: Preface; 1. General Considerations; 2. Basic Processes in Atomization; 3. Drop Size Distributions of Sprays; 4. Atomizers; 5. Flow in Atomizers; 6. Atomizer Performance; 7. External Spray Characteristics; 8. Drop Evaporation; 9. Drop Sizing Methods; Index

1,214 citations

Journal Article
TL;DR: This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.
Abstract: Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.
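One of the recurring heuristics in decision tree construction is choosing each split by how much it reduces class impurity. A minimal sketch, assuming Gini impurity as the measure, a single numeric feature, and binary splits of the form `x <= t` (all illustrative choices; surveys of this area cover many alternative measures and split types):

```python
from collections import Counter
from typing import Hashable, List, Optional, Tuple

def gini(labels: List[Hashable]) -> float:
    """Gini impurity: probability that two random draws (with replacement) disagree."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(xs: List[float], ys: List[Hashable]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best split x <= t.
    The threshold is None if no split beats leaving the node unsplit."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best: Tuple[Optional[float], float] = (None, gini(list(ys)))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

This version recomputes impurities for every candidate threshold, so it is quadratic in the node size; practical implementations update class counts incrementally while sweeping the sorted values.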

1,044 citations

Proceedings Article
23 Feb 2013
TL;DR: This paper presents a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
Abstract: There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed memory machines. Today, however, a single multicore server can support more than a terabyte of memory, which can fit graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms, shared-memory multicores are generally significantly more efficient on a per core, per dollar, and per joule basis than distributed memory systems, and shared-memory algorithms tend to be simpler than their distributed counterparts. In this paper, we present a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write. The framework has two very simple routines, one for mapping over edges and one for mapping over vertices. Our routines can be applied to any subset of the vertices, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Based on recent ideas used in a very fast algorithm for breadth-first search (BFS), our routines automatically adapt to the density of vertex sets. We implement several algorithms in this framework, including BFS, graph radii estimation, graph connectivity, betweenness centrality, PageRank and single-source shortest paths. Our algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a 40-core machine and are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
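The two routines and the density-adaptive switch described in this abstract can be sketched sequentially. In the sketch below, `edge_map` applies `update(u, v)` over edges leaving a frontier and switches between a sparse "push" loop and a dense "pull" loop based on how many edges the frontier touches; `vertex_map` filters a vertex subset. The Python names, the symmetric (undirected) adjacency lists, and the switching rule (dense when frontier size plus its out-degrees exceeds |E|/20) are assumptions for illustration; the real framework runs these loops in parallel.

```python
from typing import Callable, List, Set

def vertex_map(subset: Set[int], f: Callable[[int], bool]) -> Set[int]:
    """Apply f to every vertex in subset; keep those for which f returns True."""
    return {v for v in subset if f(v)}

def edge_map(adj: List[List[int]], frontier: Set[int],
             update: Callable[[int, int], bool], cond: Callable[[int], bool],
             divisor: int = 20) -> Set[int]:
    """Apply update(u, v) over edges u -> v with u in frontier; return the new frontier.
    Assumes adj is symmetric, so a vertex's in-neighbors equal its out-neighbors."""
    num_edges = sum(len(ns) for ns in adj)
    work = len(frontier) + sum(len(adj[u]) for u in frontier)
    out: Set[int] = set()
    if work > num_edges // divisor:
        # Dense ("pull") mode: every vertex still satisfying cond scans its neighbors.
        for v in range(len(adj)):
            for u in adj[v]:
                if not cond(v):
                    break  # v needs no further updates; stop scanning early
                if u in frontier and update(u, v):
                    out.add(v)
    else:
        # Sparse ("push") mode: only frontier vertices scan their edges.
        for u in frontier:
            for v in adj[u]:
                if cond(v) and update(u, v):
                    out.add(v)
    return out

def bfs(adj: List[List[int]], root: int) -> List[int]:
    """Breadth-first search returning a parent array (-1 = unreachable)."""
    parent = [-1] * len(adj)
    parent[root] = root

    def update(u: int, v: int) -> bool:
        if parent[v] == -1:  # first visit claims v
            parent[v] = u
            return True
        return False

    def cond(v: int) -> bool:
        return parent[v] == -1

    frontier = {root}
    while frontier:
        frontier = edge_map(adj, frontier, update, cond)
    return parent
```

On a 4-cycle `[[1, 2], [0, 3], [0, 3], [1, 2]]`, `bfs(adj, 0)` assigns vertex 0 as the parent of vertices 1 and 2, and vertex 3 receives whichever of 1 or 2 reaches it first. The dense mode pays off when the frontier is large, because each unvisited vertex can stop scanning as soon as it finds one parent.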

816 citations