
Showing papers by "Hiroyuki Kitagawa published in 2005"


Proceedings ArticleDOI
27 Nov 2005
TL;DR: This paper presents a novel robust solution that detects high-dimensional outliers based on user examples and tolerates incorrect inputs; it studies the behavior of projections of a few such examples to discover further objects that stand out in the projections where many of the examples are outlying.
Abstract: Detecting outliers is an important problem. Most of its applications typically possess high dimensional datasets. In high dimensional space, the data becomes sparse, which implies that every object can be regarded as an outlier from the point of view of similarity. Furthermore, a fundamental issue is that the notion of which objects are outliers typically varies between users, problem domains, or even datasets. In this paper, we present a novel robust solution which detects high dimensional outliers based on user examples and tolerates incorrect inputs. It studies the behavior of projections of a few such examples to discover further objects that stand out in the projections where many of the examples are outlying. Our experiments on both real and synthetic datasets demonstrate the ability of the proposed method to detect outliers corresponding to the user examples.
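
No code accompanies the abstract, but the projection-search idea can be sketched roughly as follows. This is a minimal illustration rather than the authors' algorithm: it assumes axis-parallel two-dimensional projections and a simple z-score outlyingness measure, and the function names and data are hypothetical.

```python
# Hypothetical sketch: find the axis-parallel projection where the user's
# example objects are most outlying, then report the objects that stand out
# most in that projection. Not the paper's actual algorithm.
from itertools import combinations
import numpy as np

def outlyingness(X_proj):
    """Per-object z-score deviation from the mean within a projection."""
    mu, sigma = X_proj.mean(axis=0), X_proj.std(axis=0) + 1e-9
    return np.abs((X_proj - mu) / sigma).sum(axis=1)

def example_based_outliers(X, example_idx, dim=2, top_k=10):
    best_score, best_subspace = -np.inf, None
    for subspace in combinations(range(X.shape[1]), dim):
        scores = outlyingness(X[:, list(subspace)])
        ex_score = scores[example_idx].mean()   # how outlying the examples look here
        if ex_score > best_score:
            best_score, best_subspace = ex_score, subspace
    scores = outlyingness(X[:, list(best_subspace)])
    return best_subspace, np.argsort(-scores)[:top_k]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:5, [1, 4]] += 6                              # plant outliers in dimensions 1 and 4
subspace, outliers = example_based_outliers(X, example_idx=[0, 1], dim=2)
print(subspace, outliers)                       # expect subspace (1, 4), indices 0..4 on top
```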

52 citations


Journal ArticleDOI
TL;DR: A novel solution to detecting outliers in high dimensional datasets based on user examples is presented; it discovers the hidden view of the outliers and picks out further objects that stand out in the projection where the examples stand out most.
Abstract: Detecting outliers is an important problem in applications such as fraud detection, financial analysis, health monitoring and so on. It is typical of most such applications to possess high dimensional datasets. Many recent approaches detect outliers according to some reasonable, pre-defined concept of an outlier (e.g., distance-based, density-based, etc.). Most of these concepts are proximity-based, defining an outlier by its relationship to the rest of the data. However, in high dimensional space, the data becomes sparse, which implies that every object can be regarded as an outlier from the point of view of similarity. Furthermore, a fundamental issue is that the notion of which objects are outliers typically varies between users, problem domains, or even datasets. In this paper, we present a novel solution to this problem by detecting outliers based on user examples for high dimensional datasets. By studying the behavior of projections of a few such outlier examples in the dataset, the proposed method discovers the hidden view of the outliers and picks out further objects that stand out in the projection where the examples stand out most. Our experiments on both real and synthetic datasets demonstrate the ability of the proposed method to detect outliers that match users' intentions.

16 citations


Proceedings ArticleDOI
13 Mar 2005
TL;DR: A novel topic activation analysis scheme that incorporates both document arrival rate and relevance is proposed to address the first problem; an incremental scheme more appropriate for a document streaming environment is also presented.
Abstract: With the advance of network technology in recent years, the dissemination and exchange of massive numbers of documents have become commonplace. Accordingly, the importance of content analysis techniques is increasing. Topic analysis in large-scale document streams such as e-mails and news articles is an important research issue. This paper addresses techniques for "topic activation analysis" in document streams. For example, when news articles with a strong relationship to a given topic arrive frequently in a news stream, we can regard the activation level of the topic as high. In [1], Kleinberg proposed a method for analyzing document streams. Although the main objective of his method was to detect bursts of topics, it can also be used for topic activation analysis. His method, however, has a serious limitation in that it only looks at the arrival rate of documents and ignores the degree of relevance of each document. Another limitation is that his method is "batch-oriented." This paper first proposes a novel topic activation analysis scheme that incorporates both document arrival rate and relevance, addressing the first problem. It then presents an incremental scheme more appropriate for a document streaming environment. The proposed schemes are validated by experiments using real CNN news articles.
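
As a rough illustration of the first idea (the paper builds on Kleinberg's burst model, and its actual formulation may differ substantially), an activation level reflecting both arrival rate and relevance can be maintained incrementally with exponential decay; the class, the half-life parameter, and the relevance values below are assumptions.

```python
# Hedged illustration (not the paper's exact formulation): an exponentially
# decayed activation score combines how often documents arrive with how
# relevant each one is, and can be updated incrementally per document.
import math

class TopicActivation:
    def __init__(self, half_life=1800.0):        # seconds until activation halves
        self.decay = math.log(2) / half_life
        self.activation = 0.0
        self.last_time = None

    def update(self, arrival_time, relevance):
        """Decay the old activation, then add the new document's relevance."""
        if self.last_time is not None:
            self.activation *= math.exp(-self.decay * (arrival_time - self.last_time))
        self.activation += relevance              # e.g., cosine similarity to the topic
        self.last_time = arrival_time
        return self.activation

monitor = TopicActivation()
for t, rel in [(0, 0.9), (60, 0.7), (4000, 0.2)]:  # frequent, relevant docs raise the level
    print(t, round(monitor.update(t, rel), 3))
```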

13 citations


01 Jan 2005
TL;DR: A method to detect topics in text data using feature vectors obtained by Singular Value Decomposition, clustering, and Independent Component Analysis is examined.
Abstract: Topic detection is an important subject when voluminous text data is sent continuously to a user. We examine a method to detect topics in text data using feature vectors. Feature vectors represent the main distribution of data and they are obtained by various data analysis methods. This paper examines three methods: Singular Value Decomposition (SVD), clustering, and Independent Component Analysis (ICA). SVD and clustering are popular existing methods. Clustering, especially, is applied in many topic detection methods. ICA was recently developed in signal processing research. In applications related to text data, however, ICA has not been compared with SVD and clustering, nor has its relationship with them been explored. This paper reports comparative experiments for these three methods and then shows their properties as they apply to text data.
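
A minimal harness for this kind of comparison can be written with scikit-learn; the toy corpus, the number of topics, and all parameter settings below are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative comparison of SVD, clustering, and ICA for extracting topic
# feature vectors from a TF-IDF term-document matrix (toy data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA, TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["stocks fell sharply today", "the market rallied on strong earnings",
        "heavy rain flooded the city", "storm warnings issued for the coast"]
k = 2                                            # assumed number of topics

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
terms = np.array(vec.get_feature_names_out())

# 1) SVD: topic vectors are the leading right singular vectors.
svd_topics = TruncatedSVD(n_components=k, random_state=0).fit(X).components_
# 2) Clustering: topic vectors are cluster centroids.
km_topics = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_
# 3) ICA: topic vectors are statistically independent components.
ica_topics = FastICA(n_components=k, random_state=0).fit(X.toarray()).components_

for name, topics in [("SVD", svd_topics), ("KMeans", km_topics), ("ICA", ica_topics)]:
    for t in topics:                             # show the 3 strongest terms per vector
        print(name, terms[np.argsort(-np.abs(t))[:3]])
```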

9 citations


01 Jan 2005
TL;DR: This paper explains how the software tool that finds the destinations (new URLs) of Web pages after they are moved works internally, and shows some results of applying it to the problem of finding new locations of real Web pages.
Abstract: We are developing a software tool that finds the destinations (new URLs) of Web pages after the pages are moved. A key feature of the tool is that it tries to find “reliable Web links,” that is, links that are always kept up to date. We believe this is a new approach to finding new URLs for Web pages. This paper explains how the tool works internally and shows some results of its application to the problem of finding new locations of real Web pages.
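
The abstract does not spell out the internals; the sketch below is a hypothetical reconstruction of the “reliable Web links” idea, with invented data structures (outlink-liveness snapshots, anchor-text matching) that are assumptions rather than the tool's actual design.

```python
# Hypothetical sketch: prefer pages whose outlinks were consistently alive in
# past snapshots ("reliable links"), and use their current outlinks to guess
# the new URL of a moved page. All data structures here are invented.
def reliability(snapshots):
    """Fraction of past snapshots in which all of a page's outlinks were alive.
    Each snapshot is a list of (url, alive) pairs."""
    ok = sum(1 for links in snapshots if all(alive for _, alive in links))
    return ok / len(snapshots)

def find_new_url(old_url, old_anchor, crawl, min_reliability=0.8):
    """crawl maps page -> (snapshots, current_links); current links are
    (url, anchor_text) pairs."""
    votes = {}
    for page, (snapshots, current_links) in crawl.items():
        r = reliability(snapshots)
        if r < min_reliability:                  # skip poorly maintained pages
            continue
        for url, anchor in current_links:
            if url != old_url and anchor == old_anchor:
                votes[url] = votes.get(url, 0.0) + r
    return max(votes, key=votes.get) if votes else None

crawl = {"hub.example/links": (
            [[("a.example/x", True)], [("a.example/x", True)]],  # past snapshots
            [("a.example/new-x", "X Project")])}                 # current outlinks
print(find_new_url("a.example/x", "X Project", crawl))           # a.example/new-x
```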

8 citations


Proceedings ArticleDOI
08 Apr 2005
TL;DR: Three methods to detect topics in text data using feature vectors are examined: singular value decomposition, clustering, and independent component analysis (ICA).
Abstract: Topic detection is an important subject when voluminous text data is sent continuously to a user. We examine a method to detect topics in text data using feature vectors. Feature vectors represent the main distribution of data and they are obtained by various data analysis methods. This paper examines three methods: singular value decomposition (SVD), clustering, and independent component analysis (ICA). SVD and clustering are popular existing methods. Clustering, especially, is applied in many topic detection methods. ICA was recently developed in signal processing research. In applications related to text data, however, ICA has not been compared with SVD and clustering, nor has its relationship with them been explored. This paper reports comparative experiments for these three methods and then shows their properties as they apply to text data.

8 citations


Proceedings ArticleDOI
05 Apr 2005
TL;DR: This paper extends the research group's multiple query optimization method for continuous queries so that the system automatically estimates the optimal clustering parameter value and iteratively adjusts it even if the properties of the underlying data streams change dramatically.
Abstract: Continuous queries are widely recognized as a scheme for processing queries over data streams, and efficient methods for processing multiple continuous queries are needed. Our research group has proposed a multiple query optimization method for continuous queries. In our method, the system forms clusters of queries with similar execution patterns and derives query plans that share the results of common operators. Our previous experiments have shown that a parameter value in the clustering phase controls the division of clusters and has a great impact on query processing efficiency. However, the optimal parameter value must be decided by trial and error. This paper extends our previous work. The proposed method automatically estimates the optimal value and iteratively adjusts it even if the properties of the underlying data streams change dramatically.
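
As a hedged illustration of the adjustment loop (the authors' estimator is not reproduced here), a simple hill-climbing tuner over a measured-cost function could look like the following; the cost surface and step schedule are invented.

```python
# Hedged sketch (not the authors' estimator): tune the clustering parameter
# by hill climbing on the measured per-window processing cost, re-probing
# periodically so the value tracks changes in the stream.
def tune_parameter(theta, step, measure_cost, rounds=20):
    """measure_cost(theta) -> observed processing cost under that clustering."""
    cost = measure_cost(theta)
    for _ in range(rounds):
        trial = theta + step
        trial_cost = measure_cost(trial)
        if trial_cost < cost:            # keep moving in the improving direction
            theta, cost = trial, trial_cost
        else:
            step = -step / 2             # overshoot: reverse and shrink the step
    return theta

# Toy convex cost surface with its optimum at theta = 0.4 (illustrative only).
print(round(tune_parameter(0.1, 0.1, lambda t: (t - 0.4) ** 2), 3))
```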

5 citations


Journal ArticleDOI
TL;DR: This article proposes a method employing taxonomy-based search services, such as Web directories, to facilitate searches in any Web search interface that supports Boolean queries, and develops new fast classification learning algorithms for this purpose.
Abstract: Introducing context into a user query is an effective way to improve search effectiveness. In this article we propose a method employing taxonomy-based search services, such as Web directories, to facilitate searches in any Web search interface that supports Boolean queries. The proposed method enables one to convey the current search context on the taxonomy of a taxonomy-based search service to searches conducted with the Web search interfaces. The basic idea is to learn the search context in the form of a Boolean condition that is commonly accepted by many Web search interfaces, and to use that condition to modify the user query before forwarding it to the Web search interfaces. To guarantee that the modified query can always be processed by the Web search interfaces, and to make the method adaptive to different user requirements on search result effectiveness, we have developed new fast classification learning algorithms.
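
A toy sketch of the underlying idea, under the assumption that the learned context is a disjunction of discriminative terms ANDed onto the query; the scoring rule and the corpora below are illustrative, not the article's fast learning algorithms.

```python
# Toy sketch: learn a disjunction of terms that separates the current
# category's documents from the rest, then AND it onto the user query.
# The scoring rule and corpora are invented for illustration.
from collections import Counter

def learn_boolean_context(category_docs, other_docs, max_terms=3):
    pos = Counter(w for d in category_docs for w in set(d.lower().split()))
    neg = Counter(w for d in other_docs for w in set(d.lower().split()))
    def score(w):                                # document-frequency contrast
        return pos[w] / len(category_docs) - neg.get(w, 0) / max(len(other_docs), 1)
    terms = sorted(pos, key=score, reverse=True)[:max_terms]
    return "(" + " OR ".join(terms) + ")"

def contextualize(query, category_docs, other_docs):
    """Modify the query so any Boolean-capable search interface keeps the context."""
    return query + " AND " + learn_boolean_context(category_docs, other_docs)

sports = ["league match score", "team wins the match"]
other = ["stock market report today", "election results report"]
print(contextualize("jaguar", sports, other))    # e.g. jaguar AND (match OR ...)
```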

4 citations


Proceedings ArticleDOI
05 Apr 2005
TL;DR: This paper explains how the software tool that finds the destinations (new URLs) of Web pages after they are moved works internally, and shows some results of applying it to the problem of finding new locations of real Web pages.
Abstract: We are developing a software tool that finds the destinations (new URLs) of Web pages after the pages are moved. A key feature of the tool is that it tries to find "reliable Web links," that is, links that are always kept up to date. We believe this is a new approach to finding new URLs for Web pages. This paper explains how the tool works internally and shows some results of its application to the problem of finding new locations of real Web pages.

4 citations


Book ChapterDOI
28 Mar 2005
TL;DR: This paper proposes a distributed algorithm to detect outliers in large and distributed datasets, employing the distance-based outlier definition based on the distance of a point to its kth nearest neighbor.
Abstract: This paper proposes a distributed algorithm to detect outliers in large and distributed datasets. The algorithm employs the distance-based definition of outliers, which is based on the distance of a point to its kth nearest neighbor, and declares the top n points in this ranking to be outliers. To the best of our knowledge, this is the first proposal of a distributed outlier detection algorithm for shared-nothing multiple-processor computing environments. The algorithm has four phases. First, in each processing node, it partitions the input dataset into disjoint subsets and prunes entire partitions as soon as it is determined that they cannot contain outliers. Second, it applies a global filtering technique to collect global candidate partitions from the local candidate partitions of each processing node. Third, it uses a load balancing algorithm to balance the number of local candidate partitions. Finally, it identifies the outliers from each processing node.
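
For concreteness, the outlier definition the algorithm ranks by can be stated in a plain single-node sketch; the paper's contribution (the distributed pruning, global filtering, and load balancing) is omitted here, and the brute-force distance computation is deliberately naive.

```python
# Plain single-node statement of the distance-based outlier definition:
# score each point by the distance to its kth nearest neighbor and report
# the top n scores. The distributed machinery of the paper is omitted.
import numpy as np

def top_n_outliers(X, k=3, n=5):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # all pairwise distances
    np.fill_diagonal(d, np.inf)
    knn_dist = np.sort(d, axis=1)[:, k - 1]      # distance to the kth nearest neighbor
    return np.argsort(-knn_dist)[:n]             # top n by that distance

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 3)), rng.normal(6.0, 1.0, size=(5, 3))])
print(top_n_outliers(X))                         # expect the planted points 200..204
```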

4 citations


Book ChapterDOI
22 Aug 2005
TL;DR: This paper proposes a method called LocalRank to rank web pages by integrating the web and a user database containing information on a specific geographical area, constructing a linked graph structure from the entries contained in the database.
Abstract: In this paper, we propose a method called LocalRank to rank web pages by integrating the web and a user database containing information on a specific geographical area. LocalRank is a rank value that assesses a web page's degree of relevance to the database entries, considering geographical locality, as well as its popularity in a local web space. In our method, we first construct a linked graph structure using the entries contained in the database. The nodes of this graph consist of database entries and their related web pages. The edges of the graph are composed of semantic links, including geographical links, between these nodes, in addition to conventional hyperlinks. A link analysis is then performed to compute a LocalRank value for each node. LocalRank can represent the user's interest since this graph effectively integrates the web and the user database. Our experimental results for a local restaurant database show that local web pages related to the database entries are ranked highly by our method.
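
A schematic version of the link-analysis step, assuming a PageRank-style iteration over the combined graph; the edge construction and any weighting in LocalRank are richer than this toy adjacency matrix suggests.

```python
# Schematic PageRank-style iteration over the combined graph; LocalRank's
# actual edge construction and weighting are richer than this toy example.
import numpy as np

def local_rank(adj, d=0.85, iters=50):
    """adj[i][j] = 1 if node i links to node j via a hyperlink or a semantic
    (e.g., geographical) edge; returns one rank value per node."""
    A = np.asarray(adj, dtype=float)
    out = A.sum(axis=1, keepdims=True)
    P = np.divide(A, out, out=np.zeros_like(A), where=out > 0)  # row-stochastic
    n = len(A)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        dangling = r[out.ravel() == 0].sum() / n  # spread mass of sink nodes
        r = (1 - d) / n + d * (P.T @ r + dangling)
    return r

# Toy graph: node 0 is a restaurant database entry, nodes 1-3 are web pages.
adj = [[0, 1, 1, 0],
       [0, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 0, 0, 0]]
print(np.round(local_rank(adj), 3))
```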


Proceedings ArticleDOI
08 Apr 2005
TL;DR: This paper proposes an approach to extract spatial information hubs from the Web: it extracts geographic information from Web pages to create spatial nodes and spatial links, and then conducts a link analysis based on the extended link structures.
Abstract: Recently, Web mining, which tries to find useful knowledge in the vast number of Web pages, has attracted a lot of research interest. In addition, it is becoming an essential task to provide Web pages related to a user-specified geographic area. In this paper, we propose an approach to extract spatial information hubs from the Web. A spatial information hub is a Web page which is related to a specified geographic area and has much local information and/or many hyperlinks to local Web pages. In the traditional approach to Web link analysis, the importance and quality of pages are judged only by their contents and hyperlink structures. In contrast, we take their geographic localities into consideration. In our approach, we first extract geographic information from Web pages to create spatial nodes and spatial links, and then conduct a link analysis based on the extended link structures. We also show that our approach works well based on experiments.
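
One plausible reading of the analysis, sketched under the assumption that HITS-style hub scores are computed over hyperlinks augmented with spatial links; the matrices below are toy data and the combination of the two edge sets is an assumption.

```python
# Hedged sketch: HITS-style hub scores over hyperlinks augmented with
# "spatial links" (edges assumed to connect pages about the same area).
import numpy as np

def spatial_hits(hyperlinks, spatial_links, iters=30):
    A = np.asarray(hyperlinks, float) + np.asarray(spatial_links, float)
    hubs = np.ones(len(A))
    for _ in range(iters):
        auth = A.T @ hubs                # good authorities are cited by good hubs
        auth /= np.linalg.norm(auth)
        hubs = A @ auth                  # good hubs point to good authorities
        hubs /= np.linalg.norm(hubs)
    return hubs, auth

hyper   = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]     # ordinary hyperlinks
spatial = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]     # pages tied to the same area
hubs, auth = spatial_hits(hyper, spatial)
print(np.round(hubs, 3), np.round(auth, 3))     # page 0 emerges as the hub
```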

Patent
22 Aug 2005
TL;DR: In this paper, the authors propose a link authority determination device capable of determining proper link authorities in as short a time and with as little labor as possible: Web pages satisfying predetermined conditions are selected as link authority candidates from among the pages collected by a Web page collecting means.
Abstract: PROBLEM TO BE SOLVED: To provide a link authority determination device capable of determining proper link authorities in as short a time and with as little labor as possible. SOLUTION: A link authority candidate determining means 5 selects a plurality of Web pages satisfying predetermined conditions as link authority candidates from among the Web pages collected by a Web page collecting means 3. A ranking means 17 of a link authority determining means 13 ranks the candidate Web pages in ascending order of their percentage of link shortage. A final determination means 19 selects one or more of the top-ranked Web pages as link authorities from the ranking produced by the ranking means 17. COPYRIGHT: (C)2007,JPO&INPIT
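
A rough Python rendering of the ranking step, under the assumption that "link shortage" means the fraction of a page's outlinks that are dead or outdated; the tuple format and the candidate counts are hypothetical.

```python
# Hypothetical rendering of the patent's ranking step: candidates with the
# smallest fraction of missing (dead or outdated) outlinks rank first.
def rank_link_authorities(candidates, top=3):
    """candidates: list of (url, dead_links, total_links) tuples."""
    def shortage(c):
        url, dead, total = c
        return dead / total if total else 1.0
    ranked = sorted(candidates, key=shortage)    # fewest missing links first
    return [url for url, _, _ in ranked[:top]]

pages = [("a.example", 1, 50), ("b.example", 10, 40), ("c.example", 0, 30)]
print(rank_link_authorities(pages, top=2))       # ['c.example', 'a.example']
```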