
Showing papers by "Hiroyuki Kitagawa published in 2010"


Book ChapterDOI
12 Dec 2010
TL;DR: TURank (Twitter User Rank), an algorithm for evaluating users' authority scores in Twitter based on link analysis, is proposed; experimental results show that it outperforms existing algorithms.
Abstract: In this paper, we address the problem of finding authoritative users in Twitter, one of the most popular micro-blogging services [1]. Twitter has been gaining public attention as a new type of information resource, because an enormous number of users transmit diverse information in real time. In particular, authoritative users who frequently submit useful information are considered to play an important role, because their useful information is disseminated quickly and widely. To identify authoritative users, it is important to consider the actual information flow in Twitter. However, existing approaches only deal with relationships among users. In this paper, we propose TURank (Twitter User Rank), an algorithm for evaluating users' authority scores in Twitter based on link analysis. In TURank, users and tweets are represented in a user-tweet graph that models information flow, and ObjectRank is applied to evaluate users' authority scores. Experimental results show that the proposed algorithm outperforms existing algorithms.
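
A rough illustration of the ObjectRank-style scoring described above: a damped authority-propagation iteration over a small user-tweet graph. The edge types, transfer weights, and damping factor are illustrative assumptions, not the paper's actual parameters.

```python
# Sketch only: authority propagation over a typed user-tweet graph.
DAMPING = 0.85
EDGE_WEIGHTS = {"posts": 0.3, "posted_by": 0.7, "follows": 0.5, "retweets": 0.8}

def object_rank(nodes, edges, iterations=50):
    """nodes: iterable of node ids; edges: (src, dst, edge_type) triples."""
    nodes = list(nodes)
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_total = {n: 0.0 for n in nodes}          # total outgoing weight per node
    for src, _, etype in edges:
        out_total[src] += EDGE_WEIGHTS[etype]
    for _ in range(iterations):
        nxt = {n: (1.0 - DAMPING) / len(nodes) for n in nodes}
        for src, dst, etype in edges:
            # Authority flows along typed edges, scaled per edge type.
            nxt[dst] += DAMPING * score[src] * EDGE_WEIGHTS[etype] / out_total[src]
        score = nxt
    return score

edges = [("u1", "t1", "posts"), ("t1", "u1", "posted_by"),
         ("u2", "t1", "retweets"), ("u2", "u1", "follows")]
print(object_rank({"u1", "u2", "t1"}, edges))
```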

146 citations


Journal ArticleDOI
TL;DR: A novel database encryption scheme called MV-OPES (Multivalued-Order Preserving Encryption Scheme) is proposed, which allows privacy-preserving queries over encrypted databases with an improved security level and preserves the order of integer values so that comparison operations can be applied directly to encrypted data.
Abstract: Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the "Database as Service" model, where confidentiality and privacy are important issues for the client. However, existing encryption approaches are vulnerable to statistical attacks because each value is always encrypted to the same fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued-Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt one value to multiple different values to prevent statistical attacks. At the same time, MV-OPES preserves the order of the integer values to allow comparison operations to be applied directly to encrypted data. Using a calculated distance (range), we propose a novel method that allows inequality joins between relations over encrypted values. We also present techniques to offload query execution to the database server as much as possible, thereby making better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems, as it is designed to work with existing indexing structures. It is robust against statistical attacks and the estimation of true values. MV-OPES experiments show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
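
For intuition, a toy illustration of the multivalued order-preserving idea (not the paper's actual construction): each plaintext integer owns a disjoint ciphertext interval, and encryption picks a random point inside it, so equal plaintexts encrypt to varying ciphertexts while order comparisons still work directly on ciphertexts.

```python
import random

INTERVAL = 1000          # assumed width of each value's ciphertext interval
SECRET_OFFSET = 12345    # stand-in for key material

def encrypt(v):
    # Equal plaintexts land on (usually) different points of the same interval,
    # defeating the fixed one-to-one mapping a statistical attack relies on.
    return v * INTERVAL + SECRET_OFFSET + random.randrange(INTERVAL)

def decrypt(c):
    return (c - SECRET_OFFSET) // INTERVAL

a1, a2, b = encrypt(10), encrypt(10), encrypt(11)
assert max(a1, a2) < b                     # order across distinct values holds
assert decrypt(a1) == decrypt(a2) == 10    # both ciphertexts decrypt correctly
```

Because the per-value intervals never overlap, inequality predicates can in principle be evaluated directly on ciphertexts.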

58 citations


Proceedings ArticleDOI
04 Nov 2010
TL;DR: This paper proposes an optimized method that not only calculates the probability of compound-event outputs but also obtains the confidence of a user-given complex pattern against an uncertain raw input stream generated by unreliable network devices.
Abstract: Pattern matching over event streams is well developed. However, with increasing demands on measurement accuracy, the confidence of complex events derived from original, continuously arriving events generated by sensor-like electronic devices is drawing more and more concern. Applications such as RFID-based supply chain management and health-care monitoring require data streams with high reliability, but current hardware and wireless communication techniques cannot guarantee 100% confident data; a stream processing engine that can report the confidence of processed complex events over uncertain data is therefore needed. In this paper, we propose an optimized method that not only calculates the probability of compound-event outputs but also obtains the confidence of a user-given complex pattern against an uncertain raw input stream generated by unreliable network devices. Our proposal is based on the existing stream processing engine SASE+, and we extend its evaluation model, the NFAb automaton, to a new type of automaton in order to manage the runtime over probabilistic streams. In the design of the automaton, we consider optimizations that reduce the computation cost and keep response time at a realistic level even with long sliding time windows.
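
For intuition, a minimal sketch of the probability bookkeeping such an engine performs for a sequence pattern: assuming independent event confidences, a match's confidence is the product of its events' confidences, and matches below a threshold are dropped. The real system extends the NFAb runtime of SASE+; this brute-force scan over a made-up stream only illustrates the computation.

```python
# Events are (type, timestamp, confidence), assumed sorted by timestamp.
def matches(stream, pattern, window, min_conf=0.5):
    results = []
    def extend(partial, start, prob):
        if len(partial) == len(pattern):
            if prob >= min_conf:
                results.append((partial, prob))
            return
        for i in range(start, len(stream)):
            etype, ts, conf = stream[i]
            if partial and ts - partial[0][1] > window:
                break                      # beyond the sliding window
            if etype == pattern[len(partial)]:
                # Independence assumption: multiply per-event confidences.
                extend(partial + [stream[i]], i + 1, prob * conf)
    extend([], 0, 1.0)
    return results

stream = [("A", 1, 0.9), ("B", 2, 0.8), ("A", 3, 0.95), ("C", 4, 0.7)]
print(matches(stream, ["A", "B", "C"], window=10))   # one match, conf 0.504
```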

43 citations


Proceedings Article
01 Jan 2010
TL;DR: A novel database encryption scheme called MV-POPES (Multivalued Partial Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level and is robust against known plaintext attacks and statistical attacks.
Abstract: Encryption is a well-studied technique for protecting the confidentiality of sensitive data. However, encrypting relational databases affects performance during query processing. Preserving the order of the encrypted values is a useful technique for performing queries over an encrypted database with reasonable overhead. Unfortunately, existing order-preserving encryption schemes are not secure against known-plaintext attacks and statistical attacks, in which the attacker is assumed to have prior knowledge about plaintext values or statistical information on the plaintext domain. This paper presents a novel database encryption scheme called MV-POPES (Multivalued Partial Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to divide the plaintext domain into many partitions and randomize them in the encrypted domain. One integer value is then encrypted to multiple different values to prevent statistical attacks. At the same time, MV-POPES preserves the order of the integer values within each partition to allow comparison operations to be applied directly to encrypted data. Our scheme is robust against known-plaintext attacks and statistical attacks. MV-POPES experiments show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
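
The partition-and-shuffle layout can be pictured with a toy example (assumed fixed-width partitions and a seeded permutation standing in for key material; this is not the actual MV-POPES construction):

```python
import random

PART_WIDTH, INTERVAL = 10, 100
rng = random.Random(42)                 # toy "key" seeding the permutation
placement = list(range(10))             # partition ids for domain [0, 100)
rng.shuffle(placement)                  # partition i lands at slot placement[i]

def encrypt(v):
    part, offset = divmod(v, PART_WIDTH)
    base = placement[part] * PART_WIDTH * INTERVAL
    # Multivalued as in MV-OPES: a random point in the value's interval.
    return base + offset * INTERVAL + rng.randrange(INTERVAL)

# Order is preserved only inside a partition; comparisons across shuffled
# partitions must be handled by the query-translation layer.
assert encrypt(3) < encrypt(7)          # 3 and 7 share partition 0
```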

37 citations


Book ChapterDOI
30 Aug 2010
TL;DR: This paper proposes an efficient algorithm for the RFN query with metric indexes; it also adapts the convex-hull property to enhance efficiency, but computes the hull in advance rather than on the fly.
Abstract: Variants of similarity queries, such as k-nearest neighbors (k-NN), range queries, and reverse nearest neighbors (RNN), have been widely studied over the past decade. Nowadays, the reverse furthest neighbor (RFN) query is attracting more attention because of its applicability. Given an object set O and a query object q, the RFN query retrieves the objects of O that take q as their furthest neighbor. Yao et al. proposed R-tree-based algorithms to handle the RFN query using Voronoi diagrams and the convex-hull property of the dataset. However, computing the convex hull and executing range queries on the R-tree are very expensive when done on the fly. In this paper, we propose an efficient algorithm for the RFN query with metric indexes. We also adapt the convex-hull property to enhance efficiency, but we compute the hull in advance rather than on the fly. We select external pivots to construct metric indexes and employ the triangle inequality for efficient pruning. Experimental evaluations on both synthetic and real datasets confirm the efficiency and scalability of our algorithm.
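
A hedged sketch of the pivot-based pruning: precomputed distances to a few external pivots yield triangle-inequality bounds on unseen distances, so many objects can be discarded without computing their actual distances. Pivot choice and the index structure here are simplifications of the paper's method.

```python
import math

def dist(a, b):
    return math.dist(a, b)

def rfn(objects, pivots, q):
    pdist = {o: [dist(o, p) for p in pivots] for o in objects}
    qdist = [dist(q, p) for p in pivots]
    result = []
    for o in objects:
        # Triangle inequality: |d(o,p) - d(q,p)| <= d(o,q) <= d(o,p) + d(q,p).
        upper_oq = min(op + qp for op, qp in zip(pdist[o], qdist))
        pruned = False
        for x in objects:
            if x is o:
                continue
            lower_ox = max(abs(op - xp) for op, xp in zip(pdist[o], pdist[x]))
            if lower_ox > upper_oq:      # x is surely farther from o than q
                pruned = True
                break
        if not pruned and all(dist(o, x) <= dist(o, q)
                              for x in objects if x is not o):
            result.append(o)             # q is o's furthest neighbor
    return result

objs = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
print(rfn(objs, pivots=[(0.0, 10.0), (10.0, 10.0)], q=(5.0, -5.0)))  # [(5.0, 8.0)]
```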

12 citations


Journal ArticleDOI
TL;DR: This paper proposes general parallelism techniques for holistic twig join algorithms to process queries against Extensible Markup Language (XML) databases on a multi‐core system.
Abstract: Purpose – The purpose of this paper is to propose general parallelism techniques for holistic twig join algorithms to process queries against Extensible Markup Language (XML) databases on a multi‐core system. Design/methodology/approach – The parallelism techniques comprised data and task parallelism. For data parallelism, the paper adopted stream‐based partitioning for XML to partition XML data as the basis of parallelism on multiple CPU cores. The XML data partitioning was performed at two levels. The first level created buckets to provide data independence and balance loads among CPU cores; each bucket was assigned to a CPU core. Within each bucket, the second level of XML data partitioning created finer partitions to provide finer parallelism. Each CPU core performed the holistic twig join algorithm on its own finer partitions in parallel with the other CPU cores. In task parallelism, the holistic twig join algorithm was decomposed into two main tasks, which wer...
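
The two-level data-parallel structure might be sketched as follows; the thread pool and the stand-in join function are illustrative only (the paper targets native multi-core execution with a real holistic twig join):

```python
from concurrent.futures import ThreadPoolExecutor

def twig_join(partition, predicate):
    # Stand-in for a holistic twig join over one fine partition of XML nodes.
    return [node for node in partition if predicate(node)]

def run_parallel(buckets, predicate, workers=4):
    # Level 1: one bucket per worker; level 2: finer partitions inside it.
    def per_bucket(bucket):
        out = []
        for fine_partition in bucket:
            out.extend(twig_join(fine_partition, predicate))
        return out
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [m for part in pool.map(per_bucket, buckets) for m in part]

buckets = [[["a1", "b1"], ["a2"]], [["b2", "a3"]]]   # 2 buckets, finer parts
print(run_parallel(buckets, lambda n: n.startswith("a")))   # ['a1', 'a2', 'a3']
```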

10 citations


Book ChapterDOI
25 Oct 2010
TL;DR: Some optimization techniques to reduce the overhead for range queries in MV-POPES by simplifying the translated condition and controlling the randomness of the encrypted partitions are presented.
Abstract: Encryption is a well-studied technique for protecting the privacy of sensitive data. However, encrypting relational databases affects performance during query processing. The Multivalued-Partial Order Preserving Encryption Scheme (MV-POPES) allows privacy-preserving queries over encrypted databases with reasonable overhead and an improved security level. It divides the plaintext domain into many partitions and randomizes them in the encrypted domain; one integer value is then encrypted to multiple different values to prevent statistical attacks. At the same time, MV-POPES preserves the order of the integer values within each partition to allow comparison operations to be applied directly to encrypted data. However, MV-POPES supports range queries only at a high overhead. In this paper, we present optimization techniques that reduce the overhead of range queries in MV-POPES by simplifying the translated condition and controlling the randomness of the encrypted partitions. The basic idea of our approaches is to classify the partitions into supersets and then restrict the randomization within each superset. The supersets of partitions are created either based on predefined queries or using binary recursive partitioning. Experiments show substantial performance improvements with the proposed optimization approaches. We also study the effect of these optimization techniques on the privacy level of the encrypted data.
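
To see where the range-query overhead comes from, the sketch below translates one plaintext range into per-partition ciphertext intervals under a randomized layout (toy parameters as in the MV-POPES sketch above). The resulting disjunction of conditions is what the superset-based optimizations shrink.

```python
PART_WIDTH, INTERVAL = 10, 100

def translate_range(lo, hi, placement):
    """Ciphertext intervals covering plaintext range [lo, hi]."""
    conds = []
    for part in range(lo // PART_WIDTH, hi // PART_WIDTH + 1):
        base = placement[part] * PART_WIDTH * INTERVAL
        first = max(lo, part * PART_WIDTH) % PART_WIDTH
        last = min(hi, (part + 1) * PART_WIDTH - 1) % PART_WIDTH
        conds.append((base + first * INTERVAL,
                      base + (last + 1) * INTERVAL - 1))
    # e.g. WHERE (c BETWEEN lo1 AND hi1) OR (c BETWEEN lo2 AND hi2) OR ...
    return conds

print(translate_range(25, 42, placement=[7, 3, 9, 0, 5, 1, 8, 2, 6, 4]))
```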

9 citations


Journal ArticleDOI
TL;DR: This paper proposes an efficient data stream processing scheme for multiple event-driven continuous queries that are activated by external events such as data arrival and the progression of time, and introduces query result caching as a flexible way to share common operators among queries activated by unpredictable events.

8 citations


Proceedings ArticleDOI
04 Nov 2010
TL;DR: This work attempts to design and implement a dedicated faceted navigation system for QCDml on top of an XML database, and makes use of a relational database system as the engine to speed up the aggregate computation.
Abstract: In this paper we describe a faceted navigation system for QCDml ensemble XML data, an XML-based metadata format for ILDG (International Lattice Data Grid). A faceted navigation system allows a user to search for desired information in an exploratory way, browsing a set of XML data without using specialized query languages such as XPath and XQuery. However, designing a faceted navigation interface for XML data is not straightforward due to the flexible, tree-like nature of XML. In this work, we design and implement a dedicated faceted navigation system for QCDml on top of an XML database. The interface is designed with the domain experts' usability in mind. We also pay attention to the system's performance: in general, faceted navigation is computationally expensive because of the aggregate computation required for each available facet. To alleviate this, we use a relational database system as the engine to speed up the aggregate computation. We finally demonstrate the implemented faceted navigation system, which has been made available on the Web.
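
A minimal illustration of offloading facet counting to a relational engine, which is the performance approach described above; the ensemble schema and values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ensemble (collaboration TEXT, action TEXT, beta REAL)")
conn.executemany("INSERT INTO ensemble VALUES (?, ?, ?)", [
    ("JLQCD", "clover", 5.2), ("CP-PACS", "clover", 5.2),
    ("JLQCD", "wilson", 5.9),
])
# One GROUP BY per facet yields the counts shown next to each facet value.
for facet in ("collaboration", "action"):
    print(facet, conn.execute(
        f"SELECT {facet}, COUNT(*) FROM ensemble GROUP BY {facet}").fetchall())
```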

4 citations


Book ChapterDOI
TL;DR: This chapter presents a framework that directly supports efficient processing and a variety of advanced functions on event detection, which include complex event processing, probabilistic reasoning, and continuous media integration.
Abstract: For real-world-oriented applications to easily use sensor data obtained from multiple wireless sensor networks, a data management infrastructure is mandatory. The infrastructure design should be based on the philosophy of a novel framework beyond relational data management, for two reasons. First is the freshness of data: to keep sensor data fresh, an infrastructure should process data efficiently, so the conventional, time-consuming transaction processing methodology is inappropriate. Second is the diversity of functions: the primary purpose of sensor data applications is to detect events, and relational operators contribute little toward this purpose. This chapter presents a framework that directly supports efficient processing and a variety of advanced functions. Stream processing is the key concept of the framework. For the efficiency requirement, we present a multiple query optimization technique for query processing over data streams, as well as an efficient data archiving technique. To meet the functionality requirement, we present several techniques for event detection, including complex event processing, probabilistic reasoning, and continuous media integration.

3 citations


Proceedings ArticleDOI
23 May 2010
TL;DR: A new high-availability scheme called Adaptive Semi-Active Standby (A-SAS) is proposed, which enables an adaptive tradeoff between bandwidth usage and recovery time; experimental results suggest its effectiveness.
Abstract: Distributed stream processing engines (DSPEs) have recently been studied to meet the needs of continuous query processing. Because they are built on the cooperation of several stream processing engines (SPEs), node failures can cause the whole system to fail. This paper proposes a new high-availability scheme called Adaptive Semi-Active Standby (A-SAS), which enables an adaptive tradeoff between bandwidth usage and recovery time. The paper presents the properties of A-SAS and experimental results that suggest its effectiveness.

Proceedings ArticleDOI
08 Nov 2010
TL;DR: This paper proposes RDF packages, a time- and space-efficient format for RDF data to which RDFS entailment rules can be applied without modification, and demonstrates the performance of the proposed scheme in terms of triple size, reasoning speed, and querying speed.
Abstract: When querying RDF and RDFS data, it is common to derive all triples according to the RDFS entailment rules before query processing, in order to improve performance. An undesirable drawback of this approach is that the RDFS reasoning generates a large number of triples, so a considerable amount of storage space is required if we materialize the RDFS closure. In this paper, we propose RDF packages, a time- and space-efficient format for RDF data. In an RDF package, a set of triples of the same class, or triples having the same predicate, is grouped into a dedicated node named a Package. Using Packages, we can represent any metadata that can be expressed in RDF. An important feature of RDF packages is that, when performing RDFS reasoning, the same rules can be applied without any modification, allowing us to use existing RDFS reasoners. In this paper, we discuss the model of RDF packages and its rules, followed by the transformation between RDF and RDF packages. We also discuss the implementation of RDF packages using an existing RDF framework. Finally, we demonstrate the performance of the proposed scheme in terms of triple size, reasoning speed, and querying speed.
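
A toy sketch of the grouping idea (the actual RDF packages format and its reasoning rules are defined in the paper): triples sharing a predicate collapse into one package of subject/object pairs, and unpacking restores the original triples losslessly.

```python
from collections import defaultdict

triples = [
    ("alice", "rdf:type", "foaf:Person"),
    ("bob",   "rdf:type", "foaf:Person"),
    ("alice", "foaf:knows", "bob"),
]

packages = defaultdict(list)            # one "Package" node per predicate
for s, p, o in triples:
    packages[p].append((s, o))

def unpack(packages):
    # Lossless round-trip back to plain triples.
    return [(s, p, o) for p, pairs in packages.items() for s, o in pairs]

assert sorted(unpack(packages)) == sorted(triples)
print(dict(packages))
```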

Journal ArticleDOI
TL;DR: By focusing on the timestamp sequence of social bookmarks on web pages, this paper models their activation levels, representing current value, and improves a previously proposed ranking method for web search by introducing the activation-level concept.
Abstract: Social bookmarking services, which let us register and share our own bookmarks on the web, have recently been attracting attention. The services provide structured data (URL, username, timestamp, tag set) that represent user interest in web pages. The number of bookmarks is a barometer of a web page's value. However, even if a web page has many bookmarks, its value is not guaranteed: if most of the bookmarks were posted far in the past, the page may be obsolete. In this paper, by focusing on the timestamp sequence of social bookmarks on web pages, we model their activation levels, representing current value. Further, we improve our previously proposed ranking method for web search by introducing the activation-level concept. Finally, through experiments, we show the effectiveness of the proposed ranking method.
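
One plausible way to realize the activation level described above is exponential decay over bookmark timestamps, so that many recent bookmarks outweigh many old ones; the decay model and half-life are assumptions, not the paper's exact formula.

```python
import math
import time

HALF_LIFE_DAYS = 30.0    # assumed: a bookmark's weight halves every 30 days

def activation_level(timestamps, now=None):
    now = now or time.time()
    lam = math.log(2) / (HALF_LIFE_DAYS * 86400)
    return sum(math.exp(-lam * (now - ts)) for ts in timestamps)

now = time.time()
fresh = [now - d * 86400 for d in (1, 2, 3)]                  # 3 recent bookmarks
stale = [now - d * 86400 for d in (400, 410, 420, 430, 440)]  # 5 old bookmarks
# Fewer but fresher bookmarks yield a higher activation level.
assert activation_level(fresh, now) > activation_level(stale, now)
```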

Book ChapterDOI
01 Apr 2010
TL;DR: An event detection system is proposed that extracts candidate events from satellite images, collects information about them from the Web, and integrates the two; an evaluation showed that the system detected building construction events with appropriate web contents in Tsukuba, Japan.
Abstract: The evolution of computer technologies has made it easy to accumulate and deliver scientific data. The GEO Grid project has collected global satellite images from 2000 to the present, and the collection amounts to about 150 TB. It is necessary to generate new value by integrating satellite images with heterogeneous information such as Web contents or geographical data. Using GEO Grid satellite images, some studies detect feature changes such as earthquakes, fires, and newly constructed buildings. In this paper, detections of feature changes from time-series satellite images are referred to as events, and we focus on events concerning newly constructed buildings. Usually, there are articles about such newly constructed buildings on the Web: a newly opened shopping center is typically introduced in a news report, and a newly constructed apartment is often discussed by neighboring residents. We therefore propose an event detection system that extracts candidate events from satellite images, collects information about them from the Web, and integrates them. This system consists of an event detection module and a Web contents collection module. The event detection module detects geographical regions whose elevation values differ between two temporally separated satellite images. The regions are translated from latitude/longitude into building names using an inverse geocoder. Then, the contents collection module collects Web pages by querying the building names to a search engine. The collected pages are re-ranked based on temporal information close to the event occurrence time. We developed a prototype system, and the evaluation showed that it detected information about building construction events with appropriate web contents in Tsukuba, Japan.
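
The pipeline can be skeletonized as below; every lookup is a hypothetical placeholder (the real system uses GEO Grid imagery, an inverse geocoder, and a web search engine), and only the data flow from elevation diff to time-ranked pages follows the description.

```python
def detect_changed_cells(elev_t0, elev_t1, threshold=5.0):
    # Cells whose elevation rose by more than `threshold` meters.
    return [c for c in elev_t1 if elev_t1[c] - elev_t0.get(c, 0.0) > threshold]

def inverse_geocode(cell):
    return NAMES.get(cell, "unknown building")     # placeholder lookup

def search_web(name):
    return PAGES.get(name, [])                     # placeholder search engine

def rerank_by_time(pages, event_time):
    # Prefer pages whose timestamps lie close to the event occurrence time.
    return sorted(pages, key=lambda p: abs(p["time"] - event_time))

NAMES = {(36.08, 140.11): "Tsukuba Center Mall"}   # made-up building
PAGES = {"Tsukuba Center Mall": [{"url": "a", "time": 90},
                                 {"url": "b", "time": 55}]}

elev_t0 = {(36.08, 140.11): 2.0}
elev_t1 = {(36.08, 140.11): 20.0}
for cell in detect_changed_cells(elev_t0, elev_t1):
    name = inverse_geocode(cell)
    print(name, rerank_by_time(search_web(name), event_time=60))
```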

Proceedings ArticleDOI
08 Nov 2010
TL;DR: This paper proposes an algorithm for extracting complex records such as XML data by utilizing an existing IE technique, points out that a naive implementation does not work well, and proposes an improved scheme for more efficient XML record extraction.
Abstract: Information Extraction (IE) is a technique for extracting structured information (records) from unstructured documents such as Web pages. However, existing techniques basically aim at extracting simple records, such as binary relationships like "(company, location)" or named entities like "(organization)". In this paper, we propose an algorithm for extracting complex records such as XML data by utilizing an existing IE technique. Given a set of seed records in the form of XML data (XML records), we first infer the schema information from the XML records. Then, we transform the XML records into a set of relational records consisting of several tables. The obtained relational tables are decomposed into a set of binary relations, which are forwarded to a record extraction system. We reconstruct XML data from the results obtained from the record extraction system. We point out that a naive implementation does not work well, and propose an improved scheme for more efficient XML record extraction. We evaluate the effectiveness of our proposed algorithm in experiments.
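
A simplified reading of the decomposition step, with made-up tags: the XML record is flattened edge by edge into binary relations, the form a binary-relation extraction system can consume; reconstruction joins the relations back along the inferred schema tree.

```python
import xml.etree.ElementTree as ET
from itertools import count

ids = count()

def decompose(elem, relations, parent_id=None):
    # Each schema edge becomes a binary relation of (parent id, child id);
    # text values become (node id, text) pairs.
    my_id = next(ids)
    if parent_id is not None:
        relations.setdefault(("edge", elem.tag), []).append((parent_id, my_id))
    if elem.text and elem.text.strip():
        relations.setdefault((elem.tag, "text"), []).append((my_id, elem.text.strip()))
    for child in elem:
        decompose(child, relations, my_id)
    return relations

seed = ET.fromstring("<company><name>Acme</name>"
                     "<office><city>Tokyo</city></office></company>")
print(decompose(seed, {}))
```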

Proceedings ArticleDOI
14 Jan 2010
TL;DR: A system to maintain the content integrity of Web sites without backend databases is proposed, along with weak inclusion relationships, which are inclusion relationships associated with inclusion ratios.
Abstract: Today, publishing information on Web sites is common, and the volume of Web contents that must be managed keeps increasing. It is therefore important to maintain content integrity on the Web. This paper proposes a system to maintain the content integrity of Web sites without backend databases. First, we explain the architecture of the proposed system. Second, we address the problem of finding the integrity constraints used as input to the system. We focus on inclusion dependencies among HTML/XML elements and discuss how to find inclusion relationships that can serve as hints for finding inclusion dependencies. In particular, we propose weak inclusion relationships, which are inclusion relationships associated with inclusion ratios. Finally, we propose a filter-based approach to the efficient discovery of weak inclusion relationships and discuss some of its possible implementations.
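
The weak inclusion relationship can be made concrete as an inclusion ratio |A ∩ B| / |A| over element value sets; the brute-force pairing below only defines the measure, whereas the paper's filter-based approach avoids comparing all pairs.

```python
def inclusion_ratio(a_values, b_values):
    a, b = set(a_values), set(b_values)
    return len(a & b) / len(a) if a else 0.0

def weak_inclusions(columns, threshold=0.9):
    # A weak inclusion A -> B holds when most of A's values appear in B.
    return [(na, nb)
            for na, va in columns.items()
            for nb, vb in columns.items()
            if na != nb and inclusion_ratio(va, vb) >= threshold]

cols = {"nav_links": ["/a", "/b", "/c"],
        "site_pages": ["/a", "/b", "/c", "/d"]}
print(weak_inclusions(cols))   # [('nav_links', 'site_pages')]
```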

Book ChapterDOI
17 Sep 2010
TL;DR: A scheme for efficiently detecting functional dependencies in XML data (XFDs) that modifies the basic PipeSort algorithm with a pruning mechanism exploiting the features of XFDs, thereby making the whole process even faster.
Abstract: In this paper we discuss a scheme for efficiently detecting functional dependencies in XML data (XFDs). The ability to detect XFDs is useful in many real-life applications, such as XML schema design, relational schema design based on XML data, and redundancy detection in XML data. However, XFD detection is an expensive task, and an efficient algorithm is essential for dealing with large XML data collections. For this reason, we propose an efficient way to detect XFDs. We assume that the XML data being processed are represented as hierarchically organized relational tables. Given such data, we attempt to detect XFDs existing within and among the tables. Our basic idea is to adopt the PipeSort algorithm, which has been used successfully in OLAP, to detect XFDs within a table. We modify the basic PipeSort algorithm by incorporating a pruning mechanism that takes the features of XFDs into account, making the whole process even faster. Having obtained the set of XFDs existing within tables, we then detect XFDs existing among tables, again using the features of XFDs for pruning. We show the feasibility of our scheme through experiments.
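
The primitive evaluated many times during detection is a functional-dependency check over one table; a brute-force version is shown below (the PipeSort-style sort sharing and the XFD-specific pruning are omitted).

```python
def holds(rows, x_attrs, y_attr):
    # X -> Y holds iff equal X-values never map to different Y-values.
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x_attrs)
        if key in seen and seen[key] != row[y_attr]:
            return False
        seen[key] = row[y_attr]
    return True

rows = [
    {"dept": "db", "building": "B1", "head": "kitagawa"},
    {"dept": "db", "building": "B1", "head": "kitagawa"},
    {"dept": "ai", "building": "B1", "head": "suzuki"},
]
print(holds(rows, ["dept"], "head"))       # True:  dept -> head
print(holds(rows, ["building"], "head"))   # False: building does not determine head
```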

Journal ArticleDOI
TL;DR: This paper addresses the problem of load balancing in large-scale, self-organizing P2P systems managing multidimensional data, and proposes simple and efficient decentralized mechanisms to evenly distribute the data load among the participating nodes in Content Addressable Networks.
Abstract: Balancing the load in a decentralized P2P system is a challenging problem due to the dynamic nature of such environments and the absence of global knowledge about the actual composition of the system. In this paper, we address the problem of load balancing in large-scale, self-organizing P2P systems managing multidimensional data. We propose simple and efficient decentralized mechanisms to evenly distribute the data load among the participating nodes in Content Addressable Networks. The basic idea is to enable a new node that joins the system to share the load with a heavily loaded node already in the system, such that the load remains evenly distributed among all participating nodes. In the multiple-random-choices method, the new node probes the load of some existing nodes selected uniformly at random, then chooses the heaviest node among them to share the load with. In this paper, we extend this method in three ways. First, the new node probes a pool of nodes proportional to the network size and composition; specifically, the number of probed nodes is logarithmic in the network size. This property achieves a very small, constant load-imbalance factor without the need to estimate the network size. Second, the probed nodes are not selected at random but are well spread over the key space, which enables a good estimation of the actual data distribution and network composition and thus copes well with large-scale data imbalance. Third, the selection of nodes to probe is restricted to the immediate and distant neighbors of a randomly chosen node. The cost incurred by our join-based load-balancing method is very small, since all load information is piggybacked on the periodic maintenance messages exchanged between nodes and their neighbors. Unlike other methods, we neither use an external index nor assume any global knowledge. We also generalize the first method to enable locating a heavily loaded node through a sequential walk starting from a randomly selected node; this new method incurs additional overhead but achieves a much smaller load imbalance. We further study the robustness of our join-based load-balancing method against adversarial attacks and, using simulation, analyze the impact of the number of entry points on load balancing. To the best of our knowledge, we are the first to address this problem. We conduct an experimental study using uniform and nonuniform data distributions to demonstrate the effectiveness and scalability of our proposals.
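
A toy simulation of the join-based balancing rule: a joining node probes roughly log2(n) existing nodes and splits the load of the heaviest one. CAN zones, the spread of probes over the key space, and piggybacked load information are all elided.

```python
import math
import random

def join(loads, rng):
    n_probes = max(1, int(math.log2(len(loads))))
    probes = rng.sample(range(len(loads)), n_probes)
    heaviest = max(probes, key=lambda i: loads[i])
    loads[heaviest] /= 2                 # split the heavy node's zone
    loads.append(loads[heaviest])        # the new node takes the other half

rng = random.Random(7)
loads = [1000.0]                         # the first node owns everything
for _ in range(999):
    join(loads, rng)
print(f"imbalance factor: {max(loads) / (sum(loads) / len(loads)):.2f}")
```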

Proceedings ArticleDOI
17 Dec 2010
TL;DR: This paper proposes a novel framework for a system that improves awareness of the outline of video retrieval results; it detects topics from a set of retrieved videos using time data, analyzes topic properties using author diversity, and offers an interface that lets a person grasp the outline of a video retrieval result at a glance.
Abstract: Video-sharing services, where users can upload videos, have spread rapidly in recent years. The number of videos in these services has increased dramatically, as has the number of users. With this rapid growth, we are faced with having to arrange and categorize videos. However, it is difficult to make a computer aware of the contents (or topics) of videos. This paper focuses on such video-sharing services and proposes a novel framework for a system that improves awareness of the outline of video retrieval results. Our proposed system detects topics from a set of retrieved videos using time data, analyzes topic properties using author diversity, and offers an interface that lets a person grasp the outline of a video retrieval result at a glance. We also present experimental results of applying the proposed methods to several sets of video retrieval results.
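
One hedged realization of the two signals named above (the paper's concrete definitions may differ): group retrieved videos into time-adjacent bursts as candidate topics, then score each topic by author diversity so that topics driven by many independent uploaders stand out.

```python
def detect_topics(videos, max_gap=86400):
    """videos: (timestamp, author) pairs; a gap over max_gap starts a new topic."""
    topics, current = [], []
    for ts, author in sorted(videos):
        if current and ts - current[-1][0] > max_gap:
            topics.append(current)
            current = []
        current.append((ts, author))
    if current:
        topics.append(current)
    return topics

def author_diversity(topic):
    return len({a for _, a in topic}) / len(topic)   # distinct uploaders / videos

videos = [(0, "u1"), (3600, "u2"), (7200, "u1"), (900000, "u3")]
for t in detect_topics(videos):
    print(len(t), "videos, diversity", round(author_diversity(t), 2))
```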