
Showing papers by "Wang-Chien Lee" published in 2006


Journal Article
TL;DR: A novel scheduling algorithm called SIN-α is proposed that takes the urgency and number of outstanding requests into consideration; it significantly outperforms existing algorithms over a wide range of workloads and approaches the analytical bound at high request rates.
Abstract: On-demand broadcast is an effective wireless data dissemination technique to enhance system scalability and deal with dynamic user access patterns. With the rapid growth of time-critical information services in emerging applications, there is an increasing need for the system to support timely data dissemination. This paper investigates online scheduling algorithms for time-critical on-demand data broadcast. We propose a novel scheduling algorithm called SIN-α that takes the urgency and number of outstanding requests into consideration. An efficient implementation of SIN-α is presented. We also analyze the theoretical bound of request drop rate when the request arrival rate rises toward infinity. Trace-driven experiments show that SIN-α significantly outperforms existing algorithms over a wide range of workloads and approaches the analytical bound at high request rates.

179 citations


Journal Article
01 Jan 2006
TL;DR: This paper introduces a new index method, called the grid-partition index, to support NN search in both on-demand access and periodic broadcast modes of mobile computing and develops an incremental construction algorithm to address the issue of object update.
Abstract: Traditional nearest-neighbor (NN) search is based on two basic indexing approaches: object-based indexing and solution-based indexing. The former is constructed based on the locations of data objects, using some distance heuristics on object locations. The latter is built on a precomputed solution space. Thus, NN queries can be reduced to and processed as simple point queries in this solution space. Both approaches exhibit some disadvantages, especially when employed for wireless data broadcast in mobile computing environments. In this paper, we introduce a new index method, called the grid-partition index, to support NN search in both on-demand access and periodic broadcast modes of mobile computing. The grid-partition index is constructed based on the Voronoi diagram, i.e., the solution space of NN queries. However, it has two distinctive characteristics. First, it divides the solution space into grid cells such that a query point can be efficiently mapped into a grid cell around which the nearest object is located. This significantly reduces the search space. Second, the grid-partition index stores the objects that are potential NNs of any query falling within the cell. The storage of objects, instead of the Voronoi cells, makes the grid-partition index a hybrid of the solution-based and object-based approaches. As a result, it achieves a much more compact representation than the pure solution-based approach and avoids the backtracked traversals required in the typical object-based approach, thus realizing the advantages of both approaches. We develop an incremental construction algorithm to address the issue of object update. In addition, we present a cost model to approximate the search cost of different grid partitioning schemes. The performances of the grid-partition index and existing indexes are evaluated using both synthetic and real data. The results show that, overall, the grid-partition index significantly outperforms object-based indexes and solution-based indexes. Furthermore, we extend the grid-partition index to support continuous-nearest-neighbor search. Both algorithms and experimental results are presented.

142 citations


Journal Article
01 Mar 2006
TL;DR: Simulation results indicate that the proposed aggregate caching mechanism and a broadcast-based Simple Search algorithm can significantly improve Imanet performance in terms of throughput and average number of hops to access data items.
Abstract: Internet-based mobile ad hoc network (Imanet) is an emerging technique that combines a wired network (e.g., the Internet) and a mobile ad hoc network (Manet) for developing a ubiquitous communication infrastructure. To fulfill users' demand to access various kinds of information, however, an Imanet has several limitations such as limited accessibility to the wired Internet, insufficient wireless bandwidth, and longer message latency. In this paper, we address the issues involved in information search and access in Imanets. An aggregate caching mechanism and a broadcast-based Simple Search (SS) algorithm are proposed for improving the information accessibility and reducing average communication latency in Imanets. As a part of the aggregate cache, a cache admission control policy and a cache replacement policy, called Time and Distance Sensitive (TDS) replacement, are developed to reduce the cache miss ratio and improve the information accessibility. We evaluate the impact of caching, cache management, and the number of access points that are connected to the Internet, through extensive simulation. The simulation results indicate that the proposed aggregate caching mechanism can significantly improve Imanet performance in terms of throughput and average number of hops to access data items.

103 citations


Proceedings Article
23 May 2006
TL;DR: A new architecture and data model is proposed, CiteSeerx, that will overcome the existing problems as well as provide scalability and better performance plus new services and system features.
Abstract: CiteSeer is a scientific literature digital library and search engine which automatically crawls and indexes scientific documents in the field of computer and information science. After serving as a public search engine for nearly ten years, CiteSeer is starting to have scaling problems in handling more documents, adding new features, and serving more users. Its monolithic architecture design prevents it from effectively making use of new web technologies and providing new services. After analyzing the current system problems, we propose a new architecture and data model, CiteSeerx, that will overcome the existing problems as well as provide scalability and better performance plus new services and system features.

84 citations


Proceedings Article
03 Apr 2006
TL;DR: This paper proposes an infrastructure-free window query processing technique for sensor networks, called itinerary-based window query execution (IWQE), in which query propagation and data collection are combined into one single stage and executed along a well-designed itinerary inside a query window.
Abstract: The existing query processing techniques for sensor networks rely on a network infrastructure for query propagation and data collection. However, such an infrastructure is very susceptible to network topology transients that widely exist in sensor networks. In this paper, we propose an infrastructure-free window query processing technique for sensor networks, called itinerary-based window query execution (IWQE), in which query propagation and data collection are combined into one single stage and executed along a well-designed itinerary inside a query window. We study the parameters for setting up an itinerary (e.g., width and route) and incorporate into IWQE three data collection schemes based on different performance trade-offs. Finally we demonstrate, by extensive simulations, the superior energy-time efficiency, robustness, and accuracy of IWQE over the current state-of-the-art techniques in supporting window queries under various network conditions.

81 citations


Proceedings Article
03 Apr 2006
TL;DR: This paper exploits the semantics of top-k query and proposes a novel energy-efficient monitoring approach, called FILA, which outperforms the existing TAG-based approach by an order of magnitude.
Abstract: Top-k monitoring is important to many wireless sensor applications. This paper exploits the semantics of top-k query and proposes a novel energy-efficient monitoring approach, called FILA. The basic idea is to install a filter at each sensor node to suppress unnecessary sensor updates. The correctness of the top-k result is ensured if all sensor nodes perform updates according to their filters. We show via simulation that FILA outperforms the existing TAG-based approach by an order of magnitude.

75 citations
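
The filter idea in the FILA abstract above can be illustrated with a minimal sketch. The abstract does not specify the form of a filter, so the half-open interval `[low, high)` used here, the class name `FilteredSensor`, and the method `observe` are all assumptions made for illustration; how the base station assigns and updates filters is not modeled.

```python
class FilteredSensor:
    """Hypothetical sensor node holding an interval filter [low, high).

    Assumption: a filter is a value window assigned by the base station.
    While the reading stays inside the window the node stays silent,
    which is one way to suppress unnecessary updates.
    """

    def __init__(self, reading, low, high):
        self.reading = reading
        self.low = low
        self.high = high

    def observe(self, reading):
        """Record a new reading; return it only if it must be reported."""
        self.reading = reading
        if self.low <= reading < self.high:
            return None  # inside the filter window: update suppressed
        return reading   # outside the window: report to the base station


sensor = FilteredSensor(reading=20.0, low=15.0, high=25.0)
suppressed = sensor.observe(22.0)  # stays inside the window
reported = sensor.observe(27.5)    # leaves the window
```

In this sketch only the second reading would cost a transmission; the first is absorbed locally.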


Proceedings Article
13 Mar 2006
TL;DR: A weighted regression algorithm is presented for efficient and accurate estimation of link quality in wireless sensor networks that captures the spatial correlation in quality of links between a sensor node and its neighbor nodes, such that the quality of a link to a neighbor node can be estimated based on the quality of links to other nodes geographically close.
Abstract: The irregularity in quality of wireless communication links poses significant research challenges in wireless sensor network design. Dynamic network conditions and environmental factors make an online, self-adapted link quality estimation mechanism within sensor nodes a necessity for making routing decisions and improving network performance. In this paper, we present a weighted regression algorithm for efficient and accurate estimation of link quality in wireless sensor networks. This algorithm captures the spatial correlation in quality of links between a sensor node and its neighbor nodes, such that the quality of a link to a neighbor node can be estimated based on the quality of links to other nodes geographically close. We evaluate the proposed algorithm using a trace-based simulator which takes into account the variances of link quality over time and spatial locations. The experimental results show that the weighted regression algorithm is able to achieve more accurate estimates than WMEWMA, a state-of-the-art link quality estimator, at a much lower communication cost.

55 citations


Journal Article
TL;DR: Simulation results show that the exponential index substantially outperforms the state-of-the-art indexes, and is more resilient to link errors and achieves more performance advantages from index caching.
Abstract: Access efficiency and energy conservation are two critical performance concerns in a wireless data broadcast system. We propose in this paper a novel parameterized index called the exponential index that has a linear yet distributed structure for wireless data broadcast. Based on two tuning knobs, index base and chunk size, the exponential index can be tuned to optimize the access latency with the tuning time bounded by a given limit, and vice versa. The client access algorithm for the exponential index under unreliable broadcast is described. A performance analysis of the exponential index is provided. Extensive ns-2-based simulation experiments are conducted to evaluate the performance under various link error probabilities. Simulation results show that the exponential index substantially outperforms the state-of-the-art indexes. In particular, it is more resilient to link errors and achieves more performance advantages from index caching. The results also demonstrate its great flexibility in trading access latency with tuning time.

54 citations


Proceedings Article
03 Apr 2006
TL;DR: This paper explores angle-based and distance-based bound properties of polygons, and devises two efficient algorithms, namely, Sweep and Ripple, based on R-tree, which access objects in an order according to their orientations and distances with respect to a given query point, respectively.
Abstract: In this paper, we study a new type of spatial query, Nearest Surrounder (NS), which searches the nearest surrounding spatial objects around a query point. NS query can be more useful than conventional nearest neighbor (NN) query as NS query takes the object orientation into consideration. To address this new type of query, we identify angle-based bounding properties and distance-bound properties of the R-tree index. The former has not been explored for conventional spatial queries. With these identified properties, we propose two algorithms, namely, Sweep and Ripple. Sweep searches surrounders according to their orientation, while Ripple searches surrounders ordered by their distances to the query point. Both algorithms can deliver results incrementally with a single dataset lookup. We also consider the multiple-tier NS (mNS) query that searches multiple layers of NSs. We evaluate the algorithms and report their performance on both synthetic and real datasets.

45 citations


Proceedings Article
12 Nov 2006
TL;DR: This study proposes a framework, called distributed peer tree (DPTree), which efficiently supports various types of queries on multidimensional data in P2P systems based on balanced tree indexes and verifies the superiority of DPTree over existing works.
Abstract: Peer-to-peer (P2P) systems have been widely used for exchange of voluminous information and resources among thousands or even millions of users. Since shared data are normally identified by multiple attributes, a fundamental issue in P2P systems is to efficiently support complex queries on multi-dimensional data. Prior works suffer from some fundamental limitations, such as being constrained to support certain types of queries, excessive maintenance overheads, etc. In this study, we propose a framework, called distributed peer tree (DPTree), which efficiently supports various types of queries on multi-dimensional data in P2P systems based on balanced tree indexes. DPTree achieves the efficiency through the following designs: 1) distributing the tree structure among peers in a way that preserves the nice properties of balanced tree structures yet avoids single points of failure and performance bottlenecks; 2) organizing peers into an overlay structure that enables efficient navigation yet is easy to maintain; 3) an efficient navigation algorithm; 4) an innovative wavelet-based load balancing mechanism. Through extensive performance evaluation, we verify the superiority of DPTree over existing works.

43 citations


Proceedings Article
30 May 2006
TL;DR: The problems of the current CiteSeer architecture are discussed and a new architecture for a next generation CiteSeer application is proposed, based on modular web services and pluggable service components.
Abstract: CiteSeer is a scientific literature digital library and search engine which automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serves over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture inconveniences system maintenance and reduces the flexibility of the system in terms of new feature development, algorithm updates, and system interoperability. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results based on a prototype system show the new architecture enhances flexibility, scalability, and performance for CiteSeer. In addition, new services in development for the next generation CiteSeer system are discussed.

Proceedings Article
10 May 2006
TL;DR: The proposed heterogeneous tracking model, referred to as HTM, is able to accurately predict the movements of objects and thus reduces the energy consumption for object tracking.
Abstract: In this paper, we propose a heterogeneous tracking model, referred to as HTM, to efficiently mine object moving patterns and track objects. Specifically, we use a variable memory Markov model to exploit the dependencies among object movements. Furthermore, due to the hierarchical nature of HTM, multi-resolution object moving patterns are provided. The proposed HTM is able to accurately predict the movements of objects and thus reduces the energy consumption for object tracking. Simulation results show that HTM not only effectively mines object moving patterns but also saves energy in tracking objects.

Proceedings Article
05 Jun 2006
TL;DR: It is argued that query brokering and access control are not two orthogonal issues because access control deployment strategies can have a significant impact on the "whole" system's end-to-end performance.
Abstract: An XML brokerage system is a distributed XML database system that comprises data sources and brokers which, respectively, hold XML documents and document distribution information. However, all existing information brokerage systems view or handle query brokering and access control as two orthogonal issues: query brokering is a system issue that concerns costs and performance, while access control is a security issue that concerns information confidentiality. As a result, access control deployment strategies (in terms of where and when to do access control) and the impact of such strategies on end-to-end system performance are neglected by existing information brokerage systems. In addition, data source side access control deployment is taken-for-granted as the "right" thing to do. In this paper, we challenge this traditional, taken-for-granted access control deployment methodology, and argue that query brokering and access control are not two orthogonal issues because access control deployment strategies can have a significant impact on the "whole" system's end-to-end performance. We propose the first in-broker access control deployment strategy where access control is "pushed" from the boundary into the "heart" of the information brokerage system.

Proceedings Article
30 May 2006
TL;DR: This work examines the problem of skyline query processing in P2P systems and proposes approximate algorithms to support skyline queries where exact answers are too costly to obtain and produces high quality answers using heuristics based on local semantics of peers.
Abstract: Skyline queries have received a lot of attention from database and information retrieval research communities. A skyline query returns a set of data objects that is not dominated by any other data objects in a given dataset. However, most existing studies focus on skyline query processing in centralized systems. Only recently have skyline queries been considered in a distributed computing environment. Acknowledging the trend toward peer-to-peer (P2P) systems in distributed computing, we examine the problem of skyline query processing in P2P systems and propose innovative solutions. We exploit the data semantics embedded in semantically structured P2P overlay networks to efficiently prune the search space, without compromising the quality of the query result. In addition, we propose approximate algorithms to support skyline queries where exact answers are too costly to obtain. These approximate algorithms produce high quality answers using heuristics based on local semantics of peer nodes. Extensive experiments validate that our algorithms provide high efficiency and scalability to skyline query processing in P2P systems.
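
To make the skyline definition in this abstract concrete, here is a minimal centralized sketch. It is a naive quadratic baseline, not the P2P algorithms the paper proposes. The function names `dominates` and `skyline` are hypothetical, and the sketch assumes the usual convention that smaller values are better in every dimension.

```python
def dominates(a, b):
    """True if a dominates b: a is no worse than b in every dimension
    and strictly better in at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance) pairs where lower is better.
points = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 8)]
result = skyline(points)
# (4, 4) and (5, 8) are dominated by (3, 3); the rest form the skyline.
```

The pairwise comparison makes the cost of centralized evaluation visible, which is the kind of work that search-space pruning aims to avoid.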

Proceedings Article
30 May 2006
TL;DR: This paper proposes a fully distributed clustering algorithm, called Peer dENsity-based cluStering (PENS), which overcomes the challenge raised in performing clustering in peer-to-peer environments, i.e., cluster assembly.
Abstract: Huge amounts of data are available in large-scale networks of autonomous data sources dispersed over a wide area. Data mining is an essential technology for obtaining hidden and valuable knowledge from these networked data sources. In this paper, we investigate clustering, one of the most important data mining tasks, in one such networked computing environment, i.e., peer-to-peer (P2P) systems. The lack of a central control and the sheer size of P2P systems make the existing clustering techniques not applicable here. We propose a fully distributed clustering algorithm, called Peer dENsity-based cluStering (PENS), which overcomes the challenge raised in performing clustering in peer-to-peer environments, i.e., cluster assembly. The main idea of PENS is hierarchical cluster assembly, which enables peers to collaborate in forming a global clustering model without requiring a central control or message flooding. The complexity analysis of the algorithm demonstrates that PENS can discover clusters and noise efficiently in P2P systems.

Proceedings Article
26 Jun 2006
TL;DR: This paper presents a system framework to support continuous nearest surrounder queries in moving object environments and proposes two algorithms, namely safe region formation and partial query evaluation, that can significantly improve the system performance.
Abstract: This paper presents a system framework to support continuous nearest surrounder (NS) queries in moving object environments. An NS query finds the nearest objects at individual distinct angles from a query point. This query distinguishes itself from other conventional spatial queries such as range queries and nearest neighbor queries by considering both distance and angular aspects of objects with respect to a query point. One application of the NS query is to monitor the nearest objects around an observation point. In our framework, a centralized server is dedicated to collecting object location updates, determining the NS queries affected by each object location update, computing the incremental result change of affected queries, and delivering result updates to the corresponding users/applications that initiated the queries. In particular, we propose two algorithms, namely safe region formation and partial query evaluation, that can significantly improve the system performance. Through simulations, we validate our proposed algorithms over a wide range of settings.

Proceedings Article
11 Jun 2006
TL;DR: A Bayesian framework is employed to build the ideal citation record for a document that carries the added advantages of fusing information from disparate sources and increasing system resilience to erroneous data.
Abstract: Citation matching, or the automatic grouping of bibliographic references that refer to the same document, is a data management problem faced by automatic digital libraries for scientific literature such as CiteSeer and Google Scholar. Although several solutions have been offered for citation matching in large bibliographic databases, these solutions typically require expensive batch clustering operations that must be run offline. Large digital libraries containing citation information can reduce maintenance costs and provide new services through efficient online processing of citation data, resolving document citation relationships as new records become available. Additionally, information found in citations can be used to supplement document metadata, requiring the generation of a canonical citation record from merging variant citation subfields into a unified "best guess" from which to draw information. Citation information must be merged with other information sources in order to provide a complete document record. This paper outlines a system and algorithms for online citation matching and canonical metadata generation. A Bayesian framework is employed to build the ideal citation record for a document that carries the added advantages of fusing information from disparate sources and increasing system resilience to erroneous data.

Journal Article
TL;DR: In this paper, the authors propose to maintain Materialized In-Network Views (MINVs) that precompute and store commonly used aggregation results in the sensor network to reduce the number of sensor accesses.
Abstract: To process aggregation queries issued through different sensors as access points in sensor networks, existing algorithms handle queries independently and perform in-network aggregation only at the query time. As a result of ad-hoc and independent execution of queries, no partial result is sharable and reusable among the queries. Consequently, scarce sensor network resources can be easily overconsumed, particularly, those sensors commonly accessed by queries. In this paper, we address this issue by examining strategies to maintain Materialized In-Network Views (MINVs) that pre-compute and store commonly used aggregation results in the sensor network. With MINVs, aggregated sensed results for some spatial regions are available and sharable to queries. Thus, the number of sensor accesses is greatly reduced. Through simulations, we validate the effectiveness of proposed strategies.

Book Chapter
26 Mar 2006
TL;DR: Through an extensive performance evaluation, it is shown that CS caching is superior to existing caching schemes for location-based services; the design issues explored include cache memory allocation between objects and CRs, and CR coalescence.
Abstract: In this paper, we propose a novel client-side, multi-granularity caching scheme, called “Complementary Space Caching” (CS caching), for location-based services in mobile environments. Different from conventional data caching schemes that only cache a portion of dataset, CS caching maintains a global view of the whole dataset. Different portions of this view are cached in varied granularity based on the probabilities of being accessed in the future queries. The data objects with very high access probabilities are cached in the finest granularity, i.e., the data objects themselves. The data objects which are less likely to be accessed in the near future are abstracted and logically cached in the form of complementary regions (CRs) in a coarse granularity. CS caching naturally supports all types of location-based queries. In this paper, we explore several design and system issues of CS caching, including cache memory allocation between objects and CRs, and CR coalescence. We develop algorithms for location-based queries and a cache replacement mechanism. Through an extensive performance evaluation, we show that CS caching is superior to existing caching schemes for location-based services.

Proceedings Article
05 Jun 2006
TL;DR: Simulation results show that the delay-tolerant trajectory compression (DTTC) technique exhibits superior performance in terms of accuracy, communication cost and computation cost and soundly outperforms DPR with all types of movement trajectories.
Abstract: Taking advantage of the delay tolerance of object tracking sensor networks, we propose the delay-tolerant trajectory compression (DTTC) technique, an efficient and accurate algorithm for in-network data compression. In DTTC, each cluster head compresses the movement trajectory of a moving object by a compression function and reports only the compression parameters, which drastically reduces the total amount of data communications required for tracking operations. DTTC supports a broad class of movement trajectories using two techniques, DC-compression and SW-compression, which are designed to minimize the total number of segments to be compressed. Furthermore, we propose an efficient trajectory segmentation scheme, which helps both compression techniques to compress movement trajectories more accurately at less computation cost. An extensive simulation has been conducted to compare DTTC with a competing prediction-based tracking technique, DPR. Simulation results show that DTTC exhibits superior performance in terms of accuracy, communication cost and computation cost and soundly outperforms DPR with all types of movement trajectories.

Proceedings Article
27 Jun 2006
TL;DR: This demonstration paper discusses the architecture and the functionality of the CS Caching Engine that adopts CS caching, and a tourist information named TravelGuide is prototyped with the support of this cache engine.
Abstract: Location-based services (LBS) have emerged as one of the killer applications for mobile and pervasive computing environments. Due to limited bandwidth and scarce client resources, client-side data caching plays an important role of enhancing the data availability and improving the response time. In this demonstration, we present CS Cache Engine suitable for LBS. The underlying caching model is Complementary Space Caching (CS caching) scheme that we have recently presented in [citation]. Different from conventional data caching schemes, CS caching preserves a global view of the database by maintaining physical objects and capturing those objects in the server but not in the cache as Complementary Regions (CRs) in the cache. As a result, with the CS Cache Engine implementing CS caching, client assertiveness on their own answered queries is enhanced so that unnecessary requests over the wireless channel can be avoided; various kinds of location-based queries are naturally supported; and the client's ability to prefetch objects is introduced such that the response time can be further improved. In this demonstration paper, we discuss the architecture and the functionality of the CS Caching Engine that adopts CS caching. Specifically, for this demonstration, a tourist information named TravelGuide is prototyped with the support of this cache engine.

Proceedings Article
10 May 2006
TL;DR: This seminar will provide an overview of research issues arising from accessing location-based services in a mobile computing environment and discuss the state-of-the-art solutions.
Abstract: Location based service (LBS) is emerging as a killer application in mobile data services thanks to the rapid development in wireless communication and location positioning technologies. Users with location-aware wireless devices can query about their surroundings (e.g., finding the nearest Japanese restaurant or all shopping malls within 5 miles) at any place, anytime. While this ubiquitous computing paradigm brings great convenience for information access, the constraints of mobile environments, the spatial property of location-dependent data, and the mobility of mobile users pose a great challenge for the provision of location-based services to mobile users. This seminar will provide an overview of research issues arising from accessing location-based services in a mobile computing environment and discuss the state-of-the-art solutions.

Proceedings Article
05 Jun 2006
TL;DR: A novel validation algorithm is developed that allows the clients to verify whether their TNN query answers are still valid after they moved to new positions and a comprehensive simulation is conducted to evaluate performance of the proposed TNN search algorithms.
Abstract: Given a query point p, typically the position of a current client, and two datasets S and R, a transitive nearest neighbor (TNN) search returns a pair of objects (s, r) ∈ S × R such that the total distance from p to s and then to r, i.e., dis(p,s)+dis(s,r), is minimum. We propose various algorithms for supporting TNN search as a kind of location-based service in both on-demand-based and broadcast-based mobile environments. In addition, we develop a novel validation algorithm that allows the clients to verify whether their TNN query answers are still valid after they have moved to new positions. Finally, we conduct a comprehensive simulation to evaluate the performance of the proposed TNN search algorithms.
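
The TNN definition in this abstract translates directly into a brute-force baseline, sketched below. This is only an exhaustive reference implementation of the definition, not one of the paper's proposed algorithms. The function name `tnn_brute_force` is hypothetical, and Euclidean distance is assumed for dis(·,·).

```python
from math import dist  # Euclidean distance, available since Python 3.8


def tnn_brute_force(p, S, R):
    """Return ((s, r), total) minimizing dis(p, s) + dis(s, r).

    Checks every pair in S x R, so the cost is O(|S| * |R|).
    """
    best_pair, best_total = None, float("inf")
    for s in S:
        d_ps = dist(p, s)
        for r in R:
            total = d_ps + dist(s, r)
            if total < best_total:
                best_pair, best_total = (s, r), total
    return best_pair, best_total


# Hypothetical example: p is the client, S and R are two object sets.
p = (0.0, 0.0)
S = [(1.0, 0.0), (0.0, 5.0)]
R = [(4.0, 0.0), (0.0, 6.0)]
pair, total = tnn_brute_force(p, S, R)
# The route p -> (1, 0) -> (4, 0) has total length 1 + 3 = 4.
```

Note that the nearest s to p and the nearest r to that s need not form the optimal pair in general, which is why the exhaustive version checks all combinations.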

Book Chapter
12 Apr 2006
TL;DR: This paper examines strategies to maintain Materialized In-Network Views (MINVs) that pre-compute and store commonly used aggregation results in the sensor network and validates the effectiveness of proposed strategies.
Abstract: To process aggregation queries issued through different sensors as access points in sensor networks, existing algorithms handle queries independently and perform in-network aggregation only at the query time. As a result of ad-hoc and independent execution of queries, no partial result is sharable and reusable among the queries. Consequently, scarce sensor network resources can be easily overconsumed, particularly, those sensors commonly accessed by queries. In this paper, we address this issue by examining strategies to maintain Materialized In-Network Views (MINVs) that pre-compute and store commonly used aggregation results in the sensor network. With MINVs, aggregated sensed results for some spatial regions are available and sharable to queries. Thus, the number of sensor accesses is greatly reduced. Through simulations, we validate the effectiveness of proposed strategies.