
Showing papers by "Wang-Chien Lee published in 2008"


Proceedings Article•DOI•
20 Jul 2008
TL;DR: Experiments on large-scale tagging datasets of scientific documents (CiteULike) and web pages (del.icio.us) indicate that the proposed framework for real-time tag recommendation is capable of making tag recommendations efficiently and effectively.
Abstract: Tags are user-generated labels for entities. Existing research on tag recommendation either focuses on improving its accuracy or on automating the process, while ignoring the efficiency issue. We propose a highly-automated novel framework for real-time tag recommendation. The tagged training documents are treated as triplets of (words, docs, tags), and represented in two bipartite graphs, which are partitioned into clusters by Spectral Recursive Embedding (SRE). Tags in each topical cluster are ranked by our novel ranking algorithm. A two-way Poisson Mixture Model (PMM) is proposed to model the document distribution into mixture components within each cluster and aggregate words into word clusters simultaneously. A new document is classified by the mixture model based on its posterior probabilities so that tags are recommended according to their ranks. Experiments on large-scale tagging datasets of scientific documents (CiteULike) and web pages (del.icio.us) indicate that our framework is capable of making tag recommendations efficiently and effectively. The average tagging time for testing a document is around 1 second, with over 88% of test documents correctly labeled with the top nine tags we suggested.
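The pipeline above classifies a new document into a topical cluster and then recommends that cluster's pre-ranked tags. A minimal sketch of that final step, with a plain word-overlap score standing in for the paper's two-way PMM posterior (the cluster structure and field names here are assumptions, not the authors' data model):

```python
from collections import Counter

def recommend_tags(doc_words, clusters, top_k=3):
    """Cluster-based tag recommendation sketch: score each topical
    cluster against the new document (the paper uses a Poisson Mixture
    Model posterior; a word-overlap score stands in here), then return
    the best cluster's tags in their precomputed rank order."""
    def score(cluster):
        vocab = cluster["word_counts"]
        return sum(vocab.get(w, 0) for w in doc_words)
    best = max(clusters, key=score)
    return best["ranked_tags"][:top_k]
```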

271 citations


Journal Article•DOI•
TL;DR: In this article, the authors discuss the generation, detection, and long-haul transmission of single-polarization differential quadrature phase shift keying (DQPSK) signals at a line rate of 53.5 Gbaud to support a net information bit rate of 100 Gb/s.
Abstract: We discuss the generation, detection, and long-haul transmission of single-polarization differential quadrature phase shift keying (DQPSK) signals at a line rate of 53.5 Gbaud to support a net information bit rate of 100 Gb/s. In the laboratory, we demonstrate 10-channel wavelength-division multiplexed (WDM) point-to-point transmission over 2000 km on a 150-GHz WDM grid, and 1200-km optically routed networking including 6 reconfigurable optical add/drop multiplexers (ROADMs) on a 100-GHz grid. We then report transmission over the commercial, 50-GHz spaced long-haul optical transport platform LambdaXtreme®. In a straight-line laboratory testbed, we demonstrate single-channel 700-km transmission, including an intermediate ROADM. On a field-deployed, live-traffic-bearing Verizon installation between Tampa and Miami, Florida, we achieve 500-km transmission, with no changes to the commercial system hardware or software and with 6 dB system margin. On the same operational system, we finally demonstrate 100-Gb/s DQPSK encoding on a field-programmable gate array (FPGA) and the transmission of real-time video traffic.

130 citations


Proceedings Article•DOI•
Huajing Li1, Zaiqing Nie2, Wang-Chien Lee1, C. Lee Giles1, Ji-Rong Wen2 •
26 Oct 2008
TL;DR: A hierarchical community model is proposed in the paper which distinguishes community cores from affiliated members and has high scalability to corpus size and feature dimensionality, with more than 15% topical precision improvement compared with popular clustering techniques.
Abstract: Every piece of textual data is generated as a method to convey its authors' opinion regarding specific topics. Authors deliberately organize their writings and create links, i.e., references, acknowledgments, for better expression. It is therefore of interest to study texts as well as their relations to understand the underlying topics and communities. Although many efforts exist in the literature on data clustering and topic mining, they are not applicable to community discovery on large document corpora for several reasons. First, few of them consider both textual attributes and relations. Second, scalability remains a significant issue for large-scale datasets. Additionally, most algorithms rely on a set of initial parameters that are hard to capture and tune. Motivated by the aforementioned observations, a hierarchical community model is proposed in this paper which distinguishes community cores from affiliated members. We present our efforts to develop a scalable community discovery solution for large-scale document corpora. Our proposal tries to quickly identify potential cores as seeds of communities through relation analysis. To eliminate the influence of initial parameters, an innovative attribute-based core merge process is introduced so that the algorithm promises to return consistent communities regardless of initial parameters. Experimental results suggest that the proposed method has high scalability to corpus size and feature dimensionality, with more than 15% topical precision improvement compared with popular clustering techniques.

78 citations


Proceedings Article•DOI•
22 Dec 2008
TL;DR: In this article, the feasibility of 100G overlaying existing 10G/40G commercial systems is demonstrated, showing that 100G can be achieved on a 50-GHz grid over 1,040 km of field fiber and two ROADMs.
Abstract: 111-Gb/s transmission combined with 2 × 43-Gb/s and 8 × 10.7-Gb/s on a 50-GHz grid over 1,040-km field fiber and two ROADMs is demonstrated, showing the feasibility of 100G overlaying existing 10G/40G commercial systems.

50 citations


Proceedings Article•DOI•
27 Apr 2008
TL;DR: This paper focuses on the query processing and result validation of LDSQs over static objects and proposes two algorithms, namely brute-force and delta-scanning, the latter of which significantly improves performance via space pruning.
Abstract: Given a set of data points with both spatial coordinates and non-spatial attributes, point a location-dependently dominates point b with respect to a query point q if a is closer to q than b and a also dominates b. A location-dependent skyline query (LDSQ) issued at point q retrieves all the points that are not location-dependently dominated by other points with regard to q. In this paper, we focus on the query processing and result validation of LDSQs over static objects. Two algorithms, namely brute-force and delta-scanning, are proposed. The former serves as the baseline algorithm while the latter significantly improves performance via space pruning. We further conduct a comprehensive simulation to demonstrate the performance of the proposed algorithms.
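The dominance definition above combines spatial closeness with ordinary attribute dominance. A minimal sketch of the definition and the baseline (brute-force) LDSQ evaluation, assuming smaller non-spatial attribute values are preferable (the point representation here is our own, not the paper's):

```python
import math

def dominates(a, b):
    """a dominates b on non-spatial attributes (smaller is better):
    a is no worse in every attribute and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def ld_dominates(a, b, q):
    """a location-dependently dominates b w.r.t. query point q:
    a is closer to q than b AND a dominates b on the attributes."""
    return (math.dist(a["loc"], q) < math.dist(b["loc"], q)
            and dominates(a["attrs"], b["attrs"]))

def ldsq_brute_force(points, q):
    """Baseline LDSQ: keep every point that no other point
    location-dependently dominates."""
    return [p for p in points
            if not any(ld_dominates(o, p, q) for o in points if o is not p)]
```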

49 citations


Journal Article•DOI•
TL;DR: An energy-conserving approximate storage (EASE) scheme to efficiently answer approximate location queries by keeping error-bounded imprecise location data at some designated storage node based on the mobility pattern.
Abstract: Energy efficiency is one of the most critical issues in the design of wireless sensor networks. Observing that many sensor applications for object tracking can tolerate a certain degree of imprecision in the location data of tracked objects, this paper studies precision-constrained approximate queries that trade answer precision for energy efficiency. We develop an energy-conserving approximate storage (EASE) scheme to efficiently answer approximate location queries by keeping error-bounded imprecise location data at some designated storage node. The data impreciseness is captured by a system parameter called the approximation radius. We derive the optimal setting of the approximation radius for our storage scheme based on the mobility pattern and devise an adaptive algorithm to adjust the setting when the mobility pattern is not available a priori or is dynamically changing. Simulation experiments are conducted to validate our theoretical analysis of the optimal approximation setting. The simulation results show that the proposed EASE scheme reduces the network traffic from a conventional approach by up to 96 percent and, in most cases, prolongs the network lifetime by a factor of 2-5.
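The core of the update rule can be sketched in a few lines: the tracking side pushes a location to the storage node only when the object drifts beyond the approximation radius, so stored answers are imprecise but error-bounded. This is an illustrative simplification of EASE, not the full protocol (the class and its interface are our own construction):

```python
import math

class EaseTracker:
    """Error-bounded update rule in the spirit of EASE: report a new
    location to the designated storage node only when the object has
    moved more than the approximation radius r since the last report,
    so any answer from storage is within r of the true location."""

    def __init__(self, r):
        self.r = r
        self.reported = None   # last location sent to the storage node
        self.updates = 0       # network messages generated so far

    def observe(self, loc):
        if self.reported is None or math.dist(loc, self.reported) > self.r:
            self.reported = loc
            self.updates += 1

    def answer_query(self):
        return self.reported   # imprecise, but error-bounded by r
```

A larger r trades answer precision for fewer update messages, which is exactly the energy/precision trade-off the paper optimizes via the mobility pattern.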

49 citations


Journal Article•DOI•
TL;DR: Extensive experiments demonstrate that SSW is superior to the state of the art on various aspects, including scalability, maintenance overhead, adaptivity to distribution of data and locality of interest, resilience to peer failures, load balancing, and efficiency in support of various types of queries on data objects with high dimensions.
Abstract: Peer-to-peer (P2P) systems have become a popular platform for sharing and exchanging voluminous information among thousands or even millions of users. The massive amount of information shared in such systems mandates efficient semantic-based search instead of key-based search. The majority of existing proposals can only support simple key-based search rather than semantic-based search. This paper presents the design of an overlay network, namely, semantic small world (SSW), that facilitates efficient semantic-based search in P2P systems. SSW achieves the efficiency based on four ideas: 1) semantic clustering, where peers with similar semantics organize into peer clusters, 2) dimension reduction, where to address the high maintenance overhead associated with capturing high-dimensional data semantics in the overlay, peer clusters are adaptively mapped to a one-dimensional naming space, 3) small world network, where peer clusters form into a one-dimensional small world network, which is search efficient with low maintenance overhead, and 4) efficient search algorithms, where peers perform efficient semantic-based search, including approximate point query and range query in the proposed overlay. Extensive experiments using both synthetic data and real data demonstrate that SSW is superior to the state of the art on various aspects, including scalability, maintenance overhead, adaptivity to distribution of data and locality of interest, resilience to peer failures, load balancing, and efficiency in support of various types of queries on data objects with high dimensions.

48 citations


Journal Article•DOI•
TL;DR: A tradeoff between search performance and freshness is indicated: the search cost decreases sublinearly with decreasing freshness of P2P content sharing under TTL-based consistency.
Abstract: Consistency maintenance is important to the sharing of dynamic contents in peer-to-peer (P2P) networks. The TTL-based mechanism is a natural choice for maintaining freshness in P2P content sharing. This paper investigates TTL-based consistency maintenance in unstructured P2P networks. In this approach, each replica is assigned an expiration time beyond which the replica stops serving new requests unless it is validated. While TTL-based consistency is widely explored in many client-server applications, there has been no study on TTL-based consistency in P2P networks. Our main contribution is an analytical model that studies the search performance and the freshness of P2P content sharing under TTL-based consistency. Due to the random nature of request routing, P2P networks are fundamentally different from most existing TTL-based systems in that every node with a valid replica has the potential to serve any other node. We identify and discuss the factors that affect the performance of P2P content sharing under TTL-based consistency. Our results indicate a tradeoff between search performance and freshness: the search cost decreases sublinearly with decreasing freshness of P2P content sharing. We also compare two types of unstructured P2P networks and find that clustered P2P networks improve the freshness of content sharing over flat P2P networks under TTL-based consistency.
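The TTL mechanism the paper analyzes can be modeled minimally: a replica serves requests until its expiration time, after which it must be validated before serving again. The sketch below is our own illustration of that rule, not the paper's analytical model:

```python
class TTLReplica:
    """Minimal model of TTL-based consistency: a replica fetched at
    fetched_at with time-to-live ttl serves requests until expiry,
    then must revalidate against the origin before serving again.
    Longer TTLs reduce validation traffic but lower freshness."""

    def __init__(self, ttl, fetched_at, version):
        self.ttl = ttl
        self.expires_at = fetched_at + ttl
        self.version = version

    def is_valid(self, now):
        return now < self.expires_at

    def serve(self, now, origin):
        """Serve from the replica if unexpired, else revalidate first."""
        if not self.is_valid(now):
            self.version = origin["version"]     # validation round trip
            self.expires_at = now + self.ttl
        return self.version
```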

40 citations


Journal Article•DOI•
TL;DR: This work introduces a new variant of RNN query, namely, ranked reverse nearest neighbor (RRNN) query, that retrieves the t data points most influenced by q, i.e., the t data points having the smallest κ values with respect to q.
Abstract: Given a set of data points P and a query point q in a multidimensional space, a reverse nearest neighbor (RNN) query finds data points in P whose nearest neighbors are q. A reverse k-nearest neighbor (RkNN) query (where k ≥ 1) generalizes the RNN query to find data points whose kNNs include q. Under RkNN query semantics, q is said to have influence on all those answer data points. The degree of q's influence on a data point p ∈ P is denoted by κp, where q is the κp-th NN of p. We introduce a new variant of RNN query, namely, the ranked reverse nearest neighbor (RRNN) query, that retrieves the t data points most influenced by q, i.e., the t data points having the smallest κ values with respect to q. To answer this RRNN query efficiently, we propose two novel algorithms, κ-counting and κ-browsing, that are applicable to both monochromatic and bichromatic scenarios and are able to deliver results progressively. Through an extensive performance evaluation, we validate that the two proposed RRNN algorithms are superior to solutions derived from algorithms designed for RkNN query.
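The definition of κp translates directly into a brute-force baseline: κp is one plus the number of data points closer to p than q is, and the RRNN answer is the t points with the smallest κ. This sketch is the naive baseline, not the paper's κ-counting or κ-browsing algorithms:

```python
import math

def rrnn_brute_force(points, q, t):
    """Baseline RRNN: compute kappa_p for every point p (the rank of q
    among p's nearest neighbours) and return the t points with the
    smallest kappa_p, i.e., the points q influences the most."""
    kappas = []
    for p in points:
        others = (o for o in points if o is not p)
        # kappa_p = 1 + number of other data points closer to p than q is
        kappa = 1 + sum(math.dist(o, p) < math.dist(q, p) for o in others)
        kappas.append((kappa, p))
    kappas.sort(key=lambda kp: kp[0])
    return [p for _, p in kappas[:t]]
```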

36 citations


Proceedings Article•DOI•
24 Feb 2008
TL;DR: In this article, a 107Gb/s field trial was conducted on a traffic carrying longhaul LambdaXtreme® transport platform over an active 504-km Verizon route in Florida thus proving upgradeability to 100 G of the Alcatel-Lucent 50GHz spaced ULH DWDM system.
Abstract: A 107-Gb/s field trial was conducted on a traffic carrying long-haul LambdaXtreme® transport platform over an active 504-km Verizon route in Florida thus proving upgradeability to 100 G of the Alcatel-Lucent 50-GHz spaced ULH DWDM system.

35 citations


Proceedings Article•DOI•
17 Dec 2008
TL;DR: The rule-based localization methods proposed in this paper achieve much higher accuracy than the state-of-the-art localization methods, namely, RADAR, LOCADIO and WHAM!.
Abstract: The rule-based localization methods proposed in this paper are based on two important observations. First, although the absolute RSS values change with time, the relative RSS (RRSS) values between several Access Points (APs) are more stable than the absolute RSSs. Thus, we can use RRSSs as rules for inferring a client's location. Second, when a unique location cannot be obtained based on RRSS rules, the localization process can backtrack to the previously observed client location. By analyzing the accessible paths on the floor plan, locations that are not reachable from the previous location can be disqualified. Based on these two key observations, we propose several localization methods, implement them in a live environment, and conduct extensive experiments to measure the localization accuracy of the proposed methods. We found that our methods achieve much higher accuracy than the state-of-the-art localization methods, namely, RADAR, LOCADIO and WHAM!.
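Both observations can be sketched together: match the observed AP ordering (RRSS) against per-location rules, then break ties using floor-plan reachability from the previous location. The rule table and reachability map below are invented for illustration; the paper's actual rules are derived from measurements:

```python
def rrss_signature(rss):
    """Order access points by signal strength; this relative ordering
    (RRSS) is more stable over time than the absolute RSS values."""
    return tuple(sorted(rss, key=rss.get, reverse=True))

def locate(rss, rules, previous=None, reachable=None):
    """Rule-based localization sketch: match the observed AP ordering
    against per-location RRSS rules; if several locations match, keep
    only candidates reachable from the previous location."""
    candidates = [loc for loc, sig in rules.items()
                  if sig == rrss_signature(rss)]
    if len(candidates) > 1 and previous is not None and reachable:
        candidates = [c for c in candidates
                      if c in reachable.get(previous, {previous})]
    return candidates
```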

Journal Article•DOI•
TL;DR: This survey paper reviews important works in two key dimensions of pervasive data access, data broadcast and client caching, and also covers data access techniques aimed at various application requirements.
Abstract: The rapid advance of wireless and portable computing technology has brought a great deal of research interest and momentum to the area of mobile computing. One of the research focuses is pervasive data access. With wireless connections, users can access information at any place at any time. However, various constraints such as limited client capability, limited bandwidth, weak connectivity, and client mobility impose many challenging technical issues. In the past years, tremendous research efforts have been put forth to address the issues related to pervasive data access. A number of interesting research results were reported in the literature. This survey paper reviews important works in two key dimensions of pervasive data access: data broadcast and client caching. In addition, data access techniques aimed at various application requirements (such as time, location, semantics and reliability) are covered. Copyright © 2006 John Wiley & Sons, Ltd.

Proceedings Article•DOI•
30 Oct 2008
TL;DR: The implicit user feedback from access logs in the CiteSeer academic search engine is analyzed and it is shown how site structure can better inform the analysis of clickthrough feedback providing accurate personalized ranking services tailored to individual information retrieval systems.
Abstract: Given the exponential increase of indexable content on the Web, ranking is an increasingly difficult problem in information retrieval systems. Recent research shows that implicit feedback regarding user preferences can be extracted from web access logs in order to increase ranking performance. We analyze the implicit user feedback from access logs in the CiteSeer academic search engine and show how site structure can better inform the analysis of clickthrough feedback, providing accurate personalized ranking services tailored to individual information retrieval systems. Experiments and analysis show that our proposed method is more accurate at predicting user preferences than any non-personalized ranking method when user preferences are stable over time. We compare our method with several non-personalized ranking methods including ranking SVMlight as well as several ranking functions specific to the academic document domain. The results show that our ranking algorithm can reach 63.59% accuracy in comparison to 50.02% for ranking SVMlight and below 43% for all other single-feature ranking methods. We also show how the derived personalized ranking vectors can be employed for other ranking-related purposes such as recommendation systems.

Proceedings Article•DOI•
Mei Li1, Wang-Chien Lee•
17 Jun 2008
TL;DR: This paper defines the problem of identifying frequent items (IFI) and proposes an efficient in-network processing technique, called in-network filtering (netFilter), to address this important fundamental problem.
Abstract: As peer-to-peer (P2P) systems receive growing acceptance, the need of identifying 'frequent items' in such systems appears in a variety of applications. In this paper, we define the problem of identifying frequent items (IFI) and propose an efficient in-network processing technique, called in-network filtering (netFilter), to address this important fundamental problem. netFilter operates in two phases: 1) candidate filtering: data items are grouped into item groups to obtain aggregates for pruning of infrequent items; and 2) candidate verification: the aggregates for the remaining candidate items are obtained to filter out false frequent items. We address various issues faced in realizing netFilter, including aggregate computation, candidate set optimization, and candidate set materialization. In addition, we analyze the performance of netFilter, derive the optimal setting analytically, and discuss how to achieve the optimal setting in practice. Finally, we validate the effectiveness of netFilter through extensive simulation.
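The two phases can be illustrated with a centralized stand-in: group aggregates prune whole groups that cannot contain a frequent item, then per-item counts verify the survivors. In a real netFilter deployment the aggregates are combined in-network across peers; the function below only sketches the filtering logic:

```python
from collections import Counter

def net_filter(peer_counts, groups, threshold):
    """Two-phase sketch of netFilter's filtering logic.
    Phase 1 (candidate filtering): discard every item group whose
    aggregate count is below the threshold -- no item inside it can be
    frequent. Phase 2 (candidate verification): check the remaining
    candidates item by item to drop false frequent items."""
    total = Counter()
    for counts in peer_counts:          # global per-item counts
        total.update(counts)

    candidates = set()
    for group in groups:                # phase 1: group-level pruning
        if sum(total[i] for i in group) >= threshold:
            candidates.update(group)

    # phase 2: verify each surviving candidate individually
    return {i for i in candidates if total[i] >= threshold}
```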

Proceedings Article•DOI•
17 Jun 2008
TL;DR: This paper presents the design of a contour mapping engine (CME) in wireless sensor networks that incorporates in-network processing based on binary classification to reduce the total number of active nodes and shows the superiority of CME over the state-of-the-art contour mapping techniques.
Abstract: Contour maps, showing topological distribution of extracted features, are crucial for many applications. Building a dynamic contour map in a wireless sensor network is a challenging task due to the constrained network resources. In this paper, we present the design of a contour mapping engine (CME) in wireless sensor networks. Our design incorporates in-network processing based on binary classification to reduce the total number of active nodes. The underlying network architecture is analyzed to derive an optimal configuration. We show, by extensive simulations, the superiority of CME over the state-of-the-art contour mapping techniques.

Proceedings Article•DOI•
26 Oct 2008
TL;DR: Efficient algorithms to determine valid scopes for various LDSQs including range, window and nearest neighbor queries along with LDSQ processing over a broadcast channel are devised, thus providing faster query response and saving client energy.
Abstract: Wireless data broadcast is an efficient and scalable means to provide information access for a large population of clients in mobile environments. With Location-Based Services (LBSs) deployed upon a broadcast channel, mobile clients can collect data from the channel to answer their location-dependent spatial queries (LDSQs). Since the results of LDSQs become invalid when a mobile client moves to a new location, knowledge of the valid scopes of LDSQ results is necessary to help clients determine whether their previous LDSQ results can be reused after they move. This effectively improves query response time and client energy consumption. In this paper, we devise efficient algorithms to determine valid scopes for various LDSQs, including range, window and nearest neighbor queries, along with LDSQ processing over a broadcast channel. We conduct an extensive set of experiments to evaluate the performance of our proposed algorithms. While the proposed valid scope algorithm incurs only a little extra processing overhead, unnecessary LDSQ reevaluation is largely eliminated, thus providing faster query responses and saving client energy.
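For a nearest-neighbor LDSQ the idea is easy to sketch: return the answer together with a region inside which it stays valid, and let a moved client reuse its cached answer while it remains inside. The circle of radius (d2 − d1)/2 below is a simple conservative under-approximation of the true valid scope (the answer's Voronoi cell), chosen for illustration rather than taken from the paper:

```python
import math

def valid_scope_nn(points, q):
    """Answer an NN query and attach a conservative valid scope: while
    the client stays within (d2 - d1) / 2 of q (d1, d2 = distances to
    the first and second NN), the cached answer is still the true NN."""
    ranked = sorted(points, key=lambda p: math.dist(p, q))
    nn, second = ranked[0], ranked[1]
    radius = (math.dist(second, q) - math.dist(nn, q)) / 2
    return nn, (q, radius)

def can_reuse(scope, new_pos):
    """A moved client reuses its cached result while inside the scope."""
    center, radius = scope
    return math.dist(center, new_pos) <= radius
```

Moving a distance m from q changes the NN distance by at most +m and any other distance by at most −m, so the answer is safe whenever m ≤ (d2 − d1)/2.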

Proceedings Article•DOI•
25 Nov 2008
TL;DR: A 107-Gbps field trial carrying live HDTV traffic over a 504-km in-service DWDM route, proving commercial systems designed for 10G/40G can be upgraded to 100G without impacting existing active channels.
Abstract: A 107-Gbps field trial carrying live HDTV traffic over a 504-km in-service DWDM route, proving commercial systems designed for 10G/40G can be upgraded to 100G without impacting existing active channels.

Proceedings Article•DOI•
25 Mar 2008
TL;DR: This paper considers a near-future scenario in which a mobile device can process queries using information simultaneously received from multiple channels, and proposes an optimization technique, called approximate-NN (ANN), to reduce energy consumption in mobile devices.
Abstract: Wireless broadcast is an efficient way for information dissemination due to its good scalability [10]. Existing works typically assume mobile devices, such as cell phones and PDAs, can access only one channel at a time. In this paper, we consider a near-future scenario in which a mobile device has the ability to process queries using information simultaneously received from multiple channels. We focus on the query processing of the transitive nearest neighbor (TNN) search [19]. Two TNN algorithms developed for a single broadcast channel environment are adapted to our new broadcast environment. Based on the obtained insights, we propose two new algorithms, namely the Double-NN-Search and Hybrid-NN-Search algorithms. Further, we develop an optimization technique, called approximate-NN (ANN), to reduce the energy consumption in mobile devices. Finally, we conduct a comprehensive set of experiments to validate our proposals. The results show that our new algorithms provide better performance than the existing ones and the optimization technique efficiently reduces energy consumption.

Journal Article•DOI•
TL;DR: Simulation results show that PSGR exhibits superior performance in terms of energy consumption, routing latency, and delivery rate, and soundly outperforms all of the compared protocols.
Abstract: Volunteer forwarding, as an emerging routing idea for large-scale, location-aware wireless sensor networks, has recently received significant attention. However, several critical research issues raised by volunteer forwarding, including communication collisions, communication voids, and time-critical routing, have not been well addressed by the existing work. In this paper, we propose a priority-based stateless geo-routing (PSGR) protocol that addresses these issues. Based on PSGR, sensor nodes are able to locally determine their priority to serve as the next relay node using dynamically estimated network density. This effectively suppresses potential communication collisions without prolonging routing delays. PSGR also overcomes the communication void problem using two alternative stateless schemes, rebroadcast and bypass. Meanwhile, PSGR supports routing of time-critical packets with different deadline requirements at no extra communication cost. Additionally, we analyze the energy consumption and the delivery rate of PSGR as functions of the transmission range. Finally, an extensive performance evaluation has been conducted to compare PSGR with competing protocols, including GeRaf, IGF, GPSR, flooding, and MSPEED. Simulation results show that PSGR exhibits superior performance in terms of energy consumption, routing latency, and delivery rate, and soundly outperforms all of the compared protocols.
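The priority rule at the heart of volunteer forwarding can be sketched with a backoff timer: every candidate that hears a packet sets a delay that shrinks with its geographic progress toward the destination, so the best-placed node volunteers first and its transmission suppresses the rest. This is a generic simplification of the idea, not PSGR's density-based priority computation:

```python
import math

def relay_delay(node, sender, dest, max_delay=1.0, tx_range=1.0):
    """Backoff delay for a candidate relay: more progress toward the
    destination means a shorter delay (higher priority). Nodes offering
    no progress never volunteer."""
    progress = math.dist(sender, dest) - math.dist(node, dest)
    if progress <= 0:
        return None
    return max_delay * (1 - progress / tx_range)

def next_relay(candidates, sender, dest):
    """The node whose timer fires first becomes the next relay."""
    delays = {n: d for n in candidates
              if (d := relay_delay(n, sender, dest)) is not None}
    return min(delays, key=delays.get) if delays else None
```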

Journal Article•DOI•
TL;DR: This study presents an extensive analysis into the workload of scientific literature digital libraries, unveiling their temporal and user interest patterns and investigates how to utilize the findings to improve system performance.
Abstract: Workload studies of large-scale systems may help locate possible bottlenecks and improve performance. However, previous workload analysis for Web applications is typically focused on generic platforms, neglecting the unique characteristics exhibited in various domains of these applications. It is observed that different application domains have intrinsically heterogeneous characteristics, which have a direct impact on system performance. In this study, we present an extensive analysis of the workload of scientific literature digital libraries, unveiling their temporal and user interest patterns. Logs of a computer science literature digital library, CiteSeer, are collected and analyzed. We intentionally remove service details specific to CiteSeer. We believe our analysis is applicable to other systems with similar characteristics. While many of our findings are consistent with previous Web analysis, we discover several unique characteristics of scientific literature digital library workload. Furthermore, we discuss how to utilize our findings to improve system performance.

Journal Article•DOI•
TL;DR: This study investigates K nearest neighbors query (KNN) on high dimensional data objects in P2P systems and proposes efficient query algorithm and solutions that address various technical challenges raised by high dimensionality, such as search space resolution and incremental search space refinement.
Abstract: Peer-to-peer systems have been widely used for sharing and exchanging data and resources among numerous computer nodes. Various data objects identifiable with high-dimensional feature vectors, such as text, images, and genome sequences, are starting to leverage P2P technology. Most of the existing works have been focusing on queries on data objects with one or few attributes and thus are not applicable to high-dimensional data objects. In this study, we investigate the K nearest neighbors query (KNN) on high-dimensional data objects in P2P systems. An efficient query algorithm and solutions that address various technical challenges raised by high dimensionality, such as search space resolution and incremental search space refinement, are proposed. An extensive simulation using both synthetic and real data sets demonstrates that our proposal efficiently supports KNN queries on high-dimensional data in P2P systems.

Proceedings Article•DOI•
Yang Sun1, Huajing Li1, Isaac G. Councill1, Wang-Chien Lee1, C. Lee Giles1 •
26 Oct 2008
TL;DR: A study that measures the changes of user preferences based on an analysis of access logs of a large-scale digital library over one year shows that the majority of user actions should be predictable from previous browsing behavior in the digital library.
Abstract: Much research has been conducted using web access logs to study implicit user feedback and infer user preferences from clickstreams. However, little research measures the changes of user preferences in ranking documents over time. We present a study that measures the changes of user preferences based on an analysis of access logs of a large-scale digital library over one year. A metric based on the accuracy of predicting future user actions is proposed. The results show that although user preferences change over time, the majority of user actions should be predictable from previous browsing behavior in the digital library.

Book Chapter•DOI•
09 Jul 2008
TL;DR: This paper introduces the correlation query, which finds correlated pairs of objects often appearing close to each other in a given sequence, and proposes the One-Scan Algorithm (OSA) and Index-Based Algorithm (IBA), with IBA significantly outperforming the others as the most efficient.
Abstract: A sequence, widely appearing in various applications (e.g., event logs, text documents, etc.), is an ordered list of objects. Exploring correlated objects in a sequence can provide useful knowledge about the objects, e.g., event causality in event logs and word phrases in documents. In this paper, we introduce the correlation query that finds correlated pairs of objects often appearing close to each other in a given sequence. A correlation query is specified by two control parameters: the distance bound, the requirement on object closeness, and the correlation threshold, the minimum requirement on the correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times, which we call the Multi-Scan Algorithm (MSA), we propose the One-Scan Algorithm (OSA) and the Index-Based Algorithm (IBA). OSA accesses a queried sequence once, and IBA considers the correlation threshold during execution and effectively eliminates unneeded candidates from detailed examination. An extensive set of experiments is conducted to evaluate all these algorithms. Among them, IBA, significantly outperforming the others, is the most efficient.
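A single pass over the sequence suffices to count close co-occurrences, in the spirit of OSA. The correlation score below (co-occurrences divided by the rarer object's frequency) is an assumed stand-in for the paper's correlation measure:

```python
from collections import Counter

def correlated_pairs(seq, distance_bound, threshold):
    """One-scan sketch: slide over the sequence once, counting pairs of
    distinct objects that co-occur within distance_bound positions, then
    score each pair by a simple support ratio (assumed measure) and keep
    pairs scoring at least the correlation threshold."""
    occur = Counter(seq)
    close = Counter()
    for i, a in enumerate(seq):
        for j in range(i + 1, min(i + distance_bound + 1, len(seq))):
            if a != seq[j]:
                close[frozenset((a, seq[j]))] += 1
    result = {}
    for pair, c in close.items():
        x, y = tuple(pair)
        score = c / min(occur[x], occur[y])   # assumed correlation measure
        if score >= threshold:
            result[tuple(sorted(pair))] = score
    return result
```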

Proceedings Article•DOI•
08 Dec 2008
TL;DR: Experimental results show that ACM is able to discover community structures with high quality while outperforming the existing approaches and employs an asynchronous strategy such that local clustering is executed without requiring an expensive global clustering to be performed in a synchronous fashion.
Abstract: Most social networks exhibit community structures, in which nodes are tightly connected to each other within a community but only loosely connected to nodes in other communities. Research on community mining has received a lot of attention; however, most of it is based on a centralized system model and thus not applicable to the distributed model of P2P networks. In this paper, we propose a distributed community mining algorithm, namely the Asynchronous Clustering and Merging (ACM) scheme, for P2P computing environments. Due to the dynamic and distributed nature of P2P networks, the ACM scheme employs an asynchronous strategy such that local clustering is executed without requiring an expensive global clustering to be performed in a synchronous fashion. Experimental results show that ACM is able to discover community structures with high quality while outperforming the existing approaches.


01 Jan 2008
TL;DR: A tradeoff between search performance and freshness is indicated: the search cost decreases sublinearly with decreasing freshness of P2P content sharing under TTL-based consistency.
Abstract: Consistency maintenance is important to the sharing of dynamic contents in peer-to-peer (P2P) networks. The TTL-based mechanism is a natural choice for maintaining freshness in P2P content sharing. This paper investigates TTL-based consistency maintenance in unstructured P2P networks. In this approach, each replica is assigned an expiration time beyond which the replica stops serving new requests unless it is validated. While TTL-based consistency is widely explored in many client-server applications, there has been no study on TTL-based consistency in P2P networks. Our main contribution is an analytical model that studies the search performance and the freshness of P2P content sharing under TTL-based consistency. Due to the random nature of request routing, P2P networks are fundamentally different from most existing TTL-based systems in that every node with a valid replica has the potential to serve any other node. We identify and discuss the factors that affect the performance of P2P content sharing under TTL-based consistency. Our results indicate a tradeoff between search performance and freshness: the search cost decreases sublinearly with decreasing freshness of P2P content sharing. We also compare two types of unstructured P2P networks and find that clustered P2P networks improve the freshness of content sharing over flat P2P networks under TTL-based consistency. Index Terms—Unstructured P2P network, TTL-based consistency, replication, consistency maintenance, content distribution.
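
The TTL rule described above (a replica stops serving new requests once it expires, unless it is validated) can be sketched as follows. The `Replica` class and its `origin_fetch` callback are hypothetical names introduced for illustration, not the paper's interface.

```python
import time

class Replica:
    """A cached copy that serves requests only while its TTL is valid;
    once expired it must be refreshed from the origin before serving."""
    def __init__(self, value, ttl, now=None):
        self.value = value
        self.ttl = ttl
        self.expires_at = (now if now is not None else time.time()) + ttl

    def is_valid(self, now=None):
        return (now if now is not None else time.time()) < self.expires_at

    def serve(self, origin_fetch, now=None):
        """Return the value, revalidating from the origin if expired."""
        now = now if now is not None else time.time()
        if not self.is_valid(now):
            self.value = origin_fetch()          # refresh stale content
            self.expires_at = now + self.ttl     # restart the TTL clock
        return self.value
```

Passing an explicit `now` keeps the sketch deterministic; a longer TTL means fewer refreshes (better search performance) at the cost of staler replicas, which is exactly the tradeoff the paper analyzes.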

Proceedings Article•DOI•
17 Jun 2008
TL;DR: This study investigates the problem of monitoring changes on the data distribution in networks (MCDN) and proposes a technique, called wavenet, which compresses the local item set at each host node into a compact yet accurate summary, called a local wavelet, for communication with the coordinator.
Abstract: A massive amount of data is available in distributed fashion on various networks, including the Internet, peer-to-peer networks, and wireless sensor networks. Users are often interested in monitoring interesting patterns or abnormal events hidden in these data. Transferring all the raw data from each host node to a central coordinator for processing is costly and unnecessary. In this study, we investigate the problem of monitoring changes on the data distribution in the networks (MCDN). To address this problem, we propose a technique, called wavenet, which compresses the local item set at each host node into a compact yet accurate summary, called a local wavelet, for communication with the coordinator. We also propose adaptive monitoring to address the issues of local wavelet propagation in wavenet. An extensive performance evaluation has been conducted to validate our proposal and demonstrates the efficiency of wavenet.
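
As a rough illustration of compressing local data into a wavelet summary, the sketch below computes a Haar wavelet transform of a local frequency vector and keeps only the k largest-magnitude coefficients. This is the standard wavelet-synopsis construction, used here as an assumption about what a "local wavelet" might look like, not the paper's actual wavenet technique.

```python
def haar_transform(data):
    """Full Haar decomposition of a length-2^k vector of frequencies."""
    coeffs = list(data)
    n = len(coeffs)
    output = [0.0] * n
    while n > 1:
        half = n // 2
        for i in range(half):
            output[i] = (coeffs[2*i] + coeffs[2*i+1]) / 2.0         # averages
            output[half + i] = (coeffs[2*i] - coeffs[2*i+1]) / 2.0  # details
        coeffs[:n] = output[:n]
        n = half
    return coeffs

def inverse_haar(coeffs):
    """Reconstruct the original vector from Haar coefficients."""
    data = list(coeffs)
    n = 1
    while n < len(data):
        out = list(data)
        for i in range(n):
            a, d = data[i], data[n + i]
            out[2*i] = a + d
            out[2*i+1] = a - d
        data[:2*n] = out[:2*n]
        n *= 2
    return data

def top_k_synopsis(coeffs, k):
    """Keep only the k largest-magnitude coefficients as the compact summary."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {i: coeffs[i] for i in ranked[:k]}
```

Only the sparse coefficient map travels to the coordinator; dropped coefficients are treated as zero on reconstruction, trading a little accuracy for far less communication.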

Proceedings Article•DOI•
26 Oct 2008
TL;DR: This poster develops ROAD, a system framework for processing location-dependent spatial queries (LDSQs) that search for spatial objects of interest on road networks, and explains how it can support efficient location-dependent nearest neighbor search.
Abstract: In this research, we develop ROAD, a system framework for processing location-dependent spatial queries (LDSQs) that search for spatial objects of interest on road networks. By exploiting search space pruning, ROAD is very efficient and flexible for various LDSQs on different types of objects over large-scale networks. In ROAD, a large road network is organized as a set of interconnected regional sub-networks (called Rnets) augmented with 1) shortcuts for accelerating search traversals; and 2) object abstracts for guiding object search. In this poster, we outline this framework and explain how it can support efficient location-dependent nearest neighbor search.
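
A minimal sketch of the pruning idea, under the simplifying assumption that an "object abstract" merely records whether an Rnet contains any objects: edges of non-empty Rnets are expanded normally, while empty Rnets are replaced by their precomputed border-to-border shortcuts before an ordinary nearest-object Dijkstra runs. The `nodes`/`edges`/`shortcuts` layout is invented for illustration and is not ROAD's actual data structure.

```python
import heapq

def build_search_graph(rnets, objects):
    """Assemble an adjacency list: expand an Rnet's interior edges only if
    its object abstract is non-empty; otherwise use only its shortcuts."""
    adj = {}
    def add_edge(u, v, w):
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    for rnet in rnets:
        if objects & rnet["nodes"]:           # abstract says objects inside
            for u, v, w in rnet["edges"]:
                add_edge(u, v, w)
        else:                                  # empty Rnet: shortcuts only
            for u, v, w in rnet["shortcuts"]:
                add_edge(u, v, w)
    return adj

def dijkstra_nearest(adj, source, objects):
    """Plain Dijkstra that stops at the first settled node holding an object."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    seen = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u in objects:
            return u, d
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return None, float("inf")
```

The search thus crosses empty regions in a single hop instead of expanding their interior nodes, which is the essence of the search space pruning described above.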

Proceedings Article•DOI•
26 Oct 2008
TL;DR: This paper studies the problem of answering probabilistic range queries on moving objects based on an uncertainty model that captures the possible movements of objects with probabilities; a performance study shows the proposal significantly reduces the number of object examinations and the overall cost of query evaluation.
Abstract: Range queries for querying the current and future positions of moving objects have received growing interest in the research community. Existing methods, however, assume that an object only moves along an anticipated path. In this paper, we study the problem of answering probabilistic range queries on moving objects based on an uncertainty model, which captures the possible movements of objects with probabilities. We conduct a performance study, which shows that our proposal significantly reduces the number of object examinations and the overall cost of query evaluation.
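
For a one-dimensional special case, assuming an object's possible position is uniform over an uncertainty interval, the probability of appearing inside a query range reduces to an overlap ratio, and objects below a probability threshold can be filtered out. This uniform 1-D model is an illustrative simplification of the paper's uncertainty model, with all names invented here.

```python
def appearance_probability(obj_lo, obj_hi, q_lo, q_hi):
    """P(object in [q_lo, q_hi]) for a position uniform on [obj_lo, obj_hi]."""
    overlap = max(0.0, min(obj_hi, q_hi) - max(obj_lo, q_lo))
    return overlap / (obj_hi - obj_lo)

def probabilistic_range_query(objects, q_lo, q_hi, threshold):
    """Return (id, probability) pairs meeting the threshold, sorted by id."""
    results = []
    for oid, (lo, hi) in objects.items():
        p = appearance_probability(lo, hi, q_lo, q_hi)
        if p >= threshold:
            results.append((oid, p))
    return sorted(results)
```

An object whose uncertainty interval misses the range entirely gets probability 0 and is pruned without further examination, which is the kind of saving the performance study measures.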

Journal Article•DOI•
TL;DR: Of particular interest are the issues that arise in the design of storage management and indexing structures that combine sensor system workloads with the read/write/erase characteristics of flash memory.
Abstract: Wireless sensor networks are used in a large array of applications to capture, collect, and analyze physical environmental data. Many existing sensor systems instruct sensor nodes to report their measurements to central repositories outside the network, which is expensive in energy cost. Recent technological advances in flash memory have given rise to the development of storage-centric sensor networks, where sensor nodes are equipped with high-capacity flash memory storage so that sensor data can be stored and managed inside the network to reduce expensive communication. This novel architecture calls for new data management techniques to fully exploit distributed in-network data storage. This paper describes some of our research on distributed query processing in such flash-based sensor networks. Of particular interest are the issues that arise in the design of storage management and indexing structures that combine sensor system workloads with the read/write/erase characteristics of flash memory.
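
The flash characteristics mentioned (no in-place updates, erase-before-rewrite at block granularity) are what motivate log-structured storage designs. A toy sketch follows, with all names, the index layout, and the full-log behavior invented for illustration; real designs add garbage collection and wear leveling.

```python
class FlashLog:
    """Toy log-structured store for flash: records are appended out of place,
    and a whole erase block would have to be erased before its pages could
    be rewritten, so updates never touch old pages."""
    def __init__(self, pages_per_block, num_blocks):
        self.pages_per_block = pages_per_block
        self.blocks = [[None] * pages_per_block for _ in range(num_blocks)]
        self.head = 0        # next free page in the log (global page index)
        self.index = {}      # key -> (block, page) of the latest version

    def write(self, key, value):
        """Append a record; the old version is only invalidated in the index."""
        block, page = divmod(self.head, self.pages_per_block)
        if block >= len(self.blocks):
            raise RuntimeError("log full: garbage collection needed")
        self.blocks[block][page] = (key, value)
        self.index[key] = (block, page)
        self.head += 1

    def read(self, key):
        block, page = self.index[key]
        return self.blocks[block][page][1]
```

Note that after an update, the stale record still physically occupies its old page; reclaiming it would require erasing the whole block, which is exactly the read/write/erase asymmetry the indexing structures must accommodate.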