
Showing papers by "Wang-Chien Lee published in 2008"


Proceedings Article•DOI•
20 Jul 2008
TL;DR: Experiments on large-scale tagging datasets of scientific documents (CiteULike) and web pages (del.icio.us) indicate that the proposed framework for real-time tag recommendation is capable of making tag recommendations efficiently and effectively.
Abstract: Tags are user-generated labels for entities. Existing research on tag recommendation either focuses on improving its accuracy or on automating the process, while ignoring the efficiency issue. We propose a highly-automated novel framework for real-time tag recommendation. The tagged training documents are treated as triplets of (words, docs, tags), and represented in two bipartite graphs, which are partitioned into clusters by Spectral Recursive Embedding (SRE). Tags in each topical cluster are ranked by our novel ranking algorithm. A two-way Poisson Mixture Model (PMM) is proposed to model the document distribution into mixture components within each cluster and aggregate words into word clusters simultaneously. A new document is classified by the mixture model based on its posterior probabilities so that tags are recommended according to their ranks. Experiments on large-scale tagging datasets of scientific documents (CiteULike) and web pages (del.icio.us) indicate that our framework is capable of making tag recommendations efficiently and effectively. The average tagging time for testing a document is around 1 second, with over 88% of test documents correctly labeled with the top nine tags we suggested.
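The pipeline above classifies a new document into a topical cluster and then recommends that cluster's pre-ranked tags. A minimal sketch of that final step, with a plain word-overlap score standing in for the paper's two-way PMM posterior (the cluster structure and field names here are assumptions, not the authors' data model):

```python
from collections import Counter

def recommend_tags(doc_words, clusters, top_k=3):
    """Cluster-based tag recommendation sketch: score each topical
    cluster against the new document (the paper uses a Poisson Mixture
    Model posterior; a word-overlap score stands in here), then return
    the best cluster's tags in their precomputed rank order."""
    def score(cluster):
        vocab = cluster["word_counts"]
        return sum(vocab.get(w, 0) for w in doc_words)
    best = max(clusters, key=score)
    return best["ranked_tags"][:top_k]
```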

271 citations


Journal Article•DOI•
TL;DR: In this article, the authors discuss the generation, detection, and long-haul transmission of single-polarization differential quadrature phase shift keying (DQPSK) signals at a line rate of 53.5 Gbaud to support a net information bit rate of 100 Gb/s.
Abstract: We discuss the generation, detection, and long-haul transmission of single-polarization differential quadrature phase shift keying (DQPSK) signals at a line rate of 53.5 Gbaud to support a net information bit rate of 100 Gb/s. In the laboratory, we demonstrate 10-channel wavelength-division multiplexed (WDM) point-to-point transmission over 2000 km on a 150-GHz WDM grid, and 1200-km optically routed networking including 6 reconfigurable optical add/drop multiplexers (ROADMs) on a 100-GHz grid. We then report transmission over the commercial, 50-GHz spaced long-haul optical transport platform LambdaXtreme®. In a straight-line laboratory testbed, we demonstrate single-channel 700-km transmission, including an intermediate ROADM. On a field-deployed, live-traffic-bearing Verizon installation between Tampa and Miami, Florida, we achieve 500-km transmission, with no changes to the commercial system hardware or software and with 6 dB system margin. On the same operational system, we finally demonstrate 100-Gb/s DQPSK encoding on a field-programmable gate array (FPGA) and the transmission of real-time video traffic.

130 citations


Proceedings Article•DOI•
Huajing Li1, Zaiqing Nie2, Wang-Chien Lee1, C. Lee Giles1, Ji-Rong Wen2 •
26 Oct 2008
TL;DR: A hierarchical community model is proposed in the paper which distinguishes community cores from affiliated members and has high scalability to corpus size and feature dimensionality, with more than 15% topical precision improvement compared with popular clustering techniques.
Abstract: Every piece of textual data is generated as a method to convey its authors' opinion regarding specific topics. Authors deliberately organize their writings and create links, i.e., references, acknowledgments, for better expression. It is therefore of interest to study texts as well as their relations to understand the underlying topics and communities. Although many efforts exist in the literature on data clustering and topic mining, they are not applicable to community discovery on large document corpora for several reasons. First, few of them consider both textual attributes and relations. Second, scalability remains a significant issue for large-scale datasets. Additionally, most algorithms rely on a set of initial parameters that are hard to capture and tune. Motivated by the aforementioned observations, a hierarchical community model is proposed in this paper which distinguishes community cores from affiliated members. We present our efforts to develop a scalable community discovery solution for large-scale document corpora. Our proposal tries to quickly identify potential cores as seeds of communities through relation analysis. To eliminate the influence of initial parameters, an innovative attribute-based core merge process is introduced so that the algorithm promises to return consistent communities regardless of initial parameters. Experimental results suggest that the proposed method has high scalability to corpus size and feature dimensionality, with more than 15% topical precision improvement compared with popular clustering techniques.

78 citations


Proceedings Article•DOI•
22 Dec 2008
TL;DR: In this article, the feasibility of 100G overlaying existing 10G/40G commercial systems is demonstrated, showing that 100G can be achieved on a 50-GHz grid over 1,040 km of field fiber and two ROADMs.
Abstract: 111-Gb/s transmission combined with 2 × 43-Gb/s and 8 × 10.7-Gb/s on a 50-GHz grid over 1,040-km field fiber and two ROADMs is demonstrated, showing the feasibility of 100G overlaying existing 10G/40G commercial systems.

50 citations


Proceedings Article•DOI•
27 Apr 2008
TL;DR: This paper focuses on the query processing and result validation of LDSQs over static objects and proposes two algorithms, namely brute-force and delta-scanning, the latter of which significantly improves performance via space pruning.
Abstract: Given a set of data points with both spatial coordinates and non-spatial attributes, point a location-dependently dominates point b with respect to a query point q if a is closer to q than b and a also dominates b. A location-dependent skyline query (LDSQ) issued at point q retrieves all the points that are not location-dependently dominated by other points with regard to q. In this paper, we focus on the query processing and result validation of LDSQs over static objects. Two algorithms, namely brute-force and delta-scanning, are proposed. The former serves as the baseline algorithm while the latter significantly improves performance via space pruning. We further conduct a comprehensive simulation to demonstrate the performance of the proposed algorithms.
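The dominance definition above combines spatial closeness with ordinary attribute dominance. A minimal sketch of the definition and the baseline (brute-force) LDSQ evaluation, assuming smaller non-spatial attribute values are preferable (the point representation here is our own, not the paper's):

```python
import math

def dominates(a, b):
    """a dominates b on non-spatial attributes (smaller is better):
    a is no worse in every attribute and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def ld_dominates(a, b, q):
    """a location-dependently dominates b w.r.t. query point q:
    a is closer to q than b AND a dominates b on the attributes."""
    return (math.dist(a["loc"], q) < math.dist(b["loc"], q)
            and dominates(a["attrs"], b["attrs"]))

def ldsq_brute_force(points, q):
    """Baseline LDSQ: keep every point that no other point
    location-dependently dominates."""
    return [p for p in points
            if not any(ld_dominates(o, p, q) for o in points if o is not p)]
```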

49 citations


Journal Article•DOI•
TL;DR: An energy-conserving approximate storage (EASE) scheme to efficiently answer approximate location queries by keeping error-bounded imprecise location data at some designated storage node based on the mobility pattern.
Abstract: Energy efficiency is one of the most critical issues in the design of wireless sensor networks. Observing that many sensor applications for object tracking can tolerate a certain degree of imprecision in the location data of tracked objects, this paper studies precision-constrained approximate queries that trade answer precision for energy efficiency. We develop an energy-conserving approximate storage (EASE) scheme to efficiently answer approximate location queries by keeping error-bounded imprecise location data at some designated storage node. The data impreciseness is captured by a system parameter called the approximation radius. We derive the optimal setting of the approximation radius for our storage scheme based on the mobility pattern and devise an adaptive algorithm to adjust the setting when the mobility pattern is not available a priori or is dynamically changing. Simulation experiments are conducted to validate our theoretical analysis of the optimal approximation setting. The simulation results show that the proposed EASE scheme reduces the network traffic from a conventional approach by up to 96 percent and, in most cases, prolongs the network lifetime by a factor of 2-5.
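The core of the update rule can be sketched in a few lines: the tracking side pushes a location to the storage node only when the object drifts beyond the approximation radius, so stored answers are imprecise but error-bounded. This is an illustrative simplification of EASE, not the full protocol (the class and its interface are our own construction):

```python
import math

class EaseTracker:
    """Error-bounded update rule in the spirit of EASE: report a new
    location to the designated storage node only when the object has
    moved more than the approximation radius r since the last report,
    so any answer from storage is within r of the true location."""

    def __init__(self, r):
        self.r = r
        self.reported = None   # last location sent to the storage node
        self.updates = 0       # network messages generated so far

    def observe(self, loc):
        if self.reported is None or math.dist(loc, self.reported) > self.r:
            self.reported = loc
            self.updates += 1

    def answer_query(self):
        return self.reported   # imprecise, but error-bounded by r
```

A larger r trades answer precision for fewer update messages, which is exactly the energy/precision trade-off the paper optimizes via the mobility pattern.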

49 citations


Journal Article•DOI•
TL;DR: Extensive experiments demonstrate that SSW is superior to the state of the art on various aspects, including scalability, maintenance overhead, adaptivity to distribution of data and locality of interest, resilience to peer failures, load balancing, and efficiency in support of various types of queries on data objects with high dimensions.
Abstract: Peer-to-peer (P2P) systems have become a popular platform for sharing and exchanging voluminous information among thousands or even millions of users. The massive amount of information shared in such systems mandates efficient semantic-based search instead of key-based search. The majority of existing proposals can only support simple key-based search rather than semantic-based search. This paper presents the design of an overlay network, namely, semantic small world (SSW), that facilitates efficient semantic-based search in P2P systems. SSW achieves the efficiency based on four ideas: 1) semantic clustering, where peers with similar semantics organize into peer clusters, 2) dimension reduction, where to address the high maintenance overhead associated with capturing high-dimensional data semantics in the overlay, peer clusters are adaptively mapped to a one-dimensional naming space, 3) small world network, where peer clusters form into a one-dimensional small world network, which is search efficient with low maintenance overhead, and 4) efficient search algorithms, where peers perform efficient semantic-based search, including approximate point query and range query in the proposed overlay. Extensive experiments using both synthetic data and real data demonstrate that SSW is superior to the state of the art on various aspects, including scalability, maintenance overhead, adaptivity to distribution of data and locality of interest, resilience to peer failures, load balancing, and efficiency in support of various types of queries on data objects with high dimensions.

48 citations


Journal Article•DOI•
TL;DR: A tradeoff between search performance and freshness is indicated: the search cost decreases sublinearly with decreasing freshness of P2P content sharing under TTL-based consistency.
Abstract: Consistency maintenance is important to the sharing of dynamic contents in peer-to-peer (P2P) networks. The TTL-based mechanism is a natural choice for maintaining freshness in P2P content sharing. This paper investigates TTL-based consistency maintenance in unstructured P2P networks. In this approach, each replica is assigned an expiration time beyond which the replica stops serving new requests unless it is validated. While TTL-based consistency is widely explored in many client-server applications, there has been no study on TTL-based consistency in P2P networks. Our main contribution is an analytical model that studies the search performance and the freshness of P2P content sharing under TTL-based consistency. Due to the random nature of request routing, P2P networks are fundamentally different from most existing TTL-based systems in that every node with a valid replica has the potential to serve any other node. We identify and discuss the factors that affect the performance of P2P content sharing under TTL-based consistency. Our results indicate a tradeoff between search performance and freshness: the search cost decreases sublinearly with decreasing freshness of P2P content sharing. We also compare two types of unstructured P2P networks and find that clustered P2P networks improve the freshness of content sharing over flat P2P networks under TTL-based consistency.
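The TTL mechanism the paper analyzes can be modeled minimally: a replica serves requests until its expiration time, after which it must be validated before serving again. The sketch below is our own illustration of that rule, not the paper's analytical model:

```python
class TTLReplica:
    """Minimal model of TTL-based consistency: a replica fetched at
    fetched_at with time-to-live ttl serves requests until expiry,
    then must revalidate against the origin before serving again.
    Longer TTLs reduce validation traffic but lower freshness."""

    def __init__(self, ttl, fetched_at, version):
        self.ttl = ttl
        self.expires_at = fetched_at + ttl
        self.version = version

    def is_valid(self, now):
        return now < self.expires_at

    def serve(self, now, origin):
        """Serve from the replica if unexpired, else revalidate first."""
        if not self.is_valid(now):
            self.version = origin["version"]     # validation round trip
            self.expires_at = now + self.ttl
        return self.version
```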

40 citations


Journal Article•DOI•
TL;DR: This work introduces a new variant of RNN query, namely, ranked reverse nearest neighbor (RRNN) query, that retrieves the t data points most influenced by q, i.e., the t data points having the smallest κ values with respect to q.
Abstract: Given a set of data points P and a query point q in a multidimensional space, a reverse nearest neighbor (RNN) query finds data points in P whose nearest neighbors are q. A reverse k-nearest neighbor (RkNN) query (where k ≥ 1) generalizes the RNN query to find data points whose kNNs include q. Under RkNN query semantics, q is said to have influence on all those answer data points. The degree of q's influence on a data point p ∈ P is denoted by κp, where q is the κp-th NN of p. We introduce a new variant of RNN query, namely, the ranked reverse nearest neighbor (RRNN) query, that retrieves the t data points most influenced by q, i.e., the t data points having the smallest κ values with respect to q. To answer this RRNN query efficiently, we propose two novel algorithms, κ-counting and κ-browsing, that are applicable to both monochromatic and bichromatic scenarios and are able to deliver results progressively. Through an extensive performance evaluation, we validate that the two proposed RRNN algorithms are superior to solutions derived from algorithms designed for RkNN query.
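The definition of κp translates directly into a brute-force baseline: κp is one plus the number of data points closer to p than q is, and the RRNN answer is the t points with the smallest κ. This sketch is the naive baseline, not the paper's κ-counting or κ-browsing algorithms:

```python
import math

def rrnn_brute_force(points, q, t):
    """Baseline RRNN: compute kappa_p for every point p (the rank of q
    among p's nearest neighbours) and return the t points with the
    smallest kappa_p, i.e., the points q influences the most."""
    kappas = []
    for p in points:
        others = (o for o in points if o is not p)
        # kappa_p = 1 + number of other data points closer to p than q is
        kappa = 1 + sum(math.dist(o, p) < math.dist(q, p) for o in others)
        kappas.append((kappa, p))
    kappas.sort(key=lambda kp: kp[0])
    return [p for _, p in kappas[:t]]
```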

36 citations


Proceedings Article•DOI•
24 Feb 2008
TL;DR: In this article, a 107Gb/s field trial was conducted on a traffic carrying longhaul LambdaXtreme® transport platform over an active 504-km Verizon route in Florida thus proving upgradeability to 100 G of the Alcatel-Lucent 50GHz spaced ULH DWDM system.
Abstract: A 107-Gb/s field trial was conducted on a traffic carrying long-haul LambdaXtreme® transport platform over an active 504-km Verizon route in Florida thus proving upgradeability to 100 G of the Alcatel-Lucent 50-GHz spaced ULH DWDM system.

35 citations


Proceedings Article•DOI•
17 Dec 2008
TL;DR: The rule-based localization methods proposed in this paper achieve much higher accuracy than the state-of-the-art localization methods, namely, RADAR, LOCADIO and WHAM!.
Abstract: The rule-based localization methods proposed in this paper are based on two important observations. First, although the absolute RSS values change with time, the relative RSS (RRSS) values between several Access Points (APs) are more stable than the absolute RSSs. Thus, we can use RRSSs as rules for inferring a client's location. Second, when a unique location cannot be obtained based on RRSS rules, the localization process can backtrack to the previously observed client location. By analyzing the accessible paths on the floor plan, locations that are not reachable from the previous location can be disqualified. Based on these two key observations, we propose several localization methods, implement them in a live environment, and conduct extensive experiments to measure the localization accuracy of the proposed methods. We found that our methods achieve much higher accuracy than the state-of-the-art localization methods, namely, RADAR, LOCADIO and WHAM!.
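Both observations can be sketched together: match the observed AP ordering (RRSS) against per-location rules, then break ties using floor-plan reachability from the previous location. The rule table and reachability map below are invented for illustration; the paper's actual rules are derived from measurements:

```python
def rrss_signature(rss):
    """Order access points by signal strength; this relative ordering
    (RRSS) is more stable over time than the absolute RSS values."""
    return tuple(sorted(rss, key=rss.get, reverse=True))

def locate(rss, rules, previous=None, reachable=None):
    """Rule-based localization sketch: match the observed AP ordering
    against per-location RRSS rules; if several locations match, keep
    only candidates reachable from the previous location."""
    candidates = [loc for loc, sig in rules.items()
                  if sig == rrss_signature(rss)]
    if len(candidates) > 1 and previous is not None and reachable:
        candidates = [c for c in candidates
                      if c in reachable.get(previous, {previous})]
    return candidates
```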

Journal Article•DOI•
TL;DR: This survey paper reviews important works in two key dimensions of pervasive data access, data broadcast and client caching, and also covers data access techniques aimed at various application requirements.
Abstract: The rapid advance of wireless and portable computing technology has brought a great deal of research interest and momentum to the area of mobile computing. One of the research focuses is pervasive data access. With wireless connections, users can access information at any place at any time. However, various constraints such as limited client capability, limited bandwidth, weak connectivity, and client mobility impose many challenging technical issues. In the past years, tremendous research efforts have been put forth to address the issues related to pervasive data access. A number of interesting research results were reported in the literature. This survey paper reviews important works in two key dimensions of pervasive data access: data broadcast and client caching. In addition, data access techniques aimed at various application requirements (such as time, location, semantics and reliability) are covered. Copyright © 2006 John Wiley & Sons, Ltd.

Proceedings Article•DOI•
30 Oct 2008
TL;DR: The implicit user feedback from access logs in the CiteSeer academic search engine is analyzed and it is shown how site structure can better inform the analysis of clickthrough feedback providing accurate personalized ranking services tailored to individual information retrieval systems.
Abstract: Given the exponential increase of indexable content on the Web, ranking is an increasingly difficult problem in information retrieval systems. Recent research shows that implicit feedback regarding user preferences can be extracted from web access logs in order to increase ranking performance. We analyze the implicit user feedback from access logs in the CiteSeer academic search engine and show how site structure can better inform the analysis of clickthrough feedback, providing accurate personalized ranking services tailored to individual information retrieval systems. Experiments and analysis show that our proposed method is more accurate at predicting user preferences than any non-personalized ranking method when user preferences are stable over time. We compare our method with several non-personalized ranking methods including ranking SVMlight as well as several ranking functions specific to the academic document domain. The results show that our ranking algorithm can reach 63.59% accuracy in comparison to 50.02% for ranking SVMlight and below 43% for all other single-feature ranking methods. We also show how the derived personalized ranking vectors can be employed for other ranking-related purposes such as recommendation systems.

Proceedings Article•DOI•
Mei Li1, Wang-Chien Lee•
17 Jun 2008
TL;DR: This paper defines the problem of identifying frequent items (IFI) and proposes an efficient in-network processing technique, called in-network filtering (netFilter), to address this important fundamental problem.
Abstract: As peer-to-peer (P2P) systems receive growing acceptance, the need of identifying 'frequent items' in such systems appears in a variety of applications. In this paper, we define the problem of identifying frequent items (IFI) and propose an efficient in-network processing technique, called in-network filtering (netFilter), to address this important fundamental problem. netFilter operates in two phases: 1) candidate filtering: data items are grouped into item groups to obtain aggregates for pruning of infrequent items; and 2) candidate verification: the aggregates for the remaining candidate items are obtained to filter out false frequent items. We address various issues faced in realizing netFilter, including aggregate computation, candidate set optimization, and candidate set materialization. In addition, we analyze the performance of netFilter, derive the optimal setting analytically, and discuss how to achieve the optimal setting in practice. Finally, we validate the effectiveness of netFilter through extensive simulation.
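The two phases can be illustrated with a centralized stand-in: group aggregates prune whole groups that cannot contain a frequent item, then per-item counts verify the survivors. In a real netFilter deployment the aggregates are combined in-network across peers; the function below only sketches the filtering logic:

```python
from collections import Counter

def net_filter(peer_counts, groups, threshold):
    """Two-phase sketch of netFilter's filtering logic.
    Phase 1 (candidate filtering): discard every item group whose
    aggregate count is below the threshold -- no item inside it can be
    frequent. Phase 2 (candidate verification): check the remaining
    candidates item by item to drop false frequent items."""
    total = Counter()
    for counts in peer_counts:          # global per-item counts
        total.update(counts)

    candidates = set()
    for group in groups:                # phase 1: group-level pruning
        if sum(total[i] for i in group) >= threshold:
            candidates.update(group)

    # phase 2: verify each surviving candidate individually
    return {i for i in candidates if total[i] >= threshold}
```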

Proceedings Article•DOI•
17 Jun 2008
TL;DR: This paper presents the design of a contour mapping engine (CME) in wireless sensor networks that incorporates in-network processing based on binary classification to reduce the total number of active nodes and shows the superiority of CME over the state-of-the-art contour mapping techniques.
Abstract: Contour maps, showing topological distribution of extracted features, are crucial for many applications. Building a dynamic contour map in a wireless sensor network is a challenging task due to the constrained network resources. In this paper, we present the design of a contour mapping engine (CME) in wireless sensor networks. Our design incorporates in-network processing based on binary classification to reduce the total number of active nodes. The underlying network architecture is analyzed to derive an optimal configuration. We show, by extensive simulations, the superiority of CME over the state-of-the-art contour mapping techniques.

Proceedings Article•DOI•
26 Oct 2008
TL;DR: Efficient algorithms to determine valid scopes for various LDSQs including range, window and nearest neighbor queries along with LDSQ processing over a broadcast channel are devised, thus providing faster query response and saving client energy.
Abstract: Wireless data broadcast is an efficient and scalable means to provide information access for a large population of clients in mobile environments. With Location-Based Services (LBSs) deployed upon a broadcast channel, mobile clients can collect data from the channel to answer their location-dependent spatial queries (LDSQs). Since the results of LDSQs become invalid when a mobile client moves to a new location, knowledge of the valid scopes of LDSQ results is necessary to help clients determine whether their previous LDSQ results can be reused after they move. This effectively improves query response time and client energy consumption. In this paper, we devise efficient algorithms to determine valid scopes for various LDSQs, including range, window and nearest neighbor queries, along with LDSQ processing over a broadcast channel. We conduct an extensive set of experiments to evaluate the performance of our proposed algorithms. While the proposed valid scope algorithm incurs only a little extra processing overhead, unnecessary LDSQ reevaluation is largely eliminated, thus providing faster query responses and saving client energy.
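For a nearest-neighbor LDSQ the idea is easy to sketch: return the answer together with a region inside which it stays valid, and let a moved client reuse its cached answer while it remains inside. The circle of radius (d2 − d1)/2 below is a simple conservative under-approximation of the true valid scope (the answer's Voronoi cell), chosen for illustration rather than taken from the paper:

```python
import math

def valid_scope_nn(points, q):
    """Answer an NN query and attach a conservative valid scope: while
    the client stays within (d2 - d1) / 2 of q (d1, d2 = distances to
    the first and second NN), the cached answer is still the true NN."""
    ranked = sorted(points, key=lambda p: math.dist(p, q))
    nn, second = ranked[0], ranked[1]
    radius = (math.dist(second, q) - math.dist(nn, q)) / 2
    return nn, (q, radius)

def can_reuse(scope, new_pos):
    """A moved client reuses its cached result while inside the scope."""
    center, radius = scope
    return math.dist(center, new_pos) <= radius
```

Moving a distance m from q changes the NN distance by at most +m and any other distance by at most −m, so the answer is safe whenever m ≤ (d2 − d1)/2.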

Proceedings Article•DOI•
25 Nov 2008
TL;DR: A 107-Gbps field trial carrying live HDTV traffic over a 504-km in-service DWDM route, proving commercial systems designed for 10G/40G can be upgraded to 100G without impacting existing active channels.
Abstract: A 107-Gbps field trial carrying live HDTV traffic over a 504-km in-service DWDM route, proving commercial systems designed for 10G/40G can be upgraded to 100G without impacting existing active channels.

Proceedings Article•DOI•
25 Mar 2008
TL;DR: This paper considers a near-future scenario in which a mobile device can process queries using information simultaneously received from multiple channels, and proposes an optimization technique, called approximate-NN (ANN), to reduce energy consumption in mobile devices.
Abstract: Wireless broadcast is an efficient way for information dissemination due to its good scalability [10]. Existing works typically assume mobile devices, such as cell phones and PDAs, can access only one channel at a time. In this paper, we consider a near-future scenario in which a mobile device has the ability to process queries using information simultaneously received from multiple channels. We focus on the query processing of the transitive nearest neighbor (TNN) search [19]. Two TNN algorithms developed for a single broadcast channel environment are adapted to our new broadcast environment. Based on the obtained insights, we propose two new algorithms, namely the Double-NN-Search and Hybrid-NN-Search algorithms. Further, we develop an optimization technique, called approximate-NN (ANN), to reduce the energy consumption in mobile devices. Finally, we conduct a comprehensive set of experiments to validate our proposals. The results show that our new algorithms provide better performance than the existing ones and the optimization technique efficiently reduces energy consumption.

Journal Article•DOI•
TL;DR: Simulation results show that PSGR exhibits superior performance in terms of energy consumption, routing latency, and delivery rate, and soundly outperforms all of the compared protocols.
Abstract: Volunteer forwarding, as an emerging routing idea for large-scale, location-aware wireless sensor networks, has recently received significant attention. However, several critical research issues raised by volunteer forwarding, including communication collisions, communication voids, and time-critical routing, have not been well addressed by the existing work. In this paper, we propose a priority-based stateless geo-routing (PSGR) protocol that addresses these issues. Based on PSGR, sensor nodes are able to locally determine their priority to serve as the next relay node using dynamically estimated network density. This effectively suppresses potential communication collisions without prolonging routing delays. PSGR also overcomes the communication void problem using two alternative stateless schemes, rebroadcast and bypass. Meanwhile, PSGR supports routing of time-critical packets with different deadline requirements at no extra communication cost. Additionally, we analyze the energy consumption and the delivery rate of PSGR as functions of the transmission range. Finally, an extensive performance evaluation has been conducted to compare PSGR with competing protocols, including GeRaf, IGF, GPSR, flooding, and MSPEED. Simulation results show that PSGR exhibits superior performance in terms of energy consumption, routing latency, and delivery rate, and soundly outperforms all of the compared protocols.
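The priority rule at the heart of volunteer forwarding can be sketched with a backoff timer: every candidate that hears a packet sets a delay that shrinks with its geographic progress toward the destination, so the best-placed node volunteers first and its transmission suppresses the rest. This is a generic simplification of the idea, not PSGR's density-based priority computation:

```python
import math

def relay_delay(node, sender, dest, max_delay=1.0, tx_range=1.0):
    """Backoff delay for a candidate relay: more progress toward the
    destination means a shorter delay (higher priority). Nodes offering
    no progress never volunteer."""
    progress = math.dist(sender, dest) - math.dist(node, dest)
    if progress <= 0:
        return None
    return max_delay * (1 - progress / tx_range)

def next_relay(candidates, sender, dest):
    """The node whose timer fires first becomes the next relay."""
    delays = {n: d for n in candidates
              if (d := relay_delay(n, sender, dest)) is not None}
    return min(delays, key=delays.get) if delays else None
```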

Journal Article•DOI•
TL;DR: This study presents an extensive analysis into the workload of scientific literature digital libraries, unveiling their temporal and user interest patterns and investigates how to utilize the findings to improve system performance.
Abstract: Workload studies of large-scale systems may help locate possible bottlenecks and improve performance. However, previous workload analysis for Web applications is typically focused on generic platforms, neglecting the unique characteristics exhibited in various domains of these applications. It is observed that different application domains have intrinsically heterogeneous characteristics, which have a direct impact on system performance. In this study, we present an extensive analysis of the workload of scientific literature digital libraries, unveiling their temporal and user interest patterns. Logs of a computer science literature digital library, CiteSeer, are collected and analyzed. We intentionally remove service details specific to CiteSeer. We believe our analysis is applicable to other systems with similar characteristics. While many of our findings are consistent with previous Web analysis, we discover several unique characteristics of scientific literature digital library workload. Furthermore, we discuss how to utilize our findings to improve system performance.

Journal Article•DOI•
TL;DR: This study investigates K nearest neighbors query (KNN) on high dimensional data objects in P2P systems and proposes efficient query algorithm and solutions that address various technical challenges raised by high dimensionality, such as search space resolution and incremental search space refinement.
Abstract: Peer-to-peer systems have been widely used for sharing and exchanging data and resources among numerous computer nodes. Various data objects identifiable with high-dimensional feature vectors, such as text, images, and genome sequences, are starting to leverage P2P technology. Most of the existing works have been focusing on queries on data objects with one or few attributes and thus are not applicable to high-dimensional data objects. In this study, we investigate the K nearest neighbors query (KNN) on high-dimensional data objects in P2P systems. An efficient query algorithm and solutions that address various technical challenges raised by high dimensionality, such as search space resolution and incremental search space refinement, are proposed. An extensive simulation using both synthetic and real data sets demonstrates that our proposal efficiently supports KNN queries on high-dimensional data in P2P systems.

Proceedings Article•DOI•
Yang Sun1, Huajing Li1, Isaac G. Councill1, Wang-Chien Lee1, C. Lee Giles1 •
26 Oct 2008
TL;DR: A study that measures the changes of user preferences based on an analysis of access logs of a large-scale digital library over one year shows that the majority of user actions should be predictable from previous browsing behavior in the digital library.
Abstract: Much research has been conducted using web access logs to study implicit user feedback and infer user preferences from clickstreams. However, little research measures the changes of user preferences in ranking documents over time. We present a study that measures the changes of user preferences based on an analysis of access logs of a large-scale digital library over one year. A metric based on the accuracy of predicting future user actions is proposed. The results show that although user preferences change over time, the majority of user actions should be predictable from previous browsing behavior in the digital library.

Book Chapter•DOI•
09 Jul 2008
TL;DR: This paper introduces the correlation query, which finds correlated pairs of objects often appearing close to each other in a given sequence, and proposes the One-Scan Algorithm (OSA) and Index-Based Algorithm (IBA), with IBA significantly outperforming the others as the most efficient.
Abstract: A sequence, widely appearing in various applications (e.g., event logs, text documents, etc.), is an ordered list of objects. Exploring correlated objects in a sequence can provide useful knowledge about the objects, e.g., event causality in event logs and word phrases in documents. In this paper, we introduce the correlation query that finds correlated pairs of objects often appearing close to each other in a given sequence. A correlation query is specified by two control parameters: the distance bound, the requirement on object closeness, and the correlation threshold, the minimum requirement on the correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times, which we call the Multi-Scan Algorithm (MSA), we propose the One-Scan Algorithm (OSA) and the Index-Based Algorithm (IBA). OSA accesses a queried sequence once, and IBA considers the correlation threshold during execution and effectively eliminates unneeded candidates from detailed examination. An extensive set of experiments is conducted to evaluate all these algorithms. Among them, IBA, significantly outperforming the others, is the most efficient.
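A single pass over the sequence suffices to count close co-occurrences, in the spirit of OSA. The correlation score below (co-occurrences divided by the rarer object's frequency) is an assumed stand-in for the paper's correlation measure:

```python
from collections import Counter

def correlated_pairs(seq, distance_bound, threshold):
    """One-scan sketch: slide over the sequence once, counting pairs of
    distinct objects that co-occur within distance_bound positions, then
    score each pair by a simple support ratio (assumed measure) and keep
    pairs scoring at least the correlation threshold."""
    occur = Counter(seq)
    close = Counter()
    for i, a in enumerate(seq):
        for j in range(i + 1, min(i + distance_bound + 1, len(seq))):
            if a != seq[j]:
                close[frozenset((a, seq[j]))] += 1
    result = {}
    for pair, c in close.items():
        x, y = tuple(pair)
        score = c / min(occur[x], occur[y])   # assumed correlation measure
        if score >= threshold:
            result[tuple(sorted(pair))] = score
    return result
```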

Proceedings Article•DOI•
08 Dec 2008
TL;DR: Experimental results show that ACM is able to discover community structures with high quality while outperforming the existing approaches and employs an asynchronous strategy such that local clustering is executed without requiring an expensive global clustering to be performed in a synchronous fashion.
Abstract: Most social networks exhibit community structures, in which nodes are tightly connected to each other within a community but only loosely connected to nodes in other communities. Research on community mining has received a lot of attention; however, most of it is based on a centralized system model and thus not applicable to the distributed model of P2P networks. In this paper, we propose a distributed community mining algorithm, namely the Asynchronous Clustering and Merging (ACM) scheme, for P2P computing environments. Due to the dynamic and distributed nature of P2P networks, the ACM scheme employs an asynchronous strategy such that local clustering is executed without requiring an expensive global clustering to be performed in a synchronous fashion. Experimental results show that ACM is able to discover community structures with high quality while outperforming the existing approaches.


01 Jan 2008
TL;DR: A tradeoff between search performance and freshness is indicated: the search cost decreases sublinearly with decreasing freshness of P2P content sharing under TTL-based consistency.
Abstract: Consistency maintenance is important to the sharing of dynamic contents in peer-to-peer (P2P) networks. The TTL-based mechanism is a natural choice for maintaining freshness in P2P content sharing. This paper investigates TTL-based consistency maintenance in unstructured P2P networks. In this approach, each replica is assigned an expiration time beyond which the replica stops serving new requests unless it is validated. While TTL-based consistency is widely explored in many client-server applications, there has been no study on TTL-based consistency in P2P networks. Our main contribution is an analytical model that studies the search performance and the freshness of P2P content sharing under TTL-based consistency. Due to the random nature of request routing, P2P networks are fundamentally different from most existing TTL-based systems in that every node with a valid replica has the potential to serve any other node. We identify and discuss the factors that affect the performance of P2P content sharing under TTL-based consistency. Our results indicate a tradeoff between search performance and freshness: the search cost decreases sublinearly with decreasing freshness of P2P content sharing. We also compare two types of unstructured P2P networks and find that clustered P2P networks improve the freshness of content sharing over flat P2P networks under TTL-based consistency. Index Terms—Unstructured P2P network, TTL-based consistency, replication, consistency maintenance, content distribution.
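
The TTL rule described above (a replica stops serving new requests once it expires, unless it is validated) can be sketched as follows. The `Replica` class and its `origin_fetch` callback are hypothetical names introduced for illustration, not the paper's interface.

```python
import time

class Replica:
    """A cached copy that serves requests only while its TTL is valid;
    once expired it must be refreshed from the origin before serving."""
    def __init__(self, value, ttl, now=None):
        self.value = value
        self.ttl = ttl
        self.expires_at = (now if now is not None else time.time()) + ttl

    def is_valid(self, now=None):
        return (now if now is not None else time.time()) < self.expires_at

    def serve(self, origin_fetch, now=None):
        """Return the value, revalidating from the origin if expired."""
        now = now if now is not None else time.time()
        if not self.is_valid(now):
            self.value = origin_fetch()          # refresh stale content
            self.expires_at = now + self.ttl     # restart the TTL clock
        return self.value
```

Passing an explicit `now` keeps the sketch deterministic; a longer TTL means fewer refreshes (better search performance) at the cost of staler replicas, which is exactly the tradeoff the paper analyzes.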

Proceedings Article•DOI•
17 Jun 2008
TL;DR: This study investigates the problem of monitoring changes on the data distribution in networks (MCDN) and proposes a technique, called wavenet, which compresses the local item set at each host node into a compact yet accurate summary, called a local wavelet, for communication with the coordinator.
Abstract: A massive amount of data is available in distributed fashion on various networks, including the Internet, peer-to-peer networks, and wireless sensor networks. Users are often interested in monitoring interesting patterns or abnormal events hidden in these data. Transferring all the raw data from each host node to a central coordinator for processing is costly and unnecessary. In this study, we investigate the problem of monitoring changes on the data distribution in the networks (MCDN). To address this problem, we propose a technique, called wavenet, which compresses the local item set at each host node into a compact yet accurate summary, called a local wavelet, for communication with the coordinator. We also propose adaptive monitoring to address the issues of local wavelet propagation in wavenet. An extensive performance evaluation has been conducted to validate our proposal and demonstrates the efficiency of wavenet.
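
As a rough illustration of compressing local data into a wavelet summary, the sketch below computes a Haar wavelet transform of a local frequency vector and keeps only the k largest-magnitude coefficients. This is the standard wavelet-synopsis construction, used here as an assumption about what a "local wavelet" might look like, not the paper's actual wavenet technique.

```python
def haar_transform(data):
    """Full Haar decomposition of a length-2^k vector of frequencies."""
    coeffs = list(data)
    n = len(coeffs)
    output = [0.0] * n
    while n > 1:
        half = n // 2
        for i in range(half):
            output[i] = (coeffs[2*i] + coeffs[2*i+1]) / 2.0         # averages
            output[half + i] = (coeffs[2*i] - coeffs[2*i+1]) / 2.0  # details
        coeffs[:n] = output[:n]
        n = half
    return coeffs

def inverse_haar(coeffs):
    """Reconstruct the original vector from Haar coefficients."""
    data = list(coeffs)
    n = 1
    while n < len(data):
        out = list(data)
        for i in range(n):
            a, d = data[i], data[n + i]
            out[2*i] = a + d
            out[2*i+1] = a - d
        data[:2*n] = out[:2*n]
        n *= 2
    return data

def top_k_synopsis(coeffs, k):
    """Keep only the k largest-magnitude coefficients as the compact summary."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {i: coeffs[i] for i in ranked[:k]}
```

Only the sparse coefficient map travels to the coordinator; dropped coefficients are treated as zero on reconstruction, trading a little accuracy for far less communication.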

Proceedings Article•DOI•
26 Oct 2008
TL;DR: This poster develops ROAD, a system framework for processing location-dependent spatial queries (LDSQs) that search for spatial objects of interest on road networks, and explains how it can support efficient location-dependent nearest neighbor search.
Abstract: In this research, we develop ROAD, a system framework for processing location-dependent spatial queries (LDSQs) that search for spatial objects of interest on road networks. By exploiting search space pruning, ROAD is very efficient and flexible for various LDSQs on different types of objects over large-scale networks. In ROAD, a large road network is organized as a set of interconnected regional sub-networks (called Rnets) augmented with 1) shortcuts for accelerating search traversals; and 2) object abstracts for guiding object search. In this poster, we outline this framework and explain how it can support efficient location-dependent nearest neighbor search.
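
A minimal sketch of the pruning idea, under the simplifying assumption that an "object abstract" merely records whether an Rnet contains any objects: edges of non-empty Rnets are expanded normally, while empty Rnets are replaced by their precomputed border-to-border shortcuts before an ordinary nearest-object Dijkstra runs. The `nodes`/`edges`/`shortcuts` layout is invented for illustration and is not ROAD's actual data structure.

```python
import heapq

def build_search_graph(rnets, objects):
    """Assemble an adjacency list: expand an Rnet's interior edges only if
    its object abstract is non-empty; otherwise use only its shortcuts."""
    adj = {}
    def add_edge(u, v, w):
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    for rnet in rnets:
        if objects & rnet["nodes"]:           # abstract says objects inside
            for u, v, w in rnet["edges"]:
                add_edge(u, v, w)
        else:                                  # empty Rnet: shortcuts only
            for u, v, w in rnet["shortcuts"]:
                add_edge(u, v, w)
    return adj

def dijkstra_nearest(adj, source, objects):
    """Plain Dijkstra that stops at the first settled node holding an object."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    seen = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u in objects:
            return u, d
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return None, float("inf")
```

The search thus crosses empty regions in a single hop instead of expanding their interior nodes, which is the essence of the search space pruning described above.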

Proceedings Article•DOI•
26 Oct 2008
TL;DR: This paper studies the problem of answering probabilistic range queries on moving objects based on an uncertainty model that captures the possible movements of objects with probabilities; a performance study shows the proposal significantly reduces the number of object examinations and the overall cost of query evaluation.
Abstract: Range queries for querying the current and future positions of moving objects have received growing interest in the research community. Existing methods, however, assume that an object only moves along an anticipated path. In this paper, we study the problem of answering probabilistic range queries on moving objects based on an uncertainty model, which captures the possible movements of objects with probabilities. We conduct a performance study, which shows that our proposal significantly reduces the number of object examinations and the overall cost of query evaluation.
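
For a one-dimensional special case, assuming an object's possible position is uniform over an uncertainty interval, the probability of appearing inside a query range reduces to an overlap ratio, and objects below a probability threshold can be filtered out. This uniform 1-D model is an illustrative simplification of the paper's uncertainty model, with all names invented here.

```python
def appearance_probability(obj_lo, obj_hi, q_lo, q_hi):
    """P(object in [q_lo, q_hi]) for a position uniform on [obj_lo, obj_hi]."""
    overlap = max(0.0, min(obj_hi, q_hi) - max(obj_lo, q_lo))
    return overlap / (obj_hi - obj_lo)

def probabilistic_range_query(objects, q_lo, q_hi, threshold):
    """Return (id, probability) pairs meeting the threshold, sorted by id."""
    results = []
    for oid, (lo, hi) in objects.items():
        p = appearance_probability(lo, hi, q_lo, q_hi)
        if p >= threshold:
            results.append((oid, p))
    return sorted(results)
```

An object whose uncertainty interval misses the range entirely gets probability 0 and is pruned without further examination, which is the kind of saving the performance study measures.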

Journal Article•DOI•
TL;DR: Of particular interest are the issues that arise in the design of storage management and indexing structures that combine sensor system workloads with the read/write/erase characteristics of flash memory.
Abstract: Wireless sensor networks are used in a large array of applications to capture, collect, and analyze physical environmental data. Many existing sensor systems instruct sensor nodes to report their measurements to central repositories outside the network, which is expensive in energy cost. Recent technological advances in flash memory have given rise to the development of storage-centric sensor networks, where sensor nodes are equipped with high-capacity flash memory storage so that sensor data can be stored and managed inside the network to reduce expensive communication. This novel architecture calls for new data management techniques to fully exploit distributed in-network data storage. This paper describes some of our research on distributed query processing in such flash-based sensor networks. Of particular interest are the issues that arise in the design of storage management and indexing structures that combine sensor system workloads with the read/write/erase characteristics of flash memory.
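
The flash characteristics mentioned (no in-place updates, erase-before-rewrite at block granularity) are what motivate log-structured storage designs. A toy sketch follows, with all names, the index layout, and the full-log behavior invented for illustration; real designs add garbage collection and wear leveling.

```python
class FlashLog:
    """Toy log-structured store for flash: records are appended out of place,
    and a whole erase block would have to be erased before its pages could
    be rewritten, so updates never touch old pages."""
    def __init__(self, pages_per_block, num_blocks):
        self.pages_per_block = pages_per_block
        self.blocks = [[None] * pages_per_block for _ in range(num_blocks)]
        self.head = 0        # next free page in the log (global page index)
        self.index = {}      # key -> (block, page) of the latest version

    def write(self, key, value):
        """Append a record; the old version is only invalidated in the index."""
        block, page = divmod(self.head, self.pages_per_block)
        if block >= len(self.blocks):
            raise RuntimeError("log full: garbage collection needed")
        self.blocks[block][page] = (key, value)
        self.index[key] = (block, page)
        self.head += 1

    def read(self, key):
        block, page = self.index[key]
        return self.blocks[block][page][1]
```

Note that after an update, the stale record still physically occupies its old page; reclaiming it would require erasing the whole block, which is exactly the read/write/erase asymmetry the indexing structures must accommodate.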