scispace - formally typeset
Author

Wang-Chien Lee

Bio: Wang-Chien Lee is an academic researcher at Pennsylvania State University. The author has contributed to research on wireless sensor networks and nearest neighbor search. The author has an h-index of 60 and has co-authored 366 publications receiving 14,123 citations. Previous affiliations of Wang-Chien Lee include Ohio State University and Verizon Communications.


Papers
01 Jan 1998
TL;DR: In this article, the authors provide an overview of their research on indexing techniques, revisit related work in the literature, incorporate two important techniques, clustering and scheduling, to improve data broadcast efficiency, and explore the scenarios of single- and multiple-attribute query processing.
Abstract: Indexing techniques have been developed as a means for clients to reduce power consumption and to select between broadcast and on-demand data services. In this paper, we provide an overview of our research on indexing techniques and revisit some related work in the literature. Our study incorporates two important techniques, clustering and scheduling, for improving data broadcast efficiency, and explores the scenarios of single- and multiple-attribute query processing. Moreover, we apply two indexing methods, cache schedule and integrated signature, to a hierarchical data delivery
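The abstract names integrated signature indexing but gives no detail. The sketch below is only a rough illustration of how signature-based filtering can save client power on a broadcast channel. It uses generic superimposed coding:

- Each record's signature is the bitwise OR of hashed bit patterns of its attribute values.
- A client downloads a record only if that signature covers the query's signature.

Several choices here are our own assumptions, not the paper's parameters: the signature width (`SIG_BITS`), the bits set per value (`BITS_PER_VALUE`), the SHA-256 hashing, and the toy records. This is not the integrated signature scheme itself.

```python
import hashlib
from typing import Iterable, List, Tuple

SIG_BITS = 64        # hypothetical signature width in bits
BITS_PER_VALUE = 3   # hypothetical number of bits each attribute value sets


def value_signature(value: str) -> int:
    """Map one attribute value to a bit pattern with at most BITS_PER_VALUE bits set."""
    sig = 0
    for i in range(BITS_PER_VALUE):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def record_signature(values: Iterable[str]) -> int:
    """Superimpose (bitwise OR) the signatures of all attribute values."""
    sig = 0
    for v in values:
        sig |= value_signature(v)
    return sig


def may_match(record_sig: int, query_sig: int) -> bool:
    """False: the record certainly fails the query, so the client can skip it.
    True: the record may match, but it can still be a false drop."""
    return (record_sig & query_sig) == query_sig


def scan(channel: List[Tuple[int, Tuple[str, ...]]], query: List[str]):
    """Walk one broadcast cycle, downloading only records whose signatures pass."""
    q = record_signature(query)
    hits, downloaded = [], 0
    for sig, record in channel:
        if not may_match(sig, q):
            continue  # record skipped without being downloaded
        downloaded += 1
        if all(v in record for v in query):  # discard false drops
            hits.append(record)
    return hits, downloaded


records = [("alice", "pittsburgh"), ("bob", "columbus"), ("carol", "pittsburgh")]
channel = [(record_signature(r), r) for r in records]  # signature precedes each record
hits, downloaded = scan(channel, ["pittsburgh"])
print(hits, downloaded)
```

Every true match passes the signature test, because a record's signature covers every value it contains. Any extra download is a false drop, and the final exact check removes it.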
Proceedings Article
01 Oct 2020
TL;DR: This paper proposes an embedding model, CO2Vec, that explores indirect order dependencies as supplementary evidence to enhance order representation learning across different types of entities, and shows that CO2Vec remains robust when order relations are removed from the original networks.
Abstract: We study the problem of representation learning for multiple types of entities in a co-ordered network, where order relations exist among entities of the same type and association relations exist across entities of different types. The key challenge in learning co-ordered network embedding is to preserve order relations among entities of the same type while leveraging the general consistency in order relations between different entity types. In this paper, we propose an embedding model, CO2Vec, that addresses this challenge using mutually reinforced order dependencies. Specifically, CO2Vec explores indirect order dependencies as supplementary evidence to enhance order representation learning across different types of entities. We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the robustness and effectiveness of CO2Vec against several strong baselines in the link prediction task. We also design a comprehensive evaluation framework to study the performance of CO2Vec under different settings. In particular, our results show that CO2Vec remains robust when order relations are removed from the original networks.
Proceedings Article
01 Oct 2020
TL;DR: This work proposes a joint-task neural network model, Word Worth Model (WWM), to learn word embeddings that capture the underlying economic worth of words, and shows that, compared with other baselines, WWM accurately predicts missing words when given target words.
Abstract: Knowing the perceived economic value of words is often desirable for applications such as product naming and pricing. However, the underlying economic worth of words is poorly understood, even though there have been breakthroughs in learning the semantics of words. In this work, we bridge this gap by proposing a joint-task neural network model, Word Worth Model (WWM), to learn word embeddings that capture this underlying economic worth. Through the design of WWM, we incorporate contextual factors, e.g., a product’s brand name and a restaurant’s city, that may affect the aggregated monetary value of a textual item. Via a comprehensive evaluation, we show that, compared with other baselines, WWM accurately predicts missing words when given target words. We also show through various visualization analyses that the learned embeddings of both words and contextual factors reflect the underlying economic worth well.
01 Jan 2007
TL;DR: This dissertation provides profound insights on exploiting the vast amount of data for different applications, e.g., system performance tuning, network attack detection, and market analysis. It also opens a new research direction on distributed data mining and provides a solid foundation for exploring various data management tasks in network systems.
Abstract: A massive amount of information, including multimedia files, relational data, scientific data, and system usage logs, is being collected and stored in a large number of host nodes. These nodes are connected as large-scale dynamic networks (LSDNs), such as peer-to-peer (P2P) systems and sensor networks. A wide spectrum of applications, e.g., resource locating, network attack detection, market analysis, and scientific exploration, relies on efficient discovery and retrieval of resources and knowledge from the vast amount of data distributed in these network systems. With the rapid growth in the volume of data and the scale of networks, simply transferring the data generated at different host nodes to a single site for storage and processing becomes impractical. It incurs excessive communication overhead and raises privacy concerns. Thus, a major challenge faced by LSDNs is to design decentralized infrastructures and algorithms that enable efficient resource and knowledge discovery. In this dissertation, various resource and knowledge discovery tasks are systematically investigated, ranging from simple tasks such as query processing to complex tasks such as network attack detection. The work draws on a synergy of research efforts spanning multiple disciplines, including distributed computing, networking, and data management. Efficient and robust infrastructures and algorithms are proposed to support these tasks. Particular attention is paid to system issues including load balancing, maintenance, adaptivity to dynamic changes, data distribution, and users' access patterns in the networks. The superiority of these proposed ideas is demonstrated through extensive experiments using both synthetic and real data.
This dissertation provides profound insights on exploiting the vast amount of data for different applications, e.g., system performance tuning, network attack detection, and market analysis. It also opens a new research direction on distributed data mining and provides a solid foundation for exploring various data management tasks in network systems. It is expected that this study will have a deep impact on the deployment of various applications that mandate efficient management and mining of the vast amount of data distributed in network systems.
Book Chapter
01 Jan 2009
TL;DR: This chapter focuses on clustering, one of the most important data mining tasks, in P2P systems, and outlines the challenges and reviews the state of the art in this area.
Abstract: With the advances in network communication, many large-scale network systems have emerged. Peer-to-peer (P2P) systems, where a large number of nodes self-form into a dynamic information-sharing system, are good examples. It is extremely valuable for many P2P applications, such as market analysis, scientific exploration, and smart query answering, to discover the knowledge hidden in this distributed data repository. In this chapter, we focus on clustering, one of the most important data mining tasks, in P2P systems. We outline the challenges and review the state of the art in this area. Clustering is a data mining technique that groups a set of data objects into classes of similar objects. Data objects within the same class are similar to each other, while data objects across classes are considered dissimilar. Clustering has a wide range of applications, e.g., pattern recognition, spatial data analysis, customer/market analysis, document classification, and access pattern discovery on the WWW. The data mining community has been intensively studying clustering techniques for the last decade, and as a result, various clustering algorithms have been proposed. The majority of these algorithms are designed for traditional centralized systems, where all data to be clustered resides in (or is transferred to) a central site. However, in P2P systems it is not desirable to transfer all the data from widely spread data sources to a centralized server for clustering. There are three reasons for this:
1) there is no central control in P2P systems;
2) transferring all data objects to a central site would incur excessive communication overhead; and
3) participants of P2P systems reside in a collaborating yet competing environment, so they may wish to expose as little information as possible to other peers for various reasons.
In addition, these existing algorithms are designed to minimize disk access cost.
In P2P systems, the communication cost is a dominating factor. Therefore, we need to reexamine the problem of clustering in P2P systems. A general approach is to first cluster the local data objects at each peer and then combine the local clustering results into a global clustering result. Following this approach, clustering in P2P systems essentially consists of two steps: local clustering and cluster assembly. Local clustering can be done with existing clustering techniques. Cluster assembly, however, is a nontrivial issue that involves two choices: the representation model (what should be communicated among peers) and the communication model (how peers communicate with each other). In this chapter, we review three representation models (two approximate representation models and one exact representation model) and three communication models (a flooding-based communication model, a centralized communication model, and a hierarchical communication model). The rest of this chapter is organized as follows. In the next section, we provide some background on P2P systems and clustering techniques. The details of the representation models and communication models are presented in Section 3. We discuss future trends and draw conclusions in Section 4 and Section 5, respectively.
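The two-step approach of local clustering followed by cluster assembly can be sketched under two assumptions:

- Representation model: one simple approximate model, in which each peer shares only its centroids and cluster sizes.
- Communication model: the centralized one, in which a single site assembles all summaries.

The use of k-means with farthest-first seeding, the helper names (`local_clustering`, `cluster_assembly`, `weighted_kmeans`), and the toy data are all hypothetical illustrations. They are not the chapter's specific algorithms.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist2(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(p: Point, centers: Sequence[Point]) -> int:
    return min(range(len(centers)), key=lambda j: dist2(p, centers[j]))


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at the first point, then add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def weighted_kmeans(points, weights, k, iters=20):
    """Weighted k-means; returns (centers, total weight assigned to each center)."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        acc = [[0.0, 0.0, 0.0] for _ in centers]  # weighted x-sum, weighted y-sum, weight
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            acc[j][0] += w * p[0]
            acc[j][1] += w * p[1]
            acc[j][2] += w
        centers = [(sx / sw, sy / sw) if sw > 0 else centers[j]
                   for j, (sx, sy, sw) in enumerate(acc)]
    sizes = [0.0] * len(centers)
    for p, w in zip(points, weights):
        sizes[nearest(p, centers)] += w
    return centers, sizes


def local_clustering(points, k):
    """Step 1, run at each peer: cluster local data and share only (centroids, sizes)."""
    return weighted_kmeans(points, [1.0] * len(points), k)


def cluster_assembly(summaries, k):
    """Step 2, run at one assembling site: weighted k-means over all shared centroids."""
    centroids = [c for cs, _ in summaries for c in cs]
    sizes = [s for _, ss in summaries for s in ss]
    return weighted_kmeans(centroids, sizes, k)


peer_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
peer_b = [(1.0, 0.0), (1.0, 1.0), (11.0, 10.0), (11.0, 11.0), (11.0, 12.0)]
summaries = [local_clustering(p, 2) for p in (peer_a, peer_b)]
centers, sizes = cluster_assembly(summaries, 2)
print(centers, sizes)
```

With this toy data, no peer splits a global cluster locally. The assembled centroids, about (0.5, 0.5) and (10.6, 10.8) with sizes 4 and 5, therefore coincide with those from clustering the pooled points directly. When local clusters straddle global ones, a centroid-and-size summary is lossy, and the assembled result can differ from the centralized one.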

Cited by
01 Jan 2002

9,314 citations

Journal Article

6,278 citations

Proceedings Article
21 Aug 2011
TL;DR: A model of human mobility that combines periodic short-range movements with travel due to the social network structure is developed, and it is shown that this model reliably predicts the locations and dynamics of future human movement and gives an order of magnitude better performance.
Abstract: Even though human movement and mobility patterns have a high degree of freedom and variation, they also exhibit structural patterns due to geographic and social constraints. Using cell phone location data, as well as data from two online location-based social networks, we aim to understand what basic laws govern human motion and dynamics. We find that humans experience a combination of periodic movement that is geographically limited and seemingly random jumps correlated with their social networks. Short-range travel is periodic both spatially and temporally and is not affected by the social network structure, while long-distance travel is more influenced by social network ties. We show that social relationships can explain about 10% to 30% of all human movement, while periodic behavior explains 50% to 70%. Based on our findings, we develop a model of human mobility that combines periodic short-range movements with travel due to the social network structure. We show that our model reliably predicts the locations and dynamics of future human movement and gives an order of magnitude better performance than present models of human mobility.

2,922 citations

01 Nov 2008

2,686 citations

Journal Article
TL;DR: This review presents the emergent field of temporal networks, and discusses methods for analyzing topological and temporal structure and models for elucidating their relation to the behavior of dynamical systems.
Abstract: A great variety of systems in nature, society and technology (from the web of sexual contacts to the Internet, from the nervous system to power grids) can be modeled as graphs of vertices coupled by edges. The network structure, describing how the graph is wired, helps us understand, predict and optimize the behavior of dynamical systems. In many cases, however, the edges are not continuously active. As an example, in networks of communication via email, text messages, or phone calls, edges represent sequences of instantaneous or practically instantaneous contacts. In some cases, edges are active for non-negligible periods of time: e.g., the proximity patterns of inpatients at hospitals can be represented by a graph where an edge between two individuals is on throughout the time they are at the same ward. Like network topology, the temporal structure of edge activations can affect dynamics of systems interacting through the network, from disease contagion on the network of patients to information diffusion over an e-mail network. In this review, we present the emergent field of temporal networks, and discuss methods for analyzing topological and temporal structure and models for elucidating their relation to the behavior of dynamical systems. In the light of traditional network theory, one can see this framework as moving the information of when things happen from the dynamical system on the network to the network itself. Since fundamental properties, such as the transitivity of edges, do not necessarily hold in temporal networks, many of these methods need to be quite different from those for static networks.

2,452 citations
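The temporal-networks abstract notes that transitivity need not hold in temporal networks. This can be made concrete with time-respecting paths. The sketch below is our own illustration, not a method from the review. It computes earliest arrival times from a source by scanning instantaneous contacts in time order, under three assumptions:

- contacts are undirected;
- a contact can relay information only if its endpoint was reached strictly earlier;
- the helper name `earliest_arrival` is hypothetical.

```python
from typing import Dict, List, Tuple

# (u, v, t): an instantaneous, undirected contact between u and v at time t
Contact = Tuple[str, str, float]


def earliest_arrival(source: str, contacts: List[Contact],
                     t_start: float = float("-inf")) -> Dict[str, float]:
    """Earliest arrival time at every node reachable from `source` via time-respecting paths.

    A contact can carry information from a to b only if a was reached strictly
    before the contact's time (no relaying within the same instant).
    """
    arrival = {source: t_start}
    # Processing contacts in time order makes a single pass sufficient.
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        for a, b in ((u, v), (v, u)):
            if a in arrival and arrival[a] < t and t < arrival.get(b, float("inf")):
                arrival[b] = t
    return arrival


contacts = [("A", "B", 2.0), ("B", "C", 1.0)]
print(earliest_arrival("A", contacts))  # {'A': -inf, 'B': 2.0}: C is not reachable
print(earliest_arrival("B", contacts))
```

Here B reaches C (at t = 1) and A reaches B (at t = 2), yet A cannot reach C. The B-C contact happens before anything from A arrives at B. The static graph that aggregates these contacts would still report A and C as connected.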