Showing papers by "Wang-Chien Lee" published in 2012


Proceedings ArticleDOI
12 Aug 2012
TL;DR: This paper is the first research to study EBSNs at scale and paves the way for future studies on this new type of social network.
Abstract: Newly emerged event-based online social services, such as Meetup and Plancast, have experienced increased popularity and rapid growth. From these services, we observed a new type of social network - event-based social network (EBSN). An EBSN does not only contain online social interactions as in other conventional online social networks, but also includes valuable offline social interactions captured in offline activities. By analyzing real data collected from Meetup, we investigated EBSN properties and discovered many unique and interesting characteristics, such as heavy-tailed degree distributions and strong locality of social interactions. We subsequently studied the heterogeneous nature (co-existence of both online and offline social interactions) of EBSNs on two challenging problems: community detection and information flow. We found that communities detected in EBSNs are more cohesive than those in other types of social networks (e.g. location-based social networks). In the context of information flow, we studied the event recommendation problem. By experimenting with various information diffusion patterns, we found that a community-based diffusion model that takes into account both online and offline interactions provides the best prediction power. This paper is the first research to study EBSNs at scale and paves the way for future studies on this new type of social network. A sample dataset of this study can be downloaded from http://www.largenetwork.org/ebsn.

307 citations


Proceedings ArticleDOI
12 Aug 2012
TL;DR: Experimental results show that social influence captured based on the proposed probabilistic generative model, called social influenced selection (SIS), is effective for enhancing both item recommendation and group recommendation, essential for viral marketing, and useful for various user analysis.
Abstract: Social friendship has been shown to be beneficial for item recommendation for years. However, existing approaches mostly incorporate social friendship into recommender systems by heuristics. In this paper, we argue that social influence between friends can be captured quantitatively and propose a probabilistic generative model, called social influenced selection (SIS), to model the decision making of item selection (e.g., what book to buy or where to dine). Based on SIS, we mine the social influence between linked friends and the personal preferences of users through statistical inference. To address the challenges arising from multiple layers of hidden factors in SIS, we develop a new parameter learning algorithm based on expectation maximization (EM). Moreover, we show that the mined social influence and user preferences are valuable for group recommendation and viral marketing. Finally, we conduct a comprehensive performance evaluation using real datasets crawled from last.fm and whrrl.com to validate our proposal. Experimental results show that social influence captured based on our SIS model is effective for enhancing both item recommendation and group recommendation, essential for viral marketing, and useful for various user analysis.

284 citations


Proceedings ArticleDOI
29 Oct 2012
TL;DR: This paper analyzes the decision making process in a group to propose a personal impact topic (PIT) model for group recommendations, which effectively identifies the group preference profile for a given group by considering the personal preferences and personal impacts of group members.
Abstract: Group activities are essential ingredients of people's social life. The rapid growth of online social networking services has greatly boosted group activities by providing a convenient platform for users to organize and participate in such activities. Therefore, recommender systems, as a critical component in social networking services, now face new challenges in supporting group activities. In this paper, we study the group recommendation problem, i.e., making recommendations to a group of people in social networking services. We analyze the decision making process in a group to propose a personal impact topic (PIT) model for group recommendations. The PIT model effectively identifies the group preference profile for a given group by considering the personal preferences and personal impacts of group members. Moreover, we further enhance the discovery of personal impact with social network information to obtain an extended personal impact topic (E-PIT) model. We have conducted comprehensive data analysis and evaluations on three real datasets. The results show that our proposed group recommendation techniques outperform baseline approaches.

127 citations


Proceedings ArticleDOI
12 Aug 2012
TL;DR: This paper designs an efficient algorithm SSGSelect, which includes effective pruning techniques to reduce the running time for finding the optimal solution, and proposes a new index structure, Social R-Tree, to further improve the efficiency.
Abstract: Challenges faced in organizing impromptu activities are the requirements of making timely invitations in accordance with the locations of candidate attendees and the social relationship among them. It is desirable to find a group of attendees close to a rally point and ensure that the selected attendees have a good social relationship to create a good atmosphere in the activity. Therefore, this paper proposes Socio-Spatial Group Query (SSGQ) to select a group of nearby attendees with tight social relation. Efficient processing of SSGQ is very challenging due to the tradeoff in the spatial and social domains. We show that the problem is NP-hard via a proof and design an efficient algorithm SSGSelect, which includes effective pruning techniques to reduce the running time for finding the optimal solution. We also propose a new index structure, Social R-Tree, to further improve the efficiency. User study and experimental results demonstrate that SSGSelect significantly outperforms manual coordination in both solution quality and efficiency.

115 citations
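
To make the Socio-Spatial Group Query in the entry above concrete, here is a minimal brute-force sketch. It is not SSGSelect and uses no pruning or Social R-Tree index. It assumes one plausible reading of the query: pick `p` attendees minimizing the total distance to the rally point, subject to each selected attendee being unacquainted with at most `k` other group members. The function name `ssgq_brute_force`, the parameters `p` and `k`, and the data layout are all hypothetical.

```python
from itertools import combinations
from math import dist


def ssgq_brute_force(locations, friends, rally_point, p, k):
    """Exhaustively search all groups of size p (exponential; illustration only).

    locations: dict mapping a person to an (x, y) coordinate.
    friends:   dict mapping a person to the set of people they know.
    Assumed constraint: each member may be unacquainted with at most k
    other members of the selected group.
    """
    best_group, best_cost = None, float("inf")
    for group in combinations(sorted(locations), p):
        # Social check: count unfamiliar co-attendees for every member.
        socially_tight = all(
            sum(1 for other in group
                if other != member and other not in friends[member]) <= k
            for member in group
        )
        if not socially_tight:
            continue
        # Spatial cost: total Euclidean distance to the rally point.
        cost = sum(dist(locations[member], rally_point) for member in group)
        if cost < best_cost:
            best_group, best_cost = group, cost
    return best_group, best_cost
```

The exhaustive enumeration illustrates why pruning matters: the number of candidate groups grows combinatorially with the number of candidate attendees.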


Journal ArticleDOI
TL;DR: The analysis and experiment results show the superiority of ROAD over the state-of-the-art approaches.
Abstract: In this paper, we present a new system framework called ROAD for spatial object search on road networks. ROAD is extensible to diverse object types and efficient for processing various location-dependent spatial queries (LDSQs), as it maintains objects separately from a given network and adopts an effective search space pruning technique. Based on our analysis of the two essential operations for LDSQ processing, namely, network traversal and object lookup, ROAD organizes a large road network as a hierarchy of interconnected regional subnetworks (called Rnets). Each Rnet is augmented with 1) shortcuts and 2) object abstracts to accelerate network traversals and provide quick object lookups, respectively. To manage those shortcuts and object abstracts, two cooperating indices, namely, Route Overlay and Association Directory, are devised. In detail, we present 1) the Rnet hierarchy and several properties useful in constructing and maintaining the Rnet hierarchy, 2) the design and implementation of the ROAD framework, and 3) a suite of efficient search algorithms for single-source LDSQs and multisource LDSQs. We conduct a theoretical performance analysis and carry out a comprehensive empirical study to evaluate ROAD. The analysis and experiment results show the superiority of ROAD over the state-of-the-art approaches.

104 citations


Journal ArticleDOI
TL;DR: This work proposes a novel framework, called Mobile Commerce Explorer (MCE), for mining and prediction of mobile users' movements and purchase transactions under the context of mobile commerce, and is believed to be the first work that facilitates mining and predictions of users' commerce behaviors in order to recommend stores and items previously unknown to a user.
Abstract: Due to a wide range of potential applications, research on mobile commerce has received a lot of interest from both industry and academia. Among them, one of the active topic areas is the mining and prediction of users' mobile commerce behaviors such as their movements and purchase transactions. In this paper, we propose a novel framework, called Mobile Commerce Explorer (MCE), for mining and prediction of mobile users' movements and purchase transactions under the context of mobile commerce. The MCE framework consists of three major components: 1) Similarity Inference Model (SIM) for measuring the similarities among stores and items, which are two basic mobile commerce entities considered in this paper; 2) Personal Mobile Commerce Pattern Mine (PMCP-Mine) algorithm for efficient discovery of mobile users' Personal Mobile Commerce Patterns (PMCPs); and 3) Mobile Commerce Behavior Predictor (MCBP) for prediction of possible mobile user behaviors. To the best of our knowledge, this is the first work that facilitates mining and prediction of mobile users' commerce behaviors in order to recommend stores and items previously unknown to a user. We perform an extensive experimental evaluation by simulation and show that our proposals produce excellent results.

62 citations


Proceedings ArticleDOI
06 Nov 2012
TL;DR: Experimental results show that the proposed prediction schemes significantly outperform the state-of-the-art and baseline techniques.
Abstract: In this paper, we develop a new bus travel time prediction framework, called Historical Trajectory based Travel/Arrival Time Prediction (HTTP) for real-time prediction of travel time over future segments (and thus the arrival time at stops) of an on-going bus journey. The basic idea behind HTTP is to use a collection of historical trajectories "similar" to the current bus trajectory to predict the future segments. Specifically, the HTTP framework (1) samples a set of similar trajectories as the basis for travel time estimation instead of relying on only one historical trajectory best matching the on-going bus journey; and (2) explores different prediction schemes, namely, passed segments, temporal features, and hybrid methods, to identify the sample set of similar trajectories. We conduct a comprehensive empirical experimentation using real bus trajectory data collected from Taipei City, Taiwan to validate our ideas and to evaluate the proposed schemes. Experimental results show that the proposed prediction schemes significantly outperform the state-of-the-art and baseline techniques.

54 citations
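
The "passed segments" idea in the entry above can be sketched roughly as follows. This is an illustrative approximation, not the HTTP framework itself. It assumes each trajectory is a list of per-segment travel times in seconds, that similarity is squared Euclidean distance over the segments already traversed, and that the prediction is the plain average over the `k` most similar historical trajectories. The function name and all parameters are hypothetical.

```python
from statistics import mean


def predict_remaining_times(history, passed, k=3):
    """Predict travel times for the segments not yet traversed.

    history: list of historical trajectories, each a list of per-segment
             travel times for the same route (assumed layout).
    passed:  travel times observed so far on the current journey.
    """
    n = len(passed)
    # Only trajectories that extend beyond the current position are useful.
    candidates = [t for t in history if len(t) > n]
    if not candidates:
        raise ValueError("no historical trajectory covers the future segments")

    def distance(trajectory):
        # Squared Euclidean distance on the segments already passed.
        return sum((a - b) ** 2 for a, b in zip(trajectory[:n], passed))

    # Sample a set of similar trajectories rather than the single best match.
    similar = sorted(candidates, key=distance)[:k]
    horizon = min(len(t) for t in similar)
    return [mean(t[i] for t in similar) for i in range(n, horizon)]
```

Summing the predicted segment times up to a given stop would give an arrival-time estimate for that stop.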


Proceedings ArticleDOI
08 Feb 2012
TL;DR: This work argues that the former personality prompts a user to cast her vote conforming to the majority of the service community while on the contrary the latter personality makes her vote different from the community and proposes a Conformer-Maverick (CM) model to simulate the voting process and use it to rank top-k potentially popular items based on the early votes they received.
Abstract: Prediction of popular items in online content sharing systems has recently attracted a lot of attention due to the tremendous need of users and its commercial values. Different from previous works that make prediction by fitting a popularity growth model, we tackle this problem by exploiting the latent conforming and maverick personalities of those who vote to assess the quality of on-line items. We argue that the former personality prompts a user to cast her vote conforming to the majority of the service community while on the contrary the latter personality makes her vote different from the community. We thus propose a Conformer-Maverick (CM) model to simulate the voting process and use it to rank top-k potentially popular items based on the early votes they received. Through an extensive experimental evaluation, we validate our ideas and find that our proposed CM model achieves better performance than baseline solutions, especially for smaller k.

41 citations


Journal ArticleDOI
TL;DR: A centralized algorithm to determine a set of representative nodes with high energy levels and wide data coverage ranges is proposed, and maintenance mechanisms are proposed to dynamically select alternative representative nodes when the original representative nodes run low on energy, or cannot capture spatial correlation within their respective data coverage ranges.
Abstract: To conserve energy, sensor nodes with similar readings can be grouped such that readings from only the representative nodes within the groups need to be reported. However, efficiently identifying sensor groups and their representative nodes is a very challenging task. In this paper, we propose a centralized algorithm to determine a set of representative nodes with high energy levels and wide data coverage ranges. Here, the data coverage range of a sensor node is considered to be the set of sensor nodes that have reading behaviors very close to the particular sensor node. To further reduce the extra cost incurred in messages for selection of representative nodes, a distributed algorithm is developed. Furthermore, maintenance mechanisms are proposed to dynamically select alternative representative nodes when the original representative nodes run low on energy, or cannot capture spatial correlation within their respective data coverage ranges. Using experimental studies on both synthetic and real data sets, our proposed algorithms are shown to effectively and efficiently provide approximate data collection while prolonging the network lifetime.

37 citations
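
As a rough illustration of selecting representative nodes with wide data coverage ranges and high energy (the entry above), the sketch below uses a plain greedy set-cover heuristic. This is a swapped-in technique, not the paper's centralized or distributed algorithm. It assumes that node `j` lies in node `i`'s data coverage range when their reading histories never differ by more than a threshold `eps`. All names are hypothetical.

```python
def coverage_ranges(readings, eps):
    """Map each node to the set of nodes whose readings stay within eps of it."""
    return {
        i: {
            j for j in readings
            if all(abs(a - b) <= eps for a, b in zip(readings[i], readings[j]))
        }
        for i in readings
    }


def select_representatives(readings, energy, eps):
    """Greedy cover: prefer wide coverage first, then higher residual energy.

    readings: dict mapping a node id to a list of historical readings.
    energy:   dict mapping a node id to its residual energy level.
    """
    cover = coverage_ranges(readings, eps)
    uncovered = set(readings)
    representatives = []
    while uncovered:
        # Every node covers itself, so each round covers at least one node.
        best = max(
            sorted(readings),
            key=lambda i: (len(cover[i] & uncovered), energy[i]),
        )
        representatives.append(best)
        uncovered -= cover[best]
    return representatives
```

Only the selected representatives would then report readings on behalf of the nodes in their coverage ranges.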


Proceedings ArticleDOI
23 Jul 2012
TL;DR: A novel key formulation scheme based on the R+-tree (abbreviated as KR+-index) is proposed, two spatial queries, k-NN query and range query, are designed on top of it, and experiments show that it outperforms other existing key formulations and MD-HBase.
Abstract: Due to its flexibility and scalability, cloud computing nowadays plays an important role in handling large-scale data analysis. For data processing operations, several cloud data management systems (CDMs), such as HBase and Cassandra, have been developed. Such CDMs usually provide key-value storage, where each key is used to access its corresponding value. Both HBase and Cassandra provide some basic operations (e.g., Get, Scan) to retrieve the values via keys specified by users. The existing CDMs fully inherit the characteristics of cloud computing (i.e., high scalability and availability). With these characteristics, CDMs are widely employed for Web data, especially for search engines. However, with the proliferation of smart phones and location-based services, data with spatial information, referred to as spatial data, are dramatically increasing. Consequently, how to formulate keys for spatial data in the existing CDMs is a challenging issue. In this paper, we develop several key formulation schemes. In particular, we propose a novel key formulation scheme based on the R+-tree (abbreviated as KR+-index). With our design for keys of spatial data, the existing CDMs are able to efficiently retrieve spatial data. In light of KR+-tree, two spatial queries, k-NN query and range query, are designed. Moreover, we implement the proposed key formulation schemes on HBase and Cassandra, and import real spatial data for spatial queries. The experimental results demonstrate that KR+-tree outperforms other existing key formulations and MD-HBase.

28 citations


Proceedings ArticleDOI
23 Jul 2012
TL;DR: An algorithm, coined Proximity, is introduced that answers CAkNN queries in O(n(k + λ)) time, where n denotes the number of users and λ a network-specific parameter; its efficiency is mainly attributed to a smart search space sharing technique it introduces.
Abstract: Consider a centralized query operator that identifies to every smart phone user its k geographically nearest neighbors at all times, a query we coin Continuous All k-Nearest Neighbor (CAkNN). Such an operator could be utilized to enhance public emergency services, allowing users to send SOS beacons out to the closest rescuers and allowing gamers or social networking users to establish ad-hoc overlay communication infrastructures, in order to carry out complex interactions. In this paper, we study the problem of efficiently processing a CAkNN query in a cellular or WiFi network, both of which are ubiquitous. We introduce an algorithm, coined Proximity, which answers CAkNN queries in O(n(k+lambda)) time, where n denotes the number of users and lambda a network-specific parameter (lambda

Proceedings ArticleDOI
29 Oct 2012
TL;DR: This paper proposes a dynamic model for group gathering based on the process of friend invitation to interpret how an f2f group is formed on-line, and demonstrates that using such group information can effectively improve the accuracies of social tie inference and friend recommendation.
Abstract: The rapid development of on-line social networking sites has dramatically changed the way people live and communicate. One particularly interesting phenomenon that came along with this development is the prominent role that various on-line networking portals play in scheduling and organizing off-line group events and activities. In this paper, we focus on studying the face-to-face (f2f) groups formed through, or facilitated by, on-line portals. We first show the distinct characteristics of such f2f groups by analyzing datasets collected from Whrrl and Meetup. Next, we propose a dynamic model for group gathering based on the process of friend invitation to interpret how an f2f group is formed on-line. The results of our model are confirmed by empirical observations. Finally, we demonstrate that using such group information can effectively improve the accuracies of social tie inference and friend recommendation.

Proceedings ArticleDOI
29 Oct 2012
TL;DR: Experimental results show that ABC significantly outperforms its counterpart and two baseline approaches in terms of both computational overhead and bundle quality.
Abstract: Prior research on viral marketing mostly focuses on promoting one single product item. In this work, we explore the idea of bundling multiple items for viral marketing and formulate a new research problem, called Bundle Configuration for SpreAd Maximization (BCSAM). Efficiently obtaining an optimal product bundle under the setting of BCSAM is very challenging. Aiming to strike a balance between the quality of solution and the computational overhead, we systematically explore various heuristics to develop a suite of algorithms, including κ-Bundle Configuration and Aggregated Bundle Configuration. Moreover, we integrate all the proposed ideas into one efficient algorithm, called Aggregated Bundle Configuration (ABC). Finally, we conduct an extensive performance evaluation on our proposals. Experimental results show that ABC significantly outperforms its counterpart and two baseline approaches in terms of both computational overhead and bundle quality.

Journal ArticleDOI
TL;DR: To answer PMVQs and PMNQs energy-efficiently, two suites of in-network algorithms are devised and extended to answer PMNQ variants, and all the proposed approaches are evaluated through cost analysis and simulations.
Abstract: In this paper, we introduce two types of probabilistic aggregation queries, namely, Probabilistic Minimum Value Queries (PMVQs) and Probabilistic Minimum Node Queries (PMNQs). A PMVQ determines possible minimum values among all imprecise sensed data, while a PMNQ identifies sensor nodes that possibly provide minimum values. However, centralized approaches consume a lot of energy from battery-powered sensor nodes, and well-studied in-network aggregation techniques that presume precise sensed data are not practical for inherently imprecise sensed data. Thus, to answer PMVQs and PMNQs energy-efficiently, we devised suites of in-network algorithms. For PMVQs, our in-network minimum value screening algorithm (MVS) filters candidate minimum values; and our in-network minimum value aggregation algorithm (MVA) conducts in-network probability calculation. PMNQs require possible minimum values to be determined a priori, inevitably consuming more energy to evaluate than PMVQs. Accordingly, our one-phase and two-phase in-network algorithms are devised. We also extend the algorithms to answer PMNQ variants. We evaluate all our proposed approaches through cost analysis and simulations.
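
To illustrate what the two query types in the entry above compute, here is a small centralized sketch. It does not reproduce the in-network MVS/MVA or one-phase/two-phase algorithms. It assumes each sensor's imprecise reading is an independent discrete distribution given as a dict from value to probability, and that a node "provides the minimum" when its reading is no larger than every other reading (ties allowed). The function names are hypothetical.

```python
from fractions import Fraction
from math import prod


def tail(distribution, v, strict):
    """Return P(X > v) if strict, otherwise P(X >= v)."""
    return sum(p for x, p in distribution.items()
               if (x > v if strict else x >= v))


def min_value_distribution(dists):
    """PMVQ-style answer: P(min = v) for every possible minimum value v."""
    values = sorted({x for d in dists.values() for x in d})
    result = {}
    for v in values:
        # P(min = v) = P(all readings >= v) - P(all readings > v).
        at_least = prod(tail(d, v, strict=False) for d in dists.values())
        above = prod(tail(d, v, strict=True) for d in dists.values())
        if at_least - above > 0:
            result[v] = at_least - above
    return result


def min_node_probabilities(dists):
    """PMNQ-style answer: probability that each node attains the minimum."""
    return {
        node: sum(
            p * prod(tail(d, v, strict=False)
                     for other, d in dists.items() if other != node)
            for v, p in own.items()
        )
        for node, own in dists.items()
    }
```

Using `Fraction` values for the probabilities keeps the arithmetic exact; floats would also work with the usual rounding caveats.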

Proceedings ArticleDOI
04 Sep 2012
TL;DR: Aiming at achieving high estimation accuracy and alleviating excessive computation, a time-series disaggregation algorithm is developed which incorporates two novel techniques, namely, DE-pruning and monotonic enumeration, for search space pruning.
Abstract: The growing concerns about urgent environmental and economic issues, such as global warming and rising energy cost, have motivated research studies on various green computing technologies. For example, Non-Intrusive Appliance Load Monitor (NIALM) techniques, aiming at energy monitoring, load forecasting and improved control of residential electrical appliances, have been developed by monitoring one electrical circuit that contains a number of electrical appliances without using separate sub-meters. By employing pattern recognition algorithms, the NIALM techniques estimate the consumption of individual appliances. While the basic ideas behind the NIALM techniques are valid, existing proposals suffer from the issue of poor estimation accuracy. In this paper, we model the process of load separation in NIALM as a time series disaggregation problem. Aiming at achieving high estimation accuracy and alleviating excessive computation, we develop a time-series disaggregation algorithm which incorporates two novel techniques, namely, DE-pruning and monotonic enumeration, for search space pruning. A comprehensive set of experiments is conducted to validate our proposals and to evaluate the effectiveness and the efficiency of the proposed methods. The results show that our proposal is effective and efficient.

Journal ArticleDOI
TL;DR: This study designs three metrics to evaluate the system performance, develops five task assignment algorithms for GWAP-based geotagging systems, and finds that the Least-Throughput-First Assignment algorithm (LTFA) is the most effective approach because it can achieve competitive system utility, while its computational complexity remains moderate.
Abstract: Geospatial tagging (geotagging) is an emerging and very promising application that can help users find a wide variety of location-specific information, and thereby facilitate the development of advanced location-based services. Conventional geotagging systems share some limitations, such as the use of a two-phase operating model and the tendency to tag popular objects with simple contexts. To address these problems, a number of geotagging systems based on the concept of "Games with a Purpose" (GWAP) have been developed recently. In this study, we use analysis to investigate these new systems. Based on our analysis results, we design three metrics to evaluate the system performance, and develop five task assignment algorithms for GWAP-based systems. Using a comprehensive set of simulations under both synthetic and realistic mobility scenarios, we find that the Least-Throughput-First Assignment algorithm (LTFA) is the most effective approach because it can achieve competitive system utility, while its computational complexity remains moderate. We also find that, to improve the system utility, it is better to assign as many tasks as possible in each round. However, because players may feel annoyed if too many tasks are assigned at the same time, it is recommended that multiple tasks be assigned one by one in each round in order to achieve higher system utility.