Showing papers by "Wang-Chien Lee" published in 2011


Proceedings ArticleDOI
24 Jul 2011
TL;DR: This paper argues that geographical influence among POIs plays an important role in user check-in behavior, models it with a power-law distribution, and develops a collaborative recommendation algorithm that incorporates geographical influence through a naive Bayesian method.
Abstract: In this paper, we aim to provide a point-of-interest (POI) recommendation service for the rapidly growing location-based social networks (LBSNs), e.g., Foursquare, Whrrl, etc. Our idea is to explore user preference, social influence and geographical influence for POI recommendations. In addition to deriving user preference based on user-based collaborative filtering and exploring social influence from friends, we put a special emphasis on geographical influence due to the spatial clustering phenomenon exhibited in user check-in activities of LBSNs. We argue that the geographical influence among POIs plays an important role in user check-in behaviors and model it by a power-law distribution. Accordingly, we develop a collaborative recommendation algorithm that incorporates geographical influence through a naive Bayesian method. Furthermore, we propose a unified POI recommendation framework, which fuses user preference to a POI with social influence and geographical influence. Finally, we conduct a comprehensive performance evaluation over two large-scale datasets collected from Foursquare and Whrrl. Experimental results with these real datasets show that the unified collaborative recommendation approach significantly outperforms a wide spectrum of alternative recommendation approaches.

1,048 citations
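
The abstract above models geographical influence with a power-law distribution and combines it through a naive Bayesian method. The following is a minimal sketch of one way to read that idea, not the authors' implementation. It assumes planar POI coordinates, a power law of the form p(d) = a * d^b fitted by least squares in log-log space, and a log-score that sums the power-law log-probabilities between a candidate POI and each POI the user has already visited. The function names and the `min_dist` floor are hypothetical.

```python
import math


def fit_power_law(distances, probabilities):
    """Fit p = a * d**b by least squares on log(p) = log(a) + b*log(d)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def geo_score(candidate, visited, a, b, min_dist=0.1):
    """Naive-Bayes-style log-score of a candidate POI given visited POIs.

    Each visited POI contributes log(a * d**b) independently, where d is
    the distance to the candidate. min_dist (an assumption) avoids log(0).
    """
    score = 0.0
    for poi in visited:
        d = max(math.dist(candidate, poi), min_dist)
        score += math.log(a) + b * math.log(d)
    return score
```

With a negative exponent b, candidates close to a user's past check-ins receive higher scores than distant ones, which is the spatial clustering effect the abstract describes.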


Proceedings ArticleDOI
01 Nov 2011
TL;DR: The core idea of the prediction model is a novel cluster-based prediction strategy which evaluates the next location of a mobile user based on the frequent behaviors of similar users in the same cluster determined by analyzing users' common behavior in semantic trajectories.
Abstract: Research on predicting movements of mobile users has attracted a lot of attention in recent years. Many of those prediction techniques are developed based only on geographic features of mobile users' trajectories. In this paper, we propose a novel approach for predicting the next location of a user's movement based on both the geographic and semantic features of users' trajectories. The core idea of our prediction model is a novel cluster-based prediction strategy which evaluates the next location of a mobile user based on the frequent behaviors of similar users in the same cluster determined by analyzing users' common behavior in semantic trajectories. Through a comprehensive evaluation by experiments, our proposal is shown to deliver excellent performance.

291 citations


Journal ArticleDOI
TL;DR: An efficient index, called IR-tree, is proposed that together with a top-k document search algorithm facilitates four major tasks in document searches, namely, 1) spatial filtering, 2) textual filtering, 3) relevance computation, and 4) document ranking in a fully integrated manner.
Abstract: Given a geographic query that is composed of query keywords and a location, a geographic search engine retrieves documents that are the most textually and spatially relevant to the query keywords and the location, respectively, and ranks the retrieved documents according to their joint textual and spatial relevances to the query. The lack of an efficient index that can simultaneously handle both the textual and spatial aspects of the documents makes existing geographic search engines inefficient in answering geographic queries. In this paper, we propose an efficient index, called IR-tree, that together with a top-k document search algorithm facilitates four major tasks in document searches, namely, 1) spatial filtering, 2) textual filtering, 3) relevance computation, and 4) document ranking in a fully integrated manner. In addition, IR-tree allows searches to adopt different weights on textual and spatial relevance of documents at the runtime and thus caters for a wide variety of applications. A set of comprehensive experiments over a wide range of scenarios has been conducted and the experiment results demonstrate that IR-tree outperforms the state-of-the-art approaches for geographic document searches.

270 citations
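
The abstract above describes ranking documents by joint textual and spatial relevance, with weights that can be chosen at query time. The sketch below illustrates only that scoring idea with a brute-force scan. It does not implement the IR-tree index or the paper's search algorithm. The two relevance functions (query-term overlap and inverse distance) and the weight parameter `alpha` are hypothetical stand-ins chosen for illustration.

```python
import heapq
import math


def textual_relevance(doc_terms, query_terms):
    """Fraction of distinct query terms that appear in the document."""
    query = set(query_terms)
    if not query:
        return 0.0
    return len(set(doc_terms) & query) / len(query)


def spatial_relevance(doc_loc, query_loc):
    """Inverse-distance relevance in (0, 1]; 1.0 at the query location."""
    return 1.0 / (1.0 + math.dist(doc_loc, query_loc))


def top_k(docs, query_terms, query_loc, k, alpha):
    """Return ids of the k best documents under a weighted joint score.

    docs maps doc_id -> (terms, (x, y)). alpha in [0, 1] weights textual
    relevance; (1 - alpha) weights spatial relevance.
    """
    scored = []
    for doc_id, (terms, loc) in docs.items():
        t = textual_relevance(terms, query_terms)
        if t == 0.0:
            continue  # textual filtering: skip documents with no query term
        s = spatial_relevance(loc, query_loc)
        scored.append((alpha * t + (1.0 - alpha) * s, doc_id))
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]
```

Changing `alpha` between calls changes the ranking without rebuilding anything, which mirrors the runtime weighting mentioned in the abstract. An index such as the IR-tree exists to avoid the full scan used here.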


Proceedings ArticleDOI
21 Aug 2011
TL;DR: A semantic annotation technique for location-based social networks to automatically annotate all places with category tags which are a crucial prerequisite for location search, recommendation services, or data cleaning is developed.
Abstract: In this paper, we develop a semantic annotation technique for location-based social networks to automatically annotate all places with category tags which are a crucial prerequisite for location search, recommendation services, or data cleaning. Our annotation algorithm learns a binary support vector machine (SVM) classifier for each tag in the tag space to support multi-label classification. Based on the check-in behavior of users, we extract features of places from i) explicit patterns (EP) of individual places and ii) implicit relatedness (IR) among similar places. The features extracted from EP are summarized from all check-ins at a specific place. The features from IR are derived by building a novel network of related places (NRP) where similar places are linked by virtual edges. Upon NRP, we determine the probability of a category tag for each place by exploring the relatedness of places. Finally, we conduct a comprehensive experimental study based on a real dataset collected from a location-based social network, Whrrl. The results demonstrate the suitability of our approach and show the strength of taking both EP and IR into account in feature extraction.

243 citations


Proceedings ArticleDOI
24 Jul 2011
TL;DR: The Collaborative Location Recommendation (CLR) framework is proposed, which employs a dynamic clustering algorithm CADC to cluster the trajectory data into groups of similar users, similar activities and similar locations efficiently by supporting incremental update of the groups when new GPS trajectory data arrives.
Abstract: GPS data tracked on mobile devices contains rich information about human activities and preferences. In this paper, GPS data is used in location-based services (LBSs) to provide collaborative location recommendations. We observe that most existing LBSs provide location recommendations by clustering the User-Location matrix. Since the User-Location matrix created based on GPS data is huge, there are two major problems with these methods. First, the number of similar locations that need to be considered in computing the recommendations can be numerous. As a result, the identification of truly relevant locations from numerous candidates is challenging. Second, the clustering process on large matrix is time consuming. Thus, when new GPS data arrives, complete re-clustering of the whole matrix is infeasible. To tackle these two problems, we propose the Collaborative Location Recommendation (CLR) framework for location recommendation. By considering activities (i.e., temporal preferences) and different user classes (i.e., Pattern Users, Normal Users, and Travelers) in the recommendation process, CLR is capable of generating more precise and refined recommendations to the users compared to the existing methods. Moreover, CLR employs a dynamic clustering algorithm CADC to cluster the trajectory data into groups of similar users, similar activities and similar locations efficiently by supporting incremental update of the groups when new GPS trajectory data arrives. We evaluate CLR with a real-world GPS dataset, and confirm that the CLR framework provides more accurate location recommendations compared to the existing methods.

154 citations


Proceedings ArticleDOI
01 Nov 2011
TL;DR: This work extracts user check-ins from massive real-world data crawled from Location-based Social Networks to understand the temporal dimension of Points Of Interest.
Abstract: Feature types play a crucial role in understanding and analyzing geographic information. Usually, these types are defined, standardized, and controlled by domain experts and cover geographic features on the mesoscale level, e.g., populated places, forests, or lakes. While feature types also underlie most Location-Based Services (LBS), assigning a consistent typing schema for Points Of Interest (POI) across different data sets is challenging. In case of Volunteered Geographic Information (VGI), types are assigned as tags by a heterogeneous community with different backgrounds and applications in mind. Consequently, VGI research is shifting away from data completeness and positional accuracy as quality measures towards attribute accuracy. As tags can be assigned by everybody and have no formal or stable definition, we propose to study category tags via indirect observations. We extract user check-ins from massive real-world data crawled from Location-based Social Networks to understand the temporal dimension of Points Of Interest. While users may assign different category tags to places, we argue that their temporal characteristics, e.g., opening times, will show distinguishable patterns.

120 citations


Book ChapterDOI
12 Sep 2011
TL;DR: This work presents a methodology to analyze the spatial-semantic interaction of point features in Volunteered Geographic Information, presents a case study on a spatial and semantic subset of OpenStreetMap, and introduces a novel semantic similarity measure based on the change history of OpenStreetMap elements.
Abstract: With the increasing success and commercial integration of Volunteered Geographic Information (VGI), the focus shifts away from coverage to data quality and homogeneity. Within the last years, several studies have been published analyzing the positional accuracy of features, completeness of specific attributes, or the topological consistency of line and polygon features. However, most of these studies do not take geographic feature types into account. This is for two reasons. First, and in contrast to street networks, choosing a reference set is difficult. Second, we lack the measures to quantify the degree of feature type miscategorization. In this work, we present a methodology to analyze the spatial-semantic interaction of point features in Volunteered Geographic Information. Feature types in VGI can be considered special in both the way they are formed and the way they are applied. Given that they reflect community agreement more accurately than top-down approaches, we argue that they should be used as the primary basis for assessing spatial-semantic interaction. We present a case study on a spatial and semantic subset of OpenStreetMap, and introduce a novel semantic similarity measure based on the change history of OpenStreetMap elements. Our results set the stage for systems that assist VGI contributors in suggesting the types of new features, cleaning up existing data, and integrating data from different sources.

65 citations


Journal ArticleDOI
01 Mar 2011
TL;DR: This paper proposes a Social-Temporal Group Query that finds the activity time and attendees with the minimum total social distance to the initiator, and that incorporates an acquaintance constraint to avoid finding a group with mutually unfamiliar attendees.
Abstract: Three essential criteria are important for activity planning, including: (1) finding a group of attendees familiar with the initiator, (2) ensuring each attendee in the group to have tight social relations with most of the members in the group, and (3) selecting an activity period available for all attendees. Therefore, this paper proposes Social-Temporal Group Query to find the activity time and attendees with the minimum total social distance to the initiator. Moreover, this query incorporates an acquaintance constraint to avoid finding a group with mutually unfamiliar attendees. Efficient processing of the social-temporal group query is very challenging. We show that the problem is NP-hard via a proof and formulate the problem with Integer Programming. We then propose two efficient algorithms, SGSelect and STGSelect, which include effective pruning techniques and employ the idea of pivot time slots to substantially reduce the running time, for finding the optimal solutions. Experimental results indicate that the proposed algorithms are much more efficient and scalable. In the comparison of solution quality, we show that STGSelect outperforms the algorithm that represents manual coordination by the initiator.

46 citations
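
The three criteria in the abstract above can be illustrated with a small brute-force search. This is a sketch under stated assumptions, not SGSelect or STGSelect: it enumerates every group, so it is exponential in the group size and has none of the paper's pruning. The parameter names are hypothetical: `p` is the group size including the initiator, `k` is the maximum number of unfamiliar attendees each member may have, `m` is the number of consecutive time slots required, `distance` gives each candidate's social distance to the initiator, and `friends` is a set of acquainted pairs.

```python
from itertools import combinations


def common_window(schedules, members, m):
    """Return the first start slot where all members are free for m slots."""
    n_slots = len(next(iter(schedules.values())))
    for start in range(n_slots - m + 1):
        if all(all(schedules[u][start:start + m]) for u in members):
            return start
    return None


def stgq_bruteforce(initiator, distance, friends, schedules, p, k, m):
    """Exhaustively find the group minimizing total distance to the initiator.

    Returns (total_distance, group, start_slot), or None if no group fits.
    """
    candidates = [u for u in distance if u != initiator]
    best = None
    for others in combinations(candidates, p - 1):
        group = (initiator,) + others
        # Acquaintance constraint: each member is unfamiliar with at most
        # k of the other members.
        familiar_enough = all(
            sum(1 for v in group
                if v != u and frozenset((u, v)) not in friends) <= k
            for u in group
        )
        if not familiar_enough:
            continue
        start = common_window(schedules, group, m)
        if start is None:
            continue  # no activity period available for all attendees
        total = sum(distance[u] for u in others)
        if best is None or total < best[0]:
            best = (total, group, start)
    return best
```

Tightening `k` can force the search away from the socially closest candidates when they do not know each other, which is the trade-off the acquaintance constraint introduces.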


Posted Content
TL;DR: Two efficient algorithms are proposed, SGSelect and STGSelect, which include effective pruning techniques and employ the idea of pivot time slots to substantially reduce the running time, for finding the optimal solutions to the social-temporal group query.
Abstract: Three essential criteria are important for activity planning, including: (1) finding a group of attendees familiar with the initiator, (2) ensuring each attendee in the group to have tight social relations with most of the members in the group, and (3) selecting an activity period available for all attendees. Therefore, this paper proposes Social-Temporal Group Query to find the activity time and attendees with the minimum total social distance to the initiator. Moreover, this query incorporates an acquaintance constraint to avoid finding a group with mutually unfamiliar attendees. Efficient processing of the social-temporal group query is very challenging. We show that the problem is NP-hard via a proof and formulate the problem with Integer Programming. We then propose two efficient algorithms, SGSelect and STGSelect, which include effective pruning techniques and employ the idea of pivot time slots to substantially reduce the running time, for finding the optimal solutions. Experimental results indicate that the proposed algorithms are much more efficient and scalable. In the comparison of solution quality, we show that STGSelect outperforms the algorithm that represents manual coordination by the initiator.

38 citations


Posted Content
TL;DR: Experimental results show that the generative models with social influence significantly outperform those without incorporating social influence, and the experimental results also confirm that the social influence based group recommendation algorithm outperforms the state-of-the-art algorithms for group recommendation.
Abstract: In this paper, we propose a probabilistic generative model, called unified model, which naturally unifies the ideas of social influence, collaborative filtering and content-based methods for item recommendation. To address the issue of hidden social influence, we devise new algorithms to learn the model parameters of our proposal based on expectation maximization (EM). In addition to a single-machine version of our EM algorithm, we further devise a parallelized implementation on the Map-Reduce framework to process two large-scale datasets we collect. Moreover, we show that the social influence obtained from our generative models can be used for group recommendation. Finally, we conduct comprehensive experiments using the datasets crawled from this http URL and this http URL to validate our ideas. Experimental results show that the generative models with social influence significantly outperform those without incorporating social influence. The unified generative model proposed in this paper obtains the best performance. Moreover, our study on social influence finds that users in this http URL are more likely to get influenced by friends than those in this http URL. The experimental results also confirm that our social influence based group recommendation algorithm outperforms the state-of-the-art algorithms for group recommendation.

33 citations


Journal ArticleDOI
TL;DR: A novel framework, called Trajectory-based Path Finding (TPF), is developed, built upon a novel algorithm named Mining-based Algorithm for Travel time Evaluation (MATE) for evaluating the travel time of a navigation path and a novel index structure named Efficient Navigation Path Search Tree (ENS-Tree) for efficiently retrieving the fastest path.
Abstract: Nowadays, research on Intelligent Transportation Systems (ITS) has received much attention due to its broad applications, such as path planning, which has become a common activity in our daily life. Besides, with the advances of Web 2.0 technologies, users are willing to share their trajectories, thus providing good resources for ITS applications. To the best of our knowledge, there is no study on the fastest path planning with multiple destinations in the literature. In this paper, we develop a novel framework, called Trajectory-based Path Finding (TPF), which is built upon a novel algorithm named Mining-based Algorithm for Travel time Evaluation (MATE) for evaluating the travel time of a navigation path and a novel index structure named Efficient Navigation Path Search Tree (ENS-Tree) for efficiently retrieving the fastest path. With MATE and ENS-Tree, an efficient fastest path finding algorithm for a single destination is derived. To find the path for multiple destinations, we propose a novel strategy named Cluster-Based Approximation Strategy (CBAS), to determine the fastest visiting order from specified multiple destinations. Through a comprehensive set of experiments, we evaluate the proposed techniques employed in the design of TPF and show that MATE, ENS-Tree and CBAS produce excellent performance under various system conditions.

Proceedings ArticleDOI
11 Apr 2011
TL;DR: This work develops a novel collaborative cache replacement policy which maximizes cache effectiveness by considering not only the peer itself but also its neighbors, and implements two SECC schemes, namely, the periodical and adaptive SAT-based schemes, with different SAT maintenance policies.
Abstract: We propose a novel collaborative caching framework to support spatial query processing in Mobile Peer-to-Peer Networks (MP2PNs). To maximize cache sharing among clients, each client caches not only data objects but also parts of the index structure built on the spatial objects. Thus, we call the proposed method structure-embedded collaborative caching (SECC). By introducing a novel index structure called Signature Augment Tree (SAT), we address two crucial issues in SECC. First, we propose a cost-efficient collaborative query processing method in MP2PNs, including peer selection and result merge from multiple peers. Second, we develop a novel collaborative cache replacement policy which maximizes cache effectiveness by considering not only the peer itself but also its neighbors. We implement two SECC schemes, namely, the periodical and adaptive SAT-based schemes, with different SAT maintenance policies. Simulation results show that our SECC schemes significantly outperform other collaborative caching methods which are based on existing spatial caching schemes in a number of metrics, including traffic volume, query latency and power consumption.

Journal ArticleDOI
01 Jun 2011
TL;DR: QFilter is a scalable and effective pre-processing approach based on non-deterministic finite automata and rewrites user’s queries such that parts violating access control rules are pre-pruned, capable of many emerging applications, such as in-network access control and access control outsourcing.
Abstract: In this paper, we ask whether XML access control can be supported when the underlying (XML or relational) storage system does not provide adequate security features, and propose three alternative solutions: primitive, pre-processing, and post-processing. Toward that scenario, in particular, we advocate a scalable and effective pre-processing approach, called QFilter. QFilter is based on non-deterministic finite automata (NFA) and rewrites user's queries such that parts violating access control rules are pre-pruned. Through analysis and experimental validation, we show that (1) QFilter guarantees that only the permissible portion of data is returned to the authorized users, (2) such access controls can be efficiently enforced without relying on security features of the underlying storage system, and (3) such independence makes QFilter suitable for many emerging applications, such as in-network access control and access control outsourcing.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: This paper proposes four locale-based metrics, including Locale Clustering Coefficient, Inward Locale Transitivity, Locale Assortativity Coefficient, and Locale Assortability Coefficient, for association analysis on EveryTrail, a popular LBSN specialized in sharing trips, and observes that users who share more trajectories attract more attention and that popular users tend to connect to other popular users.
Abstract: In recent years, location-based social networks (LBSNs) have received high attention. While this new breed of social networks is nascent, there is no large-scale analysis conducted to investigate the associations among users in locales of the network. In this paper, we propose four locale based metrics, including Locale Clustering Coefficient, Inward Locale Transitivity, Locale Assortativity Coefficient, and Locale Assortability Coefficient to make association analysis on EveryTrail, a popular LBSN specialized on sharing trips. Based on the analysis result, we observe that people who share more trajectories will get more attention by other users, and people who are popular will connect to the people who are also popular.

Proceedings ArticleDOI
24 Jul 2011
TL;DR: This paper develops a travelogue service that discovers and conveys various travelogue digests, in form of theme locations, geographical scope, traveling trajectory and location snippet, to users and explores the textual and geographical features of locations to develop a co-training model for enhancement of classification performance.
Abstract: In this paper, we aim to develop a travelogue service that discovers and conveys various travelogue digests, in form of theme locations, geographical scope, traveling trajectory and location snippet, to users. In this service, theme locations in a travelogue are the core information to discover. Thus we aim to address the problem of theme location discovery to enable the above travelogue services. Due to the inherent ambiguity of location relevance, we perform location relevance mining (LRM) in two complementary angles, relevance classification and relevance ranking, to provide comprehensive understanding of locations. Furthermore, we explore the textual (e.g., surrounding words) and geographical (e.g., geographical relationship among locations) features of locations to develop a co-training model for enhancement of classification performance. Built upon the mining result of LRM, we develop a series of techniques for provisioning of the aforementioned travelogue digests in our travelogue system. Finally, we conduct comprehensive experiments on collected travelogues to evaluate the performance of our location relevance mining techniques and demonstrate the effectiveness of the travelogue service.

Journal ArticleDOI
TL;DR: The experimental results show that m-LIGHT substantially reduces index maintenance overhead and improves query performance in terms of both bandwidth consumption and response latency.
Abstract: In this paper, we study the problem of indexing multidimensional data in P2P networks based on distributed hash tables (DHTs). We advocate the indexing approach that superimposes a multidimensional index tree on top of a DHT - a paradigm that keeps the underlying DHT intact while being able to adapt to any DHT substrate. In this context, we identify several index design issues and propose a novel indexing scheme called multidimensional Lightweight Hash Tree (m-LIGHT). First, to preserve data locality, m-LIGHT employs a clever naming mechanism that gracefully maps a tree-based index into the DHT and contributes to high efficiency in both index maintenance and query processing. Second, to tackle the load balancing issue, m-LIGHT leverages a new data-aware splitting strategy that achieves optimal load balance under a fixed index size. We present detailed algorithms for processing complex queries over the m-LIGHT index. We also conduct an extensive performance evaluation of m-LIGHT in comparison with several state-of-the-art indexing schemes. The experimental results show that m-LIGHT substantially reduces index maintenance overhead and improves query performance in terms of both bandwidth consumption and response latency.

Journal ArticleDOI
01 Oct 2011
TL;DR: This paper exploits query containment techniques for LDSQs (called LDSQ containment) to enable mobile clients to determine whether the result of a new LDSQ Q' is completely covered by that of another LDSQ Q previously answered by a server, and to answer Q' locally if Q' ⊆ Q.
Abstract: Nowadays, location-related information is highly accessible to mobile users via issuing Location-Dependent Spatial Queries (LDSQs) with respect to their locations wirelessly to Location-Based Service (LBS) servers. Due to the limited mobile device battery energy, scarce wireless bandwidth, and heavy LBS server workload, the number of LDSQs submitted over wireless channels to LBS servers for evaluation should be minimized as appropriate. In this paper, we exploit query containment techniques for LDSQs (called LDSQ containment) to enable mobile clients to determine whether the result of a new LDSQ Q' is completely covered by that of another LDSQ Q previously answered by a server (denoted by Q' ⊆ Q) and to answer Q' locally if Q' ⊆ Q. Thus, many LDSQs can be spared from server evaluation. To support LDSQ containment, we propose a notion of containment scope, which represents a spatial area corresponding to an LDSQ result wherein all semantically matched LDSQs are answerable with the result. Through a comprehensive simulation, our proposed approach significantly outperforms existing techniques.
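
The containment test described in the abstract above can be sketched for the simplest case, circular range queries. This is an illustrative assumption, not the paper's containment scope, which is defined more generally. Here a new range query is treated as contained in a cached one when its circle lies entirely inside the cached circle, so the client can filter the cached result instead of contacting the server. The function names and the cache layout are hypothetical.

```python
import math


def is_contained(new_center, new_radius, old_center, old_radius):
    """True if the new query circle lies entirely inside the old one."""
    return math.dist(new_center, old_center) + new_radius <= old_radius


def answer_locally(new_center, new_radius, cached):
    """Answer a new range query from a cached result when it is contained.

    cached is (old_center, old_radius, old_result), where old_result lists
    every object inside the old circle. Returns None when the new query is
    not contained and must be sent to the server.
    """
    old_center, old_radius, old_result = cached
    if not is_contained(new_center, new_radius, old_center, old_radius):
        return None
    return [p for p in old_result if math.dist(p, new_center) <= new_radius]
```

Because the cached result holds every object inside the old circle, filtering it is guaranteed to be complete for any circle nested within it. A `None` return signals that a server round trip is still required.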

Proceedings ArticleDOI
28 Mar 2011
TL;DR: A travelogue service to discover and convey various travelogue digests, in form of theme locations and geographical scope to their readers, and explores the textual and geographical features of locations to perform location relevance classification for theme location discovery.
Abstract: In this paper, we aim to develop a travelogue service to discover and convey various travelogue digests, in form of theme locations and geographical scope to their readers. In this service, theme locations in a travelogue are the core information to discover. Due to the inherent ambiguity of location relevance, we explore the textual (e.g., surrounding words) and geographical (e.g., geographical relationship among locations) features of locations to perform location relevance classification for theme location discovery. Finally, we conduct comprehensive experiments on collected travelogues to evaluate the performance of our location relevance classification technique and demonstrate the effectiveness of the travelogue service.