
Showing papers by "Gao Cong" published in 2015


Proceedings Article
25 Jul 2015
TL;DR: This paper proposes a personalized ranking metric embedding method (PRME) to model personalized check-in sequences and develops a PRME-G model, which integrates sequential information, individual preference, and geographical influence, to improve the recommendation performance.
Abstract: The rapid growth of Location-based Social Networks (LBSNs) provides a vast amount of check-in data, which enables many services, e.g., point-of-interest (POI) recommendation. In this paper, we study the next new POI recommendation problem in which new POIs with respect to users' current location are to be recommended. The challenge lies in the difficulty in precisely learning users' sequential information and personalizing the recommendation model. To this end, we resort to the Metric Embedding method for the recommendation, which avoids drawbacks of the Matrix Factorization technique. We propose a personalized ranking metric embedding method (PRME) to model personalized check-in sequences. We further develop a PRME-G model, which integrates sequential information, individual preference, and geographical influence, to improve the recommendation performance. Experiments on two real-world LBSN datasets demonstrate that our new algorithm outperforms the state-of-the-art next POI recommendation methods.
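
Illustrative sketch (not taken from the paper): the abstract describes ranking candidate POIs with a metric embedding that combines individual preference and sequential information. One way such a ranking could work is to embed users and POIs as points and order candidates by a weighted sum of squared Euclidean distances. The two embedding spaces, the weight `alpha`, and all names and vectors below are assumptions made for illustration; the geographical component of PRME-G is omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_candidates(user_pref, current_seq, candidates, alpha=0.5):
    """Order candidate POIs from most to least recommended (hypothetical scoring).

    user_pref   -- the user's point in an assumed "preference" space
    current_seq -- the current POI's point in an assumed "sequential" space
    candidates  -- dict: POI name -> (preference-space vector, sequential-space vector)
    alpha       -- assumed trade-off between the two distances
    """
    def score(name):
        pref_vec, seq_vec = candidates[name]
        return (alpha * sq_dist(user_pref, pref_vec)
                + (1 - alpha) * sq_dist(current_seq, seq_vec))

    # A smaller combined distance means a better match, so sort ascending.
    return sorted(candidates, key=score)


# Toy embeddings, invented for the example.
toy_candidates = {
    "cafe": ((1.0, 0.0), (0.0, 1.0)),
    "museum": ((3.0, 0.0), (0.0, 0.0)),
    "park": ((0.0, 0.0), (2.0, 0.0)),
}
ranking = rank_candidates((0.0, 0.0), (0.0, 0.0), toy_candidates)
```

In a learned model the vectors would be fitted from check-in sequences; here they are fixed by hand only to show how distance-based ranking operates.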

373 citations


Proceedings ArticleDOI
09 Aug 2015
TL;DR: A ranking based geographical factorization method, called Rank-GeoFM, for POI recommendation, which addresses the two challenges of scarcity of check-in data and context information, and outperforms the state-of-the-art methods significantly in terms of recommendation accuracy.
Abstract: With the rapid growth of location-based social networks, Point of Interest (POI) recommendation has become an important research problem. However, the scarcity of the check-in data, a type of implicit feedback data, poses a severe challenge for existing POI recommendation methods. Moreover, different types of context information about POIs are available and how to leverage them becomes another challenge. In this paper, we propose a ranking based geographical factorization method, called Rank-GeoFM, for POI recommendation, which addresses the two challenges. In the proposed model, we consider that the check-in frequency characterizes users' visiting preference and learn the factorization by ranking the POIs correctly. In our model, POIs both with and without check-ins will contribute to learning the ranking and thus the data sparsity problem can be alleviated. In addition, our model can easily incorporate different types of context information, such as the geographical influence and temporal influence. We propose a stochastic gradient descent based algorithm to learn the factorization. Experiments on publicly available datasets under both user-POI setting and user-time-POI setting have been conducted to test the effectiveness of the proposed method. Experimental results under both settings show that the proposed method outperforms the state-of-the-art methods significantly in terms of recommendation accuracy.

340 citations


Proceedings ArticleDOI
13 Apr 2015
TL;DR: A novel solution to efficiently process a large number of TaSK queries over a stream of geo-textual objects and the experimental results show that the solution is able to achieve a reduction of the processing time by 70–80% compared with two baselines.
Abstract: Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date tweets such that their locations are close to a user specified location and their texts are interesting to users. For example, a user may want to be updated with tweets near her home on the topic “food poisoning vomiting.” We consider the Temporal Spatial-Keyword Top-k Subscription (TaSK) query. Given a TaSK query, we continuously maintain up-to-date top-k most relevant results over a stream of geo-textual objects (e.g., geo-tagged Tweets) for the query. The TaSK query takes into account text relevance, spatial proximity, and recency of geo-textual objects in evaluating its relevance with a geo-textual object. We propose a novel solution to efficiently process a large number of TaSK queries over a stream of geo-textual objects. We evaluate the efficiency of our approach on two real-world datasets and the experimental results show that our solution is able to achieve a reduction of the processing time by 70–80% compared with two baselines.

135 citations


Proceedings ArticleDOI
13 Apr 2015
TL;DR: This paper proposes a general graph-based model, called HeteRS, to solve the three recommendation problems on EBSNs in one framework, and proposes a learning scheme to set the influence weights between different types of entities.
Abstract: Event-based social networks (EBSNs), such as Meetup and Plancast, which offer platforms for users to plan, arrange, and publish events, have gained increasing popularity and rapid growth. EBSNs capture not only the online social relationship, but also the offline interactions from offline events. They contain rich heterogeneous information, including multiple types of entities, such as users, events, groups and tags, and their interaction relations. Three recommendation tasks, namely recommending groups to users, recommending tags to groups, and recommending events to users, have been explored in three separate studies. However, none of the proposed methods can handle all the three recommendation tasks. In this paper, we propose a general graph-based model, called HeteRS, to solve the three recommendation problems on EBSNs in one framework. Our method models the rich information with a heterogeneous graph and considers the recommendation problem as a query-dependent node proximity problem. To address the challenging issue of weighting the influences between different types of entities, we propose a learning scheme to set the influence weights between different types of entities. Experimental results on two real-world datasets demonstrate that our proposed method significantly outperforms the state-of-the-art methods for all the three recommendation tasks, and the learned influence weights help understanding user behaviors.

100 citations


Proceedings ArticleDOI
27 May 2015
TL;DR: This paper proves that the problem of answering mCK queries is NP-hard, and proposes an exact algorithm that utilizes the group found by the (2/√3 + ε)-approximation algorithm to obtain the optimal group.
Abstract: As an important type of spatial keyword query, the m-closest keywords (mCK) query finds a group of objects such that they cover all query keywords and have the smallest diameter, which is defined as the largest distance between any pair of objects in the group. The query is useful in many applications such as detecting locations of web resources. However, the existing work does not study the intractability of this problem and only provides exact algorithms, which are computationally expensive. In this paper, we prove that the problem of answering mCK queries is NP-hard. We first devise a greedy algorithm that has an approximation ratio of 2. Then, we observe that an mCK query can be approximately answered by finding the circle with the smallest diameter that encloses a group of objects together covering all query keywords. We prove that the group enclosed in the circle can answer the mCK query with an approximation ratio of 2/√3. Based on this, we develop an algorithm for finding such a circle exactly, which has a high time complexity. To improve efficiency, we propose another two algorithms that find such a circle approximately, with a ratio of (2/√3 + ε). Finally, we propose an exact algorithm that utilizes the group found by the (2/√3 + ε)-approximation algorithm to obtain the optimal group. We conduct extensive experiments using real-life datasets. The experimental results offer insights into both efficiency and accuracy of the proposed approximation algorithms, and the results also demonstrate that our exact algorithm outperforms the best known algorithm by an order of magnitude.
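
Illustrative sketch (not necessarily the authors' algorithm): one simple way to reach an approximation ratio of 2 for the mCK query is to anchor on each object that carries the rarest query keyword and attach, for every other keyword, the object nearest to that anchor. If the anchor belongs to an optimal group with diameter d, every attached object lies within d of it, so the resulting group has diameter at most 2d. The object representation, function names, and toy data below are assumptions.

```python
from itertools import combinations
from math import dist


def diameter(group):
    """Largest pairwise distance between the objects in a group."""
    return max((dist(a[0], b[0]) for a, b in combinations(group, 2)), default=0.0)


def greedy_mck(objects, query_keywords):
    """Greedy 2-approximation sketch for an mCK query.

    objects -- list of (location, keywords) pairs, e.g. ((x, y), {"cafe"})
    Returns a list of objects covering all query keywords, or None if
    some keyword is carried by no object.
    """
    query = set(query_keywords)
    if not query:
        return []
    # Index: keyword -> objects carrying it.
    by_kw = {kw: [o for o in objects if kw in o[1]] for kw in query}
    if any(not objs for objs in by_kw.values()):
        return None
    # The rarest keyword gives the fewest anchors to try (ties broken alphabetically).
    rarest = min(sorted(query), key=lambda kw: len(by_kw[kw]))
    best = None
    for anchor in by_kw[rarest]:
        group = [anchor]
        covered = set(anchor[1]) & query
        for kw in sorted(query):
            if kw in covered:
                continue
            # Attach the object carrying kw that is closest to the anchor.
            nearest = min(by_kw[kw], key=lambda o: dist(anchor[0], o[0]))
            group.append(nearest)
            covered |= set(nearest[1]) & query
        if best is None or diameter(group) < diameter(best):
            best = group
    return best


# Toy data, invented for the example.
toy_objects = [
    ((0, 0), {"a"}), ((1, 0), {"b"}), ((0, 1), {"c"}),
    ((10, 10), {"a", "b"}), ((10, 11), {"c"}),
]
toy_group = greedy_mck(toy_objects, {"a", "b", "c"})  # the two objects near (10, 10)
```

On the toy data the sketch returns the pair at (10, 10) and (10, 11), whose diameter is 1.0, rather than the three objects near the origin, whose diameter is √2.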

88 citations


Journal ArticleDOI
Quan Yuan, Gao Cong, Kaiqi Zhao, Zongyang Ma, Aixin Sun
TL;DR: Experimental results on two real-world datasets show that the proposed model is effective in discovering users’ spatial-temporal topics and significantly outperforms state-of-the-art baselines for various tasks including location prediction for tweets and requirement-aware location recommendation.
Abstract: Micro-blogging services and location-based social networks, such as Twitter, Weibo, and Foursquare, enable users to post short messages with timestamps and geographical annotations. The rich spatial-temporal-semantic information of individuals embedded in these geo-annotated short messages provides an exciting opportunity to develop many context-aware applications in ubiquitous computing environments. Example applications include contextual recommendation and contextual search. To obtain accurate recommendations and most relevant search results, it is important to capture users’ contextual information (e.g., time and location) and to understand users’ topical interests and intentions. While time and location can be readily captured by smartphones, understanding users’ interests and intentions calls for effective methods in modeling user mobility behavior. Here, user mobility refers to who visits which place at what time for what activity. That is, user mobility behavior modeling must consider user (Who), spatial (Where), temporal (When), and activity (What) aspects. Unfortunately, no previous studies on user mobility behavior modeling have considered all of the four aspects jointly, which have complex interdependencies. In our preliminary study, we propose the first solution named W4 (short for Who, Where, When, and What) to discover user mobility behavior from the four aspects. In this article, we further enhance W4 and propose a nonparametric Bayesian model named EW4 (short for Enhanced W4). EW4 requires no parameter tuning and achieves better results than W4 in our experiments. Given some of the four aspects of a user (e.g., time), our model is able to infer information of the other aspects (e.g., location and topical words). Thus, our model has a variety of context-aware applications, particularly in contextual search and recommendation.
Experimental results on two real-world datasets show that the proposed model is effective in discovering users’ spatial-temporal topics. The model also significantly outperforms state-of-the-art baselines for various tasks including location prediction for tweets and requirement-aware location recommendation.

84 citations


Proceedings ArticleDOI
13 Apr 2015
TL;DR: A unified probabilistic model is proposed to capture two types of user preferences to POIs: topical-region preference and category aware topical-aspect preference and it is shown that the model achieves significant improvement in POI recommendation and user recommendation in comparison to the state-of-the-art methods.
Abstract: Many location based services, such as FourSquare, Yelp, TripAdvisor, Google Places, etc., allow users to compose reviews or tips on points of interest (POIs), each having geographical coordinates. These services have accumulated a large amount of such geo-tagged review data, which allows deep analysis of user preferences in POIs. This paper studies two types of user preferences to POIs: topical-region preference and category aware topical-aspect preference. We propose a unified probabilistic model to capture these two preferences simultaneously. In addition, our model is capable of capturing the interaction of different factors, including topical aspect, sentiment, and spatial information. The model can be used in a number of applications, such as POI recommendation and user recommendation, among others. In addition, the model enables us to investigate whether people like an aspect of a POI or whether people like a topical aspect of some type of POIs (e.g., bars) in a region, which offers explanations for recommendations. Experiments on real world datasets show that the model achieves significant improvement in POI recommendation and user recommendation in comparison to the state-of-the-art methods. We also propose an efficient online recommendation algorithm based on our model, which saves up to 90% computation time.

69 citations


Proceedings Article
25 Jan 2015
TL;DR: This paper proposes a semi-supervised learning approach to categorizing intent tweets into six categories, namely Food & Drink, Travel, Career & Education, Goods & Services, Event & Activities and Trifle, and shows that the approach is effective in inferring intent categories for tweets.
Abstract: In this paper, we propose to study the problem of identifying and classifying tweets into intent categories. For example, a tweet "I wanna buy a new car" indicates the user's intent for buying a car. Identifying such intent tweets will have great commercial value among others. In particular, it is important that we can distinguish different types of intent tweets. We propose to classify intent tweets into six categories, namely Food & Drink, Travel, Career & Education, Goods & Services, Event & Activities and Trifle. We propose a semi-supervised learning approach to categorizing intent tweets into the six categories. We construct a test collection by using a bootstrap method. Our experimental results show that our approach is effective in inferring intent categories for tweets.

67 citations


Journal ArticleDOI
TL;DR: This work defines the problem of retrieving a group of spatio-textual objects such that the group's keywords cover the query's keywords and such that the objects are nearest to the query location and have the smallest inter-object distances, studies three NP-hard instantiations of it, and also solves the corresponding top-k group retrieval problems.
Abstract: With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group together satisfy a query. We define the problem of retrieving a group of spatio-textual objects such that the group's keywords cover the query's keywords and such that the objects are nearest to the query location and have the smallest inter-object distances. Specifically, we study three instantiations of this problem, all of which are NP-hard. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. In addition, we solve the problems of retrieving top-k groups of three instantiations, and study a weighted version of the problem that incorporates object weights. We present empirical studies that offer insight into the efficiency of the solutions, as well as the accuracy of the approximate solutions.

64 citations


Proceedings ArticleDOI
27 May 2015
TL;DR: This work proposes a novel solution for efficiently processing a large number of DAS queries over a stream of documents and demonstrates the efficiency of the approach on a real-world dataset; experimental results show that the solution is able to achieve a reduction of the processing time by 60–75% compared with two baselines.
Abstract: Massive amounts of text data are being generated by a huge number of web users at an unprecedented scale. These data cover a wide range of topics. Users are interested in receiving a few up-to-date representative documents (e.g., tweets) that can provide them with a wide coverage of different aspects of their query topics. To address the problem, we consider the Diversity-Aware Top-k Subscription (DAS) query. Given a DAS query, we continuously maintain an up-to-date result set that contains k most recently returned documents over a text stream for the query. The DAS query takes into account text relevance, document recency, and result diversity. We propose a novel solution for efficiently processing a large number of DAS queries over a stream of documents. We demonstrate the efficiency of our approach on a real-world dataset and the experimental results show that our solution is able to achieve a reduction of the processing time by 60–75% compared with two baselines. We also study the effectiveness of the DAS query.

51 citations


Proceedings Article
25 Jan 2015
TL;DR: A new POI recommendation problem, namely top-K location category based POI recommendation, is formulated by introducing information coverage to encode the location categories of POIs in a city by developing a greedy algorithm and further optimization to solve this challenging problem.
Abstract: Point-of-interest (POI) recommendation becomes a valuable service in location-based social networks. Based on the norm that similar users are likely to have similar preference of POIs, the current recommendation techniques mainly focus on users' preference to provide accurate recommendation results. This tends to generate a list of homogeneous POIs that are clustered into a narrow band of location categories (like food, museum, etc.) in a city. However, users are more interested to taste a wide range of flavors that are exposed in a global set of location categories in the city. In this paper, we formulate a new POI recommendation problem, namely top-K location category based POI recommendation, by introducing information coverage to encode the location categories of POIs in a city. The problem is NP-hard. We develop a greedy algorithm and further optimization to solve this challenging problem. The experimental results on two real-world datasets demonstrate the utility of new POI recommendations and the superior performance of the proposed algorithms.

Proceedings Article
25 Jan 2015
TL;DR: Although the problem is NP-hard, the influence spread functions are monotonic and submodular, enabling fast approximations on top of an innovative stochastic model checking approach, and the model finds higher quality solutions and the algorithm outperforms state-of-art alternatives.
Abstract: Studying the spread of phenomena in social networks is critical but still not fully solved. Existing influence maximization models assume a static network, disregarding its evolution over time. We introduce the continuous time constrained influence maximization problem for dynamic diffusion networks, based on a novel diffusion model called DYNA DIFFUSE. Although the problem is NP-hard, the influence spread functions are monotonic and submodular, enabling fast approximations on top of an innovative stochastic model checking approach. Experiments on real social network data show that our model finds higher quality solutions and our algorithm outperforms state-of-art alternatives.
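
Illustrative sketch (not the paper's algorithm): monotonicity and submodularity matter because, for such objectives, the standard greedy procedure that repeatedly adds the element with the largest marginal gain is known to achieve a (1 - 1/e) approximation under a cardinality constraint. The sketch below shows that generic greedy loop with a toy set-coverage objective standing in for influence spread; the `reach` data, the function names, and the coverage objective itself are assumptions and do not model dynamic diffusion.

```python
def greedy_select(candidates, k, value):
    """Pick up to k candidates, each time adding the one with the largest marginal gain.

    value -- a set function taking a list of chosen candidates; the
             approximation guarantee assumes it is monotone and submodular.
    """
    chosen = []
    for _ in range(k):
        base = value(chosen)
        best, best_gain = None, 0.0
        for c in candidates:
            if c in chosen:
                continue
            gain = value(chosen + [c]) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining candidate adds anything
            break
        chosen.append(best)
    return chosen


# Toy stand-in for influence spread: which nodes each seed user reaches.
reach = {
    "u1": {1, 2, 3},
    "u2": {3, 4},
    "u3": {4, 5, 6, 7},
    "u4": {1, 2},
}


def coverage(seeds):
    """Number of distinct nodes reached by the seed set (monotone and submodular)."""
    return len(set().union(*(reach[s] for s in seeds)))


seeds = greedy_select(list(reach), 2, coverage)
```

With the toy data the first pick is `u3` (reaches four nodes) and the second is `u1` (adds three new nodes), for a total coverage of 7.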

Proceedings Article
25 Jan 2015
TL;DR: A Tri-Role Topic Model (TRTM) is proposed to model the tri-roles of users and the activities of each role including composing question, selecting question to answer, contributing and voting answers, which outperforms state-of-the-art methods on nDCG.
Abstract: Stack Overflow and MedHelp are examples of domain-specific community-based question answering (CQA) systems. Different from CQA systems for general topics (e.g., Yahoo! Answers, Baidu Knows), questions and answers in domain-specific CQA systems are mostly in the same topical domain, enabling more comprehensive interaction between users on fine-grained topics. In such systems, users are more likely to ask questions on unfamiliar topics and to answer questions matching their expertise. Users can also vote answers based on their judgements. In this paper, we propose a Tri-Role Topic Model (TRTM) to model the tri-roles of users (i.e., as askers, answerers, and voters, respectively) and the activities of each role including composing question, selecting question to answer, contributing and voting answers. The proposed model can be used to enhance CQA systems from many perspectives. As a case study, we conducted experiments on ranking answers for questions on Stack Overflow, a CQA system for professional and enthusiast programmers. Experimental results show that TRTM is effective in facilitating users getting ideal rankings of answers, particularly for new and less popular questions. Evaluated on nDCG, TRTM outperforms state-of-the-art methods.
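
Illustrative sketch: nDCG, the metric named in the abstract, compares the discounted cumulative gain of a produced ranking with that of the ideal ordering of the same items. The version below uses the common linear-gain form, DCG = Σ rel_i / log2(i + 1); whether the paper uses this form or the exponential-gain variant is not stated in the abstract, so the choice here is an assumption.

```python
from math import log2


def dcg(relevances, k=None):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    rels = relevances if k is None else relevances[:k]
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of answers in the order a system ranked them (toy values).
example_score = ndcg([3, 2, 3, 0, 1, 2])
```

A ranking that already lists answers in descending relevance scores exactly 1.0, and the toy ranking above scores roughly 0.96.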

Proceedings ArticleDOI
17 Oct 2015
TL;DR: This work proposes to model the mapping problem as a ranking problem, and develop a method to learn a ranking function by exploiting the textual, visual and user information of photos, and proposes three subobjectives for learning the parameters of the proposed ranking function.
Abstract: Instagram, an online photo-sharing platform, has gained increasing popularity. It allows users to take photos, apply digital filters and share them with friends instantaneously by using mobile devices. Instagram provides users with the functionality to associate their photos with points of interest, and it thus becomes feasible to study the association between points of interest and Instagram photos. However, no previous work studies the association. In this paper, we propose to study the problem of mapping Instagram photos to points of interest. To understand the problem, we analyze Instagram datasets, and report our findings, which also characterize the challenges of the problem. To address the challenges, we propose to model the mapping problem as a ranking problem, and develop a method to learn a ranking function by exploiting the textual, visual and user information of photos. To maximize the prediction effectiveness for textual and visual information, and incorporate the users' visiting preferences, we propose three subobjectives for learning the parameters of the proposed ranking function. Experimental results on two sets of Instagram data show that the proposed method substantially outperforms existing methods that are adapted to handle the problem.

Journal ArticleDOI
TL;DR: A corpus statistics association measure is employed to quantify the pairwise word dependencies and a generalized association-based unified framework to identify features, including explicit and implicit features, and opinion words from reviews is proposed.
Abstract: Mining features and opinion words is essential for fine-grained opinion analysis of customer reviews. It is observed that semantic dependencies naturally exist between features and opinion words, even among features or opinion words themselves. In this article, we employ a corpus statistics association measure to quantify the pairwise word dependencies and propose a generalized association-based unified framework to identify features, including explicit and implicit features, and opinion words from reviews. We first extract explicit features and opinion words via an association-based bootstrapping method (ABOOT). ABOOT starts with a small list of annotated feature seeds and then iteratively recognizes a large number of domain-specific features and opinion words by discovering the corpus statistics association between each pair of words on a given review domain. Two instances of this ABOOT method are evaluated based on two particular association models, likelihood ratio tests (LRTs) and latent semantic analysis (LSA). Next, we introduce a natural extension to identify implicit features by employing the recognized known semantic correlations between features and opinion words. Experimental results illustrate the benefits of the proposed association-based methods for identifying features and opinion words versus benchmark methods.

Proceedings ArticleDOI
03 Nov 2015
TL;DR: To design efficient algorithms, for the first time, an ideal case is theoretically analyzed, which minimizes the object/index node accesses, for processing reverse spatial-keyword nearest neighbor queries and novel search algorithms are designed for efficiently answering the queries.
Abstract: With the proliferation of local services and GPS-enabled mobile phones, reverse spatial-keyword Nearest Neighbor queries are becoming an important type of query. Given a service object (e.g., shop) q as the query, which has a location and a text description, we return customers such that q is one of top-k spatial-keyword relevant service objects for each result customer. Existing algorithms for answering reverse nearest neighbor queries cannot be used for processing reverse spatial-keyword nearest neighbor queries due to the additional text information. To design efficient algorithms, for the first time we theoretically analyze an ideal case, which minimizes the object/index node accesses, for processing reverse spatial-keyword nearest neighbor queries. Under the derived theoretical guidelines, we design novel search algorithms for efficiently answering the queries. Empirical studies show that the proposed algorithms offer scalability and are orders of magnitude faster than existing methods for reverse spatial-keyword nearest neighbor queries.