scispace - formally typeset
Search or ask a question
Author

Gao Cong

Other affiliations: Microsoft, Aalborg University, National University of Singapore  ...read more
Bio: Gao Cong is an academic researcher from Nanyang Technological University. The author has contributed to research in topics: Graph (abstract data type) & Query optimization. The author has an hindex of 57, co-authored 218 publications receiving 11650 citations. Previous affiliations of Gao Cong include Microsoft & Aalborg University.


Papers
More filters
Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper defines a new problem, namely, the time-aware POI recommendation, to recommend POIs for a given user at a specified time in a day, and develops a collaborative recommendation model that is able to incorporate temporal information.
Abstract: The availability of user check-in data in large volume from the rapid growing location based social networks (LBSNs) enables many important location-aware services to users. Point-of-interest (POI) recommendation is one of such services, which is to recommend places where users have not visited before. Several techniques have been recently proposed for the recommendation service. However, no existing work has considered the temporal information for POI recommendations in LBSNs. We believe that time plays an important role in POI recommendations because most users tend to visit different places at different time in a day, \eg visiting a restaurant at noon and visiting a bar at night. In this paper, we define a new problem, namely, the time-aware POI recommendation, to recommend POIs for a given user at a specified time in a day. To solve the problem, we develop a collaborative recommendation model that is able to incorporate temporal information. Moreover, based on the observation that users tend to visit nearby POIs, we further enhance the recommendation model by considering geographical information. Our experimental results on two real-world datasets show that the proposed approach outperforms the state-of-the-art POI recommendation methods substantially.

731 citations

Journal ArticleDOI
01 Aug 2009
TL;DR: A new indexing framework for location-aware top-k text retrieval that encompasses algorithms that utilize the proposed indexes for computing the top- k query, thus taking into account both text relevancy and location proximity to prune the search space.
Abstract: The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and text relevancy. To our knowledge, only naive techniques exist that are capable of computing a general web information retrieval query while also taking location into account.This paper proposes a new indexing framework for location-aware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within the framework. The framework encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper's proposal offers scalability and is capable of excellent performance.

585 citations

Proceedings ArticleDOI
25 Jul 2010
TL;DR: Empirical studies on a large real-world mobile social network show that this algorithm is more than an order of magnitudes faster than the state-of-the-art Greedy algorithm for finding top-K influential nodes and the error of the approximate algorithm is small.
Abstract: With the proliferation of mobile devices and wireless technologies, mobile social network systems are increasingly available. A mobile social network plays an essential role as the spread of information and influence in the form of "word-of-mouth". It is a fundamental issue to find a subset of influential individuals in a mobile social network such that targeting them initially (e.g. to adopt a new product) will maximize the spread of the influence (further adoptions of the new product). The problem of finding the most influential nodes is unfortunately NP-hard. It has been shown that a Greedy algorithm with provable approximation guarantees can give good approximation; However, it is computationally expensive, if not prohibitive, to run the greedy algorithm on a large mobile network. In this paper we propose a new algorithm called Community-based Greedy algorithm for mining top-K influential nodes. The proposed algorithm encompasses two components: 1) an algorithm for detecting communities in a social network by taking into account information diffusion; and 2) a dynamic programming algorithm for selecting communities to find influential nodes. We also provide provable approximation guarantees for our algorithm. Empirical studies on a large real-world mobile social network show that our algorithm is more than an order of magnitudes faster than the state-of-the-art Greedy algorithm for finding top-K influential nodes and the error of our approximate algorithm is small.

521 citations

Proceedings Article
25 Jul 2015
TL;DR: This paper proposes a personalized ranking metric embedding method (PRME) to model personalized check-in sequences and develops a PRME-G model, which integrates sequential information, individual preference, and geographical influence, to improve the recommendation performance.
Abstract: The rapidly growing of Location-based Social Networks (LBSNs) provides a vast amount of check-in data, which enables many services, e.g., point-of-interest (POI) recommendation. In this paper, we study the next new POI recommendation problem in which new POIs with respect to users' current location are to be recommended. The challenge lies in the difficulty in precisely learning users' sequential information and personalizing the recommendation model. To this end, we resort to the Metric Embedding method for the recommendation, which avoids drawbacks of the Matrix Factorization technique. We propose a personalized ranking metric embedding method (PRME) to model personalized check-in sequences. We further develop a PRME-G model, which integrates sequential information, individual preference, and geographical influence, to improve the recommendation performance. Experiments on two real-world LBSN datasets demonstrate that our new algorithm outperforms the state-of-the-art next POI recommendation methods.

373 citations

Proceedings Article
23 Sep 2007
TL;DR: This paper proposes two algorithms: one for automatically computing a repair D' that satisfies a given set of CFDs, and the other for incrementally finding a repair in response to updates to a clean database.
Abstract: Two central criteria for data quality are consistency and accuracy. Inconsistencies and errors in a database often emerge as violations of integrity constraints. Given a dirty database D, one needs automated methods to make it consistent, i.e., find a repair D' that satisfies the constraints and "minimally" differs from D. Equally important is to ensure that the automatically-generated repair D' is accurate, or makes sense, i.e., D' differs from the "correct" data within a predefined bound. This paper studies effective methods for improving both data consistency and accuracy. We employ a class of conditional functional dependencies (CFDs) proposed in [6] to specify the consistency of the data, which are able to capture inconsistencies and errors beyond what their traditional counterparts can catch. To improve the consistency of the data, we propose two algorithms: one for automatically computing a repair D' that satisfies a given set of CFDs, and the other for incrementally finding a repair in response to updates to a clean database. We show that both problems are intractable. Although our algorithms are necessarily heuristic, we experimentally verify that the methods are effective and efficient. Moreover, we develop a statistical method that guarantees that the repairs found by the algorithms are accurate above a predefined rate without incurring excessive user interaction.

354 citations


Cited by
More filters
Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: The problem of finding the most influential nodes in a social network is NP-hard as mentioned in this paper, and the first provable approximation guarantees for efficient algorithms were provided by Domingos et al. using an analysis framework based on submodular functions.
Abstract: Models for the processes by which ideas and influence propagate through a social network have been studied in a number of domains, including the diffusion of medical and technological innovations, the sudden and widespread adoption of various strategies in game-theoretic settings, and the effects of "word of mouth" in the promotion of new products. Recently, motivated by the design of viral marketing strategies, Domingos and Richardson posed a fundamental algorithmic problem for such social network processes: if we can try to convince a subset of individuals to adopt a new product or innovation, and the goal is to trigger a large cascade of further adoptions, which set of individuals should we target?We consider this problem in several of the most widely studied models in social network analysis. The optimization problem of selecting the most influential nodes is NP-hard here, and we provide the first provable approximation guarantees for efficient algorithms. Using an analysis framework based on submodular functions, we show that a natural greedy strategy obtains a solution that is provably within 63% of optimal for several classes of models; our framework suggests a general approach for reasoning about the performance guarantees of algorithms for these types of influence problems in social networks.We also provide computational experiments on large collaboration networks, showing that in addition to their provable guarantees, our approximation algorithms significantly out-perform node-selection heuristics based on the well-studied notions of degree centrality and distance centrality from the field of social networks.

4,390 citations

01 Jan 2006
TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].
Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

2,591 citations

Journal ArticleDOI
01 Jan 1966
TL;DR: Koestler as mentioned in this paper examines the idea that we are at our most creative when rational thought is suspended, for example, in dreams and trancelike states, and concludes that "the act of creation is the most creative act in human history".
Abstract: While the study of psychology has offered little in the way of explaining the creative process, Koestler examines the idea that we are at our most creative when rational thought is suspended--for example, in dreams and trancelike states. All who read The Act of Creation will find it a compelling and illuminating book.

2,201 citations