scispace - formally typeset
Search or ask a question

Showing papers by "Gao Cong published in 2019"


Proceedings ArticleDOI
08 Apr 2019
TL;DR: NeuTraj is generic to accommodate any existing trajectory measure and fast to compute the similarity of a given trajectory pair in linear time, and obtains 50x-1000x speed up over bruteforce methods and 3x-500x speedup over existing approximate algorithms, while yielding more accurate approximations of the similarity functions.
Abstract: Trajectory similarity computation is a fundamental problem for various applications in trajectory data analysis. However, the high computation cost of existing trajectory similarity measures has become the key bottleneck for trajectory analysis at scale. While there have been many research efforts for reducing the complexity, they are specific to one similarity measure and often yield limited speedups. We propose NeuTraj to accelerate trajectory similarity computation. NeuTraj is generic to accommodate any existing trajectory measure and fast to compute the similarity of a given trajectory pair in linear time. Furthermore, NeuTraj is elastic to collaborate with all spatial-based trajectory indexing methods to reduce the search space. NeuTraj samples a number of seed trajectories from the given database, and then uses their pair-wise similarities as guidance to approximate the similarity function with a neural metric learning framework. NeuTraj features two novel modules to achieve accurate approximation of the similarity function: (1) a spatial attention memory module that augments existing recurrent neural networks for trajectory encoding; and (2) a distance-weighted ranking loss that effectively transcribes information from the seed-based guidance. With these two modules, NeuTraj can yield high accuracies and fast convergence rates even if the training data is small. Our experiments on two real-life datasets show that NeuTraj achieves over 80% accuracy on Fre chet, Hausdorff, ERP and DTW measures, which outperforms state-of-the-art baselines consistently and significantly. It obtains 50x-1000x speedup over bruteforce methods and 3x-500x speedup over existing approximate algorithms, while yielding more accurate approximations of the similarity functions.

74 citations


Proceedings ArticleDOI
13 May 2019
TL;DR: A deep generative model to learn the travel time distribution for any route by conditioning on the real-time traffic, which produces substantially better results than state-of-the-art alternatives in two tasks: travel time estimation and route recovery from sparse trajectory data.
Abstract: Travel time estimation of a given route with respect to real-time traffic condition is extremely useful for many applications like route planning. We argue that it is even more useful to estimate the travel time distribution, from which we can derive the expected travel time as well as the uncertainty. In this paper, we develop a deep generative model - DeepGTT - to learn the travel time distribution for any route by conditioning on the real-time traffic. DeepGTT interprets the generation of travel time using a three-layer hierarchical probabilistic model. In the first layer, we present two techniques, amortization and spatial smoothness embeddings, to share statistical strength among different road segments; a convolutional neural net based representation learning component is also proposed to capture the dynamically changing real-time traffic condition. In the middle layer, a nonlinear factorization model is developed to generate auxiliary random variable i.e., speed. The introduction of this middle layer separates the statical spatial features from the dynamically changing real-time traffic conditions, allowing us to incorporate the heterogeneous influencing factors into a single model. In the last layer, an attention mechanism based function is proposed to collectively generate the observed travel time. DeepGTT describes the generation process in a reasonable manner, and thus it not only produces more accurate results but also is more efficient. On a real-world large-scale data set, we show that DeepGTT produces substantially better results than state-of-the-art alternatives in two tasks: travel time estimation and route recovery from sparse trajectory data.

50 citations


Proceedings ArticleDOI
18 Jul 2019
TL;DR: This paper proposes Medley of Sub-Attention Networks (MoSAN), a new novel neural architecture for the group recommendation task that not only achieves state-of-the-art performance but also improves standard baselines by a considerable margin.
Abstract: This paper proposes Medley of Sub-Attention Networks (MoSAN), a new novel neural architecture for the group recommendation task. Group-level recommendation is known to be a challenging task, in which intricate group dynamics have to be considered. As such, this is to be contrasted with the standard recommendation problem where recommendations are personalized with respect to a single user. Our proposed approach hinges upon the key intuition that the decision making process (in groups) is generally dynamic, i.e., a user's decision is highly dependent on the other group members. All in all, our key motivation manifests in a form of an attentive neural model that captures fine-grained interactions between group members. In our MoSAN model, each sub-attention module is representative of a single member, which models a user's preference with respect to all other group members. Subsequently, a Medley of Sub-Attention modules is then used to collectively make the group's final decision. Overall, our proposed model is both expressive and effective. Via a series of extensive experiments, we show that MoSAN not only achieves state-of-the-art performance but also improves standard baselines by a considerable margin.

46 citations


Proceedings ArticleDOI
25 Jul 2019
TL;DR: A deep learning approach to learn the representations of sports plays, called play2vec, is proposed, which is robust against noise and takes only linear time to compute the similarity between two sports plays.
Abstract: With the proliferation of commercial tracking systems, sports data is being generated at an unprecedented speed and the interest in sports play retrieval has grown dramatically as well. However, it is challenging to design an effective, efficient and robust similarity measure for sports play retrieval. To this end, we propose a deep learning approach to learn the representations of sports plays, called play2vec, which is robust against noise and takes only linear time to compute the similarity between two sports plays. We conduct experiments on real-world soccer match data, and the results show that our solution performs more effectively and efficiently compared with the state-of-the-art methods.

33 citations


Journal ArticleDOI
01 Jul 2019
TL;DR: This work formalizes and study a new problem called the attribute-aware similar region search (ASRS) problem, and proposes a novel algorithm called DS-Search to find the most similar region of the same size.
Abstract: With the proliferation of mobile devices and location-based services, increasingly massive volumes of geo-tagged data are becoming available. This data typically also contains non-location information. We study how to use such information to characterize a region and then how to find a region of the same size and with the most similar characteristics. This functionality enables a user to identify regions that share characteristics with a user-supplied region that the user is familiar with and likes. More specifically, we formalize and study a new problem called the attribute-aware similar region search (ASRS) problem. We first define so-called composite aggregators that are able to express aspects of interest in terms of the information associated with a user-supplied region. When applied to a region, an aggregator captures the region's relevant characteristics. Next, given a query region and a composite aggregator, we propose a novel algorithm called DS-Search to find the most similar region of the same size. Unlike any previous work on region search, DS-Search repeatedly discretizes and splits regions until an split region either satisfies a drop condition or it is guaranteed to not contribute to the result. In addition, we extend DS-Search to solve the ASRS problem approximately. Finally, we report on extensive empirical studies that offer insight into the efficiency and effectiveness of the paper's proposals.

13 citations


Journal ArticleDOI
01 Oct 2019
TL;DR: It is proved that answering spatial pattern matching queries is computationally intractable and proposed algorithms to address two related problems of the SPM are highly effective and efficient.
Abstract: In this paper, we study the spatial pattern matching (SPM) query. Given a set D of spatial objects (e.g., houses and shops), each with a textual description, we aim at finding all combinations of objects from D that match a user-defined spatial patternP. A pattern P is a graph whose vertices represent spatial objects, and edges denote distance relationships between them. The SPM query returns the instances that satisfy P. An example of P can be “a house within 10-min walk from a school, which is at least 2 km away from a hospital.” The SPM query can benefit users such as house buyers, urban planners, and archeologists. We prove that answering such queries is computationally intractable and propose two efficient algorithms for their evaluation. Moreover, we study efficient solutions to address two related problems of the SPM: (1) find top-k matches that are close to a query location and (2) return partial matches for a query pattern. Experiments and case studies on real datasets show that our proposed solutions are highly effective and efficient.

9 citations


Journal ArticleDOI
01 Feb 2019
TL;DR: A novel framework equipped by a generative model for mining topics and market competition, an Octree-based off-line pre-training method for the model and an efficient algorithm for combining pre-trained models to return the topics andMarket competition on each topic within a user-specified pair of region and time span is proposed.
Abstract: With the prominence of location-based services and social networks in recent years, huge amounts of spatio-temporal document collections (e.g., geo-tagged tweets) have been generated. These data collections often imply user’s ideas on different products and thus are helpful for business owners to explore hot topics of their brands and the competition relation to other brands in different spatial regions during different periods. In this work, we aim to mine the topics and the market competition of different brands over each topic for a category of business (e.g., coffeehouses) from spatio-temporal documents within a user-specified region and time period. To support such spatio-temporal search online in an exploratory manner, we propose a novel framework equipped by (1) a generative model for mining topics and market competition, (2) an Octree-based off-line pre-training method for the model and (3) an efficient algorithm for combining pre-trained models to return the topics and market competition on each topic within a user-specified pair of region and time span. Extensive experiments show that our framework is able to improve the runtime by up to an order of magnitude compared with baselines while achieving similar model quality in terms of training log-likelihood.

4 citations


Book ChapterDOI
01 Jan 2019