Author

Minhao Jiang

Bio: Minhao Jiang is an academic researcher from Hong Kong University of Science and Technology. The author has contributed to research in topics: Optimal matching & Approximation algorithm. The author has an h-index of 6 and has co-authored 7 publications receiving 175 citations.

Papers
Journal ArticleDOI
01 Aug 2014
TL;DR: This work proposes to build an index for answering point-to-point distance queries on massive scale-free graphs based on a novel hop-doubling labeling technique, and derives bounds on the index size, the computation costs and the I/O costs from the properties of unweighted scale-free graphs.
Abstract: We study the problem of point-to-point distance querying for massive scale-free graphs, which is important for numerous applications. Given a directed or undirected graph, we propose to build an index for answering such queries based on a novel hop-doubling labeling technique. We derive bounds on the index size, the computation costs and I/O costs based on the properties of unweighted scale-free graphs. We show that our method is much more efficient and effective compared to the state-of-the-art techniques, in terms of both querying time and indexing costs. Our empirical study shows that our method can handle graphs that are orders of magnitude larger than existing methods.
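To make the query side of such a labeling concrete, the sketch below shows how a point-to-point distance is answered from precomputed 2-hop labels: the distance from u to v is the minimum of d(u, h) + d(h, v) over hubs h shared by their labels. The label construction itself (the paper's hop-doubling step) is not reproduced here; the tiny labels are hypothetical and hand-made for illustration.

```python
# Minimal sketch of answering a distance query from 2-hop labels.
# The label construction (the paper's hop-doubling step) is NOT shown;
# the labels below are hypothetical, hand-made for illustration.

def query(label_out, label_in, u, v):
    """Return dist(u, v) as the minimum over hubs h of d(u, h) + d(h, v)."""
    best = float("inf")
    lu, lv = label_out[u], label_in[v]
    for hub, d_uh in lu.items():
        d_hv = lv.get(hub)
        if d_hv is not None:
            best = min(best, d_uh + d_hv)
    return best

# Hypothetical labels for a tiny directed graph a -> b -> c.
label_out = {"a": {"a": 0, "b": 1}, "b": {"b": 0}, "c": {"c": 0}}
label_in  = {"a": {"a": 0}, "b": {"b": 0}, "c": {"b": 1, "c": 0}}

print(query(label_out, label_in, "a", "c"))  # 2  (a -> b -> c)
```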

67 citations

Proceedings ArticleDOI
27 May 2015
TL;DR: This paper proposes algorithms for top-k nearest keyword search that provide exact solutions and handle networks of very large sizes; their performance is verified against the best-known approximation algorithms through experiments on real datasets.
Abstract: Top-k nearest keyword search has been of interest because of applications ranging from road network location search by keyword to search of information on an RDF repository. We consider the evaluation of a query with a given vertex and a keyword, and the problem is to find a set of k nearest vertices that contain the keyword. The known algorithms for handling this problem only give approximate answers. In this paper, we propose algorithms for top-k nearest keyword search that provide exact solutions and which handle networks of very large sizes. We have also verified the performance of our solutions compared with the best-known approximation algorithms with experiments on real datasets.
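For illustration only, here is a minimal exact baseline for this query: a Dijkstra expansion from the query vertex that stops after the first k settled vertices containing the keyword. This is not the paper's indexing-based algorithm, and the toy graph and keyword sets are hypothetical.

```python
import heapq

def topk_nearest_keyword(adj, keywords, source, keyword, k):
    """Dijkstra expansion from `source`; collect the first k settled
    vertices whose keyword set contains `keyword`.
    adj: {v: [(neighbor, weight), ...]}, keywords: {v: set_of_keywords}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    results, settled = [], set()
    while heap and len(results) < k:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue
        settled.add(v)
        if keyword in keywords.get(v, ()):
            results.append((v, d))
        for u, w in adj.get(v, ()):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return results

# Tiny hypothetical road-network-like graph.
adj = {"q": [("a", 1), ("b", 4)], "a": [("c", 2)], "b": [("c", 1)], "c": []}
keywords = {"b": {"cafe"}, "c": {"cafe"}}
print(topk_nearest_keyword(adj, keywords, "q", "cafe", 2))
# [('c', 3), ('b', 4)]
```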

53 citations

Journal ArticleDOI
01 May 2017
TL;DR: A random-walk-based indexing scheme to compute SimRank efficiently and accurately over large dynamic graphs is proposed, and the algorithm is shown to outperform the state-of-the-art static and dynamic SimRank algorithms.
Abstract: Similarity among entities in graphs plays a key role in data analysis and mining. SimRank is a widely used and popular measurement to evaluate the similarity among the vertices. In real-life applications, graphs not only grow in size, requiring fast and precise SimRank computation for large graphs, but also change and evolve continuously over time, demanding an efficient maintenance process to handle dynamic updates. In this paper, we propose a random walk based indexing scheme to compute SimRank efficiently and accurately over large dynamic graphs. We show that our algorithm outperforms the state-of-the-art static and dynamic SimRank algorithms.
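As background for random-walk-based SimRank indexing, the sketch below uses the standard Monte Carlo view of SimRank: s(u, v) is the expected value of c^t, where t is the first step at which two reverse random walks from u and v meet. This is not the paper's indexing scheme or its maintenance procedure; the decay factor, walk budget, and toy graph are illustrative assumptions.

```python
import random

def simrank_mc(in_nbrs, u, v, c=0.6, walks=10000, max_len=10):
    """Monte Carlo estimate of SimRank s(u, v) as E[c^t], where t is the
    first step at which two reverse random walks from u and v meet.
    in_nbrs: {vertex: [in-neighbors]} (walks follow in-edges).
    """
    if u == v:
        return 1.0
    total = 0.0
    for _ in range(walks):
        x, y = u, v
        for t in range(1, max_len + 1):
            if not in_nbrs.get(x) or not in_nbrs.get(y):
                break  # a walk got stuck; this sample contributes 0
            x = random.choice(in_nbrs[x])
            y = random.choice(in_nbrs[y])
            if x == y:
                total += c ** t
                break
    return total / walks

# Tiny hypothetical graph: both a and b are pointed to by h, so they look similar.
in_nbrs = {"a": ["h"], "b": ["h"], "h": []}
print(round(simrank_mc(in_nbrs, "a", "b"), 2))  # close to c = 0.6
```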

43 citations

Proceedings ArticleDOI
22 Jun 2013
TL;DR: This paper proposes a new problem called Spatial Matching for Minimizing Maximum matching distance (SPM-MM), designs two algorithms for SPM-MM, Threshold-Adapt and Swap-Chain, and conducts extensive empirical studies that verify the efficiency and scalability of Swap-Chain.
Abstract: Bichromatic reverse nearest neighbor (BRNN) queries have been studied extensively in the literature of spatial databases. Given a set P of service-providers and a set O of customers, a BRNN query is to find which customers in O are "interested" in a given service-provider in P. Recently, it has been found that this kind of query lacks consideration of the capacities of service-providers and the demands of customers. In order to address this issue, some spatial matching problems have been proposed, which, however, cannot be used for some real-life applications like emergency facility allocation where the maximum matching cost (or distance) should be minimized. In this paper, we propose a new problem called Spatial Matching for Minimizing Maximum matching distance (SPM-MM). Then, we design two algorithms for SPM-MM, Threshold-Adapt and Swap-Chain. Threshold-Adapt is simple and easy to understand but not scalable to large datasets due to its relatively high time/space complexity. Swap-Chain, which follows a fundamentally different idea from Threshold-Adapt, runs faster than Threshold-Adapt by orders of magnitude and uses significantly less memory. We conducted extensive empirical studies which verified the efficiency and scalability of Swap-Chain.
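To illustrate the min-max (bottleneck) flavor of SPM-MM, here is a small threshold-style sketch in the spirit of the Threshold-Adapt idea described above: binary-search the smallest distance threshold for which a complete matching exists, using augmenting paths for the feasibility test. Unit capacities and the tiny distance matrix are simplifying assumptions, not the paper's setting.

```python
def min_max_matching(dists):
    """Bottleneck assignment sketch: binary-search the smallest distance
    threshold T such that every customer can be matched to a distinct
    provider at distance <= T. dists[i][j] = distance(customer i, provider j).
    Unit capacities are assumed, and a feasible assignment must exist.
    """
    def feasible(T):
        match = {}  # provider -> customer

        def augment(i, seen):
            for j, d in enumerate(dists[i]):
                if d <= T and j not in seen:
                    seen.add(j)
                    if j not in match or augment(match[j], seen):
                        match[j] = i
                        return True
            return False

        return all(augment(i, set()) for i in range(len(dists)))

    candidates = sorted({d for row in dists for d in row})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]

# Hypothetical 2 customers x 2 providers distance matrix.
print(min_max_matching([[1, 5], [2, 9]]))  # 5: customer 0 must take provider 1
```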

25 citations

Posted Content
TL;DR: In this article, a hop-doubling labeling technique is proposed to build an index for answering point-to-point distance queries on directed or undirected graphs, and bounds on the index size, the computation costs and I/O costs based on the properties of unweighted scale-free graphs are derived.
Abstract: We study the problem of point-to-point distance querying for massive scale-free graphs, which is important for numerous applications. Given a directed or undirected graph, we propose to build an index for answering such queries based on a hop-doubling labeling technique. We derive bounds on the index size, the computation costs and I/O costs based on the properties of unweighted scale-free graphs. We show that our method is much more efficient compared to the state-of-the-art technique, in terms of both querying time and indexing time. Our empirical study shows that our method can handle graphs that are orders of magnitude larger than existing methods.

15 citations


Cited by
Proceedings ArticleDOI
16 May 2016
TL;DR: This paper identifies a more practical micro-task allocation problem, called the Global Online Micro-task Allocation in spatial crowdsourcing (GOMA) problem, and proposes a two-phase-based framework, based on which the TGOA algorithm with a 1/4-competitive ratio under the online random order model is presented.
Abstract: With the rapid development of smartphones, spatial crowdsourcing platforms are getting popular. A foundational research problem of spatial crowdsourcing is to allocate micro-tasks to suitable crowd workers. Most existing studies focus on offline scenarios, where all the spatiotemporal information of micro-tasks and crowd workers is given. However, they are impractical since micro-tasks and crowd workers in real applications appear dynamically and their spatiotemporal information cannot be known in advance. In this paper, to address the shortcomings of existing offline approaches, we first identify a more practical micro-task allocation problem, called the Global Online Micro-task Allocation in spatial crowdsourcing (GOMA) problem. We first extend the state-of-the-art algorithm for the online maximum weighted bipartite matching problem to the GOMA problem as the baseline algorithm. Although the baseline algorithm provides a theoretical guarantee for the worst case, its average performance in practice is not good enough since the worst case happens with a very low probability in the real world. Thus, we consider the average performance of online algorithms, a.k.a. the online random order model. We propose a two-phase-based framework, based on which we present the TGOA algorithm with a 1/4-competitive ratio under the online random order model. To improve its efficiency, we further design the TGOA-Greedy algorithm following the framework, which runs faster than TGOA but has a lower competitive ratio of 1/8. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments on real and synthetic datasets.
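For intuition about the random-order setting, the sketch below implements only the simple greedy baseline: tasks and workers arrive in a random interleaved order and each arrival is matched immediately to the best still-available counterpart. It is not the two-phase TGOA or TGOA-Greedy algorithm, and the utility function and toy positions are hypothetical.

```python
import random

def greedy_online_matching(tasks, workers, utility):
    """Greedy online assignment under the random-order model: tasks and
    workers arrive in a random interleaved order, and each arrival is
    matched immediately to the best still-available counterpart.
    """
    arrivals = [("task", t) for t in tasks] + [("worker", w) for w in workers]
    random.shuffle(arrivals)  # random-order arrival model
    free_tasks, free_workers, matches = set(), set(), []
    for kind, item in arrivals:
        if kind == "task":
            pool, other = free_workers, free_tasks
        else:
            pool, other = free_tasks, free_workers
        if pool:
            best = max(pool, key=lambda x: utility(item, x) if kind == "task"
                       else utility(x, item))
            pool.remove(best)
            matches.append((item, best) if kind == "task" else (best, item))
        else:
            other.add(item)  # wait for a future counterpart
    return matches

# Hypothetical utility: inverse of travel distance on a 1-D street.
task_pos = {"t1": 0.0, "t2": 5.0}
worker_pos = {"w1": 1.0, "w2": 6.0}
util = lambda t, w: 1.0 / (1.0 + abs(task_pos[t] - worker_pos[w]))
print(greedy_online_matching(list(task_pos), list(worker_pos), util))
# e.g. [('t1', 'w1'), ('t2', 'w2')]  (result depends on the arrival order)
```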

271 citations

Proceedings ArticleDOI
23 Apr 2018
TL;DR: VERtex Similarity Embeddings (VERSE), a simple, versatile, and memory-efficient method that derives graph embeddings explicitly calibrated to preserve the distributions of a selected vertex-to-vertex similarity measure, is proposed.
Abstract: Embedding a web-scale information network into a low-dimensional vector space facilitates tasks such as link prediction, classification, and visualization. Past research has addressed the problem of extracting such embeddings by adapting methods from word embedding to graphs, without defining a clearly comprehensible graph-related objective. Yet, as we show, the objectives used in past works implicitly utilize similarity measures among graph nodes. In this paper, we carry the similarity orientation of previous works to its logical conclusion; we propose VERtex Similarity Embeddings (VERSE), a simple, versatile, and memory-efficient method that derives graph embeddings explicitly calibrated to preserve the distributions of a selected vertex-to-vertex similarity measure. VERSE learns such embeddings by training a single-layer neural network. While its default, scalable version does so via sampling similarity information, we also develop a variant using the full information per vertex. Our experimental study on standard benchmarks and real-world datasets demonstrates that VERSE, instantiated with diverse similarity measures, outperforms state-of-the-art methods in terms of precision and recall in major data mining tasks and supersedes them in time and space efficiency, while the scalable sampling-based variant achieves equally good results as the non-scalable full variant.
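A rough sketch of the sampling-based training idea follows: vertices are embedded so that pairs sampled as "similar" (here, endpoints of short random walks, a crude stand-in for a similarity measure such as personalized PageRank) get high dot products while random negative pairs get low ones. This is not VERSE's actual objective or code; the sampler, hyperparameters, and toy graph are assumptions made for illustration.

```python
import numpy as np

def train_embeddings(adj, dim=16, epochs=200, lr=0.02, neg=3, walk_len=3, seed=0):
    """Sketch: learn one embedding per vertex so that sampled "similar"
    pairs score high and random negative pairs score low (NCE-style updates).
    adj: {vertex: [neighbors]} with integer vertex ids 0..n-1.
    """
    rng = np.random.default_rng(seed)
    nodes = list(adj)
    idx = {v: i for i, v in enumerate(nodes)}
    W = rng.normal(scale=0.1, size=(len(nodes), dim))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for u in nodes:
            # Sample a "similar" vertex by a short random walk from u.
            v = u
            for _ in range(rng.integers(1, walk_len + 1)):
                if not adj[v]:
                    break
                v = adj[v][rng.integers(len(adj[v]))]
            pairs = [(idx[v], 1.0)] + [(rng.integers(len(nodes)), 0.0)
                                       for _ in range(neg)]
            for j, label in pairs:
                grad = sigmoid(W[idx[u]] @ W[j]) - label
                gu = grad * W[j]          # gradient w.r.t. the source row
                W[j] -= lr * grad * W[idx[u]]
                W[idx[u]] -= lr * gu
    return {v: W[idx[v]] for v in nodes}

# Hypothetical toy graph: two triangles joined by one bridge edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
emb = train_embeddings(adj)
# Vertices in the same triangle should tend to score higher than cross-triangle pairs.
print(round(float(emb[0] @ emb[1]), 2), round(float(emb[0] @ emb[4]), 2))
```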

242 citations

Journal ArticleDOI
01 Jan 2020
TL;DR: A comprehensive and systematic review of existing research on four core algorithmic issues in spatial crowdsourcing: (1) task assignment, (2) quality control, (3) incentive mechanism design, and (4) privacy protection.
Abstract: Crowdsourcing is a computing paradigm where humans are actively involved in a computing task, especially for tasks that are intrinsically easier for humans than for computers. Spatial crowdsourcing is an increasingly popular category of crowdsourcing in the era of mobile Internet and the sharing economy, where tasks are spatiotemporal and must be completed at a specific location and time. In fact, spatial crowdsourcing has stimulated a series of recent industrial successes including sharing economy for urban services (Uber and Gigwalk) and spatiotemporal data collection (OpenStreetMap and Waze). This survey dives deep into the challenges and techniques brought by the unique characteristics of spatial crowdsourcing. Particularly, we identify four core algorithmic issues in spatial crowdsourcing: (1) task assignment, (2) quality control, (3) incentive mechanism design, and (4) privacy protection. We conduct a comprehensive and systematic review of existing research on the aforementioned four issues. We also analyze representative spatial crowdsourcing applications and explain how they are enabled by these four technical issues. Finally, we discuss open questions that need to be addressed for future spatial crowdsourcing research and applications.

185 citations

Journal ArticleDOI
01 Aug 2018
TL;DR: A new system, GraphS, is presented to efficiently detect constrained cycles in a dynamic graph, which is changing constantly, and to return the satisfying cycles in real time; a hot-point-based index greatly speeds up query time and achieves high system throughput.
Abstract: As graph data is prevalent for an increasing number of Internet applications, continuously monitoring structural patterns in dynamic graphs in order to generate real-time alerts and trigger prompt actions becomes critical for many applications. In this paper, we present a new system, GraphS, to efficiently detect constrained cycles in a dynamic graph, which is changing constantly, and return the satisfying cycles in real time. A hot-point-based index is built and efficiently maintained for each query so as to greatly speed up query time and achieve high system throughput. The GraphS system is developed at Alibaba to actively monitor various online fraudulent activities based on cycle detection. For a dynamic graph with hundreds of millions of edges and vertices, the system is capable of coping with a peak rate of tens of thousands of edge updates per second and finding all the cycles with predefined constraints with a 99.9% latency of 20 milliseconds.
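The basic incremental step behind such a system can be sketched as follows: when an edge (u, v) is inserted, a bounded DFS from v looks for simple paths back to u, each of which closes a constrained cycle through the new edge. The hot-point index that GraphS uses to prune this search is not shown; the transaction-style toy graph is hypothetical.

```python
def cycles_on_insert(adj, u, v, max_len):
    """Sketch of constrained cycle detection on an edge insert: after adding
    edge (u, v), DFS from v for simple paths back to u; each found path,
    closed by the new edge, forms a cycle with at most max_len edges.
    The index-based pruning used by the real system is not shown.
    """
    adj.setdefault(u, []).append(v)
    cycles = []

    def dfs(x, path):
        if len(path) > max_len:
            return
        if x == u:
            cycles.append([u] + path[:-1])  # cycle as a vertex sequence
            return
        for y in adj.get(x, []):
            if y not in path:  # keep the cycle simple
                dfs(y, path + [y])

    dfs(v, [v])
    return cycles

# Hypothetical payment graph a -> b -> c; inserting c -> a closes a 3-cycle.
adj = {"a": ["b"], "b": ["c"]}
print(cycles_on_insert(adj, "c", "a", max_len=3))
# [['c', 'a', 'b']]  i.e. the cycle c -> a -> b -> c
```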

108 citations

Journal ArticleDOI
TL;DR: This work proposes a comprehensive service framework, called BCloud-IFog, which consists of blind cloud servers and intelligent fog servers, together with an Outsourced Real-time Route Planning (OR2P) scheme, where the search index is built as a G*-tree structure and each G-tree leaf node is split into a set of non-confidential outsourced graphs.

97 citations