Journal Article

PathSim: meta path-based top-K similarity search in heterogeneous information networks

TLDR
Under the meta path framework, the paper defines PathSim, a novel similarity measure that finds peer objects in the network (e.g., authors in a similar field and with similar reputation) and that turns out to be more meaningful in many scenarios than random-walk-based similarity measures.
Abstract
Similarity search is a primitive operation in database and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as bibliographic networks and social media networks, it is important to study similarity search in such networks. Intuitively, two objects are similar if they are linked by many paths in the network. However, most existing similarity measures are defined for homogeneous networks and do not take the different semantic meanings behind paths into consideration. Thus they cannot be directly applied to heterogeneous networks.

In this paper, we study similarity search that is defined among objects of the same type in heterogeneous networks. Moreover, by considering different linkage paths in a network, one can derive various similarity semantics. Therefore, we introduce the concept of meta path-based similarity, where a meta path is a path consisting of a sequence of relations defined between different object types (i.e., structural paths at the meta level). Whether a user would like to explicitly specify a path combination given sufficient domain knowledge, choose the best path by experimental trials, or simply provide training examples to learn it, the meta path forms a common base for a network-based similarity search engine. In particular, under the meta path framework we define a novel similarity measure called PathSim that is able to find peer objects in the network (e.g., authors in a similar field and with similar reputation), which turns out to be more meaningful in many scenarios than random-walk-based similarity measures. To support fast online query processing for PathSim queries, we develop an efficient solution that partially materializes short meta paths and then concatenates them online to compute top-k results. Experiments on real data sets demonstrate the effectiveness and efficiency of our proposed paradigm.
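The abstract does not spell out the measure itself. As a minimal sketch, assume the commonly stated form of PathSim for a symmetric meta path: twice the number of path instances between x and y, divided by the sum of the path instances from x to itself and from y to itself. The author-venue counts below are toy data chosen for illustration, and the function names (`path_count`, `pathsim`, `top_k`) are hypothetical. The stored author-venue counts stand in for a materialized short meta path, and the full author-venue-author path is obtained by concatenating it with its own reverse at query time. This only loosely mirrors the partial-materialization idea and is not the paper's optimized algorithm.

```python
from typing import Dict, List, Tuple

# Toy data (assumption): number of papers each author published in each venue.
# This plays the role of a materialized short meta path (author -> venue).
AUTHOR_VENUE: Dict[str, Dict[str, int]] = {
    "Mike": {"SIGMOD": 2, "VLDB": 1},
    "Jim": {"SIGMOD": 50, "VLDB": 20},
    "Mary": {"SIGMOD": 2, "ICDE": 1},
    "Bob": {"SIGMOD": 2, "VLDB": 1},
    "Ann": {"ICDE": 1, "KDD": 1},
}


def path_count(half: Dict[str, Dict[str, int]], x: str, y: str) -> int:
    """Count path instances x -> venue -> y by joining the short path with its reverse."""
    row_y = half[y]
    return sum(count * row_y.get(venue, 0) for venue, count in half[x].items())


def pathsim(half: Dict[str, Dict[str, int]], x: str, y: str) -> float:
    """PathSim under a symmetric meta path: 2 * |x~>y| / (|x~>x| + |y~>y|)."""
    denom = path_count(half, x, x) + path_count(half, y, y)
    if denom == 0:
        return 0.0  # neither object has any path instance
    return 2.0 * path_count(half, x, y) / denom


def top_k(half: Dict[str, Dict[str, int]], query: str, k: int) -> List[Tuple[str, float]]:
    """Brute-force top-k peers of `query`, ties broken by name."""
    scores = [(other, pathsim(half, query, other)) for other in half if other != query]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    for name, score in top_k(AUTHOR_VENUE, "Mike", 4):
        print(f"{name}: {score:.4f}")
```

In this toy data, Jim shares far more path instances with Mike than Bob does, yet Bob ranks first because his publication profile has the same shape and volume as Mike's. The normalization by self-path counts is what favors peers over merely prolific objects.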


Citations
Journal Article

A Vulnerability Risk Assessment Method Based on Heterogeneous Information Network

TL;DR: A ranking method based on the heterogeneous information network is proposed to assess vulnerability risk in a specific network; results show that it can accurately assess the risk of vulnerabilities in a specific network environment and has lower computational complexity than other methods.
Journal Article

Proximity-aware heterogeneous information network embedding

TL;DR: A novel framework named Proximity-Aware Heterogeneous Information Network Embedding (PAHINE), where the native information of a network is extracted from node sequences, which are generated by walking on a probability-sensitive metagraph and fed into deep neural networks to derive the desired embedding vectors.
Journal Article

DPRel: A Meta-Path Based Relevance Measure for Mining Heterogeneous Networks

TL;DR: This paper proposes a meta-path based semi-metric measure for relevance measurement on objects in a general heterogeneous network with a specified network schema that incorporates path semantics by following the specified meta-path.
Journal Article

Meta-path-based outlier detection in heterogeneous information network

TL;DR: This paper proposes a meta-path-based outlier detection method (MPOutliers) in heterogeneous information network to deal with problems in one go under a unified framework and calculates the heterogeneous reachable probability by combining different types of objects and their relationships.
Proceedings Article

KADetector: Automatic Identification of Key Actors in Online Hack Forums Based on Structured Heterogeneous Information Network

TL;DR: This paper proposes and develops an intelligent system to automate the analysis of Hack Forums for the identification of key actors who play a vital role in the value chain, and is the first work to use a structured HIN for underground participant analysis.
References
Journal Article

Normalized cuts and image segmentation

TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Proceedings Article

Normalized cuts and image segmentation

TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Journal Article

Cumulated gain-based evaluation of IR techniques

TL;DR: This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position, and test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences.
Proceedings Article

SimRank: a measure of structural-context similarity

TL;DR: A complementary approach, applicable in any domain with object-to-object relationships, that measures similarity of the structural context in which objects occur, based on their relationships with other objects is proposed.