Proceedings ArticleDOI

Anomaly detection in web graphs using vertex neighbourhood based signature similarity methods

TL;DR
Two types of anomalies that occur during crawling are identified, and two novel similarity measures based on vertex neighbourhoods that overcome these anomalies are proposed.
Abstract
With the massive increase in the amount of data generated each day, we need automated tools to oversee the evolution of the web and to quantify global effects such as the PageRank of web pages. Search engines periodically crawl the web to build web graphs that store information about its structure. This is an expensive and error-prone process. Central to this problem is the notion of graph similarity (between two graphs spaced in time), which validates how well search engines secure content from the web and the quality of the search results they produce. In this paper, we propose two different types of anomalies which occur during crawling and two novel similarity measures based on vertex neighbourhoods, which overcome the proposed anomalies. Extensive experimentation on real-world datasets shows significant improvement over state-of-the-art signature similarity based methods.
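
As an illustration of the vertex neighbourhood idea, the sketch below compares two crawl snapshots by the overlap of each vertex's out-neighbourhood. This is a minimal, hypothetical example: the Jaccard overlap and the plain averaging are illustrative stand-ins, not the two measures proposed in the paper.

```python
# Hypothetical vertex-neighbourhood similarity between two crawl snapshots.
# Graphs are dicts mapping a vertex to the set of its out-neighbours.

def neighbourhood_similarity(graph_a, graph_b):
    """Average per-vertex Jaccard overlap of out-neighbourhoods."""
    vertices = set(graph_a) | set(graph_b)
    if not vertices:
        return 1.0
    total = 0.0
    for v in vertices:
        na = graph_a.get(v, set())
        nb = graph_b.get(v, set())
        union = na | nb
        # Vertices with empty neighbourhoods in both snapshots count as identical.
        total += len(na & nb) / len(union) if union else 1.0
    return total / len(vertices)

# Example: between crawls, page "a" gains a link and page "c" is no longer reached.
g_t1 = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
g_t2 = {"a": {"b", "c", "d"}, "b": {"c"}, "d": set()}
print(neighbourhood_similarity(g_t1, g_t2))  # ~0.92
```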


Citations
Journal ArticleDOI

Boosting Positive and Unlabeled Learning for Anomaly Detection With Multi-Features

TL;DR: This work introduces a novel PU learning method that can handle the situation where an unlabeled data set is mostly composed of positive instances. It first uses a linear model to extract the most reliable negative instances, then applies a self-learning process that adds reliable negative and positive instances at different speeds based on the estimated positive class prior.
Posted ContentDOI

Imbalanced Aircraft Data Anomaly Detection

TL;DR: This paper proposes a Graphical Temporal Data Analysis (GTDA) framework consisting of three modules: Series-to-Image (S2I), Cluster-based Resampling approach using Euclidean Distance (CRD), and Variance-Based Loss (VBL).
References
Proceedings ArticleDOI

Similarity estimation techniques from rounding algorithms

TL;DR: It is shown that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects.
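
The LSH view this reference establishes can be made concrete with Charikar's random-hyperplane hash for cosine similarity, which arises from the SDP rounding perspective: two vectors collide on a random hyperplane bit with probability 1 - θ/π, where θ is the angle between them. The dimensionality, bit count, and function names below are illustrative assumptions, not details from the paper.

```python
import math
import random

def random_hyperplane_signature(vec, planes):
    """One bit per random hyperplane: which side of the plane the vector lies on."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]

def estimated_cosine(sig_a, sig_b):
    """The fraction of disagreeing bits estimates theta / pi."""
    mismatch = sum(a != b for a, b in zip(sig_a, sig_b)) / len(sig_a)
    return math.cos(mismatch * math.pi)

dim, n_bits = 5, 512
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
u = [1.0, 0.5, 0.0, 0.2, 0.9]
v = [0.9, 0.4, 0.1, 0.2, 1.0]
print(estimated_cosine(random_hyperplane_signature(u, planes),
                       random_hyperplane_signature(v, planes)))
```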
Proceedings ArticleDOI

Finding near-duplicate web pages: a large-scale evaluation of algorithms

TL;DR: A combined algorithm is presented which achieves a precision of 0.79 with 79% of the recall of the other algorithms; since Charikar's algorithm finds more near-duplicate pairs on different sites, it achieves better overall precision than Broder et al.'s algorithm.
Journal ArticleDOI

Web graph similarity for anomaly detection

TL;DR: This paper empirically evaluates and compares all five similarity schemes, some adapted from existing graph similarity measures and others from well-known document and vector similarity methods (namely, the shingling method and a random projection based method).
Journal ArticleDOI

Effective web crawling

TL;DR: The World Wide Web is a context in which traditional Information Retrieval methods are challenged, and given the volume of the Web and its speed of change, the coverage of modern search engines is relatively small.
Patent

Methods and apparatus for computing graph similarity via signature similarity

TL;DR: In this patent, a web graph is transformed into a set of weighted features, which are converted into a signature via the SimHash algorithm; the signature is then compared with the signatures of one or more other web graphs to determine similarity between web graphs.
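
The pipeline described above (weighted features, a fixed-width signature, bitwise comparison) can be sketched as follows. Treating edges as the weighted features, using MD5 as the underlying hash, and fixing a 64-bit signature are assumptions made for illustration, not details taken from the patent.

```python
import hashlib

BITS = 64  # assumed signature width

def simhash(weighted_features):
    """SimHash of a dict mapping feature string -> non-negative weight."""
    vector = [0.0] * BITS
    for feature, weight in weighted_features.items():
        digest = int.from_bytes(hashlib.md5(feature.encode()).digest()[:8], "big")
        for i in range(BITS):
            vector[i] += weight if (digest >> i) & 1 else -weight
    # Bit i of the signature is 1 where the weighted bit sum is positive.
    return sum(1 << i for i in range(BITS) if vector[i] > 0)

def signature_similarity(sig_a, sig_b):
    """Fraction of bit positions on which the two signatures agree."""
    return 1.0 - bin(sig_a ^ sig_b).count("1") / BITS

# Example: edges of two crawl snapshots as "source->target" features.
features_t1 = {"a->b": 1.0, "a->c": 1.0, "b->c": 0.5}
features_t2 = {"a->b": 1.0, "a->c": 1.0, "b->c": 0.5, "c->d": 0.2}
print(signature_similarity(simhash(features_t1), simhash(features_t2)))
```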