Marc Najork

Researcher at Google

Publications -  198
Citations -  9755

Marc Najork is an academic researcher from Google. The author has contributed to research in topics: Web page & Ranking (information retrieval). The author has an h-index of 45, co-authored 182 publications receiving 8504 citations. Previous affiliations of Marc Najork include Hewlett-Packard & Association for Computing Machinery.

Papers
Proceedings Article

Detecting spam web pages through content analysis

TL;DR: Some previously undescribed techniques for automatically detecting spam pages are considered, and the effectiveness of these techniques in isolation and when aggregated using classification algorithms is examined.
Journal Article

Mercator: A scalable, extensible Web crawler

Allan Heydon, +1 more - 15 Apr 1999
TL;DR: This paper describes Mercator, a scalable, extensible Web crawler written entirely in Java, and comments on Mercator's performance, which is found to be comparable to that of other crawlers for which performance numbers have been published.
Proceedings Article

A large-scale study of the evolution of web pages

TL;DR: It is found that the average degree of change varies widely across top-level domains, and that larger pages change more often and more severely than smaller ones.
Journal Article

Web Crawling

TL;DR: The fundamental challenges of web crawling are outlined, state-of-the-art models and solutions are described, and avenues for future work are highlighted.
Proceedings Article

Spam, damn spam, and statistics: using statistical analysis to locate spam web pages

TL;DR: This paper proposes that some spam web pages can be identified through statistical analysis. It examines a variety of properties, including linkage structure, page content, and page evolution, and finds that outliers in the statistical distribution of these properties are highly likely to be caused by web spam.