Marc Najork
Researcher at Google
Publications - 198
Citations - 9755
Marc Najork is an academic researcher at Google. The author has contributed to research on the topics Web page and Ranking (information retrieval). The author has an h-index of 45 and has co-authored 182 publications receiving 8504 citations. Previous affiliations of Marc Najork include Hewlett-Packard and the Association for Computing Machinery.
Papers
Proceedings Article
Detecting spam web pages through content analysis
TL;DR: Some previously undescribed techniques for automatically detecting spam pages are considered, and their effectiveness is examined both in isolation and when aggregated using classification algorithms.
Journal Article
Mercator: A scalable, extensible Web crawler
Allan Heydon, Marc Najork +1 more
TL;DR: This paper describes Mercator, a scalable, extensible Web crawler written entirely in Java, and comments on Mercator's performance, which is found to be comparable to that of other crawlers for which performance numbers have been published.
Proceedings Article
A large-scale study of the evolution of web pages
TL;DR: It is found that the average degree of change varies widely across top-level domains, and that larger pages change more often and more severely than smaller ones.
Journal Article
Web Crawling
Christopher Olston, Marc Najork +1 more
TL;DR: The fundamental challenges of web crawling are outlined and the state-of-the-art models and solutions are described, and avenues for future work are highlighted.
Proceedings Article
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages
TL;DR: This paper proposes that some spam web pages can be identified through statistical analysis. It examines a variety of properties, including linkage structure, page content, and page evolution, and finds that outliers in the statistical distribution of these properties are highly likely to be caused by web spam.