Proceedings ArticleDOI

EvilSeed: A Guided Approach to Finding Malicious Web Pages

TLDR
EVILSEED leverages the crawling infrastructure of search engines to retrieve URLs that are much more likely to be malicious than a random page on the web, and increases the "toxicity" of the input URL stream.
Abstract
Malicious web pages that use drive-by download attacks or social engineering techniques to install unwanted software on a user's computer have become the main avenue for the propagation of malicious code. To search for malicious web pages, the first step is typically to use a crawler to collect URLs that are live on the Internet. Then, fast prefiltering techniques are employed to reduce the number of pages that need to be examined by more precise, but slower, analysis tools (such as honeyclients). While effective, these techniques require a substantial amount of resources. A key reason is that the crawler encounters many pages on the web that are benign; that is, the "toxicity" of the stream of URLs being analyzed is low. In this paper, we present EVILSEED, an approach to search the web more efficiently for pages that are likely malicious. EVILSEED starts from an initial seed of known, malicious web pages. Using this seed, our system automatically generates search engine queries to identify other malicious pages that are similar or related to the ones in the initial seed. By doing so, EVILSEED leverages the crawling infrastructure of search engines to retrieve URLs that are much more likely to be malicious than a random page on the web. In other words, EVILSEED increases the "toxicity" of the input URL stream. We also envision that the features EVILSEED relies on could be directly applied by search engines in their prefilters. We have implemented our approach and evaluated it on a large-scale dataset. The results show that EVILSEED identifies malicious web pages more efficiently than crawler-based approaches.
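
The abstract describes the EVILSEED pipeline at a high level: mine known-malicious seed pages for characteristic content, turn that content into search engine queries, and feed the returned candidate URLs to the usual prefilter/honeyclient chain. The sketch below is a minimal illustration of that flow under stated assumptions, not the authors' implementation; fetch_page, search_engine, and prefilter are hypothetical stubs standing in for a real crawler, search API, and fast static filter.

# Minimal sketch (an assumption, not the EVILSEED code) of a guided search step:
# build one query per malicious seed page and keep only candidates that a cheap
# prefilter flags, so the expensive honeyclient sees a more "toxic" URL stream.
import re
from collections import Counter
from typing import List

def fetch_page(url: str) -> str:
    """Hypothetical stub: download and return the HTML of a seed page."""
    return ""

def search_engine(query: str, limit: int = 100) -> List[str]:
    """Hypothetical stub: return URLs a search engine yields for the query."""
    return []

def prefilter(url: str) -> bool:
    """Hypothetical stub: fast static check that flags a URL as worth deeper analysis."""
    return True

def query_terms(html: str, n: int = 5) -> List[str]:
    """Pick the most frequent distinctive words from a seed page's text."""
    words = re.findall(r"[a-z]{4,}", html.lower())
    return [w for w, _ in Counter(words).most_common(n)]

def guided_candidates(seed_urls: List[str]) -> List[str]:
    """URLs similar or related to the malicious seed, ready for a honeyclient."""
    candidates: List[str] = []
    for seed in seed_urls:
        query = " ".join(query_terms(fetch_page(seed)))
        candidates.extend(search_engine(query))       # reuse the search engine's crawl
    return [url for url in candidates if prefilter(url)]  # raise the stream's "toxicity"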



Citations
Journal ArticleDOI

Graph based anomaly detection and description: a survey

TL;DR: This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs, and gives a general framework for the algorithms categorized under various settings.
Posted Content

Graph-based Anomaly Detection and Description: A Survey

TL;DR: A comprehensive survey of the state-of-the-art methods for anomaly detection in data represented as graphs can be found in this article, where the authors highlight the effectiveness, scalability, generality, and robustness aspects of the methods.
Proceedings ArticleDOI

Manufacturing compromise: the emergence of exploit-as-a-service

TL;DR: DNS traffic from real networks is used to provide a unique perspective on the popularity of malware families based on the frequency with which their binaries are installed by drive-by downloads, as well as the lifetime and popularity of domains funneling users to exploits.
Proceedings ArticleDOI

Nazca: Detecting Malware Distribution in Large-Scale Networks

TL;DR: This paper studies how clients in real-world networks download and install malware, and presents Nazca, a system that detects infections in large-scale networks by looking at the telltale signs of the malicious network infrastructures that orchestrate these malware installers.
Proceedings Article

Automatically detecting vulnerable websites before they turn malicious

TL;DR: A novel classification system is designed, implemented, and evaluated that predicts whether a given, not yet compromised website will become malicious in the future, i.e., whether a currently benign website will become compromised within a year.
References
Proceedings ArticleDOI

Detection and analysis of drive-by-download attacks and malicious JavaScript code

TL;DR: A novel approach to the detection and analysis of malicious JavaScript code is presented: it uses a number of features and machine-learning techniques to establish the characteristics of normal JavaScript code, and identifies anomalous JavaScript code by emulating its behavior and comparing it to the established profiles.
Proceedings Article

EXPOSURE : Finding malicious domains using passive DNS analysis

TL;DR: This paper introduces EXPOSURE, a system that employs large-scale, passive DNS analysis techniques to detect domains that are involved in malicious activity; it uses 15 features extracted from DNS traffic that characterize different properties of DNS names and the ways in which they are queried.
Proceedings Article

All Your iFRAMEs Point to Us

TL;DR: The relationship between the user browsing habits and exposure to malware, the techniques used to lure the user into the malware distribution networks, and the different properties of these networks are studied.
Proceedings Article

The ghost in the browser: analysis of web-based malware

TL;DR: This work identifies the four prevalent mechanisms used to inject malicious content on popular web sites: web server security, user-contributed content, advertising, and third-party widgets, and presents examples of abuse found on the Internet.
Proceedings Article

Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities.

TL;DR: The design and implementation of the Strider HoneyMonkey Exploit Detection System is described, which consists of a pipeline of “monkey programs” running possibly vulnerable browsers on virtual machines with different patch levels and patrolling the Web to seek out and classify web sites that exploit browser vulnerabilities.