
Understanding the network-level behavior of spammers

Anirudh Ramachandran, +1 more
- Vol. 36, Iss: 4, pp 291-302


Understanding the Network-Level Behavior of Spammers
Anirudh Ramachandran and Nick Feamster
College of Computing, Georgia Tech
{avr, feamster}@cc.gatech.edu
ABSTRACT
This paper studies the network-level behavior of spammers, includ-
ing: IP address ranges that send the most spam, common spamming
modes (e.g., BGP route hijacking, bots), how persistent across time
each spamming host is, and characteristics of spamming botnets.
We try to answer these questions by analyzing a 17-month trace
of over 10 million spam messages collected at an Internet “spam
sinkhole”, and by correlating this data with the results of IP-based
blacklist lookups, passive TCP fingerprinting information, routing
information, and botnet “command and control” traces.
We find that most spam is being sent from a few regions of
IP address space, and that spammers appear to be using transient
“bots” that send only a few pieces of email over very short peri-
ods of time. Finally, a small, yet non-negligible, amount of spam
is received from IP addresses that correspond to short-lived BGP
routes, typically for hijacked prefixes. These trends suggest that de-
veloping algorithms to identify botnet membership, filtering email
messages based on network-level properties (which are less vari-
able than email content), and improving the security of the Internet
routing infrastructure, may prove to be extremely effective for com-
bating spam.
Categories and Subject Descriptors
C.2.0 [Computer Communication Networks]: Security and protection;
C.2.3 [Computer Communication Networks]: Network operations
(network management)
General Terms
Design, Management, Reliability, Security
Keywords
spam, botnet, BGP, network management, security
1. Introduction
This paper presents a study of the network-level characteristics
of unsolicited commercial email (“spam”). Much attention has been
devoted to studying the content of spam, but comparatively little at-
tention has been paid to spam’s network-level properties. Conven-
tional wisdom often asserts that most of today’s spam comes from
botnets, and that a large fraction of spam comes from Asia; a few
studies have attempted to quantify some of these characteristics [5].
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGCOMM'06, September 11-15, 2006, Pisa, Italy.
Copyright 2006 ACM 1-59593-308-5/06/0009...$5.00.
Unfortunately, little is known about how much spam comes from
botnets versus other techniques (e.g., short-lived route announce-
ments, open relays, etc.), the geographic and topological distribu-
tion of where most spam originates (in terms of Internet Service
Providers, countries, and IP address space), the extent to which dif-
ferent spammers use the same network resources, the stationarity
of these properties over time, and so forth. A primary goal of this
paper is to shed some light on these relatively unstudied questions.
Beyond merely exposing spammers’ behavior, gathering infor-
mation about the network-level behavior of spam could be a ma-
jor asset for designing spam filters that are based on spammers’
network-level behavior (presuming that the network-level charac-
teristics of spam are sufficiently different than those of legitimate
mail, a question we explore further in Section 4). Whereas spam-
mers have the flexibility to alter the content of emails—both per-
recipient and over time as users update spam filters—they have far
less flexibility when it comes to altering the network-level proper-
ties of the spam they send. It is far easier for a spammer to alter the
content of email messages to evade spam filters than it is for that
spammer to change the ISP, IP address space, or botnet from which
spam is sent.
Towards the goal of developing techniques that will help in the
design of more robust network-level spam filters, this paper char-
acterizes the network-level behavior of spammers as observed at
a large spam sinkhole domain, which stores complete logs of all
spam received from August 2004 through December 2005. We
perform a joint analysis of the data collected at this sinkhole with
an archive of BGP route advertisements as heard from the receiving
network, traces from the “command and control” of a Bobax botnet,
and traces of legitimate email from the mail server logs of a large
email service provider. Although many aspects of mail headers can
be forged, we base our analysis strictly on properties of the sender
that are difficult to forge (e.g., IP addresses that made connections
to our mail servers, passive TCP fingerprints, corresponding route
announcements, etc.).
We draw the following surprising conclusions from our study:
• The vast majority of received spam arrives from a few concentrated
portions of IP address space (Section 4). Spam filtering techniques
currently make no assumptions about the distribution of spam across
IP address space. In a related area, many worm propagation models
assume a uniform distribution of vulnerable hosts across IP address
space (e.g., [29]). In contrast, we find that the vast majority of
spamming hosts (and, perhaps not coincidentally, most Bobax-infected
hosts) lie within a small number of IP address space regions.
Unfortunately, with a few exceptions (e.g., 60.*–70.*), most
legitimate email comes from the same regions of IP address space,
which suggests that, in general, effective filtering based on
network-level properties may require determining second-order
characteristics (e.g., botnet membership).
• Most received spam is sent from Windows hosts, each of which sends a
relatively small volume of spam to our domain (Section 5). Most bots
send a relatively small volume of spam to our sinkhole (i.e., fewer
than 100 pieces of spam over 17 months), and about three-quarters of
them are only active for a single time period of less than two minutes
(65% of them send all spam in a “single shot”).
• A small set of spammers continually use short-lived route
announcements to remain untraceable (Section 6). A small portion of
spam is sent by sophisticated spammers, who briefly advertise IP
prefixes, establish a connection to the victim’s mail relay, and
withdraw the route to that IP address space after spam is sent.
Anecdotal evidence has suggested that spammers might be exploiting the
routing infrastructure to remain untraceable [1, 30]; this paper
quantifies and documents this activity for the first time. To our
surprise, we discovered a new class of attack, where spammers attempt
to evade detection by hijacking large IP address blocks (e.g., /8s) and
sending spam from widely dispersed “dark” (i.e., unused or unallocated)
IP addresses within this space.
Beyond these findings, this paper’s joint analysis of several
datasets provides a unique window into the network-level charac-
teristics of spam. To our knowledge, this paper presents the first
study that examines the interplay between spam, botnets, and the
Internet routing infrastructure.
We acknowledge that our spam corpus represents only a sin-
gle vantage point, and, as such, drawing general conclusions about
Internet-wide spam is not possible. Our goal is not to present con-
clusive figures about Internet-wide characteristics of spam. Indeed,
the data we have collected is a small, localized sample of all spam
traffic, and our statistics may not be reflective of Internet-wide char-
acteristics. However, the spam we have collected represents an in-
teresting dataset as it reflects the complete set of spam emails re-
ceived by a single Internet domain. This dataset exposes spamming
as a typical network operator for some Internet domain might also
witness it. This unique view can help us better understand whether
the features of spam that any single network operator observes
could be useful in developing more effective filtering techniques.
With these goals in mind and an understanding of the context
of our data, we offer the following additional observations on the
implications of our results for the design of more effective techniques
for spam mitigation, which we revisit in more detail in Section 7.
First, the ability to trace the identities of spammers hinges
on securing the routing infrastructure. Second, the distribution of
spam and botnet activity across IP space suggests that, for some IP
address ranges and networks, spam filters might monitor network-
wide spam arrival patterns and attribute higher levels of suspicion
to spam originating from networks with higher spam activity. Given
the highly variable nature of the content of spam messages, incor-
porating general network-level properties of spam into filters may
ultimately provide significant gains over more traditional methods
(e.g., content-based filtering), both through increased robustness
and the ability to stop spam closer to its source.
The rest of this paper is organized as follows. Section 2 provides
background on spamming and an overview of previous related work. In
Section 3, we describe our data collection techniques and the datasets
we used in our analysis. In Section 4, we study the distribution of
spammers, spamming botnets, and legitimate mail senders across IP
address space. Section 5 presents our findings regarding the
relationship between the spam received at our sinkholes and known
spamming bots. Section 6 examines the extent to which spammers use IP
addresses that are generally unreachable (e.g., using short-lived BGP
route announcements) to send spam untraceably. Based on our findings,
Section 7 offers positive recommendations for designing more effective
mitigation techniques.
We conclude in Section 8.
2. Background and Related Work
This section provides an overview of techniques both for sending
and for mitigating spam and discusses related work in these areas.
2.1 Spam: Methods and Mitigation
In this section, we offer background on the main techniques used
by spammers to send email, as well as some of the more commonly
used mitigation techniques.
2.1.1 Spamming methods
Spammers use various techniques to send large volumes of mail
while attempting to remain untraceable. We describe several of
these techniques, beginning with “conventional” methods and pro-
gressing to more intricate techniques.
Direct spamming. Spammers may purchase upstream connec-
tivity from “spam-friendly ISPs”, which turn a blind eye to the
activity. Occasionally, spammers buy connectivity and send spam
from ISPs that do not condone this activity and are forced to change
ISPs. Ordinarily, changing from one ISP to another would require
a spammer to renumber the IP addresses of their mail relays. To
remain untraceable and avoid renumbering headaches, spammers
sometimes obtain a pool of dispensable dialup IP addresses, send
outgoing traffic from a high-bandwidth connection with the IP address
spoofed to appear as if it came from the dialup connection, and
proxy the reverse traffic through the dialup connection back to the
spamming hosts [25].
Open relays and proxies. Open relays are mail servers that
allow unauthenticated Internet hosts to connect and relay email
through them. Originally intended for user convenience (e.g., to let
users send mail from a particular relay while they are traveling or
otherwise in a different network), open relays have been exploited
by spammers due to the anonymity and amplification offered by
the extra level of indirection. It appears that the widespread deploy-
ment and use of blacklisting techniques have all but extinguished
the use of open relays and proxies to send spam [21, 26].
Botnets. Conventional wisdom suggests that the majority of
spam on the Internet today is sent by botnets, collections of machines
acting under one centralized controller [3, 4, 31]. The W32/Bobax
(“Bobax”) worm (of which there are many variants) exploits the DCOM
and LSASS vulnerabilities on Windows systems [18], allows infected
hosts to be used as a mail relay, and attempts to spread itself to
other machines affected by the above vulnerabilities, as well as over
email. This paper studies the network-level properties of spam sent by
Bobax drones. Agobot and SDBot
are two other bots purported to send spam [12].
BGP spectrum agility. This study has discovered a new type of
cloaking mechanism, BGP “spectrum agility”, whereby spammers briefly
announce (often hijacked) IP address space from which they send spam
and then withdraw the routes to that IP address space once the spam has
been sent. Although we observed this behavior informally several years
ago [6] and subsequent anecdotal evidence has suggested that spammers
may use this technique [1], our study thoroughly documents this
activity, and further finds that spammers may be using spectrum agility
to complement spamming by other methods.
2.1.2 Mitigation techniques
Techniques for mitigating spam are as varied as techniques to
send spam, and most existing techniques have significant drawbacks.
One of the most widely used anti-spam techniques is filtering, which
typically classifies email based on its content; content-based
filtering uses features of the contents of an email’s headers or body
to determine whether it is likely to be spam. Content-based filters,
such as those incorporated by popular spam filters like SpamAssassin
[27], successfully reduce the amount of spam that actually reaches a
user’s inbox. On the other hand, content-based filtering has drawbacks.
Users and system administrators must continually update their filtering
rules and use large corpuses of spam
for training; in response, spammers devise new ways of altering the
contents of an email to circumvent these filters. The cost of evading
content-based filters for spammers is negligible, since spammers
can easily alter content to attempt to evade these filters.
In addition to performing content-based checks, many mail filters,
including SpamAssassin, also perform lookups to determine
whether the sending IP address is in a “blacklist”. Blacklists of
known spammers, open relays and open proxies remain one of to-
day’s predominant spam filtering techniques. There are more than
30 widely used blacklists in use today; each of these lists is sep-
arately maintained, and insertion into these lists is based on many
different types of observations (e.g., operating an open relay, send-
ing mail to a spam trap, etc.). The results in this paper—in par-
ticular, that IP address space is often “stolen” to send spam and
that many bot IP addresses are short-lived—indicate that this long-
standing method for filtering spam could become much less effec-
tive as spammers adopt these more sophisticated techniques.
2.2 Related Work
In this section, we first review previous work that has studied
various spamming and spam-mitigation techniques, as well as the
behavior of various worms and botnets. We then briefly discuss pre-
vious studies of unorthodox routing announcements. Previous work
has studied each of these phenomena to some degree in isolation,
but this study is the first to perform a joint analysis of spamming be-
havior, botnet characteristics, and Internet routing to better under-
stand the characteristics and network-level behavior of spammers.
2.2.1 Spam and botnets
Previous studies have investigated the behavior and properties of
worms, botnets, and other spam sources. Casado et al. used passive
measurements of packet traces captured from about 2,500 spam
sources to estimate the bottleneck bandwidths of roughly 25,000 TCP
flows from spam sources and found peaks at common bandwidths (e.g.,
modem speeds) [2]. Kumar et al. deconstructed the source code of the
“Witty” worm to estimate various properties about Internet hosts (e.g.,
host uptime) as well as about the propagation of the worm itself (e.g.,
who infected whom) [14]. In contrast, our work explores the behavior of
spammers in depth, although we also peripherally study malware whose
exclusive purpose is to send spam (i.e., the “Bobax” drone).
Several previous and ongoing projects are studying spammers’
attempts to harvest email addresses for the purposes of spamming.
For instance, Project Honeypot sinks email traffic for unused MX
records and hands out “trap” email addresses to investigate harvest-
ing behavior and to help identify spammers [23]. A previous study
has used the data from Project Honeypot to analyze the methods
employed by spammers; monitor the time it takes from when an
email address is harvested to the time when that address first
receives spam; the countries where most harvesting infrastructure is
located; and the persistence (across time) of various harvesters [22].
We present preliminary results from a similar study in a technical
report version of this paper [24].
In Section 5, we correlate spam arrivals with traces of hosts
known to be infected with malware. Moore et al. found that the ma-
jority of hosts—and more than 80% of the hosts in Asia—did not
patch the relevant vulnerability until well after the actual outbreak [19],
which makes it more reasonable to assume that IP addresses of
Bobax drones remain infected for the duration of our spam trace.
2.2.2 Mitigation
A recent presentation from the SpamAssassin project discusses
several techniques that the SpamAssassin spam filtering tool has
incorporated to detect forged X-Mailer headers, weak “hash-
busting” schemes, etc. [17]. Although their work also involves re-
verse engineering, the project focuses on analyzing mail contents
to reverse-engineer spamming tools and techniques (with the goal
of using this analysis to incorporate better content-filtering rules
into SpamAssassin). Though our paper also studies such properties
of spam, our analysis hinges on network-level properties (for
instance, the IP address of the last remote mail relay, which previous
work has also observed as one of the few parts of the SMTP header
that cannot be forged [10]) rather than the artifacts of spamming
software that appear in email content.
Jung et al. performed a study of DNS blacklist (DNSBL) traffic
and the effectiveness of blacklists [13] and observed that 80% of the
IP addresses that were sending spam were listed in DNSBLs two
months after the collection of the traffic trace. Our study also mea-
sures the effectiveness of DNSBLs albeit in real time—we examine
whether a host IP is listed in a set of DNSBLs at the time the host
spammed our domain. While we also find that about 80% of the re-
ceived spam was listed in at least one of eight blacklists, hosts that
employ spamming techniques such as BGP spectrum agility tend
to be listed in far fewer blacklists. We also find that even the most
aggressive blacklist has a false negative rate of about 50%.
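The false negative rate mentioned above can be made concrete with a minimal Python sketch. It assumes each spam record carries the set of blacklists that listed the sending relay at the time the message arrived; this record layout and the blacklist names `bl-a` and `bl-b` are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def false_negative_rates(spam_records, blacklists):
    """Fraction of spam messages NOT listed by each blacklist at receipt time.

    spam_records: iterable of sets; each set holds the names of the
    blacklists that listed the sending relay when the message arrived
    (hypothetical layout, for illustration only).
    """
    records = list(spam_records)
    if not records:
        raise ValueError("no spam records")
    hits = Counter()
    for listed_in in records:
        for name in listed_in:
            hits[name] += 1
    total = len(records)
    return {name: 1.0 - hits[name] / total for name in blacklists}


def listed_in_any(spam_records):
    """Fraction of spam messages listed in at least one blacklist."""
    records = list(spam_records)
    if not records:
        raise ValueError("no spam records")
    return sum(1 for listed_in in records if listed_in) / len(records)


# Toy data: four messages checked against two made-up blacklists.
records = [{"bl-a"}, {"bl-a", "bl-b"}, set(), {"bl-b"}]
rates = false_negative_rates(records, ["bl-a", "bl-b"])
coverage = listed_in_any(records)
```

Because every message in a sinkhole trace is spam by construction, a lookup that returns "not listed" counts directly as a false negative for that blacklist.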
2.2.3 Unorthodox route announcements
Feamster et al. studied route advertisements for “bogon” IP address
space (i.e., private address space or unassigned addresses) [8].
However, since bogus or reserved address ranges are well known, transit
ISPs often filter them, resulting in little or no spam from such
ranges. Cursory studies have suggested that spammers advertise routes
to hijacked IP prefixes for short amounts of time to send spam
[6, 28, 30]. In Section 6, we quantify the extent to which the
sending of spam coincides with short-lived BGP route announce-
ments for IP prefixes containing the mail relays that send spam.
3. Data Collection
This section describes the datasets that we use in our analysis.
Our primary dataset consists of the actual spam email messages
collected at a large spam sinkhole. To study the specific charac-
teristics of certain subsets of spammers, we augment this dataset
with three other data sources. First, to compare the network-level
characteristics of spam received at our sinkhole with similar char-
acteristics of legitimate email traffic, we obtain a corpus of email
logs from a large email provider who automatically rejects email
likely to be spam (thus allowing us to distinguish legitimate mail
from spam). Second, we intercept the “command and control” traf-
fic from a Bobax botnet at a sinkhole to identify IP addresses that
were infected with the Bobax worm (and, hence, are likely mem-
bers of botnets that are used for the sole purpose of sending spam).
Third, we collect BGP routing data at the upstream border router
of the same network where we are receiving spam and monitor the
routing activity for the IP prefixes corresponding to the IP addresses
from which spam was sent.

[Figure 1 plot: Count (0 to 160,000) versus Day (0 to 500); series: Spam, Distinct IPs.]
Figure 1: The amount of spam received per day at our sinkhole from
August 2004 through December 2005.
3.1 Spam Email Traces
To obtain a sample of spam, we registered a domain with no le-
gitimate email addresses and established a DNS Mail Exchange
(MX) record for it. Hence, all mail received by this server is spam.
The “sinkhole” has been capturing spam since August 5, 2004. Figure 1
shows the amount of spam that this sinkhole received per day
through January 6, 2006 (the period of time over which we conduct
our analysis). Although the total amount of spam received on any
given day is rather erratic, the data indicates two unsettling trends.
First, the amount of spam that the sinkhole is receiving generally
appears to be increasing. Second, and perhaps more troubling, the
number of distinct IP addresses from which we see spam on any
given day also appears to be on the rise.
In addition to simply collecting spam traces, the sinkhole runs
Mail Avenger [16], a customizable Simple Mail Transfer Protocol
(SMTP) server that allows us to take specific actions upon receiv-
ing email from a mail relay (e.g., running traceroute to the mail
relay sending the mail, performing DNSBL lookups for the relay’s
IP address, performing a passive TCP fingerprint of the relay). We
have configured Mail Avenger to (1) accept all mail, regardless
of the username for which the mail was destined and (2) gather
network-level properties about the mail relay from which spam is
received. In particular, the mail server collects the following infor-
mation about the mail relay when the spam is received:
• the IP address of the relay that established the SMTP connection to
the sinkhole
• a traceroute to that IP address, to help us estimate the network
location of the mail relay
• a passive p0f TCP fingerprint, based on properties of the TCP stack,
to allow us to determine the operating system of the mail relay
• the result of DNS blacklist (DNSBL) lookups for that mail relay at
eight different DNSBLs.
Note that, unlike many features of the SMTP header, these features
are not easily forged.
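To illustrate the DNSBL lookup in the list above, the following Python sketch builds the DNS name that is conventionally queried to test an IPv4 address against a blacklist: the octets are reversed and the blacklist zone is appended. The zone `dnsbl.example.org` and the function name are placeholders; the sketch does not reproduce Mail Avenger's implementation and does not perform any network query.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query when checking an IPv4 address in a DNSBL.

    The conventional encoding reverses the four octets and appends the
    blacklist zone; an address record in the answer means "listed",
    while a name-not-found answer means "not listed".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.99 is a documentation address; the zone is a placeholder.
name = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

Repeating the lookup against each of the eight zones at the time of receipt would yield, per message, the set of blacklists that listed the relay at that moment.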
3.2 Legitimate Email Traces
One of the motivations for our study was to determine whether
the network-level characteristics of spam differ markedly from
those of legitimate email. To perform this comparison, we obtained
a corpus of mail logs from a large email provider that runs a Post-
fix mail server. Because this provider manages millions of mail-
boxes, it performs extensive spam filtering at its incoming SMTP
servers. Accordingly, the logs for this mail server record, for each
SMTP connection attempt, the time at which the connection at-
tempt was made, the IP address of the connecting host, whether the
mail was accepted or rejected, and, if the email was rejected, the
reason for rejection. Using these logs, we can estimate the network-
level properties of email that this domain deems to be legitimate.
We performed our analysis over approximately 700,000 pieces of
legitimate mail, as received at this provider’s mail server on June
13, 2006. Although the corpus of legitimate mail is from a different
domain than our sinkhole, both the spam sinkhole and the domain
for legitimate email constitute large, domain-wide data sources for
spam and legitimate mail, respectively, and are representative sam-
ples of spam and legitimate email that could be expected at any
Internet domain.
3.3 Botnet Command and Control Data
To identify a set of hosts that are sending email from botnets,
we used a trace of hosts infected by the W32/Bobax (“Bobax”)
worm from April 28-29, 2005. This trace was captured by hijack-
ing the authoritative DNS server for the domain running the com-
mand and control of the botnet and redirecting it to a machine at
a large campus network. This method was only possible because
(1) the Bobax drones contacted a centralized controller using a do-
main name, and (2) the researchers who obtained the trace were
able to obtain the trust of the network operators hosting the author-
itative DNS for that domain name. This technique directs control of
the botnet to the honeypot, which effectively disables it for spam-
ming for this period. On the upside, because all Bobax drones now
attempt to contact our command-and-control sinkhole rather than
the intended command-and-control host, we can collect a packet
trace to determine the members of the botnet.
To obtain a sample of spamming behavior from known botnets,
we correlate Bobax botnet membership from the 1.5-day trace of
Bobax drones with the IP addresses from which we receive spam in
the sinkhole trace. This technique, of course, is not perfect: over the
course of our spam trace, hosts may be patched. Although we can-
not precisely determine the extent to which the transience of bots
affects our analysis, previous work suggests that, even for highly
publicized worms, the rate at which vulnerable hosts are patched
is slow enough to expect that many of these infected hosts remain
unpatched [19]. We also acknowledge another shortcoming of our
approach: if hosts use dynamic addressing, different hosts (some of
which may be Bobax-infected and some of which may not be) may
use one of the IP addresses observed in the Bobax trace. However,
we believe that the resulting inaccuracies are small: We observe
a significantly higher percentage of Windows hosts in the subset
of spam messages sent by IP addresses in our Bobax trace than in
the complete spam dataset, which indirectly suggests that the hosts
with IP addresses from the Bobax trace were indeed part of a spam-
ming botnet when they spammed our sinkhole.
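The correlation and the Windows sanity check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis code used for the paper: the record fields `ip` and `os`, the function names, and the toy addresses are all hypothetical.

```python
def windows_fraction(messages):
    """Fraction of messages whose passive fingerprint reports Windows."""
    messages = list(messages)
    if not messages:
        return 0.0
    return sum(1 for m in messages if m["os"] == "Windows") / len(messages)


def split_by_bobax(messages, bobax_ips):
    """Return (messages sent from IPs seen in the Bobax trace, all others)."""
    bobax_ips = set(bobax_ips)
    from_bobax = [m for m in messages if m["ip"] in bobax_ips]
    others = [m for m in messages if m["ip"] not in bobax_ips]
    return from_bobax, others


# Toy spam trace and toy Bobax trace (documentation addresses only).
spam = [
    {"ip": "192.0.2.1", "os": "Windows"},
    {"ip": "192.0.2.2", "os": "Windows"},
    {"ip": "198.51.100.7", "os": "Linux"},
    {"ip": "198.51.100.8", "os": "Windows"},
]
bobax_trace = {"192.0.2.1", "192.0.2.2", "203.0.113.5"}

from_bobax, others = split_by_bobax(spam, bobax_trace)
bobax_windows = windows_fraction(from_bobax)
overall_windows = windows_fraction(spam)
```

A Windows fraction that is noticeably higher in the Bobax subset than in the full trace is the indirect evidence referred to above; it does not rule out address reuse under dynamic addressing.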
3.4 BGP Routing Measurements
In this paper, we study whether an IP address of the mail relay
from which we receive spam is reachable and how long it remains
reachable. We are particularly interested in cases where a route for
an IP address is reachable for only a short period of time, coinciding
with time at which spam was sent. To measure network-layer reach-
ability from the network where spam was received, we co-located
a “BGP monitor” in the same network as our spam sinkhole, similar to
that in our previous work [7]. The monitor receives BGP updates from
the border router, and our analysis includes a BGP update stream that
overlaps with our spam trace. Since the monitor has an internal BGP
session to the network’s border router, it
will see only those BGP updates that cause a change in the border
router’s choice of best route to a prefix. Despite not observing all
BGP updates, the monitor receives enough information to allow us
to study the properties of short-lived BGP route announcements:
the monitor will have no route to the prefix at all if the prefix is
unreachable.
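One way to use such an update stream is sketched below in Python: announcements and withdrawals for a single prefix are folded into intervals during which a route existed, and a spam arrival is flagged if it falls inside an interval that was short-lived. The event encoding (`"A"` for announce, `"W"` for withdraw), the timestamps in seconds, and the `max_lifetime` threshold are assumptions made for illustration; they are not the definitions used in Section 6.

```python
def reachable_intervals(events):
    """Fold time-ordered (kind, timestamp) events for one prefix into
    (start, end) intervals during which a route existed.

    kind is "A" (announcement) or "W" (withdrawal). An interval still
    open at the end of the stream is returned with end=None.
    """
    intervals = []
    start = None
    for kind, ts in events:
        if kind == "A" and start is None:
            start = ts  # route appears
        elif kind == "W" and start is not None:
            intervals.append((start, ts))  # route disappears
            start = None
        # Repeated announcements while reachable are ignored here.
    if start is not None:
        intervals.append((start, None))
    return intervals


def spam_on_short_lived_route(spam_ts, intervals, max_lifetime):
    """True if spam_ts falls in a closed interval lasting at most max_lifetime."""
    for start, end in intervals:
        if end is None:
            continue  # still reachable, so not short-lived
        if start <= spam_ts <= end and end - start <= max_lifetime:
            return True
    return False


# Toy stream: a 300-second route, then a route that stays up.
events = [("A", 100), ("W", 400), ("A", 5000), ("A", 5100)]
intervals = reachable_intervals(events)
flagged = spam_on_short_lived_route(250, intervals, max_lifetime=600)
```

The sketch relies on the property noted above: when the prefix is unreachable the monitor has no route at all, so a withdrawal cleanly ends an interval.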
4. Network-level Characteristics of Spammers
In this section, we study some first-order network-level char-
acteristics of spam sources. We survey the portions of IP address
space from which our sinkhole received spam and the ASes that
sent spam to the sinkhole. We also observe the persistence of these
characteristics over time. To determine whether these network-level
characteristics could be suitable for filtering spam, we compare the
network-level characteristics of spam to the same characteristics
for legitimate email, as received at a large domain that manages
approximately 40 million mailboxes.
We find that the distribution of spam across IP address space is
(1) nearly identical to the legitimate mail distributions (with a few
exceptions), and (2) quite persistent over time. Still, the distribu-
tion of spam senders across IP address space is far from uniform,
and spam arrival by IP address range is much more pronounced,
persistent, and concentrated than similar characteristics by IP ad-
dress. Additionally, we find that a large fraction of spam is received
from just a handful of ASes: nearly 12% of all received spam origi-
nates from mail relays in just two ASes (from Korea and China, re-
spectively), and the top 20 ASes are responsible for sending nearly
37% of all spam. This distribution (as well as the main perpetrators)
is also persistent over time. This heavily skewed distribution sug-
gests that spam filtering efforts might better focus on identifying
high-volume, persistent groups of spammers (e.g., by AS number),
rather than on blacklisting individual IP addresses, many of which
are transient.
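The kind of concentration statistic quoted above (the share of spam attributable to the top ASes) can be computed as in the following Python sketch. It assumes each message has already been mapped to an origin AS number, a step not shown here; the function name and the AS numbers, drawn from the range reserved for documentation, are illustrative only.

```python
from collections import Counter


def top_k_share(as_numbers, k):
    """Share of messages originated by the k ASes that sent the most."""
    counts = Counter(as_numbers)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


# Toy data: one origin AS per message.
origin_as = [64500] * 5 + [64501] * 3 + [64502] * 1 + [64503] * 1
share_top2 = top_k_share(origin_as, 2)
```

Tracking this share over successive time windows would show whether the same ASes remain the main contributors.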
4.1 Distribution Across Networks
To determine the address space from which spam was arriving
(“prevalence”) and whether the distribution across IP addresses
changes over time (“persistence”), we tabulated the spam in our
trace by IP address space. We find that spam arrivals across IP space
are far from uniform.
Finding 4.1 (Distribution across IP address space) The major-
ity of spam is sent from a relatively small fraction of IP address
space.
Figure 2 shows the number of spam email messages received over the
course of the entire trace, as a function of IP address space. Several
ranges of IP address space originate a large amount of email
traffic (both spam and legitimate), including space allocated to ca-
ble modem providers (e.g., 24.*) and the address space allocated
to the Asia Pacific Network Information Center (APNIC) regional
Internet registry (e.g., 61.*). Although most IP address ranges that
originate a significant amount of spam also originate a lot of legit-
imate mail traffic, a few IP address ranges have significantly more
spam than legitimate mail (e.g., 80.*–90.*), and vice versa (e.g.,
60.*–70.*). This characteristic suggests that it may be possible to
use IP address ranges to distinguish spam from legitimate email.
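A minimal sketch of the kind of tabulation behind a CDF over IP address space, assuming the input is simply a list of sender IPv4 addresses, one per received message. The helper names `slash24` and `cdf_over_ip_space` and the sample addresses are hypothetical; this is not the authors' analysis code.

```python
import ipaddress
from collections import Counter


def slash24(ip):
    """Map a dotted-quad IPv4 address to the integer value of its /24 network."""
    return (int(ipaddress.IPv4Address(ip)) >> 8) << 8


def cdf_over_ip_space(sender_ips):
    """Return (prefix, cumulative fraction) points, ordered by position in IP space."""
    counts = Counter(slash24(ip) for ip in sender_ips)
    total = sum(counts.values())
    running = 0
    points = []
    for prefix in sorted(counts):
        running += counts[prefix]
        points.append((str(ipaddress.IPv4Address(prefix)), running / total))
    return points


# Hypothetical senders, one entry per received message.
senders = ["24.1.2.3", "24.1.2.200", "61.5.5.5", "203.0.113.7"]
cdf_points = cdf_over_ip_space(senders)
for prefix, fraction in cdf_points:
    print(prefix, fraction)
```

Running the same tabulation separately on spam and on legitimate mail would give two curves; ranges where one curve rises steeply and the other stays flat are the ranges that could help separate the two classes.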
Figure 2: Fraction of spam email messages and comparison with
legitimate email received (as a function of IP address space); also,
fraction of client IP addresses that sent spam, binned by /24.
[Plot: CDF versus IP space, 0.0.0.0 to 240.0.0.0; series: legitimate
email, spam, spamming IPs.]
Figure 3: The number of distinct times that each client IP sent mail
to our sinkhole (regardless of the number of emails sent in each
batch). [Plot: fraction of clients versus number of appearances, log
scale from 1 to 100,000.]
We repeated the analysis of the network-level characteristics of
spam per day across months, per month across years, and so forth.
We also compared the distribution of spam collected at our sinkhole
to the distribution of rejected SMTP connections at the domain
where we performed our analysis of legitimate email and found
that the distribution of these connections across IP address space
is similar to that shown in Figure 2. All of these distributions have
remained roughly constant over time (i.e., the results look similar
to those shown in Figure 2). In contrast, individual IP addresses
are far more transient. Figure 3 shows that even though a few IP
addresses sent more than 10,000 emails, about 85% of client IP
addresses sent fewer than 10 emails to the sinkhole, indicating that
targeting an individual IP address might not help mitigate spam
without sharing information across domains. This finding has an
important implication for spam filter design: Though the individual IP
addresses from which spam is received change from day to day, the
fact that spam continually comes from the same IP address space
suggests that incorporating these more persistent features may be
more effective, particularly in portions of the IP address space that
send either mostly spam or mostly legitimate email.
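The transience of individual IP addresses can be illustrated with a small sketch that counts how often each client IP appears and reports the fraction seen fewer than a given number of times. As a simplifying assumption, the log here has one entry per message rather than per batch; the function name `fraction_below` and the addresses (from a documentation range) are hypothetical.

```python
from collections import Counter


def fraction_below(appearances, threshold):
    """Return the fraction of client IPs seen fewer than `threshold` times."""
    if not appearances:
        return 0.0
    few = sum(1 for n in appearances.values() if n < threshold)
    return few / len(appearances)


# Hypothetical log: one entry per message received at the sinkhole.
log = (
    ["198.51.100.1"] * 12
    + ["198.51.100.2"] * 3
    + ["198.51.100.3"]
    + ["198.51.100.4"] * 2
)
appearances = Counter(log)
frac = fraction_below(appearances, 10)
print(f"{frac:.0%} of client IPs appear fewer than 10 times")
```

When this fraction is high, a per-IP blacklist built at a single domain rarely sees a sender often enough to act on it, which is why aggregating by address range or sharing observations across domains may be more useful.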
In many cases, IP address ranges are not adequate for distin-
guishing spam from legitimate email. To determine whether other
network-level properties, such as the AS from which the email was
sent, could serve as better classifiers, we examined the distribution
of spam across ASes and compared this feature to the distribution
of legitimate email across ASes.
Finding 4.2 (Distribution across ASes) More than 10% of spam
received at our sinkhole originated from mail relays in two ASes,
Frequently Asked Questions (15)
Q1. What would make BGP spectrum agility attacks more difficult to mount?

A routing infrastructure that instead provided protection against route hijacking (specifically, unauthorized announcement of IP address blocks) would make BGP spectrum agility attacks more difficult to mount. 

Originally intended for user convenience (e.g., to let users send mail from a particular relay while they are traveling or otherwise in a different network), open relays have been exploited by spammers due to the anonymity and amplification offered by the extra level of indirection. 

To test this hypothesis, the authors used the results from real-time DNSBL lookups performed by Mail Avenger to 8 different blacklists at the time the mail was received.

A small portion of spam is sent by sophisticated spammers, who briefly advertise IP prefixes, establish a connection to the victim’s mail relay, and withdraw the route to that IP address space after spam is sent. 

Because a very large fraction of spam comes from Windows hosts, their hypothesis is that many of these machines are infected hosts that are bots. 

The persistence of Bobax-infected hosts appears to be mildly bimodal: although roughly 75% of Bobax drones persist for less than two minutes, the remainder persist for a day or longer, about 50 persist for about six months, and 10 persist for the entire length of the trace.

Since one of their objectives is to study the effectiveness of IP-based filtering (rather than, say, count the total number of hosts), the authors are interested more in measuring the persistence of IP addresses, not hosts. 

2. Network-level properties may be observable in the middle of the network, or closer to the source of the spam, which may allow spam to be quarantined or disposed of before it ever reaches a destination mail server. 

As an added benefit, route announcements for shorter IP prefixes (i.e., larger blocks of IP addresses) are less likely to be blocked by ISPs’ route filters than route announcements or hijacks for longer prefixes. 

Only two ASes, AS 4788 (Telekom Malaysia) and AS 4678 (Canon Network Communications, in Japan), appear among both the top-10 most persistent and most voluminous spammers using short-lived BGP routing announcements.

More striking is that, while only about 4% of the hosts from which the authors receive spam are running operating systems other than Windows, this small set of hosts appears to be responsible for at least 8% of the spam the authors receive.

Although many aspects of mail headers can be forged, the authors base their analysis strictly on properties of the sender that are difficult to forge (e.g., IP addresses that made connections to their mail servers, passive TCP fingerprints, corresponding route announcements, etc.). 

Given the sophistication required to send spam under the protection of short-lived routing announcements (especially compared with the relative simplicity of purchasing access to a botnet), the authors doubted that it was particularly prevalent. 

The authors are at a loss to explain certain aspects of this behavior, such as why some of the machines appear to have IP addresses from allocated space, when it would be simpler to “step around” the allocated prefix blocks, but, needless to say, the spammers using this technique appear to be very sophisticated.