Preventing Shilling Attacks in Online Recommender Systems
Paul-Alexandru Chirita
L3S Research Center
University of Hannover
Hannover, Germany
chirita@l3s.de
Wolfgang Nejdl
L3S Research Center
University of Hannover
Hannover, Germany
nejdl@l3s.de
Cristian Zamfir
L3S Research Center
University of Hannover
Hannover, Germany
zamfir@l3s.de
ABSTRACT
Collaborative filtering techniques have been successfully employed in recommender systems in order to help users deal with information overload by making high quality personalized recommendations. However, such systems have been shown to be vulnerable to attacks in which malicious users with carefully chosen profiles are inserted into the system in order to push the predictions of some targeted items. In this paper we propose several metrics for analyzing rating patterns of malicious users and evaluate their potential for detecting such shilling attacks. Building upon these results, we propose and evaluate an algorithm for protecting recommender systems against shilling attacks. The algorithm can be employed for monitoring user ratings and removing shilling attacker profiles from the process of computing recommendations, thus maintaining the high quality of the recommendations.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.3.4 [Information Storage and Retrieval]: Systems and Software; H.3.5 [Information Storage and Retrieval]: Online Information Services

General Terms
Algorithms, Experimentation, Performance

Keywords
Web applications, Recommender systems, Collaborative filtering, Shilling attacks
1. INTRODUCTION
Recommender systems based on collaborative filtering play an increasingly important role in filtering information in overloaded information systems. They not only help users find
relevant items, but also benefit the companies producing those items by increasing both selling rates and cross-sales.
Currently, there are quite a few commercial recommender systems used in e-commerce [5] (e.g., http://www.amazon.com), movie (e.g., http://www.movielens.org, http://www.tivo.com) and music recommendations [14] (e.g., http://www.audioscrobbler.com), as well as some research oriented ones (see, for example, PocketLens [6] or ClickStream CF [3]), and even governmental ones (such as NASA's service for recommending related technical reports [7]). In such a collaborative filtering based recommender system, users build profiles by rating certain items and obtain personalized recommendations for other, unknown items, based on the correlation between their ratings and those of other users.
The most popular types of algorithms for collaborative filtering (CF) are user-based and item-based:
1. User-based algorithms build for each user a neighborhood of users with similar opinions (i.e., ratings) in the system. Ratings from these users are then employed to generate recommendations for the target user.
2. Item-based algorithms compute a set of similar items for each item and use these similarities to compute recommendations.
Unfortunately, since good ratings promise a good selling rate, these systems are prone to manipulation by producers or malicious users. Examples of such manipulation have been outlined in [4] and include attacks on popular systems like Amazon and eBay. Recent research has shown that the most popular algorithms employed in current CF applications can be rather easily manipulated through biased profiles [4]. More specifically, this can be achieved by introducing fake user profiles that highly rate a set of target items and then rate other items in such a way that they become similar to many profiles of regular users. The desired result is known as a shilling attack and consists of either increasing (push attack) or lowering (nuke attack) the ratings of some target items.
Attacks on recommender systems can affect the quality of the predictions for many users, resulting in decreasing overall user satisfaction with the system. Such threats may cost users time and money and pose a serious challenge to recommender system administrators, who have to manually discover the shilling attackers. This vulnerability of recommender systems is even more severe if we consider that it actually extends to any personalized information system in which an
attacker can introduce fake profiles in order to increase the general interest for a set of target resources.
Based on the observation that shilling attackers use synthetic profiles (this is in fact reasonable, since no large-scale success could be achieved by manually inspecting items and rating them as a regular user would), in this paper we investigate the use of statistical metrics to reveal the rating patterns of shilling attackers. We experimentally evaluate these metrics for existing zero-knowledge shilling attacks and propose an algorithm that makes use of them to detect and isolate shilling attackers. To the best of our knowledge, this is the first algorithm that effectively detects the most general attacks on recommender systems [4].
The rest of the paper is structured as follows: In Section 2 we introduce the most popular CF algorithms and outline existing work on developing and guarding against attacks on recommender systems. In Section 3 we define several statistical metrics which could be utilized to identify the rating patterns of shilling attackers, and then we empirically analyze each of them in Section 4. In Section 5 we propose an algorithm which detects shilling attackers by exploiting these metrics. Finally, we show how it could be integrated into a web-based recommender system in Section 6, and conclude with a summary and future work in Section 7.
2. BACKGROUND
2.1 Common CF Algorithms
User-based collaborative filtering. The most popular
collaborative filtering algorithm is the kNN-based algorithm.
Data is represented as a user × item matrix, with an entry
(u,i) representing either the rating user u gave to item i, if
she rated it, or null otherwise. Similarity between users is
then computed using the Pearson correlation [11]:
W_{ij} = \frac{\sum_{k \in I} (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}{\sqrt{\sum_{k \in I} (R_{ik} - \bar{R}_i)^2 \sum_{k \in I} (R_{jk} - \bar{R}_j)^2}}    (1)

where I is the set of items users i and j both rated, R_{ik} is the rating user i gave to item k, and \bar{R}_i is the average rating of user i. Finally, predictions for user i and item a are computed using the k-nearest neighbors formula below:

P_{ia} = \bar{R}_i + \frac{\sum_{j=1}^{k} W_{ij} (R_{ja} - \bar{R}_j)}{\sum_{j=1}^{k} W_{ij}}    (2)
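To make formulas (1) and (2) concrete, the following is a minimal sketch of user-based prediction. It is not the authors' implementation; the toy ratings dictionary and all names are our own illustrative choices, and the denominator uses absolute weights as a practical guard against cancellation.

import math

# ratings: user -> {item: rating}; tiny illustrative data, not taken from the paper
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5},
    "u3": {"i1": 1, "i2": 5, "i3": 2},
}

def mean_rating(u):
    r = ratings[u]
    return sum(r.values()) / len(r)

def pearson(u, v):
    # Equation (1): Pearson correlation over the items both users rated
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    ru, rv = mean_rating(u), mean_rating(v)
    num = sum((ratings[u][k] - ru) * (ratings[v][k] - rv) for k in common)
    du = sum((ratings[u][k] - ru) ** 2 for k in common)
    dv = sum((ratings[v][k] - rv) ** 2 for k in common)
    return num / math.sqrt(du * dv) if du > 0 and dv > 0 else 0.0

def predict_user_based(u, item, k=20):
    # Equation (2): u's mean rating plus the similarity-weighted deviations of the k nearest neighbors
    neighbors = sorted(((pearson(u, v), v) for v in ratings if v != u and item in ratings[v]),
                       reverse=True)[:k]
    denom = sum(abs(w) for w, _ in neighbors)
    if denom == 0:
        return mean_rating(u)
    return mean_rating(u) + sum(w * (ratings[v][item] - mean_rating(v)) for w, v in neighbors) / denom

print(predict_user_based("u1", "i1"))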
Item-based collaborative filtering. Another popular CF algorithm is based on item-item similarity [13]. Here, each item is represented as a vector in the |users|-dimensional space, and the similarity between items i and j can be computed using the cosine-based similarity:

Sim(i, j) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\| \cdot \|\vec{j}\|}    (3)

Then, the prediction for user u and item i is computed using a weighted average of the user's ratings R_{ua} over all similar items a, weighted by the similarity score:

P_{ui} = \frac{\sum_{\text{all similar items } a} R_{ua} \, Sim(i, a)}{\sum_{\text{all similar items } a} |Sim(i, a)|}    (4)
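As a companion to the user-based sketch above, here is a similar illustration of the item-based variant described by formulas (3) and (4); again the data layout and function names are our own assumptions, for illustration only.

import math

# item_ratings: item -> {user: rating}; the same toy data, indexed by item
item_ratings = {
    "i1": {"u1": 5, "u2": 4, "u3": 1},
    "i2": {"u1": 3, "u2": 2, "u3": 5},
    "i3": {"u1": 4, "u2": 5, "u3": 2},
}

def cosine(i, j):
    # Equation (3): items as vectors in the |users|-dimensional space
    dot = sum(item_ratings[i][u] * item_ratings[j][u]
              for u in set(item_ratings[i]) & set(item_ratings[j]))
    ni = math.sqrt(sum(r * r for r in item_ratings[i].values()))
    nj = math.sqrt(sum(r * r for r in item_ratings[j].values()))
    return dot / (ni * nj) if ni > 0 and nj > 0 else 0.0

def predict_item_based(user, item):
    # Equation (4): similarity-weighted average of the user's ratings on similar items
    sims = [(cosine(item, a), a) for a in item_ratings if a != item and user in item_ratings[a]]
    denom = sum(abs(s) for s, _ in sims)
    return sum(s * item_ratings[a][user] for s, a in sims) / denom if denom > 0 else 0.0

print(predict_item_based("u1", "i1"))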
2.2 Identifying and Detecting Shilling Attacks
While there is a lot of work in the field of developing collaborative filtering algorithms, only recently have some papers concentrated on developing shilling attack models [9, 12] and on benchmarking the robustness of recommender systems against shilling attacks [4, 8].
Lam and Riedl [4] introduce the Random Bot and the Average Bot types of shilling attacks and evaluate their effectiveness in promoting the target items by computing the prediction shift and the expected Top-N occupancy for these items in both user-based and item-based collaborative filtering environments.
A Random Bot attacker rates all the items in the system with random ratings with a mean of 3.6 out of 5 and a deviation of 1.1. The intuition behind this is that making random ratings within a certain average interval will allow the attacker to have a high influence on the predictions made for other users. Depending on the objective of the attack, the items in the target set are rated with the minimum rating (for a nuke attack) or the maximum rating (for a push attack). An Average Bot attacker is more effective, but requires knowledge of the average rating for each item in the system. Each Average Bot attacker rates the items outside the target set randomly, following a normal distribution with a mean equal to the average rating for that item, thus becoming more similar to the real users than the Random Bot.
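Based on this description, the following is a rough sketch of how such attack profiles could be synthesized. The function names, the use of Python's random module, and the deviation used for the Average Bot (the paper does not state it) are our own assumptions.

import random

MIN_RATING, MAX_RATING = 1, 5

def clip_round(x):
    # keep generated ratings on the 1-5 scale
    return int(round(min(MAX_RATING, max(MIN_RATING, x))))

def random_bot_profile(all_items, target_items, push=True):
    # Random Bot: filler ratings around mean 3.6 with deviation 1.1; target items pushed or nuked
    profile = {}
    for item in all_items:
        if item in target_items:
            profile[item] = MAX_RATING if push else MIN_RATING
        else:
            profile[item] = clip_round(random.gauss(3.6, 1.1))
    return profile

def average_bot_profile(all_items, target_items, item_avg, push=True, sigma=1.1):
    # Average Bot: filler ratings centered on each item's average rating (sigma is an assumption)
    profile = {}
    for item in all_items:
        if item in target_items:
            profile[item] = MAX_RATING if push else MIN_RATING
        else:
            profile[item] = clip_round(random.gauss(item_avg[item], sigma))
    return profile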
In [12] several other attack models are developed, under the assumption that the attacker has some knowledge about the ratings of the other users. We think that such knowledge is hard to obtain, if not impossible, in a real world system. Another disadvantage of this approach is that the ratings introduced by an attacker are algorithm dependent. Finally, the detection of the attackers is not addressed in that paper.
In fact, the only work that partially tackles this challenge is [15]. There, a spreading similarity algorithm is developed in order to detect groups of very similar shilling attackers. While this is indeed a first step, it only applies to a simplified attack scenario, whereas our algorithm applies to more general and powerful attacks.
We think that zero-knowledge attacks such as the Random Bot are particularly interesting, since for the other attacks, recommender system administrators could increase the privacy of user profiles using cryptographic means [6, 10, 2], thus forcing attackers to fall back to the zero-knowledge ones. In general, the more insight a recommender system offers about its ratings, the more susceptible to attack it is, allowing powerful low-cost (in terms of number of fake profiles) attacks to be mounted on the system.
3. METRICS FOR DETECTING RATING PATTERNS OF SHILLING ATTACKERS
3.1 Introduction
We have argued that shilling attacks could be very noxious to CF systems. Now, could we define an algorithm-independent approach which protects against these attacks by mining the rating patterns from the user database? We consider the answer to be positive. This section lays the foundation of such an approach, presenting first the statistical metrics that could be utilized to analyze user ratings, and then a naïve algorithm exploiting them. We will then complete our attack detection scheme in Section 5.

3.2 Metrics
In [1], a number of algorithm-independent qualitative factors are used in analyzing the influence of a user on a recommender system. While the goal of [1] was not related to attacks at all, we think some of these factors could be useful in analyzing patterns of the fake profiles introduced by the different types of shilling attacks. More specifically, we found the following metrics suitable to address our problem of detecting shilling attacks:
1. Number of Prediction-Differences (NPD)
NPD is defined for each user as the number of net prediction changes in the system after her removal.

2. Standard Deviation in User's Ratings
This metric represents the degree to which the ratings given by a user differ from her average rating.

3. Degree of Agreement with Other Users
The degree of agreement is in fact the average deviation of a user's ratings from the average rating of each item: \frac{1}{k} \sum_{a=1}^{k} |R_{ia} - \bar{R}_a|, where R_{ia} is the rating user i gave to item a and \bar{R}_a is the average rating of item a.

4. Degree of Similarity with Top Neighbors
As stated by its name, this metric describes the average similarity weight with the Top-K neighbors of a user. It uses the following formula: \frac{\sum_{i=1}^{k} W_{ij}}{k}
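To make metrics 3 and 4 concrete, here is a small illustrative sketch of how they could be computed per user; it assumes the ratings dictionary and the pearson function from the user-based sketch in Section 2.1 and is not the authors' code.

def degree_of_agreement(u):
    # Metric 3: average absolute deviation of u's ratings from each item's mean rating
    item_vals = {}
    for user in ratings:
        for item, val in ratings[user].items():
            item_vals.setdefault(item, []).append(val)
    item_avg = {i: sum(v) / len(v) for i, v in item_vals.items()}
    devs = [abs(val - item_avg[item]) for item, val in ratings[u].items()]
    return sum(devs) / len(devs)

def avg_similarity_top_neighbors(u, k=25):
    # Metric 4: mean Pearson similarity with the user's Top-K neighbors
    sims = sorted((pearson(u, v) for v in ratings if v != u), reverse=True)[:k]
    return sum(sims) / len(sims) if sims else 0.0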
As we will see in Section 4, these metrics provide useful insight into detecting shilling attackers, but are not sufficient for identifying attackers, as they output quite a few false positives. Therefore, we also defined a new measure, Rating Deviation from Mean Agreement (RDMA). Intuitively, this can be seen as a measure of the deviation of agreement with other users on a set of target items, combined with the inverse rating frequency for these items. RDMA can be computed in the following way:
RDMA_j = \frac{\sum_{i=0}^{N_j} \frac{|r_{i,j} - Avg_i|}{NR_i}}{N_j}    (5)
where N_j is the number of items user j rated, r_{i,j} is the rating given by user j to item i, Avg_i is the average rating of item i, and NR_i is the overall number of ratings in the system given to item i. Alternatively, one could also compute the number of ratings and the average rating using only a subset of users, thus giving a local view to our measure. We will discuss these variants in Section 5.
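A compact sketch of equation (5), again over the illustrative ratings dictionary used above; the optional subset argument mirrors the local-view variant just mentioned and is our own parameterization.

def rdma(j, subset=None):
    # Equation (5): per-item deviation from the mean rating, weighted by the inverse rating frequency
    pool = subset if subset is not None else list(ratings)
    item_vals = {}
    for user in pool:
        for item, val in ratings[user].items():
            item_vals.setdefault(item, []).append(val)
    total = 0.0
    for item, r in ratings[j].items():
        vals = item_vals.get(item)
        if not vals:
            continue
        avg = sum(vals) / len(vals)
        total += abs(r - avg) / len(vals)   # |r_{i,j} - Avg_i| / NR_i
    return total / len(ratings[j])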
Since shilling attacks usually try to push items with low ratings, the users mounting such an attack will also have a high RDMA: for the target items, the numerator of each term (the difference from the average rating) will be high, whereas the denominator (the number of ratings given to the item) will be low; thus the overall term will be high, and the attackers can simply be removed from the computation of recommendations. Note, however, that users with very special tastes might also have a high RDMA value. This is one of the reasons that determined us to seek a more complex algorithm for shilling attack detection; the outcome will be presented in Section 5.
3.3 Basic Algorithm for Detecting Shilling Attackers
Considering the fact that attackers should have a high
influence in the system in order to effectively promote the
target items, we want the metrics in Section 3.2 to reveal
distinctive features in the rating patterns. Attackers should
therefore have very high values for NPD, Average Similarity,
Degree of agreement with other users, and RDMA, as well
as a very low value for Standard Deviation in User Ratings.
The following algorithm detects shilling attackers based on
these expectations:
Algorithm 1. Basic algorithm for detecting shilling attackers.
01. Let MetricsLow = {Standard Deviation in Ratings} and
    MetricsHigh = {NPD, Degree of agreement, Average Similarity, RDMA}
02. for each m in MetricsHigh and MetricsLow
03.     for each user u
04.         compute m(u)
05. for each user u
06.     if u has high values in MetricsHigh and very low values in MetricsLow
        then u is a shilling attacker
In lines 2-4, the algorithm computes for each user the values for all statistical metrics, and then in lines 5-6 decides, based on her assessed probability of being an attacker, whether her profile will be discarded from the computation of recommendations or not.
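A rough sketch of how Algorithm 1 could be realized with simple percentile thresholds follows; the metric function names, the percentile cutoffs, and the notion of "high" and "very low" are our own assumptions, since the paper leaves them unspecified at this point.

def percentile_threshold(values, pct):
    # simple percentile cutoff used to decide what counts as "high" or "very low"
    s = sorted(values)
    return s[int(pct * (len(s) - 1))]

def detect_shilling_basic(users, metrics_high, metrics_low, high_pct=0.95, low_pct=0.05):
    # metrics_high / metrics_low: dicts mapping a metric name to a function(user) -> float
    cache = {name: {u: f(u) for u in users}
             for name, f in {**metrics_high, **metrics_low}.items()}
    flagged = set()
    for u in users:
        high_ok = all(cache[n][u] >= percentile_threshold(cache[n].values(), high_pct)
                      for n in metrics_high)
        low_ok = all(cache[n][u] <= percentile_threshold(cache[n].values(), low_pct)
                     for n in metrics_low)
        if high_ok and low_ok:
            flagged.add(u)   # lines 05-06: treat u as a suspected shilling attacker
    return flagged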
4. EXPERIMENTS
4.1 Attack Scenario
An attack consists of a group of profiles that are introduced into the system in order to push the ratings of a set of target items. The target items are usually unpopular items (low average rating) that are not rated by many users; popular items are already recommended by the system to some extent, and are therefore rated by a larger number of users, so making them even more popular would usually imply a too costly effort from the attackers (we further discuss this choice in Section 5.2). Our experiments are conducted using the Random Bot shilling attack. In [4], the number of attacker profiles reaches almost 1% of the total number of users. The cost of an attack can be estimated as a function of the number of profiles that have to be introduced into the system. Usually, introducing more than 1% fake profiles can be considered infeasible, but we will go up to 3% attacker profiles. For implementing the experiments we used the open source MultiLens collaborative filtering platform (http://sourceforge.net/projects/multilens/).
We have evaluated the patterns from Section 3 for databases without any attackers and with 3% Random Bot attackers. The target items set is composed of three randomly selected items that have a low average rating, as well as a small number of ratings.
In our experiments we have used the MovieLens database containing 100,000 movie ratings for 1682 items from 943 users. For the attack scenario we introduced 30 additional users, which were Random Bot attackers with the same target items set. All users have rated at least 20 movies, with ratings between 1 and 5. The Random Bot attackers rated the items in the system as explained in Section 2.

[Figure 1: NPD without attackers (x-axis: users, y-axis: NPD)]

[Figure 2: NPD for a database with attackers (x-axis: users, y-axis: NPD; separate curves for normal users and attackers)]
Our experiments show that all the studied patterns are relevant in exposing the behavior of the attackers. The values of these patterns are either high, or almost the same (with a small deviation), for most of the attackers, but a few regular users could also be flagged as attackers if only one of these patterns were used for detection. The rest of this section discusses each of the patterns in detail.
4.2 Using Rating Patterns to Detect Shilling Attacks
NPD. As discussed in [1], NPD shows a power-law distribution: most of the users have a very low NPD, while only a few have a very high NPD. This applies to both the database with attackers and the one without. Figure 1 presents the NPD values for the database without attackers. After the attackers have been introduced (Figure 2), the NPD values slightly decrease for the regular users, leaving the top NPD values to the malicious users. This is because the attackers are very similar to many users, and removing them from the neighborhood of a user would result in prediction changes for all these users. However, we notice that they also overlap with the top 0.3% of the normal users.
Standard Deviation in User's Ratings. Random Bot attackers give random ratings within a relatively small interval, centered around 3.6. This distribution of ratings makes them stand out in the database, because they are the only "users" having a Standard Deviation in Ratings close to 0. Most users have a greater entropy in their ratings and sometimes give extreme ratings such as the minimum rating. However, a small percentage of users also has a small Standard Deviation in Ratings, so attackers could disguise their behavior by increasing the entropy of their ratings and thus escape detection using this pattern. Still, avoiding detection by increasing entropy will also decrease the power of the attack, as it will decrease the attacker's similarity with other users in the system. Therefore, we conclude that analyzing this pattern for shilling attacks can be useful, as it will force malicious users to disguise their attacks, thus reducing the overall impact of the attack on the entire system.

[Figure 3: Degree of Agreement for a database without attackers (x-axis: users, y-axis: standard agreement)]

[Figure 4: Degree of agreement for a database with attackers; notice that the variation of the degree of agreement for the attackers is very small (attackers have user IDs between 944 and 973) (x-axis: users, y-axis: standard agreement)]
Degree of Agreement With Similar Users. We have computed this metric using the top-25 similar users from the neighborhood formation phase of user-based collaborative filtering. In spite of the fact that attackers make random ratings, an interesting pattern we discovered was that they had almost the same value for this metric (Figures 3 and 4). Even though other regular users could also have very similar values, one could use the results from analyzing this pattern in order to reduce the false positives output by the attacker detection algorithm.
Average Similarity with Top Neighbors. The average similarity was also computed over each user's top-25 neighbors. We discovered that it resembles NPD: it has a power-law distribution, and the values of the attackers overlap with those of a small set of normal users (Figures 5 and 6). While NPD exhibits scalability issues, computing the average similarity for each user is much faster, and thus preferable. Moreover, selecting the users whose value is greater than 1/2 of the maximum average similarity in the system would include all the attackers in the output. Therefore, we chose to use this metric along with RDMA in our improved algorithm from Section 5.

[Figure 5: Average Similarity for a database without attackers (x-axis: users, y-axis: Average Similarity; reference line at Max_Average_Similarity/2)]

[Figure 6: Average Similarity for a database with attackers (x-axis: users, y-axis: Average Similarity; separate curves for normal users and attackers, reference line at Max_Average_Similarity/2)]
RDMA. For a database without attackers, very few users had a high normalized RDMA, which was promising, since we expected the attackers (once introduced) to have a very high RDMA. However, RDMA was high for the attackers only for a small attack size. Once we increased the attack size to 3%, several normal users had a higher RDMA than the attackers. This is partially because an attack of such a large size is enough to radically increase the average rating for the target items, so that regular users who rated these items with the minimum rating get an increased RDMA in this case. Generally, when the attackers give ratings centered around 3.6 to the items outside the target set, users that only expressed extreme like (the maximum rating) or utter dislike (the minimum rating) have an increased RDMA for a large-scale attack.
[Figure 7: RDMA without attackers (x-axis: users, y-axis: RDMA)]

[Figure 8: RDMA with attackers (x-axis: users, y-axis: RDMA; separate curves for normal users and attackers)]
5. ENHANCED ALGORITHM FOR DETECTING SHILLING ATTACKERS
5.1 Description
When computing RDMA, we can use only a subset of the total users for computing the Average Rating and the Number of Predictions for each of the items. For the database without any attackers, this results in very few users (2, in our experiments) having a high RDMA, so that removing them from the database will not affect the quality of the recommendations.
In the following, we describe an improved two-step algorithm which exploits the above mentioned idea: we first compute the average similarity with the top neighbors for all users, and then select for computing RDMA only those users that have an average similarity smaller than 1/2 of the maximum average similarity in the system. We then associate with each value of RDMA a function that evaluates the probability that the respective user is a shilling attacker. The algorithm is depicted below.
Algorithm 2. Enhanced Algorithm for Detecting Shilling Attackers.
01. for each user
02.     Compute Average Similarity using the Pearson Correlation
03. max = the maximum Average Similarity in the system
04. for each item
05.     Compute the average rating and the number of ratings
        using user u's profile if Average Similarity(u) < max/2
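The pseudocode above is cut off by the page break in this excerpt. Based on the description in Section 5.1, and on the remark in the Q&A excerpt below that lines 8-12 assign a zero shilling probability to users with below-average RDMA, a rough sketch of the remaining filtering step could look as follows; the probability mapping f and all names are assumptions, not the authors' exact formulation.

def enhanced_detection(users, avg_similarity, rdma_on_subset):
    # Step 1 (lines 01-03): average similarity with top neighbors for every user
    sims = {u: avg_similarity(u) for u in users}
    max_sim = max(sims.values())
    # Step 2 (lines 04-05): RDMA computed only from users below half of the maximum average similarity
    subset = [u for u in users if sims[u] < max_sim / 2]
    rdma_vals = {u: rdma_on_subset(u, subset) for u in users}
    avg_rdma = sum(rdma_vals.values()) / len(rdma_vals)
    top_rdma = max(rdma_vals.values())
    def f(x):
        # illustrative monotone mapping onto [0, 1]; the paper's exact f is not shown in this excerpt
        return (x - avg_rdma) / (top_rdma - avg_rdma) if top_rdma > avg_rdma else 0.0
    # per the excerpt below: probability 0 below the average RDMA, f(RDMA) otherwise
    return {u: (0.0 if rdma_vals[u] <= avg_rdma else f(rdma_vals[u])) for u in users}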

References
Proceedings ArticleDOI

Item-based collaborative filtering recommendation algorithms

TL;DR: This paper analyzes item-based collaborative filtering techniques and suggests that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
Proceedings ArticleDOI

GroupLens: an open architecture for collaborative filtering of netnews

TL;DR: GroupLens is a system for collaborative filtering of netnews, to help people find articles they will like in the huge stream of available articles, and protect their privacy by entering ratings under pseudonyms, without reducing the effectiveness of the score prediction.
Journal Article

Industry Report: Amazon.com Recommendations: Item-to-Item Collaborative Filtering.

TL;DR: This work compares three common approaches to solving the recommendation problem: traditional collaborative filtering, cluster models, and search-based methods, and their algorithm, which is called item-to-item collaborative filtering.
Journal ArticleDOI

Amazon.com recommendations: item-to-item collaborative filtering

TL;DR: Item-to-item collaborative filtering (ITF) as mentioned in this paper is a popular recommendation algorithm for e-commerce Web sites that scales independently of the number of customers and number of items in the product catalog.
Proceedings ArticleDOI

Social information filtering: algorithms for automating “word of mouth”

TL;DR: The implementation of a networked system called Ringo, which makes personalized recommendations for music albums and artists, and four different algorithms for making recommendations by using social information filtering were tested and compared.
Frequently Asked Questions (11)
Q1. What have the authors contributed in "Preventing shilling attacks in online recommender systems"?

In this paper the authors propose several metrics for analyzing rating patterns of malicious users and evaluate their potential for detecting such shilling attacks. Building upon these results, the authors propose and evaluate an algorithm for protecting recommender systems against shilling attacks. 

Since the probability for regular users is almost 0, and for attackers is almost 1, the effect of using this modification in a user-based collaborative filtering algorithm is to practically filter out malicious users from making recommendations, while reducing the influence of users with special preferences. 

The only overhead for the recommender system would be to take the shilling probability into account when computing the recommendation list. 

In an on-line web-based recommender system, the authors can provide the user with an additional button to be pressed for activating protection against shilling attackers.

Attackers in a collaborative filtering based recommender system should not necessarily be perceived as malicious by the other users, since the process of giving ratings to items is mostly a question of taste.

Lines 8-12 assign a 0 shilling probability for users having less than the average RDMA and use function f to compute the shilling probability for the other users.