What have the authors contributed in "Item-based collaborative filtering recommendation algorithms"?

In this paper the authors analyze different item-based recommendation generation algorithms. Finally, the authors experimentally evaluate their results and compare them to the basic k-nearest neighbor approach. Their experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.

(Open Access) Item-based collaborative filtering recommendation algorithms (2001) | Badrul Sarwar

Item-Based Collaborative Filtering Recommendation Algorithms

Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl

{sarwar, karypis, konstan, riedl}@cs.umn.edu

GroupLens Research Group/Army HPC Research Center

Department of Computer Science and Engineering

University of Minnesota, Minneapolis, MN 55455

ABSTRACT

Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users.

In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.

1. INTRODUCTION

The amount of information in the world is increasing far more quickly than our ability to process it. All of us have known the feeling of being overwhelmed by the number of new books, journal articles, and conference proceedings coming out each year. Technology has dramatically reduced the barriers to publishing and distributing information. Now it is time to create the technologies that can help us sift through all the available information to find that which is most valuable to us.

WWW10, May 1-5, 2001, Hong Kong.

ACM 1-58113-348-0/01/0005.

One of the most promising such technologies is collaborative filtering [19, 27, 14, 16]. Collaborative filtering works by building a database of preferences for items by users. A new user, Neo, is matched against the database to discover neighbors, which are other users who have historically had similar taste to Neo. Items that the neighbors like are then recommended to Neo, as he will probably also like them. Collaborative filtering has been very successful in both research and practice, and in both information filtering applications and E-commerce applications. However, there remain important research questions in overcoming two fundamental challenges for collaborative filtering recommender systems.

The first challenge is to improve the scalability of the collaborative filtering algorithms. These algorithms are able to search tens of thousands of potential neighbors in real-time, but the demands of modern systems are to search tens of millions of potential neighbors. Further, existing algorithms have performance problems with individual users for whom the site has large amounts of information. For instance, if a site is using browsing patterns as indications of content preference, it may have thousands of data points for its most frequent visitors. These "long user rows" lower the number of neighbors that can be searched per second, further reducing scalability.

The second challenge is to improve the quality of the recommendations for the users. Users need recommendations they can trust to help them find items they will like. Users will "vote with their feet" by refusing to use recommender systems that are not consistently accurate for them.

In some ways these two challenges are in conflict, since the less time an algorithm spends searching for neighbors, the more scalable it will be, and the worse its quality. For this reason, it is important to treat the two challenges simultaneously so the solutions discovered are both useful and practical.

In this paper, we address these issues of recommender systems by applying a different approach: item-based algorithms. The bottleneck in conventional collaborative filtering algorithms is the search for neighbors among a large user population of potential neighbors [12]. Item-based algorithms avoid this bottleneck by exploring the relationships between items first, rather than the relationships between users. Recommendations for users are computed by finding items that are similar to other items the user has liked. Because the relationships between items are relatively static, item-based algorithms may be able to provide the same quality as the user-based algorithms with less online computation.

1.1 Related Work

In this section we briefly present some of the research literature related to collaborative filtering, recommender systems, data mining and personalization.

Tapestry [10] is one of the earliest implementations of collaborative filtering-based recommender systems. This system relied on the explicit opinions of people from a close-knit community, such as an office workgroup. However, recommender systems for large communities cannot depend on each person knowing the others. Later, several ratings-based automated recommender systems were developed. The GroupLens research system [19, 16] provides a pseudonymous collaborative filtering solution for Usenet news and movies. Ringo [27] and Video Recommender [14] are email and web-based systems that generate recommendations on music and movies, respectively. A special issue of Communications of the ACM [20] presents a number of different recommender systems.

Other technologies have also been applied to recommender systems, including Bayesian networks, clustering, and Horting. Bayesian networks create a model based on a training set with a decision tree at each node and edges representing user information. The model can be built off-line over a matter of hours or days. The resulting model is very small, very fast, and essentially as accurate as nearest neighbor methods [6]. Bayesian networks may prove practical for environments in which knowledge of user preferences changes slowly with respect to the time needed to build the model but are not suitable for environments in which user preference models must be updated rapidly or frequently.

Clustering techniques work by identifying groups of users who appear to have similar preferences. Once the clusters are created, predictions for an individual can be made by averaging the opinions of the other users in that cluster. Some clustering techniques represent each user with partial participation in several clusters. The prediction is then an average across the clusters, weighted by degree of participation. Clustering techniques usually produce less-personal recommendations than other methods, and in some cases, the clusters have worse accuracy than nearest neighbor algorithms [6]. Once the clustering is complete, however, performance can be very good, since the size of the group that must be analyzed is much smaller. Clustering techniques can also be applied as a "first step" for shrinking the candidate set in a nearest neighbor algorithm or for distributing nearest-neighbor computation across several recommender engines. While dividing the population into clusters may hurt the accuracy of recommendations to users near the fringes of their assigned cluster, pre-clustering may be a worthwhile trade-off between accuracy and throughput.
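The weighted average used for users with partial participation in several clusters can be sketched as follows. This is a minimal illustration and not the method of any cited system; the function name `cluster_prediction`, the membership weights, and the per-cluster mean ratings are hypothetical values chosen for the example.

```python
def cluster_prediction(memberships, cluster_means):
    """Average the per-cluster mean ratings for one item,
    weighted by the user's degree of participation in each cluster.

    memberships   -- hypothetical participation weights, one per cluster
    cluster_means -- hypothetical mean rating of the item in each cluster
    """
    total = sum(memberships)
    if total == 0:
        return None  # the user belongs to no cluster: no prediction
    weighted = sum(w * m for w, m in zip(memberships, cluster_means))
    return weighted / total


# A user who participates 75% in cluster 0 and 25% in cluster 1.
pred = cluster_prediction([0.75, 0.25], [4.0, 2.0])
print(pred)  # 3.5
```

Only the cluster means are consulted at prediction time, which is why the group that must be analyzed is much smaller than the full user population.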

Horting is a graph-based technique in which nodes are users, and edges between nodes indicate degree of similarity between two users [1]. Predictions are produced by walking the graph to nearby nodes and combining the opinions of the nearby users. Horting differs from nearest neighbor as the graph may be walked through other users who have not rated the item in question, thus exploring transitive relationships that nearest neighbor algorithms do not consider. In one study using synthetic data, Horting produced better predictions than a nearest neighbor algorithm [1].

Schafer et al. [26] present a detailed taxonomy and examples of recommender systems used in E-commerce and how they can provide one-to-one personalization and at the same time capture customer loyalty. Although these systems have been successful in the past, their widespread use has exposed some of their limitations such as the problems of sparsity in the data set, problems associated with high dimensionality and so on. The sparsity problem in recommender systems has been addressed in [23, 11]. The problems associated with high dimensionality in recommender systems have been discussed in [4], and application of dimensionality reduction techniques to address these issues has been investigated in [24].

Our work explores the extent to which item-based recommenders, a new class of recommender algorithms, are able to solve these problems.

1.2 Contributions

This paper has three primary research contributions:

1. Analysis of the item-based prediction algorithms and identification of different ways to implement its sub-tasks.

2. Formulation of a precomputed model of item similarity to increase the online scalability of item-based recommendations.

3. An experimental comparison of the quality of several different item-based algorithms to the classic user-based (nearest neighbor) algorithms.

1.3 Organization

The rest of the paper is organized as follows. The next section provides a brief background in collaborative filtering algorithms. We first formally describe the collaborative filtering process and then discuss its two variants, memory-based and model-based approaches. We then present some challenges associated with the memory-based approach. In section 3, we present the item-based approach and describe different sub-tasks of the algorithm in detail. Section 4 describes our experimental work. It provides details of our data sets, evaluation metrics, methodology and results of different experiments and discussion of the results. The final section provides some concluding remarks and directions for future research.

2. COLLABORATIVE FILTERING BASED RECOMMENDER SYSTEMS

Recommender systems apply data analysis techniques to the problem of helping users find the items they would like to purchase at E-Commerce sites by producing a predicted likeliness score or a list of top-N recommended items for a given user. Item recommendations can be made using different methods. Recommendations can be based on demographics of the users, overall top selling items, or past buying habits of users as a predictor of future items. Collaborative Filtering (CF) [19, 27] is the most successful recommendation technique to date. The basic idea of CF-based algorithms is to provide item recommendations or predictions based on the opinions of other like-minded users. The opinions of users can be obtained explicitly from the users or by using some implicit measures.

2.0.1 Overview of the Collaborative Filtering Process

The goal of a collaborative filtering algorithm is to suggest new items or to predict the utility of a certain item for a particular user based on the user's previous likings and the opinions of other like-minded users. In a typical CF scenario, there is a list of $m$ users $U = \{u_1, u_2, \ldots, u_m\}$ and a list of $n$ items $I = \{i_1, i_2, \ldots, i_n\}$. Each user $u_i$ has a list of items $I_{u_i}$, which the user has expressed his/her opinions about. Opinions can be explicitly given by the user as a rating score, generally within a certain numerical scale, or can be implicitly derived from purchase records, by analyzing timing logs, by mining web hyperlinks and so on [28, 16]. Note that $I_{u_i} \subseteq I$ and it is possible for $I_{u_i}$ to be a null-set. There exists a distinguished user $u_a \in U$ called the active user for whom the task of a collaborative filtering algorithm is to find an item likeliness that can be of two forms.



• Prediction is a numerical value, $P_{a,j}$, expressing the predicted likeliness of item $i_j \notin I_{u_a}$ for the active user $u_a$. This predicted value is within the same scale (e.g., from 1 to 5) as the opinion values provided by $u_a$.

• Recommendation is a list of $N$ items, $I_r \subset I$, that the active user will like the most. Note that the recommended list must be on items not already purchased by the active user, i.e., $I_r \cap I_{u_a} = \emptyset$. This interface of CF algorithms is also known as Top-N recommendation.
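The Top-N interface can be sketched on top of the prediction interface as follows. This is an illustrative sketch only: the function name `top_n`, the item identifiers, and the predicted scores are hypothetical, and the scores are assumed to have been produced already by some CF algorithm.

```python
def top_n(predicted, already_rated, n):
    """Return the n items with the highest predicted score,
    excluding items the active user has already rated or purchased."""
    candidates = {
        item: score
        for item, score in predicted.items()
        if item not in already_rated
    }
    # Sort by descending score; ties are broken by item identifier.
    ranked = sorted(candidates, key=lambda item: (-candidates[item], item))
    return ranked[:n]


# Hypothetical predicted scores on a 1-to-5 scale for the active user.
predicted = {"i1": 4.5, "i2": 3.0, "i3": 4.8, "i4": 2.1}
already_rated = {"i3"}  # the active user's existing item set

recommendation = top_n(predicted, already_rated, 2)
print(recommendation)  # ['i1', 'i2']
```

Item `i3` has the highest score but is filtered out, which enforces the requirement that the recommended list and the active user's existing items do not intersect.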

Figure 1 shows the schematic diagram of the collaborative filtering process. CF algorithms represent the entire $m \times n$ user-item data as a ratings matrix, $A$. Each entry $a_{i,j}$ in $A$ represents the preference score (rating) of the $i$th user on the $j$th item. Each individual rating is within a numerical scale and it can as well be 0, indicating that the user has not yet rated that item. Researchers have devised a number of collaborative filtering algorithms that can be divided into two main categories: Memory-based (user-based) and Model-based (item-based) algorithms [6]. In this section we provide a detailed analysis of CF-based recommender system algorithms.
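The ratings-matrix representation can be made concrete with a small sketch. The matrix values and the helper name `rated_items` are hypothetical; the only convention taken from the text is that rows are users, columns are items, and 0 marks an item the user has not yet rated.

```python
# Hypothetical 3 x 4 ratings matrix: 3 users (rows), 4 items (columns).
# Ratings are on a 1-to-5 scale; 0 means "not yet rated".
ratings = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [0, 0, 0, 0],  # a user who has rated nothing
]


def rated_items(matrix, user):
    """Return the set of item indices the given user has rated."""
    return {j for j, r in enumerate(matrix[user]) if r != 0}


print(rated_items(ratings, 0))  # {0, 1, 3}
print(rated_items(ratings, 2))  # set()
```

The last row illustrates that a user's item set may be empty.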

Memory-based Collaborative Filtering Algorithms

Memory-based algorithms utilize the entire user-item database to generate a prediction. These systems employ statistical techniques to find a set of users, known as neighbors, that have a history of agreeing with the target user (i.e., they either rate different items similarly or they tend to buy similar sets of items). Once a neighborhood of users is formed, these systems use different algorithms to combine the preferences of neighbors to produce a prediction or top-N recommendation for the active user. The techniques, also known as nearest-neighbor or user-based collaborative filtering, are more popular and widely used in practice.
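A minimal sketch of the neighborhood idea follows. The text does not fix a similarity measure or a combination rule at this point, so two assumptions are made here for illustration: similarity is the cosine between two users' ratings over their co-rated items, and the prediction is the similarity-weighted average of the k nearest neighbors' ratings. All names and data are hypothetical.

```python
import math


def user_cosine(u, v):
    """Cosine similarity between two users over co-rated items (0 = unrated)."""
    common = [j for j in range(len(u)) if u[j] and v[j]]
    if not common:
        return 0.0
    dot = sum(u[j] * v[j] for j in common)
    norm_u = math.sqrt(sum(u[j] ** 2 for j in common))
    norm_v = math.sqrt(sum(v[j] ** 2 for j in common))
    return dot / (norm_u * norm_v)


def predict_user_based(matrix, active, item, k):
    """Predict the active user's rating of an item from the k most
    similar users who have rated that item."""
    neighbors = []
    for other, row in enumerate(matrix):  # scans every user at request time
        if other != active and row[item] != 0:
            neighbors.append((user_cosine(matrix[active], row), row[item]))
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    weight = sum(abs(sim) for sim, _ in top)
    if weight == 0:
        return None  # no usable neighbor: the sparsity failure case
    return sum(sim * rating for sim, rating in top) / weight


# Hypothetical data: user 0 has not rated item 2.
matrix = [
    [5, 4, 0],
    [5, 4, 4],
    [1, 2, 2],
]
pred_k1 = predict_user_based(matrix, active=0, item=2, k=1)
pred_k2 = predict_user_based(matrix, active=0, item=2, k=2)
```

The loop over all rows of the matrix at prediction time is the online neighbor search whose cost grows with the number of users.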

Model-based Collaborative Filtering Algorithms

Model-based collaborative filtering algorithms provide item recommendation by first developing a model of user ratings. Algorithms in this category take a probabilistic approach and envision the collaborative filtering process as computing the expected value of a user prediction, given his/her ratings on other items. The model building process is performed by different machine learning algorithms such as Bayesian network, clustering, and rule-based approaches. The Bayesian network model [6] formulates a probabilistic model for the collaborative filtering problem. The clustering model treats collaborative filtering as a classification problem [2, 6, 29] and works by clustering similar users in the same class and estimating the probability that a particular user is in a particular class $C$, and from there computes the conditional probability of ratings. The rule-based approach applies association rule discovery algorithms to find association between co-purchased items and then generates item recommendation based on the strength of the association between items [25].
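The "expected value of a user prediction" can be illustrated with a short sketch. It assumes that some model (not shown, and not specified here) has already produced a conditional probability for each possible rating value given the user's other ratings; the function name `expected_rating` and the distribution are hypothetical.

```python
def expected_rating(distribution):
    """Expected value of a rating, given a mapping from each possible
    rating value to its conditional probability under the model."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in distribution.items())


# Hypothetical model output for one user-item pair on a 1-to-5 scale.
distribution = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.4, 5: 0.2}
prediction = expected_rating(distribution)
```

The expensive step, estimating the distribution, happens when the model is built; the prediction itself is a short sum over the rating scale.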

2.0.2 Challenges of User-based Collaborative Filtering Algorithms

User-based collaborative filtering systems have been very successful in the past, but their widespread use has revealed some potential challenges such as:



• Sparsity. In practice, many commercial recommender systems are used to evaluate large item sets (e.g., Amazon.com recommends books and CDnow.com recommends music albums). In these systems, even active users may have purchased well under 1% of the items (1% of 2 million books is 20,000 books). Accordingly, a recommender system based on nearest neighbor algorithms may be unable to make any item recommendations for a particular user. As a result the accuracy of recommendations may be poor.

• Scalability. Nearest neighbor algorithms require computation that grows with both the number of users and the number of items. With millions of users and items, a typical web-based recommender system running existing algorithms will suffer serious scalability problems.
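The sparsity arithmetic can be worked through in a few lines. The catalog size comes from the example above; the figure of 100 purchases per user and the assumption that purchases are spread uniformly at random over the catalog are illustrative simplifications, not claims about any real site.

```python
catalog_size = 2_000_000           # 2 million books, as in the example above
one_percent = catalog_size // 100  # 1% of the catalog
print(one_percent)                 # 20000


def expected_overlap(purchases_per_user, catalog_size):
    """Expected number of items two users have in common when each picks
    the same number of items uniformly at random from the catalog.
    (Uniform choice is an unrealistic simplifying assumption.)"""
    return purchases_per_user ** 2 / catalog_size


# Hypothetical users with 100 purchases each.
overlap = expected_overlap(100, catalog_size)
```

Under these assumptions two such users share far fewer than one item on average, so a similarity computed over co-rated items often has nothing to work with.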

The weakness of nearest neighbor algorithms for large, sparse databases led us to explore alternative recommender system algorithms. Our first approach attempted to bridge the sparsity by incorporating semi-intelligent filtering agents into the system [23, 11]. These agents evaluated and rated each item using syntactic features. By providing a dense ratings set, they helped alleviate coverage problems and improved quality. The filtering agent solution, however, did not address the fundamental problem of poor relationships among like-minded but sparse-rating users. To explore that we took an algorithmic approach and used Latent Semantic Indexing (LSI) to capture the similarity between users and items in a reduced dimensional space [24, 25]. In this paper we look into another technique, the model-based approach, in addressing these challenges, especially the scalability challenge. The main idea here is to analyze the user-item representation matrix to identify relations between different items and then to use these relations to compute the prediction score for a given user-item pair. The intuition behind this approach is that a user would be interested in purchasing items that are similar to the items the user liked earlier and would tend to avoid items that are similar to the items the user didn't like earlier. These techniques do not need to identify the neighborhood of similar users when a recommendation is requested; as a result they tend to produce much faster recommendations. A number of different schemes have been proposed to compute the association between items, ranging from the probabilistic approach [6] to more traditional item-item correlations [15, 13]. We present a detailed analysis of our approach in the next section.

Figure 1: The Collaborative Filtering Process. [Schematic: the input ratings table, with the active user and the item for which a prediction is sought, feeds the CF algorithm; the output interface yields a prediction $P_{a,j}$ (prediction on item $j$ for the active user) and a recommendation $\{T_{i1}, T_{i2}, \ldots, T_{iN}\}$ (Top-N list of items for the active user).]
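The item-based idea described in this section can be sketched as follows. The abstract names cosine similarity between item vectors and a weighted sum as two of the options studied; the exact formulas are not given in this part of the text, so the details below are assumptions for illustration: unrated entries are treated as 0 inside the cosine, and the prediction is a similarity-weighted average over the k rated items most similar to the target. All names and data are hypothetical.

```python
import math


def item_cosine(matrix, a, b):
    """Cosine similarity between the rating columns of items a and b."""
    col_a = [row[a] for row in matrix]
    col_b = [row[b] for row in matrix]
    dot = sum(x * y for x, y in zip(col_a, col_b))
    norm = math.sqrt(sum(x * x for x in col_a)) * math.sqrt(sum(y * y for y in col_b))
    return dot / norm if norm else 0.0


def build_item_model(matrix):
    """Precompute all item-item similarities (the offline step)."""
    n_items = len(matrix[0])
    return [[item_cosine(matrix, a, b) for b in range(n_items)]
            for a in range(n_items)]


def predict_item_based(matrix, model, user, target, k):
    """Weighted sum over the k items the user has rated that are most
    similar to the target item (the online step)."""
    rated = [(model[target][j], r)
             for j, r in enumerate(matrix[user])
             if r != 0 and j != target]
    rated.sort(key=lambda pair: pair[0], reverse=True)
    top = rated[:k]
    weight = sum(abs(sim) for sim, _ in top)
    if weight == 0:
        return None
    return sum(sim * r for sim, r in top) / weight


# Hypothetical data: user 3 has not rated item 1.
matrix = [
    [5, 5, 0],
    [4, 4, 1],
    [1, 1, 5],
    [5, 0, 1],
]
model = build_item_model(matrix)
pred_k1 = predict_item_based(matrix, model, user=3, target=1, k=1)
pred_k2 = predict_item_based(matrix, model, user=3, target=1, k=2)
```

The online step touches only the active user's own row and one row of the precomputed model, so no search over other users is needed when a recommendation is requested.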

3. ITEM-BASED COLLABORATIVE FILTERING ALGORITHM

In this section we study a class of item-based recommendation algorithms for producing predictions to users. Unlike the user-based collaborative filtering algorithm discussed in Section 2, the item-based approach looks into the set of items the target user has rated and computes how similar they are to the target item $i$ and then selects $k$ most


