Book ChapterDOI

Personalized communities in a distributed recommender system

02 Apr 2007-Vol. 4425, pp 343-355
TL;DR: A new distributed recommender system based on a user-based filtering algorithm that has been transposed for Peer-to-Peer architectures and adapts its prediction computations to the density of the user neighborhood.
Abstract: The amount of data exponentially increases in information systems and it becomes more and more difficult to extract the most relevant information within a very short time. Among others, collaborative filtering processes help users to find interesting items by modeling their preferences and by comparing them with users having the same tastes. Nevertheless, there are a lot of aspects to consider when implementing such a recommender system. The number of potential users and the confidential nature of some data are taken into account. This paper introduces a new distributed recommender system based on a user-based filtering algorithm. Our model has been transposed for Peer-to-Peer architectures. It has been especially designed to deal with problems of scalability and privacy. Moreover, it adapts its prediction computations to the density of the user neighborhood.

Summary (2 min read)

1 Introduction

  • With the development of information and communication technologies, the size of information systems all over the world has exponentially increased.
  • Collaborative filtering techniques [1] are a good way to cope with this difficulty.
  • This is why the authors introduce an adaptive minimum-correlation threshold of neighborhood which evolves in accordance with user expectations.
  • The authors' model has been integrated in a document sharing system called "SofoS".
  • The distribution of computations and contents matches the constraints of scalability and reactivity.

3 SofoS

  • SofoS is a document platform, using a recommender system to provide users with content.
  • The goal of SofoS is also to assist users to find the most relevant sources of information efficiently.
  • In [7], the authors highlight the fact that there are several types of possible architectures for P2P systems.
  • The authors will illustrate their claims by basing their examples on the random approach even if others may have an added value.
  • The following subsection aims at presenting the AURA Algorithm.

3.1 AURA Algorithm

  • For this reason, the authors have conceived the platform in such a way that users have to open a session with a login and a password before using the application.
  • Then, for each of these profiles, it computes a similarity measure with the personal profile of the active user.
  • The active peer has to contact every peer whose ID is in the list "O".
  • If the computed correlation coefficient is higher than s3 (the threshold of u3), ua adds id3 to his/her list "O".
  • When ua receives this packet, he/she updates the list "O" by removing id4, since s4 is too high for him/her.
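The neighbor-list maintenance sketched in these bullets can be written down compactly. The following is a minimal sketch, assuming each contacted peer exposes its profile and its own minimum-correlation threshold; the function and parameter names are illustrative, not the paper's actual API:

```python
def update_neighbor_list(active_profile, candidates, similarity):
    """Maintain the active user's neighbor list "O".

    `candidates` maps each contacted peer's ID to a (profile, threshold)
    pair: a peer is kept only if the absolute correlation with the active
    user exceeds that peer's own minimum-correlation threshold.
    """
    neighbors = []
    for peer_id, (profile, threshold) in candidates.items():
        if abs(similarity(active_profile, profile)) > threshold:
            neighbors.append(peer_id)
    return neighbors
```

With a toy similarity function, a peer whose correlation falls below its own threshold (like id4 above) is dropped from the list while the others are kept.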

3.2 Adaptive minimum-correlation threshold

  • As shown in the previous subsection, the active user can indirectly define the minimum-correlation threshold that other people must reach in order to be a member of his/her community.
  • In the SofoS interface, a slide bar allows the active user to ask for personalized or generalist recommendations.
  • This allows AURA to know the degree to which it can modify the threshold.
  • At the same time, the authors update the corresponding values in the population distribution histogram.
  • If the system sets the threshold to 0.1, it means that only peers ui whose correlation coefficient |w(ua, ui)| is higher than 0.1 will be included in the group profile of the active user.
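The selection step in the last bullet can be sketched as follows. The aggregation of the selected peers' ratings into a per-item average is our own simplifying assumption; the summary does not spell out the exact group-profile computation:

```python
def build_group_profile(active_profile, peer_profiles, threshold, similarity):
    """Keep only peers whose |correlation| with the active user exceeds the
    adaptive threshold, then aggregate their ratings item by item
    (here: a plain average, an illustrative choice)."""
    selected = [p for p in peer_profiles
                if abs(similarity(active_profile, p)) > threshold]
    ratings_by_item = {}
    for profile in selected:
        for item, rating in profile.items():
            ratings_by_item.setdefault(item, []).append(rating)
    return {item: sum(r) / len(r) for item, r in ratings_by_item.items()}
```

Raising the threshold shrinks the selected neighborhood, which is exactly the personalized-versus-generalist trade-off the slide bar exposes.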

4 Discussion

  • Cranor assumes that an ideal system should be based on an explicit data collection method, transient profiles, user-initiated involvement and non-invasive predictions.
  • Only numerical votes are exchanged and the logs of user actions are transient.
  • As regards scalability, their model no longer suffers from limitations, since the algorithms used to compute group profiles and predictions are in O(b), where b is the number of commonly valuated items between two users, computations being made incrementally in a stochastic context.
  • When increasing the threshold in the system, this measure becomes higher.
  • The authors have also evaluated their model in terms of prediction relevancy.
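The O(b) incremental claim can be illustrated with running sums: a Pearson correlation over b co-rated items needs only five accumulators, so each new co-rating is an O(1) update and a full pass is O(b). This is our own sketch of the incremental idea, not the authors' implementation:

```python
import math

class IncrementalCorrelation:
    """Running sums for a Pearson correlation between two rating streams.

    Each call to add() folds one co-rated item into the statistics in O(1),
    so maintaining the coefficient over b co-rated items costs O(b) overall.
    """
    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def correlation(self):
        if self.n < 2:
            return 0.0
        num = self.n * self.sxy - self.sx * self.sy
        den = math.sqrt((self.n * self.sxx - self.sx ** 2)
                        * (self.n * self.syy - self.sy ** 2))
        return num / den if den else 0.0
```

When a neighbor updates one rating, only the affected terms change, which matches the summary's point that modified profiles are updated instead of resetting all knowledge about neighbors.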

5 Conclusion

  • To cope with numerous problems specific to information retrieval, the authors proposed a Peer-to-Peer collaborative filtering model which is totally distributed.
  • The authors show in this paper that they can deal with important problems such as scalability, privacy and quality.
  • The authors' algorithm is anytime and incremental.
  • Contrary to PocketLens, their model is user-based because the authors consider that the set of items can change.
  • Moreover, the stochastic context of their model allows the system to update the modified profiles instead of resetting all the knowledge about neighbors.


HAL Id: inria-00171796
https://hal.inria.fr/inria-00171796
Submitted on 13 Sep 2007
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Personalized Communities in a Distributed Recommender System
Sylvain Castagnos, Anne Boyer
To cite this version:
Sylvain Castagnos, Anne Boyer. Personalized Communities in a Distributed Recommender System. 29th European Conference on Information Retrieval - ECIR'07, Fondazione Ugo Bordoni; BCS-IRSG; ACM SIGIR, Apr 2007, Rome, Italy. pp.343-355, ⟨10.1007/978-3-540-71496-5_32⟩. ⟨inria-00171796⟩

Personalized Communities in a Distributed Recommender System
CASTAGNOS Sylvain and BOYER Anne
LORIA - Université Nancy 2
Campus Scientifique - B.P. 239
54506 Vandoeuvre-lès-Nancy Cedex, France
{sylvain.castagnos, anne.boyer}@loria.fr
Abstract. The amount of data exponentially increases in information systems and it becomes more and more difficult to extract the most relevant information within a very short time. Among others, collaborative filtering processes help users to find interesting items by modeling their preferences and by comparing them with users having the same tastes. Nevertheless, there are a lot of aspects to consider when implementing such a recommender system. The number of potential users and the confidential nature of some data are taken into account. This paper introduces a new distributed recommender system based on a user-based filtering algorithm. Our model has been transposed for Peer-to-Peer architectures. It has been especially designed to deal with problems of scalability and privacy. Moreover, it adapts its prediction computations to the density of the user neighborhood.
1 Introduction
With the development of information and communication technologies, the size of information systems all over the world has exponentially increased. Consequently, it becomes harder and harder for users to identify relevant items in a reasonable time, even when using a powerful search engine. Collaborative filtering techniques [1] are a good way to cope with this difficulty. They amount to matching the active user with a set of persons having the same tastes, based on his/her preferences and his/her past actions. Such a system starts from the principle that users who liked the same items have the same topics of interest. Thus, it is possible to predict the relevancy of data for the active user by taking advantage of the experiences of a similar population.
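As an illustration of this principle, a minimal user-based scheme computes a Pearson correlation between rating profiles and predicts a rating as the active user's mean plus a correlation-weighted average of the neighbors' deviations from their own means. This is the standard textbook formulation, not necessarily the exact formulas used in AURA:

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)
                    * sum((ratings_b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, neighbor_profiles, item):
    """Predict the active user's rating for `item` from correlated neighbors:
    the user's own mean rating, adjusted by a weighted average of each
    neighbor's deviation from that neighbor's mean."""
    mean_active = sum(active.values()) / len(active)
    num = den = 0.0
    for profile in neighbor_profiles:
        if item not in profile:
            continue
        w = pearson(active, profile)
        mean_n = sum(profile.values()) / len(profile)
        num += w * (profile[item] - mean_n)
        den += abs(w)
    if den == 0:
        return mean_active
    return mean_active + num / den
```

A perfectly correlated neighbor who rated an unseen item above his/her own mean pulls the prediction above the active user's mean by the same margin.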
There are several fundamental problems when implementing a collaborative filtering algorithm. In this paper, we particularly pay attention to the following significant limitations for industrial use:

- scalability and system reactivity: there are potentially several thousand users and items to manage in real time;
- intrusions into privacy: we have to be careful to be as unintrusive as possible and at least to guarantee the anonymity of users;
- novelty in predictions: according to the context, users want more or fewer new recommendations. Sometimes their main concern is to retrieve the items that they have rated highly, even if it means having fewer new recommendations. This is why we introduce an adaptive minimum-correlation threshold of neighborhood which evolves in accordance with user expectations.
We propose an algorithm which is based on an analysis of usage. It relies on a distributed user-based collaborative filtering technique. Our model has been integrated in a document sharing system called "SofoS".¹

Our algorithm is implemented on a Peer-to-Peer architecture because of the document platform context. In a lot of companies, documents are referenced using a common codification that may require a central server², but are stored on users' devices. The distribution of computations and contents matches the constraints of scalability and reactivity.

In this paper, we will first present the related work on collaborative filtering approaches. We will then introduce our Peer-to-Peer user-centered model which offers the advantage of being fully distributed. We called this model "Adaptive User-centered Recommender Algorithm" (AURA). It provides a service which builds a virtual community of interests centered on the active user by selecting his/her nearest neighbors. As the model is ego-centered, the active user can define the expected prediction quality by specifying the minimum-correlation threshold. AURA is an anytime algorithm which furthermore requires very little computation time and memory space. As we want to constantly improve our model and the document sharing platform, we are incrementally and modularly developing them on a JXTA platform³.
2 Related work
In centralized collaborative filtering approaches, finding the closest neighbors among several thousands of candidates in real time without offline computations may be unrealistic [2]. By contrast, decentralization of data is practical to comply with privacy rules, as long as anonymity is fulfilled [3]. This is the reason why more and more researchers investigate various means of distributing collaborative filtering algorithms. This also presents the advantage of giving the ownership of profiles to users, so that they can be re-used in several applications.⁴ We can mention research on P2P architectures, multi-agent systems and decentralized models (client/server, shared databases).
There are several ways to classify collaborative filtering algorithms. In [4], the authors have identified, among existing techniques, two major classes of algorithms: memory-based and model-based algorithms. Memory-based techniques offer the advantage of being very reactive, by immediately integrating modifications of user profiles into the system. They also guarantee the quality of recommendations. However, Breese et al. [4] are unanimous in thinking that their scalability is problematic: even if these methods work well with small-sized examples, it is difficult to scale to situations characterized by a great number of documents or users. Indeed, time and space complexities of algorithms are serious considerations for big databases. According to Pennock et al. [5], model-based algorithms constitute an alternative to the problem of combinatorial complexity. Furthermore, they perceive in these models an added value beyond the function of prediction: they highlight some correlations in data, thus proposing an intuitive reason for recommendations or simply making the hypotheses more explicit. However, these methods are not dynamic enough and they react badly to the insertion of new content into the database. Moreover, they require a penalizing learning phase for the user.

¹ SofoS is the acronym for "Sharing Our Files On the System".
² This allows documents to have IDs and to be identified easily.
³ http://www.jxta.org/
⁴ As the owner of the profile, the user can apply it to different pieces of software. In centralized approaches, there must be as many profiles as services for one user.
Another way to classify collaborative filtering techniques is to consider user-based methods in opposition to item-based algorithms. For example, we have explored a distributed user-based approach within a client/server context in [6]. In this model, implicit criteria are used to generate explicit ratings. These votes are anonymously sent to the server. An offline clustering algorithm is then applied and group profiles are sent to clients. The identification phase is done on the client side in order to cope with privacy. This model also deals with sparsity and scalability. We highlight the added value of a user-based approach in the situation where users are relatively stable, whereas the set of items may often vary considerably. On the contrary, Miller et al. [7] show the great potential of distributed item-based algorithms. They propose a P2P version of the item-item algorithm. In this way, they address the problems of portability (even on mobile devices), privacy and security with a high quality of recommendations. Their model can adapt to different P2P configurations.

Beyond the different possible implementations, we can see there are a lot of open questions raised by industrial use of collaborative filtering. Canny [3] concentrates on ways to provide powerful privacy protection by computing a "public" aggregate for each community without disclosing individual users' data. Furthermore, his approach is based on homomorphic encryption to protect personal data and on a probabilistic factor analysis model which handles missing data without requiring default values for them. Privacy protection is provided by a P2P protocol. Berkovsky et al. [8] also deal with the privacy concern in P2P recommender systems. They address the problem by electing super-peers whose role is to compute an average profile of a sub-population. Standard peers have to contact all these super-peers and to exploit these average profiles to compute predictions. In this way, they never access the public profile of a particular user. We can also cite the work of Han et al. [9], which addresses the problems of privacy protection and scalability in a distributed collaborative filtering algorithm called PipeCF. Both user database management and prediction computation are split between several devices. This approach has been implemented on Peer-to-Peer overlay networks through a distributed hash table method.

In this paper, we introduce a new hybrid method called AURA. It combines the reactivity of memory-based techniques with the data correlation of model-based approaches by using an iterative clustering algorithm. Moreover, AURA is a user-based model which is completely distributed at the user scale. It has been integrated in the SofoS document platform and relies on a P2P architecture in order to distribute prediction computations, content and profiles. We designed our model to tackle, among others, the problems of scalability and privacy.
3 SofoS
SofoS is a document platform, using a recommender system to provide users with content. Once it is installed, users can share and/or search for documents, as they do on P2P applications like Napster. We conceived it in such a way that it is as open as possible to different existing kinds of data: hypertext files, documents, music, videos, etc. The goal of SofoS is also to assist users in finding the most relevant sources of information efficiently. This is why we add the AURA recommender module to the system. We assume that users can get pieces of information either by using our system or by surfing the web. SofoS consequently makes it possible to take visited websites into account in the prediction computations.

We are implementing SofoS in a generic environment for Peer-to-Peer services, called JXTA. This choice is motivated by the fact that it is widely used in our research community.

In [7], the authors highlight the fact that there are several types of possible architectures for P2P systems. We can cite those with a central server (such as Napster), random-discovery ones⁵ (such as Gnutella or KaZaA), transitive traversal architectures, content-addressable structures and secure blackboards. We conceived our model with the idea that it could be adapted to different types of architectures. However, in this paper, we will illustrate our claims by basing our examples on the random approach even if others may have an added value. The following subsection aims at presenting the AURA algorithm.
3.1 AURA Algorithm
We presume that each peer in SofoS corresponds to a single user on a given device.⁶ For this reason, we have conceived the platform in such a way that users have to open a session with a login and a password before using the application. In this way, several persons can use the same computer (for example, the different members of a family) without disrupting their respective profiles. That is why each user on a given peer of the system has his/her own profile and a single ID. The session data remain on the local machine in order to enhance privacy. There

⁵ Some of these architectures are totally distributed. Others mix centralized and distributed approaches but elect super-peers whose role is to partially manage sub-groups of peers in the system.
⁶ We can easily distinguish devices since SofoS has to be installed on users' computers.

Citations
Journal ArticleDOI
TL;DR: This work is a survey of parallel and distributed collaborative filtering implementations, aiming to not only provide a comprehensive presentation of the field's development but also offer future research directions by highlighting the issues that need to be developed further.
Abstract: Collaborative filtering is among the most preferred techniques when implementing recommender systems. Recently, great interest has turned toward parallel and distributed implementations of collaborative filtering algorithms. This work is a survey of parallel and distributed collaborative filtering implementations, aiming to not only provide a comprehensive presentation of the field's development but also offer future research directions by highlighting the issues that need to be developed further.

46 citations


Cites methods from "Personalized communities in a distr..."

  • ...As far as the evaluation of the implementations is concerned, initially algorithmic accuracy was the main interest, which was measured by MAE metric....


  • ...Both MovieLens and Flixster datasets are used for measuring the algorithm’s accuracy using the MAE metric as well as the RMSE for various values of power users’ interaction number....


  • ...The most popular collaborative filtering algorithms are described and their MAE and RMSE are presented, as well as their execution time....


  • ...Accuracy can measure how well a recommender system predicts a rating and is measured by means of Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE)....


  • ...Algorithm Technologies Datasets Metrics [105] User-based CF Java N/A N/A [49] PipeCF Distributed EachMovie MAE [50] Hash Table [78] PocketLens Chord architecture MovieLens Neighborhood similarity Item-based for P2P file sharing MAE, recall, coverage networks Memory usage, prediction time [16] Traditional CF Loud Voice Platform MovieLens MAE [109] User-Item N/A Audioscrobbler Coverage Relevance Model Precision [18] Distributed Hierarchical Java simulation MovieLens MAE Neighborhood Formation EachMovie in the CF algorithm Jester [113] DCFLA Algorithmic simulation EachMovie MAE [17] Distributed storage Java simulation MovieLens MAE of user profiles [19] Item Clustering Java simulation EachMovie MAE [30] User-based JXTA MovieLens MAE AURA Platform Computation time [94] Affinity networks Modification of self collected Average Phex (Java file sharing ap....


Proceedings ArticleDOI
23 Oct 2011
TL;DR: This work designs a distributed mechanism for predicting user ratings that avoids the disclosure of information to a centralized authority or an untrusted third party, and proposes a distributed gradient descent algorithm for its solution that abides with the above restriction on how information is exchanged between users.
Abstract: Recommender systems predict user preferences based on a range of available information. For systems in which users generate streams of content (e.g., blogs, periodically-updated newsfeeds), users may rate the produced content that they read, and be given accurate predictions about future content they are most likely to prefer. We design a distributed mechanism for predicting user ratings that avoids the disclosure of information to a centralized authority or an untrusted third party: users disclose the rating they give to certain content only to the user that produced this content. We demonstrate how rating prediction in this context can be formulated as a matrix factorization problem. Using this intuition, we propose a distributed gradient descent algorithm for its solution that abides by the above restriction on how information is exchanged between users. We formally analyse the convergence properties of this algorithm, showing that it reduces a weighted root mean square error of the accuracy of predictions. Although our algorithm may be used many different ways, we evaluate it on the Netflix data set and prediction problem as a benchmark. In addition to the improved privacy properties that stem from its distributed nature, our algorithm is competitive with current centralized solutions. Finally, we demonstrate the algorithm's fast convergence in practice by conducting an online experiment with a prototype user-generated content exchange system implemented as a Facebook application.

35 citations

Posted Content
TL;DR: This thesis identifies four core functions for recommendation systems and examines the added value of algorithmic strategies and recommendation systems according to its core functions, and develops a methodology for analyzing the performance of recommender systems in industrial context.
Abstract: This thesis consists of four parts: - An analysis of the core functions and the prerequisites for recommender systems in an industrial context: we identify four core functions for recommendation systems: Help do Decide, Help to Compare, Help to Explore, Help to Discover. The implementation of these functions has implications for the choices at the heart of algorithmic recommender systems. - A state of the art, which deals with the main techniques used in automated recommendation system: the two most commonly used algorithmic methods, the K-Nearest-Neighbor methods (KNN) and the fast factorization methods are detailed. The state of the art presents also purely content-based methods, hybridization techniques, and the classical performance metrics used to evaluate the recommender systems. This state of the art then gives an overview of several systems, both from academia and industry (Amazon, Google ...). - An analysis of the performances and implications of a recommendation system developed during this thesis: this system, Reperio, is a hybrid recommender engine using KNN methods. We study the performance of the KNN methods, including the impact of similarity functions used. Then we study the performance of the KNN method in critical uses cases in cold start situation. - A methodology for analyzing the performance of recommender systems in industrial context: this methodology assesses the added value of algorithmic strategies and recommendation systems according to its core functions.

26 citations


Cites methods from "Personalized communities in a distr..."

  • ...catalogs of several tens of thousands items. For larger catalogs however, other techniques become necessary. Several approaches are currently studied: grid-based collaborative filtering such as AURA (Castagnos and Boyer, 2007) and fast approximate KNN search techniques such as LSH (Gionis et al., 1999), and MinHash (Cohen et al. 2001). We are also considering using Gravity and a classical clustering method to compute fast ...


Journal ArticleDOI
TL;DR: A novel parallel recommender system based on collaborative filtering with correntropy that could effectively improve the computational time and achieve satisfactory performance though invalid data existed and the Spark framework was employed to facilitate parallel computing.
Abstract: Recently, the extraction of valid information from big data has witnessed a growing interest. Nowadays, in social networks, large parts of websites collect user profiles to provide some valuable information through personalized recommendation. Among the available recommendation algorithms, collaborative filtering (CF) is one of the most popular algorithms due to its simple framework. However, in some practices, the computational time of CF may be unsatisfactory. Meanwhile, in some cases there are noises in data, i.e., some data are invalid, it also has a great impact on algorithm performance. To speed up the time it takes to make recommendation and tackle the noise issue more effectively, we developed a novel parallel recommender system based on CF with correntropy. Instead of traditional measures used in recommendation algorithms, the correntropy was employed to compute the similarity of two items or users to achieve insensitive performance to outliers. Moreover, to reduce the computational cost, we employed the Spark framework to facilitate parallel computing. The experiments on three datasets consisting data collected from actual social networks were conducted and the experimental results showed that for social networks application, the proposed system could effectively improve the computational time and achieve satisfactory performance though invalid data existed.

20 citations


Cites methods from "Personalized communities in a distr..."

  • ...In [48], a user-centered filtering algorithm was introduced by taking the number of latent users and the confidential nature into account....


Book ChapterDOI
01 Jan 2012
TL;DR: This chapter proposes to automatically select the adequate set of users in the network of users to address the cold-start problem of collaborative filtering, and considers two kinds of delegates: mentors and leaders.
Abstract: Recommender systems aim at suggesting to users items that fit their preferences. Collaborative filtering is one of the most popular approaches of recommender systems; it exploits users' ratings to express preferences. Traditional approaches of collaborative filtering suffer from the cold-start problem: when a new item enters the system, it cannot be recommended while a sufficiently high number of users have rated it. The quantity of required ratings is not known a priori and may be high as it depends on who rates the items. In this chapter, the authors propose to automatically select the adequate set of users in the network of users to address the cold-start problem. They call them the "delegates", and they correspond to those who should rate a new item first so as to reliably deduce the ratings of other users on this item. They propose to address this issue as an opinion poll problem. The authors consider two kinds of delegates: mentors and leaders. They experiment some measures, classically exploited in social networks, to select the adequate set of delegates. The experiments conducted show that only 6 delegates are sufficient to accurately estimate ratings of the whole set of other users, which dramatically reduces the number of users classically required.

16 citations

References
Proceedings ArticleDOI
01 Apr 2001
TL;DR: This paper analyzes item-based collaborative filtering techniques and suggests that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
Abstract: Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users. In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.

8,634 citations

Proceedings ArticleDOI
22 Oct 1994
TL;DR: GroupLens is a system for collaborative filtering of netnews, to help people find articles they will like in the huge stream of available articles, and protect their privacy by entering ratings under pseudonyms, without reducing the effectiveness of the score prediction.
Abstract: Collaborative filters help people make choices based on the opinions of other people. GroupLens is a system for collaborative filtering of netnews, to help people find articles they will like in the huge stream of available articles. News reader clients display predicted scores and make it easy for users to rate articles after they read them. Rating servers, called Better Bit Bureaus, gather and disseminate the ratings. The rating servers predict scores based on the heuristic that people who agreed in the past will probably agree again. Users can protect their privacy by entering ratings under pseudonyms, without reducing the effectiveness of the score prediction. The entire architecture is open: alternative software for news clients and Better Bit Bureaus can be developed independently and can interoperate with the components we have developed.

5,644 citations

Posted Content
TL;DR: In this article, the authors compare the predictive accuracy of various methods in a set of representative problem domains, including correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods.
Abstract: Collaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation and Bayesian networks, the preferred method depends on the nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of database, speed of predictions, and learning time.

4,883 citations

Proceedings Article
24 Jul 1998
TL;DR: Several algorithms designed for collaborative filtering or recommender systems are described, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods, to compare the predictive accuracy of the various methods in a set of representative problem domains.
Abstract: Collaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation and Bayesian networks, the preferred method depends on the nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of database, speed of predictions, and learning time.

4,557 citations

Journal ArticleDOI
TL;DR: Tapestry is intended to handle any incoming stream of electronic documents and serves both as a mail filter and repository; its components are the indexer, document store, annotation store, filterer, little box, remailer, appraiser and reader/browser.
Abstract: The Tapestry experimental mail system developed at the Xerox Palo Alto Research Center is predicated on the belief that information filtering can be more effective when humans are involved in the filtering process. Tapestry was designed to support both content-based filtering and collaborative filtering, which entails people collaborating to help each other perform filtering by recording their reactions to documents they read. The reactions are called annotations; they can be accessed by other people’s filters. Tapestry is intended to handle any incoming stream of electronic documents and serves both as a mail filter and repository; its components are the indexer, document store, annotation store, filterer, little box, remailer, appraiser and reader/browser. Tapestry’s client/server architecture, its various components, and the Tapestry query language are described.
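Tapestry's central idea — filters that read other people's recorded reactions (annotations) to incoming documents — can be illustrated with a small sketch. This is not Tapestry's actual architecture or query language; the class and function names here are purely illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A recorded reaction to a document, stored in the annotation store."""
    user: str
    reaction: str  # e.g. "useful", "skip"

@dataclass
class Document:
    doc_id: str
    text: str
    annotations: list = field(default_factory=list)

def collaborative_filter(stream, trusted, reaction="useful"):
    """Keep documents that at least one trusted colleague has
    annotated with the given reaction -- the Tapestry idea of a
    filter that consults other people's annotations."""
    return [d for d in stream
            if any(a.user in trusted and a.reaction == reaction
                   for a in d.annotations)]

inbox = [
    Document("d1", "perf tips", [Annotation("eve", "useful")]),
    Document("d2", "spam",      [Annotation("mal", "useful")]),
]
kept = collaborative_filter(inbox, trusted={"eve"})
```

In the real system such a predicate would be expressed in the Tapestry query language and evaluated by the filterer against the annotation store, rather than written directly in application code.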

4,299 citations

Frequently Asked Questions (2)
Q1. What have the authors contributed in "Personalized communities in a distributed recommender system"?

This paper introduces a new distributed recommender system based on a user-based filtering algorithm. 

The authors plan to validate these points by testing their model with real users in real conditions. Contrary to PocketLens, their model is user-based because the authors consider that the set of items can change. Even if an item is deleted, the authors can continue to exploit its ratings in the prediction computations. Currently, the authors are developing their protocols further to cope with other limitations, such as trust and security aspects, by using specific communication protocols as in [13].
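The point about deleted items follows from the structure of user-based filtering: user-user similarity is computed over rating histories, not over the current catalogue, so removing an item from the catalogue does not invalidate the similarities it contributed to. A minimal sketch (illustrative names, cosine similarity chosen for brevity):

```python
def overlap_similarity(a, b):
    """Cosine similarity over co-rated items. It depends only on the
    users' rating histories, never on the current item catalogue."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[i] * b[i] for i in common)
    den = (sum(a[i] ** 2 for i in common) ** 0.5) * \
          (sum(b[i] ** 2 for i in common) ** 0.5)
    return num / den

alice = {"a": 5, "b": 3}
bob   = {"a": 4, "b": 2}
catalogue = {"b"}  # item "a" has since been deleted from the catalogue
# The similarity computation still uses both users' ratings of "a".
s = overlap_similarity(alice, bob)
```

An item-based model, by contrast, stores an item-item similarity matrix whose rows and columns disappear with the items, which is the limitation the authors attribute to PocketLens.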