
Showing papers on "Recommender system published in 2009"


Journal ArticleDOI
TL;DR: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
Abstract: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
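
A minimal sketch of the latent-factor idea behind such models: users and items get low-dimensional factor vectors whose inner product approximates the rating, fitted here by plain stochastic gradient descent. It is illustrative only, not the Netflix Prize models themselves (no biases, implicit feedback, temporal effects, or confidence weighting); the function names and toy data are hypothetical.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=30, seed=0):
    """ratings: list of (user, item, rating) triplets with 0-based ids."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user factor vectors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item factor vectors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                 # prediction error for this rating
            pu = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])   # SGD step with L2 regularization
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# toy usage: 3 users, 4 items
data = [(0, 0, 5), (0, 1, 3), (1, 1, 4), (1, 2, 1), (2, 0, 4), (2, 3, 2)]
P, Q = train_mf(data, n_users=3, n_items=4)
print("predicted rating of user 0 for item 2:", round(float(P[0] @ Q[2]), 2))
```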

9,583 citations


Journal ArticleDOI
TL;DR: From basic techniques to the state-of-the-art, this paper attempts to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.
Abstract: As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences for other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, privacy protection, etc., and their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (that combine CF with other recommendation techniques), with examples for representative algorithms of each category, and analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state-of-the-art, we attempt to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.
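
For contrast with the model-based family above, a tiny sketch of the memory-based (user-based) branch of the survey: predict a user's rating as a Pearson-similarity-weighted combination of neighbours' deviations from their mean ratings. The data are made up for illustration; a real system would add neighbourhood selection, significance weighting, and the other refinements the survey discusses.

```python
import numpy as np

def pearson(a, b):
    """Correlation over items co-rated by both users (NaN marks a missing rating)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:
        return 0.0
    x, y = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    return float(x @ y / denom) if denom > 0 else 0.0

def predict(R, u, i):
    """Weighted deviation-from-mean prediction of user u's rating for item i."""
    mu_u = np.nanmean(R[u])
    num = den = 0.0
    for v in range(R.shape[0]):
        if v == u or np.isnan(R[v, i]):
            continue
        s = pearson(R[u], R[v])
        num += s * (R[v, i] - np.nanmean(R[v]))
        den += abs(s)
    return mu_u + num / den if den > 0 else mu_u

R = np.array([[5, 3, np.nan, 1],
              [4, np.nan, np.nan, 1],
              [1, 1, np.nan, 5],
              [np.nan, 1, 5, 4]], dtype=float)
print(round(predict(R, u=0, i=2), 2))
```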

3,406 citations


Proceedings ArticleDOI
Yehuda Koren1
28 Jun 2009
TL;DR: Two leading collaborative filtering recommendation approaches are revamped with a more sensitive temporal model that makes better distinctions between transient effects and long-term patterns.
Abstract: Customer preferences for products are drifting over time. Product perception and popularity are constantly changing as new selection emerges. Similarly, customer inclinations are evolving, leading them to ever redefine their taste. Thus, modeling temporal dynamics should be a key when designing recommender systems or general customer preference models. However, this raises unique challenges. Within the eco-system intersecting multiple products and customers, many different characteristics are shifting simultaneously, while many of them influence each other and often those shifts are delicate and associated with a few data instances. This distinguishes the problem from concept drift explorations, where mostly a single concept is tracked. Classical time-window or instance-decay approaches cannot work, as they lose too much signal when discarding data instances. A more sensitive approach is required, which can make better distinctions between transient effects and long term patterns. The paradigm we offer is creating a model tracking the time changing behavior throughout the life span of the data. This allows us to exploit the relevant components of all data instances, while discarding only what is modeled as being irrelevant. Accordingly, we revamp two leading collaborative filtering recommendation approaches. Evaluation is made on a large movie rating dataset by Netflix. Results are encouraging and better than those previously reported on this dataset.

1,621 citations


Proceedings ArticleDOI
28 Jun 2009
TL;DR: A random walk model combining the trust-based and the collaborative filtering approach for recommendation is proposed, which allows us to define and to measure the confidence of a recommendation.
Abstract: Collaborative filtering is the most popular approach to build recommender systems and has been successfully employed in many applications. However, it cannot make recommendations for so-called cold start users that have rated only a very small number of items. In addition, these methods do not know how confident they are in their recommendations. Trust-based recommendation methods assume the additional knowledge of a trust network among users and can better deal with cold start users, since users only need to be simply connected to the trust network. On the other hand, the sparsity of the user item ratings forces the trust-based approach to consider ratings of indirect neighbors that are only weakly trusted, which may decrease its precision. In order to find a good trade-off, we propose a random walk model combining the trust-based and the collaborative filtering approach for recommendation. The random walk model allows us to define and to measure the confidence of a recommendation. We performed an evaluation on the Epinions dataset and compared our model with existing trust-based and collaborative filtering methods.
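
A rough sketch of the general idea of combining a trust network with ratings through random walks: a walk starts at the target user, follows trust edges until it reaches someone who rated the item (or gives up), and many walks are averaged; the fraction of walks that return a rating can act as a crude confidence signal. The graph, ratings, and stopping rule below are hypothetical and much simpler than the model evaluated on Epinions above.

```python
import random

trust = {                                   # directed trust edges: user -> trusted users
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}
ratings = {("dave", "item42"): 4.0, ("carol", "item42"): 5.0}

def walk(source, item, stop_prob=0.15, max_steps=6):
    """One random walk; returns a rating found along trust edges, or None."""
    u = source
    for _ in range(max_steps):
        if (u, item) in ratings:
            return ratings[(u, item)]
        if not trust.get(u) or random.random() < stop_prob:
            return None
        u = random.choice(trust[u])
    return None

def predict(source, item, n_walks=2000, seed=1):
    random.seed(seed)
    found = [r for r in (walk(source, item) for _ in range(n_walks)) if r is not None]
    if not found:
        return None, 0.0
    confidence = len(found) / n_walks       # how often a walk reached a rating
    return sum(found) / len(found), confidence

print(predict("alice", "item42"))           # (predicted rating, confidence)
```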

869 citations


Proceedings ArticleDOI
19 Jul 2009
TL;DR: This work proposes a novel probabilistic factor analysis framework, which naturally fuses the users' tastes and their trusted friends' favors together, and coins the term Social Trust Ensemble to represent the formulation of the social trust restrictions on the recommender systems.
Abstract: As an indispensable technique in the field of Information Filtering, Recommender System has been well studied and developed both in academia and in industry recently. However, most of current recommender systems suffer the following problems: (1) The large-scale and sparse data of the user-item matrix seriously affect the recommendation quality. As a result, most of the recommender systems cannot easily deal with users who have made very few ratings. (2) The traditional recommender systems assume that all the users are independent and identically distributed; this assumption ignores the connections among users, which is not consistent with the real world recommendations. Aiming at modeling recommender systems more accurately and realistically, we propose a novel probabilistic factor analysis framework, which naturally fuses the users' tastes and their trusted friends' favors together. In this framework, we coin the term Social Trust Ensemble to represent the formulation of the social trust restrictions on the recommender systems. The complexity analysis indicates that our approach can be applied to very large datasets since it scales linearly with the number of observations, while the experimental results show that our method performs better than the state-of-the-art approaches.

849 citations


Proceedings ArticleDOI
Frank McSherry1, Ilya Mironov1
28 Jun 2009
TL;DR: This work considers the problem of producing recommendations from collective user behavior while simultaneously providing guarantees of privacy for these users, and finds that several of the leading approaches in the Netflix Prize competition can be adapted to provide differential privacy, without significantly degrading their accuracy.
Abstract: We consider the problem of producing recommendations from collective user behavior while simultaneously providing guarantees of privacy for these users. Specifically, we consider the Netflix Prize data set, and its leading algorithms, adapted to the framework of differential privacy. Unlike prior privacy work concerned with cryptographically securing the computation of recommendations, differential privacy constrains a computation in a way that precludes any inference about the underlying records from its output. Such algorithms necessarily introduce uncertainty--i.e., noise--to computations, trading accuracy for privacy. We find that several of the leading approaches in the Netflix Prize competition can be adapted to provide differential privacy, without significantly degrading their accuracy. To adapt these algorithms, we explicitly factor them into two parts, an aggregation/learning phase that can be performed with differential privacy guarantees, and an individual recommendation phase that uses the learned correlations and an individual's data to provide personalized recommendations. The adaptations are non-trivial, and involve both careful analysis of the per-record sensitivity of the algorithms to calibrate noise, as well as new post-processing steps to mitigate the impact of this noise. We measure the empirical trade-off between accuracy and privacy in these adaptations, and find that we can provide non-trivial formal privacy guarantees while still outperforming the Cinematch baseline Netflix provides.
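
A minimal sketch of the aggregation-phase idea in the abstract: publish per-item statistics (clamped, centred rating sums and counts) with Laplace noise scaled to their sensitivity, and build any later per-user recommendation step only from these noisy aggregates. The clamping range, budget split, and data below are assumptions for illustration; the paper's actual mechanisms (noisy covariance matrices, per-record sensitivity analysis, post-processing) are considerably more involved.

```python
import numpy as np

def noisy_item_means(ratings, n_items, epsilon=1.0, lo=1.0, hi=5.0, seed=0):
    """ratings: (user, item, rating) triplets, at most one per (user, item).
    Adding or removing one rating changes an item's centred sum by at most
    (hi - lo) / 2 and its count by 1, so the budget is split between the two."""
    rng = np.random.default_rng(seed)
    sums, counts = np.zeros(n_items), np.zeros(n_items)
    mid = (lo + hi) / 2.0
    for _, i, r in ratings:
        sums[i] += np.clip(r, lo, hi) - mid          # centre so values lie in [-(hi-lo)/2, (hi-lo)/2]
        counts[i] += 1.0
    sums += rng.laplace(0.0, ((hi - lo) / 2.0) / (epsilon / 2), n_items)
    counts += rng.laplace(0.0, 1.0 / (epsilon / 2), n_items)
    return mid + sums / np.maximum(counts, 1.0)      # noisy per-item mean rating

data = [(0, 0, 5), (1, 0, 4), (2, 0, 5), (0, 1, 2), (1, 1, 1)]
print(noisy_item_means(data, n_items=2))
```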

750 citations


Proceedings ArticleDOI
19 Jul 2009
TL;DR: This work created a collaborative recommendation system that effectively adapts to the personal information needs of each user, and adopts the generic framework of Random Walk with Restarts in order to provide a more natural and efficient way to represent social networks.
Abstract: Social network systems, like last.fm, play a significant role in Web 2.0, containing large amounts of multimedia-enriched data that are enhanced both by explicit user-provided annotations and implicit aggregated feedback describing the personal preferences of each user. It is also a common tendency for these systems to encourage the creation of virtual networks among their users by allowing them to establish bonds of friendship and thus provide a novel and direct medium for the exchange of data. We investigate the role of these additional relationships in developing a track recommendation system. Taking into account both the social annotation and friendships inherent in the social graph established among users, items and tags, we created a collaborative recommendation system that effectively adapts to the personal information needs of each user. We adopt the generic framework of Random Walk with Restarts in order to provide a more natural and efficient way to represent social networks. In this work we collected a sufficiently representative portion of the music social network last.fm, capturing explicitly expressed bonds of friendship of the users as well as social tags. We performed a series of comparison experiments between the Random Walk with Restarts model and a user-based collaborative filtering method using the Pearson Correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard collaborative filtering method.
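
A toy sketch of Random Walk with Restarts on a mixed user/item/tag graph, in the spirit of the system described above: a personalised ranking vector is computed for the query user by power iteration, and item nodes are ranked by their stationary probability. The graph and node names are hypothetical and tiny; a real system would use the full last.fm graph and sparse matrices.

```python
import numpy as np

nodes = ["user:ann", "user:bob", "item:songA", "item:songB", "tag:rock"]
edges = [("user:ann", "item:songA"), ("user:ann", "tag:rock"),
         ("user:bob", "item:songA"), ("user:bob", "item:songB"),
         ("item:songB", "tag:rock"), ("user:ann", "user:bob")]   # last edge = friendship

idx = {n: i for i, n in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)))
for a, b in edges:                                   # undirected adjacency
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0
T = A / A.sum(axis=0, keepdims=True)                 # column-stochastic transition matrix

def rwr(start, restart=0.15, iters=100):
    r = np.zeros(len(nodes)); r[idx[start]] = 1.0    # restart distribution (the query user)
    p = r.copy()
    for _ in range(iters):
        p = (1 - restart) * T @ p + restart * r      # power iteration
    return p

scores = rwr("user:ann")
items = sorted((n for n in nodes if n.startswith("item:")), key=lambda n: -scores[idx[n]])
print(items)   # songA is already known; songB is the new candidate reached via friend and tag links
```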

633 citations


Journal Article
TL;DR: This paper reviews the proper construction of offline experiments for deciding on the most appropriate algorithm, and discusses three important tasks of recommender systems, and classify a set of appropriate well known evaluation metrics for each task.
Abstract: Recommender systems are now popular both commercially and in the research community, where many algorithms have been suggested for providing recommendations. These algorithms typically perform differently in various domains and tasks. Therefore, it is important from the research perspective, as well as from a practical view, to be able to decide on an algorithm that matches the domain and the task of interest. The standard way to make such decisions is by comparing a number of algorithms offline using some evaluation metric. Indeed, many evaluation metrics have been suggested for comparing recommendation algorithms. The decision on the proper evaluation metric is often critical, as each metric may favor a different algorithm. In this paper we review the proper construction of offline experiments for deciding on the most appropriate algorithm. We discuss three important tasks of recommender systems, and classify a set of appropriate well known evaluation metrics for each task. We demonstrate how using an improper evaluation metric can lead to the selection of an improper algorithm for the task of interest. We also discuss other important considerations when designing offline experiments.
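
A small illustration of why the metric choice discussed above matters: the same system can be scored with an error metric (RMSE, natural for rating prediction) or a ranking metric (precision@k, natural for top-N recommendation), and the two need not favour the same algorithm. Definitions are the standard ones; the numbers are hypothetical.

```python
import math

def rmse(pairs):
    """pairs: list of (true_rating, predicted_rating)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items the user actually liked."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0), (1.0, 2.0)]
print("RMSE:", round(rmse(pairs), 3))
print("P@3:", precision_at_k(["i7", "i2", "i9", "i4"], relevant={"i2", "i4"}, k=3))
```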

580 citations


Journal Article
TL;DR: This work proposes various scalable matrix factorization (MF) based techniques, together with a neighbor correction method, and validates them against the Netflix Prize data set, currently the largest publicly available collection of user ratings.
Abstract: Collaborative filtering (CF) using known user ratings of items has proved to be effective for predicting user preferences in item selection. This thriving subfield of machine learning became popular in the late 1990s with the spread of online services that use recommender systems, such as Amazon, Yahoo! Music, and Netflix. CF approaches are usually designed to work on very large data sets. Therefore the scalability of the methods is crucial. In this work, we propose various scalable solutions that are validated against the Netflix Prize data set, currently the largest publicly available collection. First, we propose various matrix factorization (MF) based techniques. Second, a neighbor correction method for MF is outlined, which alloys the global perspective of MF and the localized property of neighbor based approaches efficiently. In the experimentation section, we first report on some implementation issues, and we suggest how parameter optimization can be performed efficiently for MFs. We then show that the proposed scalable approaches compare favorably with existing ones in terms of prediction accuracy and/or required training time. Finally, we report on some experiments performed on MovieLens and Jester data sets.

491 citations


Proceedings ArticleDOI
Jilin Chen1, Werner Geyer2, Casey Dugan2, Michael Muller2, Ido Guy2 
04 Apr 2009
TL;DR: Algorithms based on social network information were able to produce better-received recommendations and find more known contacts for users, while algorithms using similarity of user-created content were stronger in discovering new friends.
Abstract: This paper studies people recommendations designed to help users find known, offline contacts and discover new friends on social networking sites. We evaluated four recommender algorithms in an enterprise social networking site using a personalized survey of 500 users and a field study of 3,000 users. We found all algorithms effective in expanding users' friend lists. Algorithms based on social network information were able to produce better-received recommendations and find more known contacts for users, while algorithms using similarity of user-created content were stronger in discovering new friends. We also collected qualitative feedback from our survey users and draw several meaningful design implications.

487 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine the effect of recommender systems on the diversity of sales and show that it is possible for individual-level diversity to increase but aggregate diversity to decrease.
Abstract: This paper examines the effect of recommender systems on the diversity of sales. Two anecdotal views exist about such effects. Some believe recommenders help consumers discover new products and thus increase sales diversity. Others believe recommenders only reinforce the popularity of already-popular products. This paper seeks to reconcile these seemingly incompatible views. We explore the question in two ways. First, modeling recommender systems analytically allows us to explore their path-dependent effects. Second, turning to simulation, we increase the realism of our results by combining choice models with actual implementations of recommender systems. We arrive at three main results. First, some well-known recommenders can lead to a reduction in sales diversity. Because common recommenders (e.g., collaborative filters) recommend products based on sales and ratings, they cannot recommend products with limited historical data, even if they would be rated favorably. In turn, these recommenders can create a rich-get-richer effect for popular products and vice versa for unpopular ones. This bias toward popularity can prevent what may otherwise be better consumer-product matches. That diversity can decrease is surprising to consumers who express that recommendations have helped them discover new products. In line with this, result two shows that it is possible for individual-level diversity to increase but aggregate diversity to decrease. Recommenders can push each person to new products, but they often push users toward the same products. Third, we show how basic design choices affect the outcome, and thus managers can choose recommender designs that are more consistent with their sales goals and consumers' preferences.
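
A toy numerical illustration of the paper's second result, the individual-versus-aggregate distinction: per-user diversity can rise while aggregate diversity falls, because everyone is pushed toward the same popular item. The purchase data are invented purely to show the two measures moving in opposite directions.

```python
def individual_diversity(purchases):
    """Average number of distinct items per user."""
    return sum(len(set(items)) for items in purchases.values()) / len(purchases)

def aggregate_diversity(purchases):
    """Number of distinct items sold across all users."""
    return len(set().union(*purchases.values()))

before = {"u1": ["a", "b"], "u2": ["c"], "u3": ["d", "e"]}
after = {"u1": ["a", "b", "hit"], "u2": ["c", "hit"], "u3": ["hit", "a"]}  # with recommender

for label, data in [("before", before), ("after", after)]:
    print(label, "individual:", round(individual_diversity(data), 2),
          "aggregate:", aggregate_diversity(data))
# individual diversity rises (1.67 -> 2.33) while aggregate diversity falls (5 -> 4)
```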

Proceedings ArticleDOI
Seung-Taek Park1, Wei Chu2
23 Oct 2009
TL;DR: Predictive feature-based regression models are proposed that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems and scale efficiently as a linear function of the number of observations.
Abstract: Recommender systems are widely used in online e-commerce applications to improve user engagement and then to increase revenue. A key challenge for recommender systems is providing high quality recommendation to users in "cold-start" situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting algorithms scale efficiently as a linear function of the number of observations. We verify the usefulness of our approach in three cold-start settings on the MovieLens and EachMovie datasets, by comparing with five alternatives including random, most popular, segmented most popular, and two variations of Vibes affinity algorithm widely used at Yahoo! for recommendation.

Proceedings ArticleDOI
20 Apr 2009
TL;DR: Algorithms combining tags with recommenders may deliver both the automation inherent in recommenders, and the flexibility and conceptual comprehensibility inherent in tagging systems, and they may lead to flexible recommender systems that leverage the characteristics of items users find most important.
Abstract: Tagging has emerged as a powerful mechanism that enables users to find, organize, and understand online entities. Recommender systems similarly enable users to efficiently navigate vast collections of items. Algorithms combining tags with recommenders may deliver both the automation inherent in recommenders, and the flexibility and conceptual comprehensibility inherent in tagging systems. In this paper we explore tagommenders, recommender algorithms that predict users' preferences for items based on their inferred preferences for tags. We describe tag preference inference algorithms based on users' interactions with tags and movies, and evaluate these algorithms based on tag preference ratings collected from 995 MovieLens users. We design and evaluate algorithms that predict users' ratings for movies based on their inferred tag preferences. Our tag-based algorithms generate better recommendation rankings than state-of-the-art algorithms, and they may lead to flexible recommender systems that leverage the characteristics of items users find most important.
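
A simplified sketch of the two-stage "tagommender" idea: infer a user's preference for each tag from the ratings of movies carrying that tag, then score an unseen movie by averaging the user's inferred preferences for its tags. The inference rule, movie/tag data, and default value are illustrative assumptions, not the paper's actual algorithms.

```python
from collections import defaultdict

movie_tags = {"Alien": ["sci-fi", "horror"], "Heat": ["crime", "thriller"],
              "Moon": ["sci-fi"], "Se7en": ["crime", "thriller", "dark"]}
user_ratings = {"Alien": 5.0, "Heat": 2.0, "Moon": 4.5}

def infer_tag_preferences(ratings, tags_of):
    """Mean rating of the movies on which the user encountered each tag."""
    sums, counts = defaultdict(float), defaultdict(int)
    for movie, r in ratings.items():
        for tag in tags_of[movie]:
            sums[tag] += r
            counts[tag] += 1
    return {tag: sums[tag] / counts[tag] for tag in sums}

def predict(movie, tag_prefs, tags_of, default=3.0):
    """Score an unseen movie from the inferred preferences for its tags."""
    known = [tag_prefs[t] for t in tags_of[movie] if t in tag_prefs]
    return sum(known) / len(known) if known else default

prefs = infer_tag_preferences(user_ratings, movie_tags)
print(prefs)                                           # high preference for 'sci-fi', low for 'crime'
print("Se7en:", predict("Se7en", prefs, movie_tags))   # scored from 'crime'/'thriller' preferences
```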

Journal Article
TL;DR: This paper aims to compute on-line automatic recommendations to an active learner based on his/her recent navigation history, as well as exploiting similarities and dissimilarities among user preferences and among the contents of the learning resources.
Abstract: Introduction. Up to very recent years, most e-learning systems have not been personalized. Several works have addressed the need for personalization in the e-learning domain. However, even today, personalization systems are still mostly confined to research labs, and most of the current e-learning platforms are still delivering the same educational resources in the same way to learners with different profiles. In general, to enable personalization, existing systems used one or more types of knowledge (learners' knowledge, learning material knowledge, learning process knowledge, etc.). Generally, personalization in e-learning systems concerns: adaptive interaction, adaptive course delivery, content discovery and assembly, and adaptive collaboration support. The category of adaptive course delivery represents the most common and widely used collection of adaptation techniques applied in e-learning systems today. Typical examples include dynamic course re-structuring and adaptive selection of learning objects, as well as adaptive navigation support, which have all benefited from the rise of using recommendation strategies to generate new and relevant links and items. In fact, one of the new forms of personalization in the e-learning environment is to give recommendations to learners in order to support and help them through the e-learning process. A number of personalized systems have relied on explicit information given by a learner (demographic, questionnaire, etc.) and have applied known methods and techniques of adapting the presentation and navigation (Chorfi et al., 2004). As explained in (Brusilovsky, 1996), two different classes of adaptation can be considered: adaptive presentation and adaptive navigation support. Later, in (Brusilovsky, 2001), the taxonomy of adaptive hypermedia technologies was updated to add some extensions in relation with new technologies. Then, the distinction between two modes of adaptive navigation support became a necessity, especially with the growth of recommender systems. Automatic recommendation implies that the user profiles are created and eventually maintained dynamically by the system without explicit user information. Examples include amazon.com's personalized recommendations and music recommenders like Mystrand.com in commercial systems (Mobasher, 2006), smart recommenders in e-learning (Zaiane, 2002), etc. In general, such systems differ in the input data, in user modeling strategies, and in prediction techniques. Several approaches for automatic personalization have been reported in the literature, such as content-based or item-based filtering, collaborative filtering, rule-based filtering, and techniques relying on Web usage mining, etc. (Nasraoui, 2005). Web recommender systems can be categorized depending on these approaches. Content-based filtering (or item-based filtering) systems recommend items to a given user based on the correlation between the content of these items and the preferences of the user (Meteren et al., 2000). This means that the recommended items are considered to be similar to those seen and liked by the same user in the past. Thus, there is no notion of a community of users; rather, only one user profile is considered while making recommendations. Classical examples of systems applying the content-based filtering approach include, among others, Personal WebWatcher (Mladenic, 1996) and Syskill & Webert (Pazzani et al., 1997). Collaborative filtering systems recommend items that are liked by other users with similar interests. Thus, the exploration of new items is assured by the fact that other similar user profiles are also considered. Examples of such systems include GroupLens (Konstan et al., 1997) and (Sarwar et al., 1998). Hybrid recommender systems combine several recommendation strategies to provide better performance than either strategy alone. Most hybrids work by combining several input data sources or several recommendation strategies …

Proceedings ArticleDOI
23 Oct 2009
TL;DR: This paper presents the first study of the effect of non-random missing data on collaborative ranking, and extends the previous results regarding the impact of non-random missing data on collaborative prediction.
Abstract: A fundamental aspect of rating-based recommender systems is the observation process, the process by which users choose the items they rate. Nearly all research on collaborative filtering and recommender systems is founded on the assumption that missing ratings are missing at random. The statistical theory of missing data shows that incorrect assumptions about missing data can lead to biased parameter estimation and prediction. In a recent study, we demonstrated strong evidence for violations of the missing at random condition in a real recommender system. In this paper we present the first study of the effect of non-random missing data on collaborative ranking, and extend our previous results regarding the impact of non-random missing data on collaborative prediction.

Journal ArticleDOI
TL;DR: To achieve this objective, some new equations are designed in the nucleus of the memory-based collaborative filtering, in such a way that the existing equations are extended to collect and process the information relative to the scores obtained by each user in a variable number of level tests.
Abstract: In the context of e-learning recommender systems, we propose that the users with greater knowledge (for example, those who have obtained better results in various tests) have greater weight in the calculation of the recommendations than the users with less knowledge. To achieve this objective, we have designed some new equations in the nucleus of the memory-based collaborative filtering, in such a way that the existing equations are extended to collect and process the information relative to the scores obtained by each user in a variable number of level tests.
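
A minimal sketch of the weighting idea described above: in a memory-based prediction, each neighbour's contribution is scaled not only by similarity but also by a knowledge score derived from level tests, so more knowledgeable users carry more weight. The specific weighting (a simple product) and all data are hypothetical, not the paper's extended equations.

```python
import numpy as np

def predict_weighted(R, sim, knowledge, u, i):
    """R: ratings with NaN for missing; sim[u][v]: user-user similarity;
    knowledge[v] in [0, 1]: normalised level-test score of user v."""
    num = den = 0.0
    for v in range(R.shape[0]):
        if v == u or np.isnan(R[v, i]):
            continue
        w = sim[u][v] * knowledge[v]          # knowledgeable neighbours count more
        num += w * R[v, i]
        den += abs(w)
    return num / den if den > 0 else np.nanmean(R[u])

R = np.array([[4, np.nan], [5, 2], [1, 5]], dtype=float)
sim = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.1], [0.6, 0.1, 1.0]])
knowledge = np.array([0.5, 0.9, 0.3])
print(round(predict_weighted(R, sim, knowledge, u=0, i=1), 2))
```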

Proceedings ArticleDOI
Wei Chu1, Seung-Taek Park1
20 Apr 2009
TL;DR: This work proposes a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively and results in an offline model with light computational overhead compared with other recommender systems that require online re-training.
Abstract: In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of timely identifying new items of high-quality and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. We maintain profiles of content of interest, in which temporal characteristics of the content, e.g. popularity and freshness, are updated in real-time manner. We also maintain profiles of users including demographic information and a summary of user activities within Yahoo! properties. Based on all features in user and content profiles, we develop predictive bilinear regression models to provide accurate personalized recommendations of new items for both existing and new users. This approach results in an offline model with light computational overhead compared with other recommender systems that require online re-training. The proposed framework is general and flexible for other personalized tasks. The superior performance of our approach is verified on a large-scale data set collected from the Today-Module on Yahoo! Front Page, with comparison against six competitive approaches.
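
A minimal sketch of a feature-based bilinear scorer of the kind the abstract describes: the affinity between a user with feature vector x and an item with feature vector z is modelled as x^T W z, and W is fitted by ridge regression on observed ratings, so brand-new users and items can be scored from their features alone. The features, closed-form fit, and toy data are illustrative assumptions; the paper's models and online training are more elaborate.

```python
import numpy as np

def fit_bilinear(X, Z, y, reg=0.1):
    """X: (n, du) user features per example; Z: (n, di) item features per example;
    y: observed ratings. Fits W in y ~ x^T W z via ridge regression on outer products."""
    F = np.einsum("nd,ne->nde", X, Z).reshape(len(y), -1)   # outer product of x and z, flattened
    W = np.linalg.solve(F.T @ F + reg * np.eye(F.shape[1]), F.T @ y)
    return W.reshape(X.shape[1], Z.shape[1])

def score(W, x, z):
    return float(x @ W @ z)

# toy features: user = [bias, under_30, likes_action]; item = [bias, is_action, is_drama]
X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0]], dtype=float)
Z = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1]], dtype=float)
y = np.array([5.0, 2.0, 4.0, 3.0])
W = fit_bilinear(X, Z, y)
print(round(score(W, np.array([1.0, 1.0, 1.0]), np.array([1.0, 1.0, 0.0])), 2))  # cold-start pair
```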

Proceedings ArticleDOI
29 Jun 2009
TL;DR: It is shown that the optimal strategy is different from the fixed one, and supports more effective and efficient interaction sessions, and allows conversational systems to autonomously improve a fixed strategy and eventually learn a better one using reinforcement learning techniques.
Abstract: Conversational recommender systems (CRSs) assist online users in their information-seeking and decision making tasks by supporting an interactive process. Although these processes could be rather diverse, CRSs typically follow a fixed strategy, e.g., based on critiquing or on iterative query reformulation. In a previous paper, we proposed a novel recommendation model that allows conversational systems to autonomously improve a fixed strategy and eventually learn a better one using reinforcement learning techniques. This strategy is optimal for the given model of the interaction and it is adapted to the users' behaviors. In this paper we validate our approach in an online CRS by means of a user study involving several hundreds of testers. We show that the optimal strategy is different from the fixed one, and supports more effective and efficient interaction sessions.

01 Jan 2009
TL;DR: This work introduces a new context-aware recommendation approach called user micro-profiling, which splits each single user profile into several possibly overlapping sub-profiles, each representing users in particular contexts.
Abstract: Context-aware recommender systems (CARS) aim at improving users' satisfaction by tailoring recommendations to each particular context. In this work we propose a contextual pre-filtering technique based on implicit user feedback. We introduce a new context-aware recommendation approach called user micro-profiling. We split each single user profile into several possibly overlapping sub-profiles, each representing users in particular contexts. The predictions are done using these micro-profiles instead of a single user model. The users' taste can depend on the exact partition of the contextual variable. The identification of a meaningful partition of the users' profile and its evaluation is a non-trivial task, especially when using implicit feedback and a continuous contextual domain. We propose an off-line evaluation procedure for CARS in these conditions and evaluate our approach on a time-aware music recommendation system.

Proceedings ArticleDOI
19 Jul 2009
TL;DR: This paper presents a systematic study of the effectiveness of five variant sources of contextual information for user interest modeling, and demonstrates that context overlap outperforms any isolated source.
Abstract: Search and recommendation systems must include contextual information to effectively model users' interests. In this paper, we present a systematic study of the effectiveness of five variant sources of contextual information for user interest modeling. Post-query navigation and general browsing behaviors far outweigh direct search engine interaction as an information-gathering activity. Therefore we conducted this study with a focus on Website recommendations rather than search results. The five contextual information sources used are: social, historic, task, collection, and user interaction. We evaluate the utility of these sources, and overlaps between them, based on how effectively they predict users' future interests. Our findings demonstrate that the sources perform differently depending on the duration of the time window used for future prediction, and that context overlap outperforms any isolated source. Designers of Website suggestion systems can use our findings to provide improved support for post-query navigation and general browsing behaviors.

Proceedings ArticleDOI
20 Apr 2009
TL;DR: This work presents a probabilistic model for generating personalised recommendations of items to users of a web service; the model is capable of incrementally taking account of new data, so the system can immediately reflect the latest user preferences, and training it using the on-line ADF approach yields state-of-the-art performance.
Abstract: We present a probabilistic model for generating personalised recommendations of items to users of a web service. The Matchbox system makes use of content information in the form of user and item meta data in combination with collaborative filtering information from previous user behavior in order to predict the value of an item for a user. Users and items are represented by feature vectors which are mapped into a low-dimensional 'trait space' in which similarity is measured in terms of inner products. The model can be trained from different types of feedback in order to learn user-item preferences. Here we present three alternatives: direct observation of an absolute rating each user gives to some items, observation of a binary preference (like/don't like) and observation of a set of ordinal ratings on a user-specific scale. Efficient inference is achieved by approximate message passing involving a combination of Expectation Propagation (EP) and Variational Message Passing. We also include a dynamics model which allows an item's popularity, a user's taste or a user's personal rating scale to drift over time. By using Assumed-Density Filtering (ADF) for training, the model requires only a single pass through the training data. This is an on-line learning algorithm capable of incrementally taking account of new data so the system can immediately reflect the latest user preferences. We evaluate the performance of the algorithm on the MovieLens and Netflix data sets consisting of approximately 1,000,000 and 100,000,000 ratings respectively. This demonstrates that training the model using the on-line ADF approach yields state-of-the-art performance with the option of improving performance further if computational resources are available by performing multiple EP passes over the training data.

Book
24 May 2009
TL;DR: A generic user modeling data representation model is provided, which demonstrates its compatibility with existing recommendation techniques, and allows improving the quality of the recommendations provided to the users in certain conditions.
Abstract: Provision of personalized recommendations to users requires accurate modeling of their interests and needs. This work proposes a general framework and specific methodologies for enhancing the accuracy of user modeling in recommender systems by importing and integrating data collected by other recommender systems. Such a process is defined as user models mediation. The work discusses the details of such a generic user modeling mediation framework. It provides a generic user modeling data representation model, demonstrates its compatibility with existing recommendation techniques, and discusses the general steps of the mediation. Specifically, four major types of mediation are presented: cross-user, cross-item, cross-context, and cross-representation. Finally, the work reports the application of the mediation framework and illustrates it with practical mediation scenarios. Evaluations of these scenarios demonstrate the potential benefits of user modeling data mediation, as in certain conditions it allows improving the quality of the recommendations provided to the users.

Book ChapterDOI
01 Sep 2009
TL;DR: This paper presents a user study aimed at quantifying the noise in user ratings that is due to inconsistencies, and analyzes how factors such as item sorting and time of rating affect this noise.
Abstract: Recent growing interest in predicting and influencing consumer behavior has generated a parallel increase in research efforts on Recommender Systems. Many of the state-of-the-art Recommender Systems algorithms rely on obtaining user ratings in order to later predict unknown ratings. An underlying assumption in this approach is that the user ratings can be treated as ground truth of the user's taste. However, users are inconsistent in giving their feedback, thus introducing an unknown amount of noise that challenges the validity of this assumption. In this paper, we tackle the problem of analyzing and characterizing the noise in user feedback through ratings of movies. We present a user study aimed at quantifying the noise in user ratings that is due to inconsistencies. We measure RMSE values that range from 0.557 to 0.8156. We also analyze how factors such as item sorting and time of rating affect this noise.

Proceedings ArticleDOI
23 Oct 2009
TL;DR: This paper proposes a factor analysis-based optimization framework to incorporate the user trust and distrust relationships into the recommender systems; the experimental results show that the distrust relations among users are as important as the trust relations.
Abstract: With the exponential growth of Web contents, Recommender System has become indispensable for discovering new information that might interest Web users. Despite their success in the industry, traditional recommender systems suffer from several problems. First, the sparseness of the user-item matrix seriously affects the recommendation quality. Second, traditional recommender systems ignore the connections among users, which loses the opportunity to provide more accurate and personalized recommendations. In this paper, aiming at providing more realistic and accurate recommendations, we propose a factor analysis-based optimization framework to incorporate the user trust and distrust relationships into the recommender systems. The contributions of this paper are three-fold: (1) We elaborate how user distrust information can benefit the recommender systems. (2) In terms of the trust relations, distinct from previous trust-aware recommender systems which are based on some heuristics, we systematically interpret how to constrain the objective function with trust regularization. (3) The experimental results show that the distrust relations among users are as important as the trust relations. The complexity analysis shows our method scales linearly with the number of observations, while the empirical analysis on a large Epinions dataset proves that our approaches perform better than the state-of-the-art approaches.
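
A minimal sketch of the general shape of socially regularised matrix factorization: besides fitting observed ratings, the objective pulls a user's latent factors towards those of trusted users (a corresponding push-away term for distrusted users could be added with the opposite sign). The objective, learning rates, and data are assumptions for illustration and differ from the paper's formulation and optimisation.

```python
import numpy as np

def train(ratings, trust_pairs, n_users, n_items, k=4, lr=0.02, reg=0.05,
          alpha=0.1, epochs=50, seed=0):
    """ratings: (u, i, r) triplets; trust_pairs: (u, v) meaning u trusts v."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:                       # rating-fit term
            err = r - U[u] @ V[i]
            uu = U[u].copy()
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * uu - reg * V[i])
        for u, v in trust_pairs:                      # trust regularisation term
            U[u] -= lr * alpha * (U[u] - U[v])        # pull u towards the trusted user v
    return U, V

ratings = [(0, 0, 5), (0, 1, 4), (1, 1, 5), (2, 0, 1), (2, 2, 5)]
trust = [(1, 0)]                                      # user 1 trusts user 0
U, V = train(ratings, trust, n_users=3, n_items=3)
print("user 1 on item 0:", round(float(U[1] @ V[0]), 2))   # nudged towards user 0's taste
```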

Proceedings ArticleDOI
24 Mar 2009
TL;DR: This paper develops efficient diversification algorithms built upon the notion of explanation-based diversity and demonstrates their efficiency and effectiveness in diversification on two real life data sets: del.icio.us and Yahoo! Movies.
Abstract: Recommendations in collaborative tagging sites such as del.icio.us and Yahoo! Movies, are becoming increasingly important, due to the proliferation of general queries on those sites and the ineffectiveness of the traditional search paradigm to address those queries. Regardless of the underlying recommendation strategy, item-based or user-based, one of the key concerns in producing recommendations, is over-specialization, which results in returning items that are too homogeneous. Traditional solutions rely on post-processing returned items to identify those which differ in their attribute values (e.g., genre and actors for movies). Such approaches are not always applicable when intrinsic attributes are not available (e.g., URLs in del.icio.us). In a recent paper [20], we introduced the notion of explanation-based diversity and formalized the diversification problem as a compromise between accuracy and diversity. In this paper, we develop efficient diversification algorithms built upon this notion. The algorithms explore compromises between accuracy and diversity. We demonstrate their efficiency and effectiveness in diversification on two real life data sets: del.icio.us and Yahoo! Movies.

Proceedings ArticleDOI
23 Oct 2009
TL;DR: This strategy provides analysts and companies with a practical suggestion on how to pick a good pre- or post-filtering approach in an effective manner to improve performance of a context-aware recommender system.
Abstract: Recently, methods for generating context-aware recommendations were classified into the pre-filtering, post-filtering and contextual modeling approaches. Although some of these methods have been studied independently, no prior research compared the performance of these methods to determine which of them is better than the others. This paper focuses on comparing the pre-filtering and the post-filtering approaches and identifying which method dominates the other and under which circumstances. Since there are no clear winners in this comparison, we propose an alternative more effective method of selecting the winners in the pre- vs. the post-filtering comparison. This strategy provides analysts and companies with a practical suggestion on how to pick a good pre- or post-filtering approach in an effective manner to improve performance of a context-aware recommender system.

Proceedings ArticleDOI
06 Nov 2009
TL;DR: It is shown that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations and the opinion mining based features perform significantly better than the baseline, which is based on star ratings and genre information only.
Abstract: In this paper we show that the extraction of opinions from free-text reviews can improve the accuracy of movie recommendations. We present three approaches to extract movie aspects as opinion targets and use them as features for the collaborative filtering. Each of these approaches requires different amounts of manual interaction. We collected a data set of reviews with corresponding ordinal (star) ratings of several thousand movies to evaluate the different features for the collaborative filtering. We employ a state-of-the-art collaborative filtering engine for the recommendations during our evaluation and compare the performance with and without using the features representing user preferences mined from the free-text reviews provided by the users. The opinion mining based features perform significantly better than the baseline, which is based on star ratings and genre information only.

Proceedings ArticleDOI
20 Apr 2009
TL;DR: Empirical comparisons show that LDA performs consistently better than ARM for the community recommendation task when recommending a list of 4 or more communities, however, for recommendation lists of up to 3 communities, ARM is still a bit better.
Abstract: Users of social networking services can connect with each other by forming communities for online interaction. Yet as the number of communities hosted by such websites grows over time, users have even greater need for effective community recommendations in order to meet more users. In this paper, we investigate two algorithms from very different domains and evaluate their effectiveness for personalized community recommendation. First is association rule mining (ARM), which discovers associations between sets of communities that are shared across many users. Second is latent Dirichlet allocation (LDA), which models user-community co-occurrences using latent aspects. In comparing LDA with ARM, we are interested in discovering whether modeling low-rank latent structure is more effective for recommendations than directly mining rules from the observed data. We experiment on an Orkut data set consisting of 492,104 users and 118,002 communities. Our empirical comparisons using the top-k recommendations metric show that LDA performs consistently better than ARM for the community recommendation task when recommending a list of 4 or more communities. However, for recommendation lists of up to 3 communities, ARM is still a bit better. We analyze examples of the latent information learned by LDA to explain this finding. To efficiently handle the large-scale data set, we parallelize LDA on distributed computers and demonstrate our parallel implementation's scalability with varying numbers of machines.
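
A toy sketch of the association-rule side of the comparison above: mine simple {c1} -> c2 rules from users' community memberships and recommend the highest-confidence consequents for communities the user has already joined. The support threshold and memberships are invented; the paper's ARM and LDA experiments run on hundreds of thousands of users and communities.

```python
from collections import defaultdict
from itertools import permutations

memberships = {
    "u1": {"jazz", "vinyl", "coffee"},
    "u2": {"jazz", "vinyl"},
    "u3": {"jazz", "coffee"},
    "u4": {"hiking", "coffee"},
}

def mine_rules(memberships, min_support=2):
    """Confidence of {a} -> b for community pairs co-joined by enough users."""
    pair_count, single_count = defaultdict(int), defaultdict(int)
    for comms in memberships.values():
        for c in comms:
            single_count[c] += 1
        for a, b in permutations(comms, 2):
            pair_count[(a, b)] += 1
    return {(a, b): n / single_count[a]
            for (a, b), n in pair_count.items() if n >= min_support}

def recommend(user_comms, rules, top_n=2):
    scores = defaultdict(float)
    for (a, b), conf in rules.items():
        if a in user_comms and b not in user_comms:
            scores[b] = max(scores[b], conf)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

rules = mine_rules(memberships)
print(recommend({"jazz"}, rules))   # e.g. ['vinyl', 'coffee']
```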

Proceedings ArticleDOI
23 Oct 2009
TL;DR: This paper proposes a novel framework, called tag informed collaborative filtering (TagiCoFi), to seamlessly integrate tagging information into the CF procedure, and demonstrates that TagiCoFi outperforms its counterpart which discards the tagging information even when it is available, and achieves state-of-the-art performance.
Abstract: Besides the rating information, an increasing number of modern recommender systems also allow the users to add personalized tags to the items. Such tagging information may provide very useful information for item recommendation, because the users' interests in items can be implicitly reflected by the tags that they often use. Although some content-based recommender systems have made preliminary attempts recently to utilize tagging information to improve the recommendation performance, few recommender systems based on collaborative filtering (CF) have employed tagging information to help the item recommendation procedure. In this paper, we propose a novel framework, called tag informed collaborative filtering (TagiCoFi), to seamlessly integrate tagging information into the CF procedure. Experimental results demonstrate that TagiCoFi outperforms its counterpart which discards the tagging information even when it is available, and achieves state-of-the-art performance.

Journal ArticleDOI
TL;DR: The paper presents a fuzzy set theoretic method (FTM) for recommender systems that handles the non-stochastic uncertainty induced from subjectivity, vagueness and imprecision in the data, and the domain knowledge and the task under consideration.