Book ChapterDOI

REFEREE: an open framework for practical testing of recommender systems using ResearchIndex

TL;DR: REFEREE, a framework for building recommender systems using ResearchIndex, is created so that anyone in the research community can develop, deploy, and evaluate recommender systems relatively easily and quickly.
Abstract: Automated recommendation (e.g., personalized product recommendation on an ecommerce web site) is an increasingly valuable service associated with many databases--typically online retail catalogs and web logs. Currently, a major obstacle for evaluating recommendation algorithms is the lack of any standard, public, real-world testbed appropriate for the task. In an attempt to fill this gap, we have created REFEREE, a framework for building recommender systems using ResearchIndex--a huge online digital library of computer science research papers--so that anyone in the research community can develop, deploy, and evaluate recommender systems relatively easily and quickly. ResearchIndex is in many ways ideal for evaluating recommender systems, especially so-called hybrid recommenders that combine information filtering and collaborative filtering techniques. The documents in the database are associated with a wealth of content information (author, title, abstract, full text) and collaborative information (user behaviors), as well as linkage information via the citation structure. Our framework supports more realistic evaluation metrics that assess user buy-in directly, rather than resorting to offline metrics like prediction accuracy that may have little to do with end user utility. The sheer scale of ResearchIndex (over 500,000 documents with thousands of user accesses per hour) will force algorithm designers to make real-world trade-offs that consider performance, not just accuracy. We present our own tradeoff decisions in building an example hybrid recommender called PD-Live. The algorithm uses content-based similarity information to select a set of documents from which to recommend, and collaborative information to rank the documents. PD-Live performs reasonably well compared to other recommenders in ResearchIndex.
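
The two-stage idea described for PD-Live (content similarity to select candidates, collaborative information to rank them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bag-of-words cosine similarity, the co-access count used for ranking, and the names `recommend`, `docs`, and `accesses` are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term counts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(current_doc, docs, accesses, k_candidates=3, n=2):
    """Select candidates by content similarity, then rank them by co-access.

    docs:     {doc_id: text}                      (hypothetical content store)
    accesses: {user_id: set of viewed doc_ids}    (hypothetical usage log)
    """
    bags = {d: Counter(text.lower().split()) for d, text in docs.items()}

    # Stage 1 (content): keep the k documents most similar to the current one.
    others = [d for d in docs if d != current_doc]
    candidates = sorted(
        others, key=lambda d: cosine(bags[current_doc], bags[d]), reverse=True
    )[:k_candidates]

    # Stage 2 (collaborative): rank candidates by how many users who viewed
    # the current document also viewed the candidate.
    viewers = [seen for seen in accesses.values() if current_doc in seen]
    score = {d: sum(d in seen for seen in viewers) for d in candidates}
    return sorted(candidates, key=lambda d: score[d], reverse=True)[:n]


docs = {
    "a": "collaborative filtering recommender systems",
    "b": "hybrid recommender systems evaluation",
    "c": "collaborative filtering of netnews",
    "d": "protein folding simulation",
}
accesses = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a", "c"}, "u4": {"d"}}
result = recommend("a", docs, accesses, k_candidates=2, n=2)
```

Restricting the collaborative step to a small content-selected candidate set keeps the ranking cost bounded, which is one plausible way to respect the performance trade-offs the abstract mentions.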

Citations
Proceedings ArticleDOI
10 May 2005
TL;DR: This work presents topic diversification, a novel method designed to balance and diversify personalized recommendation lists in order to reflect the user's complete spectrum of interests, and introduces the intra-list similarity metric to assess the topical diversity of recommendation lists.
Abstract: In this work we present topic diversification, a novel method designed to balance and diversify personalized recommendation lists in order to reflect the user's complete spectrum of interests. Though being detrimental to average accuracy, we show that our method improves user satisfaction with recommendation lists, in particular for lists generated using the common item-based collaborative filtering algorithm. Our work builds upon prior research on recommender systems, looking at properties of recommendation lists as entities in their own right rather than specifically focusing on the accuracy of individual recommendations. We introduce the intra-list similarity metric to assess the topical diversity of recommendation lists and the topic diversification approach for decreasing the intra-list similarity. We evaluate our method using book recommendation data, including offline analysis on 361,349 ratings and an online study involving more than 2,100 subjects.
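
As a rough illustration of an intra-list similarity measure, the Python sketch below sums a pairwise similarity over all unordered pairs of items in a recommendation list, so that a higher value means a less diverse list. The exact normalization and the similarity function used in the cited work may differ; the Jaccard similarity over the hypothetical `topics` labels is only a stand-in.

```python
from itertools import combinations


def intra_list_similarity(items, sim):
    """Sum of pairwise similarities over all unordered pairs in the list."""
    return sum(sim(a, b) for a, b in combinations(items, 2))


# Hypothetical topic labels per book, used only to build a toy similarity.
topics = {
    "b1": {"scifi"},
    "b2": {"scifi", "space"},
    "b3": {"cooking"},
}


def topic_jaccard(a, b):
    """Jaccard overlap of two items' topic sets (an assumed similarity)."""
    union = topics[a] | topics[b]
    return len(topics[a] & topics[b]) / len(union) if union else 0.0


similar_list = intra_list_similarity(["b1", "b2"], topic_jaccard)
diverse_list = intra_list_similarity(["b1", "b3"], topic_jaccard)
```

Here the list containing two science-fiction titles scores higher than the list mixing science fiction and cooking, matching the intuition that diversification should lower the metric.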

1,813 citations


Cites background from "REFEREE: an open framework for prac..."

  • ...There have been several efforts in the past arguing that “accuracy does not tell the whole story” [4, 12]....

    [...]

Journal ArticleDOI
01 Sep 2003
TL;DR: Traditional relevance feedback methods require that users explicitly give feedback by specifying keywords, selecting and marking documents, or answering questions about their interests; because the cost to the user is high, the necessary data can be difficult to collect and the effectiveness of explicit techniques can be limited.
Abstract: Relevance feedback has a history in information retrieval that dates back well over thirty years (cf. [SL96]). Relevance feedback is typically used for query expansion during short-term modeling of a user's immediate information need and for user profiling during long-term modeling of a user's persistent interests and preferences. Traditional relevance feedback methods require that users explicitly give feedback by, for example, specifying keywords, selecting and marking documents, or answering questions about their interests. Such relevance feedback methods force users to engage in additional activities beyond their normal searching behavior. Since the cost to the user is high and the benefits are not always apparent, it can be difficult to collect the necessary data and the effectiveness of explicit techniques can be limited.

825 citations


Cites background from "REFEREE: an open framework for prac..."

  • ...[CLP00] [BP99] [BPC00] The classification is shown in Table 2. Some of the papers, such as [BLG00], [MS94] and [RS01], overlap a number of categories and are shown in overlapping gray boxes....

    [...]

  • ...[KOR00] [MS94] [RS01] [SZ00] [JFM97] [KMM+97] [CLP00]...

    [...]

Proceedings ArticleDOI
28 Jan 2007
TL;DR: SuggestBot, software that performs intelligent task routing (matching people with tasks) in Wikipedia using broadly applicable strategies of text analysis, collaborative filtering, and hyperlink following to recommend tasks is presented.
Abstract: Member-maintained communities ask their users to perform tasks the community needs. From Slashdot, to IMDb, to Wikipedia, groups with diverse interests create community-maintained artifacts of lasting value (CALV) that support the group's main purpose and provide value to others. Said communities don't help members find work to do, or do so without regard to individual preferences, such as Slashdot assigning meta-moderation randomly. Yet social science theory suggests that reducing the cost and increasing the personal value of contribution would motivate members to participate more. We present SuggestBot, software that performs intelligent task routing (matching people with tasks) in Wikipedia. SuggestBot uses broadly applicable strategies of text analysis, collaborative filtering, and hyperlink following to recommend tasks. SuggestBot's intelligent task routing increases the number of edits by roughly four times compared to suggesting random articles. Our contributions are: 1) demonstrating the value of intelligent task routing in a real deployment; 2) showing how to do intelligent task routing; and 3) sharing our experience of deploying a tool in Wikipedia, which offered both challenges and opportunities for research.

272 citations

Proceedings ArticleDOI
13 Nov 2004
TL;DR: Relationships between super-concepts and sub-concepts constitute an important cornerstone of the novel approach, providing powerful inference opportunities for profile generation based upon the classification of products that customers have chosen.
Abstract: Recommender systems have been subject to an enormous rise in popularity and research interest over the last ten years. At the same time, very large taxonomies for product classification are becoming increasingly prominent among e-commerce systems for diverse domains, rendering detailed machine-readable content descriptions feasible. Amazon.com makes use of an entire plethora of hand-crafted taxonomies classifying books, movies, apparel, and various other goods. We exploit such taxonomic background knowledge for the computation of personalized recommendations. Hereby, relationships between super-concepts and sub-concepts constitute an important cornerstone of our novel approach, providing powerful inference opportunities for profile generation based upon the classification of products that customers have chosen. Ample empirical analysis, both offline and online, demonstrates our proposal's superiority over common existing approaches when user information is sparse and implicit ratings prevail.

218 citations

Proceedings ArticleDOI
24 Aug 2003
TL;DR: This work proposes a new model that considers both the order information of pages in a session and the time spent on them, and cluster user sessions based on their pair-wise similarity and represent the resulting clusters by a click-stream tree.
Abstract: Predicting the next request of a user as she visits Web pages has gained importance as Web-based activity increases. Markov models and their variations, or models based on sequence mining have been found well suited for this problem. However, higher order Markov models are extremely complicated due to their large number of states whereas lower order Markov models do not capture the entire behavior of a user in a session. The models that are based on sequential pattern mining only consider the frequent sequences in the data set, making it difficult to predict the next request following a page that is not in the sequential pattern. Furthermore, it is hard to find models for mining two different kinds of information of a user session. We propose a new model that considers both the order information of pages in a session and the time spent on them. We cluster user sessions based on their pair-wise similarity and represent the resulting clusters by a click-stream tree. The new user session is then assigned to a cluster based on a similarity measure. The click-stream tree of that cluster is used to generate the recommendation set. The model can be used as part of a cache prefetching system as well as a recommendation model.

197 citations


Cites methods from "REFEREE: an open framework for prac..."

  • ...We define the hit-ratio metric and click-soon metric as proposed in [5] to evaluate our method:...

    [...]

References
Proceedings ArticleDOI
22 Oct 1994
TL;DR: GroupLens is a system for collaborative filtering of netnews, to help people find articles they will like in the huge stream of available articles, and protect their privacy by entering ratings under pseudonyms, without reducing the effectiveness of the score prediction.
Abstract: Collaborative filters help people make choices based on the opinions of other people. GroupLens is a system for collaborative filtering of netnews, to help people find articles they will like in the huge stream of available articles. News reader clients display predicted scores and make it easy for users to rate articles after they read them. Rating servers, called Better Bit Bureaus, gather and disseminate the ratings. The rating servers predict scores based on the heuristic that people who agreed in the past will probably agree again. Users can protect their privacy by entering ratings under pseudonyms, without reducing the effectiveness of the score prediction. The entire architecture is open: alternative software for news clients and Better Bit Bureaus can be developed independently and can interoperate with the components we have developed.

5,644 citations


"REFEREE: an open framework for prac..." refers methods in this paper

  • ...For example, the original GroupLens recommender treated each Usenet group as a separate set of items [21], so that users viewing recipes would not be recommended jokes or Microsoft flames....

    [...]

  • ...Common similarity metrics used include Pearson correlation [21], mean squared difference [24], and vector similarity [5]....

    [...]

  • ...GroupLens [21] uses this approach to filter Usenet news, while other systems have used this approach to recommend items from music [26] and movies [10] to web pages [1] and jokes [9]....

    [...]
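
The three similarity metrics named in the excerpts above (Pearson correlation, mean squared difference, and vector similarity) can be sketched in Python as follows. The representation of a user as a `{item: rating}` dictionary, the restriction of Pearson correlation and mean squared difference to co-rated items, and the function names are assumptions made for illustration, not details taken from the cited systems.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    items = sorted(u.keys() & v.keys())
    if len(items) < 2:
        return 0.0
    mu = sum(u[i] for i in items) / len(items)
    mv = sum(v[i] for i in items) / len(items)
    num = sum((u[i] - mu) * (v[i] - mv) for i in items)
    den = sqrt(sum((u[i] - mu) ** 2 for i in items) * sum((v[i] - mv) ** 2 for i in items))
    return num / den if den else 0.0


def mean_squared_difference(u, v):
    """Mean squared rating difference over co-rated items (lower is more similar)."""
    items = u.keys() & v.keys()
    if not items:
        return float("inf")
    return sum((u[i] - v[i]) ** 2 for i in items) / len(items)


def vector_similarity(u, v):
    """Cosine of the angle between the two users' rating vectors."""
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


alice = {"a": 5, "b": 3, "c": 1}
bob = {"a": 4, "b": 3, "c": 2}    # same taste as alice, milder ratings
carol = {"a": 1, "b": 3, "c": 5}  # opposite taste
```

Pearson correlation treats `alice` and `bob` as perfectly correlated even though their absolute ratings differ, while mean squared difference still registers the gap; that contrast is one reason the choice of metric matters.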

Posted Content
TL;DR: In this article, the authors compare the predictive accuracy of various methods in a set of representative problem domains, including correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods.
Abstract: Collaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation and Bayesian networks, the preferred method depends on the nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of database, speed of predictions, and learning time.

4,883 citations

Proceedings Article
24 Jul 1998
TL;DR: Several algorithms designed for collaborative filtering or recommender systems are described, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods, to compare the predictive accuracy of the various methods in a set of representative problem domains.
Abstract: Collaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation and Bayesian networks, the preferred method depends on the nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of database, speed of predictions, and learning time.
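
The two classes of evaluation metrics described in this abstract can be sketched in Python. The first function computes the average absolute deviation between predicted and actual votes. The second models the probability that a user sees an item as decaying exponentially with its position in the list; the `half_life` and `neutral` parameters and their default values are assumptions for illustration, not values taken from the abstract.

```python
def mean_absolute_deviation(predicted, actual):
    """Average |prediction - actual vote| over items present in both dicts."""
    items = predicted.keys() & actual.keys()
    return sum(abs(predicted[i] - actual[i]) for i in items) / len(items)


def ranked_utility(ranked_items, actual, neutral=3.0, half_life=5.0):
    """Expected utility of a ranked list under exponential position decay.

    An item at rank j (1-based) contributes max(vote - neutral, 0), discounted
    by 2 ** ((j - 1) / (half_life - 1)), so an item at rank `half_life` counts
    half as much as the item at rank 1. Unrated items contribute nothing.
    """
    total = 0.0
    for j, item in enumerate(ranked_items, start=1):
        gain = max(actual.get(item, neutral) - neutral, 0.0)
        total += gain / 2 ** ((j - 1) / (half_life - 1))
    return total


predicted = {"a": 4.0, "b": 2.0}
actual = {"a": 5.0, "b": 2.0}
mad = mean_absolute_deviation(predicted, actual)
good_order = ranked_utility(["a", "b"], actual)
bad_order = ranked_utility(["b", "a"], actual)
```

The same set of items yields a lower utility when the liked item is placed second, which is the property that distinguishes a ranked-list metric from a per-prediction accuracy metric.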

4,557 citations


"REFEREE: an open framework for prac..." refers background or methods in this paper

  • ...There were presented several approaches for combining CF and CBF methods ([3], [8], [5], [1]). We propose a method for CF and CBF combination (Appendix C) where CBF estimates are used to fill up some missing ratings for CF....

    [...]

  • ...Common similarity metrics used include Pearson correlation [21], mean squared difference [24], and vector similarity [5]....

    [...]

  • ...They have applied a number of machine learning techniques, including inductive learning [2], clustering [26], neural networks [3], and Bayesian networks [5]....

    [...]

  • ...[5] note that users are more likely to rate items that they like or which the system presents....

    [...]
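
The idea in the first excerpt above, using content-based (CBF) estimates to fill in missing ratings before collaborative filtering (CF) runs, can be sketched in Python. The content-based predictor here (a similarity-weighted mean of the user's known ratings, with Jaccard similarity over hypothetical item feature sets) is an assumption for illustration and is not the method of the citing paper's Appendix C. The sketch also assumes every user has at least one known rating.

```python
def jaccard(a, b):
    """Jaccard overlap of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_estimate(user_ratings, item, features):
    """Similarity-weighted mean of the user's known ratings (hypothetical CBF)."""
    weights = {j: jaccard(features[item], features[j]) for j in user_ratings}
    total = sum(weights.values())
    if total == 0:
        # No content overlap with anything rated: fall back to the user's mean.
        return sum(user_ratings.values()) / len(user_ratings)
    return sum(weights[j] * user_ratings[j] for j in user_ratings) / total


def fill_missing(ratings, features):
    """Return a dense rating table: known ratings kept, gaps filled by CBF."""
    filled = {}
    for user, known in ratings.items():
        filled[user] = {
            item: known[item] if item in known else content_estimate(known, item, features)
            for item in features
        }
    return filled


features = {"p1": {"ml", "cf"}, "p2": {"ml", "cf"}, "p3": {"db"}}
ratings = {"u": {"p1": 5.0, "p3": 1.0}}
dense = fill_missing(ratings, features)
```

The densified table can then be passed to any CF similarity computation, which otherwise has few co-rated items to work with when ratings are sparse.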

Journal ArticleDOI
TL;DR: Tapestry is intended to handle any incoming stream of electronic documents and serves both as a mail filter and repository; its components are the indexer, document store, annotation store, filterer, little box, remailer, appraiser and reader/browser.
Abstract: The Tapestry experimental mail system developed at the Xerox Palo Alto Research Center is predicated on the belief that information filtering can be more effective when humans are involved in the filtering process. Tapestry was designed to support both content-based filtering and collaborative filtering, which entails people collaborating to help each other perform filtering by recording their reactions to documents they read. The reactions are called annotations; they can be accessed by other people’s filters. Tapestry is intended to handle any incoming stream of electronic documents and serves both as a mail filter and repository; its components are the indexer, document store, annotation store, filterer, little box, remailer, appraiser and reader/browser. Tapestry’s client/server architecture, its various components, and the Tapestry query language are described.

4,299 citations


Additional excerpts

  • ...Tapestry queries were often of the form “documents that Mark likes”; this requires that you know Mark, or more generally, that you know the people who are like you and whose opinions you should value....

    [...]

  • ...Tapestry saw documents as structured entities (their model was email) and users could create structured queries, not unlike today’s email filters....

    [...]

  • ...Second, Tapestry is best suited for small groups where people know each other....

    [...]

  • ...The term collaborative filtering was introduced by Tapestry [8], although they used it in the broader sense usually denoted by “recommender systems” today....

    [...]

Proceedings ArticleDOI
01 May 1995
TL;DR: The implementation of a networked system called Ringo, which makes personalized recommendations for music albums and artists, and four different algorithms for making recommendations by using social information filtering were tested and compared.
Abstract: This paper describes a technique for making personalized recommendations from any type of database to a user based on similarities between the interest profile of that user and those of other users. In particular, we discuss the implementation of a networked system called Ringo, which makes personalized recommendations for music albums and artists. Ringo's database of users and artists grows dynamically as more people use the system and enter more information. Four different algorithms for making recommendations by using social information filtering were tested and compared. We present quantitative and qualitative results obtained from the use of Ringo by more than 2000 people.

3,237 citations


"REFEREE: an open framework for prac..." refers background in this paper

  • ...For example, a user could say to ignore any mail with a subject containing “toner”, or from a sender whose address ended in “hotmail.com”....

    [...]
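
The filter rule in the excerpt above can be written as a small Python predicate. The function name `should_ignore` and the case-insensitive matching are assumptions for illustration; the excerpt does not say how the original system expressed or matched such rules.

```python
def should_ignore(subject: str, sender: str) -> bool:
    """True if the subject mentions "toner" or the sender ends in "hotmail.com"."""
    return "toner" in subject.lower() or sender.lower().endswith("hotmail.com")


spam_by_subject = should_ignore("Cheap TONER sale", "ads@example.org")
spam_by_sender = should_ignore("Hello", "someone@hotmail.com")
kept = should_ignore("Meeting notes", "mark@example.org")
```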