Journal ArticleDOI

A survey of collaborative filtering techniques

01 Jan 2009-Advances in Artificial Intelligence (Hindawi Publishing Corp.)-Vol. 2009, pp 4
TL;DR: From basic techniques to the state-of-the-art, this paper attempts to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.
Abstract: As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences for other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, privacy protection, etc., and their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (that combine CF with other recommendation techniques), with examples for representative algorithms of each category, and analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state-of-the-art, we attempt to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.
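
To make the memory-based category in the abstract concrete, here is a minimal sketch of one common form of it: user-based CF with Pearson-correlation weights. The abstract does not specify a formula, so the weighting scheme, the function names `pearson` and `predict`, and the toy `ratings` dictionary are all illustrative assumptions rather than the survey's own method.

```python
from math import sqrt

def pearson(ra, rb):
    """Pearson correlation computed over the items both users have rated."""
    common = [i for i in ra if i in rb]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ra[i] for i in common) / len(common)
    mean_b = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common)
    den = sqrt(sum((ra[i] - mean_a) ** 2 for i in common)) * \
          sqrt(sum((rb[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Predict a rating as the user's mean plus a similarity-weighted
    average of how far each neighbor's rating deviates from that
    neighbor's own mean."""
    mean_user = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(ratings[user], r)
        mean_other = sum(r.values()) / len(r)
        num += w * (r[item] - mean_other)
        den += abs(w)
    # Fall back to the user's mean when no neighbor carries any weight.
    return mean_user + num / den if den else mean_user

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
}
estimate = predict(ratings, "alice", "d")
```

In this toy data, alice agrees with bob and disagrees with carol, so both neighbors push the estimate for item `d` above alice's mean rating.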


Citations
01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: An overview of recommender systems as well as collaborative filtering methods and algorithms is provided, which explains their evolution, provides an original classification for these systems, identifies areas of future implementation and develops certain areas selected for past, present or future importance.
Abstract: Recommender systems have developed in parallel with the web. They were initially based on demographic, content-based and collaborative filtering. Currently, these systems are incorporating social information. In the future, they will use implicit, local and personal information from the Internet of things. This article provides an overview of recommender systems as well as collaborative filtering methods and algorithms; it also explains their evolution, provides an original classification for these systems, identifies areas of future implementation and develops certain areas selected for past, present or future importance.

2,639 citations


Cites background from "A survey of collaborative filtering..."

  • ...Collaborative Filtering [3,94,92,51,212] allows users to give ratings about a set of elements (e....

    [...]

  • ...Su and Khoshgoftaar [212] presents a survey of CF techniques....

    [...]

  • ...The rest of this section deals with the concepts and research in the two lines considered previously: filtering of social information and content filtering....

    [...]

  • ...The pure CBF has several shortcomings [16,176,212]:...

    [...]

  • ...Breese et al. [43] evaluated the predictive accuracy of different algorithms for CF; later, the classical paper [94] describes the base for evaluating the Collaborative Filtering RS....

    [...]

Journal ArticleDOI
TL;DR: Recent progress on link prediction algorithms is summarized, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods.
Abstract: Link prediction in complex networks has attracted increasing attention from both physical and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanism and classification of partially labeled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms.

2,530 citations


Cites background from "A survey of collaborative filtering..."

  • ...tering 2 framework [30]. 2 Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. [29] 5 Node similarity can be defined by using the essential attributes of nodes: two nodes are considered to be similar if they have many common features [31]. However, the attributes of nodes are general...

    [...]
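
As a rough illustration of similarity-based link prediction, the sketch below scores every unlinked node pair by its number of shared neighbors (the common-neighbors index, a standard structural baseline). The excerpt above is truncated and speaks of node attributes, so choosing a structural index here is an assumption; the function name `common_neighbor_scores` and the toy edge list are hypothetical.

```python
from itertools import combinations

def common_neighbor_scores(edges):
    """Rank currently unlinked node pairs by how many neighbors they share."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked, nothing to predict
        scores[(u, v)] = len(neighbors[u] & neighbors[v])
    # Highest score first; ties broken by node names for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical undirected toy network
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
ranked = common_neighbor_scores(edges)
```

Pairs at the top of `ranked` are the candidates such a method would propose as missing or future links.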

Proceedings ArticleDOI
Paul Covington1, Jay Adams1, Emre Sargin1
07 Sep 2016
TL;DR: This paper details a deep candidate generation model and then describes a separate deep ranking model and provides practical lessons and insights derived from designing, iterating and maintaining a massive recommendation system with enormous user-facing impact.
Abstract: YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence. In this paper, we describe the system at a high level and focus on the dramatic performance improvements brought by deep learning. The paper is split according to the classic two-stage information retrieval dichotomy: first, we detail a deep candidate generation model and then describe a separate deep ranking model. We also provide practical lessons and insights derived from designing, iterating and maintaining a massive recommendation system with enormous user-facing impact.

2,469 citations


Cites background from "A survey of collaborative filtering..."

  • ...YouTube is the world’s largest platform for creating, sharing and discovering video content....

    [...]

Journal ArticleDOI
Jens Kattge1, Sandra Díaz2, Sandra Lavorel3, Iain Colin Prentice4, Paul Leadley5, Gerhard Bönisch1, Eric Garnier3, Mark Westoby4, Peter B. Reich6, Peter B. Reich7, Ian J. Wright4, Johannes H. C. Cornelissen8, Cyrille Violle3, Sandy P. Harrison4, P.M. van Bodegom8, Markus Reichstein1, Brian J. Enquist9, Nadejda A. Soudzilovskaia8, David D. Ackerly10, Madhur Anand11, Owen K. Atkin12, Michael Bahn13, Timothy R. Baker14, Dennis D. Baldocchi10, Renée M. Bekker15, Carolina C. Blanco16, Benjamin Blonder9, William J. Bond17, Ross A. Bradstock18, Daniel E. Bunker19, Fernando Casanoves20, Jeannine Cavender-Bares7, Jeffrey Q. Chambers21, F. S. Chapin22, Jérôme Chave3, David A. Coomes23, William K. Cornwell8, Joseph M. Craine24, B. H. Dobrin9, Leandro da Silva Duarte16, Walter Durka25, James J. Elser26, Gerd Esser27, Marc Estiarte28, William F. Fagan29, Jingyun Fang, Fernando Fernández-Méndez30, Alessandra Fidelis31, Bryan Finegan20, Olivier Flores32, H. Ford33, Dorothea Frank1, Grégoire T. Freschet34, Nikolaos M. Fyllas14, Rachael V. Gallagher4, Walton A. Green35, Alvaro G. Gutiérrez25, Thomas Hickler, Steven I. Higgins36, John G. Hodgson37, Adel Jalili, Steven Jansen38, Carlos Alfredo Joly39, Andrew J. Kerkhoff40, Don Kirkup41, Kaoru Kitajima42, Michael Kleyer43, Stefan Klotz25, Johannes M. H. Knops44, Koen Kramer, Ingolf Kühn16, Hiroko Kurokawa45, Daniel C. Laughlin46, Tali D. Lee47, Michelle R. Leishman4, Frederic Lens48, Tanja Lenz4, Simon L. Lewis14, Jon Lloyd49, Jon Lloyd14, Joan Llusià28, Frédérique Louault50, Siyan Ma10, Miguel D. Mahecha1, Peter Manning51, Tara Joy Massad1, Belinda E. Medlyn4, Julie Messier9, Angela T. Moles52, Sandra Cristina Müller16, Karin Nadrowski53, Shahid Naeem54, Ülo Niinemets55, S. Nöllert1, A. Nüske1, Romà Ogaya28, Jacek Oleksyn56, Vladimir G. Onipchenko57, Yusuke Onoda58, Jenny C. Ordoñez59, Gerhard E. Overbeck16, Wim A. Ozinga59, Sandra Patiño14, Susana Paula60, Juli G. Pausas60, Josep Peñuelas28, Oliver L. Phillips14, Valério D. Pillar16, Hendrik Poorter, Lourens Poorter59, Peter Poschlod61, Andreas Prinzing62, Raphaël Proulx63, Anja Rammig64, Sabine Reinsch65, Björn Reu1, Lawren Sack66, Beatriz Salgado-Negret20, Jordi Sardans28, Satomi Shiodera67, Bill Shipley68, Andrew Siefert69, Enio E. Sosinski70, Jean-François Soussana50, Emily Swaine71, Nathan G. Swenson72, Ken Thompson37, Peter E. Thornton73, Matthew S. Waldram74, Evan Weiher47, Michael T. White75, S. White11, S. J. Wright76, Benjamin Yguel3, Sönke Zaehle1, Amy E. Zanne77, Christian Wirth58
Max Planck Society1, National University of Cordoba2, Centre national de la recherche scientifique3, Macquarie University4, University of Paris-Sud5, University of Western Sydney6, University of Minnesota7, VU University Amsterdam8, University of Arizona9, University of California, Berkeley10, University of Guelph11, Australian National University12, University of Innsbruck13, University of Leeds14, University of Groningen15, Universidade Federal do Rio Grande do Sul16, University of Cape Town17, University of Wollongong18, New Jersey Institute of Technology19, Centro Agronómico Tropical de Investigación y Enseñanza20, Lawrence Berkeley National Laboratory21, University of Alaska Fairbanks22, University of Cambridge23, Kansas State University24, Helmholtz Centre for Environmental Research - UFZ25, Arizona State University26, University of Giessen27, Autonomous University of Barcelona28, University of Maryland, College Park29, Universidad del Tolima30, University of São Paulo31, University of La Réunion32, University of York33, University of Sydney34, Harvard University35, Goethe University Frankfurt36, University of Sheffield37, University of Ulm38, State University of Campinas39, Kenyon College40, Royal Botanic Gardens41, University of Florida42, University of Oldenburg43, University of Nebraska–Lincoln44, Tohoku University45, Northern Arizona University46, University of Wisconsin–Eau Claire47, Naturalis48, James Cook University49, Institut national de la recherche agronomique50, Newcastle University51, University of New South Wales52, Leipzig University53, Columbia University54, Estonian University of Life Sciences55, Polish Academy of Sciences56, Moscow State University57, Kyushu University58, Wageningen University and Research Centre59, Spanish National Research Council60, University of Regensburg61, University of Rennes62, Université du Québec à Trois-Rivières63, Potsdam Institute for Climate Impact Research64, Technical University of Denmark65, University of California, Los Angeles66, Hokkaido University67, Université de Sherbrooke68, Syracuse University69, Empresa Brasileira de Pesquisa Agropecuária70, University of Aberdeen71, Michigan State University72, Oak Ridge National Laboratory73, University of Leicester74, Utah State University75, Smithsonian Institution76, University of Missouri77
01 Sep 2011
TL;DR: TRY as discussed by the authors is a global database of plant traits, including morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs, which can be used for a wide range of research from evolutionary biology, community and functional ecology to biogeography.
Abstract: Plant traits – the morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs – determine how primary producers respond to environmental factors, affect other trophic levels, influence ecosystem processes and services and provide a link from species richness to ecosystem functional diversity. Trait data thus represent the raw material for a wide range of research from evolutionary biology, community and functional ecology to biogeography. Here we present the global database initiative named TRY, which has united a wide range of the plant trait research community worldwide and gained an unprecedented buy-in of trait data: so far 93 trait databases have been contributed. The data repository currently contains almost three million trait entries for 69 000 out of the world's 300 000 plant species, with a focus on 52 groups of traits characterizing the vegetative and regeneration stages of the plant life cycle, including growth, dispersal, establishment and persistence. A first data analysis shows that most plant traits are approximately log-normally distributed, with widely differing ranges of variation across traits. Most trait variation is between species (interspecific), but significant intraspecific variation is also documented, up to 40% of the overall variation. Plant functional types (PFTs), as commonly used in vegetation models, capture a substantial fraction of the observed variation – but for several traits most variation occurs within PFTs, up to 75% of the overall variation. In the context of vegetation models these traits would better be represented by state variables rather than fixed parameter values. The improved availability of plant trait data in the unified global database is expected to support a paradigm shift from species to trait-based ecology, offer new opportunities for synthetic plant trait research and enable a more realistic and empirically grounded representation of terrestrial vegetation in Earth system models.

2,017 citations

References
Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations


"A survey of collaborative filtering..." refers background in this paper

  • ...By starting with an initial policy $\pi_0(s) = \arg\max_{a \in A} R(s, a)$, computing the reward value function $V_i(s)$ based on the previous policy, and updating the policy with the new value function at each step, the iterations will converge to an optimal policy [90, 91]....

    [...]
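
The loop described in the excerpt above (greedy initial policy, evaluate, improve, repeat) can be sketched as standard policy iteration. This is a minimal sketch, not the survey's implementation: the discount factor `gamma`, the tolerance `tol`, the transition table `P`, and the two-state toy MDP are all assumptions added for illustration.

```python
def policy_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state); R[s][a] is the reward."""
    # Initial policy: greedy on immediate reward, pi_0(s) = argmax_a R(s, a).
    policy = {s: max(actions, key=lambda a: R[s][a]) for s in states}
    V = {s: 0.0 for s in states}

    def q(s, a):
        # One-step lookahead value of taking action a in state s.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    while True:
        # Policy evaluation: sweep until V stops changing under the policy.
        while True:
            delta = 0.0
            for s in states:
                v = q(s, policy[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the new V.
        new_policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
        if new_policy == policy:
            return policy, V
        policy = new_policy

# Hypothetical two-state MDP with deterministic transitions.
states, actions = ["s0", "s1"], ["stay", "go"]
R = {"s0": {"stay": 1.0, "go": 0.0}, "s1": {"stay": 2.0, "go": 0.0}}
P = {
    "s0": {"stay": [(1.0, "s0")], "go": [(1.0, "s1")]},
    "s1": {"stay": [(1.0, "s1")], "go": [(1.0, "s0")]},
}
policy, V = policy_iteration(states, actions, P, R)
```

In this toy problem the reward-greedy initial policy stays in `s0`, and one improvement step switches it to `go`, because moving to `s1` pays more in the long run.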

Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations


"A survey of collaborative filtering..." refers methods in this paper

  • ...A user rating profile (URP) model [97] combines the intuitive appeal of the multinomial mixture model and aspect model [83], with the high-level generative semantics of Latent Dirichlet Allocation (LDA, a generative probabilistic model, in which each item is modeled as a finite mixture over an underlying set of users) [99]....

    [...]

Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

01 Jan 1967
TL;DR: This paper describes k-means, a process for partitioning an N-dimensional population into k sets on the basis of a sample. The concept generalizes the ordinary sample mean, and the process appears to give partitions that are reasonably efficient in the sense of within-class variance.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if $p$ is the probability mass function for the population, $S = \{S_1, S_2, \ldots, S_k\}$ is a partition of $E_N$, and $u_i$, $i = 1, 2, \ldots, k$, is the conditional mean of $p$ over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)$ tends to be low for the partitions $S$ generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4. The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special

24,320 citations


"A survey of collaborative filtering..." refers methods in this paper

  • ...A commonly-used partitioning method is k-means, proposed by MacQueen [78], which has two main advantages: relative efficiency and easy implementation....

    [...]
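
The "easy implementation" mentioned in the excerpt above can be seen in a short sketch of k-means. Note the assumptions: this is the batch (Lloyd-style) variant that alternates assignment and mean updates, which is not necessarily the sequential procedure of the original paper, and the function name `kmeans`, the `seed` and `iters` parameters, and the toy points are hypothetical.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Batch k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial means
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
```

Each pass costs on the order of (number of points) × k distance computations, which is the relative efficiency the excerpt refers to; the result can depend on the initial centers when groups are less clearly separated than in this toy data.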