A survey of collaborative filtering techniques

doi:10.1155/2009/421425

Journal Article•DOI•

Xiaoyuan Su¹, Taghi M. Khoshgoftaar¹•Institutions (1)

Florida Atlantic University¹

01 Jan 2009-Advances in Artificial Intelligence (Hindawi Publishing Corp.)-Vol. 2009, pp 4

TL;DR: From basic techniques to the state of the art, this paper attempts to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.

Abstract: As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences for other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, privacy protection, etc., and their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (that combine CF with other recommendation techniques), with examples for representative algorithms of each category, and analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state of the art, we attempt to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.
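
To make the memory-based category concrete, here is a minimal sketch of one common user-based scheme: neighbours are weighted by Pearson correlation over co-rated items, and the prediction is a mean-centred weighted average. The toy `ratings` data and the function names `pearson` and `predict` are assumptions made for illustration; they are not taken from the paper.

```python
from math import sqrt

# Toy user -> {item: rating} table; all names and values are made up.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 3, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}


def mean(user):
    """Average of all ratings given by `user`."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)


def pearson(u, v):
    """Pearson correlation over items both users rated (0.0 if undefined)."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mv) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(user, item):
    """Mean-centred, similarity-weighted average of neighbours' ratings."""
    neighbours = [v for v in ratings if v != user and item in ratings[v]]
    weights = {v: pearson(user, v) for v in neighbours}
    norm = sum(abs(w) for w in weights.values())
    if norm == 0:
        return mean(user)  # no usable neighbour: fall back to the user's mean
    offset = sum(w * (ratings[v][item] - mean(v)) for v, w in weights.items())
    return mean(user) + offset / norm
```

Calling `predict("alice", "d")` estimates a rating for an item Alice has not rated, using Bob (positively correlated) and Carol (negatively correlated) as neighbours.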

Citations

Data Mining - Concepts and Techniques.

Petra Perner

01 Jan 2002

9,314 citations

Journal Article•DOI•

Recommender systems survey

Jesús Bobadilla¹, Fernando Ortega¹, Antonio Hernando¹, Abraham Gutiérrez¹•Institutions (1)

Technical University of Madrid¹

01 Jul 2013-Knowledge-Based Systems

TL;DR: An overview of recommender systems as well as collaborative filtering methods and algorithms is provided, which explains their evolution, provides an original classification for these systems, identifies areas of future implementation and develops certain areas selected for past, present or future importance.

Abstract: Recommender systems have developed in parallel with the web. They were initially based on demographic, content-based and collaborative filtering. Currently, these systems are incorporating social information. In the future, they will use implicit, local and personal information from the Internet of things. This article provides an overview of recommender systems as well as collaborative filtering methods and algorithms; it also explains their evolution, provides an original classification for these systems, identifies areas of future implementation and develops certain areas selected for past, present or future importance.

2,639 citations

Cites background from "A survey of collaborative filtering..."

...Collaborative Filtering [3,94,92,51,212] allows users to give ratings about a set of elements (e....
[...]
...Su and Khoshgoftaar [212] present a survey of CF techniques....
[...]
...The rest of this section deals with the concepts and research in the two lines considered previously: filtering of social information and content filtering....
[...]
...The pure CBF has several shortcomings [16,176,212]:...
[...]
...Breese et al. [43] evaluated the predictive accuracy of different algorithms for CF; later, the classical paper [94] describes the base for evaluating the Collaborative Filtering RS....
[...]

Journal Article•DOI•

Link prediction in complex networks: A survey

Linyuan Lü¹,²,³, Tao Zhou²,⁴•Institutions (4)

University of Shanghai for Science and Technology¹, University of Electronic Science and Technology of China², University of Fribourg³, University of Science and Technology of China⁴

15 Mar 2011-Physica A: Statistical Mechanics and its Applications

TL;DR: Recent progress on link prediction algorithms is summarized, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods.

Abstract: Link prediction in complex networks has attracted increasing attention from both physical and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanism and classification of partially labeled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms.

2,530 citations

Cites background from "A survey of collaborative filtering..."

...tering 2 framework [30]. 2 Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. [29] 5 Node similarity can be defined by using the essential attributes of nodes: two nodes are considered to be similar if they have many common features [31]. However, the attributes of nodes are general...
[...]

Proceedings Article•DOI•

Deep Neural Networks for YouTube Recommendations

Paul Covington¹, Jay Adams¹, Emre Sargin¹•Institutions (1)

Google¹

07 Sep 2016

TL;DR: This paper details a deep candidate generation model and then describes a separate deep ranking model and provides practical lessons and insights derived from designing, iterating and maintaining a massive recommendation system with enormous user-facing impact.

Abstract: YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence. In this paper, we describe the system at a high level and focus on the dramatic performance improvements brought by deep learning. The paper is split according to the classic two-stage information retrieval dichotomy: first, we detail a deep candidate generation model and then describe a separate deep ranking model. We also provide practical lessons and insights derived from designing, iterating and maintaining a massive recommendation system with enormous user-facing impact.

2,469 citations

Cites background from "A survey of collaborative filtering..."

...YouTube is the world’s largest platform for creating, sharing and discovering video content....
[...]

Journal Article•DOI•

TRY - a global database of plant traits

Jens Kattge¹, Sandra Díaz², Sandra Lavorel³, Iain Colin Prentice⁴, Paul Leadley⁵, Gerhard Bönisch¹, Eric Garnier³, Mark Westoby⁴, Peter B. Reich⁶,⁷, Ian J. Wright⁴, Johannes H. C. Cornelissen⁸, Cyrille Violle³, Sandy P. Harrison⁴, P.M. van Bodegom⁸, Markus Reichstein¹, Brian J. Enquist⁹, Nadejda A. Soudzilovskaia⁸, David D. Ackerly¹⁰, Madhur Anand¹¹, Owen K. Atkin¹², Michael Bahn¹³, Timothy R. Baker¹⁴, Dennis D. Baldocchi¹⁰, Renée M. Bekker¹⁵, Carolina C. Blanco¹⁶, Benjamin Blonder⁹, William J. Bond¹⁷, Ross A. Bradstock¹⁸, Daniel E. Bunker¹⁹, Fernando Casanoves²⁰, Jeannine Cavender-Bares⁷, Jeffrey Q. Chambers²¹, F. S. Chapin²², Jérôme Chave³, David A. Coomes²³, William K. Cornwell⁸, Joseph M. Craine²⁴, B. H. Dobrin⁹, Leandro da Silva Duarte¹⁶, Walter Durka²⁵, James J. Elser²⁶, Gerd Esser²⁷, Marc Estiarte²⁸, William F. Fagan²⁹, Jingyun Fang, Fernando Fernández-Méndez³⁰, Alessandra Fidelis³¹, Bryan Finegan²⁰, Olivier Flores³², H. Ford³³, Dorothea Frank¹, Grégoire T. Freschet³⁴, Nikolaos M. Fyllas¹⁴, Rachael V. Gallagher⁴, Walton A. Green³⁵, Alvaro G. Gutiérrez²⁵, Thomas Hickler, Steven I. Higgins³⁶, John G. Hodgson³⁷, Adel Jalili, Steven Jansen³⁸, Carlos Alfredo Joly³⁹, Andrew J. Kerkhoff⁴⁰, Don Kirkup⁴¹, Kaoru Kitajima⁴², Michael Kleyer⁴³, Stefan Klotz²⁵, Johannes M. H. Knops⁴⁴, Koen Kramer, Ingolf Kühn¹⁶, Hiroko Kurokawa⁴⁵, Daniel C. Laughlin⁴⁶, Tali D. Lee⁴⁷, Michelle R. Leishman⁴, Frederic Lens⁴⁸, Tanja Lenz⁴, Simon L. Lewis¹⁴, Jon Lloyd¹⁴,⁴⁹, Joan Llusià²⁸, Frédérique Louault⁵⁰, Siyan Ma¹⁰, Miguel D. Mahecha¹, Peter Manning⁵¹, Tara Joy Massad¹, Belinda E. Medlyn⁴, Julie Messier⁹, Angela T. Moles⁵², Sandra Cristina Müller¹⁶, Karin Nadrowski⁵³, Shahid Naeem⁵⁴, Ülo Niinemets⁵⁵, S. Nöllert¹, A. Nüske¹, Romà Ogaya²⁸, Jacek Oleksyn⁵⁶, Vladimir G. Onipchenko⁵⁷, Yusuke Onoda⁵⁸, Jenny C. Ordoñez⁵⁹, Gerhard E. Overbeck¹⁶, Wim A. Ozinga⁵⁹, Sandra Patiño¹⁴, Susana Paula⁶⁰, Juli G. Pausas⁶⁰, Josep Peñuelas²⁸, Oliver L. Phillips¹⁴, Valério D. Pillar¹⁶, Hendrik Poorter, Lourens Poorter⁵⁹, Peter Poschlod⁶¹, Andreas Prinzing⁶², Raphaël Proulx⁶³, Anja Rammig⁶⁴, Sabine Reinsch⁶⁵, Björn Reu¹, Lawren Sack⁶⁶, Beatriz Salgado-Negret²⁰, Jordi Sardans²⁸, Satomi Shiodera⁶⁷, Bill Shipley⁶⁸, Andrew Siefert⁶⁹, Enio E. Sosinski⁷⁰, Jean-François Soussana⁵⁰, Emily Swaine⁷¹, Nathan G. Swenson⁷², Ken Thompson³⁷, Peter E. Thornton⁷³, Matthew S. Waldram⁷⁴, Evan Weiher⁴⁷, Michael T. White⁷⁵, S. White¹¹, S. J. Wright⁷⁶, Benjamin Yguel³, Sönke Zaehle¹, Amy E. Zanne⁷⁷, Christian Wirth⁵⁸•Institutions (77)

Max Planck Society¹, National University of Cordoba², Centre national de la recherche scientifique³, Macquarie University⁴, University of Paris-Sud⁵, University of Western Sydney⁶, University of Minnesota⁷, VU University Amsterdam⁸, University of Arizona⁹, University of California, Berkeley¹⁰, University of Guelph¹¹, Australian National University¹², University of Innsbruck¹³, University of Leeds¹⁴, University of Groningen¹⁵, Universidade Federal do Rio Grande do Sul¹⁶, University of Cape Town¹⁷, University of Wollongong¹⁸, New Jersey Institute of Technology¹⁹, Centro Agronómico Tropical de Investigación y Enseñanza²⁰, Lawrence Berkeley National Laboratory²¹, University of Alaska Fairbanks²², University of Cambridge²³, Kansas State University²⁴, Helmholtz Centre for Environmental Research - UFZ²⁵, Arizona State University²⁶, University of Giessen²⁷, Autonomous University of Barcelona²⁸, University of Maryland, College Park²⁹, Universidad del Tolima³⁰, University of São Paulo³¹, University of La Réunion³², University of York³³, University of Sydney³⁴, Harvard University³⁵, Goethe University Frankfurt³⁶, University of Sheffield³⁷, University of Ulm³⁸, State University of Campinas³⁹, Kenyon College⁴⁰, Royal Botanic Gardens⁴¹, University of Florida⁴², University of Oldenburg⁴³, University of Nebraska–Lincoln⁴⁴, Tohoku University⁴⁵, Northern Arizona University⁴⁶, University of Wisconsin–Eau Claire⁴⁷, Naturalis⁴⁸, James Cook University⁴⁹, Institut national de la recherche agronomique⁵⁰, Newcastle University⁵¹, University of New South Wales⁵², Leipzig University⁵³, Columbia University⁵⁴, Estonian University of Life Sciences⁵⁵, Polish Academy of Sciences⁵⁶, Moscow State University⁵⁷, Kyushu University⁵⁸, Wageningen University and Research Centre⁵⁹, Spanish National Research Council⁶⁰, University of Regensburg⁶¹, University of Rennes⁶², Université du Québec à Trois-Rivières⁶³, Potsdam Institute for Climate Impact Research⁶⁴, Technical University of Denmark⁶⁵, University of California, Los Angeles⁶⁶, Hokkaido University⁶⁷, Université de Sherbrooke⁶⁸, Syracuse University⁶⁹, Empresa Brasileira de Pesquisa Agropecuária⁷⁰, University of Aberdeen⁷¹, Michigan State University⁷², Oak Ridge National Laboratory⁷³, University of Leicester⁷⁴, Utah State University⁷⁵, Smithsonian Institution⁷⁶, University of Missouri⁷⁷

01 Sep 2011

TL;DR: TRY is a global database of plant traits, that is, the morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs; it supports a wide range of research from evolutionary biology, community and functional ecology to biogeography.

Abstract: Plant traits – the morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs – determine how primary producers respond to environmental factors, affect other trophic levels, influence ecosystem processes and services and provide a link from species richness to ecosystem functional diversity. Trait data thus represent the raw material for a wide range of research from evolutionary biology, community and functional ecology to biogeography. Here we present the global database initiative named TRY, which has united a wide range of the plant trait research community worldwide and gained an unprecedented buy-in of trait data: so far 93 trait databases have been contributed. The data repository currently contains almost three million trait entries for 69 000 out of the world's 300 000 plant species, with a focus on 52 groups of traits characterizing the vegetative and regeneration stages of the plant life cycle, including growth, dispersal, establishment and persistence. A first data analysis shows that most plant traits are approximately log-normally distributed, with widely differing ranges of variation across traits. Most trait variation is between species (interspecific), but significant intraspecific variation is also documented, up to 40% of the overall variation. Plant functional types (PFTs), as commonly used in vegetation models, capture a substantial fraction of the observed variation – but for several traits most variation occurs within PFTs, up to 75% of the overall variation. In the context of vegetation models these traits would better be represented by state variables rather than fixed parameter values. 
The improved availability of plant trait data in the unified global database is expected to support a paradigm shift from species to trait-based ecology, offer new opportunities for synthetic plant trait research and enable a more realistic and empirically grounded representation of terrestrial vegetation in Earth system models.

2,017 citations

References

Journal Article•DOI•

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster¹, Nan M. Laird¹, Donald B. Rubin¹•Institutions (1)

Harvard University¹

01 Sep 1977-Journal of the Royal Statistical Society, Series B (Methodological)

49,597 citations

Book•

Reinforcement Learning: An Introduction

Richard S. Sutton¹, Andrew G. Barto•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations

"A survey of collaborative filtering..." refers background in this paper

...By starting with an initial policy $\pi_0(s) = \arg\max_{a \in A} R(s, a)$, computing the reward value function $V_i(s)$ based on the previous policy, and updating the policy with the new value function at each step, the iterations will converge to an optimal policy [90, 91]....
[...]
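
The loop described in this excerpt (greedy initial policy, evaluate, improve, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions: the three-state deterministic MDP, the reward table `REWARD`, the transition function `step` and the discount factor `GAMMA` are all hypothetical, since the excerpt specifies none of them.

```python
# Hypothetical 3-state, 2-action deterministic MDP; every number is illustrative.
STATES = [0, 1, 2]
ACTIONS = ["stay", "move"]
GAMMA = 0.9  # discount factor (assumed; not given in the excerpt)

REWARD = {
    (0, "stay"): 0.1, (0, "move"): 0.0,
    (1, "stay"): 0.1, (1, "move"): 0.0,
    (2, "stay"): 1.0, (2, "move"): 0.0,
}


def step(s, a):
    """Deterministic transition: 'move' goes to the next state cyclically."""
    return (s + 1) % len(STATES) if a == "move" else s


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation: value of following `policy` forever."""
    v = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            a = policy[s]
            new = REWARD[(s, a)] + GAMMA * v[step(s, a)]
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v


def policy_iteration():
    # Initial policy: pick the action with the best immediate reward.
    policy = {s: max(ACTIONS, key=lambda a: REWARD[(s, a)]) for s in STATES}
    while True:
        v = evaluate(policy)
        # Greedy improvement with respect to the current value function.
        improved = {
            s: max(ACTIONS, key=lambda a: REWARD[(s, a)] + GAMMA * v[step(s, a)])
            for s in STATES
        }
        if improved == policy:  # stable policy: stop
            return policy, v
        policy = improved
```

In this toy problem the greedy initial policy stays put everywhere; successive improvement steps learn to move from states 0 and 1 toward the high-reward state 2.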

Journal Article•DOI•

Latent dirichlet allocation

David M. Blei¹, Andrew Y. Ng², Michael I. Jordan¹•Institutions (2)

University of California, Berkeley¹, Stanford University²

01 Mar 2003-Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations

"A survey of collaborative filtering..." refers methods in this paper

...A user rating profile (URP) model [97] combines the intuitive appeal of the multinomial mixture model and aspect model [83], with the high-level generative semantics of Latent Dirichlet Allocation (LDA, a generative probabilistic model, in which each item is modeled as a finite mixture over an underlying set of users) [99]....
[...]

Proceedings Article•

Latent Dirichlet Allocation

David M. Blei¹, Andrew Y. Ng¹, Michael I. Jordan¹•Institutions (1)

University of California, Berkeley¹

03 Jan 2001

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
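
The generative story in this abstract (per-document topic proportions drawn from a Dirichlet, then a topic and a word drawn for each position) can be sketched with the standard library alone. This is only the sampling direction, not the variational inference the paper describes; the helper names `sample_dirichlet` and `generate_document`, the fixed `topic_word` matrix and the hyperparameter values are assumptions for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topic_word, n_words, rng):
    """Generate one document as a list of word ids.

    alpha      -- Dirichlet hyperparameters, one per topic (assumed values)
    topic_word -- topic_word[k][w] is the probability of word w under topic k
    """
    theta = sample_dirichlet(alpha, rng)      # per-document topic proportions
    topics = list(range(len(alpha)))
    vocab = list(range(len(topic_word[0])))
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]          # topic for this position
        w = rng.choices(vocab, weights=topic_word[z])[0]   # word given that topic
        words.append(w)
    return theta, words


# Usage: two topics over a three-word vocabulary (all numbers are made up).
rng = random.Random(0)
theta, words = generate_document(
    alpha=[0.5, 0.5],
    topic_word=[[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]],
    n_words=20,
    rng=rng,
)
```

In a collaborative filtering setting the same structure is reused with users in place of documents and items in place of words.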

25,546 citations

Some methods for classification and analysis of multivariate observations

James B. MacQueen

01 Jan 1967

TL;DR: The k-means process described in this paper partitions an N-dimensional population into k sets on the basis of a sample; the k-means concept generalizes the ordinary sample mean, and the resulting partitions appear to be reasonably efficient in the sense of within-class variance.

Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if $p$ is the probability mass function for the population, $S = \{S_1, S_2, \ldots, S_k\}$ is a partition of $E_N$, and $u_i$, $i = 1, 2, \ldots, k$, is the conditional mean of $p$ over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)$ tends to be low for the partitions $S$ generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4. 
The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special

24,320 citations

"A survey of collaborative filtering..." refers methods in this paper

...A commonly-used partitioning method is k-means, proposed by MacQueen [78], which has two main advantages: relative efficiency and easy implementation....
[...]
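
As a rough illustration of why k-means is considered easy to implement, here is a minimal sketch of the widely used batch (Lloyd-style) variant, which alternates assignment and mean-update steps; it is a swapped-in simplification, not MacQueen's original sequential update. The function names `sq_dist`, `kmeans` and `within_class_variance`, and the toy points, are assumptions for illustration. `within_class_variance` computes the sample analogue of the within-class variance criterion from the abstract above.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, n_iter=100):
    """Batch k-means starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(xs) / len(cluster) for xs in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable: converged
            break
        centroids = updated
    return centroids, clusters


def within_class_variance(centroids, clusters):
    """Sum of squared distances from each point to its cluster mean."""
    return sum(sq_dist(p, c) for c, cluster in zip(centroids, clusters) for p in cluster)


# Usage on two well-separated toy groups (values are made up).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

The result depends on the initial centroids; here they are passed in explicitly so the sketch stays deterministic.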