Pattern Recognition and Machine Learning
Citations
1,504 citations
Cites methods from "Pattern Recognition and Machine Learning"
...One can compute with the replica method the information-theoretically best error in the estimation of u and v that the student can possibly achieve, as was done decades ago for some special choices of r, P_out, P_u, and P_v by Biehl and Mietzner (1993), Barkai and Sompolinsky (1994), and Watkin and Nadal (1994). The importance of these early works in physics is acknowledged in some of the landmark papers on the subject in statistics (Johnstone and Lu, 2009). However, the lack of mathematical rigor and the limited understanding of algorithmic tractability kept the impact of these works in machine learning and statistics limited. A resurgence of interest in the statistical-physics approach to low-rank matrix decompositions came with the study of the stochastic block model for the detection of clusters or communities in sparse networks. The problem of community detection was studied extensively in statistical physics, both heuristically and algorithmically; for a review see Fortunato (2010). However, the exact solution and the understanding of algorithmic limitations in the stochastic block model came from spin-glass theory, in the work of Decelle et al. (2011a, 2011b). These works computed (nonrigorously) the asymptotically optimal performance and sharply delimited the regions of parameters where this performance is reached by the belief propagation (BP) algorithm (Yedidia, Freeman, and Weiss, 2003). Second-order phase transitions appearing in the model separate a phase where clustering cannot be performed better than by random guessing from a region where it can be done efficiently with BP. First-order phase transitions and one of their spinodal lines then separate regions where clustering is impossible, possible but not achievable with the BP algorithm, and easy with the BP algorithm. Decelle et al. (2011a, 2011b) also conjectured that when the BP algorithm is not able to reach the optimal performance on large instances of the model, then no other polynomial algorithm can. These works attracted a large amount of follow-up work in the mathematics, statistics, machine learning, and computer science communities. The statistical-physics understanding of the stochastic block model, and the conjecture that the belief propagation algorithm is optimal among all polynomial ones, inspired the discovery of a new class of spectral algorithms for sparse data (i.e., when the matrix X is sparse) (Krzakala et al., 2013). Spectral algorithms are basic tools in data analysis (Ng, Jordan, and Weiss, 2002; Von Luxburg, 2007), based on the singular value decomposition of the matrix X or of functions of X. Yet for sparse matrices X, the spectrum is known to have leading singular values with localized singular vectors unrelated to the latent underlying structure. A more robust spectral method is obtained by linearizing belief propagation, which yields a so-called nonbacktracking matrix (Krzakala et al., 2013). A variant of this spectral method, based on an algorithmic interpretation of the Hessian of the Bethe free energy, also originated in physics (Saade, Krzakala, and Zdeborová, 2014). This line of statistical-physics-inspired research is merging into the mainstream in statistics and machine learning. 
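The nonbacktracking construction mentioned in the excerpt can be made concrete. Below is a minimal sketch, assuming an undirected edge list, two communities, and label readout from the second eigenvector of the 2m x 2m nonbacktracking matrix, as in Krzakala et al. (2013); the function name and helper layout are illustrative, not code from the cited works.

```python
# Minimal sketch of the nonbacktracking spectral method (Krzakala et al., 2013).
# B is indexed by directed edges: B[(u->v), (v->x)] = 1 whenever x != u,
# i.e., a walk may continue through v but may not immediately backtrack to u.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import eigs

def nonbacktracking_labels(edges, n):
    """edges: undirected pairs (u, v); n: number of nodes. Returns 0/1 labels."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}
    out_edges = {}
    for (u, v), i in index.items():
        out_edges.setdefault(u, []).append((v, i))
    B = lil_matrix((len(directed), len(directed)))
    for (u, v), i in index.items():
        for x, j in out_edges.get(v, []):
            if x != u:                      # forbid the backtrack v -> u
                B[i, j] = 1.0
    # The leading eigenvector is uninformative; for a two-group model the
    # second eigenvector carries the community signal.
    vals, vecs = eigs(B.tocsr(), k=2, which='LR')
    second = vecs[:, np.argsort(-vals.real)[1]].real
    # Aggregate the edge-level eigenvector onto nodes via incoming edges.
    score = np.zeros(n)
    for (u, v), i in index.items():
        score[v] += second[i]
    return (score > 0).astype(int)
```

Unlike the adjacency or Laplacian spectrum, the spectrum of B is not dominated by high-degree hubs, which is what makes the method robust on sparse graphs.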
This merging into the mainstream is largely thanks to recent progress in (a) our understanding of algorithmic limitations, due to the analysis of approximate message passing (AMP) algorithms (Rangan and Fletcher, 2012; Javanmard and Montanari, 2013; Matsushita and Tanaka, 2013; Bolthausen, 2014; Deshpande and Montanari, 2014) for low-rank matrix estimation, which are a generalization of the Thouless-Anderson-Palmer equations (Thouless, Anderson, and Palmer, 1977) well known in the physics literature on spin glasses; and (b) progress in proving many of the corresponding results in a mathematically rigorous way. Some of the influential papers in this direction (related to low-rank matrix estimation) are Deshpande and Montanari (2014), Barbier et al. (2016), Lelarge and Miolane (2016), and Coja-Oghlan et al. (2018) for the proof of the replica formula for the information-theoretically optimal performance....
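To make the AMP iteration concrete, here is a minimal sketch for the simplest instance of low-rank matrix estimation, a rank-one spiked Wigner model with a ±1 signal. The tanh denoiser and the Onsager correction follow the standard form of such iterations; the variable names, problem size, and SNR are illustrative assumptions, not code from the cited papers.

```python
# Minimal AMP sketch for the rank-one spiked Wigner model
#   Y = (lam / n) * s s^T + W / sqrt(n),  s in {-1, +1}^n.
# The Onsager term (b * x_prev) is what distinguishes AMP from naive power
# iteration: it keeps the effective noise Gaussian in the large-n limit.
import numpy as np

rng = np.random.default_rng(0)
n, lam, T = 2000, 1.5, 30                 # lam > 1: above the recovery threshold
s = rng.choice([-1.0, 1.0], size=n)       # planted signal
G = rng.normal(size=(n, n))
Y = (lam / n) * np.outer(s, s) + (G + G.T) / np.sqrt(2 * n)

x = rng.normal(scale=0.1, size=n)         # small random initialization
x_prev, b = np.zeros(n), 0.0
for t in range(T):
    u = Y @ x - b * x_prev                # Onsager-corrected linear step
    x_prev = x
    x = np.tanh(lam * u)                  # Bayes denoiser for a +/-1 prior
    b = lam * np.mean(1.0 - x ** 2)       # Onsager coefficient for next step

overlap = abs(x @ s) / (np.linalg.norm(x) * np.sqrt(n))
print(f"overlap with planted signal: {overlap:.3f}")
```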
[...]
1,474 citations
Cites background from "Pattern Recognition and Machine Learning"
...here follows that presented in [183] and [42]....
[...]
...• Alternatively, consider the two outcome counts n_a = [52, 20, 28] and n_b = [44, 14, 42]....
[...]
1,425 citations
Cites background or methods from "Pattern Recognition and Machine Learning"
...GMMs are typically classified as a parametric technique [26,24,41] because of the assumption that the data are generated from a weighted mixture of Gaussian distributions....
[...]
...GMMs estimate the probability density of the target class (here, the normal class), represented by a training set, typically using fewer kernels than there are patterns in the training set [46]....
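As an illustration of this use of GMMs for novelty detection, here is a minimal sketch: fit a mixture to the normal class only, then flag test points whose log-likelihood under the fitted density falls below a threshold calibrated on the training data. The synthetic data, component count, and 5th-percentile threshold are illustrative assumptions, not taken from the review.

```python
# Minimal sketch: GMM density estimation on the normal class for novelty detection.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(500, 2))            # training: normal class only
X_test = np.vstack([rng.normal(0.0, 1.0, size=(10, 2)),   # normal-looking points
                    rng.normal(6.0, 1.0, size=(10, 2))])  # novel points

# Far fewer kernels (3 components) than training patterns (500), as the excerpt notes.
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
gmm.fit(X_normal)

# Novelty threshold: 5th percentile of the training log-likelihoods.
threshold = np.percentile(gmm.score_samples(X_normal), 5)
print(gmm.score_samples(X_test) < threshold)              # True marks a novelty
```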
[...]
...In addition to DBNs, Bayesian networks are sometimes termed naïve Bayesian networks or Bayesian belief networks....
[...]
...Approaches to novelty detection include frequentist and Bayesian methods, information theory, extreme value statistics, support vector methods, other kernel methods, and neural networks....
[...]
...Bayesian approaches have also been proposed for both offline and online changepoint detection schemes [153]....
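For the online case, a minimal sketch of one standard Bayesian scheme follows, in the style of Adams and MacKay's run-length posterior recursion. The model assumptions here (Gaussian observations with known variance, a conjugate Normal prior on the mean, a constant hazard rate) are illustrative, not the specific scheme of [153].

```python
# Minimal sketch of Bayesian online changepoint detection via the
# run-length posterior recursion (Adams and MacKay style).
import numpy as np
from scipy.stats import norm

def bocpd(xs, hazard=0.01, mu0=0.0, var0=10.0, var=1.0):
    """Return the run-length posterior p(r_t | x_1..t) after each observation."""
    R = np.array([1.0])                   # run length 0 with probability 1
    mu = np.array([mu0])                  # posterior mean, per run length
    prec = np.array([1.0 / var0])         # posterior precision, per run length
    history = []
    for x in xs:
        # Posterior predictive for each current run length: N(mu, var + 1/prec).
        pred = norm.pdf(x, loc=mu, scale=np.sqrt(var + 1.0 / prec))
        growth = R * pred * (1.0 - hazard)   # the current run continues
        cp = (R * pred * hazard).sum()       # a changepoint starts a new run
        R = np.append(cp, growth)
        R /= R.sum()
        # Conjugate update of per-run-length statistics; a fresh run (r = 0)
        # restarts from the prior.
        new_prec = prec + 1.0 / var
        new_mu = (mu * prec + x / var) / new_prec
        mu = np.append(mu0, new_mu)
        prec = np.append(1.0 / var0, new_prec)
        history.append(R)
    return history

# Example: mean shift halfway through a synthetic series.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
posteriors = bocpd(xs)
print(np.argmax(posteriors[-1]))   # most probable current run length (~100)
```

A changepoint announces itself as a collapse of the run-length posterior back toward r = 0; the arrays grow linearly with t, so practical implementations prune low-probability run lengths.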
[...]