Pattern Recognition and Machine Learning
Citations
1,504 citations
Cites methods from "Pattern Recognition and Machine Learning"
...One can compute with the replica method the information-theoretically best error in the estimation of u and v that the student can possibly achieve, as was done decades ago for some special choices of r, P_out, P_u, and P_v by Biehl and Mietzner (1993), Barkai and Sompolinsky (1994), and Watkin and Nadal (1994). The importance of these early works in physics is acknowledged in some of the landmark papers on the subject in statistics (Johnstone and Lu, 2009). However, the lack of mathematical rigor and the limited understanding of algorithmic tractability kept the impact of these works in machine learning and statistics limited. A resurgence of interest in the statistical-physics approach to low-rank matrix decompositions came with the study of the stochastic block model for the detection of clusters or communities in sparse networks. The problem of community detection was studied extensively in statistical physics, both heuristically and algorithmically; for a review see Fortunato (2010). However, the exact solution and the understanding of algorithmic limitations in the stochastic block model came from spin-glass theory, in the work of Decelle et al. (2011a, 2011b). These works computed (nonrigorously) the asymptotically optimal performance and sharply delimited the regions of parameters where this performance is reached by the belief propagation (BP) algorithm (Yedidia, Freeman, and Weiss, 2003). Second-order phase transitions appearing in the model separate a phase where clustering cannot be performed better than by random guessing from a region where it can be done efficiently with BP. First-order phase transitions and one of their spinodal lines then separate regions where clustering is impossible, possible but not achievable with the BP algorithm, and easy with the BP algorithm. Decelle et al. (2011a, 2011b) also conjectured that when the BP algorithm is not able to reach the optimal performance on large instances of the model, then no other polynomial algorithm can. These works attracted a large amount of follow-up work in the mathematics, statistics, machine learning, and computer science communities. The statistical-physics understanding of the stochastic block model, and the conjecture that the belief propagation algorithm is optimal among all polynomial ones, inspired the discovery of a new class of spectral algorithms for sparse data (i.e., when the matrix X is sparse) (Krzakala et al., 2013). Spectral algorithms are basic tools in data analysis (Ng, Jordan, and Weiss, 2002; Von Luxburg, 2007), based on the singular value decomposition of the matrix X or of functions of X. Yet for sparse matrices X, the spectrum is known to have leading singular values with localized singular vectors unrelated to the latent underlying structure. A more robust spectral method is obtained by linearizing belief propagation, which yields a so-called nonbacktracking matrix (Krzakala et al., 2013). A variant of this spectral method, based on an algorithmic interpretation of the Hessian of the Bethe free energy, also originated in physics (Saade, Krzakala, and Zdeborová, 2014). This line of statistical-physics-inspired research is merging into the mainstream in statistics and machine learning. 
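The nonbacktracking construction mentioned in the excerpt can be made concrete. Below is a minimal sketch, assuming an undirected edge list, two communities, and label readout from the second eigenvector of the 2m x 2m nonbacktracking matrix, as in Krzakala et al. (2013); the function name and helper layout are illustrative, not code from the cited works.

```python
# Minimal sketch of the nonbacktracking spectral method (Krzakala et al., 2013).
# B is indexed by directed edges: B[(u->v), (v->x)] = 1 whenever x != u,
# i.e., a walk may continue through v but may not immediately backtrack to u.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import eigs

def nonbacktracking_labels(edges, n):
    """edges: undirected pairs (u, v); n: number of nodes. Returns 0/1 labels."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}
    out_edges = {}
    for (u, v), i in index.items():
        out_edges.setdefault(u, []).append((v, i))
    B = lil_matrix((len(directed), len(directed)))
    for (u, v), i in index.items():
        for x, j in out_edges.get(v, []):
            if x != u:                      # forbid the backtrack v -> u
                B[i, j] = 1.0
    # The leading eigenvector is uninformative; for a two-group model the
    # second eigenvector carries the community signal.
    vals, vecs = eigs(B.tocsr(), k=2, which='LR')
    second = vecs[:, np.argsort(-vals.real)[1]].real
    # Aggregate the edge-level eigenvector onto nodes via incoming edges.
    score = np.zeros(n)
    for (u, v), i in index.items():
        score[v] += second[i]
    return (score > 0).astype(int)
```

Unlike the adjacency or Laplacian spectrum, the spectrum of B is not dominated by high-degree hubs, which is what makes the method robust on sparse graphs.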
This merging into the mainstream is largely thanks to recent progress in (a) our understanding of algorithmic limitations, due to the analysis of approximate message passing (AMP) algorithms (Rangan and Fletcher, 2012; Javanmard and Montanari, 2013; Matsushita and Tanaka, 2013; Bolthausen, 2014; Deshpande and Montanari, 2014) for low-rank matrix estimation, which are a generalization of the Thouless-Anderson-Palmer equations (Thouless, Anderson, and Palmer, 1977) well known in the physics literature on spin glasses; and (b) progress in proving many of the corresponding results in a mathematically rigorous way. Some of the influential papers in this direction (related to low-rank matrix estimation) are Deshpande and Montanari (2014), Barbier et al. (2016), Lelarge and Miolane (2016), and Coja-Oghlan et al. (2018) for the proof of the replica formula for the information-theoretically optimal performance....
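To make the AMP iteration concrete, here is a minimal sketch for the simplest instance of low-rank matrix estimation, a rank-one spiked Wigner model with a ±1 signal. The tanh denoiser and the Onsager correction follow the standard form of such iterations; the variable names, problem size, and SNR are illustrative assumptions, not code from the cited papers.

```python
# Minimal AMP sketch for the rank-one spiked Wigner model
#   Y = (lam / n) * s s^T + W / sqrt(n),  s in {-1, +1}^n.
# The Onsager term (b * x_prev) is what distinguishes AMP from naive power
# iteration: it keeps the effective noise Gaussian in the large-n limit.
import numpy as np

rng = np.random.default_rng(0)
n, lam, T = 2000, 1.5, 30                 # lam > 1: above the recovery threshold
s = rng.choice([-1.0, 1.0], size=n)       # planted signal
G = rng.normal(size=(n, n))
Y = (lam / n) * np.outer(s, s) + (G + G.T) / np.sqrt(2 * n)

x = rng.normal(scale=0.1, size=n)         # small random initialization
x_prev, b = np.zeros(n), 0.0
for t in range(T):
    u = Y @ x - b * x_prev                # Onsager-corrected linear step
    x_prev = x
    x = np.tanh(lam * u)                  # Bayes denoiser for a +/-1 prior
    b = lam * np.mean(1.0 - x ** 2)       # Onsager coefficient for next step

overlap = abs(x @ s) / (np.linalg.norm(x) * np.sqrt(n))
print(f"overlap with planted signal: {overlap:.3f}")
```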
[...]
1,474 citations
Cites background from "Pattern Recognition and Machine Learning"
...here follows that presented in [183] and [42]....
[...]
...• Alternatively, consider the two outcome counts n_a = [52, 20, 28] and n_b = [44, 14, 42]....
[...]
1,425 citations
Cites background or methods from "Pattern Recognition and Machine Learning"
...GMMs are typically classified as a parametric technique [26,24,41] because of the assumption that the data are generated from a weighted mixture of Gaussian distributions....
[...]
...GMMs estimate the probability density of the target class (here, the normal class), represented by a training set, typically using fewer kernels than there are patterns in the training set [46]....
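As an illustration of this use of GMMs for novelty detection, here is a minimal sketch: fit a mixture to the normal class only, then flag test points whose log-likelihood under the fitted density falls below a threshold calibrated on the training data. The synthetic data, component count, and 5th-percentile threshold are illustrative assumptions, not taken from the review.

```python
# Minimal sketch: GMM density estimation on the normal class for novelty detection.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(500, 2))            # training: normal class only
X_test = np.vstack([rng.normal(0.0, 1.0, size=(10, 2)),   # normal-looking points
                    rng.normal(6.0, 1.0, size=(10, 2))])  # novel points

# Far fewer kernels (3 components) than training patterns (500), as the excerpt notes.
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
gmm.fit(X_normal)

# Novelty threshold: 5th percentile of the training log-likelihoods.
threshold = np.percentile(gmm.score_samples(X_normal), 5)
print(gmm.score_samples(X_test) < threshold)              # True marks a novelty
```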
[...]
...In addition to DBNs, Bayesian networks are sometimes termed naïve Bayesian networks or Bayesian belief networks....
[...]
...Approaches to novelty detection include frequentist and Bayesian methods, information theory, extreme value statistics, support vector methods, other kernel methods, and neural networks....
[...]
...Bayesian approaches have also been proposed for both offline and online changepoint detection schemes [153]....
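For the online case, a minimal sketch of one standard Bayesian scheme follows, in the style of Adams and MacKay's run-length posterior recursion. The model assumptions here (Gaussian observations with known variance, a conjugate Normal prior on the mean, a constant hazard rate) are illustrative, not the specific scheme of [153].

```python
# Minimal sketch of Bayesian online changepoint detection via the
# run-length posterior recursion (Adams and MacKay style).
import numpy as np
from scipy.stats import norm

def bocpd(xs, hazard=0.01, mu0=0.0, var0=10.0, var=1.0):
    """Return the run-length posterior p(r_t | x_1..t) after each observation."""
    R = np.array([1.0])                   # run length 0 with probability 1
    mu = np.array([mu0])                  # posterior mean, per run length
    prec = np.array([1.0 / var0])         # posterior precision, per run length
    history = []
    for x in xs:
        # Posterior predictive for each current run length: N(mu, var + 1/prec).
        pred = norm.pdf(x, loc=mu, scale=np.sqrt(var + 1.0 / prec))
        growth = R * pred * (1.0 - hazard)   # the current run continues
        cp = (R * pred * hazard).sum()       # a changepoint starts a new run
        R = np.append(cp, growth)
        R /= R.sum()
        # Conjugate update of per-run-length statistics; a fresh run (r = 0)
        # restarts from the prior.
        new_prec = prec + 1.0 / var
        new_mu = (mu * prec + x / var) / new_prec
        mu = np.append(mu0, new_mu)
        prec = np.append(1.0 / var0, new_prec)
        history.append(R)
    return history

# Example: mean shift halfway through a synthetic series.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
posteriors = bocpd(xs)
print(np.argmax(posteriors[-1]))   # most probable current run length (~100)
```

A changepoint announces itself as a collapse of the run-length posterior back toward r = 0; the arrays grow linearly with t, so practical implementations prune low-probability run lengths.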
[...]