Pattern Recognition and Machine Learning
Citations
148 citations
Cites methods from "Pattern Recognition and Machine Learning"
...Using these regularities, a computer can classify data into different categories (5). In the context of neuroimaging, brain images are treated as spatial patterns and pattern-recognition approaches are used to identify statistical properties of the data that discriminate between...
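For concreteness, here is a minimal sketch of the idea in the excerpt: each image is flattened into a feature vector and a linear classifier is trained to discriminate two categories. The data, shapes, and injected signal are entirely hypothetical stand-ins, not anything from the cited work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical stand-in for n "images" of 16x16 voxels, two classes.
n, h, w = 200, 16, 16
images = rng.normal(size=(n, h, w))
labels = rng.integers(0, 2, size=n)
# Inject a weak class-dependent spatial signal so there is a pattern to find.
images[labels == 1, :4, :4] += 0.5

X = images.reshape(n, -1)                # each image becomes one feature vector
clf = LogisticRegression(max_iter=1000)  # simple linear decision rule
scores = cross_val_score(clf, X, labels, cv=5)  # out-of-sample accuracy
print("mean CV accuracy:", scores.mean())
```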
[...]
Cites background or methods from "Pattern Recognition and Machine Learning"
..., Bishop 2006, Section 10.2.5), that different product restrictions lead to identical MFVB approximations. In keeping with the notational conventions declared in Section 2.2 we will, from now on, suppress the subscripts on the q density functions. The MFVB solutions can be shown to satisfy

$$q(\theta_i) \propto \exp\!\left\{ E_{q(\theta_{-i})} \log p(\theta_i \mid x, \theta_{-i}) \right\}, \quad 1 \le i \le 6, \qquad (7)$$

where $\theta_{-i}$ denotes the set $\{\theta_1, \ldots, \theta_6\}$ with $\theta_i$ excluded. Note that the expectation operator $E_{q(\theta_{-i})}$ depends on the particular product density form being assumed. The optimal parameters in these q density functions can be determined by an iterative coordinate ascent scheme induced by (7), aimed at maximizing the lower bound on the marginal log-likelihood:

$$\log p(x; q) \equiv E_{q(\theta)}\!\left\{ \log p(x, \theta) - \log q(\theta) \right\} \le \log p(x).$$

If it is assumed that each iteration entails unique maximization of log p(x; q) with respect to the current θ_i, and that the search is restricted to a compact set, then convergence to a local maximizer of log p(x; q) is guaranteed (Luenberger and Ye 2008, p. 253). Successive values of log p(x; q) can be used to monitor convergence. At convergence, q(θ_i), 1 ≤ i ≤ 6, and log p(x; q) are, respectively, the minimum Kullback-Leibler approximations to the posterior densities p(θ_i | x), 1 ≤ i ≤ 6, and the marginal log-likelihood log p(x). The extension to general Bayesian models with arbitrary parameter vectors and latent variables is straightforward. Summaries may be found in, for example, Chapter 10 of Bishop (2006) and Ormerod and Wand (2010). As described in these references, directed acyclic graph (DAG) representations of Bayesian hierarchical models are very...
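To make the coordinate ascent scheme induced by (7) concrete, here is a sketch of mean-field VB for a toy conjugate model, a normal sample with unknown mean and precision under the factorization q(μ, τ) = q(μ) q(τ), in the spirit of Chapter 10 of Bishop (2006). The model, hyperparameters, and stopping rule are my illustrative choices, not taken from the citing paper; in practice one would monitor successive values of log p(x; q) as the excerpt describes.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=0.5, size=100)   # synthetic data, true tau = 4
n, xbar = x.size, x.mean()

# Priors: mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0) (assumed values).
mu0, lam0, a0, b0 = 0.0, 0.01, 0.01, 0.01

# Optimal factors: q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n).
mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)    # fixed across iterations
a_n = a0 + 0.5 * (n + 1)                       # fixed across iterations
E_tau = a0 / b0                                # initial guess for E_q[tau]

for it in range(100):
    lam_n = (lam0 + n) * E_tau                 # update q(mu) given q(tau)
    # E_{q(mu)}[ sum_i (x_i - mu)^2 + lam0 (mu - mu0)^2 ]:
    e_ss = (np.sum((x - mu_n) ** 2) + n / lam_n
            + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
    b_n = b0 + 0.5 * e_ss                      # update q(tau) given q(mu)
    new_E_tau = a_n / b_n
    if abs(new_E_tau - E_tau) < 1e-10:         # crude stopping rule for the sketch
        break
    E_tau = new_E_tau

print("E_q[mu] =", mu_n, " E_q[tau] =", E_tau)
```

Each pass cycles through the factors, replacing one q density at a time by the right-hand side of (7) with expectations taken under the current remaining factors, which is exactly the coordinate ascent the excerpt describes.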
[...]
...This result is very well-known and forms the basis of normal mixture fitting via the Expectation-Maximization algorithm (e.g., Bishop 2006)....
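As a minimal sketch of the algorithm the excerpt refers to, the following fits a two-component univariate normal mixture by EM, alternating responsibilities (E-step) with weighted maximum-likelihood updates (M-step). The data, initial values, and tolerance are illustrative assumptions, not the citing paper's code.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
# Synthetic two-component data (illustrative).
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 100)])

w = np.array([0.5, 0.5])        # mixing weights
mu = np.array([-1.0, 1.0])      # component means (rough initial guesses)
sd = np.array([1.0, 1.0])       # component standard deviations

old_loglik = -np.inf
for it in range(500):
    # E-step: responsibilities r[j, k] = P(component k | x_j, current params).
    dens = np.column_stack([w[k] * norm.pdf(x, mu[k], sd[k]) for k in range(2)])
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood updates of weights, means, sds.
    nk = r.sum(axis=0)
    w = nk / x.size
    mu = (r * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    # The observed-data log-likelihood is non-decreasing across EM iterations.
    loglik = np.log(dens.sum(axis=1)).sum()
    if loglik - old_loglik < 1e-8:
        break
    old_loglik = loglik

print("weights:", w, "means:", mu, "sds:", sd)
```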
[...]