Showing papers by "Michael I. Jordan" published in 2012

Posted Content•

Variational Bayesian Inference with Stochastic Search

John Paisley¹, David M. Blei², Michael I. Jordan¹•Institutions (2)

University of California, Berkeley¹, Princeton University²

27 Jun 2012-arXiv: Learning

TL;DR: This work presents an alternative algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound and demonstrates the approach on two non-conjugate models: logistic regression and an approximation to the HDP.

Abstract: Mean-field variational inference is a method for approximate Bayesian posterior inference. It approximates a full posterior distribution with a factorized set of distributions by maximizing a lower bound on the marginal likelihood. This requires the ability to integrate a sum of terms in the log joint likelihood using this factorized distribution. Often not all integrals are in closed form, which is typically handled by using a lower bound. We present an alternative algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound. This method uses control variates to reduce the variance of the stochastic search gradient, in which existing lower bounds can play an important role. We demonstrate the approach on two non-conjugate models: logistic regression and an approximation to the HDP.

355 citations
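
The variance-reduction idea named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's stochastic search gradient; it is a generic control-variate example, assuming the target E[exp(U)] with U uniform on [0, 1] and the control variate h(U) = U, whose mean 0.5 is known. The function name `control_variate_demo` and its parameters are hypothetical.

```python
import math
import random
import statistics


def control_variate_demo(n, seed=0):
    """Estimate E[exp(U)], U ~ Uniform(0, 1), with and without a control variate.

    Assumption for illustration: h(U) = U is the control variate, E[h] = 0.5.
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    f = [math.exp(x) for x in u]   # samples of the quantity of interest
    h = u                          # samples of the control variate

    f_bar = statistics.fmean(f)
    h_bar = statistics.fmean(h)
    # Coefficient a = Cov(f, h) / Var(h), estimated from the same samples.
    cov = sum((a - f_bar) * (b - h_bar) for a, b in zip(f, h)) / (n - 1)
    a_coef = cov / statistics.variance(h)

    # Subtract the centered control variate; the mean is unchanged in
    # expectation, but the per-sample variance shrinks when f and h correlate.
    adjusted = [fi - a_coef * (hi - 0.5) for fi, hi in zip(f, h)]
    return (f_bar, statistics.fmean(adjusted),
            statistics.variance(f), statistics.variance(adjusted))
```

Calling `control_variate_demo(5000)` returns the plain estimate, the adjusted estimate, and the two per-sample variances; the exact value of the target is e - 1.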

Proceedings Article•

Revisiting k-means: New Algorithms via Bayesian Nonparametrics

Brian Kulis¹, Michael I. Jordan²•Institutions (2)

Ohio State University¹, University of California, Berkeley²

26 Jun 2012

TL;DR: In this article, the authors revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint and show that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters.

Abstract: Bayesian models offer great flexibility for clustering applications--Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-means and mixtures of Gaussians, we show that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters. We generalize this analysis to the case of clustering multiple data sets through a similar asymptotic argument with the hierarchical Dirichlet process. We also discuss further extensions that highlight the benefits of our analysis: i) a spectral relaxation involving thresholded eigenvectors, and ii) a normalized cut graph clustering algorithm that does not fix the number of clusters in the graph.

326 citations
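
A hard clustering procedure of the kind the abstract describes can be sketched as follows. This is a minimal illustration, assuming squared Euclidean distance and a single penalty parameter; the names `dp_means`, `objective`, and `lam` are hypothetical, and details such as initializing with the global mean are assumptions rather than something the listing states.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def dp_means(points, lam, max_iter=100):
    """k-means-like clustering where a point farther than `lam` (in squared
    distance) from every center opens a new cluster (assumed sketch)."""
    centers = [mean(points)]          # assumption: start from one global cluster
    assign = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            j = min(range(len(centers)), key=dists.__getitem__)
            if dists[j] > lam:        # too far from all centers: new cluster
                centers.append(tuple(p))
                j = len(centers) - 1
            if assign[i] != j:
                assign[i] = j
                changed = True
        # Recompute centers as cluster means and drop clusters left empty.
        members = {}
        for i, j in enumerate(assign):
            members.setdefault(j, []).append(points[i])
        kept = sorted(members)
        relabel = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        assign = [relabel[j] for j in assign]
        if not changed:
            break
    return centers, assign


def objective(points, centers, assign, lam):
    """Within-cluster squared distance plus a penalty of `lam` per cluster."""
    fit = sum(sq_dist(p, centers[j]) for p, j in zip(points, assign))
    return fit + lam * len(centers)
```

With a small `lam` the procedure opens more clusters, and with a very large `lam` it keeps a single cluster, which is how the penalty term controls the number of clusters in this sketch.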

Posted Content•

A Generalized Mean Field Algorithm for Variational Inference in Exponential Families

Eric P. Xing¹, Michael I. Jordan¹, Stuart Russell¹•Institutions (1)

University of California, Berkeley¹

19 Oct 2012-arXiv: Learning

TL;DR: This paper presents a class of generalized mean field (GMF) algorithms for approximate inference in complex exponential family models, which restrict the optimization to the class of cluster-factorizable distributions.

Abstract: Mean field methods, which approximate intractable probability distributions variationally with distributions from a tractable family, enjoy high efficiency and guaranteed convergence, and provide lower bounds on the true likelihood. But due to the requirement for model-specific derivation of the optimization equations and unclear inference quality in various models, they are not widely used as a generic approximate inference algorithm. In this paper, we discuss a generalized mean field theory on variational approximation to a broad class of intractable distributions using a rich set of tractable distributions via constrained optimization over distribution spaces. We present a class of generalized mean field (GMF) algorithms for approximate inference in complex exponential family models, which entails limiting the optimization over the class of cluster-factorizable distributions. GMF is a generic method requiring no model-specific derivations. It factors a complex model into a set of disjoint variable clusters, and uses a set of canonical fixed-point equations to iteratively update the cluster distributions, converging to locally optimal cluster marginals that preserve the original dependency structure within each cluster and hence fully decomposing the overall inference problem. We empirically analyze the effect of different tractable families (clusters of different granularity) on inference quality, and compare GMF with BP on several canonical models. Possible extensions to higher-order MF approximation are also discussed.

196 citations

Proceedings Article•

Variational Bayesian Inference with Stochastic Search

David M. Blei¹, Michael I. Jordan², John Paisley²•Institutions (2)

Princeton University¹, University of California, Berkeley²

26 Jun 2012

TL;DR: In this article, the authors present an alternative algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound, avoiding the additional lower bounds typically used when the required integrals are not available in closed form.

179 citations

Posted Content•

The Big Data Bootstrap

Ariel Kleiner¹, Ameet Talwalkar¹, Purnamrita Sarkar¹, Michael I. Jordan¹•Institutions (1)

University of California, Berkeley¹

27 Jun 2012-arXiv: Learning

TL;DR: The Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality, is presented.

Abstract: The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We provide the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, selection of hyperparameters, and performance on real data.

128 citations
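
The combination of subsampling and bootstrap resampling described in the abstract can be sketched for one simple case. The code below is an illustration only, assuming the estimator is the sample mean and the quality measure is its standard error; the function name `blb_stderr_of_mean` and the defaults for `s` (number of subsets), `r` (resamples per subset), and `gamma` (subset size exponent, b = n^gamma) are assumptions, not values taken from the listing.

```python
import random
import statistics
from collections import Counter


def blb_stderr_of_mean(data, s=5, r=50, gamma=0.7, seed=0):
    """Bag-of-Little-Bootstraps style estimate of the standard error of the
    sample mean (assumed sketch; expects len(data) >= 2)."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))            # size of each small subset
    per_subset = []
    for _ in range(s):
        subset = rng.sample(data, b)       # subsample without replacement
        estimates = []
        for _ in range(r):
            # Resample n points from the b distinct values; only the counts
            # are stored, so each resample touches at most b values.
            counts = Counter(rng.choices(range(b), k=n))
            estimates.append(sum(subset[i] * c for i, c in counts.items()) / n)
        per_subset.append(statistics.stdev(estimates))
    # Average the per-subset quality assessments.
    return sum(per_subset) / s
```

For data with unit variance, the returned value should be close to 1/sqrt(n), the usual standard error of the mean. Because each subset is processed independently, the outer loop is the natural place to distribute work across machines.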

Posted Content•

Nonparametric Link Prediction in Dynamic Networks

Purnamrita Sarkar¹, Deepayan Chakrabarti², Michael I. Jordan¹•Institutions (2)

University of California, Berkeley¹, Facebook²

27 Jun 2012-arXiv: Learning

TL;DR: This paper proposes a nonparametric link prediction algorithm for a sequence of graph snapshots over time that predicts links based on the features of its endpoints as well as those of the local neighborhood around the endpoints, proves the consistency of the estimator, and gives a fast implementation based on locality-sensitive hashing.

Abstract: We propose a non-parametric link prediction algorithm for a sequence of graph snapshots over time. The model predicts links based on the features of its endpoints, as well as those of the local neighborhood around the endpoints. This allows for different types of neighborhoods in a graph, each with its own dynamics (e.g., growing or shrinking communities). We prove the consistency of our estimator, and give a fast implementation based on locality-sensitive hashing. Experiments with simulated as well as five real-world dynamic graphs show that we outperform the state of the art, especially when sharp fluctuations or non-linearities are present.

120 citations

Journal Article•DOI•

Beta Processes, Stick-Breaking and Power Laws

Tamara Broderick, Michael I. Jordan, Jim Pitman

01 Jun 2012-Bayesian Analysis

TL;DR: In this article, the authors derive a stick-breaking representation for the beta process directly from its characterization as a completely random measure, which motivates a three-parameter generalization of the beta process whose power laws they study.

Abstract: The beta-Bernoulli process provides a Bayesian nonparametric prior for models involving collections of binary-valued features. A draw from the beta process yields an infinite collection of probabilities in the unit interval, and a draw from the Bernoulli process turns these into binary-valued features. Recent work has provided stick-breaking representations for the beta process analogous to the well-known stick-breaking representation for the Dirichlet process. We derive one such stick-breaking representation directly from the characterization of the beta process as a completely random measure. This approach motivates a three-parameter generalization of the beta process, and we study the power laws that can be obtained from this generalized beta process. We present a posterior inference algorithm for the beta-Bernoulli process that exploits the stick-breaking representation, and we present experimental results for a discrete factor-analysis model.

94 citations
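
For readers unfamiliar with the stick-breaking idea, the sketch below draws truncated weights from the well-known stick-breaking representation of the Dirichlet process that the abstract uses as its point of comparison. It is not the paper's beta-process construction; the function name `dp_stick_breaking` and the truncation level `num_sticks` are hypothetical.

```python
import random


def dp_stick_breaking(alpha, num_sticks, seed=0):
    """Truncated stick-breaking weights for a Dirichlet process with
    concentration `alpha`: V_k ~ Beta(1, alpha), pi_k = V_k * prod_{j<k}(1 - V_j).
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0                    # length of the stick not yet broken off
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    return weights, remaining
```

The returned weights plus the leftover stick length always sum to one, so `remaining` measures how much mass the truncation discards. Larger `alpha` spreads mass over more sticks.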

Journal Article•DOI•

Phylogenetic inference via sequential Monte Carlo.

Alexandre Bouchard-Côté¹, Sriram Sankararaman², Michael I. Jordan³•Institutions (3)

University of British Columbia¹, Harvard University², University of California, Berkeley³

01 Jul 2012-Systematic Biology

TL;DR: This paper develops an extension of classical SMC based on partially ordered sets, referred to as PosetSMC, shows how to apply this framework to phylogenetic analysis, and provides a theoretical treatment and empirical results demonstrating that PosetSMC is a very promising alternative to MCMC.

Abstract: Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework--which we refer to as PosetSMC--to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.

93 citations

Posted Content•

Privacy Aware Learning

John C. Duchi¹, Michael I. Jordan¹, Martin J. Wainwright¹•Institutions (1)

University of California, Berkeley¹

07 Oct 2012-arXiv: Machine Learning

TL;DR: In this article, the authors study statistical risk minimization problems under a privacy model in which the data is kept confidential even from the learner, and establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures.

Abstract: We study statistical risk minimization problems under a privacy model in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, as measured by convergence rate, of any statistical estimator or learning procedure.

76 citations

Proceedings Article•

Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Ke Jiang¹, Brian Kulis¹, Michael I. Jordan²•Institutions (2)

Ohio State University¹, University of California, Berkeley²

03 Dec 2012

TL;DR: This paper derives novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that feature the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models.

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the k-means and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that feature the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis.

73 citations

Proceedings Article•

Privacy Aware Learning

Martin J. Wainwright¹, Michael I. Jordan¹, John C. Duchi¹•Institutions (1)

University of California, Berkeley¹

03 Dec 2012

TL;DR: In this article, the authors study statistical risk minimization problems under a version of privacy in which the data is kept confidential even from the learner, and establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures.

Abstract: We study statistical risk minimization problems under a version of privacy in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation procedures. As a consequence, we exhibit a precise tradeoff between the amount of privacy the data preserves and the utility, measured by convergence rate, of any statistical estimator.

Journal Article•DOI•

Ergodic Mirror Descent

John C. Duchi, Alekh Agarwal, Mikael Johansson, Michael I. Jordan

04 Dec 2012-SIAM Journal on Optimization

TL;DR: This paper generalizes stochastic subgradient descent methods to samples that are coupled over time rather than independent, and shows that as long as the source of randomness is suitably ergodic (it converges quickly enough to a stationary distribution), the method enjoys strong convergence guarantees, both in expectation and with high probability.

Abstract: We generalize stochastic subgradient descent methods to situations in which we do not receive independent samples from the distribution over which we optimize, instead receiving samples coupled over time. We show that as long as the source of randomness is suitably ergodic---it converges quickly enough to a stationary distribution---the method enjoys strong convergence guarantees, both in expectation and with high probability. This result has implications for stochastic optimization in high-dimensional spaces, peer-to-peer distributed optimization schemes, decision problems with dependent data, and stochastic optimization problems over combinatorial spaces.

Journal Article•DOI•

EP-GIG priors and applications in Bayesian sparse learning

Zhihua Zhang¹, Shusen Wang¹, Dehua Liu¹, Michael I. Jordan²•Institutions (2)

Zhejiang University¹, University of California, Berkeley²

01 Jan 2012-Journal of Machine Learning Research

TL;DR: This paper defines sparsity-inducing priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG), a variant of generalized hyperbolic distributions, and develops EM algorithms for sparse empirical Bayesian learning that bear an interesting resemblance to iteratively reweighted l2 or l1 methods.

Abstract: In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be explicitly expressed. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. We exploit these properties to develop EM algorithms for sparse empirical Bayesian learning. We also show that these algorithms bear an interesting resemblance to iteratively reweighted l2 or l1 methods. Finally, we present two extensions for grouped variable selection and logistic regression.

Proceedings Article•DOI•

Active spectral clustering via iterative uncertainty reduction

Fabian L. Wauthier¹, Nebojsa Jojic², Michael I. Jordan¹•Institutions (2)

University of California, Berkeley¹, Microsoft²

12 Aug 2012

TL;DR: An active learning algorithm is proposed that incrementally measures only those similarities that are most likely to remove uncertainty in an intermediate clustering solution, and shows a significant improvement in performance compared to the alternatives.

Abstract: Spectral clustering is a widely used method for organizing data that only relies on pairwise similarity measurements. This makes its application to non-vectorial data straight-forward in principle, as long as all pairwise similarities are available. However, in recent years, numerous examples have emerged in which the cost of assessing similarities is substantial or prohibitive. We propose an active learning algorithm for spectral clustering that incrementally measures only those similarities that are most likely to remove uncertainty in an intermediate clustering solution. In many applications, similarities are not only costly to compute, but also noisy. We extend our algorithm to maintain running estimates of the true similarities, as well as estimates of their accuracy. Using this information, the algorithm updates only those estimates which are relatively inaccurate and whose update would most likely remove clustering uncertainty. We compare our methods on several datasets, including a realistic example where similarities are expensive and noisy. The results show a significant improvement in performance compared to the alternatives.

Proceedings Article•

The Big Data Bootstrap

Ariel Kleiner¹, Ameet Talwalkar¹, Purnamrita Sarkar¹, Michael I. Jordan¹•Institutions (1)

University of California, Berkeley¹

26 Jun 2012

TL;DR: The Bag of Little Bootstraps (BLB) is a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality.

Proceedings Article•

Ancestor Sampling for Particle Gibbs

Fredrik Lindsten¹, Thomas B. Schön¹, Michael I. Jordan²•Institutions (2)

Linköping University¹, University of California, Berkeley²

03 Dec 2012

TL;DR: This work presents a novel method in the family of particle MCMC methods, referred to as particle Gibbs with ancestor sampling (PG-AS), applies it to non-Markovian state-space models, and develops a truncation strategy for these models that is applicable in principle to any backward-simulation-based method but is particularly well suited to the PG-AS framework.

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models.

Proceedings Article•

Nonparametric Link Prediction in Dynamic Networks

Purnamrita Sarkar¹, Deepayan Chakrabarti², Michael I. Jordan¹•Institutions (2)

University of California, Berkeley¹, Facebook²

26 Jun 2012

TL;DR: In this article, the authors propose a nonparametric link prediction algorithm for a sequence of graph snapshots over time, which predicts links based on the features of its endpoints, as well as those of the local neighborhood around the endpoints.

Abstract: We propose a nonparametric link prediction algorithm for a sequence of graph snapshots over time. The model predicts links based on the features of its endpoints, as well as those of the local neighborhood around the endpoints. This allows for different types of neighborhoods in a graph, each with its own dynamics (e.g., growing or shrinking communities). We prove the consistency of our estimator, and give a fast implementation based on locality-sensitive hashing. Experiments with simulated as well as five real-world dynamic graphs show that we outperform the state of the art, especially when sharp fluctuations or nonlinearities are present.

Proceedings Article•

Stick-Breaking Beta Processes and the Poisson Process

John Paisley¹, David M. Blei², Michael I. Jordan³•Institutions (3)

Duke University¹, Princeton University², University of California, Berkeley³

21 Mar 2012

TL;DR: It is shown that the stick-breaking construction of the beta process due to Paisley et al. (2010) can be obtained from the characterization of the beta process as a Poisson process, and this underlying representation is used to derive error bounds on truncated beta processes that are tighter than those in the literature.

Abstract: We show that the stick-breaking construction of the beta process due to Paisley et al. (2010) can be obtained from the characterization of the beta process as a Poisson process. Specifically, we show that the mean measure of the underlying Poisson process is equal to that of the beta process. We use this underlying representation to derive error bounds on truncated beta processes that are tighter than those in the literature. We also develop a new MCMC inference algorithm for beta processes, based in part on our new Poisson process construction.

Posted Content•

Active Learning for Crowd-Sourced Databases

Barzan Mozafari, Purnamrita Sarkar, Michael J. Franklin, Michael I. Jordan, Samuel Madden

17 Sep 2012-arXiv: Learning

TL;DR: Two new active learning algorithms are presented to combine humans and algorithms together in a crowd-sourced database, based on the theory of non-parametric bootstrap, which makes their results applicable to a broad class of machine learning models.

Abstract: Crowd-sourcing has become a popular means of acquiring labeled data for a wide variety of tasks where humans are more accurate than computers, e.g., labeling images, matching objects, or analyzing sentiment. However, relying solely on the crowd is often impractical even for data sets with thousands of items, due to time and cost constraints of acquiring human input (which cost pennies and minutes per label). In this paper, we propose algorithms for integrating machine learning into crowd-sourced databases, with the goal of allowing crowd-sourcing applications to scale, i.e., to handle larger datasets at lower costs. The key observation is that, in many of the above tasks, humans and machine learning algorithms can be complementary, as humans are often more accurate but slow and expensive, while algorithms are usually less accurate, but faster and cheaper. Based on this observation, we present two new active learning algorithms to combine humans and algorithms together in a crowd-sourced database. Our algorithms are based on the theory of non-parametric bootstrap, which makes our results applicable to a broad class of machine learning models. Our results, on three real-life datasets collected with Amazon's Mechanical Turk, and on 15 well-known UCI data sets, show that our methods on average ask humans to label one to two orders of magnitude fewer items to achieve the same accuracy as a baseline that labels random images, and two to eight times fewer questions than previous active learning schemes.

Posted Content•

Ancestor Sampling for Particle Gibbs

Fredrik Lindsten, Michael I. Jordan, Thomas B. Schön

25 Oct 2012-arXiv: Computation

TL;DR: In this article, a particle Gibbs with ancestor sampling (PG-AS) method was proposed to improve the mixing of the particle MCMC kernel in a single forward sweep instead of using separate forward and backward sweeps.

Journal Article•DOI•

Cluster and Feature Modeling from Combinatorial Stochastic Processes

Tamara Broderick¹, Michael I. Jordan¹, Jim Pitman¹•Institutions (1)

University of California, Berkeley¹

26 Jun 2012-arXiv: Statistics Theory

TL;DR: This paper reviews the combinatorial stochastic process representations for Bayesian nonparametric clustering and develops analogous representations for the feature modeling problem, which include the beta process and the Indian buffet process as well as new representations that provide insight into the connections between these processes.

Abstract: One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models are a set of interrelated stochastic processes, most notably the Dirichlet process and the Chinese restaurant process. In this paper we provide a formal development of an analogous problem, called feature modeling, for associating data points with arbitrary nonnegative integer numbers of groups, now called features or topics. We review the existing combinatorial stochastic process representations for the clustering problem and develop analogous representations for the feature modeling problem. These representations include the beta process and the Indian buffet process as well as new representations that provide insight into the connections between these processes. We thereby bring the same level of completeness to the treatment of Bayesian nonparametric feature modeling that has previously been achieved for Bayesian nonparametric clustering.

Proceedings Article•

Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

Andre Wibisono¹, Martin J. Wainwright¹, Michael I. Jordan¹, John C. Duchi¹•Institutions (1)

University of California, Berkeley¹

03 Dec 2012

TL;DR: This work analyzes the finite-sample convergence rates of derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, and shows that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods, where d is the problem dimension.

Abstract: We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods, where d is the problem dimension. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors.
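
A gradient estimate built from a pair of function values and a random perturbation, as the abstract describes in general terms, can be sketched as follows. This is an illustration under stated assumptions: a Gaussian perturbation direction, a forward-difference form, a noiseless test function, and a fixed step size. The names `two_point_grad` and `zero_order_descent` and all default parameters are hypothetical and are not taken from the paper.

```python
import random


def two_point_grad(f, x, u, rng):
    """Estimate the gradient of f at x from two function values.

    Draws z ~ N(0, I) and returns ((f(x + u*z) - f(x)) / u) * z, which
    approximates the gradient in expectation for small u because E[z z^T] = I.
    """
    z = [rng.gauss(0.0, 1.0) for _ in x]
    x_pert = [xi + u * zi for xi, zi in zip(x, z)]
    scale = (f(x_pert) - f(x)) / u
    return [scale * zi for zi in z]


def zero_order_descent(f, x0, steps=2000, lr=0.02, u=1e-4, seed=0):
    """Gradient descent that only ever evaluates f, never its gradient."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = two_point_grad(f, x, u, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

For example, minimizing `lambda x: sum((xi - 1.0) ** 2 for xi in x)` from the origin drives every coordinate toward 1. The step size has to shrink as the dimension grows because the variance of the estimate grows with d, which is consistent with the dimension-dependent slowdown the abstract reports.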

Posted Content•

Tree-dependent Component Analysis

Francis Bach¹, Michael I. Jordan¹•Institutions (1)

University of California, Berkeley¹

12 Dec 2012-arXiv: Learning

TL;DR: This paper presents a generalization of independent component analysis (ICA) that, instead of looking for a linear transform that makes the data components independent, looks for a transform that makes the data components well fit by a tree-structured graphical model, and shows that the optimal transform is found by minimizing a contrast function based on mutual information.

Abstract: We present a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a tree-structured graphical model. Treating the problem as a semiparametric statistical problem, we show that the optimal transform is found by minimizing a contrast function based on mutual information, a function that directly extends the contrast function used for classical ICA. We provide two approximations of this contrast function, one using kernel density estimation, and another using kernel generalized variance. This tree-dependent component analysis framework leads naturally to an efficient general multivariate density estimation technique where only bivariate density estimation needs to be performed.

Posted Content•

Graph partition strategies for generalized mean field inference

[...]

Eric P. Xing¹, Michael I. Jordan¹, Stuart Russell¹•Institutions (1)

University of California, Berkeley¹

11 Jul 2012-arXiv: Learning

TL;DR: This paper presents a novel combination of graph partitioning algorithms with a generalized mean field (GMF) inference algorithm that optimizes over disjoint clustering of variables and performs inference using those clusters.

Abstract: An autonomous variational inference algorithm for arbitrary graphical models requires the ability to optimize variational approximations over the space of model parameters as well as over the choice of tractable families used for the variational approximation. In this paper, we present a novel combination of graph partitioning algorithms with a generalized mean field (GMF) inference algorithm. This combination optimizes over disjoint clustering of variables and performs inference using those clusters. We provide a formal analysis of the relationship between the graph cut and the GMF approximation, and explore several graph partition strategies empirically. Our empirical results provide rather clear support for a weighted version of MinCut as a useful clustering algorithm for GMF inference, which is consistent with the implications from the formal analysis.

A Million Cancer Genome Warehouse

[...]

David Haussler, David A. Patterson, Mark Diekhans, Armando Fox, Michael I. Jordan, Anthony D. Joseph, Singer Ma, Benedict Paten, Scott Shenker, Taylor Sittler, Ion Stoica

20 Nov 2012

TL;DR: This white paper proposes the Million Cancer Genome Warehouse, a repository and associated computational infrastructure to house and process a million genomes, as an example of an information commons and a computing system that will bring about precision medicine, coupling established clinical pathological indexes with state-of-the-art molecular profiling to create diagnostic, prognostic and therapeutic strategies precisely tailored to each patient's individual requirements.

Abstract: This white paper discusses the motivation and issues surrounding the development of a repository and associated computational infrastructure to house and process a million genomes to help battle cancer, which we call the Million Cancer Genome Warehouse. It is proposed as an example of an information commons and a computing system that will bring about precision medicine, coupling established clinical pathological indexes with state-of-the-art molecular profiling to create diagnostic, prognostic, and therapeutic strategies precisely tailored to each patient's individual requirements. The goal of the white paper is to stimulate discussion so as to help reach consensus about the need to construct a Million Cancer Genome Warehouse and what its nature should be. To try to anticipate concerns, including thorough cost estimates, it covers topics as varied as high-level health policy issues to low-level details about statistical analysis, data formats and structures, software design, and hardware construction and cost.

Journal Article•DOI•

Matrix concentration inequalities via the method of exchangeable pairs

[...]

Lester Mackey, Michael I. Jordan, Richard Chen, Brendan Farrell, Joel A. Tropp

28 Jan 2012-arXiv: Probability

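TL;DR: This paper derives exponential concentration and polynomial moment inequalities for the spectral norm of a random matrix, using a matrix extension of the scalar concentration theory developed by Sourav Chatterjee via Stein's method of exchangeable pairs. Applied to sums of independent random matrices, the approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine and Rosenthal.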

Abstract: This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein's method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables.

Journal Article•DOI•

The asymptotics of ranking algorithms

[...]

John C. Duchi, Lester Mackey, Michael I. Jordan

07 Apr 2012-arXiv: Statistics Theory

TL;DR: This work presents a new approach to supervised ranking based on aggregation of partial preferences, and develops $U$-statistic-based empirical risk minimization procedures that yield consistency results that parallel those available for classification.

Abstract: We consider the predictive problem of supervised ranking, where the task is to rank sets of candidate items returned in response to queries. Although there exist statistical procedures that come with guarantees of consistency in this setting, these procedures require that individuals provide a complete ranking of all items, which is rarely feasible in practice. Instead, individuals routinely provide partial preference information, such as pairwise comparisons of items, and more practical approaches to ranking have aimed at modeling this partial preference data directly. As we show, however, such an approach raises serious theoretical challenges. Indeed, we demonstrate that many commonly used surrogate losses for pairwise comparison data do not yield consistency; surprisingly, we show inconsistency even in low-noise settings. With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences, and we develop $U$-statistic-based empirical risk minimization procedures. We present an asymptotic analysis of these new procedures, showing that they yield consistency results that parallel those available for classification. We complement our theoretical results with an experiment studying the new procedures in a large-scale web-ranking task.

Posted Content•

Modeling Events with Cascades of Poisson Processes

[...]

Aleksandr Simma¹, Michael I. Jordan¹•Institutions (1)

University of California, Berkeley¹

15 Mar 2012-arXiv: Learning

TL;DR: In this article, a probabilistic model of events in continuous time is presented, in which each event triggers a Poisson process of successor events, and the ensemble of observed events is thereby modeled as a superposition of Poisson processes.

Abstract: We present a probabilistic model of events in continuous time in which each event triggers a Poisson process of successor events. The ensemble of observed events is thereby modeled as a superposition of Poisson processes. Efficient inference is feasible under this model with an EM algorithm. Moreover, the EM algorithm can be implemented as a distributed algorithm, permitting the model to be applied to very large datasets. We apply these techniques to the modeling of Twitter messages and the revision history of Wikipedia.
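A minimal generative sketch of such a cascade may help make the model concrete. The abstract does not specify the triggering intensity, so the choices below are assumptions: background events arrive at a constant rate, each event produces a Poisson-distributed number of successors, and successor delays are exponential. The function and parameter names are hypothetical, and the EM inference step is not shown.

```python
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_cascade(base_rate, horizon, branching, decay, seed=0):
    """Simulate events on [0, horizon) where each event triggers successors.

    Returns a list of (time, parent_index) pairs; parent_index is None
    for background events and otherwise indexes into the returned list.
    """
    rng = random.Random(seed)
    pending = []
    # Background events: a homogeneous Poisson process with rate base_rate.
    t = rng.expovariate(base_rate)
    while t < horizon:
        pending.append((t, None))
        t += rng.expovariate(base_rate)

    events = []
    while pending:
        time, parent = pending.pop()
        events.append((time, parent))
        index = len(events) - 1
        # Each event triggers its own Poisson process of successor events
        # (assumed here: Poisson(branching) children with exponential delays).
        for _ in range(sample_poisson(rng, branching)):
            child_time = time + rng.expovariate(decay)
            if child_time < horizon:
                pending.append((child_time, index))
    return events


events = simulate_cascade(base_rate=2.0, horizon=100.0, branching=0.5, decay=1.0)
```

The observed event stream is the union of all background and triggered events, which corresponds to the superposition of Poisson processes described in the abstract. Keeping `branching` below 1 in this sketch ensures each cascade dies out.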

Proceedings Article•DOI•

Divide-and-conquer and statistical inference for big data

[...]

Michael I. Jordan¹•Institutions (1)

University of California, Berkeley¹

12 Aug 2012

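TL;DR: A new procedure, the "bag of little bootstraps," is presented for obtaining confidence intervals on massive datasets, where bootstrap resampling is infeasible and plain subsampling yields fluctuations on the wrong scale; it inherits the favorable theoretical properties of the bootstrap while having a much more favorable computational profile.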

Abstract: I present some recent work on statistical inference for Big Data. Divide-and-conquer is a natural computational paradigm for approaching Big Data problems, particularly given recent developments in distributed and parallel computing, but some interesting challenges arise when applying divide-and-conquer algorithms to statistical inference problems. One interesting issue is that of obtaining confidence intervals in massive datasets. The bootstrap principle suggests resampling data to obtain fluctuations in the values of estimators, and thereby confidence intervals, but this is infeasible with massive data. Subsampling the data yields fluctuations on the wrong scale, which have to be corrected to provide calibrated statistical inferences. I present a new procedure, the "bag of little bootstraps," which circumvents this problem, inheriting the favorable theoretical properties of the bootstrap but also having a much more favorable computational profile. Another issue that I discuss is the problem of large-scale matrix completion. Here divide-and-conquer is a natural heuristic that works well in practice, but new theoretical problems arise when attempting to characterize the statistical performance of divide-and-conquer algorithms. Here the theoretical support is provided by concentration theorems for random matrices, and I present a new approach to this problem based on Stein's method.
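The abstract does not spell out the steps of the bag of little bootstraps, so the following is only a sketch of the procedure as it is commonly described: draw a few small subsets of size roughly n^gamma, resample each subset up to the full size n using multinomial counts, measure the estimator's fluctuation within each subset, and average across subsets. The function names, the default parameter values, and the use of the weighted mean as the estimator are assumptions for illustration.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, weights):
    """Estimator evaluated on a subset whose points carry integer weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def blb_standard_error(data, estimator, gamma=0.6, num_subsets=5,
                       num_resamples=30, seed=0):
    """Sketch of a bag-of-little-bootstraps standard error estimate."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # small subset size, assumed b = n^gamma
    per_subset = []
    for _ in range(num_subsets):
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(num_resamples):
            # Resample n points from the b subset points; only the
            # b multinomial counts are stored, never n data points.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts.get(i, 0) for i in range(b)]
            estimates.append(estimator(subset, weights))
        per_subset.append(statistics.stdev(estimates))
    # Average the per-subset quality assessments.
    return statistics.fmean(per_subset)


data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
se = blb_standard_error(data, weighted_mean)
```

Because each resample has nominal size n, the fluctuations are on the scale of the full dataset, while the estimator only ever touches b distinct points. For unit-variance data of size 5000, the result should be on the order of 1/sqrt(5000).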

Journal Article•DOI•

A Semiparametric Bayesian Approach to Wiener System Identification

[...]

Fredrik Lindsten¹, Thomas B. Schön¹, Michael I. Jordan²•Institutions (2)

Linköping University¹, University of California, Berkeley²

01 Jul 2012-IFAC Proceedings Volumes

TL;DR: In this article, a particle filter (PF) is used to generate a sample state trajectory within a Markov chain Monte Carlo sampler, an approach that has been shown to be efficient even when very few particles are used in the PF.
