Showing papers by "Michael I. Jordan published in 2011"

PDF

Open Access

Journal Article•DOI•

Learning Dependency-Based Compositional Semantics

[...]

Percy Liang¹, Michael I. Jordan¹, Dan Klein¹•Institutions (1)

19 Jun 2011

TL;DR: A new semantic formalism, dependency-based compositional semantics (DCS) is developed and a log-linear distribution over DCS logical forms is defined and it is shown that the system obtains comparable accuracies to even state-of-the-art systems that do require annotated logical forms.

...read moreread less

Abstract: Compositional question answering begins by mapping questions to logical forms, but training a semantic parser to perform this mapping typically requires the costly annotation of the target logical forms. In this paper, we learn to map questions to answers via latent logical forms, which are induced automatically from question-answer pairs. In tackling this challenging learning problem, we introduce a new semantic representation which highlights a parallel between dependency syntax and efficient evaluation of logical forms. On two standard semantic parsing benchmarks (Geo and Jobs), our system obtains the highest published accuracies, despite requiring no annotated logical forms.

...read moreread less

651 citations

Proceedings Article•DOI•

Managing data transfers in computer clusters with orchestra

[...]

Mosharaf Chowdhury¹, Matei Zaharia¹, Justin Ma¹, Michael I. Jordan¹, Ion Stoica¹ - Show less +1 more•Institutions (1)

University of California, Berkeley¹

15 Aug 2011

TL;DR: This work proposes a global management architecture and a set of algorithms that improve the transfer times of common communication patterns, such as broadcast and shuffle, and allow scheduling policies at the transfer level,such as prioritizing a transfer over other transfers.

...read moreread less

Abstract: Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.

...read moreread less

612 citations

Journal Article•DOI•

A Sticky HDP-HMM With Application to Speaker Diarization

[...]

Emily B. Fox¹, Erik B. Sudderth¹, Michael I. Jordan¹, Alan S. Willsky¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jun 2011-The Annals of Applied Statistics

TL;DR: In this article, a Bayesian nonparametric approach to speaker diarization is proposed, which builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al.

...read moreread less

Abstract: We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566–1581]. Although the basic HDP-HMM tends to over-segment the audio data—creating redundant states and rapidly switching among them—we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.

...read moreread less

289 citations

Journal Article•DOI•

Support union recovery in high-dimensional multivariate regression

[...]

Guillaume Obozinski, Martin J. Wainwright, Michael I. Jordan

01 Feb 2011-Annals of Statistics

TL;DR: In this article, the multivariate group Lasso was shown to exhibit a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter (n,p,s) := n/[2 (B )log(p s).

...read moreread less

Abstract: multivariate group Lasso, in which block regularization based on the ‘1/‘2 norm is used for support union recovery, or recovery of the set of s rows for which B is non-zero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter (n,p,s) := n/[2 (B )log(p s)]. Here n is the sample size, and (B ) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K-regression coecient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n,p,s) such that (n,p,s) exceeds a critical level u, and fails for sequences such that (n,p,s) lies below a critical level ‘. For the special case of the standard Gaussian ensemble, we show that ‘ = u so that the characterization is sharp. The sparsity-overlap function (B ) reveals that, if the design is uncorrelated on the active rows, ‘1/‘2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the regression vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.

...read moreread less

284 citations

A sticky HDP-HMM with application to speaker diarization

[...]

Emily B. Fox¹, Erik B. Sudderth¹, Michael I. Jordan¹, Alan S. Willsky¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jun 2011

TL;DR: An augmented HDP-HMM is described that provides effective control over the switching rate and makes it possible to treat emission distributions nonparametrically, and a sampling algorithm is developed that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence.

...read moreread less

Abstract: United States. Air Force Office of Scientific Research (Grant FA9550-06-1-0324); United States. Army Research Office (Grant W911NF-06-1-0076); United States. Air Force Office of Scientific Research (Grant FA9559-08-1-0180); United States. Defense Advanced Research Projects Agency. Information Processing Techniques Office (Contract FA8750-05-2-0249)

...read moreread less

274 citations

Journal Article•DOI•

Bayesian Nonparametric Inference of Switching Dynamic Linear Models

[...]

Emily B. Fox¹, Erik B. Sudderth², Michael I. Jordan³, Alan S. Willsky⁴•Institutions (4)

Duke University¹, Brown University², University of California, Berkeley³, Massachusetts Institute of Technology⁴

01 Apr 2011-IEEE Transactions on Signal Processing

TL;DR: In this article, a Bayesian nonparametric approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes, and additionally employs automatic relevance determination to infer a sparse set of dynamic dependencies allowing to learn SLDS with varying state dimension or switching VAR processes with varying autoregressive order.

...read moreread less

Abstract: Many complex dynamical phenomena can be effectively modeled by a system that switches among a set of conditionally linear dynamical modes. We consider two such models: the switching linear dynamical system (SLDS) and the switching vector autoregressive (VAR) process. Our Bayesian nonparametric approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes. We additionally employ automatic relevance determination to infer a sparse set of dynamic dependencies allowing us to learn SLDS with varying state dimension or switching VAR processes with varying autoregressive order. We develop a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences. The utility and flexibility of our model are demonstrated on synthetic data, sequences of dancing honey bees, the IBOVESPA stock index and a maneuvering target tracking application.

...read moreread less

249 citations

Posted Content•

A Scalable Bootstrap for Massive Data

[...]

Ariel Kleiner¹, Ameet Talwalkar¹, Purnamrita Sarkar¹, Michael I. Jordan¹•Institutions (1)

University of California, Berkeley¹

21 Dec 2011-arXiv: Methodology

TL;DR: The Bag of Little Bootstraps (BLB) as mentioned in this paper is a new procedure which incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators.

...read moreread less

Abstract: The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets---which are increasingly prevalent---the computation of bootstrap-based quantities can be prohibitively demanding computationally. While variants such as subsampling and the $m$ out of $n$ bootstrap can be used in principle to reduce the cost of bootstrap computations, we find that these methods are generally not robust to specification of hyperparameters (such as the number of subsampled data points), and they often require use of more prior information (such as rates of convergence of estimators) than the bootstrap. As an alternative, we introduce the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators. BLB is well suited to modern parallel and distributed computing architectures and furthermore retains the generic applicability and statistical efficiency of the bootstrap. We demonstrate BLB's favorable statistical performance via a theoretical analysis elucidating the procedure's properties, as well as a simulation study comparing BLB to the bootstrap, the $m$ out of $n$ bootstrap, and subsampling. In addition, we present results from a large-scale distributed implementation of BLB demonstrating its computational superiority on massive data, a method for adaptively selecting BLB's hyperparameters, an empirical study applying BLB to several real datasets, and an extension of BLB to time series data.

...read moreread less

245 citations

Posted Content•

Divide-and-Conquer Matrix Factorization

[...]

Lester Mackey, Ameet Talwalkar, Michael I. Jordan

05 Jul 2011

TL;DR: The experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with DFC, and the analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.

...read moreread less

Abstract: If learning methods are to scale to the massive sizes of modern datasets, it is essential for the field of machine learning to embrace parallel and distributed computing. Inspired by the recent development of matrix factorization methods with rich theory but poor computational complexity and by the relative ease of mapping matrices onto distributed architectures, we introduce a scalable divide-and-conquer framework for noisy matrix factorization. We present a thorough theoretical analysis of this framework in which we characterize the statistical errors introduced by the "divide" step and control their magnitude in the "conquer" step, so that the overall algorithm enjoys high-probability estimation guarantees comparable to those of its base algorithm. We also present experiments in collaborative filtering and video background modeling that demonstrate the near-linear to superlinear speed-ups attainable with this approach.

...read moreread less

207 citations

Proceedings Article•DOI•

The SCADS director: scaling a distributed storage system under stringent performance requirements

[...]

Beth Trushkowsky¹, Peter Bodik¹, Armando Fox¹, Michael J. Franklin¹, Michael I. Jordan¹, David A. Patterson¹ - Show less +2 more•Institutions (1)

University of California, Berkeley¹

15 Feb 2011

TL;DR: This work design and evaluate the SCADS Director, a control framework that reconfigures the storage system on-the-fly in response to workload changes using a performance model of the system and demonstrates that such a framework can respond to both unexpected data hotspots and diurnal workload patterns without violating strict performance SLOs.

...read moreread less

Abstract: Elasticity of cloud computing environments provides an economic incentive for automatic resource allocation of stateful systems running in the cloud. However, these systems have to meet strict performance Service-Level Objectives (SLOs) expressed using upper percentiles of request latency, such as the 99th. Such latency measurements are very noisy, which complicates the design of the dynamic resource allocation. We design and evaluate the SCADS Director, a control framework that reconfigures the storage system on-the-fly in response to workload changes using a performance model of the system. We demonstrate that such a framework can respond to both unexpected data hotspots and diurnal workload patterns without violating strict performance SLOs.

...read moreread less

159 citations

Proceedings Article•

Bayesian Bias Mitigation for Crowdsourcing

[...]

Fabian L. Wauthier¹, Michael I. Jordan¹•Institutions (1)

University of California, Berkeley¹

12 Dec 2011

TL;DR: This work presents Bayesian Bias Mitigation for Crowdsourcing (BBMC), a Bayesian model to unify all three steps of data curation and learning and proposes a general approximation strategy for Markov chains to efficiently quantify the effect of a perturbation on the stationary distribution.

...read moreread less

Abstract: Biased labelers are a systemic problem in crowdsourcing, and a comprehensive toolbox for handling their responses is still being developed. A typical crowdsourcing application can be divided into three steps: data collection, data curation, and learning. At present these steps are often treated separately. We present Bayesian Bias Mitigation for Crowdsourcing (BBMC), a Bayesian model to unify all three. Most data curation methods account for the effects of labeler bias by modeling all labels as coming from a single latent truth. Our model captures the sources of bias by describing labelers as influenced by shared random effects. This approach can account for more complex bias patterns that arise in ambiguous or hard labeling tasks and allows us to merge data curation and learning into a single computation. Active learning integrates data collection with learning, but is commonly considered infeasible with Gibbs sampling inference. We propose a general approximation strategy for Markov chains to efficiently quantify the effect of a perturbation on the stationary distribution and specialize this approach to active learning. Experiments show BBMC to outperform many common heuristics.

...read moreread less

135 citations

Bayesian nonparametric latent feature models

[...]

Michael I. Jordan¹, Kurt T. Miller¹•Institutions (1)

University of California, Berkeley¹

01 Jan 2011

TL;DR: This dissertation summarizes the work advancing the state of the art in all three of these areas of research in Warriors for Bayesian nonparametric latent feature models, presenting a non-exchangeable framework for generalizing and extending the original priors and introducing four concrete generalizations applicable when the authors have prior knowledge about object relationships that can be captured either via a tree or chain.

...read moreread less

Abstract: Priors for Bayesian nonparametric latent feature models were originally developed a little over five years ago, sparking interest in a new type of Bayesian nonparametric model. Since then, there have been three main areas of research for people interested in these priors: extensions/generalizations of the priors, inference algorithms, and applications. This dissertation summarizes our work advancing the state of the art in all three of these areas. In the first area, we present a non-exchangeable framework for generalizing and extending the original priors, allowing more prior knowledge to be used in nonparametric priors. Within this framework, we introduce four concrete generalizations that are applicable when we have prior knowledge about object relationships that can be captured either via a tree or chain. We discuss how to develop and derive these priors as well as how to perform posterior inference in models using them. In the area of inference algorithms, we present the first variational approximation for one class of these priors, demonstrating in what regimes they might be preferred over more traditional MCMC approaches. Finally, we present an application of basic nonparametric latent features models to link prediction as well as applications of our non-exchangeable priors to tree-structured choice models and human genomic data.

...read moreread less

Posted Content•

Learning Dependency-Based Compositional Semantics

[...]

Percy Liang¹, Michael I. Jordan¹, Dan Klein¹•Institutions (1)

University of California, Berkeley¹

30 Sep 2011-arXiv: Artificial Intelligence

TL;DR: In this article, a dependency-based compositional semantics (DCS) formalism is proposed to learn a semantic parser from question-answer pairs instead, where the logical form is modeled as a latent variable.

...read moreread less

Abstract: Suppose we want to build a system that answers a natural language question by representing its semantics as a logical form and computing the answer given a structured database of facts. The core part of such a system is the semantic parser that maps questions to logical forms. Semantic parsers are typically trained from examples of questions annotated with their target logical forms, but this type of annotation is expensive. Our goal is to learn a semantic parser from question-answer pairs instead, where the logical form is modeled as a latent variable. Motivated by this challenging learning problem, we develop a new semantic formalism, dependency-based compositional semantics (DCS), which has favorable linguistic, statistical, and computational properties. We define a log-linear distribution over DCS logical forms and estimate the parameters using a simple procedure that alternates between beam search and numerical optimization. On two standard semantic parsing benchmarks, our system outperforms all existing state-of-the-art systems, despite using no annotated logical forms.

...read moreread less

Posted Content•

Revisiting k-means: New Algorithms via Bayesian Nonparametrics

[...]

Brian Kulis¹, Michael I. Jordan²•Institutions (2)

Ohio State University¹, University of California, Berkeley²

02 Nov 2011-arXiv: Learning

TL;DR: This paper shows that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters.

...read moreread less

Abstract: Bayesian models offer great flexibility for clustering applications---Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-means and mixtures of Gaussians, we show that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters. We generalize this analysis to the case of clustering multiple data sets through a similar asymptotic argument with the hierarchical Dirichlet process. We also discuss further extensions that highlight the benefits of our analysis: i) a spectral relaxation involving thresholded eigenvectors, and ii) a normalized cut graph clustering algorithm that does not fix the number of clusters in the graph.

...read moreread less

Posted Content•

Ergodic Mirror Descent

[...]

John C. Duchi, Alekh Agarwal, Mikael Johansson, Michael I. Jordan

24 May 2011-arXiv: Optimization and Control

TL;DR: It is shown that as long as the source of randomness is suitably ergodic — it converges quickly enough to a stationary distribution — the method enjoys strong convergence guarantees, both in expectation and with high probability.

...read moreread less

Abstract: We generalize stochastic subgradient descent methods to situations in which we do not receive independent samples from the distribution over which we optimize, but instead receive samples that are coupled over time. We show that as long as the source of randomness is suitably ergodic---it converges quickly enough to a stationary distribution---the method enjoys strong convergence guarantees, both in expectation and with high probability. This result has implications for stochastic optimization in high-dimensional spaces, peer-to-peer distributed optimization schemes, decision problems with dependent data, and stochastic optimization problems over combinatorial spaces.

...read moreread less

Journal Article•DOI•

Genome-scale phylogenetic function annotation of large and diverse protein families.

[...]

Barbara E. Engelhardt¹, Michael I. Jordan, John R. Srouji, Steven E. Brenner•Institutions (1)

University of California, Berkeley¹

22 Jul 2011-Genome Research

TL;DR: A revised approach (SIFTER version 2.0) that enables annotations on a genomic scale and is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses.

...read moreread less

Abstract: The Statistical Inference of Function Through Evolutionary Relationships (SIFTER) framework uses a statistical graphical model that applies phylogenetic principles to automate precise protein function prediction. Here we present a revised approach (SIFTER version 2.0) that enables annotations on a genomic scale. SIFTER 2.0 produces equivalently precise predictions compared to the earlier version on a carefully studied family and on a collection of 100 protein families. We have added an approximation method to SIFTER 2.0 and show a 500-fold improvement in speed with minimal impact on prediction results in the functionally diverse sulfotransferase protein family. On the Nudix protein family, previously inaccessible to the SIFTER framework because of the 66 possible molecular functions, SIFTER achieved 47.4% accuracy on experimental data (where BLAST achieved 34.0%). Finally, we used SIFTER to annotate all of the Schizosaccharomyces pombe proteins with experimental functional characterizations, based on annotations from proteins in 46 fungal genomes. SIFTER precisely predicted molecular function for 45.5% of the characterized proteins in this genome, as compared with four current function prediction methods that precisely predicted function for 62.6%, 30.6%, 6.0%, and 5.7% of these proteins. We use both precision-recall curves and ROC analyses to compare these genome-scale predictions across the different methods and to assess performance on different types of applications. SIFTER 2.0 is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses. The code for SIFTER and protein family data are available at http://sifter.berkeley.edu.

...read moreread less

Proceedings Article•

Nonparametric Bayesian Co-clustering Ensembles.

[...]

Pu Wang¹, Kathryn B. Laskey¹, Carlotta Domeniconi¹, Michael I. Jordan²•Institutions (2)

George Mason University¹, University of California, Berkeley²

01 Jan 2011

TL;DR: A nonparametric Bayesian approach to co-clustering ensembles is presented and a Mondrian Process as a prior distribution over partitions of the data matrix is employed to model non-independence of row and column clusters.

...read moreread less

Abstract: A nonparametric Bayesian approach to co-clustering ensembles is presented. Similar to clustering ensembles, coclustering ensembles combine various base co-clustering results to obtain a more robust consensus co-clustering. To avoid pre-specifying the number of co-clusters, we specify independent Dirichlet process priors for the row and column clusters. Thus, the numbers of rowand column-clusters are unbounded a priori; the actual numbers of clusters can be learned a posteriori from observations. Next, to model non-independence of rowand column-clusters, we employ a Mondrian Process as a prior distribution over partitions of the data matrix. As a result, the co-clusters are not restricted to a regular grid partition, but form nested partitions with varying resolutions. The empirical evaluation demonstrates the effectiveness of nonparametric Bayesian co-clustering ensembles and their advantages over traditional co-clustering methods.

...read moreread less

Proceedings Article•

Dimensionality Reduction for Spectral Clustering

[...]

Donglin Niu¹, Jennifer G. Dy¹, Michael I. Jordan²•Institutions (2)

Northeastern University¹, University of California, Berkeley²

14 Jun 2011

TL;DR: This work introduces an augmented form of spectral clustering in which an explicit projection operator is incorporated in the relaxed optimization functional and optimize this functional over both the projection and the spectral embedding.

...read moreread less

Abstract: Spectral clustering is a flexible clustering methodology that is applicable to a variety of data types and has the particular virtue that it makes few assumptions on cluster shapes. It has become popular in a variety of application areas, particularly in computational vision and bioinformatics. The approach appears, however, to be particularly sensitive to irrelevant and noisy dimensions in the data. We thus introduce an approach that automatically learns the relevant dimensions and spectral clustering simultaneously. We pursue an augmented form of spectral clustering in which an explicit projection operator is incorporated in the relaxed optimization functional. We optimize this functional over both the projection and the spectral embedding. Experiments on simulated and real data show that this approach yields significant improvements in the performance of spectral clustering.

...read moreread less

Journal Article•DOI•

Bayesian Generalized Kernel Mixed Models

[...]

Zhihua Zhang¹, Guang Dai, Michael I. Jordan²•Institutions (2)

Zhejiang University¹, University of California, Berkeley²

01 Feb 2011-Journal of Machine Learning Research

TL;DR: A mixture of a point-mass distribution and Silverman's g-prior on the regression vector of a generalized kernel model (GKM) allows a fraction of the components of the regressionvector to be zero and leads to a flexible approximation method for GPs.

...read moreread less

Abstract: We propose a fully Bayesian methodology for generalized kernel mixed models (GKMMs), which are extensions of generalized linear mixed models in the feature space induced by a reproducing kernel. We place a mixture of a point-mass distribution and Silverman's g-prior on the regression vector of a generalized kernel model (GKM). This mixture prior allows a fraction of the components of the regression vector to be zero. Thus, it serves for sparse modeling and is useful for Bayesian computation. In particular, we exploit data augmentation methodology to develop a Markov chain Monte Carlo (MCMC) algorithm in which the reversible jump method is used for model selection and a Bayesian model averaging method is used for posterior prediction. When the feature basis expansion in the reproducing kernel Hilbert space is treated as a stochastic process, this approach can be related to the Karhunen-Loeve expansion of a Gaussian process (GP). Thus, our sparse modeling framework leads to a flexible approximation method for GPs.

...read moreread less

Journal Article•DOI•

Bayesian inference for queueing networks and modeling of internet services

[...]

Charles Sutton, Michael I. Jordan¹•Institutions (1)

University of Edinburgh¹

01 Mar 2011-The Annals of Applied Statistics

TL;DR: In this paper, a Bayesian perspective on queueing models in which the arrival and departure times that are not observed are treated as latent variables is developed, where the posterior distribution over missing data and model parameters using Markov chain Monte Carlo.

...read moreread less

Abstract: Modern Internet services, such as those at Google, Yahoo!, and Amazon, handle billions of requests per day on clusters of thousands of computers. Because these services operate under strict performance requirements, a statistical understanding of their performance is of great practical interest. Such services are modeled by networks of queues, where each queue models one of the computers in the system. A key challenge is that the data are incomplete, because recording detailed information about every request to a heavily used system can require unacceptable overhead. In this paper we develop a Bayesian perspective on queueing models in which the arrival and departure times that are not observed are treated as latent variables. Underlying this viewpoint is the observation that a queueing model defines a deterministic transformation between the data and a set of independent variables called the service times. With this viewpoint in hand, we sample from the posterior distribution over missing data and model parameters using Markov chain Monte Carlo. We evaluate our framework on data from a benchmark Web application. We also present a simple technique for selection among nested queueing models. We are unaware of any previous work that considers inference in networks of queues in the presence of missing data.

...read moreread less

Posted Content•

Joint Modeling of Multiple Related Time Series via the Beta Process

[...]

Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky

17 Nov 2011-arXiv: Methodology

TL;DR: This work proposes a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series, and uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth and death proposals.

...read moreread less

Abstract: We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our approach is based on the discovery of a set of latent, shared dynamical behaviors. Using a beta process prior, the size of the set and the sharing pattern are both inferred from data. We develop efficient Markov chain Monte Carlo methods based on the Indian buffet process representation of the predictive distribution of the beta process, without relying on a truncated model. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth and death proposals. We examine the benefits of our proposed feature-based model on several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.

...read moreread less

Proceedings Article•

A Unified Probabilistic Model for Global and Local Unsupervised Feature Selection

[...]

Yue Guan¹, Michael I. Jordan², Jennifer G. Dy¹•Institutions (2)

Northeastern University¹, University of California, Berkeley²

28 Jun 2011

TL;DR: This paper presents a unified probabilistic model that can perform both global and local feature selection for clustering, based on a hierarchical beta-Bernoulli prior combined with a Dirichlet process mixture model.

...read moreread less

Abstract: Existing algorithms for joint clustering and feature selection can be categorized as either global or local approaches. Global methods select a single cluster-independent subset of features, whereas local methods select cluster-specific subsets of features. In this paper, we present a unified probabilistic model that can perform both global and local feature selection for clustering. Our approach is based on a hierarchical beta-Bernoulli prior combined with a Dirichlet process mixture model. We obtain global or local feature selection by adjusting the variance of the beta prior. We provide a variational inference algorithm for our model. In addition to simultaneously learning the clusters and features, this Bayesian formulation allows us to learn both the number of clusters and the number of features to retain. Experiments on synthetic and real data show that our unified model can find global and local features and cluster data as well as competing methods of each type.

...read moreread less

Posted Content•

Nonparametric Link Prediction in Large Scale Dynamic Networks

[...]

Purnamrita Sarkar, Deepayan Chakrabarti, Michael I. Jordan

06 Sep 2011-arXiv: Machine Learning

TL;DR: A nonparametric approach to link prediction in large-scale dynamic networks using graph-based features of pairs of nodes as well as those of their local neighborhoods to predict whether those nodes will be linked at each time step is proposed.

...read moreread less

Abstract: We propose a nonparametric approach to link prediction in large-scale dynamic networks. Our model uses graph-based features of pairs of nodes as well as those of their local neighborhoods to predict whether those nodes will be linked at each time step. The model allows for different types of evolution in different parts of the graph (e.g, growing or shrinking communities). We focus on large-scale graphs and present an implementation of our model that makes use of locality-sensitive hashing to allow it to be scaled to large problems. Experiments with simulated data as well as five real-world dynamic graphs show that we outperform the state of the art, especially when sharp fluctuations or nonlinearities are present. We also establish theoretical properties of our estimator, in particular consistency and weak convergence, the latter making use of an elaboration of Stein's method for dependency graphs.

...read moreread less

Journal Article•DOI•

Learning Low-Dimensional Signal Models

[...]

Lawrence Carin¹, Richard G. Baraniuk², Volkan Cevher¹, David B. Dunson³, Michael I. Jordan⁴, Guillermo Sapiro⁵, Michael B. Wakin² - Show less +3 more•Institutions (5)

University of Maryland, College Park¹, Rice University², Duke University³, University of California, Berkeley⁴, University of Minnesota⁵

17 Feb 2011-IEEE Signal Processing Magazine

TL;DR: This article investigates the challenge of developing appropriate data models for dimensionality-reduced or incomplete data extraction from the perspective of nonparametric Bayesian analysis.

...read moreread less

Abstract: Sampling, coding, and streaming even the most essential data, e.g., in medical imaging and weather-monitoring applications, produce a data deluge that severely stresses the avail able analog-to-digital converter, communication bandwidth, and digital-storage resources. Surprisingly, while the ambient data dimension is large in many problems, the relevant information in the data can reside in a much lower dimensional space. This observation has led to several important theoretical and algorithmic developments under different low-dimensional modeling frameworks, such as compressive sensing (CS), matrix completion, and general factor-model representations. These approaches have enabled new measurement systems, tools, and methods for information extraction from dimensionality-reduced or incomplete data. A key aspect of maximizing the potential of such techniques is to develop appropriate data models. In this article, we investigate this challenge from the perspective of nonparametric Bayesian analysis.

...read moreread less

Journal Article•DOI•

Cluster Forests

[...]

Donghui Yan, Aiyou Chen, Michael I. Jordan

14 Apr 2011-Computational Statistics & Data Analysis

TL;DR: In this article, a clustering ensemble method called cluster forests (CF) is proposed, which randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates via spectral clustering to obtain cluster assignments for the whole dataset.

...read moreread less

Proceedings Article•

Divide-and-Conquer Matrix Factorization

[...]

Lester Mackey¹, Michael I. Jordan¹, Ameet Talwalkar¹•Institutions (1)

University of California, Berkeley¹

12 Dec 2011

TL;DR: Divide-Factor-Combine (DFC) as mentioned in this paper divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorisation algorithm, and combines the sub-problem solutions using techniques from randomized matrix approximation.

...read moreread less

Abstract: This work introduces Divide-Factor-Combine (DFC), a parallel divide-and-conquer framework for noisy matrix factorization. DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the sub-problem solutions using techniques from randomized matrix approximation. Our experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.

...read moreread less

Proceedings Article•DOI•

Supervised hierarchical Pitman-Yor process for natural scene segmentation

[...]

Alex Shyr¹, Trevor Darrell¹, Michael I. Jordan¹, Raquel Urtasun•Institutions (1)

University of California, Berkeley¹

20 Jun 2011

TL;DR: This paper adds label information into the previously unsupervised model by adding constraints on the parameter space during the variational learning phase and evaluates the effectiveness of the formulation on the La-belMe natural scene dataset.

...read moreread less

Abstract: From conventional wisdom and empirical studies of annotated data, it has been shown that visual statistics such as object frequencies and segment sizes follow power law distributions. Previous work has shown that both kinds of power-law behavior can be captured by using a hierarchical Pitman-Yor process prior within a nonparametric Bayesian approach to scene segmentation. In this paper, we add label information into the previously unsupervised model. Our approach exploits the labelled data by adding constraints on the parameter space during the variational learning phase. We evaluate our formulation on the La-belMe natural scene dataset, and show the effectiveness of our approach.

...read moreread less

Posted Content•

An Analysis of the Convergence of Graph Laplacians

[...]

Daniel Ting¹, Ling Huang², Michael I. Jordan¹•Institutions (2)

University of California, Berkeley¹, Intel²

28 Jan 2011-arXiv: Machine Learning

TL;DR: In this paper, the authors generalize the analysis of graph Laplacians to include previously unstudied graphs including kNN graphs and introduce a kernel-free framework to analyze graph constructions with shrinking neighborhoods.

...read moreread less

Abstract: Existing approaches to analyzing the asymptotics of graph Laplacians typically assume a well-behaved kernel function with smoothness assumptions. We remove the smoothness assumption and generalize the analysis of graph Laplacians to include previously unstudied graphs including kNN graphs. We also introduce a kernel-free framework to analyze graph constructions with shrinking neighborhoods in general and apply it to analyze locally linear embedding (LLE). We also describe how for a given limiting Laplacian operator desirable properties such as a convergent spectrum and sparseness can be achieved choosing the appropriate graph construction.

...read moreread less

Book•

Six Sigma for Sustainability

[...]

Daniel Probst, Michael I. Jordan, Thomas D. McCarty

17 Aug 2011

Proceedings Article•

Ergodic Subgradient Descent

[...]

John C. Duchi¹, Alekh Agarwal¹, Mikael Johansson², Michael I. Jordan¹•Institutions (2)

University of California, Berkeley¹, Royal Institute of Technology²

01 Jan 2011

Book•

Six sigma for sustainability : how organizations design and deploy winning environmental programs

[...]

Thomas D. McCarty, Michael I. Jordan, Daniel Probst

01 Jan 2011

TL;DR: In this paper, the authors present a business case template and examples for the Six Sigma Sustainability Project and a high-level process map for energy conservation in an office facility.

...read moreread less

Abstract: Chapter 1. Developing the Business Case Chapter 2. Sustainability and the Collaborative Management Model Chapter 3. The Sustainability Transfer Function Chapter 4. Sustainability Measurement and Reporting Chapter 5. Transformational Change and the Power of Teams Chapter 6. Sustainability and Real Estate Chapter 7. Six Sigma Sustainability Project Examples Chapter 8. Design for Six Sigma Chapter 9. Stakeholder Management Conclusion: Letters to Tomorrow's Corporate Leaders Appendix A: Business Case Template and Examples Appendix B: Sustainability Transfer Function Appendix C: Sample Energy Conservation Opportunity Evaluation Checklist for an Office Building Assessment Appendix D: Sample High-Level Process Map for Energy Conservation in an Office Facility Appendix E: Sample Functional Performance Criteria for Enterprise Carbon Accounting Software Index

...read moreread less