
Showing papers by "Michael I. Jordan published in 2010"


Journal ArticleDOI
TL;DR: This work considers factorizations of the form X = FG^T, and focuses on algorithms in which G is restricted to containing nonnegative entries, but allowing the data matrix X to have mixed signs, thus extending the applicable range of NMF methods.
Abstract: We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = FG^T, we focus on algorithms in which G is restricted to containing nonnegative entries, but allowing the data matrix X to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of F are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.

1,226 citations
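
A minimal numpy sketch of semi-NMF-style alternating updates for X = FG^T, with a least-squares step for the unconstrained basis F and a multiplicative step that keeps G nonnegative, so X itself may have mixed signs. This is an illustration of the idea only; the update rule, initialization, and iteration count here are assumptions, not the authors' reference implementation.

import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Illustrative semi-NMF: X (d x n) ~ F (d x k) @ G.T with G >= 0."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    G = np.abs(rng.standard_normal((n, k)))               # nonnegative coefficients
    pos = lambda A: (np.abs(A) + A) / 2                   # elementwise positive part
    neg = lambda A: (np.abs(A) - A) / 2                   # elementwise negative part
    for _ in range(n_iter):
        F = X @ G @ np.linalg.pinv(G.T @ G)               # least-squares update, F unconstrained
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + eps))     # multiplicative update keeps G >= 0
    return F, G

X = np.random.default_rng(1).standard_normal((20, 100))   # mixed-sign data matrix
F, G = semi_nmf(X, k=5)
print(np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))    # relative reconstruction error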


Proceedings Article
21 Jun 2010
TL;DR: This work first parses console logs, combining source code analysis with information retrieval to create composite features, and then analyzes these features using machine learning to automatically detect system runtime problems.
Abstract: Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, for they often consist of the voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We use a combination of program analysis and information retrieval techniques to transform free-text console logs into numerical features, which capture sequences of events in the system. We then analyze these features using machine learning to detect operational problems. We also show how to distill the results of our analysis to an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. In addition, we extend our methods to online problem detection where the sequences of events are continuously generated as data streams.

771 citations
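
As a rough, hypothetical illustration of the pipeline sketched above, the following turns per-window message-type counts (the output of log parsing) into standardized features and flags abnormal windows by a PCA reconstruction-error score. The synthetic data, the choice of PCA as the detector, and the threshold are assumptions made for this sketch, not the paper's exact method.

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical features: counts[i, j] = number of log messages of type j in time window i.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(500, 40)).astype(float)
counts[490:] += rng.poisson(30.0, size=(10, 40))           # inject an abnormal burst at the end

X = (counts - counts.mean(0)) / (counts.std(0) + 1e-9)     # standardize each message-type feature
pca = PCA(n_components=5).fit(X)
residual = X - pca.inverse_transform(pca.transform(X))     # part not explained by "normal" structure
score = (residual ** 2).sum(axis=1)                        # squared prediction error per window
flagged = np.where(score > np.percentile(score, 98))[0]
print("suspicious windows:", flagged)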


Journal ArticleDOI
TL;DR: This work develops and analyzes M-estimation methods for divergence functionals and the likelihood ratios of two probability distributions based on a nonasymptotic variational characterization of f-divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization.
Abstract: We develop and analyze M-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a nonasymptotic variational characterization of f-divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these estimators. Given conditions only on the ratios of densities, we show that our estimators can achieve optimal minimax rates for the likelihood ratio and the divergence functionals in certain regimes. We derive an efficient optimization algorithm for computing our estimates, and illustrate their convergence behavior and practical viability by simulations.

729 citations
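
The variational characterization the abstract refers to can be written out as the standard f-divergence duality (the particular function class used to restrict g, e.g. an RKHS ball, is the paper's design choice):

    D_f(P \| Q) = \sup_g \big( \mathbb{E}_P[g(X)] - \mathbb{E}_Q[f^*(g(X))] \big),

where f^* is the convex conjugate of f. Replacing the two expectations by empirical averages over samples from P and Q, and restricting g to a convex function class, turns estimation of the divergence and of the likelihood ratio dP/dQ into a convex empirical risk problem, which is how the estimators above are computed.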


Journal ArticleDOI
TL;DR: The nested Chinese restaurant process (nCRP) as discussed by the authors is a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees, and it can be used as a prior distribution in a Bayesian nonparametric model of document collections.
Abstract: We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning—the use of Bayesian nonparametric methods to infer distributions on flexible data structures.

613 citations
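
A small sketch of the nCRP's sequential draw of a tree path for each document, truncated to a fixed depth for illustration; the hyperparameter name gamma, the truncation, and the data structures are assumptions of this sketch rather than the paper's notation.

import numpy as np
from collections import defaultdict

def ncrp_paths(n_docs, depth=3, gamma=1.0, seed=0):
    """Draw one root-to-leaf path per document from a depth-truncated nested CRP."""
    rng = np.random.default_rng(seed)
    counts = defaultdict(lambda: defaultdict(int))   # counts[node][child] = #docs through that child
    paths = []
    for _ in range(n_docs):
        node, path = (), []
        for _ in range(depth):
            children = counts[node]
            total = sum(children.values())
            # existing child k with probability n_k / (total + gamma), a new child otherwise
            probs = [c / (total + gamma) for c in children.values()] + [gamma / (total + gamma)]
            pick = rng.choice(len(probs), p=probs)
            child = list(children.keys())[pick] if pick < len(children) else len(children)
            children[child] += 1
            node = node + (child,)
            path.append(child)
        paths.append(tuple(path))
    return paths

print(ncrp_paths(10))   # later documents tend to reuse popular subtrees (preferential attachment)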


Journal ArticleDOI
TL;DR: A blockwise path-following scheme that approximately traces the regularization path is proposed, along with a random-projection extension to joint subspace selection and theoretical results showing that this approach converges to the solution yielded by trace-norm regularization.
Abstract: We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. By penalizing the sum of ℓ2 norms of the blocks of coefficients associated with each covariate across different classification problems, similar sparsity patterns in all models are encouraged. To take computational advantage of the sparsity of solutions at high regularization levels, we propose a blockwise path-following scheme that approximately traces the regularization path. As the regularization coefficient decreases, the algorithm maintains and updates concurrently a growing set of covariates that are simultaneously active for all problems. We also show how to use random projections to extend this approach to the problem of joint subspace selection, where multiple predictors are found in a common low-dimensional subspace. We present theoretical results showing that this random projection approach converges to the solution yielded by trace-norm regularization. Finally, we present a variety of experimental results exploring joint covariate selection and joint subspace selection, comparing the path-following approach to competing algorithms in terms of prediction accuracy and running time.

536 citations
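
The penalty described above is the multi-task group lasso: each covariate contributes the ℓ2 norm of its coefficients across all tasks, so covariates are selected or dropped jointly. The paper follows the regularization path blockwise; as a simpler stand-in that uses the same penalty, here is a proximal-gradient sketch (problem sizes and lambda are illustrative).

import numpy as np

def block_soft_threshold(W, t):
    """Shrink each row of W (one covariate's coefficients across all tasks) jointly toward zero."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12)) * W

def multitask_group_lasso(X, Y, lam, n_iter=500):
    """Minimize 0.5 * ||Y - X W||_F^2 + lam * sum_j ||W[j, :]||_2 by proximal gradient."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2            # 1 / Lipschitz constant of the smooth part
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        W = block_soft_threshold(W - step * (X.T @ (X @ W - Y)), step * lam)
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
W_true = np.zeros((30, 4)); W_true[:5] = rng.standard_normal((5, 4))   # 5 covariates shared by 4 tasks
Y = X @ W_true + 0.1 * rng.standard_normal((100, 4))
W_hat = multitask_group_lasso(X, Y, lam=5.0)
print("active covariates:", np.where(np.linalg.norm(W_hat, axis=1) > 1e-8)[0])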


Proceedings ArticleDOI
10 Jun 2010
TL;DR: This paper proposes and validates a model of stateful spikes that allows us to synthesize volume and data spikes and could thus be used by both cloud computing users and providers to stress-test their infrastructure.
Abstract: Evaluating the resiliency of stateful Internet services to significant workload spikes and data hotspots requires realistic workload traces that are usually very difficult to obtain. A popular approach is to create a workload model and generate synthetic workload; however, there exists no characterization or model of stateful spikes. In this paper we analyze five workload and data spikes and find that they vary significantly in many important aspects such as steepness, magnitude, duration, and spatial locality. We propose and validate a model of stateful spikes that allows us to synthesize volume and data spikes and could thus be used by both cloud computing users and providers to stress-test their infrastructure.

214 citations


Journal ArticleDOI
TL;DR: Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities, and are shown to improve protein loop conformation prediction significantly.
Abstract: Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.

188 citations


Proceedings Article
06 Dec 2010
TL;DR: This paper uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable, and applies the method to hierarchical clustering of images and topic modeling of text data.
Abstract: Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data.

171 citations
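
A sketch of how a single observation can be assigned to a node of an unbounded tree by nested stick-breaking: descend from the root, stopping at the current node with a Beta-distributed probability (so data can live at internal nodes) and otherwise choosing a child by breaking a stick over the unbounded set of children. The Beta(1, ·) parameterization and the depth cap are illustrative assumptions, not the paper's exact construction.

import numpy as np

def sample_tree_node(alpha=1.0, gamma=1.0, max_depth=10, seed=None):
    """Draw one node (as a path of child indices from the root) from a stick-broken tree."""
    rng = np.random.default_rng(seed)
    path = []
    for _ in range(max_depth):                          # cap the (in principle unbounded) depth
        nu = rng.beta(1.0, alpha)                       # this node's stopping weight
        if rng.random() < nu:                           # stop here: the datum lives at this node
            break
        child = 0
        while rng.random() >= rng.beta(1.0, gamma):     # stick-breaking over unbounded children
            child += 1
        path.append(child)
    return tuple(path)

print([sample_tree_node(seed=s) for s in range(8)])     # () is the root; deeper paths are rarer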


Proceedings Article
21 Jun 2010
TL;DR: This work introduces a novel method that can provide several non-redundant clustering solutions to the user by augmenting a spectral clustering objective function to incorporate dimensionality reduction and multiple views and to penalize for redundancy between the views.
Abstract: Many clustering algorithms only find one clustering solution. However, data can often be grouped and interpreted in many different ways. This is particularly true in the high-dimensional setting where different subspaces reveal different possible groupings of the data. Instead of committing to one clustering solution, here we introduce a novel method that can provide several non-redundant clustering solutions to the user. Our approach simultaneously learns non-redundant subspaces that provide multiple views and finds a clustering solution in each view. We achieve this by augmenting a spectral clustering objective function to incorporate dimensionality reduction and multiple views and to penalize for redundancy between the views.

145 citations


Proceedings Article
21 Jun 2010
TL;DR: A kernel-free framework is introduced for analyzing graph constructions with shrinking neighborhoods in general; it is applied to analyze locally linear embedding (LLE) and to show how desirable properties such as a convergent spectrum and sparseness can be achieved by choosing the appropriate graph construction.
Abstract: Existing approaches to analyzing the asymptotics of graph Laplacians typically assume a well-behaved kernel function with smoothness assumptions. We remove the smoothness assumption and generalize the analysis of graph Laplacians to include previously unstudied graphs including kNN graphs. We also introduce a kernel-free framework to analyze graph constructions with shrinking neighborhoods in general and apply it to analyze locally linear embedding (LLE). We also describe how, for a given limit operator, desirable properties such as a convergent spectrum and sparseness can be achieved by choosing the appropriate graph construction.

132 citations


Proceedings Article
21 Jun 2010
TL;DR: A new value-regularized linear loss is presented, its consistency is established under reasonable assumptions on noise, and it is shown to outperform conventional ranking losses in a collaborative filtering experiment.
Abstract: We present a theoretical analysis of supervised ranking, providing necessary and sufficient conditions for the asymptotic consistency of algorithms based on minimizing a surrogate loss function. We show that many commonly used surrogate losses are inconsistent; surprisingly, we show inconsistency even in low-noise settings. We present a new value-regularized linear loss, establish its consistency under reasonable assumptions on noise, and show that it outperforms conventional ranking losses in a collaborative filtering experiment.

Proceedings Article
08 Jul 2010
TL;DR: In this article, a probabilistic model of events in continuous time is presented, in which each event triggers a Poisson process of successor events, and the ensemble of observed events is thereby modeled as a superposition of Poisson processes.
Abstract: We present a probabilistic model of events in continuous time in which each event triggers a Poisson process of successor events. The ensemble of observed events is thereby modeled as a superposition of Poisson processes. Efficient inference is feasible under this model with an EM algorithm. Moreover, the EM algorithm can be implemented as a distributed algorithm, permitting the model to be applied to very large datasets. We apply these techniques to the modeling of Twitter messages and the revision history of Wikipedia.
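
An illustrative simulation of the generative process described above: baseline events arrive as a homogeneous Poisson process on [0, T], and each event triggers a Poisson-distributed number of successor events at random time offsets. The exponential offset distribution and all rates below are assumptions of this sketch, not the paper's parameterization.

import numpy as np

def simulate_cascade(T=100.0, base_rate=0.1, branching=0.5, mean_delay=1.0, seed=0):
    """Events where each event triggers a Poisson number of successors (superposition of processes)."""
    rng = np.random.default_rng(seed)
    frontier = list(rng.uniform(0.0, T, size=rng.poisson(base_rate * T)))    # baseline events
    events = []
    while frontier:
        t = frontier.pop()
        events.append(t)
        for dt in rng.exponential(mean_delay, size=rng.poisson(branching)):  # offspring of this event
            if t + dt < T:
                frontier.append(t + dt)                                      # successors trigger more
    return np.sort(np.array(events))

print(len(simulate_cascade()))   # branching < 1 keeps the expected cascade size finite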

Proceedings Article
21 Jun 2010
TL;DR: A nonparametric hierarchical Bayesian prior over programs which shares statistical strength across multiple tasks is introduced and an MCMC algorithm is provided that can perform safe program transformations on this representation to reveal shared inter-program substructures.
Abstract: We are interested in learning programs for multiple related tasks given only a few training examples per task. Since the program for a single task is underdetermined by its data, we introduce a nonparametric hierarchical Bayesian prior over programs which shares statistical strength across multiple tasks. The key challenge is to parametrize this multi-task sharing. For this, we introduce a new representation of programs based on combinatory logic and provide an MCMC algorithm that can perform safe program transformations on this representation to reveal shared inter-program substructures.

Journal ArticleDOI
TL;DR: A Bayesian nonparametric approach to learning Markov switching processes requires one to make fewer assumptions about the underlying dynamics, and thereby allows the data to drive the complexity of the inferred model.
Abstract: In this article, we explored a Bayesian nonparametric approach to learning Markov switching processes. This framework requires one to make fewer assumptions about the underlying dynamics, and thereby allows the data to drive the complexity of the inferred model. We began by examining a Bayesian nonparametric HMM, the sticky HDP-HMM, that uses a hierarchical DP prior to regularize an unbounded mode space. We then considered extensions to Markov switching processes with richer, conditionally linear dynamics, including the HDP-AR-HMM and HDP-SLDS. We concluded by considering methods for transferring knowledge among multiple related time series. We argued that a featural representation is more appropriate than a rigid global clustering, as it encourages sharing of behaviors among objects while still allowing sequence-specific variability. In this context, the beta process provides an appealing alternative to the DP.

Journal ArticleDOI
TL;DR: This paper uncovers a general relationship between regularized discriminant analysis and ridge regression, which yields variations on conventional FDA based on the pseudoinverse and a direct equivalence to an ordinary least squares estimator.
Abstract: Fisher linear discriminant analysis (FDA) and its kernel extension--kernel discriminant analysis (KDA)--are well known methods that consider dimensionality reduction and classification jointly. While widely deployed in practical problems, there are still unresolved issues surrounding their efficient implementation and their relationship with least mean squares procedures. In this paper we address these issues within the framework of regularized estimation. Our approach leads to a flexible and efficient implementation of FDA as well as KDA. We also uncover a general relationship between regularized discriminant analysis and ridge regression. This relationship yields variations on conventional FDA based on the pseudoinverse and a direct equivalence to an ordinary least squares estimator.
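
The two-class case gives a quick numerical check on the sort of equivalence the abstract mentions: regressing centered class labels on the features by ordinary least squares yields a direction proportional to the Fisher discriminant direction. This classical special case is shown below as an illustration; the paper's result is the more general regularized statement.

import numpy as np

rng = np.random.default_rng(0)
n1, n2, d = 60, 40, 5
X = np.vstack([rng.standard_normal((n1, d)) + 1.0,
               rng.standard_normal((n2, d)) - 1.0])
y = np.r_[np.ones(n1), -np.ones(n2)]

mu1, mu2 = X[:n1].mean(0), X[n1:].mean(0)
Sw = np.cov(X[:n1], rowvar=False) * (n1 - 1) + np.cov(X[n1:], rowvar=False) * (n2 - 1)
w_fda = np.linalg.solve(Sw, mu1 - mu2)                 # Fisher direction: Sw^{-1} (mu1 - mu2)

Xc, yc = X - X.mean(0), y - y.mean()
w_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]         # least-squares direction on centered labels

cos = w_fda @ w_ols / (np.linalg.norm(w_fda) * np.linalg.norm(w_ols))
print(f"cosine between FDA and OLS directions: {cos:.6f}")   # essentially 1.0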

Journal ArticleDOI
TL;DR: A new method is presented, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites.
Abstract: Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites. Results: In cross-validation experiments on two benchmark datasets from the Catalytic Site Atlas and CATRES resources containing a total of 437 manually curated enzymes spanning 487 SCOP families, Discern increases catalytic site recall between 12% and 20% over methods that combine information from both sequence and structure, and by ≥50% over methods that make use of sequence conservation signal only. Controlled experiments show that Discern's improvement in catalytic residue prediction is derived from the combination of three ingredients: the use of the INTREPID phylogenomic method to extract conservation information; the use of 3D structure data, including features computed for residues that are proximal in the structure; and a statistical regularization procedure to prevent overfitting. Contact: kimmen@berkeley.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Book ChapterDOI
21 Jun 2010
TL;DR: In this article, a fully Bayesian framework for integrating discrete mixed membership and continuous latent factor models into unified Mixed Membership Matrix Factorization (M3F) models is developed; two M3F models are introduced, Gibbs sampling inference procedures are derived, and the methods are validated on the EachMovie, MovieLens, and Netflix Prize collaborative filtering datasets.
Abstract: Discrete mixed membership modeling and continuous latent factor modeling (also known as matrix factorization) are two popular, complementary approaches to dyadic data analysis. In this work, we develop a fully Bayesian framework for integrating the two approaches into unified Mixed Membership Matrix Factorization (M3F) models. We introduce two M3F models, derive Gibbs sampling inference procedures, and validate our methods on the EachMovie, MovieLens, and Netflix Prize collaborative filtering datasets. We find that, even when fitting fewer parameters, the M3F models outperform state-of-the-art latent factor approaches on all benchmarks, yielding the greatest gains in accuracy on sparsely-rated, high-variance items.

03 Oct 2010
TL;DR: The early experience shows that the techniques, including source code based log parsing, state and sequence based feature creation and problem detection, work well on this production data set.
Abstract: We describe our early experience in applying our console log mining techniques [19, 20] to logs from production Google systems with thousands of nodes. This data set is five orders of magnitude larger in size and contains almost 20 times as many message types as the Hadoop data set we used in [19]. It also has many properties that are unique to large scale production deployments (e.g., the system stays on for several months and multiple versions of the software can run concurrently). Our early experience shows that our techniques, including source code based log parsing, state and sequence based feature creation and problem detection, work well on this production data set. We also discuss our experience in using our log parser to assist the log sanitization.

Proceedings Article
06 Dec 2010
TL;DR: Kernel-based measures of independence are used to derive low-dimensional representations that maximally capture the information in covariates needed to predict responses; the resulting compact representations yield meaningful and appealing visualization and clustering of data.
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
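
One standard kernel-based measure of dependence of the kind the abstract invokes is the Hilbert-Schmidt Independence Criterion (HSIC). The sketch below computes a biased empirical HSIC between a candidate low-dimensional representation and the original data; the Gaussian kernels and fixed bandwidths are illustrative assumptions, and whether this exact statistic is the paper's criterion is not claimed here.

import numpy as np

def gaussian_gram(X, sigma):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma ** 2))

def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2; larger means more dependent."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                 # centering matrix
    return np.trace(gaussian_gram(X, sigma_x) @ H @ gaussian_gram(Y, sigma_y) @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 10))                      # original data
proj = Z @ rng.standard_normal((10, 2))                 # a candidate 2-d representation
print(hsic(proj, Z))                                    # dependence between representation and data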

Proceedings Article
02 Jun 2010
TL;DR: A type-based sampler is introduced that updates a block of variables, identified by a type, spanning multiple sentences; it shows improvements on part-of-speech induction, word segmentation, and learning tree-substitution grammars.
Abstract: Most existing algorithms for learning latent-variable models---such as EM and existing Gibbs samplers---are token-based, meaning that they update the variables associated with one sentence at a time. The incremental nature of these methods makes them susceptible to local optima/slow mixing. In this paper, we introduce a type-based sampler, which updates a block of variables, identified by a type, which spans multiple sentences. We show improvements on part-of-speech induction, word segmentation, and learning tree-substitution grammars.

Journal ArticleDOI
TL;DR: A Bayesian perspective on queueing models is developed in which unobserved arrival and departure times are treated as latent variables, and the posterior distribution over missing data and model parameters is sampled using Markov chain Monte Carlo.
Abstract: Modern Internet services, such as those at Google, Yahoo!, and Amazon, handle billions of requests per day on clusters of thousands of computers. Because these services operate under strict performance requirements, a statistical understanding of their performance is of great practical interest. Such services are modeled by networks of queues, where each queue models one of the computers in the system. A key challenge is that the data are incomplete, because recording detailed information about every request to a heavily used system can require unacceptable overhead. In this paper we develop a Bayesian perspective on queueing models in which the arrival and departure times that are not observed are treated as latent variables. Underlying this viewpoint is the observation that a queueing model defines a deterministic transformation between the data and a set of independent variables called the service times. With this viewpoint in hand, we sample from the posterior distribution over missing data and model parameters using Markov chain Monte Carlo. We evaluate our framework on data from a benchmark Web application. We also present a simple technique for selection among nested queueing models. We are unaware of any previous work that considers inference in networks of queues in the presence of missing data.
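
The "deterministic transformation" mentioned above is easiest to see for a single FIFO queue: given arrivals and i.i.d. service times, departures are determined by a simple recursion, and the recursion can be inverted to recover service times from observed arrivals and departures. The paper deals with networks of queues and missing observations; this single-queue sketch only illustrates the underlying identity.

import numpy as np

def departures_from_services(arrivals, services):
    """Single-server FIFO queue: deterministic map (arrivals, services) -> departures."""
    dep = np.empty_like(arrivals)
    prev = -np.inf
    for i, (a, s) in enumerate(zip(arrivals, services)):
        dep[i] = prev = max(a, prev) + s        # service starts when the server becomes free
    return dep

def services_from_departures(arrivals, departures):
    """Inverse map: recover service times from observed arrivals and departures."""
    starts = np.maximum(arrivals, np.r_[-np.inf, departures[:-1]])
    return departures - starts

rng = np.random.default_rng(0)
arr = np.sort(rng.uniform(0, 100, size=50))
svc = rng.exponential(1.5, size=50)
dep = departures_from_services(arr, svc)
print(np.allclose(services_from_departures(arr, dep), svc))   # True: the map is invertible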

Dissertation
01 Jan 2010
TL;DR: This dissertation argues that SML is a useful tool for simplifying and automating datacenter operations and demonstrates application of SML to three important problems in this area: characterization and synthesis of workload spikes, dynamic resource allocation in stateful systems, and quick and accurate identification of recurring performance problems.
Abstract: Today's Internet datacenters run many complex and large-scale Web applications that are very difficult to manage. The main challenges are understanding user workloads and application performance, and quickly identifying and resolving performance problems. Statistical Machine Learning (SML) provides a methodology for quickly processing the large quantities of monitoring data generated by these applications, finding repeating patterns in their behavior, and building accurate models of their performance. This dissertation argues that SML is a useful tool for simplifying and automating datacenter operations and demonstrates application of SML to three important problems in this area: characterization and synthesis of workload spikes, dynamic resource allocation in stateful systems, and quick and accurate identification of recurring performance problems.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel sequence kernel dimension reduction approach (S-KDR) is proposed that does not make strong assumptions on the distribution of the input data; it is demonstrated to be effective on several tasks involving the discrimination of human gesture and motion categories, as well as on a database of dynamic textures.
Abstract: When classifying high-dimensional sequence data, traditional methods (e.g., HMMs, CRFs) may require large amounts of training data to avoid overfitting. In such cases dimensionality reduction can be employed to find a low-dimensional representation on which classification can be done more efficiently. Existing methods for supervised dimensionality reduction often presume that the data is densely sampled so that a neighborhood graph structure can be formed, or that the data arises from a known distribution. Sufficient dimension reduction techniques aim to find a low dimensional representation such that the remaining degrees of freedom become conditionally independent of the output values. In this paper we develop a novel sequence kernel dimension reduction approach (S-KDR). Our approach does not make strong assumptions on the distribution of the input data. Spatial, temporal and periodic information is combined in a principled manner, and an optimal manifold is learned for the end-task. We demonstrate the effectiveness of our approach on several tasks involving the discrimination of human gesture and motion categories, as well as on a database of dynamic textures.

Journal ArticleDOI
01 May 2010 - Proteins
TL;DR: The advantages of this approach are that features from many different input structures can be combined simultaneously without producing atomic clashes or otherwise physically inviable models, and that the features being recombined have a relatively high chance of being correct.
Abstract: De novo protein structure prediction requires location of the lowest energy state of the polypeptide chain among a vast set of possible conformations. Powerful approaches include conformational space annealing, in which search progressively focuses on the most promising regions of conformational space, and genetic algorithms, in which features of the best conformations thus far identified are recombined. We describe a new approach that combines the strengths of these two approaches. Protein conformations are projected onto a discrete feature space which includes backbone torsion angles, secondary structure, and beta pairings. For each of these there is one "native" value: the one found in the native structure. We begin with a large number of conformations generated in independent Monte Carlo structure prediction trajectories from Rosetta. Native values for each feature are predicted from the frequencies of feature value occurrences and the energy distribution in conformations containing them. A second round of structure prediction trajectories are then guided by the predicted native feature distributions. We show that native features can be predicted at much higher than background rates, and that using the predicted feature distributions improves structure prediction in a benchmark of 28 proteins. The advantages of our approach are that features from many different input structures can be combined simultaneously without producing atomic clashes or otherwise physically inviable models, and that the features being recombined have a relatively high chance of being correct.

Journal ArticleDOI
TL;DR: Graphical models, referred to in various guises as Markov random fields, Bayesian networks, factor graphs, influence diagrams, decision networks, or structured stochastic systems, are a powerful and elegant marriage of graph theory, probability theory, and decision theory.
Abstract: Graphical models, referred to in various guises as Markov random fields (MRFs), Bayesian networks, factor graphs, influence diagrams, decision networks, or structured stochastic systems, are a powerful and elegant marriage of graph theory, probability theory, and decision theory. They yield a unifying perspective on many long-standing and emerging frameworks for modeling complex phenomena, as well as methods for processing complex sources of data and signals. Such models are of particular importance in areas of signal processing that overlap with machine learning, time-series analysis, spatial statistics, and optimization.

Proceedings Article
06 Dec 2010
TL;DR: It is shown that heavy-tailed stochastic processes, which are constructed from Gaussian processes via a copula, can be used to improve robustness of regression and classification estimators to outliers by selectively shrinking them more strongly in sparse regions than in dense regions.
Abstract: Heavy-tailed distributions are often used to enhance the robustness of regression and classification methods to outliers in output space. Often, however, we are confronted with "outliers" in input space, which are isolated observations in sparsely populated regions. We show that heavy-tailed stochastic processes (which we construct from Gaussian processes via a copula) can be used to improve robustness of regression and classification estimators to such outliers by selectively shrinking them more strongly in sparse regions than in dense regions. We carry out a theoretical analysis to show that selective shrinkage occurs when the marginals of the heavy-tailed process have sufficiently heavy tails. The analysis is complemented by experiments on biological data which indicate significant improvements of estimates in sparse regions while producing competitive results in dense regions.
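
A sketch of the copula construction mentioned in the abstract: draw a Gaussian process, push it through the standard normal CDF to obtain uniform marginals, then through a heavy-tailed inverse CDF. The squared-exponential kernel and Student-t marginals below are illustrative choices made for this sketch, not necessarily those used in the paper.

import numpy as np
from scipy.stats import norm, t as student_t

def heavy_tailed_process_sample(x, lengthscale=0.5, df=2.0, seed=0):
    """One sample path: Gaussian-process copula with Student-t marginals."""
    rng = np.random.default_rng(seed)
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-d2 / (2 * lengthscale ** 2)) + 1e-8 * np.eye(len(x))   # unit-variance GP kernel
    f = rng.multivariate_normal(np.zeros(len(x)), K)                   # latent Gaussian draw
    u = norm.cdf(f)                                                    # uniform marginals (the copula)
    return student_t.ppf(u, df=df)                                     # heavy-tailed marginals

x = np.linspace(0.0, 1.0, 200)
g = heavy_tailed_process_sample(x)
print(float(g.min()), float(g.max()))   # occasional large excursions reflect the heavy tails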

01 Jan 2010
TL;DR: This thesis presents techniques for modeling the temporal dynamics of events by making each event induce an inhomogeneous Poisson process of others following it; it provides techniques for parameterizing these processes and presents efficient, scalable techniques for inference.
Abstract: For many applications, the data of interest can be best thought of as events—entities that occur at a particular moment in time, have features and may in turn trigger the occurrence of other events. This thesis presents techniques for modeling the temporal dynamics of events by making each event induce an inhomogeneous Poisson process of others following it. The collection of all events observed is taken to be a draw from the superposition of the induced Poisson processes, as well as a baseline process for some of the initial triggers. The magnitude and shape of the induced Poisson processes controls the number, timing and features of the triggered events. We provide techniques for parameterizing these processes and present efficient, scalable techniques for inference. The framework is then applied to three different domains that demonstrate the power of the approach. First, we consider the problem of identifying dependencies in a computer network through passive observation and provide a technique based on hypothesis testing for accurately discovering interactions between machines. Then, we look at the relationships between Twitter messages about stocks, using the application as a test-bed to experiment with different parameterizations of induced processes. Finally, we apply these tools to build a model of the revision history of Wikipedia, identifying how the community propagates edits from a page to its neighbors and demonstrating the scalability of our approach to very large datasets.

Proceedings Article
06 Dec 2010
TL;DR: This work proposes a new framework that extends variational inference to a wide range of combinatorial spaces, based on a simple assumption: the existence of a tractable measure factorization, which it is shown holds in many examples.
Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6].

Proceedings Article
06 Dec 2010
TL;DR: A novel algorithm that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone is presented, which is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting.
Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm.
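
A toy sketch of the Random Conic Pursuit step for the simplest PSD-constrained problem, projecting a symmetric matrix onto the PSD cone: at each iteration, optimize over the two-dimensional subcone spanned by the current iterate and a random rank-one matrix. Real SDPs handled by the method carry additional constraints and objectives; this stripped-down toy, with hypothetical problem sizes, is only meant to show the subcone step.

import numpy as np
from scipy.optimize import nnls

def random_conic_pursuit_psd_projection(A, n_iter=2000, seed=0):
    """Approximately solve min ||X - A||_F^2 over PSD X by repeated 2-d subcone optimization."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = np.eye(n)                                       # a feasible (PSD) starting point
    y = A.ravel()
    for _ in range(n_iter):
        v = rng.standard_normal(n)
        R = np.outer(v, v)                              # random rank-one PSD direction
        design = np.column_stack([X.ravel(), R.ravel()])
        coeffs, _ = nnls(design, y)                     # best point in the subcone {a*X + b*R : a, b >= 0}
        X = coeffs[0] * X + coeffs[1] * R               # remains PSD; objective never increases
    return X

rng = np.random.default_rng(1)
S = rng.standard_normal((6, 6))
A = (S + S.T) / 2
X = random_conic_pursuit_psd_projection(A)
print(np.linalg.eigvalsh(X).min() >= -1e-9, np.linalg.norm(X - A))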
