Author

Michael I. Jordan

Other affiliations: Stanford University, Princeton University, Broad Institute
Bio: Michael I. Jordan is an academic researcher from the University of California, Berkeley. The author has contributed to research in topics: Computer science & Inference. The author has an h-index of 176, co-authored 1,016 publications receiving 216,204 citations. Previous affiliations of Michael I. Jordan include Stanford University & Princeton University.


Papers
Journal ArticleDOI
TL;DR: An approximate message passing (AMP) algorithm is presented for recovering the signal in the random dense setting; the analysis reveals sharp phase transition phenomena where the behavior of AMP changes from exact recovery to weak correlation with the signal.
Abstract: We consider the problem of decoding a discrete signal of categorical variables from the observation of several histograms of pooled subsets of it. We present an Approximate Message Passing (AMP) algorithm for recovering the signal in the random dense setting where each observed histogram involves a random subset of entries of size proportional to $n$. We characterize the performance of the algorithm in the asymptotic regime where the number of observations $m$ tends to infinity proportionally to $n$, by deriving the corresponding State Evolution (SE) equations and studying their dynamics. We initiate the analysis of the multi-dimensional SE dynamics by proving their convergence to a fixed point, along with some further properties of the iterates. The analysis reveals sharp phase transition phenomena where the behavior of AMP changes from exact recovery to weak correlation with the signal as $m/n$ crosses a threshold. We derive formulae for the threshold in some special cases and show that they accurately match experimental behavior.
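The paper's AMP operates on categorical signals observed through pooled histograms; reproducing that denoiser would require the paper's derivations. As a rough illustration of the general AMP template it builds on, here is a minimal sketch of the classical AMP iteration for a sparse linear model with a soft-thresholding denoiser; all function names and parameter values are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def soft_threshold(v, theta):
    """Soft-thresholding denoiser eta(v; theta)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def amp_sparse_linear(A, y, n_iters=30, theta=1.0):
    """Classical AMP for y = A x + noise (illustrative sketch only:
    the pooled-data setting uses a categorical signal, histogram
    observations, and a different denoiser and Onsager term)."""
    m, n = A.shape
    delta = m / n
    x = np.zeros(n)
    z = y.astype(float).copy()
    for _ in range(n_iters):
        v = x + A.T @ z                                # effective observation of the signal
        x_new = soft_threshold(v, theta)
        onsager = np.mean(np.abs(v) > theta) / delta   # average denoiser derivative
        z = y - A @ x_new + z * onsager                # residual with Onsager correction
        x = x_new
    return x
```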

3 citations

Posted Content
TL;DR: In this paper, the authors improve the complexity bound of a greedy variant of the Sinkhorn algorithm, known as the \textit{Greenkhorn} algorithm, from $\widetilde{O}(n^2\varepsilon^{-3})$ to $\widetilde{O}(n^2\varepsilon^{-2})$, matching the best known complexity bound of the Sinkhorn algorithm.
Abstract: We present several new complexity results for the algorithms that approximately solve the optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. First, we improve the complexity bound of a greedy variant of the Sinkhorn algorithm, known as \textit{Greenkhorn} algorithm, from $\widetilde{O}(n^2\varepsilon^{-3})$ to $\widetilde{O}(n^2\varepsilon^{-2})$. Notably, this matches the best known complexity bound of the Sinkhorn algorithm and sheds light on the superior practical performance of the Greenkhorn algorithm. Second, we generalize an adaptive primal-dual accelerated gradient descent (APDAGD) algorithm~\citep{Dvurechensky-2018-Computational} with mirror mapping $\phi$ and prove that the resulting APDAMD algorithm achieves the complexity bound of $\widetilde{O}(n^2\sqrt{\delta}\varepsilon^{-1})$ where $\delta>0$ refers to the regularity of $\phi$. We demonstrate that the complexity bound of $\widetilde{O}(\min\{n^{9/4}\varepsilon^{-1}, n^2\varepsilon^{-2}\})$ is invalid for the APDAGD algorithm and establish a new complexity bound of $\widetilde{O}(n^{5/2}\varepsilon^{-1})$. Moreover, we propose a \textit{deterministic} accelerated Sinkhorn algorithm and prove that it achieves the complexity bound of $\widetilde{O}(n^{7/3}\varepsilon^{-4/3})$ by incorporating an estimate sequence. Therefore, the accelerated Sinkhorn algorithm outperforms the Sinkhorn and Greenkhorn algorithms in terms of $1/\varepsilon$ and the APDAGD and accelerated alternating minimization~\citep{Guminov-2021-Combination} algorithms in terms of $n$. Finally, we conduct experiments on synthetic data and real images with the proposed algorithms in the paper and demonstrate their efficiency via numerical results.
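For reference, the Sinkhorn iteration that these complexity bounds concern is a simple matrix-scaling scheme. The sketch below is a minimal NumPy version with illustrative regularization and tolerance values; the Greenkhorn variant instead greedily updates only the row or column with the largest marginal violation, and the accelerated variants add estimate sequences, neither of which is reproduced here.

```python
import numpy as np

def sinkhorn(C, r, c, reg=0.05, n_iters=1000, tol=1e-9):
    """Entropic-regularized optimal transport via Sinkhorn scaling.

    C: (n, n) cost matrix; r, c: source/target marginals summing to 1.
    reg, n_iters, and tol are illustrative defaults, not values tied to
    the complexity analysis in the abstract."""
    K = np.exp(-C / reg)
    u = np.ones_like(r, dtype=float)
    v = np.ones_like(c, dtype=float)
    for _ in range(n_iters):
        u_prev = u
        u = r / (K @ v)        # match the row marginals
        v = c / (K.T @ u)      # match the column marginals
        if np.max(np.abs(u - u_prev)) < tol:
            break
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return P, float(np.sum(P * C))    # plan and its transport cost
```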

2 citations

Proceedings Article
25 Feb 2022
TL;DR: In this paper, the authors develop a new framework for off-policy evaluation with policy-dependent linear optimization responses, where causal outcomes introduce stochasticity in objective function coefficients and a decision-maker's utility depends on the policy-dependent optimization, which introduces a fundamental challenge of optimization bias even for the case of policy evaluation.
Abstract: The intersection of causal inference and machine learning for decision-making is rapidly expanding, but the default decision criterion remains an \textit{average} of individual causal outcomes across a population. In practice, various operational restrictions ensure that a decision-maker's utility is not realized as an \textit{average} but rather as an \textit{output} of a downstream decision-making problem (such as matching, assignment, network flow, minimizing predictive risk). In this work, we develop a new framework for off-policy evaluation with \textit{policy-dependent} linear optimization responses: causal outcomes introduce stochasticity in objective function coefficients. Under this framework, a decision-maker's utility depends on the policy-dependent optimization, which introduces a fundamental challenge of \textit{optimization} bias even for the case of policy evaluation. We construct unbiased estimators for the policy-dependent estimand by a perturbation method, and discuss asymptotic variance properties for a set of adjusted plug-in estimators. Lastly, attaining unbiased policy evaluation allows for policy optimization: we provide a general algorithm for optimizing causal interventions. We corroborate our theoretical results with numerical simulations.
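To make the optimization bias concrete: a naive pipeline estimates per-unit outcome coefficients (e.g. by inverse propensity weighting) and plugs them into the downstream linear program, and it is precisely this plug-in step that the paper shows is biased. The hypothetical sketch below shows only that naive pipeline; the function names, budget constraint, and estimator choice are assumptions for illustration, and the paper's perturbation-based correction is not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

def ipw_pseudo_outcomes(y, t, propensity, policy):
    """Per-unit IPW estimates of the outcome under a 0/1 target policy.
    These are the naive plug-in coefficients; the paper's point is that
    feeding such noisy coefficients into an optimizer induces bias."""
    w = np.where(policy == 1, t / propensity, (1 - t) / (1 - propensity))
    return w * y

def downstream_value(coef, budget_frac=0.3):
    """Toy downstream decision: choose at most a fraction of units to
    maximize total estimated outcome, posed as a linear program."""
    n = len(coef)
    res = linprog(c=-coef,                               # linprog minimizes
                  A_ub=np.ones((1, n)), b_ub=[budget_frac * n],
                  bounds=[(0.0, 1.0)] * n, method="highs")
    return -res.fun
```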

2 citations

Proceedings ArticleDOI
01 Nov 2021
TL;DR: In this article, a Sequential Elimination with Elastic Resources (SEER) algorithm is proposed for hyperparameter tuning with an elastic cluster size under time and monetary budgets; it is able to exploit the flexibility in resource allocation that the elastic setting offers to avoid undesirable effects of sublinear scaling.
Abstract: Hyperparameter tuning is a necessary step in training and deploying machine learning models. Most prior work on hyperparameter tuning has studied methods for maximizing model accuracy under a time constraint, assuming a fixed cluster size. While this is appropriate in data center environments, the increased deployment of machine learning workloads in cloud settings necessitates studying hyperparameter tuning with an elastic cluster size and time and monetary budgets. While recent work has leveraged the elasticity of the cloud to minimize the execution cost of a pre-determined hyperparameter tuning job originally designed for fixed cluster sizes, it does not aim to maximize accuracy. In this work, we aim to maximize accuracy given time and cost constraints. We introduce SEER---Sequential Elimination with Elastic Resources, an algorithm that tests different hyperparameter values in the beginning and maintains varying degrees of parallelism among the promising configurations to ensure that they are trained sufficiently before the deadline. Unlike fixed-cluster-size methods, it is able to exploit the flexibility in resource allocation that the elastic setting offers in order to avoid undesirable effects of sublinear scaling. Furthermore, SEER can be easily integrated into existing systems and makes minimal assumptions about the workload. On a suite of benchmarks, we demonstrate that SEER outperforms both existing methods for hyperparameter tuning on a fixed cluster and naive extensions of these algorithms to the cloud setting.
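The core elimination loop is close in spirit to successive halving: train all configurations a little, keep the most promising fraction, and repeat until the deadline. The sketch below shows only that skeleton with assumed callback names; SEER's distinguishing logic (varying the degree of parallelism per surviving configuration under elastic time and cost budgets) is not modeled here.

```python
import heapq

def sequential_elimination(configs, train_some, evaluate,
                           rounds=4, keep_frac=0.5):
    """Simplified sequential-elimination skeleton (successive-halving
    style). `train_some` advances one configuration by a small budget
    and `evaluate` returns its current validation score; both are
    assumed user-supplied callbacks, not part of SEER's interface."""
    survivors = list(configs)
    for _ in range(rounds):
        scored = []
        for cfg in survivors:
            train_some(cfg)                        # spend a slice of the budget
            scored.append((evaluate(cfg), cfg))
        k = max(1, int(len(scored) * keep_frac))   # keep the best fraction
        survivors = [cfg for _, cfg in
                     heapq.nlargest(k, scored, key=lambda s: s[0])]
    return max(survivors, key=evaluate)
```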

2 citations

Book
13 Nov 1997
TL;DR: The philosophy of animism is also related to the beliefs of the ancient Bon religion of Tibet and the Wu priesthood of China, as well as to primitive Shinto belief, strands of which are still evident in modern Japan, as discussed by the authors.
Abstract: INTRODUCTION The philosophies of the east have arisen, not unlike those of the occidental cultures, from a host of different influences and over a vast time-span that reaches back to the misty edge of prehistory. Beliefs of tremendous antiquity have been evidenced through archaeological discoveries spanning the regions of the globe from India to Japan. Some were clearly carried eastwards from the Cradle of Civilization in Mesopotamia whilst others can only have been derived independently, the product of more local minds and inspiration. In each case it is possible to detect beliefs and understandings of the world around us that have been part of a common frame existing virtually everywhere on the planet. These beliefs are represented in the tribal shamanism of nomadic hunting cultures and the philosophy of animism from which it derives. They can be discerned in the Vedic scriptures which form the bedrock of Hinduism, in the ancient Bon religion of Tibet and the archaic Wu priesthood of China, as well as in aspects of primitive Shinto belief, strands of which are still evident in modern Japan. Yet out of this common mould arose philosophies that are peculiar to the eastern mind and its attitude concerning life and death. It is almost inconceivable that a Gautama Buddha, a Lao Tzu, the founder of Chinese Taoism, a Confucius, or a Hui Neng, the sixth Chinese Patriarch who gave the oriental world the concept of Zen, would have arisen in the western hemisphere. Most of the philosophies of the east not only claim great antiquity but also exert dynamic influences on modern life. Hinduism and Buddhism are a veritable part of the everyday round of living and dying amongst vast millions in the Indian subcontinent and much of South-East Asia. In Japan and Korea religious observances are an essential prerequisite to many of the activities of the secular world. This was also true in China until such traditions were driven out by the impositions of Communism, but these deep-rooted instincts may yet stage their comeback as liberalization proceeds. One of the keys to the success and remarkable tenacity of the eastern philosophies has been the ability to adapt, to compromise, and to meld comfortably with the beliefs of others. Many of the older faiths, whose appeal was in danger of becoming passé, embraced Buddhism and Confucianism, the great driving forces of missionary zeal in the eastern hemisphere. They achieved this symbiosis unfettered by the restraint that has too often punished Judaism, Christianity and Islam, the great bastions of monotheism. It is these distinctions, in part, which stimulate us in the west with such a fascination and curiosity about the wisdom of the east, not so much alien as quintessentially exotic.

2 citations


Cited by
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
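The architectural idea is parallel convolutional branches at multiple scales whose outputs are concatenated along the channel dimension. A minimal PyTorch sketch of such an Inception-style block follows; the channel counts, ReLU placement, and absence of auxiliary classifiers mean it is not the GoogLeNet configuration from the paper.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Minimal Inception-style block: parallel 1x1, 3x3, 5x5 and pooled
    branches concatenated on the channel axis. Channel counts are
    illustrative, not the values used in GoogLeNet."""

    def __init__(self, in_ch, c1=64, c3=128, c5=32, cp=32):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3 // 2, kernel_size=1),
                                nn.Conv2d(c3 // 2, c3, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5 // 2, kernel_size=1),
                                nn.Conv2d(c5 // 2, c5, kernel_size=5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                nn.Conv2d(in_ch, cp, kernel_size=1))

    def forward(self, x):
        branches = [self.b1(x), self.b3(x), self.b5(x), self.bp(x)]
        return torch.relu(torch.cat(branches, dim=1))
```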

40,257 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning; its coverage ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
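As a concrete instance of the temporal-difference methods covered in Part II of the book, here is a minimal tabular Q-learning loop; the environment is assumed to expose a Gymnasium-style reset()/step() interface, and all hyperparameter values are illustrative.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning (an off-policy temporal-difference method).
    Assumes a Gymnasium-style env with discrete states and actions."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng()
    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            a = env.action_space.sample() if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # TD(0) update toward the one-step bootstrapped target
            target = r + gamma * (0.0 if terminated else np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```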

37,989 citations

Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
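The generative story in the abstract (a per-document topic mixture drawn from a Dirichlet, a topic assignment per word, then words drawn from topic-specific distributions) can be sampled directly. The sketch below does only that forward simulation, with illustrative hyperparameters; fitting the model from data would instead use variational EM, for example via scikit-learn's LatentDirichletAllocation, rather than this sampler.

```python
import numpy as np

def generate_lda_corpus(n_docs=100, n_topics=5, vocab_size=1000,
                        doc_len=80, alpha=0.1, beta=0.01, seed=0):
    """Samples documents from the LDA generative model.
    All hyperparameter values here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    # One topic-word distribution per topic, each a Dirichlet draw.
    topics = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    corpus = []
    for _ in range(n_docs):
        theta = rng.dirichlet(np.full(n_topics, alpha))   # document's topic mixture
        z = rng.choice(n_topics, size=doc_len, p=theta)   # per-word topic assignment
        words = [int(rng.choice(vocab_size, p=topics[k])) for k in z]
        corpus.append(words)
    return corpus, topics
```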

30,570 citations

Proceedings Article
03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations