scispace - formally typeset
Search or ask a question
Author

Michael I. Jordan

Other affiliations: Stanford University, Princeton University, Broad Institute  ...read more
Bio: Michael I. Jordan is an academic researcher from University of California, Berkeley. The author has contributed to research in topics: Computer science & Inference. The author has an hindex of 176, co-authored 1016 publications receiving 216204 citations. Previous affiliations of Michael I. Jordan include Stanford University & Princeton University.


Papers
More filters
Proceedings Article
01 Jan 1994
TL;DR: This paper presents a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, a theory of convergence for RL with arbitrary, but fixed, softstate aggregation, and a novel intuitive understanding of the effect of state aggregation on online RL.
Abstract: It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortunately almost all of the theory of reinforcement learning assumes lookup table representations. In this paper we address the pressing issue of combining function approximation and RL, and present 1) a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, 2) a theory of convergence for RL with arbitrary, but fixed, soft state aggregation, 3) a novel intuitive understanding of the effect of state aggregation on online RL, and 4) a new heuristic adaptive state aggregation algorithm that finds improved compact representations by exploiting the non-discrete nature of soft state aggregation. Preliminary empirical results are also presented.

343 citations

Journal ArticleDOI
TL;DR: In this article, the authors consider derivative-free algorithms for stochastic and non-stochastic convex optimization problems that use only function values rather than gradients, and show that if pairs of function values are available, algorithms for $d$ -dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most Ω( √ n/d) in convergence rate over traditional gradient methods.
Abstract: We consider derivative-free algorithms for stochastic and nonstochastic convex optimization problems that use only function values rather than gradients. Focusing on nonasymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for $d$ -dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most $\sqrt {d}$ in convergence rate over traditional stochastic gradient methods. We establish such results for both smooth and nonsmooth cases, sharpening previous analyses that suggested a worse dimension dependence, and extend our results to the case of multiple ( $ {m}\ge 2$ ) evaluations. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, establishing the sharpness of our achievable results up to constant (sometimes logarithmic) factors.

342 citations

Posted Content
TL;DR: This paper proves that an optimistic modification of Least-Squares Value Iteration (LSVI) achieves regret, where d is the ambient dimension of feature space, H is the length of each episode, and T is the total number of steps, and is independent of the number of states and actions.
Abstract: Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially given the need to manage the exploration/exploitation tradeoff. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate function approximation? This question persists even in a basic setting with linear dynamics and linear rewards, for which only linear function approximation is needed. This paper presents the first provable RL algorithm with both polynomial runtime and polynomial sample complexity in this linear setting, without requiring a "simulator" or additional assumptions. Concretely, we prove that an optimistic modification of Least-Squares Value Iteration (LSVI)---a classical algorithm frequently studied in the linear setting---achieves $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret, where $d$ is the ambient dimension of feature space, $H$ is the length of each episode, and $T$ is the total number of steps. Importantly, such regret is independent of the number of states and actions.

337 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper introduces Universal Domain Adaptation (UDA), a model that requires no prior knowledge on the label sets that outperforms the state of the art closed set, partial and open set domain adaptation methods in the novel UDA setting.
Abstract: Domain adaptation aims to transfer knowledge in the presence of the domain gap. Existing domain adaptation methods rely on rich prior knowledge about the relationship between the label sets of source and target domains, which greatly limits their application in the wild. This paper introduces Universal Domain Adaptation (UDA) that requires no prior knowledge on the label sets. For a given source label set and a target label set, they may contain a common label set and hold a private label set respectively, bringing up an additional category gap. UDA requires a model to either (1) classify the target sample correctly if it is associated with a label in the common label set, or (2) mark it as ``unknown'' otherwise. More importantly, a UDA model should work stably against a wide spectrum of commonness (the proportion of the common label set over the complete label set) so that it can handle real-world problems with unknown target label sets. To solve the universal domain adaptation problem, we propose Universal Adaptation Network (UAN). It quantifies sample-level transferability to discover the common label set and the label sets private to each domain, thereby promoting the adaptation in the automatically discovered common label set and recognizing the ``unknown'' samples successfully. A thorough evaluation shows that UAN outperforms the state of the art closed set, partial and open set domain adaptation methods in the novel UDA setting.

333 citations

Proceedings ArticleDOI
29 Mar 2009
TL;DR: A system that uses machine learning to accurately predict the performance metrics of database queries whose execution times range from milliseconds to hours, and was able to correctly identify both the short and long-running queries to inform workload management and capacity planning.
Abstract: One of the most challenging aspects of managing a very large data warehouse is identifying how queries will behave before they start executing. Yet knowing their performance characteristics --- their runtimes and resource usage --- can solve two important problems. First, every database vendor struggles with managing unexpectedly long-running queries. When these long-running queries can be identified before they start, they can be rejected or scheduled when they will not cause extreme resource contention for the other queries in the system. Second, deciding whether a system can complete a given workload in a given time period (or a bigger system is necessary) depends on knowing the resource requirements of the queries in that workload. We have developed a system that uses machine learning to accurately predict the performance metrics of database queries whose execution times range from milliseconds to hours. For training and testing our system, we used both real customer queries and queries generated from an extended set of TPC-DS templates. The extensions mimic queries that caused customer problems. We used these queries to compare how accurately different techniques predict metrics such as elapsed time, records used, disk I/Os, and message bytes. The most promising technique was not only the most accurate, but also predicted these metrics simultaneously and using only information available prior to query execution. We validated the accuracy of this machine learning technique on a number of HP Neoview configurations. We were able to predict individual query elapsed time within 20% of its actual time for 85% of the test queries. Most importantly, we were able to correctly identify both the short and long-running (up to two hour) queries to inform workload management and capacity planning.

329 citations


Cited by
More filters
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations

Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations

Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations