Journal ArticleDOI

Submodularity and its applications in optimized information gathering

TL;DR: Describes algorithms that efficiently find provably near-optimal solutions to large, complex information gathering problems, with applications including environmental monitoring using robotic sensors, activity recognition using a built sensing chair, a sensor placement challenge, and deciding which blogs to read on the Web.
Abstract: Where should we place sensors to efficiently monitor natural drinking water resources for contamination? Which blogs should we read to learn about the biggest stories on the Web? These problems share a fundamental challenge: How can we obtain the most useful information about the state of the world, at minimum cost? Such information gathering, or active learning, problems are typically NP-hard, and were commonly addressed using heuristics without theoretical guarantees about the solution quality. In this article, we describe algorithms which efficiently find provably near-optimal solutions to large, complex information gathering problems. Our algorithms exploit submodularity, an intuitive notion of diminishing returns common to many sensing problems: the more sensors we have already deployed, the less we learn by placing another sensor. In addition to identifying the most informative sensing locations, our algorithms can handle more challenging settings, where sensors need to be able to reliably communicate over lossy links, where mobile robots are used for collecting data, or where solutions need to be robust against adversaries and sensor failures. We also present results applying our algorithms to several real-world sensing tasks, including environmental monitoring using robotic sensors, activity recognition using a built sensing chair, a sensor placement challenge, and deciding which blogs to read on the Web.
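To make the diminishing-returns intuition concrete, here is a minimal sketch (not the authors' code) of the classic greedy rule on a toy coverage objective: each step adds the sensor location with the largest marginal gain, and for monotone submodular objectives this simple strategy is provably within a factor 1 − 1/e of optimal. The coverage sets below are hypothetical.

```python
# Minimal sketch of greedy submodular maximization on a toy coverage objective
# (illustrative only, not the paper's implementation).
# f(S) = number of distinct locations covered by the sensors in S -- a monotone
# submodular function: each extra sensor helps less the more is already covered.

def coverage(selected, cover_sets):
    """Objective f(S): size of the union of the coverage sets of the chosen sensors."""
    covered = set()
    for s in selected:
        covered |= cover_sets[s]
    return len(covered)

def greedy(cover_sets, k):
    """Pick k sensors, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        best, best_gain = None, -1
        for s in cover_sets:
            if s in selected:
                continue
            gain = coverage(selected + [s], cover_sets) - coverage(selected, cover_sets)
            if gain > best_gain:
                best, best_gain = s, gain
        selected.append(best)
    return selected

# Hypothetical coverage sets for four candidate sensor locations.
cover_sets = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {1, 2},
}
print(greedy(cover_sets, 2))  # ['A', 'C'] -- marginal gains shrink as coverage grows
```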
Citations

Posted Content
TL;DR: Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning, as discussed by the authors.
Abstract: Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning. In this monograph, we present the theory of submodular functions from a convex analysis perspective, presenting tight links between certain polyhedra, combinatorial optimization and convex optimization problems. In particular, we show how submodular function minimization is equivalent to solving a wide variety of convex optimization problems. This allows the derivation of new efficient algorithms for approximate and exact submodular function minimization with theoretical guarantees and good practical performance. By listing many examples of submodular functions, we review various applications to machine learning, such as clustering, experimental design, sensor placement, graphical model structure learning or subset selection, as well as a family of structured sparsity-inducing norms that can be derived and used from submodular functions.
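As a concrete illustration of the Lovász extension mentioned above, the sketch below evaluates it at a point w with the standard sort-and-accumulate formula; the set function and test point are toy examples chosen here, not taken from the monograph.

```python
# Minimal sketch: evaluating the Lovász extension f_L of a set function F at w.
# Sort coordinates of w in decreasing order and accumulate marginal gains of the
# growing prefix sets. For submodular F this extension is convex, which is the
# link between submodular minimization and convex optimization.

def lovasz_extension(F, w):
    """f_L(w) = sum_k w[j_k] * (F(S_k) - F(S_{k-1})), indices sorted by decreasing w."""
    order = sorted(range(len(w)), key=lambda i: -w[i])
    value, prefix, prev = 0.0, set(), F(set())
    for i in order:
        prefix.add(i)
        cur = F(prefix)
        value += w[i] * (cur - prev)
        prev = cur
    return value

# Toy submodular function: F(S) = min(|S|, 2), with F(empty set) = 0.
F = lambda S: min(len(S), 2)
print(lovasz_extension(F, [0.9, 0.4, 0.7]))  # 0.9*1 + 0.7*1 + 0.4*0 = 1.6
```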

336 citations

Book
21 Nov 2013
TL;DR: In Learning with Submodular Functions: A Convex Optimization Perspective, the theory of submodular functions is presented in a self-contained way from a convex analysis perspective, presenting tight links between certain polyhedra, combinatorial optimization and convex optimization problems.
Abstract: Submodular functions are relevant to machine learning for at least two reasons: (1) some problems may be expressed directly as the optimization of submodular functions and (2) the Lovász extension of submodular functions provides a useful set of regularization functions for supervised and unsupervised learning. In Learning with Submodular Functions: A Convex Optimization Perspective, the theory of submodular functions is presented in a self-contained way from a convex analysis perspective, presenting tight links between certain polyhedra, combinatorial optimization and convex optimization problems. In particular, it describes how submodular function minimization is equivalent to solving a wide variety of convex optimization problems. This allows the derivation of new efficient algorithms for approximate and exact submodular function minimization with theoretical guarantees and good practical performance. By listing many examples of submodular functions, it reviews various applications to machine learning, such as clustering, experimental design, sensor placement, graphical model structure learning or subset selection, as well as a family of structured sparsity-inducing norms that can be derived and used from submodular functions. Learning with Submodular Functions: A Convex Optimization Perspective is an ideal reference for researchers, scientists, or engineers with an interest in applying submodular functions to machine learning problems.

325 citations

Proceedings Article
25 Jan 2015
TL;DR: In this article, a linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint was proposed, which achieves a (1 − 1/e − ε) approximation guarantee to the optimum solution in time linear in the size of the data and independent of the cardinality constraint.
Abstract: Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and practice? In this paper, we develop the first linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm, STOCHASTIC-GREEDY, can achieve a (1 − 1/e − ε) approximation guarantee, in expectation, to the optimum solution in time linear in the size of the data and independent of the cardinality constraint. We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster. More surprisingly, we observe that in many practical scenarios STOCHASTIC-GREEDY does not even evaluate every data point once and still achieves results indistinguishable from lazy greedy.
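The following is a minimal sketch of the STOCHASTIC-GREEDY idea described above, assuming a toy coverage objective; the helper names and parameters are illustrative rather than the authors' implementation. Each round scores only a random sample of roughly (n/k)·log(1/ε) remaining elements instead of all of them, which is what yields the linear total running time and the (1 − 1/e − ε) expected guarantee.

```python
# Sketch of stochastic greedy selection (simplified; toy objective and data).
import math
import random

def stochastic_greedy(ground_set, f, k, eps=0.1, seed=0):
    """Select k elements; each round, score only a random sample of the remainder."""
    rng = random.Random(seed)
    n = len(ground_set)
    sample_size = max(1, math.ceil((n / k) * math.log(1.0 / eps)))
    selected = []
    for _ in range(k):
        remaining = [e for e in ground_set if e not in selected]
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        base = f(selected)
        best = max(sample, key=lambda e: f(selected + [e]) - base)  # largest marginal gain in the sample
        selected.append(best)
    return selected

# Toy usage: hypothetical coverage sets over integers.
sets = {i: set(range(i, i + 3)) for i in range(20)}
f = lambda S: len(set().union(*(sets[i] for i in S)) if S else set())
print(stochastic_greedy(list(sets), f, k=4))
```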

320 citations

Journal ArticleDOI
TL;DR: This work proposes three sampling-based motion planning algorithms for generating informative mobile robot trajectories, analyzes their asymptotic optimality, and presents several conservative pruning strategies for modular, submodular, and time-varying information objectives.
Abstract: We propose three sampling-based motion planning algorithms for generating informative mobile robot trajectories. The goal is to find a trajectory that maximizes an information quality metric (e.g. variance reduction, information gain, or mutual information) and also falls within a pre-specified budget constraint (e.g. fuel, energy, or time). Prior algorithms have employed combinatorial optimization techniques to solve these problems, but existing techniques are typically restricted to discrete domains and often scale poorly in the size of the problem. Our proposed rapidly exploring information gathering (RIG) algorithms combine ideas from sampling-based motion planning with branch and bound techniques to achieve efficient information gathering in continuous space with motion constraints. We provide analysis of the asymptotic optimality of our algorithms, and we present several conservative pruning strategies for modular, submodular, and time-varying information objectives. We demonstrate that our proposed techniques find optimal solutions more quickly than existing combinatorial solvers, and we provide a proof-of-concept field implementation on an autonomous surface vehicle performing a wireless signal strength monitoring task in a lake.
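One ingredient mentioned above, conservative pruning for submodular information objectives, can be illustrated with a small sketch; this is a simplification that ignores the continuous space and motion constraints the RIG algorithms handle, and the function names below are hypothetical. Because submodularity means marginal gains only shrink as a path grows, summing the current marginal gains of the remaining candidate nodes gives an optimistic (never underestimating) bound, so a partial path whose bound cannot beat the best complete path found so far can be safely discarded.

```python
# Simplified sketch of branch-and-bound pruning with a monotone submodular objective f.
# `visited` is the set of nodes on the partial path, `candidates` the nodes it could
# still visit within the budget, and f maps a set of nodes to information gathered.

def upper_bound(f, visited, candidates):
    """Optimistic bound: current value plus each candidate's marginal gain w.r.t. the
    current set. By submodularity these gains can only shrink later, so the bound
    never underestimates the value of the best completion."""
    base = f(visited)
    return base + sum(max(0.0, f(visited | {c}) - base) for c in candidates)

def should_prune(f, visited, candidates, best_value_so_far):
    """Discard the partial path if even its optimistic bound cannot beat the incumbent."""
    return upper_bound(f, visited, candidates) <= best_value_so_far
```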

316 citations


Cites background from "Submodularity and its applications ..."

  • …motion planning, one key property of submodular objectives is that the amount of information gathered in the future is dependent on the prior trajectory, whereas with modular objectives it is not (see Krause and Guestrin (2011) for a more thorough discussion of submodularity and its properties).


References
Book
01 Jan 1991
TL;DR: The authors examine the role of entropy, inequalities, and randomness in the design and construction of codes.
Abstract: Preface to the Second Edition. Preface to the First Edition. Acknowledgments for the Second Edition. Acknowledgments for the First Edition. 1. Introduction and Preview. 1.1 Preview of the Book. 2. Entropy, Relative Entropy, and Mutual Information. 2.1 Entropy. 2.2 Joint Entropy and Conditional Entropy. 2.3 Relative Entropy and Mutual Information. 2.4 Relationship Between Entropy and Mutual Information. 2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information. 2.6 Jensen's Inequality and Its Consequences. 2.7 Log Sum Inequality and Its Applications. 2.8 Data-Processing Inequality. 2.9 Sufficient Statistics. 2.10 Fano's Inequality. Summary. Problems. Historical Notes. 3. Asymptotic Equipartition Property. 3.1 Asymptotic Equipartition Property Theorem. 3.2 Consequences of the AEP: Data Compression. 3.3 High-Probability Sets and the Typical Set. Summary. Problems. Historical Notes. 4. Entropy Rates of a Stochastic Process. 4.1 Markov Chains. 4.2 Entropy Rate. 4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph. 4.4 Second Law of Thermodynamics. 4.5 Functions of Markov Chains. Summary. Problems. Historical Notes. 5. Data Compression. 5.1 Examples of Codes. 5.2 Kraft Inequality. 5.3 Optimal Codes. 5.4 Bounds on the Optimal Code Length. 5.5 Kraft Inequality for Uniquely Decodable Codes. 5.6 Huffman Codes. 5.7 Some Comments on Huffman Codes. 5.8 Optimality of Huffman Codes. 5.9 Shannon-Fano-Elias Coding. 5.10 Competitive Optimality of the Shannon Code. 5.11 Generation of Discrete Distributions from Fair Coins. Summary. Problems. Historical Notes. 6. Gambling and Data Compression. 6.1 The Horse Race. 6.2 Gambling and Side Information. 6.3 Dependent Horse Races and Entropy Rate. 6.4 The Entropy of English. 6.5 Data Compression and Gambling. 6.6 Gambling Estimate of the Entropy of English. Summary. Problems. Historical Notes. 7. Channel Capacity. 7.1 Examples of Channel Capacity. 7.2 Symmetric Channels. 7.3 Properties of Channel Capacity. 7.4 Preview of the Channel Coding Theorem. 7.5 Definitions. 7.6 Jointly Typical Sequences. 7.7 Channel Coding Theorem. 7.8 Zero-Error Codes. 7.9 Fano's Inequality and the Converse to the Coding Theorem. 7.10 Equality in the Converse to the Channel Coding Theorem. 7.11 Hamming Codes. 7.12 Feedback Capacity. 7.13 Source-Channel Separation Theorem. Summary. Problems. Historical Notes. 8. Differential Entropy. 8.1 Definitions. 8.2 AEP for Continuous Random Variables. 8.3 Relation of Differential Entropy to Discrete Entropy. 8.4 Joint and Conditional Differential Entropy. 8.5 Relative Entropy and Mutual Information. 8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information. Summary. Problems. Historical Notes. 9. Gaussian Channel. 9.1 Gaussian Channel: Definitions. 9.2 Converse to the Coding Theorem for Gaussian Channels. 9.3 Bandlimited Channels. 9.4 Parallel Gaussian Channels. 9.5 Channels with Colored Gaussian Noise. 9.6 Gaussian Channels with Feedback. Summary. Problems. Historical Notes. 10. Rate Distortion Theory. 10.1 Quantization. 10.2 Definitions. 10.3 Calculation of the Rate Distortion Function. 10.4 Converse to the Rate Distortion Theorem. 10.5 Achievability of the Rate Distortion Function. 10.6 Strongly Typical Sequences and Rate Distortion. 10.7 Characterization of the Rate Distortion Function. 10.8 Computation of Channel Capacity and the Rate Distortion Function. Summary. Problems. Historical Notes. 11. Information Theory and Statistics. 11.1 Method of Types. 11.2 Law of Large Numbers. 
11.3 Universal Source Coding. 11.4 Large Deviation Theory. 11.5 Examples of Sanov's Theorem. 11.6 Conditional Limit Theorem. 11.7 Hypothesis Testing. 11.8 Chernoff-Stein Lemma. 11.9 Chernoff Information. 11.10 Fisher Information and the Cramér-Rao Inequality. Summary. Problems. Historical Notes. 12. Maximum Entropy. 12.1 Maximum Entropy Distributions. 12.2 Examples. 12.3 Anomalous Maximum Entropy Problem. 12.4 Spectrum Estimation. 12.5 Entropy Rates of a Gaussian Process. 12.6 Burg's Maximum Entropy Theorem. Summary. Problems. Historical Notes. 13. Universal Source Coding. 13.1 Universal Codes and Channel Capacity. 13.2 Universal Coding for Binary Sequences. 13.3 Arithmetic Coding. 13.4 Lempel-Ziv Coding. 13.5 Optimality of Lempel-Ziv Algorithms. Summary. Problems. Historical Notes. 14. Kolmogorov Complexity. 14.1 Models of Computation. 14.2 Kolmogorov Complexity: Definitions and Examples. 14.3 Kolmogorov Complexity and Entropy. 14.4 Kolmogorov Complexity of Integers. 14.5 Algorithmically Random and Incompressible Sequences. 14.6 Universal Probability. 14.7 Kolmogorov Complexity. 14.9 Universal Gambling. 14.10 Occam's Razor. 14.11 Kolmogorov Complexity and Universal Probability. 14.12 Kolmogorov Sufficient Statistic. 14.13 Minimum Description Length Principle. Summary. Problems. Historical Notes. 15. Network Information Theory. 15.1 Gaussian Multiple-User Channels. 15.2 Jointly Typical Sequences. 15.3 Multiple-Access Channel. 15.4 Encoding of Correlated Sources. 15.5 Duality Between Slepian-Wolf Encoding and Multiple-Access Channels. 15.6 Broadcast Channel. 15.7 Relay Channel. 15.8 Source Coding with Side Information. 15.9 Rate Distortion with Side Information. 15.10 General Multiterminal Networks. Summary. Problems. Historical Notes. 16. Information Theory and Portfolio Theory. 16.1 The Stock Market: Some Definitions. 16.2 Kuhn-Tucker Characterization of the Log-Optimal Portfolio. 16.3 Asymptotic Optimality of the Log-Optimal Portfolio. 16.4 Side Information and the Growth Rate. 16.5 Investment in Stationary Markets. 16.6 Competitive Optimality of the Log-Optimal Portfolio. 16.7 Universal Portfolios. 16.8 Shannon-McMillan-Breiman Theorem (General AEP). Summary. Problems. Historical Notes. 17. Inequalities in Information Theory. 17.1 Basic Inequalities of Information Theory. 17.2 Differential Entropy. 17.3 Bounds on Entropy and Relative Entropy. 17.4 Inequalities for Types. 17.5 Combinatorial Bounds on Entropy. 17.6 Entropy Rates of Subsets. 17.7 Entropy and Fisher Information. 17.8 Entropy Power Inequality and Brunn-Minkowski Inequality. 17.9 Inequalities for Determinants. 17.10 Inequalities for Ratios of Determinants. Summary. Problems. Historical Notes. Bibliography. List of Symbols. Index.

45,034 citations

Book
01 Jan 1991
TL;DR: In this book, the author surveys statistics for spatial data, covering geostatistics, spatial models on lattices, and spatial point patterns and object modeling.
Abstract: Statistics for Spatial Data. GEOSTATISTICAL DATA: Geostatistics. Spatial Prediction and Kriging. Applications of Geostatistics. Special Topics in Statistics for Spatial Data. LATTICE DATA: Spatial Models on Lattices. Inference for Lattice Models. SPATIAL PATTERNS: Spatial Point Patterns. Modeling Objects. References. Author Index. Subject Index.

8,631 citations

Book
01 Dec 1986
TL;DR: Introduction and Preliminaries.
Abstract: Introduction and Preliminaries. Problems, Algorithms, and Complexity. LINEAR ALGEBRA. Linear Algebra and Complexity. LATTICES AND LINEAR DIOPHANTINE EQUATIONS. Theory of Lattices and Linear Diophantine Equations. Algorithms for Linear Diophantine Equations. Diophantine Approximation and Basis Reduction. POLYHEDRA, LINEAR INEQUALITIES, AND LINEAR PROGRAMMING. Fundamental Concepts and Results on Polyhedra, Linear Inequalities, and Linear Programming. The Structure of Polyhedra. Polarity, and Blocking and Anti-Blocking Polyhedra. Sizes and the Theoretical Complexity of Linear Inequalities and Linear Programming. The Simplex Method. Primal-Dual, Elimination, and Relaxation Methods. Khachiyan's Method for Linear Programming. The Ellipsoid Method for Polyhedra More Generally. Further Polynomiality Results in Linear Programming. INTEGER LINEAR PROGRAMMING. Introduction to Integer Linear Programming. Estimates in Integer Linear Programming. The Complexity of Integer Linear Programming. Totally Unimodular Matrices: Fundamental Properties and Examples. Recognizing Total Unimodularity. Further Theory Related to Total Unimodularity. Integral Polyhedra and Total Dual Integrality. Cutting Planes. Further Methods in Integer Linear Programming. References. Indexes.

7,005 citations


Journal ArticleDOI
TL;DR: It is shown that a "greedy" heuristic always produces a solution whose value is at least 1 − [(K − 1)/K]^K times the optimal value; this bound can be achieved for each K and has a limiting value of (e − 1)/e, where e is the base of the natural logarithm.
Abstract: Let N be a finite set and z be a real-valued function defined on the set of subsets of N that satisfies z(S) + z(T) ≥ z(S∪T) + z(S∩T) for all S, T in N. Such a function is called submodular. We consider the problem max_{S⊆N} {z(S): |S| ≤ K, z(S) submodular}. Several hard combinatorial optimization problems can be posed in this framework. For example, the problem of finding a maximum weight independent set in a matroid, when the elements of the matroid are colored and the elements of the independent set can have no more than K colors, is in this class. The uncapacitated location problem is a special case of this matroid optimization problem. We analyze greedy and local improvement heuristics and a linear programming relaxation for this problem. Our results are worst case bounds on the quality of the approximations. For example, when z(S) is nondecreasing and z(∅) = 0, we show that a "greedy" heuristic always produces a solution whose value is at least 1 − [(K − 1)/K]^K times the optimal value. This bound can be achieved for each K and has a limiting value of (e − 1)/e, where e is the base of the natural logarithm.
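As a quick numeric check of the bound quoted above, the snippet below shows how 1 − [(K − 1)/K]^K decreases toward its limiting value (e − 1)/e ≈ 0.632 as K grows.

```python
# Numeric check of the greedy bound 1 - ((K-1)/K)**K and its limit (e-1)/e.
import math

for K in (1, 2, 5, 10, 100, 1000):
    print(K, 1 - ((K - 1) / K) ** K)
print("limit:", (math.e - 1) / math.e)  # ~= 0.6321
```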

4,103 citations

Trending Questions (1)
What are the advantages and disadvantages of using optimization algorithms for intelligent sensing?

The provided paper does not explicitly mention the advantages and disadvantages of using optimization algorithms for intelligent sensing.