scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Artificial Intelligence in 2014"


Posted Content
TL;DR: This work proposes a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision by a multi-world approach that represents uncertainty about the perceived world in a bayesian framework.
Abstract: We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision. We combine discrete reasoning with uncertain predictions by a multi-world approach that represents uncertainty about the perceived world in a bayesian framework. Our approach can handle human questions of high complexity about realistic scenes and replies with range of answer like counts, object classes, instances and lists of them. The system is directly trained from question-answer pairs. We establish a first benchmark for this task that can be seen as a modern attempt at a visual turing test.

519 citations


Posted Content
TL;DR: This paper proposes an operational framework of CIoT, which mainly characterizes the interactions among five fundamental cognitive tasks: perception-action cycle, massive data analytics, semantic derivation and knowledge discovery, intelligent decision-making, and on-demand service provisioning, and provides a systematic tutorial on key enabling techniques involved in the cognitive tasks.
Abstract: Current research on Internet of Things (IoT) mainly focuses on how to enable general objects to see, hear, and smell the physical world for themselves, and make them connected to share the observations. In this paper, we argue that only connected is not enough, beyond that, general objects should have the capability to learn, think, and understand both physical and social worlds by themselves. This practical need impels us to develop a new paradigm, named Cognitive Internet of Things (CIoT), to empower the current IoT with a `brain' for high-level intelligence. Specifically, we first present a comprehensive definition for CIoT, primarily inspired by the effectiveness of human cognition. Then, we propose an operational framework of CIoT, which mainly characterizes the interactions among five fundamental cognitive tasks: perception-action cycle, massive data analytics, semantic derivation and knowledge discovery, intelligent decision-making, and on-demand service provisioning. Furthermore, we provide a systematic tutorial on key enabling techniques involved in the cognitive tasks. In addition, we also discuss the design of proper performance metrics on evaluating the enabling techniques. Last but not least, we present the research challenges and open issues ahead. Building on the present work and potentially fruitful future studies, CIoT has the capability to bridge the physical world (with objects, resources, etc.) and the social world (with human demand, social behavior, etc.), and enhance smart resource allocation, automatic network operation, and intelligent service provisioning.

389 citations


Posted Content
TL;DR: The findings demonstrate that bandit algorithms are attractive alternatives to current adaptive treatment allocation strategies and may guide the design of subsequent empirical evaluations.
Abstract: Although many algorithms for the multi-armed bandit problem are well-understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important observations can be made from our results. Firstly, simple heuristics such as epsilon-greedy and Boltzmann exploration outperform theoretically sound algorithms on most settings by a significant margin. Secondly, the performance of most algorithms varies dramatically with the parameters of the bandit problem. Our study identifies for each algorithm the settings where it performs well, and the settings where it performs poorly. Thirdly, the algorithms' performance relative each to other is affected only by the number of bandit arms and the variance of the rewards. This finding may guide the design of subsequent empirical evaluations. In the second part of the paper, we turn our attention to an important area of application of bandit algorithms: clinical trials. Although the design of clinical trials has been one of the principal practical problems motivating research on multi-armed bandits, bandit algorithms have never been evaluated as potential treatment allocation strategies. Using data from a real study, we simulate the outcome that a 2001-2002 clinical trial would have had if bandit algorithms had been used to allocate patients to treatments. We find that an adaptive trial would have successfully treated at least 50% more patients, while significantly reducing the number of adverse effects and increasing patient retention. At the end of the trial, the best treatment could have still been identified with a high level of statistical confidence. Our findings demonstrate that bandit algorithms are attractive alternatives to current adaptive treatment allocation strategies.

250 citations


Posted Content
TL;DR: In this article, a gradient-based distributed policy search method for cooperative games is proposed and compared to the notion of local optimum to that of Nash equilibrium, which is a reasonable alternative to value-based methods for partially observable environments.
Abstract: Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.

246 citations


Posted Content
TL;DR: Stochastic regeneration linear runtime scaling in cases where many previous approaches scaled quadratically is shown, and how to use stochastic regeneration and the SPI to implement general-purpose inference strategies such as Metropolis-Hastings, Gibbs sampling, and blocked proposals based on particle Markov chain Monte Carlo and mean-field variational inference techniques are shown.
Abstract: We describe Venture, an interactive virtual machine for probabilistic programming that aims to be sufficiently expressive, extensible, and efficient for general-purpose use. Like Church, probabilistic models and inference problems in Venture are specified via a Turing-complete, higher-order probabilistic language descended from Lisp. Unlike Church, Venture also provides a compositional language for custom inference strategies built out of scalable exact and approximate techniques. We also describe four key aspects of Venture's implementation that build on ideas from probabilistic graphical models. First, we describe the stochastic procedure interface (SPI) that specifies and encapsulates primitive random variables. The SPI supports custom control flow, higher-order probabilistic procedures, partially exchangeable sequences and ``likelihood-free'' stochastic simulators. It also supports external models that do inference over latent variables hidden from Venture. Second, we describe probabilistic execution traces (PETs), which represent execution histories of Venture programs. PETs capture conditional dependencies, existential dependencies and exchangeable coupling. Third, we describe partitions of execution histories called scaffolds that factor global inference problems into coherent sub-problems. Finally, we describe a family of stochastic regeneration algorithms for efficiently modifying PET fragments contained within scaffolds. Stochastic regeneration linear runtime scaling in cases where many previous approaches scaled quadratically. We show how to use stochastic regeneration and the SPI to implement general-purpose inference strategies such as Metropolis-Hastings, Gibbs sampling, and blocked proposals based on particle Markov chain Monte Carlo and mean-field variational inference techniques.

237 citations


Posted Content
TL;DR: Memory networks as discussed by the authors combine inference components with a long-term memory component, which can be read and written to, with the goal of using it for predicting the output of a textual response.
Abstract: We describe a new class of learning models called memory networks Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly The long-term memory can be read and written to, with the goal of using it for prediction We investigate these models in the context of question answering (QA) where the long-term memory effectively acts as a (dynamic) knowledge base, and the output is a textual response We evaluate them on a large-scale QA task, and a smaller, but more complex, toy task generated from a simulated world In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs

198 citations


Posted Content
TL;DR: In this article, a bandit algorithm for smooth trees (BAST) is proposed, which takes into account ac- tual smoothness of the rewards for perform- ing efficient "cuts" of sub-optimal branches with high confidence.
Abstract: Bandit based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of go [6]. Their efficient exploration of the tree enables to re- turn rapidly a good value, and improve preci- sion if more time is provided. The UCT algo- rithm [8], a tree search method based on Up- per Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, a modification of UCT us- ing a confidence sequence that scales expo- nentially in the horizon depth is analyzed. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST) which takes into account ac- tual smoothness of the rewards for perform- ing efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree expansion which applies when the full tree is too big (possibly in- finite) to be entirely represented and show that with high probability, only the optimal branches are indefinitely developed. We illus- trate these methods on a global optimization problem of a continuous function, given noisy values.

196 citations


Posted Content
TL;DR: This report surveys all of the publicly available AC transmission system test cases, to the best of the authors' knowledge, and finds that many of the traditional test cases are missing key network operation constraints, such as line thermal limits and generator capability curves.
Abstract: In recent years the power systems research community has seen an explosion of work applying operations research techniques to challenging power network optimization problems. Regardless of the application under consideration, all of these works rely on power system test cases for evaluation and validation. However, many of the well established power system test cases were developed as far back as the 1960s with the aim of testing AC power flow algorithms. It is unclear if these power flow test cases are suitable for power system optimization studies. This report surveys all of the publicly available AC transmission system test cases, to the best of our knowledge, and assess their suitability for optimization tasks. It finds that many of the traditional test cases are missing key network operation constraints, such as line thermal limits and generator capability curves. To incorporate these missing constraints, data driven models are developed from a variety of publicly available data sources. The resulting extended test cases form a compressive archive, NESTA, for the evaluation and validation of power system optimization algorithms.

164 citations


Posted Content
TL;DR: In this paper, a Bayesian framework for learning falling rule lists is proposed, which does not rely on traditional greedy decision tree learning methods and is inspired by healthcare applications where patients would be stratified into risk sets and the highest at-risk patients should be considered first.
Abstract: Falling rule lists are classification models consisting of an ordered list of if-then rules, where (i) the order of rules determines which example should be classified by each rule, and (ii) the estimated probability of success decreases monotonically down the list. These kinds of rule lists are inspired by healthcare applications where patients would be stratified into risk sets and the highest at-risk patients should be considered first. We provide a Bayesian framework for learning falling rule lists that does not rely on traditional greedy decision tree learning methods.

146 citations


Book ChapterDOI
TL;DR: When you made this decision, you were likely relying on a behavioural “proxy”, an internal motivation that abstracts the problem of evaluating a decision impact on your overall life, but evaluating it in regard to some simple fitness function.
Abstract: Is it better for you to own a corkscrew or not? If asked, you as a human being would likely say “yes”, but more importantly, you are somehow able to make this decision. You are able to decide this, even if your current acute problems or task do not include opening a wine bottle. Similarly, it is also unlikely that you evaluated several possible trajectories your life could take and looked at them with and without a corkscrew, and then measured your survival or reproductive fitness in each. When you, as a human cognitive agent, made this decision, you were likely relying on a behavioural “proxy”, an internal motivation that abstracts the problem of evaluating a decision impact on your overall life, but evaluating it in regard to some simple fitness function. One example would be the idea of curiosity, urging you to act so that your experience new sensations and learn about the environment. On average, this should lead to better and richer models of the world, which give you a better chance of reaching your ultimate goals of survival and reproduction.

146 citations


Posted Content
TL;DR: In this paper, a general class of causal models, defined in terms of a collection of equations, as defined by Pearl, are axiomatized and the complexity of decision procedures is examined.
Abstract: Causal models defined in terms of a collection of equations, as defined by Pearl, are axiomatized here. Axiomatizations are provided for three successively more general classes of causal models: (1) the class of recursive theories (those without feedback), (2) the class of theories where the solutions to the equations are unique, (3) arbitrary theories (where the equations may not have solutions and, if they do, they are not necessarily unique). It is shown that to reason about causality in the most general third class, we must extend the language used by Galles and Pearl. In addition, the complexity of the decision procedures is examined for all the languages and classes of models considered.

Posted Content
TL;DR: A knowledge engine, which learns and shares knowledge representations, for robots to carry out a variety of tasks, is introduced and its use in three important research areas: grounding natural language, perception, and planning, which are the key building blocks for many robotic tasks.
Abstract: In this paper we introduce a knowledge engine, which learns and shares knowledge representations, for robots to carry out a variety of tasks. Building such an engine brings with it the challenge of dealing with multiple data modalities including symbols, natural language, haptic senses, robot trajectories, visual features and many others. The \textit{knowledge} stored in the engine comes from multiple sources including physical interactions that robots have while performing tasks (perception, planning and control), knowledge bases from the Internet and learned representations from several robotics research groups. We discuss various technical aspects and associated challenges such as modeling the correctness of knowledge, inferring latent information and formulating different robotic tasks as queries to the knowledge engine. We describe the system architecture and how it supports different mechanisms for users and robots to interact with the engine. Finally, we demonstrate its use in three important research areas: grounding natural language, perception, and planning, which are the key building blocks for many robotic tasks. This knowledge engine is a collaborative effort and we call it RoboBrain.

Posted Content
TL;DR: This paper first derive a formula for computing the gradient of this risk-sensitive objective function, then devise policy gradient and actor-critic algorithms that each uses a specific method to estimate this gradient and updates the policy parameters in the descent direction.
Abstract: In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures, and because of its computational efficiencies has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms that each uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.

Journal ArticleDOI
TL;DR: A comparative study is done between Fuzzy clustering algorithm and hard clustering technique based on fuzzy logic.
Abstract: Data clustering is an important area of data mining. This is an unsupervised study where data of similar types are put into one cluster while data of another types are put into different cluster. Fuzzy C means is a very important clustering technique based on fuzzy logic. Also we have some hard clustering techniques available like K-means among the popular ones. In this paper a comparative study is done between Fuzzy clustering algorithm and hard clustering algorithm

Journal ArticleDOI
TL;DR: It is shown how to find a minimum weight loop cutset in a Bayesian network with high probability and empirically that a variant of this algorithm often finds a loop cut set that is closer to the minimum weightloop cutset than the ones found by the best deterministic algorithms known.
Abstract: We show how to find a minimum loop cutset in a Bayesian network with high probability. Finding such a loop cutset is the first step in Pearl's method of conditioning for inference. Our random algorithm for finding a loop cutset, called "Repeated WGuessI", outputs a minimum loop cutset, after O(c 6^k k n) steps, with probability at least 1-(1 over{6^k})^{c 6^k}), where c>1 is a constant specified by the user, k is the size of a minimum weight loop cutset, and n is the number of vertices. We also show empirically that a variant of this algorithm, called WRA, often finds a loop cutset that is closer to the minimum loop cutset than the ones found by the best deterministic algorithms known.

Posted Content
TL;DR: In this paper, the authors discuss the method of Bayesian regression and its efficacy for predicting price variation of Bitcoin, a recently popularized virtual, cryptographic currency, and devise a simple strategy for trading Bitcoin.
Abstract: In this paper, we discuss the method of Bayesian regression and its efficacy for predicting price variation of Bitcoin, a recently popularized virtual, cryptographic currency. Bayesian regression refers to utilizing empirical data as proxy to perform Bayesian inference. We utilize Bayesian regression for the so-called "latent source model". The Bayesian regression for "latent source model" was introduced and discussed by Chen, Nikolov and Shah (2013) and Bresler, Chen and Shah (2014) for the purpose of binary classification. They established theoretical as well as empirical efficacy of the method for the setting of binary classification. In this paper, instead we utilize it for predicting real-valued quantity, the price of Bitcoin. Based on this price prediction method, we devise a simple strategy for trading Bitcoin. The strategy is able to nearly double the investment in less than 60 day period when run against real data trace.

Posted Content
TL;DR: In this article, the authors present a dataset of question-answering tasks based on real-world indoor images that establishes a visual turing challenge and evaluate different algorithms on this open task.
Abstract: As language and visual understanding by machines progresses rapidly, we are observing an increasing interest in holistic architectures that tightly interlink both modalities in a joint learning and inference process. This trend has allowed the community to progress towards more challenging and open tasks and refueled the hope at achieving the old AI dream of building machines that could pass a turing test in open domains. In order to steadily make progress towards this goal, we realize that quantifying performance becomes increasingly difficult. Therefore we ask how we can precisely define such challenges and how we can evaluate different algorithms on this open tasks? In this paper, we summarize and discuss such challenges as well as try to give answers where appropriate options are available in the literature. We exemplify some of the solutions on a recently presented dataset of question-answering task based on real-world indoor images that establishes a visual turing challenge. Finally, we argue despite the success of unique ground-truth annotation, we likely have to step away from carefully curated dataset and rather rely on 'social consensus' as the main driving force to create suitable benchmarks. Providing coverage in this inherently ambiguous output space is an emerging challenge that we face in order to make quantifiable progress in this area.

Posted Content
TL;DR: In this article, the Sure Thing Principle is derived in this setting and a characterization of generalized qualitative probability that includes and blends both traditional qualitative probability and the ranked structures used in logical approaches.
Abstract: Preferences among acts are analyzed in the style of L. Savage, but as partially ordered. The rationality postulates considered are weaker than Savage's on three counts. The Sure Thing Principle is derived in this setting. The postulates are shown to lead to a characterization of generalized qualitative probability that includes and blends both traditional qualitative probability and the ranked structures used in logical approaches.

Posted Content
TL;DR: In this article, a technique for order preference by similarity to ideal solution (TOPSIS) approach and extend the TOPSIS method to MCDM problem with single valued neutrosophic information is presented.
Abstract: The process of multiple criteria decision making (MCDM) is of determining the best choice among all of the probable alternatives. The problem of supplier selection on which decision maker has usually vague and imprecise knowledge is a typical example of multi criteria group decision-making problem. The conventional crisp techniques has not much effective for solving MCDM problems because of imprecise or fuzziness nature of the linguistic assessments. To find the exact values for MCDM problems is both difficult and impossible in more cases in real world. So, it is more reasonable to consider the values of alternatives according to the criteria as single valued neutrosophic sets (SVNS). This paper deal with the technique for order preference by similarity to ideal solution (TOPSIS) approach and extend the TOPSIS method to MCDM problem with single valued neutrosophic information. The value of each alternative and the weight of each criterion are characterized by single valued neutrosophic numbers. Here, the importance of criteria and alternatives is identified by aggregating individual opinions of decision makers (DMs) via single valued neutrosophic weighted averaging (IFWA) operator. The proposed method is, easy use, precise and practical for solving MCDM problem with single valued neutrosophic data. Finally, to show the applicability of the developed method, a numerical experiment for supplier choice is given as an application of single valued neutrosophic TOPSIS method at end of this paper.

Posted Content
TL;DR: In this article, the authors present an approach that works with a black box oracle for weights of assignments and requires only an NP-oracle (in practice, a SAT-solver) to solve both the counting and sampling problems.
Abstract: Given a CNF formula and a weight for each assignment of values to variables, two natural problems are weighted model counting and distribution-aware sampling of satisfying assignments. Both problems have a wide variety of important applications. Due to the inherent complexity of the exact versions of the problems, interest has focused on solving them approximately. Prior work in this area scaled only to small problems in practice, or failed to provide strong theoretical guarantees, or employed a computationally-expensive maximum a posteriori probability (MAP) oracle that assumes prior knowledge of a factored representation of the weight distribution. We present a novel approach that works with a black-box oracle for weights of assignments and requires only an {\NP}-oracle (in practice, a SAT-solver) to solve both the counting and sampling problems. Our approach works under mild assumptions on the distribution of weights of satisfying assignments, provides strong theoretical guarantees, and scales to problems involving several thousand variables. We also show that the assumptions can be significantly relaxed while improving computational efficiency if a factored representation of the weights is known.

Journal ArticleDOI
TL;DR: A survey of multi-objective methods for sequential decision-making problems with multiple objectives can be found in this article, where the authors identify three distinct scenarios in which converting such a problem to a singleobjective one is impossible, infeasible, or undesirable.
Abstract: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.

Journal ArticleDOI
TL;DR: This work provides an investigation in terms of different semantics proposed for abstract argumentation frameworks, a nonmonotonic yet simple formalism which received increasing interest within the last decade.
Abstract: Translations between different nonmonotonic formalisms always have been an important topic in the field, in particular to understand the knowledge-representation capabilities those formalisms offer. We provide such an investigation in terms of different semantics proposed for abstract argumentation frameworks, a nonmonotonic yet simple formalism which received increasing interest within the last decade. Although the properties of these different semantics are nowadays well understood, there are no explicit results about intertranslatability. We provide such translations wrt. different properties and also give a few novel complexity results which underlie some negative results.

Posted Content
TL;DR: In this article, a D numbers extended consistent fuzzy preference relation (D-CFPR) is proposed to overcome the weakness of CFPR in dealing with uncertain and incomplete information, which can be seen as an extension of the classical CFPR.
Abstract: How to express an expert's or a decision maker's preference for alternatives is an open issue. Consistent fuzzy preference relation (CFPR) is with big advantages to handle this problem due to it can be construed via a smaller number of pairwise comparisons and satisfies additive transitivity property. However, the CFPR is incapable of dealing with the cases involving uncertain and incomplete information. In this paper, a D numbers extended consistent fuzzy preference relation (D-CFPR) is proposed to overcome the weakness. The D-CFPR extends the classical CFPR by using a new model of expressing uncertain information called D numbers. The D-CFPR inherits the merits of classical CFPR and can be totally reduced to the classical CFPR. This study can be integrated into our previous study about D-AHP (D numbers extended AHP) model to provide a systematic solution for multi-criteria decision making (MCDM).

Journal ArticleDOI
TL;DR: A method for using standard techniques from satisfiability checking to automatically verify and discover theorems in an area of economic theory known as ranking sets of objects, which has important applications in social choice theory and decision making under uncertainty.
Abstract: We present a method for using standard techniques from satisfiability checking to automatically verify and discover theorems in an area of economic theory known as ranking sets of objects. The key question in this area, which has important applications in social choice theory and decision making under uncertainty, is how to extend an agents preferences over a number of objects to a preference relation over nonempty sets of such objects. Certain combinations of seemingly natural principles for this kind of preference extension can result in logical inconsistencies, which has led to a number of important impossibility theorems. We first prove a general result that shows that for a wide range of such principles, characterised by their syntactic form when expressed in a many-sorted first-order logic, any impossibility exhibited at a fixed (small) domain size will necessarily extend to the general case. We then show how to formulate candidates for impossibility theorems at a fixed domain size in propositional logic, which in turn enables us to automatically search for (general) impossibility theorems using a SAT solver. When applied to a space of 20 principles for preference extension familiar from the literature, this method yields a total of 84 impossibility theorems, including both known and nontrivial new results.

Posted Content
TL;DR: In this article, the authors present two alternative hybrid metaheuristic algorithms for the CluVRP, based on an Iterated Local Search algorithm, in which only feasible solutions are explored and problem-specific local search moves are utilized.
Abstract: The Clustered Vehicle Routing Problem (CluVRP) is a variant of the Capacitated Vehicle Routing Problem in which customers are grouped into clusters. Each cluster has to be visited once, and a vehicle entering a cluster cannot leave it until all customers have been visited. This article presents two alternative hybrid metaheuristic algorithms for the CluVRP. The first algorithm is based on an Iterated Local Search algorithm, in which only feasible solutions are explored and problem-specific local search moves are utilized. The second algorithm is a Hybrid Genetic Search, for which the shortest Hamiltonian path between each pair of vertices within each cluster should be precomputed. Using this information, a sequence of clusters can be used as a solution representation and large neighborhoods can be efficiently explored by means of bi-directional dynamic programming, sequence concatenations, by using appropriate data structures. Extensive computational experiments are performed on benchmark instances from the literature, as well as new large scale ones. Recommendations on promising algorithm choices are provided relatively to average cluster size.

Journal ArticleDOI
TL;DR: The paradigmatic results demonstrate that empowerment can be used as a suitable generic intrinsic motivation to not only generate actions in given static environments, but also to modify existing environmental conditions.
Abstract: One aspect of intelligence is the ability to restructure your own environment so that the world you live in becomes more beneficial to you. In this paper we investigate how the information-theoretic measure of agent empowerment can provide a task-independent, intrinsic motivation to restructure the world. We show how changes in embodiment and in the environment change the resulting behaviour of the agent and the artefacts left in the world. For this purpose, we introduce an approximation of the established empowerment formalism based on sparse sampling, which is simpler and significantly faster to compute for deterministic dynamics. Sparse sampling also introduces a degree of randomness into the decision making process, which turns out to beneficial for some cases. We then utilize the measure to generate agent behaviour for different agent embodiments in a Minecraft-inspired three dimensional block world. The paradigmatic results demonstrate that empowerment can be used as a suitable generic intrinsic motivation to not only generate actions in given static environments, as shown in the past, but also to modify existing environmental conditions. In doing so, the emerging strategies to modify an agent's environment turn out to be meaningful to the specific agent capabilities, i.e., de facto to its embodiment.

Posted Content
TL;DR: It is shown that tractable inference in probabilistic models with high treewidth and millions of variables can be explained with the notion of finite (partial) exchangeability.
Abstract: Exchangeability is a central notion in statistics and probability theory. The assumption that an infinite sequence of data points is exchangeable is at the core of Bayesian statistics. However, finite exchangeability as a statistical property that renders probabilistic inference tractable is less well-understood. We develop a theory of finite exchangeability and its relation to tractable probabilistic inference. The theory is complementary to that of independence and conditional independence. We show that tractable inference in probabilistic models with high treewidth and millions of variables can be understood using the notion of finite (partial) exchangeability. We also show that existing lifted inference algorithms implicitly utilize a combination of conditional independence and partial exchangeability.

Proceedings ArticleDOI
Guido Governatori1
TL;DR: A novel deontic logic contrary-to-duty/derived permission paradox based on the interaction of obligations, permissions and contrary- to-duty obligations is presented.
Abstract: In this paper we discuss some reasons why temporal logic might not be suitable to model real life norms. To show this, we present a novel deontic logic contrary-to-duty/derived permission paradox based on the interaction of obligations, permissions and contrary-to-duty obligations. The paradox is inspired by real life norms.

Posted Content
TL;DR: In this article, the authors argue that in some situations, it is better to ignore information if the uncertainty is represented by a set of probability measures rather than by a single piece of information.
Abstract: It is commonly-accepted wisdom that more information is better, and that information should never be ignored. Here we argue, using both a Bayesian and a non-Bayesian analysis, that in some situations you are better off ignoring information if your uncertainty is represented by a set of probability measures. These include situations in which the information is relevant for the prediction task at hand. In the non-Bayesian analysis, we show how ignoring information avoids dilation, the phenomenon that additional pieces of information sometimes lead to an increase in uncertainty. In the Bayesian analysis, we show that for small sample sizes and certain prediction tasks, the Bayesian posterior based on a noninformative prior yields worse predictions than simply ignoring the given information.

Posted Content
TL;DR: This work considers the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes, and shows that the NSPI(m) algorithm allows to make an overall trade-off between memory and performance.
Abstract: We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy Iteration (CPI), a natural adaptation of the Policy Search by Dynamic Programming algorithm to the infinite-horizon case (PSDP$_\infty$), and the recently proposed Non-Stationary Policy iteration (NSPI(m)). For all algorithms, we describe performance bounds, and make a comparison by paying a particular attention to the concentrability constants involved, the number of iterations and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API/API($\alpha$), but this comes at the cost of a relative---exponential in $\frac{1}{\epsilon}$---increase of the number of iterations. 2) PSDP$_\infty$ enjoys the best of both worlds: its performance guarantee is similar to that of CPI, but within a number of iterations similar to that of API. 3) Contrary to API that requires a constant memory, the memory needed by CPI and PSDP$_\infty$ is proportional to their number of iterations, which may be problematic when the discount factor $\gamma$ is close to 1 or the approximation error $\epsilon$ is close to $0$; we show that the NSPI(m) algorithm allows to make an overall trade-off between memory and performance. Simulations with these schemes confirm our analysis.