
Showing papers in "Journal of Artificial Intelligence Research in 2011"


Journal ArticleDOI
TL;DR: In this article, the concept of adaptive submodularity is introduced, which generalizes submodular set functions to adaptive policies and provides performance guarantees for both stochastic maximization and coverage, and can be exploited to speed up the greedy algorithm by using lazy evaluations.
Abstract: Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.
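
The adaptive greedy algorithm analysed in the paper operates over policies, but the lazy-evaluation speedup it exploits is easiest to see in the ordinary, non-adaptive submodular setting. The sketch below is our own minimal illustration of lazy greedy selection on a toy coverage objective; it is not the paper's adaptive algorithm, and the `ground_set`/`f` interface is an assumption made for the example.

```python
import heapq

def lazy_greedy(ground_set, f, k):
    """Pick k elements approximately maximizing a monotone submodular f.

    Lazy evaluation: marginal gains can only shrink as the selected set
    grows, so instead of re-scoring every element each round we pop the
    best stale bound, refresh it, and push it back.
    """
    selected = []
    base = f(selected)
    # Max-heap of (negated marginal gain, element, round it was computed in).
    heap = [(-(f([e]) - base), e, 0) for e in ground_set]
    heapq.heapify(heap)
    for round_no in range(1, k + 1):
        while heap:
            neg_gain, e, stamp = heapq.heappop(heap)
            if stamp == round_no:        # bound is fresh, so it is the true max
                selected.append(e)
                break
            gain = f(selected + [e]) - f(selected)   # refresh the stale bound
            heapq.heappush(heap, (-gain, e, round_no))
    return selected

# Toy coverage objective: f(S) = number of distinct items covered by S.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 2, 3}}
f = lambda S: len(set().union(*[sets[e] for e in S])) if S else 0
print(lazy_greedy(list(sets), f, 2))     # -> ['d', 'c']
```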

570 citations


Journal ArticleDOI
TL;DR: It is shown that the Nash equilibria in security games are interchangeable, thus alleviating the equilibrium selection problem, and an extensive-form game model is proposed that makes the defender's uncertainty about the attacker's ability to observe explicit.
Abstract: There has been significant recent interest in game-theoretic approaches to security, with much of the recent research focused on utilizing the leader-follower Stackelberg game model. Among the major applications are the ARMOR program deployed at LAX Airport and the IRIS program in use by the US Federal Air Marshals (FAMS). The foundational assumption for using Stackelberg games is that security forces (leaders), acting first, commit to a randomized strategy; while their adversaries (followers) choose their best response after surveillance of this randomized strategy. Yet, in many situations, a leader may face uncertainty about the follower's surveillance capability. Previous work fails to address how a leader should compute her strategy given such uncertainty. We provide five contributions in the context of a general class of security games. First, we show that the Nash equilibria in security games are interchangeable, thus alleviating the equilibrium selection problem. Second, under a natural restriction on security games, any Stackelberg strategy is also a Nash equilibrium strategy; and furthermore, the solution is unique in a class of security games of which ARMOR is a key exemplar. Third, when faced with a follower that can attack multiple targets, many of these properties no longer hold. Fourth, we show experimentally that in most (but not all) games where the restriction does not hold, the Stackelberg strategy is still a Nash equilibrium strategy, but this is no longer true when the attacker can attack multiple targets. Finally, as a possible direction for future research, we propose an extensive-form game model that makes the defender's uncertainty about the attacker's ability to observe explicit.

263 citations


Journal ArticleDOI
TL;DR: The complexity of possible/necessary winner problems for the following common voting rules is completely characterized: a class of positional scoring rules (including Borda), Copeland, maximin, Bucklin, ranked pairs, voting trees, and plurality with runoff.
Abstract: Usually a voting rule requires agents to give their preferences as linear orders. However, in some cases it is impractical for an agent to give a linear order over all the alternatives. It has been suggested to let agents submit partial orders instead. Then, given a voting rule, a profile of partial orders, and an alternative (candidate) c, two important questions arise: first, is it still possible for c to win, and second, is c guaranteed to win? These are the possible winner and necessary winner problems, respectively. Each of these two problems is further divided into two sub-problems: determining whether c is a unique winner (that is, c is the only winner), or determining whether c is a co-winner (that is, c is in the set of winners). We consider the setting where the number of alternatives is unbounded and the votes are unweighted. We completely characterize the complexity of possible/necessary winner problems for the following common voting rules: a class of positional scoring rules (including Borda), Copeland, maximin, Bucklin, ranked pairs, voting trees, and plurality with runoff.
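
To make the possible/necessary winner definitions concrete, here is a small brute-force check for the Borda co-winner variant. It enumerates every completion of each partial vote, which is exponential in general; the paper's contribution is precisely to pin down when such questions can or cannot be answered efficiently. The profile below is invented toy data.

```python
from itertools import permutations

def consistent(order, partial):
    """order: tuple of candidates; partial: set of (a, b) pairs meaning a > b."""
    pos = {c: i for i, c in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in partial)

def borda_winners(profile, candidates):
    m = len(candidates)
    score = {c: 0 for c in candidates}
    for order in profile:
        for i, c in enumerate(order):
            score[c] += m - 1 - i
    best = max(score.values())
    return {c for c in candidates if score[c] == best}

def possible_and_necessary(partial_votes, candidates, c):
    """Brute force over all completions of the partial votes (tiny elections only)."""
    all_orders = list(permutations(candidates))
    completions = [[o for o in all_orders if consistent(o, pv)] for pv in partial_votes]
    possible, necessary = False, True
    def rec(i, profile):
        nonlocal possible, necessary
        if i == len(completions):
            wins = c in borda_winners(profile, candidates)
            possible |= wins
            necessary &= wins
            return
        for o in completions[i]:
            rec(i + 1, profile + [o])
    rec(0, [])
    return possible, necessary

cands = ["a", "b", "c"]
votes = [{("a", "b")}, {("b", "c"), ("a", "c")}]      # two partial orders
print(possible_and_necessary(votes, cands, "a"))      # -> (True, True) for this profile
```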

200 citations


Journal ArticleDOI
TL;DR: This work introduces MAPP, a tractable algorithm for multi-agent path planning on undirected graphs and presents a basic version and several extensions, which have low-polynomial worst-case upper bounds for the running time, the memory requirements, and the length of solutions.
Abstract: Multi-agent path planning is a challenging problem with numerous real-life applications. Running a centralized search such as A* in the combined state space of all units is complete and cost-optimal, but scales poorly, as the state space size is exponential in the number of mobile units. Traditional decentralized approaches, such as FAR and WHCA*, are faster and more scalable, being based on problem decomposition. However, such methods are incomplete and provide no guarantees with respect to the running time or the solution quality. They are not necessarily able to tell in a reasonable time whether they would succeed in finding a solution to a given instance. We introduce MAPP, a tractable algorithm for multi-agent path planning on undirected graphs. We present a basic version and several extensions. They have low-polynomial worst-case upper bounds for the running time, the memory requirements, and the length of solutions. Even though all algorithmic versions are incomplete in the general case, each provides formal guarantees on problems it can solve. For each version, we discuss the algorithm's completeness with respect to clearly defined subclasses of instances. Experiments were run on realistic game grid maps. MAPP solved 99.86% of all mobile units, which is 18-22% better than the success rates of FAR and WHCA*. MAPP marked 98.82% of all units as provably solvable during the first stage of plan computation. Parts of MAPP's computation can be re-used across instances on the same map. Speed-wise, MAPP is competitive with or significantly faster than WHCA*, depending on whether MAPP performs all computations from scratch. When data that MAPP can re-use are preprocessed offline and readily available, MAPP is slower than the very fast FAR algorithm by a factor of 2.18 on average. MAPP's solutions are on average 20% longer than FAR's solutions and 7-31% longer than WHCA*'s solutions.

170 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide a computationally feasible approximation to the AIXI agent by introducing a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm.
Abstract: This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a variety of stochastic and partially observable domains. We conclude by proposing a number of directions for future research.

169 citations


Journal ArticleDOI
TL;DR: This paper defines six novel nonconformity measures based on the k-Nearest Neighbours Regression algorithm and develops the corresponding CPs following both the original (transductive) and the inductive CP approaches.
Abstract: In this paper we apply Conformal Prediction (CP) to the k-Nearest Neighbours Regression (k-NNR) algorithm and propose ways of extending the typical nonconformity measure used for regression so far. Unlike traditional regression methods which produce point predictions, Conformal Predictors output predictive regions that satisfy a given confidence level. The regions produced by any Conformal Predictor are automatically valid; however, their tightness, and therefore their usefulness, depends on the nonconformity measure used by each CP. In effect, a nonconformity measure evaluates how strange a given example is compared to a set of other examples based on some traditional machine learning algorithm. We define six novel nonconformity measures based on the k-Nearest Neighbours Regression algorithm and develop the corresponding CPs following both the original (transductive) and the inductive CP approaches. A comparison of the predictive regions produced by our measures with those of the typical regression measure suggests that a major improvement in terms of predictive region tightness is achieved by the new measures.
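
As a rough illustration of the general machinery (not the paper's six new measures), the sketch below runs inductive conformal prediction for k-NN regression with the standard absolute-error nonconformity score |y - y_hat| on synthetic data; the data and parameter choices are assumptions made for the example.

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=3):
    """Plain k-NN regression: average the targets of the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

def icp_interval(x, X_proper, y_proper, X_calib, y_calib, epsilon=0.1, k=3):
    """Inductive CP interval at confidence 1 - epsilon with the standard
    nonconformity score |y - y_hat| (not the paper's k-NN-specific measures)."""
    alphas = np.array([abs(y - knn_predict(xc, X_proper, y_proper, k))
                       for xc, y in zip(X_calib, y_calib)])
    n = len(alphas)
    # Take the ceil((1 - epsilon) * (n + 1))-th smallest calibration score.
    q_index = int(np.ceil((1 - epsilon) * (n + 1))) - 1
    q = np.sort(alphas)[min(q_index, n - 1)]
    y_hat = knn_predict(x, X_proper, y_proper, k)
    return y_hat - q, y_hat + q

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)
lo, hi = icp_interval(np.array([5.0]), X[:100], y[:100], X[100:], y[100:])
print(f"90% prediction interval at x=5: [{lo:.2f}, {hi:.2f}]")
```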

143 citations


Journal ArticleDOI
TL;DR: In this paper, the authors model and study the case in which the attacker launches a multipronged (i.e., multimode) attack and prove that for various election systems even such concerted, flexible attacks can be perfectly planned in deterministic polynomial time.
Abstract: In 1992, Bartholdi, Tovey, and Trick opened the study of control attacks on elections -- attempts to improve the election outcome by such actions as adding/deleting candidates or voters. That work has led to many results on how algorithms can be used to find attacks on elections and how complexity-theoretic hardness results can be used as shields against attacks. However, all the work in this line has assumed that the attacker employs just a single type of attack. In this paper, we model and study the case in which the attacker launches a multipronged (i.e., multimode) attack. We do so to more realistically capture the richness of real-life settings. For example, an attacker might simultaneously try to suppress some voters, attract new voters into the election, and introduce a spoiler candidate. Our model provides a unified framework for such varied attacks. By constructing polynomial-time multiprong attack algorithms we prove that for various election systems even such concerted, flexible attacks can be perfectly planned in deterministic polynomial time.

134 citations


Journal ArticleDOI
TL;DR: This work presents a novel, probabilistic framework for modeling articulated objects as kinematic graphs, and demonstrates that this approach has a broad set of applications, in particular for the emerging fields of mobile manipulation and service robotics.
Abstract: Robots operating in domestic environments generally need to interact with articulated objects, such as doors, cabinets, dishwashers or fridges. In this work, we present a novel, probabilistic framework for modeling articulated objects as kinematic graphs. Vertices in this graph correspond to object parts, while edges between them model their kinematic relationship. In particular, we present a set of parametric and non-parametric edge models and how they can robustly be estimated from noisy pose observations. We furthermore describe how to estimate the kinematic structure and how to use the learned kinematic models for pose prediction and for robotic manipulation tasks. We finally present how the learned models can be generalized to new and previously unseen objects. In various experiments using real robots with different camera systems as well as in simulation, we show that our approach is valid, accurate and efficient. Further, we demonstrate that our approach has a broad set of applications, in particular for the emerging fields of mobile manipulation and service robotics.

129 citations


Journal ArticleDOI
TL;DR: A polynomial-time algorithm is presented for determining an optimal patrol under the Markovian strategy assumption for the robots, such that the probability of detecting the adversary at the patrol's weakest spot is maximized.
Abstract: The problem of adversarial multi-robot patrol has gained interest in recent years, mainly due to its immediate relevance to various security applications. In this problem, robots are required to repeatedly visit a target area in a way that maximizes their chances of detecting an adversary trying to penetrate through the patrol path. When facing a strong adversary that knows the patrol strategy of the robots, if the robots use a deterministic patrol algorithm, then in many cases it is easy for the adversary to penetrate undetected (in fact, in some of those cases the adversary can guarantee penetration). Therefore this paper presents a non-deterministic patrol framework for the robots. Assuming that the strong adversary will take advantage of its knowledge and try to penetrate through the patrol's weakest spot, an optimal algorithm is one that maximizes the chance of detection at that point. We therefore present a polynomial-time algorithm for determining an optimal patrol under the Markovian strategy assumption for the robots, such that the probability of detecting the adversary in the patrol's weakest spot is maximized. We build upon this framework and describe an optimal patrol strategy for several robotic models based on their movement abilities (directed or undirected) and sensing abilities (perfect or imperfect), and in different environment models - either patrol around a perimeter (closed polygon) or an open fence (open polyline).

116 citations


Journal ArticleDOI
TL;DR: An online, forward-search algorithm called the Posterior Belief Distribution (PBD) is presented, which leverages a novel method for calculating the posterior distribution over beliefs that result after a sequence of actions is taken, given the set of observation sequences that could be received during this process.
Abstract: Deciding how to act in partially observable environments remains an active area of research. Identifying good sequences of decisions is particularly challenging when good control performance requires planning multiple steps into the future in domains with many states. Towards addressing this challenge, we present an online, forward-search algorithm called the Posterior Belief Distribution (PBD). PBD leverages a novel method for calculating the posterior distribution over beliefs that result after a sequence of actions is taken, given the set of observation sequences that could be received during this process. This method allows us to efficiently evaluate the expected reward of a sequence of primitive actions, which we refer to as macro-actions. We present a formal analysis of our approach, and examine its performance on two very large simulation experiments: scientific exploration and a target monitoring domain. We also demonstrate our algorithm being used to control a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in real-world, large partially observable domains where a multistep lookahead is required to achieve good performance.

99 citations


Journal ArticleDOI
TL;DR: It is proved that, although the solver is not explicitly designed for it, with high probability it ends up behaving as width-k resolution after no more than O(n^{2k+2}) conflicts and restarts, where n is the number of variables.
Abstract: We offer a new understanding of some aspects of practical SAT-solvers that are based on DPLL with unit-clause propagation, clause-learning, and restarts. We do so by analyzing a concrete algorithm which we claim is faithful to what practical solvers do. In particular, before making any new decision or restart, the solver repeatedly applies the unit-resolution rule until saturation, and leaves no component to the mercy of non-determinism except for some internal randomness. We prove the perhaps surprising fact that, although the solver is not explicitly designed for it, with high probability it ends up behaving as width-k resolution after no more than O(n^{2k+2}) conflicts and restarts, where n is the number of variables. In other words, width-k resolution can be thought of as O(n^{2k+2}) restarts of the unit-resolution rule with learning.
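
For readers unfamiliar with the rule the analysis is built on, here is a minimal sketch of unit propagation (the unit-resolution rule applied to saturation) on a CNF formula given as DIMACS-style integer literals. It deliberately omits decisions, clause learning and restarts, which are the ingredients the paper actually analyses.

```python
def unit_propagate(clauses, assignment=None):
    """Apply the unit-resolution rule to saturation.

    clauses: list of lists of non-zero ints (positive literal = variable true).
    Returns (assignment dict, conflict flag)."""
    assignment = dict(assignment or {})
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:                 # every literal is false: conflict
                return assignment, True
            if len(unassigned) == 1:           # unit clause forces a value
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment, False

# (x1 or x2) and (not x1) and (not x2 or x3)
print(unit_propagate([[1, 2], [-1], [-2, 3]]))
# -> ({1: False, 2: True, 3: True}, False)
```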

Journal ArticleDOI
Toby Walsh1
TL;DR: Empirical studies are shown to be useful in improving the understanding of computational complexity in voting and manipulation, and two settings are considered which represent the two types of complexity results: manipulation with unweighted votes by a single agent, and manipulation with weighted votes by a coalition of agents.
Abstract: Voting is a simple mechanism to combine together the preferences of multiple agents. Unfortunately, agents may try to manipulate the result by mis-reporting their preferences. One barrier that might exist to such manipulation is computational complexity. In particular, it has been shown that it is NP-hard to compute how to manipulate a number of different voting rules. However, NP-hardness only bounds the worst-case complexity. Recent theoretical results suggest that manipulation may often be easy in practice. In this paper, we show that empirical studies are useful in improving our understanding of this issue. We consider two settings which represent the two types of complexity results that have been identified in this area: manipulation with unweighted votes by a single agent, and manipulation with weighted votes by a coalition of agents. In the first case, we consider Single Transferable Voting (STV), and in the second case, we consider veto voting. STV is one of the few voting rules used in practice where it is NP-hard to compute how a single agent can manipulate the result when votes are unweighted. It also appears to be one of the harder voting rules to manipulate since it involves multiple rounds. On the other hand, veto voting is one of the simplest representatives of voting rules where it is NP-hard to compute how a coalition of weighted agents can manipulate the result. In our experiments, we sample a number of distributions of votes including uniform, correlated and real world elections. In many of the elections in our experiments, it was easy to compute how to manipulate the result or to prove that manipulation was impossible. Even when we were able to identify a situation in which manipulation was hard to compute (e.g. when votes are highly correlated and the election is "hung"), we found that the computational difficulty of computing manipulations was somewhat precarious (e.g. with such "hung" elections, even a single uncorrelated voter was enough to make manipulation easy to compute).
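
The weighted-coalition veto setting can be made concrete with a tiny brute-force search: each weighted manipulator chooses whom to veto, and we test whether some joint choice makes the preferred candidate a co-winner. This is our own toy illustration; the exponential enumeration over coalition choices is exactly what the NP-hardness results say cannot be avoided in the worst case.

```python
from itertools import product

def veto_winners(vetoes, candidates):
    """Weighted veto: each vote vetoes one candidate with some weight;
    the candidates with the least total veto weight win (co-winners)."""
    score = {c: 0.0 for c in candidates}
    for cand, weight in vetoes:
        score[cand] += weight
    least = min(score.values())
    return {c for c in candidates if score[c] == least}

def coalition_can_make_win(fixed_vetoes, manip_weights, candidates, p):
    """Brute force over every way the weighted manipulators can cast their vetoes."""
    for choice in product(candidates, repeat=len(manip_weights)):
        vetoes = fixed_vetoes + list(zip(choice, manip_weights))
        if p in veto_winners(vetoes, candidates):
            return True, choice
    return False, None

cands = ["a", "b", "c"]
fixed = [("a", 3), ("b", 2), ("c", 2)]        # vetoes of the non-manipulators
print(coalition_can_make_win(fixed, [2, 2], cands, "a"))
# -> (True, ('b', 'c')): vetoing b and c leaves a with the fewest vetoes
```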

Journal ArticleDOI
TL;DR: A cluster-ranking approach to coreference resolution is proposed, which combines the strengths of the mention-ranking model and the entity-mention model, and is therefore theoretically more appealing than both of these models.
Abstract: Traditional learning-based coreference resolvers operate by training the mention-pair model for determining whether two mentions are coreferent or not. Though conceptually simple and easy to understand, the mention-pair model is linguistically rather unappealing and lags far behind the heuristic-based coreference models proposed in the prestatistical NLP era in terms of sophistication. Two independent lines of recent research have attempted to improve the mention-pair model, one by acquiring the mention-ranking model to rank preceding mentions for a given anaphor, and the other by training the entity-mention model to determine whether a preceding cluster is coreferent with a given mention. We propose a cluster-ranking approach to coreference resolution, which combines the strengths of the mention-ranking model and the entity-mention model, and is therefore theoretically more appealing than both of these models. In addition, we seek to improve cluster rankers via two extensions: (1) lexicalization and (2) incorporating knowledge of anaphoricity by jointly modeling anaphoricity determination and coreference resolution. Experimental results on the ACE data sets demonstrate the superior performance of cluster rankers to competing approaches as well as the effectiveness of our two extensions.

Journal ArticleDOI
TL;DR: This paper proposes the use of software agents, residing on the users' smart meters, to automate and optimise the charging cycle of micro-storage devices in the home to minimise its costs, and develops a game-theoretic framework within which to analyse the competitive equilibria of an electricity grid populated by such agents.
Abstract: In this paper, we present a novel decentralised management technique that allows electricity micro-storage devices, deployed within individual homes as part of a smart electricity grid, to converge to profitable and efficient behaviours. Specifically, we propose the use of software agents, residing on the users' smart meters, to automate and optimise the charging cycle of micro-storage devices in the home to minimise its costs, and we present a study of both the theoretical underpinnings and the practical implications of using software agents for such micro-storage management. First, by formalising the strategic choice each agent makes in deciding when to charge its battery, we develop a game-theoretic framework within which we can analyse the competitive equilibria of an electricity grid populated by such agents and hence predict the best consumption profile for that population given their battery properties and individual load profiles. Our framework also allows us to compute theoretical bounds on the amount of storage that will be adopted by the population. Second, to analyse the practical implications of micro-storage deployments in the grid, we present a novel algorithm that each agent can use to optimise its battery storage profile in order to minimise its owner's costs. This algorithm uses a learning strategy that allows it to adapt as the price of electricity changes in real-time, and we show that the adoption of these strategies results in the system converging to the theoretical equilibria. Finally, we empirically evaluate the adoption of our micro-storage management technique within a complex setting, based on the UK electricity market, where agents may have widely varying load profiles, battery types, and learning rates. In this case, our approach yields savings of up to 14% in energy cost for an average consumer using a storage device with a capacity of less than 4.5 kWh and up to a 7% reduction in carbon emissions resulting from electricity generation (with only domestic consumers adopting micro-storage, and commercial and industrial consumers not changing their demand). Moreover, corroborating our theoretical bound, an equilibrium is shown to exist where no more than 48% of households would wish to own storage devices and where social welfare would also be improved (yielding overall annual savings of nearly £1.5B).

Journal ArticleDOI
TL;DR: This study shows that GBF has several theoretical properties that enable MRE to automatically identify the most relevant target variables in forming its explanation and shows that CBF is able to capture well the explaining-away phenomenon that is often represented in Bayesian networks.
Abstract: A major inference task in Bayesian networks is explaining why some variables are observed in their particular states using a set of target variables. Existing methods for solving this problem often generate explanations that are either too simple (underspecified) or too complex (overspecified). In this paper, we introduce a method called Most Relevant Explanation (MRE) which finds a partial instantiation of the target variables that maximizes the generalized Bayes factor (GBF) as the best explanation for the given evidence. Our study shows that GBF has several theoretical properties that enable MRE to automatically identify the most relevant target variables in forming its explanation. In particular, conditional Bayes factor (CBF), defined as the GBF of a new explanation conditioned on an existing explanation, provides a soft measure on the degree of relevance of the variables in the new explanation in explaining the evidence given the existing explanation. As a result, MRE is able to automatically prune less relevant variables from its explanation. We also show that CBF is able to capture well the explaining-away phenomenon that is often represented in Bayesian networks. Moreover, we define two dominance relations between the candidate solutions and use the relations to generalize MRE to find a set of top explanations that is both diverse and representative. Case studies on several benchmark diagnostic Bayesian networks show that MRE is often able to find explanatory hypotheses that are not only precise but also concise.
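
A small worked example may help fix the definition of the generalized Bayes factor, GBF(x; e) = P(e | x) / P(e | not x), that MRE maximizes. The two-cause network below and its numbers are invented for illustration and evaluated by brute-force enumeration; in this toy case the single-variable explanation scores highest, mirroring the paper's point that MRE prunes less relevant variables.

```python
from itertools import product

# Toy two-cause network: burglary B and earthquake E are independent causes of alarm A.
P_B = {True: 0.05, False: 0.95}
P_E = {True: 0.02, False: 0.98}
P_A = {(True, True): 0.98, (True, False): 0.94,
       (False, True): 0.80, (False, False): 0.01}       # P(A=true | B, E)

def joint(b, e, a):
    pa = P_A[(b, e)]
    return P_B[b] * P_E[e] * (pa if a else 1 - pa)

def prob(event):
    """Marginal probability of a partial assignment over {'B', 'E', 'A'}."""
    total = 0.0
    for b, e, a in product([True, False], repeat=3):
        world = {"B": b, "E": e, "A": a}
        if all(world[k] == v for k, v in event.items()):
            total += joint(b, e, a)
    return total

def gbf(explanation, evidence):
    """Generalized Bayes factor GBF(x; e) = P(e | x) / P(e | not-x)."""
    p_ex = prob({**evidence, **explanation})
    p_x, p_e = prob(explanation), prob(evidence)
    return (p_ex / p_x) / ((p_e - p_ex) / (1 - p_x))

evidence = {"A": True}
for expl in [{"B": True}, {"E": True}, {"B": True, "E": True}]:
    print(expl, round(gbf(expl, evidence), 1))
# In this toy network {'B': True} alone obtains the highest GBF.
```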

Journal ArticleDOI
TL;DR: This paper investigates by how much a player can change his power, as measured by the Shapley-Shubik index or the Banzhaf index, by means of a false-name manipulation, i.e., splitting his weight among two or more identities.
Abstract: Weighted voting is a classic model of cooperation among agents in decision-making domains. In such games, each player has a weight, and a coalition of players wins the game if its total weight meets or exceeds a given quota. A player's power in such games is usually not directly proportional to his weight, and is measured by a power index, the most prominent among which are the Shapley-Shubik index and the Banzhaf index. In this paper, we investigate by how much a player can change his power, as measured by the Shapley-Shubik index or the Banzhaf index, by means of a false-name manipulation, i.e., splitting his weight among two or more identities. For both indices, we provide upper and lower bounds on the effect of weight-splitting. We then show that checking whether a beneficial split exists is NP-hard, and discuss efficient algorithms for restricted cases of this problem, as well as randomized algorithms for the general case. We also provide an experimental evaluation of these algorithms. Finally, we examine related forms of manipulative behavior, such as annexation, where a player subsumes other players, or merging, where several players unite into one. We characterize the computational complexity of such manipulations and provide limits on their effects. For the Banzhaf index, we describe a new paradox, which we term the Annexation Non-monotonicity Paradox.
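
The power indices involved are easy to compute by brute force on toy games. The sketch below computes the Shapley-Shubik index and shows one illustrative weight split (our own example, not one taken from the paper) in which the two false-name identities together hold more power than the original player did.

```python
from itertools import permutations

def shapley_shubik(weights, quota):
    """Shapley-Shubik index by brute force: the fraction of orderings in which
    a player is pivotal, i.e. its weight pushes the running total to the quota."""
    n = len(weights)
    pivots = [0] * n
    for order in permutations(range(n)):
        total = 0
        for player in order:
            if total < quota <= total + weights[player]:
                pivots[player] += 1
                break
            total += weights[player]
    count = sum(pivots)                 # = n! here, one pivot per ordering
    return [p / count for p in pivots]

# Game [quota 4; weights 2, 1, 1]: every player has index 1/3.
print(shapley_shubik([2, 1, 1], 4))
# Player 0 splits its weight 2 into two identities of weight 1:
# the clones now hold 1/4 + 1/4 = 1/2 > 1/3 between them.
print(shapley_shubik([1, 1, 1, 1], 4))
```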

Journal ArticleDOI
TL;DR: By summarizing structural reasons for analysis failure, TorchLight also provides diagnostic output indicating domain aspects that may cause local minima, and can distinguish "easy" domains from "hard" ones.
Abstract: The ignoring delete lists relaxation is of paramount importance for both satisficing and optimal planning. In earlier work, it was observed that the optimal relaxation heuristic h+ has amazing qualities in many classical planning benchmarks, in particular pertaining to the complete absence of local minima. The proofs of this are hand-made, raising the question whether such proofs can be lead automatically by domain analysis techniques. In contrast to earlier disappointing results - the analysis method has exponential runtime and succeeds only in two extremely simple benchmark domains - we herein answer this question in the afirmative. We establish connections between causal graph structure and h+ topology. This results in low-order polynomial time analysis methods, implemented in a tool we call TorchLight. Of the 12 domains where the absence of local minima has been proved, TorchLight gives strong success guarantees in 8 domains. Empirically, its analysis exhibits strong performance in a further 2 of these domains, plus in 4 more domains where local minima may exist but are rare. In this way, TorchLight can distinguish "easy" domains from "hard" ones. By summarizing structural reasons for analysis failure, TorchLight also provides diagnostic output indicating domain aspects that may cause local minima.

Journal ArticleDOI
TL;DR: In this article, the authors present a method for using standard techniques from satisfiability checking to automatically verify and discover theorems in an area of economic theory known as ranking sets of objects.
Abstract: We present a method for using standard techniques from satisfiability checking to automatically verify and discover theorems in an area of economic theory known as ranking sets of objects. The key question in this area, which has important applications in social choice theory and decision making under uncertainty, is how to extend an agent's preferences over a number of objects to a preference relation over nonempty sets of such objects. Certain combinations of seemingly natural principles for this kind of preference extension can result in logical inconsistencies, which has led to a number of important impossibility theorems. We first prove a general result that shows that for a wide range of such principles, characterised by their syntactic form when expressed in a many-sorted first-order logic, any impossibility exhibited at a fixed (small) domain size will necessarily extend to the general case. We then show how to formulate candidates for impossibility theorems at a fixed domain size in propositional logic, which in turn enables us to automatically search for (general) impossibility theorems using a SAT solver. When applied to a space of 20 principles for preference extension familiar from the literature, this method yields a total of 84 impossibility theorems, including both known and nontrivial new results.

Journal ArticleDOI
TL;DR: In this article, the authors investigate the computational complexity of Circumscription when knowledge bases are formulated in DL-lite_R, EL, and fragments thereof, and identify fragments whose complexity ranges from P to the second level of the polynomial hierarchy.
Abstract: Some of the applications of OWL and RDF (e.g. biomedical knowledge representation and semantic policy formulation) call for extensions of these languages with nonmonotonic constructs such as inheritance with overriding. Nonmonotonic description logics have been studied for many years; however, no practical such knowledge representation languages exist, due to a combination of semantic difficulties and high computational complexity. Independently, low-complexity description logics such as DL-lite and EL have been introduced and incorporated in the OWL standard. Therefore, it is interesting to see whether the syntactic restrictions characterizing DL-lite and EL bring computational benefits to their nonmonotonic versions, too. In this paper we extensively investigate the computational complexity of Circumscription when knowledge bases are formulated in DL-lite_R, EL, and fragments thereof. We identify fragments whose complexity ranges from P to the second level of the polynomial hierarchy, as well as fragments whose complexity rises to PSPACE and beyond.

Journal ArticleDOI
TL;DR: Surprisingly, FTVI also significantly outperforms popular 'heuristically-informed' MDP algorithms such as ILAO*, LRTDP, BRTDP and Bayesian-RTDP in many domains, sometimes by as much as two orders of magnitude.
Abstract: Value iteration is a powerful yet inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. In order to overcome this problem, many approaches have been proposed. Among them, ILAO* and variants of RTDP are state-of-the-art ones. These methods use reachability analysis and heuristic search to avoid some unnecessary backups. However, none of these approaches build the graphical structure of the state transitions in a pre-processing step or use the structural information to systematically decompose a problem, thereby generating an intelligent backup sequence of the state space. In this paper, we present two optimal MDP algorithms. The first algorithm, topological value iteration (TVI), detects the structure of MDPs and backs up states based on topological sequences. It (1) divides an MDP into strongly-connected components (SCCs), and (2) solves these components sequentially. TVI vastly outperforms VI and other state-of-the-art algorithms when an MDP has multiple, close-to-equal-sized SCCs. The second algorithm, focused topological value iteration (FTVI), is an extension of TVI. FTVI restricts its attention to connected components that are relevant for solving the MDP. Specifically, it uses a small amount of heuristic search to eliminate provably sub-optimal actions; this pruning allows FTVI to find smaller connected components, thus running faster. We demonstrate that FTVI outperforms TVI by an order of magnitude, averaged across several domains. Surprisingly, FTVI also significantly outperforms popular 'heuristically-informed' MDP algorithms such as ILAO*, LRTDP, BRTDP and Bayesian-RTDP in many domains, sometimes by as much as two orders of magnitude. Finally, we characterize the type of domains where FTVI excels -- suggesting a way to an informed choice of solver.
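
For reference, here is a minimal sketch of the plain value iteration baseline that TVI and FTVI improve upon; the comment notes where TVI's SCC decomposition changes the backup order. The toy MDP and its transition format are assumptions made for the example.

```python
def value_iteration(mdp, gamma=0.95, eps=1e-6):
    """Plain value iteration: back up every state in every sweep until convergence.
    (TVI, as described above, would first split the state graph into strongly
    connected components and solve them in reverse topological order, so each
    state is only backed up while its own component is being solved.)

    mdp: {state: {action: [(prob, next_state, reward), ...]}}
    """
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:                        # absorbing/terminal state
                continue
            best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                       for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Two-state chain: from s0 you can 'stay' (reward 0) or 'go' to s1 (reward 1).
toy = {
    "s0": {"stay": [(1.0, "s0", 0.0)], "go": [(1.0, "s1", 1.0)]},
    "s1": {},
}
print(value_iteration(toy))                        # V(s0) ~ 1.0, V(s1) = 0.0
```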

Journal ArticleDOI
TL;DR: This paper builds on a formal model that considers probability and certainty as two dimensions of trust and proposes a mechanism with which an agent can update the amount of trust it places in other agents on an ongoing basis.
Abstract: Leading agent-based trust models address two important needs. First, they show how an agent may estimate the trustworthiness of another agent based on prior interactions. Second, they show how agents may share their knowledge in order to cooperatively assess the trustworthiness of others. However, in real-life settings, information relevant to trust is usually obtained piecemeal, not all at once. Unfortunately, the problem of maintaining trust has drawn little attention. Existing approaches handle trust updates in a heuristic, not a principled, manner. This paper builds on a formal model that considers probability and certainty as two dimensions of trust. It proposes a mechanism with which an agent can update the amount of trust it places in other agents on an ongoing basis. This paper shows via simulation that the proposed approach (a) provides accurate estimates of the trustworthiness of agents that change behavior frequently; and (b) captures the dynamic behavior of the agents. This paper includes an evaluation based on a real dataset drawn from Amazon Marketplace, a leading e-commerce site.

Journal ArticleDOI
TL;DR: The Lemma-Lifting approach is presented, which combines an SMT solver with an external propositional core extractor and results in an unsatisfiable core for the original SMT problem, once the remaining theory lemmas are removed.
Abstract: The problem of finding small unsatisfiable cores for SAT formulas has recently received a lot of interest, mostly for its applications in formal verification. However, propositional logic is often not expressive enough for representing many interesting verification problems, which can be more naturally addressed in the framework of Satisfiability Modulo Theories, SMT. Surprisingly, the problem of finding unsatisfiable cores in SMT has received very little attention in the literature. In this paper we present a novel approach to this problem, called the Lemma-Lifting approach. The main idea is to combine an SMT solver with an external propositional core extractor. The SMT solver produces the theory lemmas found during the search, dynamically lifting the suitable amount of theory information to the Boolean level. The core extractor is then called on the Boolean abstraction of the original SMT problem and of the theory lemmas. This results in an unsatisfiable core for the original SMT problem, once the remaining theory lemmas are removed. The approach is conceptually interesting, and has several advantages in practice. In fact, it is extremely simple to implement and to update, and it can be interfaced with every propositional core extractor in a plug-and-play manner, so as to benefit for free from all unsat-core reduction techniques which have been or will be made available. We have evaluated our algorithm with a very extensive empirical test on SMT-LIB benchmarks, which confirms the validity and potential of this approach.

Journal ArticleDOI
TL;DR: Drake is designed to leverage the low latency made possible by a preprocessing step called compilation, while avoiding high memory costs through a compact representation, and to concisely record the implications of the discrete choices, exploiting the structure of the plan to avoid redundant reasoning or storage.
Abstract: This work presents Drake, a dynamic executive for temporal plans with choice. Dynamic plan execution strategies allow an autonomous agent to react quickly to unfolding events, improving the robustness of the agent. Prior work developed methods for dynamically dispatching Simple Temporal Networks, and further research enriched the expressiveness of the plans executives could handle, including discrete choices, which are the focus of this work. However, in some approaches to date, these additional choices induce significant storage or latency requirements to make flexible execution possible. Drake is designed to leverage the low latency made possible by a preprocessing step called compilation, while avoiding high memory costs through a compact representation. We leverage the concepts of labels and environments, taken from prior work in Assumption-based Truth Maintenance Systems (ATMS), to concisely record the implications of the discrete choices, exploiting the structure of the plan to avoid redundant reasoning or storage. Our labeling and maintenance scheme, called the Labeled Value Set Maintenance System, is distinguished by its focus on properties fundamental to temporal problems, and, more generally, weighted graph algorithms. In particular, the maintenance system focuses on maintaining a minimal representation of non-dominated constraints. We benchmark Drake's performance on random structured problems, and find that Drake reduces the size of the compiled representation by a factor of over 500 for large problems, while incurring only a modest increase in run-time latency, compared to prior work in compiled executives for temporal plans with discrete choices.

Journal ArticleDOI
TL;DR: This work formalizes what it means for a cloning manipulation to be successful, characterizes the preference profiles for which a successful cloning manipulation exists, and considers the model where there is a cost associated with producing each clone.
Abstract: We consider the problem of manipulating elections by cloning candidates. In our model, a manipulator can replace each candidate c by several clones, i.e., new candidates that are so similar to c that each voter simply replaces c in his vote with a block of these new candidates, ranked consecutively. The outcome of the resulting election may then depend on the number of clones as well as on how each voter orders the clones within the block. We formalize what it means for a cloning manipulation to be successful (which turns out to be a surprisingly delicate issue), and, for a number of common voting rules, characterize the preference profiles for which a successful cloning manipulation exists. We also consider the model where there is a cost associated with producing each clone, and study the complexity of finding a minimum-cost cloning manipulation. Finally, we compare cloning with two related problems: the problem of control by adding candidates and the problem of possible (co)winners when new alternatives can join.

Journal ArticleDOI
TL;DR: In this paper, the authors provide an investigation in terms of different semantics proposed for abstract argumentation frameworks, a non-monotonic yet simple formalism which received increasing interest within the last decade.
Abstract: Translations between different nonmonotonic formalisms always have been an important topic in the field, in particular to understand the knowledge-representation capabilities those formalisms offer. We provide such an investigation in terms of different semantics proposed for abstract argumentation frameworks, a nonmonotonic yet simple formalism which received increasing interest within the last decade. Although the properties of these different semantics are nowadays well understood, there are no explicit results about intertranslatability. We provide such translations wrt. different properties and also give a few novel complexity results which underlie some negative results.

Journal ArticleDOI
TL;DR: Monte-Carlo Restricted Nash Response (MCRNR) is presented, a sample-based algorithm for the computation of restricted Nash strategies, which are robust best-response strategies that exploit non-NE opponents more than playing a NE and are not (overly) exploitable by other strategies.
Abstract: This article discusses two contributions to decision-making in complex partially observable stochastic games. First, we apply two state-of-the-art search techniques that use Monte-Carlo sampling to the task of approximating a Nash-Equilibrium (NE) in such games, namely Monte-Carlo Tree Search (MCTS) and Monte-Carlo Counterfactual Regret Minimization (MCCFR). MCTS has been proven to approximate a NE in perfect-information games. We show that the algorithm quickly finds a reasonably strong strategy (but not a NE) in a complex imperfect information game, i.e. Poker. MCCFR on the other hand has theoretical NE convergence guarantees in such a game. We apply MCCFR for the first time in Poker. Based on our experiments, we may conclude that MCTS is a valid approach if one wants to learn reasonably strong strategies fast, whereas MCCFR is the better choice if the quality of the strategy is most important. Our second contribution relates to the observation that a NE is not a best response against players that are not playing a NE. We present Monte-Carlo Restricted Nash Response (MCRNR), a sample-based algorithm for the computation of restricted Nash strategies. These are robust best-response strategies that (1) exploit non-NE opponents more than playing a NE and (2) are not (overly) exploitable by other strategies. We combine the advantages of two state-of-the-art algorithms, i.e. MCCFR and Restricted Nash Response (RNR). MCRNR samples only relevant parts of the game tree. We show that MCRNR learns quicker than standard RNR in smaller games. Also we show in Poker that MCRNR learns robust best-response strategies fast, and that these strategies exploit opponents more than playing a NE does.
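
MCCFR is built on regret minimization at every information set; the core update, regret matching, is easy to show in isolation. Below is a standard self-play regret-matching sketch for rock-paper-scissors (not the paper's MCCFR or MCRNR); its average strategy approaches the game's Nash equilibrium of (1/3, 1/3, 1/3).

```python
import random

def regret_matching(regrets):
    """Play in proportion to positive cumulative regret (uniform if none is positive)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n

def rps_payoff(a, b):
    """Row player's payoff in rock-paper-scissors (0=rock, 1=paper, 2=scissors)."""
    return [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]

def self_play(iterations=20000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [regret_matching(r) for r in regrets]
        actions = [rng.choices(range(3), weights=s)[0] for s in strats]
        for p in range(2):
            sign = 1 if p == 0 else -1
            played = sign * rps_payoff(actions[0], actions[1])
            for a in range(3):
                alt = (sign * rps_payoff(a, actions[1]) if p == 0
                       else sign * rps_payoff(actions[0], a))
                regrets[p][a] += alt - played      # regret for not having played a
            for a in range(3):
                strategy_sums[p][a] += strats[p][a]
    # The *average* strategy is what converges toward the equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]

print(self_play())    # both players' averages are close to [1/3, 1/3, 1/3]
```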

Journal ArticleDOI
TL;DR: A novel solution for reducing node evaluations in heuristic planning based on machine learning is presented, and it is revealed that the heuristic planner using the learned classifiers solves larger problems than state-of-the-art planners.
Abstract: Current evaluation functions for heuristic planning are expensive to compute. In numerous planning problems these functions provide good guidance to the solution, so they are worth the expense. However, when evaluation functions are misguiding or when planning problems are large enough, lots of node evaluations must be computed, which severely limits the scalability of heuristic planners. In this paper, we present a novel solution for reducing node evaluations in heuristic planning based on machine learning. Particularly, we define the task of learning search control for heuristic planning as a relational classification task, and we use an off-the-shelf relational classification tool to address this learning task. Our relational classification task captures the preferred action to select in the different planning contexts of a specific planning domain. These planning contexts are defined by the set of helpful actions of the current state, the goals remaining to be achieved, and the static predicates of the planning task. This paper shows two methods for guiding the search of a heuristic planner with the learned classifiers. The first one consists of using the resulting classifier as an action policy. The second one consists of applying the classifier to generate lookahead states within a Best First Search algorithm. Experiments over a variety of domains reveal that our heuristic planner using the learned classifiers solves larger problems than state-of-the-art planners.

Journal ArticleDOI
TL;DR: A new diagnostic framework employing four new techniques is proposed, which scales to much larger systems with good performance in terms of diagnostic cost and includes a novel cost estimation function that can be used to choose an abstraction of the system that is more likely to provide optimal average cost.
Abstract: When a system behaves abnormally, sequential diagnosis takes a sequence of measurements of the system until the faults causing the abnormality are identified, and the goal is to reduce the diagnostic cost, defined here as the number of measurements. To propose measurement points, previous work employs a heuristic based on reducing the entropy over a computed set of diagnoses. This approach generally has good performance in terms of diagnostic cost, but can fail to diagnose large systems when the set of diagnoses is too large. Focusing on a smaller set of probable diagnoses scales the approach but generally leads to increased average diagnostic costs. In this paper, we propose a new diagnostic framework employing four new techniques, which scales to much larger systems with good performance in terms of diagnostic cost. First, we propose a new heuristic for measurement point selection that can be computed efficiently, without requiring the set of diagnoses, once the system is modeled as a Bayesian network and compiled into a logical form known as d-DNNF. Second, we extend hierarchical diagnosis, a technique based on system abstraction from our previous work, to handle probabilities so that it can be applied to sequential diagnosis to allow larger systems to be diagnosed. Third, for the largest systems where even hierarchical diagnosis fails, we propose a novel method that converts the system into one that has a smaller abstraction and whose diagnoses form a superset of those of the original system; the new system can then be diagnosed and the result mapped back to the original system. Finally, we propose a novel cost estimation function which can be used to choose an abstraction of the system that is more likely to provide optimal average cost. Experiments with ISCAS-85 benchmark circuits indicate that our approach scales to all circuits in the suite except one that has a flat structure not susceptible to useful abstraction.

Journal ArticleDOI
TL;DR: A more appropriate notion of basic contraction, influenced by the convexity property holding for full propositional logic and referred to as infra contraction, is defined for the Horn case, and it is shown that the construction method for Horn contraction for belief sets based on infra remainder sets corresponds exactly to Hansson's classical kernel contraction for belief sets, when restricted to Horn logic.
Abstract: Standard belief change assumes an underlying logic containing full classical propositional logic. However, there are good reasons for considering belief change in less expressive logics as well. In this paper we build on recent investigations by Delgrande on contraction for Horn logic. We show that the standard basic form of contraction, partial meet, is too strong in the Horn case. This result stands in contrast to Delgrande's conjecture that orderly maxichoice is the appropriate form of contraction for Horn logic. We then define a more appropriate notion of basic contraction for the Horn case, influenced by the convexity property holding for full propositional logic and which we refer to as infra contraction. The main contribution of this work is a result which shows that the construction method for Horn contraction for belief sets based on our infra remainder sets corresponds exactly to Hansson's classical kernel contraction for belief sets, when restricted to Horn logic. This result is obtained via a detour through contraction for belief bases. We prove that kernel contraction for belief bases produces precisely the same results as the belief base version of infra contraction. The use of belief bases to obtain this result provides evidence for the conjecture that Horn belief change is best viewed as a `hybrid' version of belief set change and belief base change. One of the consequences of the link with base contraction is the provision of a representation result for Horn contraction for belief sets in which a version of the Core-retainment postulate features.

Journal ArticleDOI
TL;DR: This paper summarizes recent work reported at ICAPS on applying artificial intelligence techniques to the control of production printing equipment and suggests that this general architecture may prove useful as more intelligent systems operate in continual, online settings.
Abstract: We present a case study of artificial intelligence techniques applied to the control of production printing equipment. Like many other real-world applications, this complex domain requires high-speed autonomous decision-making and robust continual operation. To our knowledge, this work represents the first successful industrial application of embedded domain-independent temporal planning. Our system handles execution failures and multi-objective preferences. At its heart is an on-line algorithm that combines techniques from state-space planning and partial-order scheduling. We suggest that this general architecture may prove useful in other applications as more intelligent systems operate in continual, on-line settings. Our system has been used to drive several commercial prototypes and has enabled a new product architecture for our industrial partner. When compared with state-of-the-art off-line planners, our system is hundreds of times faster and often finds better plans. Our experience demonstrates that domain-independent AI planning based on heuristic search can flexibly handle time, resources, replanning, and multiple objectives in a high-speed practical application without requiring hand-coded control knowledge.