
Showing papers on "Admissible heuristic" published in 2003


Proceedings Article
09 Aug 2003
TL;DR: This paper introduces and analyzes three new HS/DP algorithms, the last of which approximates the second by enforcing consistency of the value function over the 'likely' reachable states only; this leads to great time and memory savings, with little apparent loss in quality, when transitions have probabilities that differ greatly in value.
Abstract: Recent algorithms like RTDP and LAO* combine the strengths of Heuristic Search (HS) and Dynamic Programming (DP) methods by exploiting knowledge of the initial state and an admissible heuristic function to produce optimal policies without evaluating the entire state space. In this paper, we introduce and analyze three new HS/DP algorithms. The first is a general algorithm schema: a simple loop in which 'inconsistent' reachable states (i.e., states with residuals greater than a given ε) are found and updated until no such states remain; it serves to make explicit the basic idea underlying HS/DP algorithms, leaving other commitments aside. The second builds on the first and adds a labeling mechanism for detecting solved states based on Tarjan's strongly-connected-components procedure, and is very competitive with existing approaches. The third approximates the second by enforcing consistency of the value function over the 'likely' reachable states only, and leads to great time and memory savings, with little apparent loss in quality, when transitions have probabilities that differ greatly in value.
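The first algorithm schema described in the abstract (repeatedly find 'inconsistent' reachable states and update them until none remain) can be sketched roughly as follows. This is an illustrative reconstruction under assumed interfaces, not the paper's code: transition(s, a) is assumed to return (successor, probability) pairs, h is an admissible heuristic, and all names are hypothetical.

def find_and_revise(s0, h, actions, transition, cost, is_goal, epsilon=1e-3):
    """Sketch of the HS/DP schema described above: repeatedly find states
    reachable from s0 under the greedy policy whose Bellman residual exceeds
    epsilon, and update them, until no such states remain."""
    V = {}  # value function, lazily seeded with the admissible heuristic h

    def val(s):
        # Current value estimate, falling back to the heuristic h.
        return V.get(s, h(s))

    def q(s, a):
        # One-step lookahead (Q) value of action a in state s.
        return cost(s, a) + sum(p * val(s2) for s2, p in transition(s, a))

    while True:
        inconsistent, stack, seen = [], [s0], {s0}
        while stack:  # traverse states reachable from s0 under the greedy policy
            s = stack.pop()
            if is_goal(s):
                continue
            best_a = min(actions(s), key=lambda a: q(s, a))
            if abs(val(s) - q(s, best_a)) > epsilon:
                inconsistent.append(s)
            for s2, _ in transition(s, best_a):
                if s2 not in seen:
                    seen.add(s2)
                    stack.append(s2)
        if not inconsistent:  # value function is consistent over reachable states
            return V
        for s in inconsistent:  # Bellman update on the inconsistent states
            V[s] = min(q(s, a) for a in actions(s))

RTDP, LAO*, and the algorithms introduced in the paper can be viewed as different ways of organizing this find-and-revise loop; the labeled variant additionally marks states as solved using Tarjan's strongly-connected-components procedure.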

120 citations


01 Jan 2003
TL;DR: It is demonstrated that looking ahead deeper can actually decrease the chances of taking the optimal action as well as the overall solution quality (a phenomenon known as lookahead pathology), and that selecting the lookahead depth dynamically can greatly improve solution quality.
Abstract: Admissible and consistent heuristic functions are usually preferred in single-agent heuristic search as they guarantee optimal solutions with complete search methods such as A* and IDA*. Real-time problems, however, frequently make a complete search intractable due to space and/or time limitations. For instance, a path-planning agent in a real-time strategy game may need to take an action before its complete search has time to finish. In such cases, incomplete search techniques (such as RTA*, SRTA*, RTDP, DTA*) can be used. Such algorithms conduct a limited-ply lookahead and then evaluate the states envisioned using a heuristic function. The action selected on the basis of such evaluations can be suboptimal due to the incompleteness of the search and inaccuracies in the heuristic. It is usually believed that deeper lookahead increases the chances of taking the optimal action. We demonstrate that this is not necessarily the case and that selecting the lookahead depth dynamically can significantly improve performance.

1 Real-time Heuristic Search

The basic framework of real-time heuristic search is as follows. The agent traverses a state space by taking an action in each state. Its goal is to reach one of the predetermined "goal" states. The standard assumption is that the state space, the action set, and the set of goal states are fixed; thus, each problem instance can be described by the agent's initial state. Note that such a search framework can be easily extended to more general decision-making, for instance via Markov Decision Processes, as is usually done in the field of Reinforcement Learning [1]. Throughout the paper, we assume that the agent has a perfect domain model of the environment but cannot always tell which states are better than others. Thus, it is forced to use a heuristic estimate (henceforth heuristic function) of state quality or value. Often such a heuristic function is an estimate of the distance between the state in question and the closest goal state. Complete search methods such as A* [8] and IDA* [10] produce optimal solutions when based on an admissible heuristic function. Their primary drawbacks are the exponential running time and the necessity to wait until the search completes before the first action can be taken [11]. This limits the applicability of complete search in practice, as the deliberation time per action can be severely limited [9], the domain model can be expensive, and the goal states can be difficult to recognize [13]. Consequently, despite numerous advances in improving heuristic functions [12, 21], incomplete real-time/on-line search methods remain the practical choice for complex real-life problems. Various real-time search methods have been proposed, including RTA* [11], RTDP [1], SRTA*, and DTA* [22]. Such algorithms base their decisions on heuristic information collected from a partial tree expansion (lookahead) prior to reaching the goal state. Since the heuristic function is generally inaccurate and the search is incomplete, suboptimal solutions can be produced even with admissible and consistent heuristics. It is widely believed that looking ahead deeper improves the solution quality (e.g., [11]). Consequently, a considerable amount of effort has been put into increasing the lookahead depth by using selective search (search extensions) and hardware/software optimizations.
In this paper we demonstrate that looking ahead deeper can actually decrease the chances of taking the optimal action as well as the overall solution quality. This phenomenon is known as lookahead pathology. Additionally, we show that selecting the lookahead depth dynamically can greatly improve the solution quality.

2 Related Work & Our Novel Contributions

Lookahead pathologies within minimax search in two-player games have been investigated extensively in the past. In [15, 16, 2, 3, 4, 5, 17], the primary cause of pathologies was deemed to be the independence of the heuristic values of the leaf nodes; such games were called non-incremental. Large branching factors were also considered to contribute to a pathology. Later, [14] added non-inertness (i.e., a constant branching factor) to the list of suspects. More recent work considered single-agent heuristic search and demonstrated that pathologies are possible even with admissible and consistent heuristic functions [6]. In this paper we extend the previous efforts in the following ways: (i) several performance metrics (e.g., overall solution quality, total running time, etc.) are introduced for single-agent heuristic search and lookahead pathologies are shown for all of them; (ii) the demonstration is carried out in the standard testbed of the 8-puzzle, as opposed to more contrived and artificial environments; (iii) finally, we show that meta-level control of the search, via selecting the lookahead depth dynamically, has significant potential.

3 Lookahead Pathologies

Throughout the paper we assume the standard RTA* real-time heuristic search model [11], as it is a general and modular technique allowing for further extensions (e.g., SRTA* and DTA* [22] or RTDP [1]). Thus, we consider single-agent heuristic search in a discrete state domain with a finite number of deterministic actions. The states (set S) and actions (set A) form a directed graph with certain specified vertices representing the goal states. The edges (actions) are weighted with action costs: c : A → R. The agent is provided with a perfect domain model: δ : S × A → S. We define the true distance-to-goal function h*(s) as the sum of action costs along the shortest path from state s to the closest goal state. Generally speaking, the agent uses an approximation h to the unavailable h*. The approximation is typically inaccurate, in the sense that ∃s ∈ S [h*(s) ≠ h(s)]. For a fixed starting state s, the function g(s′) is defined as the sum of action costs along the shortest path from s to s′. Finally, the sum of h or h* and g is denoted by f or f*, respectively. It is easy to see that f* remains constant along any optimal path from a fixed state s to the closest goal. Also note that, for any state s′, action a1 is inferior to action a2 iff f*(δ(s′, a1)) > f*(δ(s′, a2)). Located in state s, the agent can use its perfect model δ to predict the states it will reach upon taking various sequences of actions. Two depth-2 lookahead trees are illustrated in Figure 1.
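The cost bookkeeping above can be restated compactly in display form; the following adds nothing beyond what the text already states.

\begin{align*}
  h^*(s)  &= \text{sum of action costs along the shortest path from } s \text{ to the closest goal state},\\
  g(s')   &= \text{sum of action costs along the shortest path from the fixed start } s \text{ to } s',\\
  f^*(s') &= g(s') + h^*(s') \quad \text{(constant along any optimal path from } s\text{)},\\
  \exists s &\in S \;\bigl[\, h^*(s) \neq h(s) \,\bigr] \quad \text{(the available heuristic } h \text{ is generally inexact)}.
\end{align*}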
RTA* defines the policy π(s, p) as follows: (i) consider s's immediate children {ci}; (ii) for each child ci, construct a lookahead search tree p plies deep by envisioning the terminal states of all action sequences of p actions (whenever possible); (iii) evaluate the leaf nodes of each lookahead tree rooted in ci using the f function and select the minimum-valued state, which becomes the lookahead-augmented value of ci; (iv) output the single action leading to the child ci with the minimum lookahead-augmented f-value (resolving ties randomly). Additionally, a hash table of all previously visited states is maintained. The f-value of a state x is drawn from the table whenever x has been previously visited. After each move, the hash-table value of the state s just left is updated with the second-best f-value among its children ci. This mechanism prevents infinite looping in RTA* under certain conditions.
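A rough sketch of one such decision step follows, assuming a deterministic model delta(s, a), a successor generator actions(s), nonnegative action costs cost(s, a), and a heuristic h(s). All of these names are illustrative assumptions, and the depth bookkeeping is one of several reasonable conventions rather than the paper's exact formulation.

import random

def rta_star_step(s, p, actions, delta, cost, h, table):
    """One RTA*-style decision with a depth-p lookahead, loosely following
    steps (i)-(iv) above. Assumes s is a non-goal state with at least one
    applicable action. Illustrative sketch only, not the authors' code."""

    def h_val(x):
        # Heuristic value, overridden by previously learned hash-table entries.
        return table.get(x, h(x))

    def lookahead(x, g, depth):
        # Minimum f = g + h over the leaves of a depth-limited tree below x.
        if depth == 0 or not actions(x):
            return g + h_val(x)
        return min(lookahead(delta(x, a), g + cost(x, a), depth - 1)
                   for a in actions(x))

    # Lookahead-augmented f-value of each immediate child of s.
    scored = sorted(((lookahead(delta(s, a), cost(s, a), p), a) for a in actions(s)),
                    key=lambda t: t[0])
    best_f = scored[0][0]

    # Store the second-best f-value for the state being left; this update is
    # what discourages RTA* from looping back to states it has just visited.
    table[s] = scored[1][0] if len(scored) > 1 else best_f

    # Resolve ties among equally good actions randomly.
    return random.choice([a for f, a in scored if f == best_f])

An agent loop would repeatedly call this step, apply the returned action via delta, and stop at a goal state; the pathology results above say that increasing p in such a loop does not always improve either the chosen action or the overall solution quality.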

20 citations


Dissertation
01 Jan 2003
TL;DR: This dissertation describes an admissible heuristic for AO* that enables it to prune large parts of the search space, and shows that policies with better performance on an independent test set are learned when the AO* method is regularized to reduce overfitting.
Abstract: In its simplest form, the process of diagnosis is a decision-making process in which the diagnostician performs a sequence of tests culminating in a diagnostic decision. For example, a physician might perform a series of simple measurements (body temperature, weight, etc.) and laboratory measurements (white blood count, CT scan, MRI scan, etc.) in order to determine the disease of the patient. A diagnostic policy is a complete description of the decision-making actions of a diagnostician under all possible circumstances. This dissertation studies the problem of learning diagnostic policies from training examples. An optimal diagnostic policy is one that minimizes the expected total cost of diagnosing a patient, where the cost is composed of two components: (a) measurement costs (the costs of performing various diagnostic tests) and (b) misdiagnosis costs (the costs incurred when the patient is incorrectly diagnosed). The optimal policy must perform diagnostic tests until further measurements do not reduce the expected total cost of diagnosis. The dissertation investigates two families of algorithms for learning diagnostic policies: greedy methods and methods based on the AO* algorithm for systematic search. Previous work in supervised learning constructed greedy diagnostic policies that either ignored all costs or considered only measurement costs or only misdiagnosis costs. This research recognizes the practical importance of costs incurred by performing measurements and by making incorrect diagnoses, and studies the tradeoff between them. This dissertation develops improved greedy methods. It also introduces a new family of learning algorithms based on systematic search. Systematic search has previously been regarded as computationally infeasible for learning diagnostic policies. However, this dissertation describes an admissible heuristic for AO* that enables it to prune large parts of the search space. In addition, the dissertation shows that policies with better performance on an independent test set are learned when the AO* method is regularized in order to reduce overfitting. Experimental studies on benchmark data sets show that in most cases the systematic search methods give better diagnostic policies than the greedy methods. Hence, these AO*-based methods are recommended for learning diagnostic policies that seek to minimize the expected total cost of diagnosis.
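To make the role of the admissible heuristic concrete: AO* can safely prune a partial policy whenever an optimistic lower bound on its remaining cost already exceeds the cost of the best complete policy found so far. The sketch below shows one simple admissible bound of this kind; it is an illustration under assumed interfaces and hypothetical names, not the heuristic developed in the dissertation.

def admissible_cost_to_go(class_probs, misdiagnosis_cost, remaining_test_costs):
    """A simple admissible lower bound on the expected total cost of diagnosis
    at a search node.

    class_probs: {class y: P(y | measurements made so far)}
    misdiagnosis_cost: misdiagnosis_cost[predicted][actual], all values >= 0
    remaining_test_costs: {test t: cost of performing t}, all values >= 0

    Any policy rooted at this node either (a) diagnoses immediately, paying at
    least the cheapest expected misdiagnosis cost, or (b) performs at least one
    more test, paying at least the cheapest remaining test cost (its eventual
    misdiagnosis cost is bounded below by 0). The minimum of the two therefore
    never overestimates the optimal cost-to-go, which is what AO* needs to
    prune safely. Illustrative only; not the dissertation's heuristic.
    """
    diagnose_now = min(
        sum(p * misdiagnosis_cost[d][y] for y, p in class_probs.items())
        for d in misdiagnosis_cost
    )
    if not remaining_test_costs:  # no tests left: the policy must diagnose now
        return diagnose_now
    cheapest_test = min(remaining_test_costs.values())
    return min(diagnose_now, cheapest_test)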

9 citations


01 Jan 2003
TL;DR: In this article, the authors investigate the feasibility of applying a guaranteed-accuracy heuristic learning algorithm to the traveling salesman problem, in which the tour configuration can be changed as the tour is built; a learning-threshold approach is introduced to allow solutions of known quality to be found.
Abstract: This research investigates the feasibility of applying a guaranteed-accuracy heuristic learning algorithm to the traveling salesman problem. The research focuses on tour construction. The advantage of tour construction heuristics is their conceptual simplicity. However, a tour construction heuristic often produces a sub-optimal solution, and the tour often needs to be improved after it is built by changing the tour configuration until a better solution is found. This paper describes the application of a real-time admissible heuristic learning algorithm that allows the tour configuration to change as the tour is built. A learning-threshold approach is also introduced to allow solutions of known quality to be found.

1. Introduction

In the traveling salesman problem (TSP), a salesman must complete a tour of a number of cities by visiting each city exactly once; the tour must begin and end with the same city, and the total distance travelled is to be minimized. Many industrial optimisation problems can be formulated as TSPs, including transportation and logistics applications as well as scheduling and timetabling problems. TSP is inherently intractable: it belongs to the class of NP-complete problems, for which no efficient algorithm is known that finds an optimal solution in polynomial time (Garey and Johnson, 1979). The computation time required to solve the n-city TSP using an exhaustive enumerative approach grows rapidly as the size of the problem increases. The commonly used approach to solving TSP is the composite approach (Lawler et al., 1985). In the composite approach, the tour is first constructed using a tour construction heuristic, and then a tour improvement heuristic is applied to obtain a shorter tour. The main feature of tour improvement heuristics is their ability to change the tour configuration using an edge-exchange process so that a shorter tour is found. In contrast, in tour construction heuristics, where the tour is built from scratch and a city is added one at a time until a complete tour is found, the part of the tour that has already been built remains unchanged throughout the construction process, and no attempt is made to change or undo it. Reinelt (1994) points out that, in general, constructing a tour using tour construction heuristics alone will not lead to an optimal solution. The tour often needs to be improved using tour improvement heuristics such as the 2-opt or 3-opt heuristics, which improve the tour by exchanging edges until a shorter tour is found. In addition, tour construction heuristics often rely on local knowledge to construct a tour: the selection and insertion criteria in various tour construction heuristics often rely on local distance information to determine which city is to be selected and added to the tour. This research is based on the following rationale: "If the tour configuration of the partially completed tour can be changed during the tour construction process, similar to the approach of tour improvement heuristics, then it is more likely to lead to an optimal solution. This way
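For reference, the composite approach described above (build a tour with a construction heuristic, then improve it by edge exchange) can be sketched as follows. This is a generic illustration of that baseline over an assumed distance matrix dist, with hypothetical function names; it is not the paper's heuristic learning algorithm.

def tour_length(tour, dist):
    # Total length of the closed tour (the last city connects back to the first).
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def nearest_neighbor_tour(dist, start=0):
    """Tour construction: repeatedly add the closest unvisited city."""
    n = len(dist)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: dist[last][c])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(tour, dist):
    """Tour improvement: 2-opt edge exchange until no exchange shortens the tour.
    Naive O(n^2) scan per pass, recomputing the length each time; fine for a sketch."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 2, len(tour) + 1):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(candidate, dist) < tour_length(tour, dist):
                    tour, improved = candidate, True
    return tour

A typical use would be two_opt(nearest_neighbor_tour(dist), dist). The approach investigated in the paper differs from this baseline in that it allows the configuration of the partially built tour to change during construction, guided by a real-time admissible heuristic learning algorithm with a learning threshold, rather than only improving the tour after it has been completed.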