Topic

Trial and error

About: Trial and error is a research topic. Over its lifetime, 486 publications have been published within this topic, receiving 13,531 citations. The topic is also known as: generate and test & guess and check.


Papers
Journal ArticleDOI
TL;DR: Gives a proposed schema and detailed specifications for constructing a learning system by programming a computer, separating learning processes and problem-solving techniques from specific problem content in order to achieve generality.
Abstract: This paper reports on a proposed schema and gives some detailed specifications for constructing a learning system by means of programming a computer. We have tried to separate learning processes and problem-solving techniques from specific problem content in order to achieve generality, i.e., in order to achieve a system capable of performing in a wide variety of learning and problem-solving situations. Behavior of the system is determined by both a direct and an indirect means. The former involves detailed, explicit specification of responses or response patterns in the form of built-in programs. The indirect means is by programs representing three mechanisms: a “community unit” (a program-providing mechanism), a planning mechanism, and an induction mechanism. These mechanisms have in common the following features: (1) a directly given repertory of response patterns; (2) general and less explicitly specified decision making rules and hierarchically distributed authority for decision making; (3) an ability to delegate some control over the system's behavior to the environment; and (4) a self-modifying ability which allows the decision-making rules and the repertory of response patterns to adapt and grow.

In Part I of this paper, the community unit is described and an illustration of its operation is given. It is presented in a schematized framework as a team of routines connected by first and second-order feedback loops. The function of the community unit is to provide higher-level programs (its environment or customers) with programs capable of performing requested tasks, or to perform a customer-stipulated task by executing a program. If the community unit does not have a ready-made program in stock to fill a particular request, internal programming will be performed, i.e., the community unit will have to construct a program, and debug it, before outputting or executing it. The primary purpose of internal programming is to assist higher-level programs in performing tasks for which detailed preplanning by an external programmer is either impossible or impractical. Some heuristics are suggested for enabling the community unit to search for a usable sequence of operations more efficiently than if it were to search simply by exhaustive or random trial and error. These heuristics are of a step-by-step nature. For complex problems, however, such step-by-step heuristics alone will fail unless there is also a mechanism for analyzing problem structure and placing guideposts on the road to the goal.

A planning mechanism capable of doing this is proposed in Part II. Under the control of a higher-level program which specifies the level of detail required in a plan being developed, this planning mechanism is to break up problems into a hierarchy of subproblems each by itself presumably easier to solve than the original problem. To manage classes of problems and to make efficient use of past experience, an induction mechanism is proposed in Part II. An illustration is given of the induction mechanism solving a specific sequence of tasks. The system is currently being programmed and tested in IPL-V on the Philco 2000 computer. The current stage of the programming effort is reported in an epilogue to Part II.
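The abstract's central contrast is between finding a usable sequence of operations by exhaustive or random trial and error and guiding that search with step-by-step heuristics. The Python sketch below is a rough illustration of that contrast only, not the paper's community-unit machinery; the toy operation set, the abs(x - goal) distance heuristic, and the search bounds are hypothetical choices.

```python
# Illustrative only: blind random trial and error vs. a simple step-by-step
# heuristic when searching for a usable sequence of operations.
import heapq
import random

OPS = {"add3": lambda x: x + 3, "double": lambda x: x * 2, "dec": lambda x: x - 1}

def random_trial_and_error(start, goal, max_len=8, max_tries=100_000):
    """Blind trial and error: sample operation sequences until one reaches the goal."""
    for _ in range(max_tries):
        x, seq = start, []
        for _ in range(random.randint(1, max_len)):
            name = random.choice(list(OPS))
            x, seq = OPS[name](x), seq + [name]
            if x == goal:
                return seq
    return None

def heuristic_search(start, goal, max_expansions=10_000):
    """Step-by-step heuristic: always expand the state closest to the goal."""
    frontier = [(abs(start - goal), start, [])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, x, seq = heapq.heappop(frontier)
        if x == goal:
            return seq
        for name, op in OPS.items():
            nxt = op(x)
            if nxt not in seen and abs(nxt) < 10 * abs(goal) + 100:
                seen.add(nxt)
                heapq.heappush(frontier, (abs(nxt - goal), nxt, seq + [name]))
    return None

print(random_trial_and_error(2, 37))  # typically needs many thousands of samples
print(heuristic_search(2, 37))        # finds a sequence after a handful of expansions
```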

3,719 citations

Book ChapterDOI
01 Jun 1990
TL;DR: This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents results for two Dyna architectures, one of which (Dyna-Q) is based on Watkins's Q-learning, a new kind of reinforcement learning.
Abstract: This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper, I present and show results for two Dyna architectures. The Dyna-PI architecture is based on dynamic programming's policy iteration method and can be related to existing AI ideas such as evaluation functions and universal plans (reactive systems). Using a navigation task, results are shown for a simple Dyna-PI system that simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. The Dyna-Q architecture is based on Watkins's Q-learning, a new kind of reinforcement learning. Dyna-Q uses a less familiar set of data structures than does Dyna-PI, but is arguably simpler to implement and use. We show that Dyna-Q architectures are easy to adapt for use in changing environments.
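The abstract describes Dyna-Q as alternating direct trial-and-error (reinforcement) learning on the world with planning over a learned model of the world. A minimal tabular sketch of that loop, assuming a caller-supplied env_step(state, action) -> (reward, next_state, done) function and illustrative hyperparameters (these names and values are not from the paper), might look like:

```python
import random
from collections import defaultdict

def dyna_q(env_step, actions, episodes=50, n_planning=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, start_state=0):
    Q = defaultdict(float)        # Q[(state, action)] -> estimated return
    model = {}                    # model[(state, action)] -> (reward, next_state, done)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            r, s_next, done = env_step(s, a)                  # act in the real world
            # direct reinforcement learning: one-step Q-learning backup
            target = r + (0.0 if done else gamma * max(Q[(s_next, b)] for b in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            model[(s, a)] = (r, s_next, done)                 # learn the world model
            # planning: replay simulated experience drawn from the learned model
            for _ in range(n_planning):
                ps, pa = random.choice(list(model))
                pr, ps_next, pdone = model[(ps, pa)]
                ptarget = pr + (0.0 if pdone else gamma * max(Q[(ps_next, b)] for b in actions))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s_next
    return Q
```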

1,592 citations

Proceedings ArticleDOI
01 May 2017
TL;DR: This article proposes an actor-critic model whose policy is a function of the goal as well as the current state, allowing it to generalize across targets and scenes.
Abstract: Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to be applied to real-world scenarios. In this paper, we address these two issues and apply our model to target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects. Hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than the state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), (4) is end-to-end trainable and does not need feature engineering, feature matching between frames or 3D reconstruction of the environment.
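A schematic sketch of the key architectural idea, a policy that takes the goal as input alongside the current state, is shown below; the layer sizes, the shared embedding, and the use of precomputed feature vectors rather than raw images are assumptions for illustration, not the paper's exact network.

```python
import torch
import torch.nn as nn

class GoalConditionedActorCritic(nn.Module):
    """Actor-critic whose policy and value are functions of (state, goal)."""
    def __init__(self, state_dim, goal_dim, n_actions, hidden=256):
        super().__init__()
        # joint embedding of the current observation and the navigation target
        self.embed = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: state value

    def forward(self, state, goal):
        h = self.embed(torch.cat([state, goal], dim=-1))
        return self.policy_head(h), self.value_head(h)

# usage: sample an action for one (state, goal) pair of feature vectors
net = GoalConditionedActorCritic(state_dim=2048, goal_dim=2048, n_actions=4)
logits, value = net(torch.randn(1, 2048), torch.randn(1, 2048))
action = torch.distributions.Categorical(logits=logits).sample()
```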

1,394 citations

Journal ArticleDOI
TL;DR: In this article, two algorithms for behavior learning are described that combine Q-learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance.
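As a loosely hedged illustration of how Q-learning might be combined with Hamming-distance clustering (this does not reproduce the paper's algorithms), binary state vectors could be grouped into clusters that share a single Q-table row; the threshold and nearest-centre rule below are hypothetical.

```python
# Illustration only: aggregate binary states by Hamming distance so that
# similar states can share one row of a Q-table.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

class HammingStateAggregator:
    """Map each binary state vector to the nearest existing cluster centre,
    creating a new cluster when no centre lies within the threshold."""
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.centres = []

    def cluster_of(self, state):
        for i, c in enumerate(self.centres):
            if hamming(state, c) <= self.threshold:
                return i
        self.centres.append(tuple(state))
        return len(self.centres) - 1

agg = HammingStateAggregator(threshold=2)
print(agg.cluster_of((1, 0, 1, 1, 0)))   # -> 0 (new cluster)
print(agg.cluster_of((1, 0, 1, 0, 0)))   # -> 0 (within Hamming distance 2)
print(agg.cluster_of((0, 1, 0, 0, 1)))   # -> 1 (new cluster)
```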

632 citations

Book ChapterDOI
01 Aug 2005
TL;DR: A novel reinforcement learning technique based on natural stochastic policy gradients provides a general approach for improving DMPs by trial-and-error learning with respect to almost arbitrary optimization criteria; the different ingredients of the DMP approach are demonstrated in various examples.
Abstract: This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.
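A minimal one-dimensional rollout of the standard DMP formulation the abstract builds on (a damped point attractor plus a phase-dependent nonlinear forcing term) is sketched below; the gains, basis-function placement, and zero forcing weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def dmp_rollout(x0, g, weights, tau=1.0, dt=0.01, K=100.0, D=20.0, alpha_s=4.0):
    """Integrate a 1-D dynamic movement primitive from x0 toward goal g."""
    n_basis = len(weights)
    centers = np.exp(-alpha_s * np.linspace(0, 1, n_basis))   # centres spread in phase space
    widths = n_basis ** 1.5 / centers
    x, v, s = x0, 0.0, 1.0
    path = [x]
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (s - centers) ** 2)                 # Gaussian basis functions
        f = (psi @ weights) / (psi.sum() + 1e-10) * s * (g - x0)   # learnable forcing term
        dv = (K * (g - x) - D * v + f) / tau                       # transformation system
        dx = v / tau
        ds = -alpha_s * s / tau                                    # canonical (phase) system
        v, x, s = v + dv * dt, x + dx * dt, s + ds * dt
        path.append(x)
    return np.array(path)

# with zero weights the trajectory converges smoothly from x0 toward the goal g
traj = dmp_rollout(x0=0.0, g=1.0, weights=np.zeros(10))
print(traj[0], traj[-1])
```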

381 citations


Network Information

Related Topics (5)
Artificial neural network: 207K papers, 4.5M citations, 81% related
Software: 130.5K papers, 2M citations, 78% related
Fuzzy logic: 151.2K papers, 2.3M citations, 77% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Optimization problem: 96.4K papers, 2.1M citations, 77% related
Performance Metrics

No. of papers in the topic in previous years:
Year: Papers
2022: 1
2021: 34
2020: 34
2019: 37
2018: 20
2017: 20