
Showing papers on "Control reconfiguration" published in 1982


Journal ArticleDOI
TL;DR: This paper discusses elections and reorganizations of active nodes in a distributed computing system after a failure, and two types of reasonable failure environments are studied.
Abstract: After a failure occurs in a distributed computing system, it is often necessary to reorganize the active nodes so that they can continue to perform a useful task. The first step in such a reorganization or reconfiguration is to elect a coordinator node to manage the operation. This paper discusses such elections and reorganizations. Two types of reasonable failure environments are studied. For each environment assertions which define the meaning of an election are presented. An election algorithm which satisfies the assertions is presented for each environment.

647 citations


PatentDOI
TL;DR: In this article, the authors monitor the control, in particular, software crashes in a multiprocessor machine control system to prevent machine malfunctions and monitor key operations such as the number of tasks to be completed by the control.

54 citations


Patent
18 Oct 1982
TL;DR: In this article, a reconfiguration of storage elements in order to permit the removal of one or more of the elements for servicing or other reasons is described, where the storage element that is to be taken off line contains material that is crucial to the continued operation of the system.
Abstract: This is a system which is used to perform reconfiguration of storage elements in order to permit removal of one or more of the elements for servicing or other reasons. If a storage element that is to be taken off line contains material that is crucial to the continued operation of the system, that material is copied to appropriate areas in other storage elements. After all crucial material has been copied to alternate locations, the original storage element can be taken off line for servicing or other purposes.

47 citations


Patent
03 Jun 1982
TL;DR: In this paper, a system for synchronizing remote or colocated oscillators is constructed with redundancy enabling a high degree of fault toleration, where each oscillator is part of a terminal which includes reconfiguration circuitry and each, in the absence of faults, is connected to each of the other terminals.
Abstract: A system for synchronizing remote or colocated oscillators is constructed with redundancy enabling a high degree of fault toleration. In the various embodiments of the system there is no predetermined hierarchy amongst the oscillators, in contrast to the prior art master and slave type of synchronization. Averaging is not involved and the system settles to a state in which the oscillators are locked to one master, which is not predetermined. Each oscillator is part of a terminal which includes reconfiguration circuitry, and each, in the absence of faults, is connected to each of the other terminals. The terminals attempt to lock to another oscillator and reconfigure in accordance with an algorithm until stable synchronization is achieved. In one embodiment the properties of a phase-locked loop are utilized in a novel way by incorporating a variable delay in its feedback path. Using this circuitry enables one embodiment to achieve synchronization even when only indirect paths exist to the assumed master, and in the system temporary disconnection for reconfiguration is not necessary.

22 citations


Journal ArticleDOI
Clarke, Nikolaou
TL;DR: This paper investigates strategies for dynamically reconfiguring shared memory multiprocessor systems that are subject to common memory faults and unpredictable processor deaths and presents a general distributed algorithm which enables the processors in such a system to exchange the local information needed to reach a consensus on system reconfiguration.
Abstract: In this paper, we investigate strategies for dynamically reconfiguring shared memory multiprocessor systems that are subject to common memory faults and unpredictable processor deaths. These strategies aim at determining a communication page, i.e., a page of common memory that can be used by a group of processors for storing crucial common resources such as global locks for synchronization and global data structures for voting algorithms. To ensure system reliability, the reconfiguration strategies must be distributed so that each processor independently arrives at exactly the same choice. This type of reconfiguration strategy is currently used in the STAGE operating system on the PLURIBUS multiprocessor [5]. We analyze the weak points of the PLURIBUS algorithm and examine alternative strategies satisfying optimization criteria such as maximization of the number of processors and the number of common memory pages in the reconfigured system. We also present a general distributed algorithm which enables the processors in such a system to exchange the local information that is needed to reach a consensus on system reconfiguration.

15 citations


Proceedings ArticleDOI
07 Jun 1982
TL;DR: Concurrent reconfiguration techniques that perform fast reconfigurations of a multicomputer network into the following network structures: K-rooted trees, stars, and rings with selectable periods are introduced.
Abstract: This paper introduces concurrent reconfiguration techniques that perform fast reconfiguration of a multicomputer network into the following network structures: K-rooted trees, stars, and rings with selectable periods. These structures prove to be very efficient for high-speed, real-time applications. The techniques introduced are based on shift register theory and are performed by special shift registers, called shift registers with variable bias, residing in each network node. The techniques discussed in this paper are implemented in the system with dynamic architecture that is now under construction by Dynamic Computer Architecture, Inc. The time of the network reconfiguration equals that of one clock period, since to perform reconfiguration into a new network structure, each network node should execute only two logical operations: a one-bit shift and a mod 2 addition.
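
As a rough illustration of the per-node step named in the abstract (a one-bit shift followed by a mod 2 addition), the sketch below computes a successor address for each node and follows the links to list a ring. The circular left shift, the 3-bit address width, the names `successor` and `ring_from`, and the reading of the bias as an XOR constant are all assumptions made for illustration; they are not details taken from the paper.

```python
def successor(addr: int, bias: int, width: int) -> int:
    """One-bit circular left shift of addr, then bitwise mod 2 addition (XOR) with bias.

    Hypothetical model of a node's two logical operations; addr must fit in `width` bits.
    """
    mask = (1 << width) - 1
    shifted = ((addr << 1) | (addr >> (width - 1))) & mask  # circular shift by one bit
    return shifted ^ bias                                   # mod 2 addition of the bias


def ring_from(start: int, bias: int, width: int) -> list[int]:
    """Follow successor links from `start` until the walk returns to `start`.

    The shift-then-XOR map is a bijection on width-bit addresses,
    so every walk closes into a cycle.
    """
    ring = [start]
    node = successor(start, bias, width)
    while node != start:
        ring.append(node)
        node = successor(node, bias, width)
    return ring


print(ring_from(0, bias=1, width=3))  # ring containing node 0 under bias 1
print(ring_from(1, bias=0, width=3))  # ring containing node 1 under bias 0
```

Under these assumptions, changing only the bias changes the period of the ring a node belongs to, which is one way to read "rings with selectable periods".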

14 citations


Journal ArticleDOI
TL;DR: The predominant existing software specification and implementation techniques for sequential control are not adequate for the creation of correct software of the complexity required for redundant systems.
Abstract: Redundant control systems require more than a single redundant construct to serve the six basic functions of fault tolerance: test, detection, diagnosis, masking, reconfiguration, and recovery. Software usually constitutes or supports one or more such constructs. Additionally, software must be correct, since it is seldom, if ever, protected by redundancy. A redundant sequential control system requires intricate software constructs. The predominant existing software specification and implementation techniques for sequential control are not adequate for the creation of correct software of the complexity required for redundant systems. This complexity is illustrated by an example.

11 citations


Journal ArticleDOI
TL;DR: The use of fault tolerance for the computer control to provide a level of reliability previously unachievable with standard computer control systems is described.
Abstract: This paper discusses the operation of a chemical reactor and particularly the need for highly reliable instrumentation and control for critical processes. It describes the use of fault tolerance for the computer control to provide a level of reliability previously unachievable with standard computer control systems. The provision of highly reliable interface equipment to the reactor itself is described and approaches are presented for solving the problem of faults in the sensors and actuators. The paper discusses a specific example chemical reactor and the benefits that are achievable using a fault tolerant control computer system.

10 citations


01 Jan 1982
TL;DR: The authors show how the proposed system lends itself to an organic and well-structured organization of fault-treatment, and in its framework, several reconfiguration strategies of varying complexity are examined.
Abstract: Distributed systems are thought to be one of the best choices for the design of fault-tolerant systems. In this paper, the authors refer to a particular multimicroprocessor system, whose design embodies features intended to enhance fault-tolerance. A general structure for fault-handling is given; in its framework, several reconfiguration strategies of varying complexity are examined; their relative advantages and disadvantages are briefly discussed. The authors show how the proposed system lends itself to an organic and well-structured organization of fault-treatment. 15 references.

9 citations


Journal ArticleDOI
01 Apr 1982
TL;DR: The structure of the network controller given herein may offer guidance for development of controllers for other reconfigurable network architectures.
Abstract: The problems of resource allocation, configuration and reconfiguration, and network control must be solved before reconfigurable array computers can be effectively utilized. The interconnection networks proposed for these systems vary so that there has been no common or optimal solution proposed to these problems. This paper defines and describes the objectives, design, implementation and use of a network controller for a reconfigurable array computer, the Texas Reconfigurable Array Computer (TRAC). The objectives for the network controller are defined by management of the system state, the requirements of the operating system for functionality and the interface the network presents to the operating system. These objectives may be expected to have at least some commonality across most reconfigurable network architectures. The structure of the network controller given herein may offer guidance for development of controllers for other reconfigurable network architectures.

6 citations


Patent
04 Jun 1982
TL;DR: In this paper, a system for synchronizing remote or colocated oscillators is constructed with redundancy enabling a high degree of fault toleration, where each oscillator is part of a terminal which includes reconfiguration circuitry and each, in the absence of faults, is connected to each of the other terminals.
Abstract: A system for synchronising remote or colocated oscillators is constructed with redundancy enabling a high degree of fault toleration. In the various embodiments of the system there is no predetermined hierarchy amongst the oscillators, in contrast to the prior art master and slave type of synchronisation. Averaging is not involved and the system settles to a state in which the oscillators are locked to one master, which is not predetermined. Each oscillator is part of a terminal which includes reconfiguration circuitry, and each, in the absence of faults, is connected to each of the other terminals. The terminals attempt to lock to another oscillator and reconfigure in accordance with an algorithm until stable synchronisation is achieved. In one embodiment the properties of a phase-locked loop are utilised in a novel way by incorporating a variable delay in its feedback path. Using this circuitry enables one embodiment to achieve synchronisation even when only indirect paths exist to the assumed master, and in the system temporary disconnection for reconfiguration is not necessary.

Patent
11 Jan 1982
TL;DR: In this paper, a microprocessor-controlled unit and auxiliary circuits specialized in diagnostic, without requiring the generation of artificial test traffic, are used to perform diagnostic, trouble localization and reconfiguration.
Abstract: Interconnection unit equipped with a microprocessor-controlled unit and auxiliary circuits specialized in diagnostics, allowing the functions of diagnosis, trouble localization and reconfiguration to be carried out by the single building blocks, without requiring the generation of artificial test traffic.

01 Feb 1982
TL;DR: In this article, the authors discuss several performance and implementation aspects of the operation of reaction wheel control systems which have their own internal feedback control system, and the reduction in wheel induced spacecraft attitude noise by employing wheel velocity or wheel attitude feedback is analysed in the continuous domain.
Abstract: The increasing power of on-board processors allows more freedom in the design and operation of reaction wheel control systems. This report discusses several performance and implementation aspects of the operation of wheel systems which have their own internal feedback control system. A model-following control concept is found to be superior. The reduction in wheel-induced spacecraft attitude noise by employing wheel velocity or wheel attitude feedback is analysed in the continuous domain. The digital wheel controller implementation aspects discussed include conventional z-plane PD type designs, use of a Kalman filter for wheel state observation, controller pole placement and linear quadratic optimal control design. For most designs a wheel sensor of the pulse counting type is assumed. Finally, autonomous wheel failure detection and reconfiguration is dealt with.

Journal ArticleDOI
TL;DR: There is a need of research effort in the investigation of algorithms and system structures that make error/failure detection, reconfiguration, recovery, and restart of a system feasible with the least amount of interruptions.
Abstract: To provide continuity of operations in automated systems, we need to develop techniques that can make them reliable. Many systems, such as those used in space programs, air traffic control, nuclear plant monitors, ballistic missile defense, etc., demand robust operation. In the past, research efforts have focused on the design and implementation of distributed systems used in such applications. We foresee a need of research effort in the investigation of algorithms and system structures that make error/failure detection, reconfiguration, recovery, and restart of a system feasible with the least amount of interruptions.

Journal Article
TL;DR: The PAVE LOW III aircraft is a modified HH-53H helicopter with a low-altitude, below 30.48 m (100 ft), night/day rescue mission; the desired night flying configuration is for the pilot to wear night vision goggles (NVGs) to fly the aircraft while the copilot, without NVGs, observes the video display and monitors the aircraft instruments.
Abstract: The PAVE LOW III aircraft is a modified HH-53H helicopter that has a low altitude--below 30.48 m (100 ft)--night/day rescue mission. The desired night flying configuration is for the pilot to wear night vision goggles (NVGs) to fly the aircraft while the copilot, without NVGs, observes the video display and monitors the aircraft instruments. The problems of NVG incompatibility in the cockpit were successfully countered using several light control techniques. The light control modifications were evaluated on the ground in the PAVE LOW III helicopter at Kirtland AFB in April, 1980, by PAVE LOW instructor pilots. The evaluation results were extremely positive.

01 Jan 1982
TL;DR: This dissertation investigates strategies for dynamically reconfiguring shared memory multiprocessor systems that are subject to common memory faults and unpredictable processor deaths and deals with fault-masking algorithms as applied to the development of network protocols with an underlying communication medium that may reorder, duplicate or lose messages.
Abstract: Depending upon the philosophy used to implement fault-tolerant systems, one can distinguish two classes of algorithms: reconfiguration algorithms and fault masking algorithms. The precise statement and analysis of the problems and the underlying assumptions associated with these classes of algorithms is the subject of this dissertation. The first part of the thesis investigates strategies for dynamically reconfiguring shared memory multiprocessor systems that are subject to common memory faults and unpredictable processor deaths. These strategies aim at determining a communication page, i.e., a page of common memory that can be used by a group of processors for storing crucial common resources such as global locks for synchronization and global data structures for voting algorithms. To insure system reliability, the reconfiguration strategies must be distributed so that each processor independently arrives at exactly the same choice. This type of reconfiguration strategy is currently used in the STAGE operating system on the PLURIBUS multiprocessor {24}. We analyze the weak points of the PLURIBUS algorithm and examine alternative strategies satisfying optimization criteria such as maximization of the number of processors and the number of common memory pages in the reconfigured system. We also present a general distributed algorithm which enables the processors in such a system to exchange the local information that is needed to reach a consensus on system reconfiguration. In the second part of the thesis, we deal with fault-masking algorithms as applied to the development of network protocols with an underlying communication medium that may reorder, duplicate or lose messages. In chapter (3) we present a simple network, whose communication medium is assumed to be reliable, and develop a strategy for the remote submission and processing of requests. We also show how to formally specify and verify the network behavior. 
In the final chapter we describe a more complex network model where the communication medium is no longer assumed to be reliable. We then show that despite the reordering, duplication or loss of messages, all requests are eventually processed exactly once at the remote site and that responses are received in the right order at their submission site.

01 Jan 1982
TL;DR: The fully reconfigurable multimicroprocessor is an experimental configuration designed specifically as a research tool for implementing and evaluating parallel-processing algorithms on various multiprocessor architectures under development at the Los Alamos National Laboratory.
Abstract: The fully reconfigurable multimicroprocessor is an experimental configuration designed specifically as a research tool for implementing and evaluating parallel-processing algorithms on various multiprocessor architectures. Basically, the system is a shared-memory MIMD (multiple instruction-multiple data stream) machine that supports reconfiguration between processor and memory nodes to permit experimentation on architectures sharing common memory, networks of processors with only local memory, etc. This experimental computer system is currently under development within the Computing Division at the Los Alamos National Laboratory.

Journal ArticleDOI
D. Russell
TL;DR: A simple uniform mechanism for managing simple and reliable virtual circuits and for coping with failures of virtual circuits in a local area computer network is described.
Abstract: A virtual circuit (or path) is a full-duplex logical connection between two processes. Two varieties of idealized virtual circuits that differ in their reliability characteristics can be distinguished: simple virtual circuits and reliable virtual circuits. Simple virtual circuits have the basic characteristic that data are transmitted across the circuit with no errors as long as the circuit is operational; however, the circuit may fail at any time. Reliable virtual circuits transmit data reliably at all times, in spite of link or processor failures: any recovery and/or reconfiguration that takes place to reconstruct a failed circuit is totally invisible to the two endpoints; any recovery and/or reconfiguration of a failed endpoint (process or processor) is invisible to the nonfailed end. A simple uniform mechanism for managing simple and reliable virtual circuits and for coping with failures of virtual circuits in a local area computer network is described. The mechanism supports the movement of virtual circuit endpoints to different machines (as part of reconfiguration following failure, for example), and is robust with respect to the failure of any of the software components involved in the establishment and management of simple virtual circuits.

Proceedings ArticleDOI
30 Jul 1982
TL;DR: A technique employing pooled standby with fault tolerance and reconfiguration is concluded to provide the most effective solution where size, weight, and power constraints are most severe.
Abstract: This paper investigates the problems of throughput and reliability encountered in designing multi-computer systems for processing real-time sensor data in the mid-1980s time period. The basic microcomputer and minicomputer building block characteristics are identified; characteristics of ring, crossbar, and banyan interconnection networks are quantified; and the form factors for the resulting multicomputer systems are estimated. Techniques for achieving ultra-reliable computing systems--triple-modular redundancy (TMR), dedicated switched-standby spares, pooled switched-standby spares, and hybrid redundancy--are reviewed and their resulting impact on system design is discussed. The hazard function and its impact on the reliability of systems that must remain dormant for considerable periods are discussed. A technique employing pooled standby with fault tolerance and reconfiguration is concluded to provide the most effective solution where size, weight, and power constraints are most severe.

Proceedings ArticleDOI
01 Oct 1982
TL;DR: A study aimed at improving on that fault isolation algorithm with respect to removing certain network partitioning restrictions, decreasing processing time and making it compatible with channel reconfiguration and other resource management options is reported on.
Abstract: An approach to the technical control of the digital transmission facilities of the defense communication system has been described by Jankaukas, Landesberg, Spector and Meyer (ref. 1). The status monitoring and performance assessment functions of that approach are interesting and the fault isolation algorithm is sound in principle. In this paper we report on the results of a study aimed at improving on that fault isolation algorithm with respect to removing certain network partitioning restrictions, decreasing processing time and making it compatible with channel reconfiguration and other resource management options. In our approach, the communications system is described in terms of two linked-list data bases, these being the equipment data base and the communication path data base. For fault isolation, the basic idea is that whenever an exception report is received, pointers from the alarmed piece of equipment point to the paths in which the equipment is located. For each such path, the equipment has an ordinal number stating its position in that path. For an alarmed equipment in a certain path, the equipment ordinal number is compared with the ordinal number of the most up-stream (in the transmission case) faulted equipment. Comparison of the ordinal numbers quickly indicates whether the alarm is real or sympathetic. This approach has been investigated using computer simulation. Channel reconfiguration and management of resources on the basis of needs and priorities have also been shown to be compatible with use of this algorithm.
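
The ordinal-number comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `FaultIsolator`, the dictionary-based stand-ins for the two linked-list data bases, the equipment and path names, and the rule that an alarm at or upstream of the current most upstream fault counts as "real" are all assumptions.

```python
from collections import defaultdict


class FaultIsolator:
    """Hypothetical sketch: classify alarms as real or sympathetic per path."""

    def __init__(self, paths):
        # paths: {path name: [equipment ids ordered upstream -> downstream]}
        self.ordinal = {}                   # (path, equipment) -> position in that path
        self.paths_of = defaultdict(list)   # equipment -> paths that contain it
        for path, chain in paths.items():
            for pos, equip in enumerate(chain):
                self.ordinal[(path, equip)] = pos
                self.paths_of[equip].append(path)
        self.upstream_fault = {}            # path -> ordinal of most upstream fault seen

    def report(self, equip):
        """Handle one exception report; return {path: 'real' | 'sympathetic'}."""
        verdicts = {}
        for path in self.paths_of[equip]:
            pos = self.ordinal[(path, equip)]
            worst = self.upstream_fault.get(path)
            if worst is None or pos <= worst:
                # Nothing faulted further upstream on this path: treat as real.
                self.upstream_fault[path] = pos
                verdicts[path] = "real"
            else:
                # Downstream of a known fault: assumed to be a sympathetic alarm.
                verdicts[path] = "sympathetic"
        return verdicts


# Two illustrative paths that share one piece of equipment.
isolator = FaultIsolator({
    "P1": ["mux_a", "radio_b", "mux_c"],
    "P2": ["mux_d", "radio_b", "mux_e"],
})
print(isolator.report("radio_b"))  # first alarm on both paths
print(isolator.report("mux_c"))    # downstream of radio_b on P1
print(isolator.report("mux_d"))    # upstream of radio_b on P2
```

Each report costs one lookup and one integer comparison per path that contains the alarmed equipment, which is consistent with the abstract's claim that the comparison is quick.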

Proceedings ArticleDOI
07 Jun 1982
TL;DR: The paper presents a detailed organization for the processor operating system and shows that the operating system must feature two types of distribution: functional or vertical, whereby it is distributed among functional units in accordance with the types of conflicts that should be resolved.
Abstract: The paper discusses the organization of a distributed operating system for dynamic architecture. It shows that the operating system must feature two types of distribution: (a) functional or vertical, whereby it is distributed among functional units in accordance with the types of conflicts that should be resolved; and (b) modular or horizontal, whereby it is distributed among modules performing the same functions. In a dynamic architecture there are three types of conflicts: memory, reconfiguration, and I/O. This leads to the division of the OS into three subsystems: (1) a processor OS that resolves memory conflicts, (2) a monitor OS that resolves reconfiguration conflicts, and (3) an I/O OS that resolves all types of I/O conflicts. The paper presents a detailed organization for the processor operating system.

Journal ArticleDOI
TL;DR: Some details of a complete system software for the MULREG1 system are discussed, which was designed for process control purposes in an experimental environment and allows parallel execution of any number of application processes communicating through use of synchronization elements.