
Showing papers on "Control reconfiguration" published in 1986


Journal Article · DOI
TL;DR: This method systematically exploits type-specific properties of objects such as sets, queues, or directories to provide more effective replication and can realize a wider range of availability properties and more flexible reconfiguration than comparable replication methods.
Abstract: Replication can enhance the availability of data in distributed systems. This paper introduces a new method for managing replicated data. Unlike many methods that support replication only for uninterpreted files, this method systematically exploits type-specific properties of objects such as sets, queues, or directories to provide more effective replication. Each operation requires the cooperation of a certain number of sites for its successful completion. A quorum for an operation is any such set of sites. Necessary and sufficient constraints on quorum intersections are derived from an analysis of the data type's algebraic structure. A reconfiguration method is proposed that permits quorums to be changed dynamically. By taking advantage of type-specific properties in a general and systematic way, this method can realize a wider range of availability properties and more flexible reconfiguration than comparable replication methods.

263 citations
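The constraints in this abstract are stated as conditions on quorum intersections. For comparison, here is a minimal sketch of the classic intersection constraint used by replication methods for uninterpreted files: every read quorum must share a site with every write quorum, and any two write quorums must share a site. The five sites and the quorum sizes are hypothetical, and the sketch does not reproduce the type-specific constraints derived in the paper.

```python
from itertools import combinations

def quorums_of_size(sites, size):
    """Every subset of `sites` with exactly `size` members, as frozensets."""
    return [frozenset(c) for c in combinations(sites, size)]

def always_intersect(quorums_a, quorums_b):
    """True if every quorum in `quorums_a` shares a site with every quorum in `quorums_b`."""
    return all(a & b for a in quorums_a for b in quorums_b)

sites = ["s1", "s2", "s3", "s4", "s5"]
read_quorums = quorums_of_size(sites, 2)   # hypothetical read quorum size
write_quorums = quorums_of_size(sites, 4)  # hypothetical write quorum size

print(always_intersect(read_quorums, write_quorums))   # True: 2 + 4 > 5
print(always_intersect(write_quorums, write_quorums))  # True: 4 + 4 > 5
print(always_intersect(read_quorums, read_quorums))    # False: reads may be disjoint
```

Exhaustively enumerating quorums only makes sense for small examples. For threshold quorums, the same checks reduce to the size conditions in the comments.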


Journal Article · DOI
01 May 1986
TL;DR: The paper presents the problem of fault tolerance in VLSI array structures and describes two different approaches, both based on the introduction of simple patterns of faults and on global reconfiguration techniques.
Abstract: The paper presents the problem of fault tolerance in VLSI array structures. Its aim is to discuss architectures capable of surviving a number of random faults while keeping costs (in terms of added silicon area and increased processing time) as low as possible. Two different approaches are presented, both based on the introduction of simple patterns of faults and on global reconfiguration techniques (rather than one-to-one substitution of faulty elements by spare ones). Various solutions are compared, and their relative performance is discussed in order to determine criteria for selecting the one most suitable for particular applications.

170 citations


Proceedings Article · DOI
02 Jul 1986
TL;DR: Two algorithms for spare allocation based on graph-theoretic analysis are presented. They provide highly efficient and flexible reconfiguration analysis, and the underlying optimal reconfiguration problem is shown to be NP-complete.
Abstract: The issue of yield degradation due to physical failures in large memory and processor arrays is of significant importance to semiconductor manufacturers. One method of increasing the yield for iterated arrays of memory cells or processing elements is to incorporate spare rows and columns in the die or wafer, which can be programmed into the array. This paper addresses computer-aided design approaches to the optimal reconfiguration of such arrays and presents the first formal analysis of the problem. Optimal reconfiguration is shown to be NP-complete for rectangular arrays utilizing spare rows and columns. In contrast to previously proposed exhaustive search and greedy algorithms, this paper develops a heuristic branch-and-bound approach based on the complexity analysis, which allows for flexible and highly efficient reconfiguration. Initial screening is performed by a bipartite graph matching algorithm.

142 citations
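The abstract says that initial screening uses bipartite graph matching but does not give the exact test. One plausible screen, sketched here as an assumption, models faulty rows and faulty columns as the two sides of a bipartite graph, with one edge per faulty cell. A repair must cover every edge with at most `spare_rows + spare_cols` chosen rows and columns. By König's theorem, the minimum vertex cover equals the maximum matching, so a matching larger than the spare count proves the array unrepairable. Passing this screen is necessary but not sufficient, and the paper's branch-and-bound step is not reproduced.

```python
def max_bipartite_matching(faults):
    """Maximum matching size in the bipartite graph of faulty rows vs. faulty
    columns, where each faulty cell (row, col) is an edge."""
    adj = {}
    for r, c in faults:
        adj.setdefault(r, []).append(c)
    match_of_col = {}  # column -> row currently matched to it

    def try_augment(row, seen):
        # Kuhn's augmenting-path search starting from `row`.
        for col in adj[row]:
            if col in seen:
                continue
            seen.add(col)
            if col not in match_of_col or try_augment(match_of_col[col], seen):
                match_of_col[col] = row
                return True
        return False

    return sum(try_augment(row, set()) for row in adj)

def may_be_repairable(faults, spare_rows, spare_cols):
    """Necessary-condition screen: min vertex cover (= max matching, by Konig's
    theorem) must not exceed the total number of spare lines."""
    return max_bipartite_matching(faults) <= spare_rows + spare_cols

print(may_be_repairable([(0, 0), (1, 1), (2, 2)], 1, 1))  # False: needs 3 lines
print(may_be_repairable([(0, c) for c in range(4)], 1, 0))  # True: one spare row suffices
```

The screen ignores the split between spare rows and spare columns. For example, two fully faulty rows of three cells give a matching of 2, which passes with two spare columns even though that allocation cannot repair the array.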


Proceedings Article · DOI
01 Dec 1986
TL;DR: A new control concept for industrial robots (IR) based on predictive control principles is presented. It can be easily implemented even for elastic mechanical IR structures, and it can provide both excellent performance and robustness with respect to uncertainties or variations of the model parameters.
Abstract: In this paper, a new control concept for industrial robots (IR) based on predictive control principles is presented. It can be easily implemented even for elastic mechanical IR structures. Both excellent performance and robustness with respect to uncertainties or variations of the model parameters can be provided. The proposed algorithm has been applied in practice to point-to-point and path control of a conventional IR (KUKA 160) with an elastic mechanical multibody structure. Its performance is demonstrated by experimental and simulation results.

68 citations


Journal Article · DOI
TL;DR: A new modular, fault-tolerant scheme is proposed for the binary tree architecture that uses redundant modular fault-tolerant building blocks to construct the complete binary tree.
Abstract: A new modular, fault-tolerant scheme is proposed for the binary tree architecture. The approach uses redundant modular fault-tolerant building blocks to construct the complete binary tree. The restructuring operation is local to each faulty module. The proposed scheme is shown to be more reliable and easier to implement than existing fault-tolerant schemes.

63 citations


Patent
12 Dec 1986
TL;DR: In this article, the authors present a communication system that enables communications between reconfigurable data terminals and a variety of connected computers, having dissimilar operating parameters, by automatically reconfiguring the operating parameters of the calling terminal to match those of the called computer.
Abstract: The disclosed communication system enables communications between reconfigurable data terminals and a variety of connected computers, having dissimilar operating parameters, by automatically reconfiguring the operating parameters of the calling terminal to match those of the called computer. The system controller, in response to receiving the dialed number of the called computer from the calling terminal, accesses its memory for the operating parameters of the called computer and transmits these operating parameters to reconfigure the calling terminal. Another embodiment enables the operating parameters of a terminal to be reconfigured in any of a number of operating modes in response to a reconfiguration request signal sent to the system controller from the terminal.

60 citations
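The first embodiment is essentially a lookup followed by a push. The system controller maps the dialed number to the stored operating parameters of the called computer and sends them to the calling terminal. Here is a minimal sketch of that flow. The class names, the phone numbers, and the choice of baud rate, data bits, parity, and stop bits as "operating parameters" are all hypothetical illustrations, not details taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class OperatingParameters:
    # Hypothetical parameter set; the patent does not enumerate the fields.
    baud_rate: int
    data_bits: int
    parity: str
    stop_bits: int

class Terminal:
    """A reconfigurable data terminal holding its current operating parameters."""
    def __init__(self, params):
        self.params = params

    def reconfigure(self, params):
        self.params = params

class SystemController:
    """Hypothetical controller with a memory from dialed number to called-computer parameters."""
    def __init__(self, directory):
        self.directory = dict(directory)

    def handle_call(self, terminal, dialed_number):
        params = self.directory.get(dialed_number)
        if params is None:
            raise KeyError(f"no operating parameters stored for {dialed_number}")
        terminal.reconfigure(params)  # make the caller match the called computer
        return params

controller = SystemController({
    "5551234": OperatingParameters(9600, 7, "even", 1),
    "5559876": OperatingParameters(1200, 8, "none", 1),
})
term = Terminal(OperatingParameters(300, 8, "none", 1))
controller.handle_call(term, "5551234")
```

The second embodiment, reconfiguration on an explicit request signal from the terminal, would follow the same pattern but key the lookup on a requested mode instead of the dialed number.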


Proceedings Article · DOI
04 Apr 1986
TL;DR: A new methodology is developed for incorporating fault tolerance into processor arrays proposed for eigenvalue and singular value computation. Special properties of eigenvalues and singular values are used to achieve error detection without encoding the input data.
Abstract: The computation of eigenvalues and singular values is key to applications including signal and image processing. Since these algorithms require large amounts of computation, and since many digital signal processing applications have real-time requirements, many different special-purpose processor array structures have been proposed for these two problems. This paper develops a new methodology to incorporate fault tolerance into processor arrays that have been proposed for these problems. In the first part of the paper, earlier techniques of algorithm-based fault tolerance are applied to QR factorization and QR iteration. This technique encodes the input data at a high level, using the specific properties of each algorithm, and checks the output data before they leave the system. In the second part of the paper, special properties of eigenvalues and singular values are used to achieve error detection without encoding the input data. Fault location and reconfiguration are performed only after an erroneous signal has been detected. The introduced overhead is extremely low in terms of both hardware and time redundancy.

46 citations
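The abstract does not say which eigenvalue properties are checked. As an illustrative assumption, here is a minimal software analogue that uses two standard invariants of QR iteration. Each step A_{k+1} = R_k Q_k = Q_k^T A_k Q_k is an orthogonal similarity transform, so the trace (the sum of the eigenvalues) and the Frobenius norm must stay constant. The check needs no encoding of the input. The `fault_at` hook is hypothetical and simulates a transient error. This sketch is a sequential program, not the paper's processor array.

```python
import math

def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def qr_decompose(a):
    """Classical Gram-Schmidt QR factorization of a nonsingular square matrix."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols, r = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k, qk in enumerate(q_cols):
            r[k][j] = sum(qk[i] * cols[j][i] for i in range(n))
            v = [v[i] - r[k][j] * qk[i] for i in range(n)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def frobenius_sq(a):
    return sum(x * x for row in a for x in row)

def qr_iteration_checked(a, steps=50, tol=1e-9, fault_at=None):
    """Unshifted QR iteration that verifies trace and Frobenius norm after every step."""
    ref_trace, ref_frob = trace(a), frobenius_sq(a)
    for step in range(steps):
        q, r = qr_decompose(a)
        a = matmul(r, q)
        if step == fault_at:
            a[0][0] += 1.0  # simulated transient fault in one processing element
        if (abs(trace(a) - ref_trace) > tol * max(1.0, abs(ref_trace))
                or abs(frobenius_sq(a) - ref_frob) > tol * max(1.0, ref_frob)):
            raise RuntimeError(f"invariant check failed after step {step}")
    return [a[i][i] for i in range(len(a))]  # approximate eigenvalues
```

For the symmetric matrix `[[4, 1, 0], [1, 3, 1], [0, 1, 2]]`, the diagonal converges to 3 + √3, 3, and 3 − √3. Passing `fault_at=5` makes the trace check fail. A fault that leaves both invariants unchanged would go undetected, so these two checks alone are only a partial test.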


Journal Article · DOI
Yanney, Hayes
TL;DR: A graph-theoretic methodology is developed for characterizing dynamic distributed recovery in fault-tolerant multiprocessor systems. Distributed recovery is intended for systems with no central supervisor, in which each processor is assumed to have only a limited amount of information about the system as a whole.
Abstract: A methodology for characterizing dynamic distributed recovery in fault-tolerant multiprocessor systems is developed using graph theory. Distributed recovery, which is intended for systems with no central supervisor, depends on the cooperation of a set of processors to execute the recovery function, since each processor is assumed to have only a limited amount of information about the system as a whole. Facility graphs, whose nodes denote the system components (processors) and whose edges denote interconnections between components, are used to represent multiprocessor systems and error conditions. A general distributed recovery strategy R, which allows global recovery to be achieved via a sequence of local actions, is given. R recovers the system in several steps, in which different nodes successively act as the local supervisor. R is specialized for two important classes of systems: loop networks and tree networks. For each of these cases, fault-tolerant designs and their associated distributed recovery strategies, which allow recovery from up to k faults within a specified number of steps, are presented.

40 citations


Patent
02 Oct 1986
TL;DR: Each of a plurality of redundant channels in a system contains a global image of the configuration data bases of all channels in the system; each global image is updated periodically from each of the other channels via cross-channel data links.
Abstract: A plurality of redundant channels in a system each contain a global image of all the configuration data bases in each of the channels in the system. Each global image is updated periodically from each of the other channels via cross channel data links. The global images of the local configuration data bases in each channel are separately symmetrized using a voting process to generate a system signal configuration data base which is not written into by any other routine and is available for indicating the status of the system within each channel. Equalization may be imposed on a suspect signal and a number of "chances" for that signal to "heal" itself are provided before excluding it from future votes. Reconfiguration is accomplished upon detecting a channel which is deemed invalid. A reset function is provided which permits an externally generated reset signal to permit a previously excluded channel to be reincluded within the system. The updating of global images and/or the symmetrization process may be accomplished at substantially the same time within a synchronized time frame common to all channels.

33 citations
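The voting and exclusion behavior described in the abstract works as follows. Channels are compared against a voted value, a suspect channel gets a number of "chances" to heal, it is excluded once those chances run out, and an external reset can re-include it. Here is a minimal sketch of that behavior. The mid-value (median) vote, the fixed tolerance, and the strike counter are assumptions chosen for illustration, and the patent's equalization step is not modeled.

```python
from statistics import median

class RedundantChannelVoter:
    """Hypothetical mid-value voter with a per-channel miscompare budget."""

    def __init__(self, channels, tolerance, chances):
        self.tolerance = tolerance
        self.chances = chances            # miscompares allowed before exclusion
        self.strikes = {ch: 0 for ch in channels}
        self.excluded = set()

    def vote(self, readings):
        active = {ch: v for ch, v in readings.items() if ch not in self.excluded}
        voted = median(active.values())   # mid-value select over active channels
        for ch, value in active.items():
            if abs(value - voted) > self.tolerance:
                self.strikes[ch] += 1
                if self.strikes[ch] > self.chances:
                    self.excluded.add(ch)  # reconfigure: drop channel from future votes
            else:
                self.strikes[ch] = 0       # the signal "healed"
        return voted

    def reset(self, ch):
        """Externally generated reset: re-include a previously excluded channel."""
        self.excluded.discard(ch)
        self.strikes[ch] = 0

voter = RedundantChannelVoter(["A", "B", "C"], tolerance=0.1, chances=2)
for _ in range(3):
    voter.vote({"A": 1.0, "B": 1.0, "C": 5.0})  # channel C disagrees three times
```

After the third disagreement, channel C is excluded, and later votes use only A and B. Calling `voter.reset("C")` brings it back.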


Journal Article · DOI
TL;DR: A hierarchically structured controller architecture is presented that supports interactive programming and reconfiguration while the controller is running. It is intended for manipulator, robot, and work cell control as well as for research in automation programming and motion control.

26 citations


Journal Article · DOI
TL;DR: The results of a performance evaluation of the Software-Implemented Fault-Tolerance (SIFT) computer system conducted in the NASA Avionics Integration Research Laboratory are presented and specific design changes are proposed that reduce this overhead burden significantly.
Abstract: The results of a performance evaluation of the Software-Implemented Fault-Tolerance (SIFT) computer system conducted in the NASA Avionics Integration Research Laboratory are presented. The essential system functions are described and compared to both earlier design proposals and subsequent design improvements. Using SIFT's specimen task load, the executive tasks, such as reconfiguration, clock synchronization, and interactive consistency, are found to consume significant computing resources. Together with other system overhead (e.g., voting and scheduling), the operating system overhead is in excess of 60%. The authors propose specific design changes that reduce this overhead burden significantly.

Journal Article · DOI
TL;DR: This correspondence presents a collection of reconfiguration procedures for a multiprocessor that employs multistage interconnection networks; with these procedures, a subsystem can be reconfigured into a desired topology without interfering with other subsystems.
Abstract: This correspondence presents a collection of reconfiguration procedures for a multiprocessor that employs multistage interconnection networks. These procedures are used to dynamically partition the multiprocessor into many subsystems and to reconfigure them into a variety of commonly used topologies to match task graphs. By examining the switching capability of the interconnection network, design rules for avoiding connection conflicts are derived. Then, on the basis of these rules, parallel procedures are designed. With these procedures, a subsystem can be reconfigured into the desired topology without interfering with other subsystems. In addition, the reconfiguration of a subsystem can be accomplished in constant time, independent of subsystem size.

Journal Article · DOI
TL;DR: Important techniques for dependability modeling are discussed, including methods of model construction and model solution, and some of the recent software packages for dependable analysis are compared.

Journal Article · DOI
TL;DR: Upper and lower bounds for the probability of failure are derived for reliability models that are appropriate for a class of highly reliable systems where the individual components fail independently at a low constant rate and where faulty components are quickly removed from the system.

Journal Article · DOI
TL;DR: The design and operation of a cell exploring potential solutions to some of the problems arising in the development of factory automation systems are presented, together with further thoughts on their solution.
Abstract: This paper explores some of the critical issues in the development of factory automation systems. Among the most significant of these are the role of people, the generation of control software, the reconfiguration of manufacturing hardware and software to allow a greater variety of tasks to be carried out, and automatic error recovery. This paper reports the design and operation of a cell exploring potential solutions to some of the problems arising from these issues and presents further thoughts on their solution.

DOI
01 Nov 1986
TL;DR: The paper presents the software architecture of a fault-tolerant distributed system that supports both cold and hot standby redundancy of selected software modules. The authors believe this is a simple, flexible, and practical approach to providing fault tolerance in distributed embedded systems.
Abstract: Many applications need different degrees of fault tolerance in the same system. The paper presents the software architecture of a fault-tolerant distributed system that supports both cold and hot standby redundancy of selected software modules. Cold standby modules are created and activated by the system to replace failed modules, but no state information is preserved. Hot standby modules do preserve state information and provide transparent recovery from failures. A technique is used that allows modules to be programmed without fault tolerance in mind; afterwards, they can be transformed to achieve that capability. These two types of redundancy are supported by common mechanisms that provide for failure detection and reconfiguration of the application's software modules. Reconfiguration is also used to restore the reliability of the system by providing further standby modules to replace failed ones. We believe that this is a simple, flexible, and practical approach to providing fault tolerance in distributed embedded systems.
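The difference between the two kinds of redundancy shows up in what survives a failover. Here is a minimal sketch of that difference. The `Counter` module is written with no fault-tolerance logic of its own. A cold standby is created fresh on failure and loses state. A hot standby applies every request alongside the primary and so holds the same state. After a hot failover, a further standby is created by copying the promoted module's state. All class names and the state-copy mechanism are assumptions for illustration, not the paper's architecture.

```python
import copy

class Counter:
    """Example application module, written without fault tolerance in mind."""
    def __init__(self):
        self.state = 0

    def handle(self, increment):
        self.state += increment
        return self.state

class ColdStandby:
    """On failure, a new module is created and activated; no state is preserved."""
    def __init__(self, factory):
        self.factory = factory
        self.primary = factory()

    def request(self, x):
        return self.primary.handle(x)

    def fail_over(self):
        self.primary = self.factory()

class HotStandby:
    """The backup processes every request too, so it holds the same state."""
    def __init__(self, factory):
        self.primary = factory()
        self.backup = factory()

    def request(self, x):
        result = self.primary.handle(x)
        self.backup.handle(x)  # keep the backup in step with the primary
        return result

    def fail_over(self):
        # Promote the backup, then provide a further standby to restore redundancy.
        self.primary = self.backup
        self.backup = copy.deepcopy(self.primary)

cold, hot = ColdStandby(Counter), HotStandby(Counter)
cold.request(5)
hot.request(5)
cold.fail_over()
hot.fail_over()
```

After failover, `cold.request(1)` returns 1 because the state was lost, while `hot.request(1)` returns 6.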

Proceedings Article · DOI
01 Dec 1986
TL;DR: The developed theory is shown to provide a new general conceptual approach to the design of linear tracking control systems, and it is applied to the design of nominal and decoupled flight control modes of the Control Augmentation System (CAS) for fighter aircraft.
Abstract: A new general decoupling configuration of the two-degree-of-freedom tracking control system is presented. In this configuration, the closed-loop part, which satisfies the regulation requirements, and the prefilter part, which yields the desired command response, can be designed independently of each other. When implemented, neither part of the controller interferes with the response characteristics shaped by the other. The developed theory is shown to provide a new general conceptual approach to the design of linear tracking control systems. The theory is applied to the design of nominal and decoupled flight control modes of the Control Augmentation System (CAS) for fighter aircraft.

Journal Article · DOI
TL;DR: Assuming a horizontally distributed computing system, formulations of the edge-failure and node-failure recovery problems from the standpoint of load leveling are presented, and simple algorithms are proposed for these problems.
Abstract: Assuming a horizontally distributed computing system, formulations of the edge-failure and node-failure recovery problems from the standpoint of load-leveling are presented. The conditions for the existence of solutions to these problems are examined and simple algorithms are proposed for these problems. In connection with the node-failure recovery problem, the concept of a node-failure metric to characterize different possible solutions is introduced, exploiting the notion of the strength of processors. A possible application of the recovery methods in the context of reconfiguration of distributed database systems is suggested.
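The abstract does not give the formulation, so the following is only an illustrative assumption about what "load leveling with processor strength" could look like for a node failure. The failed node's load is spread over the survivors by water-filling, so that every survivor receiving extra load ends at a common level of load divided by strength. The bisection search and all names are hypothetical, and edge-failure recovery is not covered.

```python
def level_after_node_failure(loads, strengths, failed, iterations=100):
    """Hypothetical water-filling redistribution of a failed node's load.

    Survivors that receive load all end at the same normalized load
    (load / strength); survivors already above that level keep their load.
    """
    extra = loads[failed]
    survivors = [n for n in loads if n != failed]

    def added(level):
        # Total load needed to raise every survivor to `level`.
        return sum(max(0.0, level * strengths[n] - loads[n]) for n in survivors)

    lo = min(loads[n] / strengths[n] for n in survivors)
    hi = (max(loads[n] / strengths[n] for n in survivors)
          + extra / min(strengths[n] for n in survivors))
    for _ in range(iterations):  # bisection on the common normalized level
        mid = (lo + hi) / 2
        if added(mid) < extra:
            lo = mid
        else:
            hi = mid
    return {n: max(loads[n], hi * strengths[n]) for n in survivors}

new_loads = level_after_node_failure(
    {"A": 4.0, "B": 2.0, "C": 6.0}, {"A": 2.0, "B": 1.0, "C": 1.0}, failed="C")
```

In this example, A (strength 2) ends with about 8 units and B with about 4, so both sit at a normalized load of 4 and the 6 units from C are fully absorbed.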

Proceedings Article · DOI
Anurag Kumar
01 Dec 1986
TL;DR: In this paper, the authors study adaptive control techniques for controlling the flow of work from the peripheral processors (PPs) to the central processor (CP) of a distributed system with a star topology.
Abstract: We study adaptive control techniques for controlling the flow of work from the Peripheral Processors (PPs) to the Central Processor (CP) of a distributed system with a star topology. We consider two classes of mechanisms for controlling the flow of jobs from the PPs to the CP: (i) proportional control, in which a certain proportion of the load offered by each PP is admitted, and (ii) threshold control, in which there is a maximum rate at which jobs from any PP can be admitted. The problem is to obtain good algorithms for dynamically adjusting the control level, in order to meet certain CP performance objectives, when the load offered by the PPs is unknown and varying. We formulate the problem approximately as a standard system control problem in which the system has unknown parameters that are subject to change. Using the naive-feedback-controller and stochastic approximation techniques, we derive adaptive controls for the system control problem. We demonstrate the efficacy of these controls in the original problem through simulations of a queuing model of the CP and the load controls.
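To illustrate the stochastic-approximation idea for proportional control, here is a minimal sketch of a Robbins–Monro update of the admitted proportion `p`, which drives a noisy measurement of CP utilization toward a target. The utilization model `p * offered_load / service_rate`, the Gaussian measurement noise, and the gain schedule `gain / k` are all assumptions made for this example. The paper's naive-feedback controller and its queuing model are not reproduced.

```python
import random

def adapt_admission(offered_load, target_util, service_rate, steps,
                    gain=0.5, noise_sd=0.05, seed=0):
    """Robbins-Monro adjustment of the admitted proportion p (hypothetical model)."""
    rng = random.Random(seed)
    p = 1.0  # start by admitting everything
    for k in range(1, steps + 1):
        # Noisy measurement of CP utilization under the current control level.
        measured = p * offered_load / service_rate + rng.gauss(0.0, noise_sd)
        # Decreasing step sizes a_k = gain / k average out the measurement noise.
        p += (gain / k) * (target_util - measured)
        p = min(1.0, max(0.0, p))  # p is a proportion
    return p

p_final = adapt_admission(offered_load=2.0, target_util=0.8, service_rate=1.0, steps=2000)
```

With these numbers, the fixed point is p = 0.8 × 1.0 / 2.0 = 0.4. Decreasing step sizes suit a load that is unknown but constant. For a load that keeps varying, a small constant step is the usual choice so that the estimate can keep tracking.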

Journal Article · DOI
TL;DR: Fault detection, isolation, and recovery methodology employed in the Fault Tolerant Multiprocessor is described and results were found to be in close agreement with earlier assumptions made in reliability modeling.
Abstract: The Fault Tolerant Multiprocessor is a highly reliable computer designed to meet a goal of 10 ~ failures per hour. To a large extent, this level of reliability depends upon the ability to detect and isolate faults rapidly and accurately, purge the faulty module, and dynamically reconfigure the remaining good modules. This paper describes fault detection, isolation, and recovery methodology employed in the Fault Tolerant Multiprocessor. The second part of the paper deals with experimental results obtained by actually injecting faults at the pin level in the Fault Tolerant Multiprocessor. Over 21,000 faults were injected in the central processing unit, memory, bus interface circuits, and error detection, masking, and error reporting circuits of one line replaceable unit of the multiprocessor. Detection, isolation, and reconfiguration times for each fault were recorded. These results were found to be in close agreement with earlier assumptions made in reliability modeling. The experimental results are summarized in this paper.

01 Jan 1986
TL;DR: This work describes a layered design for a distributed operating system with distributed protocols that can be modified, or even completely changed, while the system is running, which makes it suitable for diverse applications.
Abstract: There is a need to design large database systems that are not rigid in their choice of algorithms and are responsive to faults/failures and performance degradation. To address this challenge, we formalize and experiment with design principles that allow the implementation of an adaptable distributed system. By adaptable, we mean that the system can be reconfigured at run time based on performance and continuity-of-operations requirements and on load conditions. Our research focuses on algorithms for concurrency control and for resiliency to site failures, network partitioning, and failure of communication systems. The strategies for dynamic reconfiguration of the software algorithms, and their impact, are being studied both theoretically and via experiments on a prototype system called RAID being developed at Purdue. We describe a layered design for a distributed operating system with distributed protocols that can be modified (or even completely changed) while the system is running. This capability will help in tuning the system to improve its performance and reliability. In addition, the increased flexibility of this design makes it suitable for diverse applications and, unlike existing systems, capable of incorporating new distributed systems technology as it becomes available.

Book Chapter · DOI
17 Sep 1986
TL;DR: Fault tolerance techniques developed and implemented for DIRMU 25, a 25-processor multiprocessor system operational at the University of Erlangen-Nuremberg, are described.
Abstract: This paper describes fault tolerance techniques that have been developed and implemented for the multiprocessor system DIRMU 25, a 25-processor system that is operational at the University of Erlangen-Nuremberg. First, a short overview of the DIRMU hardware architecture, programming environment, and parallel application programs is given. Fault diagnosis and reconfiguration are implemented in a layer of the DIRMOS operating system: the hardware configuration management. The concept of this configuration management is described in general terms (based on a graph model), and its application to the fault-tolerant execution of parallel programs is discussed.

Journal Article
TL;DR: In this paper, a random access protocol for multiaccess broadcast bus systems is presented, which has the capability to reconfigure the available bandwidth instantaneously and can thus present a more efficient channel configuration to the users subject to local loading conditions.

Journal Article · DOI
Schwederski, Siegel
TL;DR: Solutions are needed for two problems, the development of system routines to efficiently control system reconfiguration and the writing of complex application programs, in order to make efficient use of reconfigurable supersystems practical.
Abstract: Steady increases in computer hardware performance show no sign of slowing. Researchers are now exploring new supersystem architectures, such as data flow computers and reconfigurable computers that restructure the organization of their hardware to adapt to computational needs. If a supersystem can be reconfigured, it is more likely to execute efficiently tasks that previously required a set of dedicated systems. Reconfiguration is especially important if the programs executed by the supersystem have widely differing computational requirements. Computations needed to control some sophisticated weapon systems are of this variety. The utilization and performance of any supersystem depend strongly on the software available for it. Necessary software includes system software, such as compilers and operating systems, and application software. Designing the system software to make efficient use of complex supercomputers is difficult, and it is further complicated if the supercomputer is reconfigurable. The application programs for such systems are typically very large and complex, such as those for mission-critical military tasks. Thus, two problems facing system designers are (1) the development of system routines to efficiently control system reconfiguration and (2) the writing of complex application programs. Software used to control and program reconfigurable supersystems must efficiently exploit the available hardware flexibility; if not, the system does not fulfill its potential. In the past, customized application software packages were frequently developed for every new computer, often resulting in programs that could be executed on only one type of machine. This machine dependence leads to duplication of effort, since the same algorithms have to be coded repeatedly, which increases software cost. It is therefore important to look for solutions to these two problems in order to make efficient use of reconfigurable supersystems practical.

Journal Article · DOI
TL;DR: The primary goal of this work is the application of self-reconfiguration algorithms to Fourier transform architectures, as required for installing the processing system on satellites.

Proceedings Article · DOI
01 Dec 1986
TL;DR: The robust servomechanism problem is considered for a linear time-invariant system; the goal is to find a controller that solves the problem while minimizing the number of input/output interconnections required to implement it, i.e. a controller with a minimal controller structure realization.
Abstract: The following type of problem is considered. Given a linear time-invariant system, assume that there exists a solution to the robust servomechanism problem for the system, for the case of a given class of disturbance/reference input signals. Then it is desired to implement a controller for the system, which solves the robust servomechanism problem, and such that the resulting controller has the property that the number of input/output interconnections required to implement the controller is minimized. Such a controller is said to have a "minimal controller structure realization". The motivation for studying this problem arises in large scale systems, e.g. chemical process control, where, in general, one would like to find the "simplest possible controller", i.e. one which has a minimal controller structure realization, which satisfactorily regulates the system.

Journal Article · DOI
TL;DR: A polynomial-time algorithm is constructed for finding optimal reconfiguration strategies for a class of reconfigurable fault-tolerant computer systems in which failed components are not repaired.
Abstract: In this correspondence, we study the problem of finding optimal reconfiguration strategies for a class of reconfigurable fault-tolerant computer systems in which failed components are not repaired. The problem of finding optimal reconfiguration strategies consists of determining, for each failed state of the system, the operational state into which the system should reconfigure itself. We present a stochastic model for this class of reconfigurable computer systems. Based on this model, we construct a polynomial-time algorithm for finding optimal reconfiguration strategies.

Book Chapter · DOI
01 Jan 1986
TL;DR: A new methodology for controlling and managing a complex distributed communication system is presented, together with a proposed model of its real-time behavior and the recommended alternative for its implementation.
Abstract: A new methodology for controlling and managing a complex distributed communication system is presented here. A model that describes the real-time behavior is proposed, together with the recommended alternative for its implementation. The research is based on a rational design approach, supported by a simulation study, for evaluating the relative performance of different strategies. The main problem arises from constraints on the observability of signals by all distributed units. This creates an uncertainty gap that results in an incoherent view of the system state across its units. The uncertainty gap can be minimized by a new control architecture: the communication coordinator section in every unit should carry out all of the mechanistic management activities, leaving only strategic decisions to the control section. Another promising method for reducing the uncertainty gap is to integrate the reconfiguration technique into the distributed system, which allows the system to adapt to a varying work profile. The simulation study indicated a potential bottleneck in the centralized scheduling approach and recommended the dynamic allocation method as the preferred one: the unit with the minimal queue is selected to run a new task, provided that the source unit's queue has reached a threshold level above its average queue length.
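The recommended dynamic allocation rule is stated concretely enough to sketch. A new task stays at its source unit unless that unit's queue has reached a threshold above its average length, and in that case it goes to the unit with the shortest queue. The abstract does not say how the average is estimated, so the exponentially weighted average below is an assumption, as are the function names and the example values.

```python
def update_average(avg, queue_length, alpha=0.1):
    """Exponentially weighted running average of a unit's queue length (assumed estimator)."""
    return (1 - alpha) * avg + alpha * queue_length

def choose_unit(queues, averages, source, threshold):
    """Run the new task at `source` unless its queue is at least `threshold`
    above its average; otherwise select the unit with the minimal queue."""
    if queues[source] >= averages[source] + threshold:
        return min(queues, key=queues.get)
    return source

queues = {"u1": 9, "u2": 2, "u3": 4}          # hypothetical current queue lengths
averages = {"u1": 5.0, "u2": 3.0, "u3": 4.0}  # hypothetical average queue lengths
target = choose_unit(queues, averages, source="u1", threshold=3)
```

Here `u1` is 4 tasks above its average, which exceeds the threshold of 3, so the task goes to `u2`, the unit with the shortest queue. With `u1` at 6 tasks, the task would stay local. The threshold keeps lightly loaded units from generating transfer traffic.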

Journal Article
TL;DR: Particular emphasis is laid upon reliability improvement techniques, based on duplication of the ring subsystem components, and the automatic reconfiguration algorithm, which determines network configuration when faults occur.

Journal Article · DOI
TL;DR: An approximation formula is derived for the probability of failure of fault-tolerant process-control computers using finite-state Markov models, which capture the dynamic behavior of component failure and system recovery.
Abstract: An approximation formula is derived for the probability of failure for fault-tolerant process-control computers. These computers use redundancy and reconfiguration to achieve high reliability. Finite-state Markov models capture the dynamic behavior of component failure and system recovery, and the approximation formula permits an estimation of system reliability by an easy examination of the model.