
Showing papers on "Overhead (computing) published in 1986"


Proceedings Article
25 Aug 1986
TL;DR: The Gamma prototype shows how parallelism can be controlled with minimal control overhead through a combination of the use of algorithms based on hashing and the pipelining of data between processes.
Abstract: In this paper, the design, implementation techniques, and initial performance evaluation of Gamma are presented. Gamma is a new relational database machine that exploits dataflow query processing techniques. Gamma is a fully operational prototype consisting of 20 VAX 11/750 computers. The design of Gamma is based on an earlier multiprocessor database machine prototype (DIRECT) and several years of subsequent research on the problems raised by the DIRECT prototype. In addition to demonstrating that parallelism can really be made to work in a database machine context, the Gamma prototype shows how parallelism can be controlled with minimal control overhead through a combination of the use of algorithms based on hashing and the pipelining of data between processes. Except for 2 messages to initiate each operator of a query tree and 1 message when the operator terminates, the execution of a query is entirely self-scheduling. 52 refs., 12 figs.
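To illustrate the hash-based parallelism the abstract describes, here is a minimal Python sketch (not Gamma code): tuples are hash-partitioned on the join attribute so that each node can join its own partitions independently, with no coordination beyond the initial split. The relation layout and all names are assumptions for illustration.

```python
# Illustrative sketch of a hash-partitioned parallel join, in the spirit of
# the abstract; not taken from the Gamma implementation.

def hash_partition(tuples, key, num_nodes):
    """Assign each tuple to a node by hashing its join attribute."""
    partitions = [[] for _ in range(num_nodes)]
    for t in tuples:
        partitions[hash(t[key]) % num_nodes].append(t)
    return partitions

def local_hash_join(r_part, s_part, key):
    """Each node joins only its own partitions; no cross-node coordination."""
    table = {}
    for r in r_part:
        table.setdefault(r[key], []).append(r)
    return [(r, s) for s in s_part for r in table.get(s[key], [])]

if __name__ == "__main__":
    R = [{"id": i, "a": i * 10} for i in range(8)]
    S = [{"id": i, "b": i + 100} for i in range(0, 8, 2)]
    nodes = 4
    r_parts = hash_partition(R, "id", nodes)
    s_parts = hash_partition(S, "id", nodes)
    results = [pair for n in range(nodes)
               for pair in local_hash_join(r_parts[n], s_parts[n], "id")]
    print(len(results), "joined pairs")   # 4
```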

402 citations


Journal ArticleDOI
TL;DR: It appears that the overhead due to fine-grain parallelism can be made acceptable by sophisticated compiling and employing special hardware for the storage of data structures, and some of the objections raised against the dataflow approach are discussed.
Abstract: Dataflow machines are programmable computers of which the hardware is optimized for fine-grain data-driven parallel computation. The principles and complications of data-driven execution are explained, as well as the advantages and costs of fine-grain parallelism. A general model for a dataflow machine is presented and the major design options are discussed.Most dataflow machines described in the literature are surveyed on the basis of this model and its associated technology. For general-purpose computing the most promising dataflow machines are those that employ packet-switching communication and support general recursion. Such a recursion mechanism requires an extremely fast mechanism to map a sparsely occupied virtual space to a physical space of realistic size. No solution has yet proved fully satisfactory.A working prototype of one processing element is described in detail. On the basis of experience with this prototype, some of the objections raised against the dataflow approach are discussed. It appears that the overhead due to fine-grain parallelism can be made acceptable by sophisticated compiling and employing special hardware for the storage of data structures. Many computing-intensive programs show sufficient parallelism. In fact, a major problem is to restrain parallelism when machine resources tend to get overloaded. Another issue that requires further investigation is the distribution of computation and data structures over the processing elements.
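As a concrete illustration of data-driven execution, the following toy Python interpreter fires a node as soon as tokens are present on all of its inputs. It is a hedged sketch of the general principle, not a model of any specific machine surveyed in the paper.

```python
# Minimal sketch of data-driven (dataflow) execution: a node fires when all of
# its operands have arrived. Names and structure are illustrative only.

from collections import deque

class Node:
    def __init__(self, name, op, arity):
        self.name, self.op, self.arity = name, op, arity
        self.tokens = []       # operands received so far
        self.consumers = []    # downstream nodes fed by this node's result

def run(initial_tokens):
    ready = deque()
    for node, value in initial_tokens:
        node.tokens.append(value)
        if len(node.tokens) == node.arity:
            ready.append(node)
    while ready:               # fire any node whose operands are complete
        node = ready.popleft()
        result = node.op(*node.tokens)
        print(f"fired {node.name} -> {result}")
        for consumer in node.consumers:
            consumer.tokens.append(result)
            if len(consumer.tokens) == consumer.arity:
                ready.append(consumer)

# (a + b) * (c + d): two adds feeding one multiply
add1 = Node("add1", lambda x, y: x + y, 2)
add2 = Node("add2", lambda x, y: x + y, 2)
mul = Node("mul", lambda x, y: x * y, 2)
add1.consumers.append(mul)
add2.consumers.append(mul)
run([(add1, 1), (add1, 2), (add2, 3), (add2, 4)])
```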

310 citations


Proceedings Article
01 Jan 1986
TL;DR: This paper addresses two different communication problems in Boolean n-cube configured multiprocessors: 1) broadcasting, i.e., distribution of common data from a single source to all other nodes, and 2) sending personalized data from a single source to all other nodes, and presents a balanced spanning tree algorithm (BST) that offers a lower complexity than the SBT algorithm for Case 2.
Abstract: High communication bandwidth in standard technologies is more expensive to realize than a high rate of arithmetic or logic operations. The effective utilization of communication resources is crucial for good overall performance in highly concurrent systems. In this paper we address two different communication problems in Boolean n-cube configured multiprocessors: 1) broadcasting, i.e., distribution of common data from a single source to all other nodes, and 2) sending personalized data from a single source to all other nodes. The well known spanning tree algorithm obtained by bit-wise complementation of leading zeroes (referred to as the SBT algorithm for Spanning Binomial Tree) is compared with an algorithm using multiple spanning binomial trees (MSBT). The MSBT algorithm offers a potential speed-up over the SBT algorithm by a factor of log2 N. We also present a balanced spanning tree algorithm (BST) that offers a lower complexity than the SBT algorithm for Case 2. The potential improvement is by a factor of 3 log2 N. The analysis takes into account the size of the data sets, the communication bandwidth, and the overhead in communication. We also provide some experimental data for the Intel iPSC/d7.
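For readers unfamiliar with binomial-tree broadcasting on a hypercube, the sketch below shows the standard schedule on which SBT-style algorithms are based: in round i, every node that already holds the message forwards it across dimension i. It is a generic illustration, not the paper's SBT, MSBT, or BST algorithms.

```python
# Hedged sketch of a spanning-binomial-tree broadcast on a Boolean n-cube,
# rooted at node 0. After log2(N) rounds all 2**dimension nodes have the data.

def sbt_broadcast_schedule(dimension):
    """Return, per round, the list of (sender, receiver) pairs."""
    have = {0}                       # node 0 is the source
    rounds = []
    for bit in range(dimension):
        sends = [(node, node ^ (1 << bit)) for node in sorted(have)]
        rounds.append(sends)
        have.update(dst for _, dst in sends)
    return rounds

for i, sends in enumerate(sbt_broadcast_schedule(3)):
    print(f"round {i}: {sends}")
```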

128 citations


Journal ArticleDOI
TL;DR: Two algorithms are proposed for self-testing of embedded RAMs, both of which can detect a large variety of stuck-at and non-stuck-at faults.
Abstract: The authors present a built-in self-test (BIST) method for testing embedded memories. Two algorithms are proposed for self-testing of embedded bedded RAMs, both of which can detect a large variety of stuck-at and non-stuck-at faults. The hardware implementation of the methods requires a hardware test-pattern generator, which produces address, data, and read/write inputs. The output responses of the memory can be compressed by using a parallel input signature analyzer, or they can be compared with expected responses by an output comparator. The layout of memories has been considered in the design of additional BIST circuitry. The authors conclude by evaluating the two schemes on the basis of area overhead, performance degradation, fault coverage, test application time, and testing of self-test circuitry. The BIST overhead is very low and test time is quite short. Six devices, with one of the test schemes, have been manufactured and are in the field.
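As a rough illustration of the kind of deterministic sequence a RAM BIST pattern generator produces, here is a generic MATS+-style march test in Python. It is a sketch for intuition only, under the assumption of a simple word-wide memory model, and is not either of the paper's two algorithms.

```python
# Generic march-style memory test (illustrative, not the paper's algorithms):
# write 0s ascending, read-0/write-1 ascending, read-1/write-0 descending.

def march_test(read, write, size):
    """Return True if the memory passes; read/write are access callbacks."""
    for addr in range(size):                 # up(w0)
        write(addr, 0)
    for addr in range(size):                 # up(r0, w1)
        if read(addr) != 0:
            return False
        write(addr, 1)
    for addr in reversed(range(size)):       # down(r1, w0)
        if read(addr) != 1:
            return False
        write(addr, 0)
    return True

# Fault-free memory passes; a cell stuck at 0 fails.
mem = [0] * 16
print(march_test(lambda a: mem[a], lambda a, v: mem.__setitem__(a, v), 16))

stuck = [0] * 16
def stuck_write(a, v):
    if a != 5:                               # cell 5 is stuck at 0
        stuck[a] = v
print(march_test(lambda a: stuck[a], stuck_write, 16))
```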

96 citations


Journal ArticleDOI
Joseph W. H. Liu1
TL;DR: Experimental results on practical problems indicate that the amount of savings in overhead storage can be substantial when compared with Sherman's compressed column storage scheme.
Abstract: For a given sparse symmetric positive definite matrix, a compact row-oriented storage scheme for its Cholesky factor is introduced. The scheme is based on the structure of an elimination tree defined for the given matrix. This new storage scheme has the distinct advantage of having the amount of overhead storage required for indexing always bounded by the number of nonzeros in the original matrix. The structural representation may be viewed as storing the minimal structure of the given matrix that will preserve the symbolic Cholesky factor. Experimental results on practical problems indicate that the amount of savings in overhead storage can be substantial when compared with Sherman's compressed column storage scheme.
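The storage scheme rests on the elimination tree of the matrix. The sketch below computes that tree with the standard path-compression approach; it is a minimal illustration of the underlying structure, not the paper's storage scheme itself. Here parent[j] is the row index of the first subdiagonal nonzero in column j of the Cholesky factor.

```python
# Sketch: elimination tree of a sparse symmetric matrix from its lower
# triangular nonzero pattern (illustrative; not code from the paper).

def elimination_tree(n, lower_pattern):
    """lower_pattern[i]: columns k < i where A[i, k] is nonzero."""
    parent = [-1] * n
    ancestor = [-1] * n
    for i in range(n):
        for k in lower_pattern[i]:
            # climb from column k to the root of its current subtree,
            # compressing the path, then hang that root under row i
            while ancestor[k] != -1 and ancestor[k] != i:
                next_k = ancestor[k]
                ancestor[k] = i
                k = next_k
            if ancestor[k] == -1:
                ancestor[k] = i
                parent[k] = i
    return parent

# 5x5 "arrowhead" pattern: row 4 has nonzeros in columns 0..3
pattern = [[], [], [], [], [0, 1, 2, 3]]
print(elimination_tree(5, pattern))   # [4, 4, 4, 4, -1]
```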

94 citations


Journal ArticleDOI
TL;DR: 'ADMS+/-' is an advanced data base management system whose architecture integrates the ADMS+ mainframe data base system with a large number of work station data base systems, designated ADMS-; no communications exist between these work stations.
Abstract: 'ADMS+/-' is an advanced data base management system whose architecture integrates the ADMS+ mainframe data base system with a large number of work station data base systems, designated ADMS-; no communications exist between these work stations. The use of this system radically decreases the response time of locally processed queries, since the work station runs in a single-user mode, and no dynamic security checking is required for the downloaded portion of the data base. The deferred update strategy used reduces overhead due to update synchronization in message traffic.

94 citations


Journal ArticleDOI
TL;DR: This paper presents a graph-theoretic model for determining upper and lower bounds on the number of checks needed for achieving concurrent fault detection and location, and estimates the overhead in time and thenumber of processors required for such a scheme.
Abstract: An important consideration in the design of high-performance multiple processor systems should be ensuring the correctness of results computed by such complex systems, which are extremely prone to transient and intermittent failures. The detection and location of faults and errors concurrently with normal system operation can be achieved through the application of appropriate on-line checks on the results of the computations. This is the domain of algorithm-based fault tolerance, which deals with low-cost system-level fault-tolerance techniques to produce reliable computations in multiple processor systems, by tailoring the fault-tolerance techniques toward specific algorithms. This paper presents a graph-theoretic model for determining upper and lower bounds on the number of checks needed for achieving concurrent fault detection and location. The objective is to estimate the overhead in time and the number of processors required for such a scheme. Faults in processors, errors in the data, and checks on the data to detect and locate errors are represented as a tripartite graph. Bounds on the time and processor overhead are obtained by considering a series of subproblems. First, using some crude concepts for t-fault detection and t-fault location, bounds on the maximum size of the error patterns that can arise from such fault patterns are obtained. Using these results, bounds are derived on the number of checks required for error detection and location. Some numerical results are derived from a linear programming formulation.
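A toy version of the underlying model may help: processors produce data elements, checks inspect subsets of those elements, and a fault is detected if some check covers at least one element the faulty processor produced. The Python sketch below tests single-fault detection for a given set of checks; the data structures and names are illustrative assumptions, not the paper's formulation or its bounds.

```python
# Illustrative check-coverage test in the spirit of algorithm-based fault
# tolerance models (assumed structure; not the paper's tripartite-graph bounds).

def detects_all_single_faults(produces, checks):
    """produces[p]: set of data items computed by processor p;
       checks: list of sets of data items each check inspects."""
    for p, items in produces.items():
        # a fault in p is detected if some check covers one of its items
        if not any(items & c for c in checks):
            return False, p
    return True, None

produces = {0: {"d0", "d1"}, 1: {"d2"}, 2: {"d3", "d4"}}
print(detects_all_single_faults(produces, [{"d0", "d2"}, {"d3"}]))  # (True, None)
print(detects_all_single_faults(produces, [{"d0", "d2"}]))          # (False, 2)
```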

87 citations


Book ChapterDOI
14 Jul 1986
TL;DR: A backtracking algorithm for AND-Parallelism and its implementation at the Abstract Machine level are presented and a generalized version of Restricted AND-Parallelism (RAP) is introduced as characteristic of this class.
Abstract: A backtracking algorithm for AND-Parallelism and its implementation at the Abstract Machine level are presented: first, a class of AND-Parallelism models based on goal independence is defined, and a generalized version of Restricted AND-Parallelism (RAP) is introduced as characteristic of this class. A simple and efficient backtracking algorithm for RAP is then discussed. An implementation scheme is presented for this algorithm which offers minimum overhead, while retaining the performance and storage economy of sequential implementations and taking advantage of goal independence to avoid unnecessary backtracking ("restricted intelligent backtracking"). Finally, the implementation of backtracking in sequential and AND-Parallel systems is explained through a number of examples.

70 citations


Journal ArticleDOI
TL;DR: It is illustrated that a learning controller can dynamically acquire relevant job scheduling information by a process of trial and error, and use that information to provide good performance.

49 citations


Book ChapterDOI
01 Sep 1986
TL;DR: This paper describes how a program may be analyzed statically to determine which literals and predicates are functional, and how the program may then be optimized using this information.
Abstract: While the ability to simulate nondeterminism and return multiple outputs for a single input is a powerful and attractive feature of Prolog, it is expensive both in time and space. Since Prolog programs are very often functional, i.e. do not produce more than one distinct output for a single input, this overhead is especially undesirable. This paper describes how a program may be analyzed statically to determine which literals and predicates are functional, and how the program may then be optimized using this information. Our notion of “functionality” subsumes the notion of “determinacy” that has been considered by various researchers. Our algorithms are less reliant on features such as cut, and thus extend more easily to parallel execution strategies than others that have been proposed.

49 citations


Journal ArticleDOI
TL;DR: A new technique for designing easily testable PLA's is presented that consists of the addition of input lines in such a way that, in test mode, any single product line can be activated and its associated circuitry and device can be tested.
Abstract: A new technique for designing easily testable PLA's is presented. The salient features of this technique are: 1) low overhead, 2) high fault coverage, 3) simple design, and 4) little or no impact on normal operation of PLA's. This technique consists of the addition of input lines in such a way that, in test mode, any single product line can be activated and its associated circuitry and device can be tested. Using this technique, all multiple stuck-at faults, as well as all multiple extra and multiple missing device faults, are detected.


01 Jan 1986
TL;DR: This paper provides a detailed performance analysis of the implementation for the BBN Butterfly Parallel Processor of the LYNX distributed programming language and provides insight into the likely costs of other message-passing systems, both present and future.
Abstract: Conventional wisdom holds that message-passing is orders of magnitude more expensive than shared memory for communication between parallel processes. Differences in the speed of underlying hardware mechanisms fail to account for a substantial portion of the performance gap. The remainder is generally attributed to the "inevitable cost" of higher-level semantics, but a deeper understanding of the factors that contribute to message-passing overhead has not been forthcoming. In this paper we provide a detailed performance analysis of one message-passing system: the implementation for the BBN Butterfly Parallel Processor of the LYNX distributed programming language. The case study includes a description of the implementation, an explanation of optimizations employed to improve its performance, and a detailed breakdown of remaining costs. The data provide a direct measure of the expense of individual features in LYNX. They also provide insight into the likely costs of other message-passing systems, both present and future. Lessons gained from our experience should be of use to other researchers in performing similar studies.

01 Jan 1986
TL;DR: It is demonstrated that hash-partitioned query processing algorithms can serve as a basis for a highly parallel, high performance relational database machine and that such parallelism can be controlled with minimal overhead using dataflow query processing techniques that pipeline data between highly autonomous, distributed processes.
Abstract: In this thesis, we demonstrate that hash-partitioned query processing algorithms can serve as a basis for a highly parallel, high performance relational database machine. In addition to demonstrating that parallelism can really be made to work in a database machine context, we will show that such parallelism can be controlled with minimal overhead using dataflow query processing techniques that pipeline data between highly autonomous, distributed processes. For this purpose, we present the design, implementation techniques, and initial performance evaluation of Gamma, a new relational database machine. Gamma is a fully operational prototype consisting of 20 VAX 11/750 computers. The Gamma architecture illustrates that a high performance database machine can be constructed without the assistance of special purpose hardware components. Finally, a simulation model of Gamma is presented that accurately reflects the measured performance of the actual Gamma prototype. Using this simulation model, we explore the performance of Gamma for large multiprocessor systems with varying hardware capabilities.

Journal ArticleDOI
TL;DR: In this paper, the authors present a parallel execution model for Horn Clause logic programs based on the generator-consumer approach, which can be implemented efficiently with small run-time overhead.
Abstract: This paper presents a parallel execution model for exploiting AND-parallelism in Horn Clause logic programs. The model is based upon the generator-consumer approach, and can be implemented efficiently with small run-time overhead. Other related models that have been proposed to minimize the run-time overhead are unable to exploit the full parallelism inherent in the generator-consumer approach. Furthermore, our model performs backtracking more intelligently than these models. We also present two implementation schemes to realize our model: one has a coordinator to control the activities of processes solving different literals in the same clause; and the other achieves synchronization by letting processes pass messages to each other in a distributed fashion. Trade-offs between these two schemes are then discussed.

Journal ArticleDOI
TL;DR: A method for OR-parallel execution of Prolog on a multiprocessor system that allows many processing elements to process simultaneously a common branch of a search tree and each of these PEs creates its local environment and selects a subtree for processing without communication.
Abstract: Based on extending the sequential execution model of Prolog to include parallel execution, we present a method for OR-parallel execution of Prolog on a multiprocessor system. The method reduces the overhead incurred by parallel processing. It allows many processing elements (PEs) to process simultaneously a common branch of a search tree, and each of these PEs creates its local environment and selects a subtree for processing without communication. The run-time overhead is small: simple and efficient operations for selecting the proper subtree. Communication is necessary only when some PEs have exhausted their search spaces and there are others still searching for solutions. The method is able to utilize most of the technology devised for sequential implementation of Prolog. It is optimized for an architecture which supports broadcast copying.

Book
01 Jan 1986
TL;DR: This dissertation presents algorithms that support persistent search trees, with applications in computational geometry, and a general result is shown that allows making arbitrary ephemeral data structures partially persistent with an O(1) space overhead per update operation.
Abstract: This dissertation introduces the concept of persistence in data structures. Classical algorithms operate on data structures in such a manner that modifications to the structure do not preserve its state as it appeared before the modification. A persistent data structure is one in which multiple versions of the structure as it varies through time are maintained. Data structures that do not maintain the history of states of the structure are called ephemeral. A differentiation between two types of persistence, partial persistence and full persistence, is made. A partially persistent data structure allows the modification only of the most recent version of the structure. This makes partial persistence useful in cases where the history of update operations is required for query purposes but no changes of prior versions are desired. Under certain constraints, any ephemeral data structure may be made persistent without a major blow-up of the space and time complexity measures. Full persistence allows modification of any version of the data structure. This dissertation presents algorithms that support persistent search trees, with applications in computational geometry. In particular, the planar point location problem will be solved using persistent binary search trees with an O(log n) query time and O(n) space. Persistent lists are described, with applications in applicative programming languages. In particular, persistent deques are presented that have constant space overhead per deque operation, while still maintaining O(1) update times. Persistent finger search trees are also presented, with applications in text editing. Persistent finger search trees are implemented with an O(log d) space overhead per update, and an O(log d) time bound, where d is the distance between the finger and the affected position. A general result is shown that allows making arbitrary ephemeral data structures partially persistent with an O(1) space overhead per update operation.
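For intuition, the sketch below shows the simplest way to obtain partial persistence in a binary search tree, path copying: each insert builds a new root and copies only the nodes on the search path, so earlier versions remain queryable. This costs O(log n) space per update; the dissertation's O(1)-overhead techniques are more sophisticated and are not reproduced here.

```python
# Minimal path-copying persistent BST (illustrative sketch only).

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return the root of a new version; the old version is untouched."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    return Node(root.key, root.left, insert(root.right, key))

def contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

versions = [None]                      # version 0 is the empty tree
for k in [5, 2, 8, 1]:
    versions.append(insert(versions[-1], k))
print(contains(versions[2], 8), contains(versions[4], 8))   # False True
```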

Patent
24 Nov 1986
TL;DR: A child's riding wagon, including a wagon body supported on wheels, one or two chairs on the body fitted with safety belts, a removable overhead canopy and a telescopic handle for a person to pull the wagon, can be found in this article.
Abstract: A child's riding wagon, including a wagon body supported on wheels, one or two chairs on the body fitted with safety belts, a removable overhead canopy and a telescopic handle for a person to pull the wagon.

Journal ArticleDOI
TL;DR: The software virtual machine concept is described as a methodology to reduce the manpower required to implement and maintain finite element software and Planned extensions of capabilities in the SVM used by the authors are outlined.
Abstract: The software virtual machine (SVM) concept is described as a methodology to reduce the manpower required to implement and maintain finite element software. A SVM provides the engineering programmer with high‐level languages to facilitate the structuring and management of data, to define and interface process modules, and to manage computer resources. A prototype finite element system has been successfully implemented using the SVM approach. Development effort is significantly reduced compared to a conventional all‐FORTRAN approach. The impact on execution efficiency of the SVM is described along with special procedures developed to minimize overhead in compute‐bound modules. Planned extensions of capabilities in the SVM used by the authors are outlined.

Journal ArticleDOI
C. Goerg1
TL;DR: This paper analyzes SRPT with constant overhead time C_V modified by a constant preemption gap C_P ≳ C_V for M/G/1 systems and shows a considerable reduction of the mean delay time T_D, especially for service time distributions with high coefficients of variation.
Abstract: Considering queueing systems in data networks, one can identify a special feature of the communication system: the service time, which is proportional to message length, is known in advance. This is a precondition for queueing strategies such as SPT (shortest processing time first) or SRPT (shortest remaining processing time first), which offers the shortest mean delay time among all conceivable strategies. For a practical evaluation of these strategies it is essential to include the influence of overhead time, especially for the preemptive SRPT strategy. This paper analyzes SRPT with constant overhead time C_V modified by a constant preemption gap C_P ≳ C_V for M/G/1 systems. The comparison of the FIFO, SPT, SRPT, and RR (round robin) strategies using different service time distributions with coefficients of variation > 1 and overhead shows, for SRPT and for a combination of SRPT and RR, a considerable reduction of the mean delay time T_D, especially for service time distributions with high coefficients of variation. This result indicates the potential advantages of these strategies in data network applications.
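A rough discrete-event simulation can make the comparison concrete. The Python sketch below contrasts FIFO with preemptive SRPT on a single server under a high-variance service distribution, and optionally charges a fixed cost every time the server switches jobs as a crude stand-in for preemption overhead. The parameters and the way overhead is charged are assumptions for illustration, not the paper's C_V / C_P model.

```python
# Rough single-server simulation: FIFO vs preemptive SRPT mean time in system,
# with an optional fixed cost per job switch (illustrative assumptions only).

import random

def simulate(srpt, n_jobs=20000, load=0.8, switch_cost=0.0, seed=1):
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(load)               # mean service time is 1.0
        svc = rng.expovariate(10.0) if rng.random() < 0.9 else rng.expovariate(1 / 9.1)
        arrivals.append((t, svc))                # high coefficient of variation
    now, i, pending, delays, last = 0.0, 0, [], [], None
    while len(delays) < n_jobs:
        if not pending:                          # server idle: jump ahead
            now = max(now, arrivals[i][0])
        while i < n_jobs and arrivals[i][0] <= now:
            pending.append([arrivals[i][1], arrivals[i][0], i])
            i += 1
        job = min(pending) if srpt else min(pending, key=lambda x: x[1])
        if switch_cost and last is not None and job[2] != last:
            now += switch_cost                   # pay a fixed cost on each switch
        last = job[2]
        horizon = arrivals[i][0] if i < n_jobs else float("inf")
        work = max(0.0, min(job[0], horizon - now))
        job[0] -= work
        now += work
        if job[0] <= 1e-12:
            pending.remove(job)
            delays.append(now - job[1])
    return sum(delays) / n_jobs

for name, use_srpt, oh in [("FIFO", False, 0.0), ("SRPT", True, 0.0), ("SRPT+overhead", True, 0.02)]:
    print(f"{name:>14}: mean time in system = {simulate(use_srpt, switch_cost=oh):.2f}")
```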

Journal ArticleDOI
TL;DR: This work investigates how to translate path expressions, originally proposed for process synchronization at the monitor level in software, directly into hardware, and finds that if the structure of the path expression allows partitioning, the circuit can be laid out in a distributed fashion without additional area overhead.
Abstract: Path expressions were originally proposed by Campbell and Habermann [2] as a mechanism for process synchronization at the monitor level in software. Not surprisingly, they also provide a useful notation for specifying the behavior of asynchronous circuits. Motivated by these potential applications we investigate how to directly translate path expressions into hardware. Our implementation is complicated in the case of multiple path expressions by the need for synchronization on event names that are common to more than one path. Moreover, since events are inherently asynchronous in our model, all of our circuits must be self-timed. Nevertheless, the circuits produced by our construction have area proportional to N · log(N), where N is the total length of the multiple path expression under consideration. This bound holds regardless of the number of individual paths or the degree of synchronization between paths. Furthermore, if the structure of the path expression allows partitioning, the circuit can be laid out in a distributed fashion without additional area overhead.

Journal ArticleDOI
01 Aug 1986
TL;DR: A new, fully-distributed protocol for integrated voice/data traffic in a local-area, random-access broadcast network is described, which introduces a movable voice-data boundary to framed TDMA/CSMA and eliminates the requirement of system-wide synchronized clocks.
Abstract: A new, fully-distributed protocol for integrated voice/data traffic in a local-area, random-access broadcast network is described. The protocol introduces a movable voice-data boundary to framed TDMA/CSMA and eliminates the requirement of system-wide synchronized clocks. The movable boundary is a major advantage in any system where fluctuations in voice and data loads are expected because assignment of idle capacity from one traffic class to the other increases the utilization of the channel. The protocol provides collision-free virtual circuits for voice and periods of non-persistent CSMA/CD for data traffic and call establishment, and can support multi-party calls as well as two-way conversations. The protocol allows variable-size voice packets that have very low overhead and variable-size data packets that may be much longer than voice packets. This is of significant practical advantage over previous work, which has required fixed-size voice and/or data packets, or voice packets with high overhead. A method of dynamically controlling the movable boundary to balance the voice and data traffic is also proposed.
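As a toy illustration of the movable voice-data boundary, the sketch below reassigns whatever frame capacity is not needed by active voice circuits to the data period in each frame. The frame length and slot granularity are made-up parameters; the protocol's actual framing, clocking, and CSMA/CD behavior are not modeled.

```python
# Toy movable-boundary frame partition (illustrative parameters only).

FRAME_SLOTS = 24

def partition_frame(active_voice_calls, slots_per_call=1):
    """Voice circuits take what they need; the rest goes to the data period."""
    voice_slots = min(FRAME_SLOTS, active_voice_calls * slots_per_call)
    data_slots = FRAME_SLOTS - voice_slots      # boundary moves with voice load
    return voice_slots, data_slots

for calls in [0, 6, 18, 30]:
    v, d = partition_frame(calls)
    print(f"{calls:2d} calls -> voice {v:2d} slots, data {d:2d} slots")
```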

Journal ArticleDOI
TL;DR: Can exceptions be implemented for Ada without imposing overhead on normal execution?
Abstract: Can exceptions be implemented for Ada without imposing overhead on normal execution? Yes, as long as certain rules are followed.
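One common way to avoid overhead on normal execution is a table-driven scheme: the compiler emits a static table mapping code ranges to handlers, and the table is consulted only when an exception is actually raised. The Python sketch below illustrates that lookup; the addresses and handler names are invented for illustration, and the article's specific rules for Ada are not reproduced here.

```python
# Sketch of table-driven ("zero-cost") exception dispatch: normal execution
# never touches the table; only a raise triggers the lookup. Illustrative only.

import bisect

# (range_start, range_end, handler_label), sorted by range_start
HANDLER_TABLE = [
    (0x1000, 0x1040, "handler_cleanup_A"),
    (0x1040, 0x10a0, None),                  # no local handler: propagate
    (0x10a0, 0x1100, "handler_log_and_retry"),
]
STARTS = [entry[0] for entry in HANDLER_TABLE]

def find_handler(pc):
    """Binary-search the static table for the handler covering this PC."""
    idx = bisect.bisect_right(STARTS, pc) - 1
    if idx < 0:
        return None
    start, end, handler = HANDLER_TABLE[idx]
    return handler if start <= pc < end else None

print(find_handler(0x1008))   # handler_cleanup_A
print(find_handler(0x1050))   # None -> unwind to the caller's table
```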


Journal ArticleDOI
01 Jan 1986
TL;DR: The solution of polynomial systems of equations via a globally convergent homotopy algorithm on a hypercube and some timing results for different situations are considered.
Abstract: Comparisons between problems solved on uniprocessor systems and those solved on distributed computing systems generally ignore the overhead associated with information transfer from one process to another. This paper considers the solution of polynomial systems of equations via a globally convergent homotopy algorithm on a hypercube and some timing results for different situations.


30 Jun 1986
TL;DR: Two distributed algorithms that use timestamps to synchronize concurrent updates in replicated database systems are developed, and the property of data replication is exploited to replace a set of model parameters by a single parameter, reducing the number of necessary simulation runs.
Abstract: We develop two distributed algorithms to synchronize concurrent updates in replicated database systems. Both algorithms use timestamps to perform synchronization. The first algorithm uses knowledge of the readsets and writesets of updates to enhance concurrency. It avoids update restarts by having all sites exchange the writeset and timestamp of an update prior to executing it. We study a variant of this algorithm which reduces blocking delays due to conflicting access by creating multiple versions of data objects. Consequently, it has smaller update response time and higher system throughput. However, it requires more storage because multiple versions of data objects may be kept. The second algorithm is based on a fully-distributed approach to update execution where each site completely executes every update. When a site receives an update from a user, it is transported to all other sites. After that, each site executes the update completely without any exchange of computed values. The fully-distributed nature of the second algorithm imparts to it several features such as higher resiliency to different kinds of failures, higher parallelism, fast response to user updates, and low communication overhead. We present a queueing network model of a replicated database system and use it in analytic and simulation performance studies. We exploit the property of data replication to replace a set of model parameters by a single parameter, which reduces the number of necessary simulation runs. We analytically study the performance of the first algorithm using an iterative and approximate technique. With the help of numerical examples, we show that the analytical solution predicts the values of the performance measures with fairly good accuracy and predicts the shape of the performance curves very accurately. We study and compare the performance of our concurrency control algorithms with the performance of some existing concurrency control algorithms [24, 47] using an event-driven simulator. Results of the performance study show that our algorithms have better performance (i.e., better response time and throughput characteristics).
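For background, the sketch below shows a basic timestamp-ordering admission check of the kind such synchronization algorithms build on: an operation is allowed only if its transaction's timestamp is consistent with the latest read and write timestamps recorded on the object. This is a generic textbook rule, not either of the thesis's two algorithms.

```python
# Basic timestamp-ordering check (illustrative; not the thesis's algorithms).

class DataObject:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0
        self.value = None

def try_read(obj, ts):
    if ts < obj.write_ts:              # a younger write already happened
        return False
    obj.read_ts = max(obj.read_ts, ts)
    return True

def try_write(obj, ts, value):
    if ts < obj.read_ts or ts < obj.write_ts:
        return False                   # would invalidate a younger read/write
    obj.write_ts, obj.value = ts, value
    return True

x = DataObject()
print(try_write(x, ts=5, value="v1"))                      # True
print(try_read(x, ts=3))                                   # False: older than last write
print(try_read(x, ts=7), try_write(x, ts=6, value="v2"))   # True False
```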

Journal ArticleDOI
TL;DR: It is shown that SEF can be used to reduce the associated error rate to insignificant levels, and it incurs a lower area overhead than known techniques.
Abstract: Soft errors caused by ionizing radiation will be a limiting factor in the reliability of VLSI circuits with submicron-feature sizes. A new approach to the design of soft-error-tolerant digital integrated circuits is presented. It is based on the filtering of transients at register inputs, and it incurs a lower area overhead than known techniques. The method, called soft-error filtering (SEF), is derived on the basis of the analogy between a noise-sensitive finite-state machine and a noisy communication channel. The necessary characteristics of the register are examined and a design is presented for the associated filter. It is shown that SEF can be used to reduce the associated error rate to insignificant levels.

Patent
12 Dec 1986
TL;DR: In this paper, an n-butane/isobutane splitter is operated by compressing the isobutane overhead to increase its condensing temperature, using the compressed overhead to heat bottoms in a reboiler, which is operated to condense the overhead, and cooling the condensed overhead to a temperature no lower than the temperature on the top tray of the splitter and no higher than 20° F. above it.
Abstract: An n-butane/isobutane splitter is operated by compressing the isobutane overhead to increase its condensing temperature, using the compressed overhead to heat bottoms in a reboiler, which is operated to condense the overhead and cooling the condensed overhead to a temperature no lower than the temperature on the top tray of the splitter and no higher than 20° F. above the temperature on the top tray, whereby the throughput of the splitter is increased by 10 to 20%.