
Showing papers on "Overhead (computing) published in 1992"


Journal ArticleDOI
TL;DR: A modified Hopfield neural network model for regularized image restoration is presented, which allows negative autoconnections for each neuron and allows a neuron to have a bounded time delay to communicate with other neurons.
Abstract: A modified Hopfield neural network model for regularized image restoration is presented. The proposed network allows negative autoconnections for each neuron. A set of algorithms using the proposed neural network model is presented, with various updating modes: sequential updates; n-simultaneous updates; and partially asynchronous updates. The sequential algorithm is shown to converge to a local minimum of the energy function after a finite number of iterations. Since an algorithm which updates all n neurons simultaneously is not guaranteed to converge, a modified algorithm is presented, which is called a greedy algorithm. Although the greedy algorithm is not guaranteed to converge to a local minimum, the l1 norm of the residual at a fixed point is bounded. A partially asynchronous algorithm is presented, which allows a neuron to have a bounded time delay to communicate with other neurons. Such an algorithm can eliminate the synchronization overhead of synchronous algorithms.
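
A minimal sketch of the kind of sequential, neuron-by-neuron update such energy-minimizing networks use, assuming a quadratic energy E(x) = 0.5 xᵀAx − bᵀx with integer-valued neurons; the matrix A, vector b, and step rule below are illustrative, not the paper's exact restoration model or its n-simultaneous and greedy variants:

```python
import numpy as np

def sequential_update(A, b, x, max_sweeps=100):
    """Sequentially move each neuron by +/-1 if that strictly lowers
    E(x) = 0.5 * x^T A x - b^T x; stop when a full sweep makes no change.
    Illustrative sketch only (A assumed symmetric)."""
    x = x.astype(float)
    for _ in range(max_sweeps):
        changed = False
        grad = A @ x - b                      # partial derivatives dE/dx_i
        for i in range(len(x)):
            for delta in (+1.0, -1.0):
                # Energy change for x_i -> x_i + delta
                dE = delta * grad[i] + 0.5 * A[i, i] * delta**2
                if dE < 0:
                    x[i] += delta
                    grad += delta * A[:, i]   # keep the gradient consistent
                    changed = True
                    break
        if not changed:
            break
    return x

# Tiny example: pull a 3-"pixel" signal toward b under a smoothing matrix A.
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
b = np.array([4.0, 0.0, 4.0])
print(sequential_update(A, b, np.zeros(3)))   # converges to [4. 4. 4.]
```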

233 citations


Journal ArticleDOI
23 Aug 1992
TL;DR: This work adopts randomized algorithms as the main approach to parametric query optimization and enhances them with a sideways information passing feature that increases their effectiveness in the new task, without much sacrifice in output quality and with essentially zero run-time overhead.
Abstract: In most database systems, the values of many important run-time parameters of the system, the data, or the query are unknown at query optimization time. Parametric query optimization attempts to identify at compile time several execution plans, each one of which is optimal for a subset of all possible values of the run-time parameters. The goal is that at run time, when the actual parameter values are known, the appropriate plan should be identifiable with essentially no overhead. We present a general formulation of this problem and study it primarily for the buffer size parameter. We adopt randomized algorithms as the main approach to this style of optimization and enhance them with a sideways information passing feature that increases their effectiveness in the new task. Experimental results of these enhanced algorithms show that they optimize queries for large numbers of buffer sizes in the same time needed by their conventional versions for a single buffer size, without much sacrifice in the output quality and with essentially zero run-time overhead.
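
To illustrate the run-time side of this idea: the optimizer is assumed here to have produced, at compile time, a set of plans each valid over a range of buffer sizes, so that run time reduces to a table lookup. The plan names and thresholds below are invented for illustration and are not from the paper:

```python
import bisect

# Hypothetical compile-time output: one plan per buffer-size range
# (breakpoints sorted ascending). Plans and thresholds are made up.
BREAKPOINTS = [0, 64, 512, 4096]            # buffer sizes (pages)
PLANS = ["nested-loop", "sort-merge", "hash-join", "hybrid-hash"]

def choose_plan(buffer_pages: int) -> str:
    """Pick the precomputed plan whose parameter range contains the actual
    run-time buffer size: essentially zero run-time overhead."""
    i = bisect.bisect_right(BREAKPOINTS, buffer_pages) - 1
    return PLANS[i]

print(choose_plan(100))    # -> "sort-merge"
print(choose_plan(10000))  # -> "hybrid-hash"
```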

190 citations


Journal ArticleDOI
TL;DR: An analytical performance model for multithreaded processors that includes cache interference, network contention, context-switching overhead, and data-sharing effects is presented and indicates that processors can substantially benefit from multithreading, even in systems with small caches, provided sufficient network bandwidth exists.
Abstract: An analytical performance model for multithreaded processors that includes cache interference, network contention, context-switching overhead, and data-sharing effects is presented. The model is validated through the author's simulations and by comparison with previously published simulation results. The results indicate that processors can substantially benefit from multithreading, even in systems with small caches, provided sufficient network bandwidth exists. Caches that are much larger than the working-set sizes of individual processes yield close to full processor utilization with as few as two to four contexts. Smaller caches require more contexts to keep the processor busy, while caches that are comparable in size to the working sets of individual processes cannot achieve high utilization regardless of the number of contexts. Increased network contention due to multithreading has a major effect on performance. The available network bandwidth and the context-switching overhead limit the best possible utilization.
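
A common first-order model of multithreaded processor utilization (an assumed simplification, not the paper's full model, which also captures cache interference and network contention) treats each context as running for R cycles, paying a switch cost C, and then waiting L cycles on a remote reference:

```python
def utilization(p, R, L, C):
    """First-order multithreading model (illustrative assumption): p contexts,
    run length R, memory latency L, context-switch cost C. The processor is
    saturated when the other p-1 contexts can hide the latency L."""
    if (p - 1) * (R + C) >= L:          # latency fully hidden
        return R / (R + C)              # only switch overhead remains
    return p * R / (R + C + L)          # linear region: processor still idles

for p in (1, 2, 4, 8):
    print(p, round(utilization(p, R=40, L=200, C=10), 3))
```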

188 citations


DissertationDOI
01 Jan 1992
TL;DR: It is shown how the BH algorithm can be adapted to execute in parallel, and the performance of the parallel version of the algorithm is analyzed, finding that the overhead is due primarily to interprocessor synchronization delays and redundant computation.
Abstract: Recent algorithmic advances utilizing hierarchical data structures have resulted in a dramatic reduction in the time required for computer simulation of N-body systems with long-range interactions. Computations which required O(N^2) operations can now be done in O(N log N) or O(N). We review these tree methods and find that they may be distinguished based on a few simple features. The Barnes-Hut (BH) algorithm has received a great deal of attention, and is the subject of the remainder of the dissertation. We present a generalization of the BH tree and analyze the statistical properties of such trees in detail. We also consider the expected number of operations entailed by an execution of the BH algorithm. We find an optimal value for m, the maximum number of bodies in a terminal cell, and confirm that the number of operations is O(N log N), even if the distribution of bodies is not uniform. The mathematical basis of all hierarchical methods is the multipole approximation. We discuss multipole approximations for the case of arbitrary, spherically symmetric, and Newtonian Green's functions. We describe methods for computing multipoles and evaluating multipole approximations in each of these cases, emphasizing the tradeoff between generality and algorithmic complexity. N-body simulations in computational astrophysics can require 10^6 or even more bodies. Algorithmic advances are not sufficient, in and of themselves, to make computations of this size feasible. Parallel computation offers, a priori, the necessary computational power in terms of speed and memory. We show how the BH algorithm can be adapted to execute in parallel. We use orthogonal recursive bisection to partition space. The logical communication structure that emerges is that of a hypercube. A local version of the BH tree is constructed in each processor by iteratively exchanging data along each edge of the logical hypercube. We obtain speedups in excess of 380 on a 512-processor system for simulations of galaxy mergers with 180,000 bodies. We analyze the performance of the parallel version of the algorithm and find that the overhead is due primarily to interprocessor synchronization delays and redundant computation. Communication is not a significant factor.
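
A compact sketch of the core Barnes-Hut force evaluation, assuming a prebuilt tree whose nodes store total mass and center of mass; the opening criterion s/d < theta and the monopole-only approximation are the textbook form, not the dissertation's generalized multipoles or its parallel decomposition:

```python
import numpy as np

class Node:
    """BH tree node: either a leaf holding one body or an internal cell."""
    def __init__(self, size, mass, com, children=()):
        self.size = size                    # side length s of the cell
        self.mass = mass                    # total mass of bodies inside
        self.com = np.asarray(com, float)   # center of mass
        self.children = children            # empty tuple for a leaf

def accel(node, pos, theta=0.5, eps=1e-3, G=1.0):
    """Gravitational acceleration at `pos` using the opening test s/d < theta."""
    d = node.com - pos
    r = np.linalg.norm(d) + eps
    if not node.children or node.size / r < theta:
        # Far enough away (or a leaf): treat the whole cell as a point mass.
        return G * node.mass * d / r**3
    # Otherwise open the cell and recurse into its children.
    return sum((accel(c, pos, theta, eps, G) for c in node.children),
               np.zeros(3))

# Toy tree: one cell containing two unit-mass leaves.
leaf1 = Node(0.0, 1.0, [1.0, 0.0, 0.0])
leaf2 = Node(0.0, 1.0, [2.0, 0.0, 0.0])
root = Node(4.0, 2.0, [1.5, 0.0, 0.0], (leaf1, leaf2))
print(accel(root, np.array([10.0, 0.0, 0.0])))
```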

166 citations


Proceedings ArticleDOI
12 May 1992
TL;DR: The authors suggest that this methodology must reach a point of diminishing returns, and hence focus on explicit error detection and correction, and suggest that robust mapping requires little overhead beyond that needed for nonrobust mapping.
Abstract: An issue that must be addressed in map-learning systems is that of error accumulation. The primary emphasis in the literature has been on reducing errors entering the map. The authors suggest that this methodology must reach a point of diminishing returns, and hence focus on explicit error detection and correction. By identifying the possible types of mapping errors, structural constraints can be exploited to detect and diagnose mapping errors. Such robust mapping requires little overhead beyond that needed for nonrobust mapping. A mapping system was implemented based on those ideas. Extensive testing in simulation demonstrated the effectiveness of the proposed error-correction strategies.

158 citations


Proceedings ArticleDOI
05 Oct 1992
TL;DR: A novel algorithm for checkpointing and rollback recovery in distributed systems is presented that includes the damage assessment phase, unlike previous schemes that either assume that an error is detected immediately after it occurs (fail-stop) or simply ignore the damage caused by imperfect detection mechanisms.
Abstract: A novel algorithm for checkpointing and rollback recovery in distributed systems is presented. Processes belonging to the same program must periodically take a nonblocking coordinated global checkpoint, but only a minimum overhead is imposed during normal computation. Messages can be delivered out of order, and the processes are not required to be deterministic. The nonblocking structure is an important characteristic for avoiding laying a heavy burden on the application programs. The method also includes the damage assessment phase, unlike previous schemes that either assume that an error is detected immediately after it occurs (fail-stop) or simply ignore the damage caused by imperfect detection mechanisms. A possible way to evaluate the error detection latency, which enables one to assess the damage made and avoid the propagation of errors, is presented.
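
As a hedged, single-process illustration of the basic checkpoint-and-rollback pattern (not the paper's nonblocking coordinated protocol or its damage-assessment phase), the sketch below periodically snapshots state and rolls back to the last snapshot when an error is detected:

```python
import copy, random

def run_with_checkpoints(steps=20, interval=5, fail_prob=0.15):
    state = {"counter": 0}                   # application state (illustrative)
    checkpoint = copy.deepcopy(state)
    i = 0
    while i < steps:
        state["counter"] += 1                # one unit of "normal computation"
        if random.random() < fail_prob:      # error detected
            state = copy.deepcopy(checkpoint)        # roll back
            i = (i // interval) * interval           # redo work since checkpoint
            continue
        i += 1
        if i % interval == 0:                # take a checkpoint
            checkpoint = copy.deepcopy(state)
    return state

random.seed(0)
print(run_with_checkpoints())                # ends with counter == steps
```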

148 citations


Journal ArticleDOI
TL;DR: Algorithms for generating NC tool paths for machining of arbitrarily shaped 2 1/2-dimensional pockets with arbitrary islands are described, based on a new offsetting algorithm presented in this paper.
Abstract: In this paper we describe algorithms for generating NC tool paths for machining of arbitrarily shaped 2 1/2-dimensional pockets with arbitrary islands. These pocketing algorithms are based on a new offsetting algorithm presented in this paper. Our offsetting algorithm avoids costly two-dimensional Boolean set operations, relatively expensive distance calculations, and the overhead of extraneous geometry, such as the Voronoi diagrams used in other pocketing algorithms.

124 citations


Proceedings ArticleDOI
01 Jul 1992
TL;DR: A technique is proposed that can convert most existing lock-based blocking data structure algorithms into nonblocking algorithms with the same functionality and with no penalty in the amount of concurrency that was available in the original data structure.
Abstract: Nonblocking algorithms for concurrent data structures guarantee that a data structure is always accessible. This is in contrast to blocking algorithms in which a slow or halted process can render part or all of the data structure inaccessible to other processes. This paper proposes a technique that can convert most existing lock-based blocking data structure algorithms into nonblocking algorithms with the same functionality. Our instruction-by-instruction transformation can be applied to any algorithm having the following properties: interprocess synchronization is established solely through the use of locks, and there is no possibility of deadlock (e.g., because of a well-ordering among the lock requests). In contrast to previous work, our transformation requires only a constant amount of overhead per operation and, in the absence of failures, it incurs no penalty in the amount of concurrency that was available in the original data structure. The techniques in this paper may obviate the need for a wholesale reinvention of techniques for nonblocking concurrent data structure algorithms.
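
For flavor, here is a sketch of a classic nonblocking (lock-free) stack built around compare-and-swap retry loops. This is a generic Treiber-style example, not the paper's lock-to-nonblocking transformation, and the CAS below is emulated with a tiny critical section only because Python lacks a user-level atomic CAS:

```python
import threading

class Cell:
    __slots__ = ("value", "next")
    def __init__(self, value, next):
        self.value, self.next = value, next

class LockFreeStack:
    """Treiber-style stack: operations retry a compare-and-swap on `top`
    instead of holding a lock across the whole update."""
    def __init__(self):
        self.top = None
        self._cas_guard = threading.Lock()   # stand-in for a hardware CAS

    def _cas_top(self, expected, new):
        with self._cas_guard:                # emulated atomic compare-and-swap
            if self.top is expected:
                self.top = new
                return True
            return False

    def push(self, value):
        while True:                          # retry loop; never waits on others
            old = self.top
            if self._cas_top(old, Cell(value, old)):
                return

    def pop(self):
        while True:
            old = self.top
            if old is None:
                return None
            if self._cas_top(old, old.next):
                return old.value

s = LockFreeStack()
s.push(1); s.push(2)
print(s.pop(), s.pop())   # -> 2 1
```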

123 citations


Patent
18 Mar 1992
TL;DR: In this patent, a data transmission from the virtual space of a process in one cluster to the virtual space of a process in another cluster is executed without copying the data to the buffer provided within the operating system.
Abstract: In a parallel computer, in order to reduce the overhead of data transmissions between the processes, a data transmission from the virtual space of a process in a certain cluster to the virtual space of a process in another cluster is executed without copying the data to the buffer provided within the operating system. The real communication area resident in the real memory is provided in a part of the virtual space of the process, and an identifier unique within the cluster is given to the communication area. When the transmission process has issued a transmission instruction at the time of data transmission, the cluster address of the cluster in which the transmission destination process exists and the identifier of the communication area are determined based on the name of the transmission destination process. Then, the data is directly transmitted between the mutual real communication areas of the transmission originating process and the transmission destination process. Overhead for the data transmission between the processes can be reduced by avoiding making a copy of the data between the user space and the buffer provided within the operating system at the time of data transmission between the processes.
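
The general idea of avoiding an extra copy through an operating-system buffer can be sketched with named shared memory. This is only a user-space analogy (Python's multiprocessing.shared_memory), not the patent's cluster-addressed communication areas:

```python
from multiprocessing import Process, shared_memory
import numpy as np

def receiver(name, n):
    # Attach to the same physical pages by name: no copy through a pipe buffer.
    shm = shared_memory.SharedMemory(name=name)
    view = np.ndarray((n,), dtype=np.float64, buffer=shm.buf)
    print("receiver sees:", view[:4])
    shm.close()

if __name__ == "__main__":
    n = 1024
    shm = shared_memory.SharedMemory(create=True, size=n * 8)
    data = np.ndarray((n,), dtype=np.float64, buffer=shm.buf)
    data[:] = np.arange(n)                 # "transmission" = writing in place
    p = Process(target=receiver, args=(shm.name, n))
    p.start(); p.join()
    shm.close(); shm.unlink()
```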

106 citations


Journal ArticleDOI
TL;DR: A tutorial overview of how selected computer-vision-related algorithms can be mapped onto reconfigurable parallel-processing systems is presented, and it is demonstrated how reconfigurability can be used by reviewing and examining five computer-vision-related algorithms, each one emphasizing a different aspect of reconfigurability.
Abstract: A tutorial overview of how selected computer-vision-related algorithms can be mapped onto reconfigurable parallel-processing systems is presented. The reconfigurable parallel-processing system assumed for the discussions is a multiprocessor system capable of mixed-mode parallelism; that is, it can operate in either the SIMD or MIMD modes of parallelism and can dynamically switch between modes at instruction-level granularity with generally negligible overhead. In addition, it can be partitioned into independent or communicating submachines, each having the same characteristics as the original machine. Furthermore, this reconfigurable system model uses a flexible multistage cube interconnection network, which allows the connection patterns among the processors to be varied. It is demonstrated how reconfigurability can be used by reviewing and examining five computer-vision-related algorithms, each one emphasizing a different aspect of reconfigurability.

98 citations


Journal ArticleDOI
TL;DR: The authors study the degradation in receiver performance caused by actual interblock channel variation, and offer a measure of the rate of channel time-variations, and give guidelines for deciding when those variations can be considered slow.
Abstract: In the design of block-oriented digital communication systems that must operate over time-dispersive channels, it is usually assumed that the channel is constant over the duration of a data block, even if the channel fades. The authors study the degradation in receiver performance caused by actual interblock channel variation. For tractability, attention is restricted to the case in which the channel variations are not tracked using decision-directed adaptation. The results suggest that unless the system parameters are carefully chosen, the constant-channel assumption is far from accurate. The quantitative results also offer a measure of the rate of channel time-variations, and give guidelines for deciding when those variations can be considered slow. These guidelines can be used as first-order evaluators of important system design decisions, such as total block size, training overhead, and data rate, for particular channel conditions.
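
A common first-order check of the constant-channel assumption (a rule of thumb, not the paper's analysis) compares the block duration with the channel coherence time, Tc ≈ 0.423 / fD, where fD is the maximum Doppler frequency; the carrier, speed, symbol rate, block length, and margin below are made-up illustrative numbers:

```python
def coherence_time(f_carrier_hz, speed_m_s, c=3e8):
    """Rule-of-thumb coherence time Tc ~ 0.423 / fD (illustrative)."""
    f_doppler = speed_m_s * f_carrier_hz / c
    return 0.423 / f_doppler

def block_seems_constant(block_symbols, symbol_rate, f_carrier_hz, speed_m_s,
                         margin=0.1):
    """Treat the channel as roughly constant if the block spans only a small
    fraction of the coherence time (margin is an assumed design choice)."""
    t_block = block_symbols / symbol_rate
    return t_block <= margin * coherence_time(f_carrier_hz, speed_m_s)

# 900 MHz carrier, 100 km/h, 24.3 ksym/s, 156-symbol blocks (made-up numbers)
print(block_seems_constant(156, 24.3e3, 900e6, 100 / 3.6))
```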

Journal ArticleDOI
08 Jul 1992
TL;DR: The loop-breaking algorithm identifies self-loops in a design generated by a high-level synthesis system and eliminates as many of these loops as possible by altering the register and module bindings; BINET with test cost is a binding algorithm that takes the cost of testing into account during the binding phase of high-level synthesis.
Abstract: The authors propose an algorithm for module and register binding which generates register-transfer-level (RTL) designs having low testability cost. They also present an algorithm for altering the register and module binding to reduce testability overhead in the final design. These algorithms were coded and several experiments were conducted to check their performance. The results of these experiments are described. The study shows that the designs produced by the method in almost all the cases have reduced testability overhead.

Proceedings ArticleDOI
01 Jul 1992
TL;DR: A fundamental relationship between three quantities that characterize an irregular parallel computation is shown: the total available parallelism, the optimal grain size, and the statistical variance of execution times for individual tasks, which yields a dynamic scheduling algorithm that substantially reduces the overhead of executing irregular parallel operations.
Abstract: This paper develops a methodology for compiling and executing irregular parallel programs. Such programs implement parallel operations whose size and work distribution depend on input data. We show a fundamental relationship between three quantities that characterize an irregular parallel computation: the total available parallelism, the optimal grain size, and the statistical variance of execution times for individual tasks. This relationship yields a dynamic scheduling algorithm that substantially reduces the overhead of executing irregular parallel operations. We incorporated this algorithm into an extended Fortran compiler. The compiler accepts as input a subset of Fortran D which includes blocked and cyclic decompositions and perfect alignment; it outputs Fortran 77 augmented with calls to library routines written in C. For irregular parallel operations, the compiled code gathers information about available parallelism and task execution time variance and uses this information to schedule the operation. On distributed memory architectures, the compiler encodes information about data access patterns for the runtime scheduling system so that it can preserve communication locality. We evaluated these compilation techniques using a set of application programs, including climate modeling, circuit simulation, and x-ray tomography, that contain irregular parallel operations. The results demonstrate that, for these applications, the dynamic techniques described here achieve near-optimal efficiency on large numbers of processors. In addition, they perform significantly better, on these problems, than any previously proposed static or dynamic scheduling algorithm.
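
The variance/grain-size relationship can be illustrated with a simple heuristic (an assumption-laden sketch, not the paper's scheduling algorithm): grouping g tasks into a chunk shrinks the relative spread of chunk execution times roughly by 1/sqrt(g), so one can pick the smallest grain that drives the expected imbalance below a target while keeping enough chunks per processor:

```python
import math

def choose_grain(mean_t, std_t, n_tasks, n_procs, target_imbalance=0.05):
    """Pick a grain size g so that the relative std-dev of a chunk's runtime,
    (std_t / mean_t) / sqrt(g), falls below target_imbalance, while still
    leaving at least a few chunks per processor. Illustrative heuristic only."""
    cv = std_t / mean_t                            # coefficient of variation
    g = max(1, math.ceil((cv / target_imbalance) ** 2))
    g = min(g, max(1, n_tasks // (4 * n_procs)))   # keep >= ~4 chunks per proc
    return g

print(choose_grain(mean_t=1.0, std_t=2.0, n_tasks=100_000, n_procs=64))
```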

Book ChapterDOI
01 Jan 1992
TL;DR: A membership protocol is described that is based on a multicast facility that preserves only the partial order of messages exchanged among the communicating processes and requires less synchronization overhead than existing protocols.
Abstract: Membership information is used to provide a consistent, system-wide view of which processes are currently functioning or failed in a distributed computation. This paper describes a membership protocol that is used to maintain this information. Our protocol is novel because it is based on a multicast facility that preserves only the partial order of messages exchanged among the communicating processes. Because it depends only on a partial ordering of messages rather than a total ordering, our protocol requires less synchronization overhead. The advantages of our approach are especially pronounced if multiple failures occur concurrently.

Journal ArticleDOI
TL;DR: The motivation for the RAP is described, and it is shown how the architecture matches the target algorithm; peak performance on the error back-propagation algorithm is reduced to about 50% of a linear speedup.

Proceedings ArticleDOI
03 Feb 1992
TL;DR: A quorum-based method which is highly fault tolerant and has a low message overhead is proposed; the method can trade off fault tolerance for lower message overhead and is compared to existing algorithms.
Abstract: The problem of managing replicated copies of data in a distributed database is considered. Quorum consensus methods for managing replicated data require that an operation proceed only if a group of copies form a quorum. For example, in a majority voting scheme, for a write operation to proceed, a majority of the copies have to form a quorum. The authors first introduce a performance measure for measuring the performance of fault-tolerant algorithms for this problem. They then propose a quorum-based method which is highly fault tolerant and has a low message overhead. The algorithm can trade off fault tolerance for lower message overhead. The algorithm is compared to existing algorithms.
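
The core quorum-intersection requirement behind such schemes can be sketched directly; this is the generic weighted-voting condition, not the authors' specific construction:

```python
def valid_quorum_sizes(n, r, w):
    """Classic voting conditions for n copies: r + w > n guarantees every read
    quorum intersects every write quorum (reads see the latest write), and
    2*w > n prevents two disjoint write quorums from committing concurrently."""
    return (r + w > n) and (2 * w > n)

# Majority voting on 5 copies: read 3, write 3.
print(valid_quorum_sizes(5, 3, 3))   # True
# Read-one/write-all: cheap reads, expensive and less fault-tolerant writes.
print(valid_quorum_sizes(5, 1, 5))   # True
print(valid_quorum_sizes(5, 2, 2))   # False: quorums may miss each other
```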

01 Mar 1992
TL;DR: The authors introduce a simple communication mechanism, active messages, which allows communication to overlap computation and coordinates the two without sacrificing processor cost/performance; existing message-passing multiprocessors, by contrast, have unnecessarily high communication overhead.
Abstract: The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) to allow communication to overlap computation, and (3) to coordinate the two without sacrificing processor cost/performance. We show that existing message passing multiprocessors have unnecessarily high communication costs. Research prototypes of message-driven machines demonstrate low communication overhead, but poor processor cost/performance. We introduce a simple communication mechanism, Active Messages, show that it is intrinsic to both architectures, allows cost-effective use of the hardware, and offers tremendous flexibility. Implementations on nCUBE/2 and CM-5 are described and evaluated using a split-phase shared-memory extension to C, Split-C. We further show that active messages are sufficient to implement the dynamically scheduled languages for which message-driven machines were designed. With this mechanism, latency tolerance becomes a programming/compiling concern. Hardware support for active messages is also discussed.
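
The essence of the mechanism is that each message carries a reference to a user-level handler that the receiver runs immediately on the message data, with no buffering or scheduling layer in between. A toy simulation of that dispatch (the handler names and "network" here are invented for illustration, not the nCUBE/2 or CM-5 implementation):

```python
from collections import deque

# A toy "network": a queue of (handler, args) pairs; the handler reference
# travels in the message header, as in active messages.
network = deque()

def am_send(handler, *args):
    network.append((handler, args))

def am_poll():
    """Receiver side: pull each message and run its handler immediately,
    integrating the data into the ongoing computation (no buffering layer)."""
    while network:
        handler, args = network.popleft()
        handler(*args)

# Example handler: a split-phase remote accumulate into a local table.
table = {}
def accumulate_handler(key, value):
    table[key] = table.get(key, 0) + value

am_send(accumulate_handler, "x", 3)
am_send(accumulate_handler, "x", 4)
am_poll()
print(table)   # {'x': 7}
```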

Journal ArticleDOI
TL;DR: In this paper, the authors present the results of investigations into a new fault location technique for overhead power distribution systems based on detecting fault-induced high-frequency components on distribution lines, which should enable the detection of discharges from the low-level breakdown of insulators, which cannot be detected by conventional methods.
Abstract: The authors present the results of investigations into a new fault location technique for overhead power distribution systems. The scheme is based on detecting fault-induced high-frequency components on distribution lines. This should enable the detection of discharges from the low-level breakdown of insulators, which cannot be detected by conventional methods. The location of a fault is determined by appropriate signal processing of the generated signals on the line. Simulation results are used to illustrate the basic features of the performance of the new scheme on a simple radial 11 kV feeder system.

Proceedings ArticleDOI
12 May 1992
TL;DR: The authors propose an indirect adaptive scheme for incompletely controlled mechanical systems such as overhead cranes or, more generally, classical rigid manipulators ended by a simple pendulum, taking advantage of the particular structure of the equations.
Abstract: The authors propose an indirect adaptive scheme for incompletely controlled mechanical systems such as overhead cranes or, more generally, classical rigid manipulators ended by a simple pendulum. In the case of known parameters a dynamic state feedback produces full linearization. An adaptive version is obtained by a simple estimation method together with a certainty equivalence law, taking advantage of the particular structure of the equations. Global stability is discussed, and simulation results for the overhead crane are reported.

Proceedings ArticleDOI
Lee, Wolf, Jha
01 Jan 1992
TL;DR: In this paper, a data path scheduling algorithm to improve testability without assuming any particular test strategy is presented, and a scheduling heuristic for easy testability, based on previous work on data path allocation for testability is introduced.
Abstract: A data path scheduling algorithm to improve testability without assuming any particular test strategy is presented. A scheduling heuristic for easy testability, based on previous work on data path allocation for testability, is introduced. A mobility path scheduling algorithm to implement this heuristic while also minimizing area is developed. Experimental results on benchmark and example circuits show high fault coverage, short test generation time, and little or no area overhead.

Proceedings ArticleDOI
01 Apr 1992
TL;DR: A message controller (MSC) was developed to support low-latency message passing communication for the AP1000 and to minimize message handling overhead; its communication performance is evaluated.
Abstract: Low-latency communication is the key to achieving a high-performance parallel computer. In using state-of-the-art processors, we must take cache memory into account. This paper presents an architecture for low-latency message communication, its implementation, and a performance evaluation. We developed a message controller (MSC) to support low-latency message passing communication for the AP1000, to minimize message handling overhead. MSC sends messages directly from cache memory and automatically receives messages in the circular buffer. We designed communication functions between cells and evaluated communication performance by running benchmark programs such as the Pingpong benchmark, the LINPACK benchmark, the SLALOM benchmark, and a solver using the scaled conjugate gradient method.
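
A minimal ping-pong style round-trip measurement in the spirit of the benchmark mentioned above, but written against ordinary OS pipes rather than the AP1000's MSC hardware; it reports half the average round-trip time as a rough one-way latency:

```python
import time
from multiprocessing import Process, Pipe

def echo(conn, iters):
    for _ in range(iters):
        conn.send_bytes(conn.recv_bytes())   # bounce each message straight back

if __name__ == "__main__":
    iters, payload = 1000, b"x" * 64
    parent, child = Pipe()
    p = Process(target=echo, args=(child, iters))
    p.start()
    t0 = time.perf_counter()
    for _ in range(iters):
        parent.send_bytes(payload)
        parent.recv_bytes()
    dt = time.perf_counter() - t0
    p.join()
    print(f"approx one-way latency: {dt / (2 * iters) * 1e6:.1f} us")
```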

Proceedings ArticleDOI
11 Oct 1992
TL;DR: The authors propose two behavioral synthesis-for-test heuristics, improving observability and controllability of registers and reducing sequential depth between registers, that give a high fault coverage in small amounts of CPU time at a low area overhead.
Abstract: The first behavioral synthesis scheme for improving testability in data path allocation independent of test strategy is presented. The authors propose two behavioral synthesis-for-test heuristics: improve observability and controllability of registers, and reduce sequential depth between registers. Also presented are algorithms that optimize a behavior-level design using these two criteria while minimizing area. Experimental results for benchmark circuits synthesized by the authors' experimental system, PHITS, show that these methods give a high fault coverage in small amounts of CPU time at a low area overhead.

Patent
02 Mar 1992
TL;DR: In a SONET cross connect, the same physical link is used between the interfaces and the matrix to carry the overhead and the payload, and the cross connection function within the matrix may be used to group, concentrate, and route the overhead signals between a server and the matrix.
Abstract: In a SONET cross connect, the same physical link is used between the interfaces and the matrix to carry the overhead and the payload. The cross connection function within the matrix may be used to group, concentrate, and route the overhead signals between a server and the matrix. The matrix may also be used to transport signals between servers. Overhead may be grouped and transported as payload.

Proceedings ArticleDOI
Nowick, Dill
01 Jan 1992
TL;DR: In this article, a method for exact hazard-free logic minimization of Boolean functions is described: given an incompletely specified Boolean function, the method produces a minimal sum-of-products implementation which is hazard-free for a given set of multiple-input changes.
Abstract: A method for exact hazard-free logic minimization of Boolean functions is described. Given an incompletely specified Boolean function, the method produces a minimal sum-of-products implementation which is hazard-free for a given set of multiple-input changes, if such a solution exists. The method is a constrained version of the Quine-McCluskey algorithm. It has been automated and applied to a number of examples. Results are compared with results of a comparable non-hazard-free method (espresso-exact). Overhead due to hazard elimination is shown to be negligible.
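
The flavor of the hazard-freedom constraint can be shown for the simplest case, a static-1 transition under a single input change: the transition is glitch-free only if one product term of the cover stays at 1 across the entire transition cube. The classic example is f = x·y + x'·z, which has a hazard when x falls with y = z = 1; adding the consensus term y·z removes it. (The paper's method handles multiple-input changes; the check below is only this single-input special case.)

```python
from itertools import product

def term_holds_on_cube(term, fixed, free_vars):
    """True if `term` (dict var -> required literal value) evaluates to 1 for
    every assignment of the transitioning variables `free_vars`."""
    for combo in product((0, 1), repeat=len(free_vars)):
        assign = dict(fixed, **dict(zip(free_vars, combo)))
        if any(assign[v] != val for v, val in term.items()):
            return False
    return True

def static1_hazard_free(cover, fixed, free_vars):
    """A static-1 transition is hazard-free iff some single term covers it."""
    return any(term_holds_on_cube(t, fixed, free_vars) for t in cover)

cover = [{"x": 1, "y": 1}, {"x": 0, "z": 1}]              # f = x y + x' z
fixed = {"y": 1, "z": 1}                                  # y = z = 1 held constant
print(static1_hazard_free(cover, fixed, ["x"]))           # False: hazard on x change
print(static1_hazard_free(cover + [{"y": 1, "z": 1}],     # add consensus term y z
                          fixed, ["x"]))                  # True: hazard-free
```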

Journal ArticleDOI
TL;DR: The probability distribution of the overhead caused by the use of the checkpointing rollback recovery technique is evaluated; this distribution is obtained in Laplace-Stieltjes transform form, from which all the moments can be easily calculated, and checkpointing strategies based on it are proposed.
Abstract: The probability distribution of the overhead caused by the use of the checkpointing rollback recovery technique is evaluated in both cases of a single critical task and of an overall transaction-oriented system. This distribution is obtained in Laplace-Stieltjes transform form, from which all the moments can be easily calculated. Alternatively, inversion methods can be used to evaluate the distribution. The authors propose checkpointing strategies based on the above distribution in order to optimize performance criteria motivated, in the case of critical tasks, by real-time constraints, and, in the case of transaction-oriented systems, by the need to guarantee users a bound on maximum system unavailability.
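
A widely used first-order result in this area, included here as a hedged illustration rather than the paper's transform-based analysis, is Young's approximation for the checkpoint interval given the checkpoint cost C and the mean time between failures M:

```python
import math

def optimal_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation: T_opt ~ sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# e.g., a 30 s checkpoint cost and a 6 h mean time between failures
print(f"{optimal_checkpoint_interval(30, 6 * 3600):.0f} s")   # ~1138 s
```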

Patent
24 Dec 1992
TL;DR: In this patent, the inventors present a family of radix-2 structures for the computation of the DFT of a discrete signal of N samples, where u (u = 2^r, r = 1, 2, ..., (log2 N) - 1) specifies the size of each data vector applied at the two input nodes of a butterfly and v represents the number of consecutive stages of the structure whose multiplication operations are merged partially or fully.
Abstract: Since the invention of the radix-2 structure for the computation of the discrete Fourier transform (DFT) by Cooley and Tukey in 1965, the DFT has been widely used for the frequency-domain analysis and design of signals and systems in communications, digital signal processing, and in other areas of science and engineering. While the Cooley-Tukey structure is simpler, regular, and efficient, it has some drawbacks such as more complex multiplications than required by higher-radix structures, and the overhead operations of bit-reversal and data-swapping. The present invention provides a large family of radix-2 structures for the computation of the DFT of a discrete signal of N samples. A member of this set of structures is characterized by two parameters, u and v, where u (u = 2^r, r = 1, 2, ..., (log2 N) - 1) specifies the size of each data vector applied at the two input nodes of a butterfly and v represents the number of consecutive stages of the structure whose multiplication operations are merged partially or fully. It is shown that the nature of the problem of computing the DFT is such that the sub-family of the structures with u = 2 suits best for achieving its solution. These structures have the features that eliminate or reduce the drawbacks of the Cooley-Tukey structure while retaining its simplicity and regularity. A comprehensive description of the two most useful structures from this sub-family along with their hardware implementations is presented.
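
For reference, the textbook radix-2 decimation-in-time recursion that both the Cooley-Tukey structure and the patented variants build on; this is the standard algorithm, not the patent's u,v-parameterized structures:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + tw              # butterfly: two outputs per pair
        out[k + n // 2] = even[k] - tw
    return out

x = [1, 2, 3, 4]
print([round(abs(v), 6) for v in fft_radix2(x)])   # magnitudes of the 4-point DFT
```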

Proceedings ArticleDOI
04 Nov 1992
TL;DR: The basic REcomputing with Duplication With Comparison error-detecting adder proposed by Johnson is extended to perform error correction and time redundant multipliers that can detect and correct errors are also proposed in this paper.
Abstract: Time redundancy is an approach to achieve fault-tolerance without introducing too much hardware overhead and can be used in applications where time is not critical. The basic REcomputing with Duplication With Comparison error-detecting adder proposed by Johnson is extended to perform error correction. Time redundant multipliers that can detect and correct errors are also proposed in this paper. The hardware overhead of time redundant error correcting adders and multipliers is much lower than that of hardware or information redundancy approaches. Hence they are useful in systems where hardware complexity is the primary concern.

Proceedings ArticleDOI
01 Jun 1992
TL;DR: An implementation based on “partially versioned” index sets is proposed; the authors argue that its space overhead and query-time performance make it suitable for full-text IR, with its heavy dependence on inverted indexing.
Abstract: In this paper, we present an approach to the incorporation of object versioning into a distributed full-text information retrieval system. We propose an implementation based on “partially versioned” index sets, arguing that its space overhead and query-time performance make it suitable for full-text IR, with its heavy dependence on inverted indexing. We develop algorithms for computing both historical queries and time range queries and show how these algorithms can be applied to a number of problems in distributed information management, such as data replication, caching, transactional consistency, and hybrid media repositories.

Book ChapterDOI
23 Mar 1992
TL;DR: A hybrid approach to indexing text is proposed, and it is shown how it can outperform the traditional inverted B-tree index in storage overhead, in time to perform a retrieval, and, for dynamic databases, in time for an insertion.
Abstract: Due to the skewed nature of the frequency distribution of term occurrence (e.g., Zipf's law) it is unlikely that any single technique for indexing text can do well in all situations. In this paper we propose a hybrid approach to indexing text, and show how it can outperform the traditional inverted B-tree index in storage overhead, in time to perform a retrieval, and, for dynamic databases, in time for an insertion, for both single-term and multiple-term queries. We demonstrate the benefits of our technique on a database of stories from the Associated Press news wire, and we provide formulae and guidelines on how to make optimal choices of the design parameters in real applications.

Proceedings ArticleDOI
10 May 1992
TL;DR: An approach that maintains the high throughput of previous schemes, yet needs lower hardware overhead and achieves higher fault coverage, is proposed, and results for the different schemes are shown.
Abstract: Algorithm-based fault tolerance (ABFT), a low-overhead technique for incorporating fault tolerance into multiprocessor architectures, is treated. An approach that maintains the high throughput of previous schemes, yet needs lower hardware overhead and achieves higher fault coverage, is proposed. Results for the different schemes are shown.
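
The canonical ABFT example, checksum-encoded matrix multiplication, conveys the low-overhead idea: encode checksum rows/columns once, multiply as usual, and verify the result's checksums afterward. This is a generic sketch of the technique, not the specific schemes compared in this paper:

```python
import numpy as np

def abft_matmul(A, B, tol=1e-8):
    """Multiply column-checksum(A) by row-checksum(B); the product carries
    checksums that detect (and can help localize) an erroneous element."""
    Ac = np.vstack([A, A.sum(axis=0)])                  # append column-checksum row
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # append row-checksum column
    C = Ac @ Br
    data = C[:-1, :-1]                                  # the ordinary product A @ B
    row_ok = np.allclose(C[:-1, -1], data.sum(axis=1), atol=tol)
    col_ok = np.allclose(C[-1, :-1], data.sum(axis=0), atol=tol)
    return data, (row_ok and col_ok)

A = np.random.rand(3, 3); B = np.random.rand(3, 3)
C, ok = abft_matmul(A, B)
print("checksums consistent:", ok)                      # True when no fault occurs
```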