
Showing papers on "Overhead (computing) published in 1993"


Journal ArticleDOI
TL;DR: There was a significant saccadic overhead, that is, less time was required with the serial format, which allowed data access without eye movements, but the magnitude of the overhead decreased as task complexity increased.
Abstract: Information-processing time was compared for serial and spatially distributed visual presentations with performance measures that permit the separation of total time into its during-display and post-display components. For all subjects, there was a significant saccadic overhead, that is, less time was required with the serial format, which allowed data access without eye movements. However, the magnitude of the overhead decreased as task complexity increased. All subjects were able to exercise some control over the distribution of total processing time, trading off short during-display times with longer post-display times and vice versa.

365 citations


Book ChapterDOI
12 Aug 1993
TL;DR: A new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; a proof that performing fusion to maximize data locality is NP-hard; and two polynomial-time algorithms for improving data locality.
Abstract: Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that performing fusion to maximize data locality is NP-hard; and (3) two polynomial-time algorithms for improving data locality. These techniques also apply to loop distribution, which is shown to be essentially equivalent to loop fusion. Our approach is general enough to support other fusion heuristics. Preliminary experimental results validate our approach for improving performance by exploiting data locality and increasing the granularity of parallelism.
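As a concrete, minimal illustration of the transformation itself (not of the paper's fusion algorithm), the sketch below shows two loops over the same range, each of which would end in a barrier in a parallel setting, fused into one loop so that one synchronization point disappears and b[i] is reused while it is still in cache. The C rendering and function name are ours.

    /* Before fusion: two loops, each followed by a barrier, with b written
     * in the first loop and re-read in the second:
     *
     *   for (i = 0; i < n; i++) b[i] = a[i] * 2.0;    (barrier)
     *   for (i = 0; i < n; i++) c[i] = b[i] + a[i];   (barrier)
     *
     * After fusion: one loop, one barrier, and b[i] is reused immediately. */
    void fused_loops(int n, const double *a, double *b, double *c)
    {
        for (int i = 0; i < n; i++) {
            b[i] = a[i] * 2.0;
            c[i] = b[i] + a[i];
        }
    }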

280 citations


Journal ArticleDOI
TL;DR: It is shown that on large problems—those for which parallel processing is ideally suited—there is often enough parallel workload so that processors are not usually idle, and the method is within a constant factor of optimal.
Abstract: This paper analytically studies the performance of a synchronous conservative parallel discrete-event simulation protocol. The class of models considered simulates activity in a physical domain, and possesses a limited ability to predict future behavior. Using a stochastic model, it is shown that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation, sometimes rapidly. The method is therefore within a constant factor of optimal. The result holds for the worst case “fully-connected” communication topology, where an event in any portion of the domain can cause an event in any other portion of the domain. Our analysis demonstrates that on large problems—those for which parallel processing is ideally suited—there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed-memory multiprocessor.

202 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: Research is described that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated and can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
Abstract: Dynamic storage allocation is used heavily in many application areas including interpreters, simulators, optimizers, and translators. We describe research that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated. Using five significant, allocation-intensive C programs, we show that a large fraction of all bytes allocated are short-lived (> 90% in all cases). Furthermore, we describe an algorithm for lifetime prediction that accurately predicts the lifetimes of 42–99% of all objects allocated. We describe and simulate a storage allocator that takes advantage of lifetime prediction of short-lived objects and show that it can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
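A minimal sketch of the kind of allocator such a scheme implies, assuming a bump-pointer arena for objects predicted to be short-lived; the predictor stub, the arena size, and all names here are hypothetical, not the paper's interface.

    #include <stdlib.h>

    /* Objects predicted short-lived come from a cheap bump-pointer arena that
     * is reclaimed in bulk; everything else falls back to malloc(). */
    typedef struct { char buf[1 << 16]; size_t used; } arena_t;
    static arena_t short_arena;

    /* Hypothetical predictor: the paper drives its prediction from profiled
     * lifetimes per allocation site; this stub just guesses by size. */
    static int predict_short_lived(size_t size) { return size < 64; }

    void *ls_alloc(size_t size)
    {
        size = (size + 15) & ~(size_t)15;              /* keep alignment */
        if (predict_short_lived(size) &&
            short_arena.used + size <= sizeof short_arena.buf) {
            void *p = short_arena.buf + short_arena.used;
            short_arena.used += size;                  /* bump-pointer allocation */
            return p;
        }
        return malloc(size);                           /* long-lived or arena full */
    }

    /* All short-lived storage is reclaimed in one step. */
    void ls_release_short_lived(void) { short_arena.used = 0; }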

169 citations


Journal ArticleDOI
TL;DR: The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence with a small cost for the ease of programming offered by coherent caches and the potential for higher performance.
Abstract: The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. The hardware overhead of directory-based cache coherence in a 48-processor prototype is examined. The data show that the overhead is only about 10-15%, which appears to be a small cost for the ease of programming offered by coherent caches and the potential for higher performance. The performance of the system is discussed, and the speedups obtained by a variety of parallel applications running on the prototype are shown. Using a sophisticated hardware performance monitor, the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup are characterized. The optimizations incorporated in the DASH protocol are evaluated in terms of their effectiveness on parallel applications and on atomic tests that stress the memory system.

151 citations


Proceedings ArticleDOI
07 Nov 1993
TL;DR: An optimized BIST scheme based on reseeding of multiple polynomial Linear Feedback Shift Registers (LFSRs) that allows an excellent trade-off between test data storage and test application time (number of test patterns) with a very small hardware overhead.
Abstract: In this paper we describe an optimized BIST scheme based on reseeding of multiple polynomial Linear Feedback Shift Registers (LFSRs). The same LFSR that is used to generate pseudo-random patterns is loaded with seeds from which it produces vectors that cover the test cubes of difficult-to-test faults. The scheme is compatible with scan design and achieves full coverage, as it is based on random patterns combined with a deterministic test set. A method for processing the test set to allow for efficient encoding by the scheme is described. Algorithms for calculating LFSR seeds from the test set and for the selection and ordering of polynomials are described. Experimental results are provided for ISCAS-89 benchmark circuits to demonstrate the effectiveness of the scheme. The scheme allows an excellent trade-off between test data storage and test application time (number of test patterns) with a very small hardware overhead. We show the trade-off between test data storage and number of test patterns under the scheme.
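A software sketch of the two operating phases of such a scheme, assuming a 16-bit Fibonacci LFSR; the feedback taps, seed values, pattern counts, and the apply() callback are illustrative, not taken from the paper.

    #include <stdint.h>

    /* One LFSR step: XOR the tapped bits and shift the result in. */
    static uint16_t lfsr_step(uint16_t s, uint16_t taps)
    {
        uint16_t t = s & taps;
        t ^= t >> 8; t ^= t >> 4; t ^= t >> 2; t ^= t >> 1;   /* parity of tapped bits */
        return (uint16_t)((s >> 1) | ((t & 1u) << 15));
    }

    /* Phase 1: free-running pseudo-random patterns.  Phase 2: reseed with
     * precomputed seeds, each expanding into a short run of patterns that
     * covers the test cube of a hard-to-test fault. */
    void apply_patterns(void (*apply)(uint16_t), const uint16_t *seeds, int nseeds)
    {
        const uint16_t taps = 0x002D;      /* example taps (bits 0, 2, 3, 5) */
        uint16_t s = 0xACE1;               /* arbitrary nonzero start state */
        for (int i = 0; i < 10000; i++) { apply(s); s = lfsr_step(s, taps); }
        for (int k = 0; k < nseeds; k++) {
            s = seeds[k];                  /* reseed */
            for (int i = 0; i < 16; i++) { apply(s); s = lfsr_step(s, taps); }
        }
    }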

113 citations


Proceedings ArticleDOI
01 Jul 1993
TL;DR: This work shows that for an array affinely aligned to a template, the local memory access sequence for any processor is characterized by a finite state machine of at most k states, and extends the framework to handle multidimensional arrays.
Abstract: Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We show that for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution, and a computation involving the regular section A(l:h:s), the local memory access sequence for any processor is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.
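For reference, the brute-force definition of the local access sequence that the paper's finite state machines compute without scanning: the sketch below simply tests ownership element by element under a cyclic(k) distribution, assuming 0-based indices and the owner rule processor = (t / k) mod p. It is not the paper's algorithm, only the specification it implements efficiently.

    #include <stdio.h>

    /* Enumerate the accesses processor `me` performs for the regular section
     * A(l:h:s), where A(i) is aligned to template cell a*i + b and the
     * template is distributed cyclic(k) over p processors. */
    void local_accesses(int l, int h, int s, int a, int b, int k, int p, int me)
    {
        for (int i = l; i <= h; i += s) {
            int t = a * i + b;                      /* template cell of A(i) */
            if ((t / k) % p == me)                  /* owned by this processor? */
                printf("A(%d) -> local block %d, offset %d\n",
                       i, t / (k * p), t % k);
        }
    }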

113 citations


Proceedings ArticleDOI
22 Jun 1993
TL;DR: The author presents efficient self-checking implementations for adders and ALUs (ripple carry, carry lookahead, carry skip schemes) that are substantially better than any other known scheme.
Abstract: The author presents efficient self-checking implementations for adders and ALUs (ripple carry, carry lookahead, carry skip schemes). Among all the known self-checking adders and ALUs, the parity prediction scheme has the advantage of requiring the minimum overhead for the adder/ALU and the minimum overhead for the other data path blocks. It also has the advantage of being compatible with memory systems checked by parity codes. The drawback of this scheme is that it is not fault secure even for single stuck-at faults. The new designs require lower overhead than the above scheme and also retain all of its other advantages. In addition, the new schemes are strongly fault secure or totally self-checking for a comprehensive fault model which includes stuck-at, stuck-on and stuck-open faults. Thus, the new schemes are substantially better than any other known scheme.
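A software model of the classic parity-prediction check that the paper takes as its baseline (the new designs themselves are gate-level and not reproduced here): for a ripple-carry adder, s_i = a_i XOR b_i XOR c_i, so parity(S) must equal parity(A) XOR parity(B) XOR parity(C), where C is the vector of carries into each bit. In hardware the sum bits and the carry signals come from separate logic, so a single fault on either side breaks the identity and the checker fires.

    #include <stdint.h>

    static unsigned parity32(uint32_t x)
    {
        x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1;
        return x & 1u;
    }

    /* `sum` and `carries` are the values observed on the adder outputs and the
     * carry chain (possibly corrupted by a fault); returns 1 if the check passes.
     * In a fault-free adder, carries == a ^ b ^ (a + b). */
    int parity_predict_check(uint32_t a, uint32_t b, uint32_t sum, uint32_t carries)
    {
        unsigned predicted = parity32(a) ^ parity32(b) ^ parity32(carries);
        return parity32(sum) == predicted;
    }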

111 citations


01 Jan 1993
TL;DR: In this article, the authors present results obtained using Type A and D methods from which it is concluded that simple, reliable and accurate fault locators can now be produced using modern micro-electronic technology.
Abstract: Travelling wave methods of fault location for both underground power cables and overhead power lines have been reported since 1931. Currently few, if any, of the travelling wave overhead line fault location methods which have been reported over the last 60 years are still in service, with only one Type C instrument being available commercially. The authors present results obtained using Type A and D methods from which it is concluded that simple, reliable and accurate fault locators can now be produced using modern micro-electronic technology. The new fault locators can function simultaneously as Type A and D systems. Furthermore the prohibitively high costs of previous implementations have been reduced for both the hardware and, especially, the installation and maintenance.
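The distance estimates behind Type A and Type D travelling-wave locators reduce to two textbook relations, sketched below; v is the surge propagation velocity on the line, and the timing and hardware details of the paper's instruments are not modelled here.

    /* Type A (single-ended): dt is the time between the first wave arriving at
     * the measuring end and its reflection returning from the fault. */
    double fault_distance_type_a(double v, double dt)
    {
        return v * dt / 2.0;
    }

    /* Type D (double-ended): ta and tb are the arrival times recorded at the two
     * ends of a line of length L; the result is the distance from end A. */
    double fault_distance_type_d(double v, double ta, double tb, double L)
    {
        return (L + v * (ta - tb)) / 2.0;
    }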

105 citations


Proceedings ArticleDOI
25 May 1993
TL;DR: The author examines the compare-and-swap operation in the context of contemporary bus-based shared memory multiprocessors, and it is shown that the common techniques for reducing synchronization overhead in the presence of contention are inappropriate when used as the basis for nonblocking synchronization.
Abstract: An important class of concurrent objects are those that are nonblocking, that is, whose operations are not contained within mutually exclusive critical sections. A nonblocking object can be accessed by many threads at a time, yet update protocols based on atomic compare-and-swap operations can be used to guarantee the object's consistency. The author examines the compare-and-swap operation in the context of contemporary bus-based shared memory multiprocessors, although the results generalize to distributed shared memory multiprocessors. He describes an operating system-based solution that permits the construction of a nonblocking compare-and-swap function on architectures that only support more primitive atomic operations such as test-and-set or atomic exchange. Several locking strategies are evaluated that can be used to synthesize a compare-and-swap operation, and it is shown that the common techniques for reducing synchronization overhead in the presence of contention are inappropriate when used as the basis for nonblocking synchronization. A simple synchronization strategy is described that has good performance because it avoids much of the synchronization overhead that normally occurs when there is contention.
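One of the simplest ways to synthesize compare-and-swap from a more primitive atomic is to guard the target word with a test-and-set lock, as sketched below with C11 atomics (anachronistic for 1993, but the structure is the same). This is a generic lock-based emulation for illustration, not the paper's operating-system-assisted construction, and a real implementation would hash addresses to an array of locks rather than use a single global one.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    static atomic_flag cas_lock = ATOMIC_FLAG_INIT;

    bool emulated_cas(volatile uint32_t *addr, uint32_t expected, uint32_t desired)
    {
        while (atomic_flag_test_and_set_explicit(&cas_lock, memory_order_acquire))
            ;                                   /* spin on the test-and-set lock */
        bool ok = (*addr == expected);
        if (ok)
            *addr = desired;                    /* swap only if the comparison held */
        atomic_flag_clear_explicit(&cas_lock, memory_order_release);
        return ok;
    }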

105 citations


Proceedings ArticleDOI
06 Oct 1993
TL;DR: The technique of lazy checkpoint coordination, which preserves process autonomy while employing communication-induced checkpoint coordination for bounding rollback propagation, is proposed, with the notion of laziness introduced to control the coordination frequency.

Abstract: The technique of lazy checkpoint coordination, which preserves process autonomy while employing communication-induced checkpoint coordination for bounding rollback propagation, is proposed. The notion of laziness is introduced to control the coordination frequency and allow a flexible tradeoff between the cost of checkpoint coordination and the average rollback distance. Worst-case overhead analysis provides a means for estimating the extra checkpoint overhead. Communication trace-driven simulation for several parallel programs is used to evaluate the benefits of the proposed scheme.
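A rough sketch of the laziness idea under simplifying assumptions (checkpoint indices piggybacked on every message, forced checkpoints labelled with the coordination-line index they satisfy); the paper's actual protocol and bookkeeping are more refined, so treat this only as an illustration of how Z trades forced checkpoints against rollback distance.

    typedef struct {
        int ckpt_index;   /* index of the latest local checkpoint */
        int Z;            /* laziness: coordinate only at every Z-th index */
    } proc_t;

    static void save_state(proc_t *p) { (void)p; /* actual state saving elided */ }

    /* Piggybacked on every outgoing message. */
    int piggyback_index(const proc_t *p) { return p->ckpt_index; }

    /* Before delivering a message: if the sender has crossed a coordination
     * line (a multiple of Z) that we have not, take a forced checkpoint so
     * the recovery line at that index stays consistent. */
    void before_delivery(proc_t *p, int sender_index)
    {
        int sender_line = sender_index / p->Z;
        int my_line     = p->ckpt_index / p->Z;
        if (sender_line > my_line) {
            save_state(p);
            p->ckpt_index = sender_line * p->Z;
        }
    }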

Proceedings ArticleDOI
01 Jul 1993
TL;DR: The Local Time Warp method for parallel discrete-event simulation is proposed, and a novel synchronization scheme for it, called HCTW, is presented, which hierarchically combines a Conservative Time Window algorithm with Time Warp and aims at reducing cascading rollbacks, sensitivity to lookahead, and scalability problems.
Abstract: The two main approaches to parallel discrete event simulation – conservative and optimistic – are likely to encounter some limitations when the size and complexity of the simulation system increase. For such large scale simulations, the conservative approach appears to be limited by blocking overhead and sensitivity to lookahead, whereas the optimistic approach may become prone to cascading rollbacks, state saving overhead, and demands for larger memory space. These drawbacks restrict the synchronization schemes based on each of the two approaches from scaling up. A combined approach may resolve these limitations, while preserving and utilizing potential advantages of each method. However, the schemes proposed so far integrate the two views at the same level, i.e. local to a logical process, and hence may not be able to fully solve the problems. In this paper we propose the Local Time Warp method for parallel discrete-event simulation and present a novel synchronization scheme for it called HCTW. The new scheme hierarchically combines a Conservative Time Window algorithm with Time Warp and aims at reducing cascading rollbacks, sensitivity to lookahead, and scalability problems. Local Time Warp is believed to be suitable for parallel machines equipped with thousands of processors and thus an appropriate candidate for simulation of large and complex systems.

Proceedings ArticleDOI
01 Jun 1993
TL;DR: This paper implemented and evaluated two complementary techniques for reducing the overhead of monitoring memory updates and developed data flow algorithms that eliminate checks on some classes of write instructions but may increase the complexity of the remaining checks.
Abstract: A data breakpoint associates debugging actions with programmer-specified conditions on the memory state of an executing program. Data breakpoints provide a means for discovering program bugs that are tedious or impossible to isolate using control breakpoints alone. In practice, programmers rarely use data breakpoints, because they are either unimplemented or prohibitively slow in available debugging software. In this paper, we present the design and implementation of a practical data breakpoint facility. A data breakpoint facility must monitor all memory updates performed by the program being debugged. We implemented and evaluated two complementary techniques for reducing the overhead of monitoring memory updates. First, we checked write instructions by inserting checking code directly into the program being debugged. The checks use a segmented bitmap data structure that minimizes address lookup complexity. Second, we developed data flow algorithms that eliminate checks on some classes of write instructions but may increase the complexity of the remaining checks. We evaluated these techniques on the SPARC using the SPEC benchmarks. Checking each write instruction using a segmented bitmap achieved an average overhead of 42%. This overhead is independent of the number of breakpoints in use. Data flow analysis eliminated an average of 79% of the dynamic write checks. For scientific programs such as the NAS kernels, analysis reduced write checks by a factor of ten or more. On the SPARC these optimizations reduced the average overhead to 25%.
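A sketch of a segmented bitmap of the kind described, assuming a 32-bit, word-aligned address space; the segment sizes, names, and lazy allocation policy are illustrative rather than the paper's exact layout.

    #include <stdint.h>
    #include <stdlib.h>

    #define SEG_BITS  16                          /* each segment covers 2^16 words */
    #define SEG_WORDS (1u << SEG_BITS)
    #define NSEGMENTS (1u << 14)                  /* 2^30 words of 32-bit space */

    static uint8_t *segments[NSEGMENTS];          /* bitmaps, allocated on demand */

    void watch_word(uintptr_t addr)               /* mark a word as breakpointed */
    {
        uint32_t word = (uint32_t)(addr >> 2);
        uint32_t seg = word >> SEG_BITS, off = word & (SEG_WORDS - 1);
        if (!segments[seg])
            segments[seg] = calloc(SEG_WORDS / 8, 1);
        segments[seg][off >> 3] |= (uint8_t)(1u << (off & 7));
    }

    /* The check inserted before every write instruction of the debugged program. */
    static inline int write_hits_breakpoint(uintptr_t addr)
    {
        uint32_t word = (uint32_t)(addr >> 2);
        uint32_t seg = word >> SEG_BITS, off = word & (SEG_WORDS - 1);
        const uint8_t *bm = segments[seg];
        return bm && (bm[off >> 3] & (1u << (off & 7)));
    }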

Journal ArticleDOI
TL;DR: Simulations with image data indicate that speed-ups of up to 43 can be achieved with less than 1-dB loss in SNR and the proposed architecture achieves the desired speed-up, with little or no degradation in the convergence behavior.
Abstract: Relaxed look-ahead is presented as an attractive technique for pipelining adaptive filters. Unlike conventional look-ahead, relaxed look-ahead does not attempt to maintain the input-output (I/O) mapping between the serial and pipelined architectures but preserves the adaptation characteristics, resulting in a small hardware overhead. Relaxed look-ahead is employed to develop fine-grained pipelined architectures for least-mean-square (LMS) adaptive filtering. The proposed architecture achieves the desired speed-up, with little or no degradation in the convergence behavior. Simulation results verifying the convergence analysis results for the pipelined LMS filter are presented. The filter is then employed to develop a high-speed adaptive differential pulse-code-modulation (ADPCM) codec. The new architecture has a negligible hardware overhead which is independent of the number of quantizer levels, the predictor order and the pipelining level. Additionally, the pipelined codec has a much lower output latency than the level of pipelining. Simulations with image data indicate that speed-ups of up to 43 can be achieved with less than 1-dB loss in SNR.
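The flavour of the relaxation can be seen in a delay-relaxed ("delayed") LMS update, where the coefficient update uses an error and data vector that are D samples old and can therefore be pipelined D deep. This illustrates only that one relaxation; the paper derives its architectures from a more general set of relaxations, and the tap count, delay, and struct layout below are arbitrary choices of ours.

    #define NTAPS 8
    #define DELAY 4                               /* pipelining depth D (example) */

    typedef struct {
        double w[NTAPS];                          /* coefficients */
        double x[NTAPS + DELAY];                  /* input history, newest first */
        double e[DELAY];                          /* past errors, newest first */
        double mu;                                /* step size */
    } dlms_t;                                     /* zero-initialize before use */

    double dlms_step(dlms_t *f, double xin, double desired)
    {
        for (int i = NTAPS + DELAY - 1; i > 0; i--) f->x[i] = f->x[i - 1];
        f->x[0] = xin;

        double y = 0.0;                           /* filter output */
        for (int i = 0; i < NTAPS; i++) y += f->w[i] * f->x[i];
        double e = desired - y;

        double ed = f->e[DELAY - 1];              /* error from D samples ago */
        for (int i = 0; i < NTAPS; i++)           /* update with delayed error/data */
            f->w[i] += f->mu * ed * f->x[i + DELAY];

        for (int i = DELAY - 1; i > 0; i--) f->e[i] = f->e[i - 1];
        f->e[0] = e;
        return y;
    }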

Proceedings ArticleDOI
01 Jan 1993
TL;DR: Experimental observations of memory usage and time performance for incremental state saving show the effectiveness of the introduced mechanisms on the authors' Time Warp system.
Abstract: Distributed simulation employing the Time Warp mechanism suffers from the enormous amount of memory it uses for state saving. The main goal of this paper is to address this problem by introducing an incremental state saving technique with an optimized rollback mechanism. This method is especially suitable for simulations with large simulation states and a small amount of state change from one simulation step to the next. A very common example of such a simulation environment is digital logic simulation with up to a few hundred thousand signal and element states with only a small number of signals changing between two simulation steps. The event queue, an indispensable part of every discrete event simulation system, is used as a management tool for incremental state saving, thus reducing administration overhead drastically. An optimized rollback mechanism ensures that only those parts of the simulation state are restored which have to be updated to the rollback time. Results on the experimental observation of memory usage for incremental state saving and time performance show the effectiveness of the introduced mechanisms on our Time Warp system.
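A sketch of event-queue-based incremental state saving along the lines described: each processed event keeps the (location, old value) pairs it overwrote, and a rollback walks the processed events newest-first, undoing each one's changes in reverse order. The fixed-size undo log and the names are illustrative; a real implementation would grow the log and re-insert rolled-back events into the future event queue.

    typedef struct change { int *loc; int old; } change_t;

    #define MAX_CHANGES 8                         /* per-event undo log (sketch only) */

    typedef struct event {
        double timestamp;
        change_t undo[MAX_CHANGES];
        int nundo;
        struct event *prev_processed;             /* processed events, newest first */
    } event_t;

    /* Called whenever an event writes a state variable (e.g. a signal value). */
    void log_and_write(event_t *ev, int *loc, int newval)
    {
        if (ev->nundo < MAX_CHANGES) {
            ev->undo[ev->nundo].loc = loc;
            ev->undo[ev->nundo].old = *loc;       /* remember the overwritten value */
            ev->nundo++;
        }
        *loc = newval;
    }

    /* Roll back every processed event with timestamp > t, newest first. */
    event_t *rollback(event_t *newest_processed, double t)
    {
        event_t *ev = newest_processed;
        while (ev && ev->timestamp > t) {
            for (int i = ev->nundo - 1; i >= 0; i--)
                *ev->undo[i].loc = ev->undo[i].old;
            ev = ev->prev_processed;
        }
        return ev;                                /* new head of the processed list */
    }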

Proceedings ArticleDOI
17 Oct 1993
TL;DR: Partial-scan based built-in self-test (PSBIST) is a versatile design for testability (DFT) scheme, which employs pseudo-random BIST at all levels of test to achieve fault coverages greater than 98% on average, and supports deterministic partial scan at the IC level to achieve nearly 100% fault coverage.
Abstract: Partial-scan based built-in self-test (PSBIST) is a versatile design for testability (DFT) scheme, which employs pseudo-random BIST at all levels of test to achieve fault coverages greater than 98% on average, and supports deterministic partial scan at the IC level to achieve nearly 100% fault coverage. While PSBIST provides all the benefits of BIST, it incurs less area overhead and performance degradation than full scan. The area overhead is further reduced when the boundary scan cells are reconfigured for BIST usage.

11 Jan 1993
TL;DR: This dissertation proposes and evaluates data prefetching techniques that address the data access penalty problem, and shows that an approach combining software and hardware schemes is very promising for reducing memory latency with the least overhead.
Abstract: Recent technological advances are such that the gap between processor cycle times and memory cycle times is growing. Techniques to reduce or tolerate large memory latencies become essential for achieving high processor utilization. In this dissertation, we propose and evaluate data prefetching techniques that address the data access penalty problems. First, we propose a hardware-based data prefetching approach for reducing memory latency. The basic idea of the prefetching scheme is to keep track of data access patterns in a reference prediction table (RPT) organized as an instruction cache. It includes three variations of the design of the RPT and associated logic: generic design, a lookahead mechanism, and a correlated scheme. They differ mostly on the timing of the prefetching. We evaluate the three schemes by simulating them in a uniprocessor environment using the ten SPEC benchmarks. The results show that the prefetching scheme effectively eliminates a major portion of data access penalty and is particularly suitable to an on-chip design and a primary-secondary cache hierarchy. Next, we study and compare the substantive performance gains that could be achieved with hardware-controlled and software-directed prefetching on shared-memory multiprocessors. Simulation results indicate that both hardware and software schemes can handle programs with regular access patterns. The hardware scheme is good at manipulating dynamic information, whereas software prefetching has the flexibility of prefetching larger blocks of data and of dealing with complex data access patterns. The execution overhead of the additional prefetching instructions may decrease the software prefetching performance gains. An approach that combines software and hardware schemes is shown to be very promising for reducing the memory latency with the least overhead. Finally, we study non-blocking caches that can tolerate read and write miss penalties by exploiting the overlap between post-miss computations and data accesses. We show that hardware data prefetching caches generally outperform non-blocking caches. We derive a static instruction scheduling algorithm to order instructions at compile time. The algorithm is shown to be effective in exploiting instruction parallelism available in a basic block for non-blocking loads.
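A simplified software model of a reference prediction table in the spirit of the hardware scheme described above (the dissertation's actual designs and state transitions differ in detail): each load PC maps to an entry holding the last address and stride, and a prefetch is issued once the stride has repeated.

    #include <stdint.h>

    #define RPT_ENTRIES 256

    typedef enum { INITIAL, TRANSIENT, STEADY, NO_PRED } rpt_state_t;
    typedef struct { uint64_t tag, prev_addr; int64_t stride; rpt_state_t state; } rpt_entry_t;

    static rpt_entry_t rpt[RPT_ENTRIES];

    static void prefetch(uint64_t addr) { (void)addr; /* would enqueue a cache prefetch */ }

    void rpt_access(uint64_t pc, uint64_t addr)   /* called for every load */
    {
        rpt_entry_t *e = &rpt[(pc >> 2) % RPT_ENTRIES];
        if (e->tag != pc) {                       /* allocate a fresh entry */
            *e = (rpt_entry_t){ .tag = pc, .prev_addr = addr, .stride = 0, .state = INITIAL };
            return;
        }
        int64_t stride = (int64_t)(addr - e->prev_addr);
        if (stride == e->stride)                  /* stride repeated: gain confidence */
            e->state = (e->state == NO_PRED) ? TRANSIENT : STEADY;
        else                                      /* stride changed: lose confidence */
            e->state = (e->state == STEADY) ? INITIAL : NO_PRED;
        e->stride = stride;
        e->prev_addr = addr;
        if (e->state == STEADY)
            prefetch(addr + (uint64_t)e->stride); /* predicted next reference */
    }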

01 Jan 1993
TL;DR: This presentation focuses on the class of Conflict Resolution Algorithms, which exhibits very good performance characteristics for “bursty” computer communications traffic, including high capacity, low delay under light traffic conditions, and inherent stability.
Abstract: Multiple Access protocols are distributed algorithms that enable a set of geographically dispersed stations to communicate using a single, common, broadcast channel. We concentrate on the class of Conflict Resolution Algorithms. This class exhibits very good performance characteristics for “bursty” computer communications traffic, including high capacity, low delay under light traffic conditions, and inherent stability. One algorithm in this class achieves the highest capacity among all known multiple-access protocols for the infinite population Poisson model. Indeed, this capacity is not far from a theoretical upper bound. After surveying the most important and influential Conflict Resolution Algorithms, the emphasis in our presentation is shifted to methods for their analysis and results of their performance evaluation. We also discuss some extensions of the basic protocols and performance results for non-standard environments, such as Local Area Networks, satellite channels, channels with errors, etc., providing a comprehensive bibliography.

1. Conflict Resolution Based Random Access Protocols

The ALOHA protocols were a breakthrough in the area of multiple access communications.[1] They delivered, more or less, what they advertised, i.e., low delay for bursty, computer generated traffic. They suffer, however, from stability problems and low capacity.[2] The next major breakthrough in the area of multiple access communications was the development of random access protocols that resolve conflicts algorithmically. The invention of Conflict Resolution Algorithms (CRAs) is usually attributed to Capetanakis [Capet78, Capet79, Capet79b], and, independently, to Tsybakov and Mikhailov [Tsyba78]. The same idea, but in a slightly different context, was also presented, earlier, by Hayes [Hayes78]. Later, it was recognized [Berge84, Wolf85] that the underlying idea had been known for a long time in the context of Group Testing [Dorfm43, Sobel59, Ungar60]. Group Testing was developed during World War II to speed up processing of syphilis blood tests. Since the administered test had high sensitivity, it was suggested [Dorfm43] that many blood samples could be pooled together. The result of the test would then be positive if, and only if, there was at least one diseased sample in the pool, in which case individual tests were administered to isolate the diseased samples. Later, it was suggested that, after the first diseased sample was isolated, the remaining samples could again be pooled for further testing. The beginning of a general theory of Group Testing can be found in [Sobel59], where, as pointed out in [Wolf85], a tree search algorithm is suggested, similar to the ones we present in section 1.2. The first application of Group Testing to communications arose when Hayes proposed a new, and more efficient, polling algorithm that he named probing [Hayes78]. Standard polling schemes are unacceptable for large sets of bursty stations because the overhead is proportional to the number of stations in the system, and independent of the amount of traffic. Hayes’ main idea was to shorten the polling cycle by having the central controller query subsets of the total population to discover if these subsets contain stations with waiting packets. If the response is negative, the total subset is “eliminated” in a single query. If the response is positive, the group is split into two subgroups and the process is continued, recursively, until a single active station is polled. This station is then allowed to transmit some data, which does not have to be in the form of constant size packets. Clearly, this is a reservation protocol. In subsequent papers Hayes has also considered direct transmission systems. Notice that the controller receives feedback of the form “something / nothing” (at least one station, or no station, with waiting packets).

1.1. Basic Assumptions

The protocols that will be presented in this section have been developed, and analyzed, on the basis of a set of common assumptions[3] that describe a standard environment that is usually called an ALOHA-type channel.

1. Synchronous (slotted) operation: The common-receiver model of a broadcast channel is usually implicitly assumed. Furthermore, messages are split into packets of fixed size. All transmitters are (and remain) synchronized, and may initiate transmissions only at predetermined times, one packet transmission time apart. The time between two successive allowable packet transmission times is called a slot and is usually taken as the time unit. Thus, if more than one packet is transmitted during the same slot, they are “seen” by the receiver simultaneously, and therefore, overlap completely.

2. Errorless channel: If a given slot contains a single packet transmission, then the packet will be received correctly (by the common receiver).

Footnotes:
[1] For an introduction to the area of multiple access communications see the books by Bertsekas and Gallager [Berts92, chapter 4] and [Rom90]. Actually, chapter 4 of [Berts92] and chapter 5 of [Rom90] also present good expositions of Conflict Resolution Algorithms.
[2] If no special control is exercised to stabilize the protocols, the term capacity must be taken in the “broader” sense of maximum throughput maintained during considerable periods of time, since the true capacity is zero [Fergu75, Fayol77, Aldou87]. However, having to stabilize the protocols detracts from their initial appeal that was mainly due to their simplicity.
[3] Some of the protocols can operate with some of the assumptions weakened. When this is the case we point it out during their presentation. In section 6 we discuss protocols and analysis techniques that weaken or modify some of these assumptions.
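A toy simulation of the basic binary-tree (Capetanakis-type) conflict resolution algorithm under the slotted, errorless assumptions above: after a collision the colliding stations split at random into two subsets, and the first subset is resolved completely before the second transmits. The slot accounting and fair-coin split are illustrative; the surveyed algorithms refine this scheme in many ways.

    #include <stdio.h>
    #include <stdlib.h>

    /* Returns the number of slots needed to resolve a group of n_active
     * stations that all transmit in the current slot. */
    static int resolve(int n_active)
    {
        if (n_active <= 1)
            return 1;                     /* idle slot or successful transmission */
        int left = 0;                     /* collision: every station flips a coin */
        for (int i = 0; i < n_active; i++)
            left += rand() & 1;
        return 1 + resolve(left) + resolve(n_active - left);
    }

    int main(void)
    {
        srand(1);
        for (int n = 1; n <= 16; n *= 2)
            printf("%2d colliding stations resolved in %d slots\n", n, resolve(n));
        return 0;
    }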

Journal ArticleDOI
TL;DR: Methods are presented for including a whole variety of different simulation types in one numerical simulator; the overhead involved in doing this is very small, and the amount of simulator maintenance required is greatly reduced.

Proceedings ArticleDOI
01 Jul 1993
TL;DR: An efficient parallel implementation of the Gröbner basis problem, a symbolic algebra application, is developed using the following techniques: a sequential algorithm was rewritten in a transition axiom style, and an application-specific scheduler was designed and tuned to get good performance.
Abstract: Parallelism with irregular patterns of data, communication and computation is hard to manage efficiently. In this paper we present a case study of the Gröbner basis problem, a symbolic algebra application. We developed an efficient parallel implementation using the following techniques. First, a sequential algorithm was rewritten in a transition axiom style, in which computation proceeds by non-deterministic invocations of guarded statements at multiple processors. Next, the algebraic properties of the problem were studied to modify the algorithm to ensure correctness in spite of locally inconsistent views of the shared data structures. This was used to design data structures with very little overhead for maintaining consistency. Finally, an application-specific scheduler was designed and tuned to get good performance. Our distributed memory implementation achieves impressive speedups.

Journal ArticleDOI
TL;DR: The proposed scheme is shown to provide concurrent error detection capability to FFT networks with low hardware overhead, high throughput, and high fault coverage, and achieves 100% fault coverage theoretically.
Abstract: The algorithm-based fault tolerance techniques have been proposed to obtain reliable results at very low hardware overhead. Even though 100% fault coverage can be theoretically obtained by using these techniques, the system performance, i.e., fault coverage and throughput, can be drastically reduced due to many practical problems, e.g., round-off errors. A novel algorithm-based fault tolerance scheme is proposed for fast Fourier transform (FFT) networks. It is shown that the proposed scheme achieves 100% fault coverage theoretically. An accurate measure of the fault coverage for FFT networks is provided by taking the round-off error into account. The proposed scheme is shown to provide concurrent error detection capability to FFT networks with low hardware overhead, high throughput, and high fault coverage.
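One classical checksum identity that algorithm-based checks for DFT/FFT hardware can exploit is that the sum over k of X[k] equals N times x[0]; the sketch below compares the two sides against a round-off tolerance. The paper's scheme is more elaborate (and derives its threshold from a round-off analysis), so this only conveys the flavour of a concurrent output check.

    #include <complex.h>

    /* Returns 1 if the FFT outputs X[0..n-1] of inputs x[0..n-1] satisfy the
     * checksum identity sum_k X[k] == n * x[0] to within `tol`. */
    int fft_output_check(const double complex *x, const double complex *X,
                         int n, double tol)
    {
        double complex s = 0.0;
        for (int k = 0; k < n; k++)
            s += X[k];
        return cabs(s - (double)n * x[0]) <= tol;
    }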

Proceedings ArticleDOI
17 Oct 1993
TL;DR: This paper introduces new design and synthesis techniques that reduce the area and performance overhead of built-in self-test (BIST) architectures such as circular BIST and parallel BIST, and shows that introducing certain types of scan dependence in embedded MISRs can result in reduced overhead and improved fault coverage.
Abstract: This paper introduces new design and synthesis techniques that reduce the area and performance overhead of built-in self-test (BIST) architectures such as circular BIST and parallel BIST. Our goal is to arrange the system bistables into scan paths such that some of the BIST and scan logic is shared with the functional logic. Logic sharing is possible when scan dependence is introduced in the design. Other BIST design techniques attempt to avoid scan dependence because it can reduce the fault coverage of embedded, multiple input signature registers (MISRs). We show that introducing certain types of scan dependence in embedded MISRs can result in reduced overhead and improved fault coverage. We present our results for benchmark circuits that have been synthesized to take advantage of scan dependence in a circular BIST architecture.

Patent
26 Jul 1993
TL;DR: A method and system are described for locating nomadic users in a personal communication services (PCS) system, utilizing two strategies for locating such users and a per-user criterion for determining which, if any, of the two strategies should be used.
Abstract: Method and system for locating nomadic users in a personal communication services (PCS) system by utilizing two strategies for locating such users and a per-user criterion for determining which, if any, of the two strategies should be used. The method and system augment basic two-level strategies for locating users specified in IS-41 and GSM standards of PCS systems. One strategy utilizes forwarding pointers and the other strategy utilizes per-user location caching. One per-user criterion is a call-to-mobility ratio (CMR) which is the ratio of the average rate at which a user receives calls to the average rate at which the user moves. A variation of this criterion is the local CMR (LCMR) which is the ratio of the average rate at which a user receives calls from a given registration area, to the average rate at which the user moves. The method and system reduce the average time and overhead required to locate and deliver information to such nomadic users.
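A toy version of the per-user selection criterion, assuming that a high (local) call-to-mobility ratio favours caching the user's location and a low one favours forwarding pointers; the threshold values are placeholders, not taken from the patent.

    typedef enum { BASIC_LOOKUP, FORWARDING_POINTERS, LOCATION_CACHING } strategy_t;

    strategy_t choose_strategy(double calls_per_hour, double moves_per_hour)
    {
        double lcmr = calls_per_hour / moves_per_hour;   /* local call-to-mobility ratio */
        if (lcmr > 1.0)                                  /* called often, moves rarely */
            return LOCATION_CACHING;
        if (lcmr < 0.25)                                 /* moves often relative to calls */
            return FORWARDING_POINTERS;
        return BASIC_LOOKUP;                             /* otherwise, plain IS-41/GSM lookup */
    }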

Patent
02 Dec 1993
TL;DR: A ring network with a plurality of network elements communicating a signal along the network is considered, and a method for restoring transport overhead along the ring network is provided, in which a failure of the signal is detected by a second network element and the second transport overhead break is removed under the direction of the master network element.
Abstract: A method and system for handling transport overhead on a ring network. The system includes a ring network having a plurality of network elements communicating a signal along the network. In one embodiment, a method is provided for restoring transport overhead along the ring network. This method includes designating a first of the plurality of network elements as a master network element and inserting a first transport overhead break on the ring network by the master network element. The method further includes detecting a failure of the signal by a second network element and inserting a second transport overhead break on the ring network by the second network element in response to the detection of a failure. Finally, the method includes removing the second transport overhead break under the direction of said master network element. Another method of restoring transport overhead is disclosed which also includes the steps of designating a first of the plurality of network elements as a master network element and detecting a failure of the signal by a second network element. In addition, the method includes the step of transmitting a count to the master network element by the second network element in response to the step of detecting a failure. In addition, the count is incremented each time it passes through a network element other than the first and second network elements. Finally, the count is received by the master network element.

Proceedings Article
11 Jul 1993
TL;DR: A simple, fast coordination algorithm for the dynamic reorganization of agents in a distributed sensor network and its methodology for analyzing complex control and coordination issues without resorting to a handful of single-instance examples is presented.
Abstract: This paper presents a simple, fast coordination algorithm for the dynamic reorganization of agents in a distributed sensor network. Dynamic reorganization is a technique for adapting to the current local problem-solving situation that can both increase expected system performance and decrease the variance in performance. We compare our dynamic organization algorithm to a static algorithm with lower overhead. 'One-shot' refers to the fact that the algorithm only uses one meta-level communication action. The other theme of this paper is our methodology for analyzing complex control and coordination issues without resorting to a handful of single-instance examples. Using a general model that we have developed of distributed sensor network environments [Decker and Lesser, 1993a], we present probabilistic performance bounds for our algorithm given any number of agents in any environment that fits our assumptions. This model also allows us to predict exactly in what situations and environments the performance benefits of dynamic reorganization outweigh the overhead.

Patent
26 Feb 1993
TL;DR: In this article, a server combines a multiplexer/demultiplexer with a circuit switch to handle both high-speed overhead and data interfaced to the server through a cross-connect.
Abstract: A server combines a multiplexer/demultiplexer with a circuit switch to handle both high-speed overhead and data interfaced to the server through a cross-connect; multiple servers may be connected to the cross-connect in a star, mesh or ring network.

Journal ArticleDOI
TL;DR: The proposed auto-repair approach is shown to improve the VLSI chip yield by a significant factor, and it can also improve the life span of the chip by automatically restructuring its memory arrays in the event of sporadic cell failures during field use.

Abstract: It is shown how to represent the objective function of the memory repair problem as a neural-network energy function, and how to exploit the neural network's convergence property for deriving optimal repair solutions. Two algorithms have been developed using a neural network, and their performances are compared with that of the repair most (RM) algorithm. For randomly generated defect patterns, a proposed algorithm with a hill-climbing capability successfully repaired memory arrays in 98% of cases, as opposed to RM's 20%. It is demonstrated how, by using very small silicon overhead, one can implement this algorithm in hardware within a VLSI chip for built-in self-repair (BISR) of memory arrays. The proposed auto-repair approach is shown to improve the VLSI chip yield by a significant factor, and it can also improve the life span of the chip by automatically restructuring its memory arrays in the event of sporadic cell failures during field use.

Book ChapterDOI
01 Jan 1993
TL;DR: This paper describes a hybrid (hardware/software monitor) fault injection environment and its application to a commercial fault tolerant system and its utility is demonstrated by applying it to the study of a Tandem Integrity S2 system.
Abstract: This paper describes a hybrid (hardware/software monitor) fault injection environment and its application to a commercial fault tolerant system. The hybrid environment is useful for obtaining dependability statistics and failure characteristics for a range of system components. The software instrumentation keeps the introduced overhead small so that error propagation and control flow are not significantly affected by its presence. The hybrid environment can be used to obtain precise measurements of instruction-level activity that would otherwise be impossible to perform with a hardware monitor alone. It is also well suited for measuring extremely short error latencies. Its utility is demonstrated by applying it to the study of a Tandem Integrity S2 system. Faults are injected into CPU registers, cache, and local memory. The effects of faults on individual user applications are studied by obtaining subsystem dependability measurements such as detection and latency statistics for cache and local memory. Instruction-level error propagation effects are also measured.

Proceedings ArticleDOI
16 Aug 1993
TL;DR: This presentation explains how communication overhead can significantly impact the execution performance of programs on distributed-memory systems.
Abstract: Communication overhead can significantly impact the execution performance of programs on distributed-memory systems.

Journal ArticleDOI
TL;DR: The authors describe an architecture for a system with a 1.25 GHz packet rate, 32-bit payload, and 100 Gb/s peak bit rate serving a few hundred user nodes, and discuss the strengths and weaknesses of using all-optical soliton gates in carefully chosen applications.
Abstract: The recent demonstration of ultrafast, cascadable, all-optical soliton gates, although with long latency and at an early stage of research, opens the possibility of niche exploitation in architectures whose performance is primarily limited by the absence of a few such logic elements. A candidate system is a widely distributed, self-routing short packet, slotted ring system running at peak rates well beyond that of the conventional electronic hosts at each access node. The authors describe an architecture for a system with a 1.25 GHz packet rate, 32-bit payload, and 100 Gb/s peak bit rate serving a few hundred user nodes. An optical format is retained by through-going node traffic, so that the overhead of conversion to/from electronics is incurred only at the source and destination. This design effort has served to sharpen their understanding of the strengths and weaknesses of using such gates in carefully chosen applications.