Showing papers on "Overhead (computing) published in 1996"

PDF

Open Access

Journal Article•DOI•

[...]

Rakesh Agrawal¹, J.C. Shafer²•Institutions (2)

01 Dec 1996-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work considers the problem of mining association rules on a shared nothing multiprocessor and presents three algorithms that explore a spectrum of trade-offs between computation, communication, memory usage, synchronization, and the use of problem specific information.

...read moreread less

Abstract: We consider the problem of mining association rules on a shared nothing multiprocessor. We present three algorithms that explore a spectrum of trade-offs between computation, communication, memory usage, synchronization, and the use of problem specific information. The best algorithm exhibits near perfect scaleup behavior, yet requires only minimal overhead compared to the current best serial algorithm.

...read moreread less

1,121 citations

Journal Article•DOI•

Efficient mining of association rules in distributed databases

[...]

David W. Cheung¹, Vincent To Yee Ng², Ada Wai-Chee Fu³, Yongjian Fu⁴•Institutions (4)

University of Hong Kong¹, Hong Kong Polytechnic University², The Chinese University of Hong Kong³, Simon Fraser University⁴

01 Dec 1996-IEEE Transactions on Knowledge and Data Engineering

TL;DR: An efficient algorithm called DMA (Distributed Mining of Association rules), which generates a small number of candidate sets and requires only O(n) messages for support-count exchange for each candidate set, in distributed databases.

...read moreread less

Abstract: Many sequential algorithms have been proposed for the mining of association rules. However, very little work has been done in mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it requires a large amount of communication overhead. In this study, an efficient algorithm called DMA (Distributed Mining of Association rules), is proposed. It generates a small number of candidate sets and requires only O(n) messages for support-count exchange for each candidate set, where n is the number of sites in a distributed database. The algorithm has been implemented on an experimental testbed, and its performance is studied. The results show that DMA has superior performance, when compared with the direct application of a popular sequential algorithm, in distributed databases.

...read moreread less

365 citations

Proceedings Article•DOI•

Dealing with disaster: surviving misbehaved kernel extensions

[...]

Margo Seltzer, Yasuhiro Endo, Christopher Small, Keith A. Smith

28 Oct 1996

TL;DR: This paper explains how VINO uses software fault isolation as its safety mechanism and a lightweight transaction system to cope with resource-hoarding and finds that while the overhead of these techniques is high relative to the cost of the extensions themselves, it is lowrelative to the benefits that extensibility brings.

...read moreread less

Abstract: Today’s extensible operating systems allow applications to modify kernel behavior by providing mechanisms for application code to run in the kernel address space. The advantage of this approach is that it provides improved application flexibility and performance; the disadvantage is that buggy or malicious code can jeopardize the integrity of the kernel. It has been demonstrated that it is feasible to use safe languages, software fault isolation, or virtual memory protection to safeguard the main kernel. However, such protection mechanisms do not address the full range of problems, such as resource hoarding, that can arise when application code is introduced into the kernel. In this paper, we present an analysis of extension mechanisms in the VINO kernel. VINO uses software fault isolation as its safety mechanism and a lightweight transaction system to cope with resource-hoarding. We explain how these two mechanisms are sufficient to protect against a large class of errant or malicious extensions, and we quantify the overhead that this protection introduces. We find that while the overhead of these techniques is high relative to the cost of the extensions themselves, it is low relative to the benefits that extensibility brings.

...read moreread less

343 citations

Proceedings Article•

Optimization of Queries with User-defined Predicates

[...]

Surajit Chaudhuri¹, Kyuseok Shim²•Institutions (2)

Microsoft¹, Bell Labs²

03 Sep 1996

TL;DR: In this paper, the authors present an optimization algorithm with complete rank-ordering, which is polynomial in the number of user-defined predicates (for a given number of relations).

...read moreread less

Abstract: Relational databases provide the ability to store user-defined functions and predicates which can be invoked in SQL queries. When evaluation of a user-defined predicate is relatively expensive, the traditional method of evaluating predicates as early as possible is no longer a sound heuristic. There are two previous approaches for optimizing such queries. However, neither is able to guarantee the optimal plan over the desired execution space. We present efficient techniques that are able to guarantee the choice of an optimal plan over the desired execution space. The optimization algorithm with complete rank-ordering improves upon the naive optimization algorithm by exploiting the nature of the cost formulas for join methods and is polynomial in the number of user-defined predicates (for a given number of relations.) We also propose pruning rules that significantly reduce the cost of searching the execution space for both the naive algorithm as well as for the optimization algorithm with complete rank-ordering, without compromising optimality. We also propose a conservative local heuristic that is simpler and has low optimization overhead. Although it is not always guaranteed to find the optimal plans, it produces close to optimal plans in most cases. We discuss how, depending on application requirements, to determine the algorithm of choice. It should be emphasized that our optimization algorithms handle user-defined selections as well as user-defined join predicates uniformly. We present complexity analysis and experimental comparison of the algorithms.

...read moreread less

194 citations

Proceedings Article•DOI•

Built-in self-test of logic blocks in FPGAs (Finally, a free lunch: BIST without overhead!)

[...]

Charles E. Stroud¹, Srinivasa Konala¹, Ping Chen¹, M. Abramovici¹•Institutions (1)

University of Kentucky¹

28 Apr 1996

TL;DR: A new approach for Field Programmable Gate Array (FPGA) testing is presented that exploits the reprogrammability of FPGAs to create Built-In Self-Test (BIST) logic only during off-line test, achieving BIST without any area overhead or performance penalties to the system function implemented by the FPGA.

...read moreread less

Abstract: We present a new approach for Field Programmable Gate Array (FPGA) testing that exploits the reprogrammability of FPGAs to create Built-In Self-Test (BIST) logic only during off-line test. As a result, BIST is achieved without any area overhead or performance penalties to the system function implemented by the FPGA. Our approach is applicable to all levels of testing, achieves maximal fault coverage, and all tests are applied at-speed. We describe the BIST architecture used to test all the programmable logic blocks in an FPGA and the configurations required to implement our approach using a commercial FPGA. We also discuss implementation problems caused by CAD tool limitations and limited architectural resources, and we describe techniques which overcome these limitations.

...read moreread less

167 citations

Proceedings Article•DOI•

A low-overhead recovery technique using quasi-synchronous checkpointing

[...]

Dakshnamoorthy Manivannan¹, Mukesh Singhal¹•Institutions (1)

Ohio State University¹

27 May 1996

TL;DR: A quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it that preserves process autonomy by allowing them to take checkpoints asynchronously and uses communication-induced checkpoint coordination for the progression of the recovery line which helps bound rollback propagation during a recovery.

...read moreread less

Abstract: In this paper, we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing them to take checkpoints asynchronously and uses communication-induced checkpoint coordination for the progression of the recovery line which helps bound rollback propagation during a recovery. Thus, it has the easiness and low overhead of asynchronous checkpointing and the recovery time advantages of synchronous checkpointing. There is no extra message overhead involved during checkpointing and the additional checkpointing overhead is nominal. The algorithm ensures the existence of a recovery line consistent with the latest checkpoint of any process all the time. The recovery algorithm exploits this feature to restore the system to a state consistent with the latest checkpoint of a failed process. The recovery algorithm has no domino effect and a failed process needs only to rollback to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. To avoid domino effect, it uses selective pessimistic message logging at the receiver end. The recovery is asynchronous for single process failure. Neither the recovery algorithm nor the checkpointing algorithm requires the channels to be FIFO. We do not use vector timestamps for determining dependency between checkpoints since vector timestamps generally result in high message overhead during failure-free operation.

...read moreread less

132 citations

Proceedings Article•DOI•

Algorithms for deferred view maintenance

[...]

Latha S. Colby¹, Timothy G. Griffin¹, Leonid Libkin¹, Inderpal Singh Mumick¹, Howard Trickey¹ - Show less +1 more•Institutions (1)

Bell Labs¹

01 Jun 1996

TL;DR: In this article, the authors propose an algorithm to incrementally refresh a view during deferred maintenance to avoid a state bug that has artificially limited techniques previously used for deferred maintenance, by using auxiliary tables that contain information recorded since the last view refresh.

...read moreread less

Abstract: Materialized views and view maintenance are important for data warehouses, retailing, banking, and billing applications. We consider two related view maintenance problems: 1) how to maintain views after the base tables have already been modified, and 2) how to minimize the time for which the view is inaccessible during maintenance.Typically, a view is maintained immediately, as a part of the transaction that updates the base tables. Immediate maintenance imposes a significant overhead on update transactions that cannot be tolerated in many applications. In contrast, deferred maintenance allows a view to become inconsistent with its definition. A refresh operation is used to reestablish consistency. We present new algorithms to incrementally refresh a view during deferred maintenance. Our algorithms avoid a state bug that has artificially limited techniques previously used for deferred maintenance.Incremental deferred view maintenance requires auxiliary tables that contain information recorded since the last view refresh. We present three scenarios for the use of auxiliary tables and show how these impact per-transaction overhead and view refresh time. Each scenario is described by an invariant that is required to hold in all database states. We then show that, with the proper choice of auxiliary tables, it is possible to lower both per-transaction overhead and view refresh time.

...read moreread less

132 citations

Proceedings Article•DOI•

New performance driven routing techniques with explicit area/delay tradeoff and simultaneous wire sizing

[...]

John Lillis¹, Chung-Kuan Cheng¹, T.-T.Y. Lin¹, Ching-Yen Ho²•Institutions (2)

University of California, San Diego¹, LSI Corporation²

01 Jun 1996

TL;DR: This work presents new algorithms for construction of performance driven Rectilinear Steiner Trees under the Elmore delay model that derive an explicit area/delay trade-off curve and achieves this goal by limiting the solution space to the set of topologies induced by a permutation on the sinks of the net.

...read moreread less

Abstract: We present new algorithms for construction of performance driven Rectilinear Steiner Trees under the Elmore delay model. Our algorithms represent a departure from previous approaches in that we derive an explicit area/delay trade-off curve. We achieve this goal by limiting the solution space to the set of topologies induced by a permutation on the sinks of the net. This constraint allows efficient identification of optimal solutions while still providing a rich solution space. We also incorporate simultaneous wire sizing. Our technique consistently produces topologies equalling the performance of previous approaches with substantially less area overhead.

...read moreread less

127 citations

Journal Article•DOI•

Reconciling responsiveness with performance in pure object-oriented languages

[...]

Urs Hölzle¹, David Ungar²•Institutions (2)

University of California, Santa Barbara¹, Sun Microsystems Laboratories²

01 Jul 1996-ACM Transactions on Programming Languages and Systems

TL;DR: The SELF implementation described here offers novel approaches to optimization that result in a system that can execute programs significantly faster than previous systems while retaining much of the interactiveness of an interpreted system.

...read moreread less

Abstract: Dynamically dispatched calls often limit the performance of object-oriented programs, since opject-oriented programming encourages factoring code into small, reusable units, thereby increasing the frequency of these expensive operations. Frequent calls not only slow down execution with the dispatch overhead per se, but more importantly they hinder optimization by limiting the range and effectiveness of standard global optimizations. In particular, dynamically dispatched calles prevent standard interprocedual optimizations that depend on the availability of a static call graph. The SELF implementation described here offers tow novel approaches to optimization. Type feedback speculatively inlines dynamically dispatched calls based on profile information that predicts likely receiver classes. Adaptive optimization reconciles optimizing compilation with interactive performance by incrementally optimizing only the frequently executed parts of a program. When combined, these two techniques result in a system that can execute programs significantly faster than previous systems while retaining much of the interactiveness of an interpreted system.

...read moreread less

123 citations

Proceedings Article•DOI•

Neural generalized predictive control

[...]

D. Soloway¹, P.J. Haley•Institutions (1)

Langley Research Center¹

15 Sep 1996

TL;DR: This paper presents a detailed derivation of the neural generalized predictive control algorithm with Newton-Raphson as the minimization algorithm and results show convergence to a good solution within two iterations and timing data show that real-time control is possible.

...read moreread less

Abstract: An efficient implementation of generalized predictive control using a multilayer feedforward neural network as the plant's nonlinear model is presented. By using Newton-Raphson as the optimization algorithm, the number of iterations needed for convergence is significantly reduced from other techniques. The main cost of the Newton-Raphson algorithm is in the calculation of the Hessian, but even with this overhead the low iteration numbers make Newton-Raphson faster than other techniques and a viable algorithm for real-time control. This paper presents a detailed derivation of the neural generalized predictive control algorithm with Newton-Raphson as the minimization algorithm. Simulation results show convergence to a good solution within two iterations and timing data show that real-time control is possible. Comments about the algorithm's implementation are also included.

...read moreread less

121 citations

Modeling Communication Overhead: MPI and MPL Performance on the

[...]

Ibm Sp, Zhiwei Xu, Kai Hwang

01 Jan 1996

Proceedings Article•DOI•

The relative importance of concurrent writers and weak consistency models

[...]

Peter J. Keleher¹•Institutions (1)

University of Maryland, College Park¹

27 May 1996

TL;DR: It is found that in this environment, which is believed to be representative of distributed systems today and in the near future, the consistency model has a much higher impact on overall performance than the choice of whether to allow concurrent writers.

...read moreread less

Abstract: This paper presents a detailed comparison of the relative importance of allowing concurrent writers versus the choice of the underlying consistency model. Our comparison is based on single- and multiple-writer versions of a lazy release consistent (LRC) protocol, and a single-writer sequentially consistent protocol, all implemented in the CVM software distributed shared memory system. We find that in our environment, which we believe to be representative of distributed systems today and in the near future, the consistency model has a much higher impact on overall performance than the choice of whether to allow concurrent writers. The multiple writer LRC protocol performs an average of 9% better than the single writer LRC protocol, but 34% better than the single-writer sequentially consistent protocol. Set against this, MW-LRC required an average of 72% memory overhead, compared to 10% overhead for the single-writer protocoIs.

...read moreread less

Journal Article•DOI•

Performance optimization using template mapping for datapath-intensive high-level synthesis

[...]

M.R. Corazao¹, M. Khalaf, L.M. Guerra, Miodrag Potkonjak, Jan M. Rabaey - Show less +1 more•Institutions (1)

Intel¹

01 Aug 1996-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This paper introduces a new approach to performance-driven template mapping for high-level synthesis that focuses on datapath-intensive ASIC design, though the concepts are also highly applicable to compiler development.

...read moreread less

Abstract: This paper introduces a new approach to performance-driven template mapping for high-level synthesis. Template mapping, the process of mapping high-level algorithmic descriptions to specialized hardware libraries or instruction sets, involves template matching, template selection, and clock selection. Efficient algorithms for each are presented, and novel issues such as partial matching are addressed. The paper focuses on datapath-intensive ASIC design, though the concepts are also highly applicable to compiler development. Experimental results on examples from real applications show significant improvements in throughput with limited area overhead.

...read moreread less

Journal Article•DOI•

Scalable parallel computational geometry for coarse grained multicomputers

[...]

Frank Dehne¹, Andreas Fabri², Andrew Rau-Chaplin¹•Institutions (2)

Carleton University¹, French Institute for Research in Computer Science and Automation²

01 Sep 1996-International Journal of Computational Geometry and Applications

TL;DR: This work presents O(Tsequential/p+Ts(n, p)) time scalable parallel algorithms for several computational geometry problems, which use only a small number of very large messages and greatly reduces the overhead for the communication protocol between processors.

...read moreread less

Abstract: We study scalable parallel computational geometry algorithms for the coarse grained multicomputer model: p processors solving a problem on n data items, were each processor has O(n/p)≫O(1) local memory and all processors are connected via some arbitrary interconnection network (e.g. mesh, hypercube, fat tree). We present O(Tsequential/p+Ts(n, p)) time scalable parallel algorithms for several computational geometry problems. Ts(n, p) refers to the time of a global sort operation. Our results are independent of the multicomputer’s interconnection network. Their time complexities become optimal when Tsequential/p dominates Ts(n, p) or when Ts(n, p) is optimal. This is the case for several standard architectures, including meshes and hypercubes, and a wide range of ratios n/p that include many of the currently available machine configurations. Our methods also have some important practical advantages: For interprocessor communication, they use only a small fixed number of one global routing operation, global sort, and all other programming is in the sequential domain. Furthermore, our algorithms use only a small number of very large messages, which greatly reduces the overhead for the communication protocol between processors. (Note however, that our time complexities account for the lengths of messages.) Experiments show that our methods are easy to implement and give good timing results.

...read moreread less

Journal Article•DOI•

Program Slicing‐Based Regression Testing Techniques

[...]

Rajiv Gupta¹, Mary Jean Harrold², Mary Lou Soffa¹•Institutions (2)

University of Pittsburgh¹, Ohio State University²

01 Jun 1996-Software Testing, Verification & Reliability

TL;DR: This paper presents a novel approach to data flow based regression testing that uses slicing algorithms for the explicit detection of definition‐use associations that are affected by a program change, without maintaining a test suite.

...read moreread less

Abstract: After changes are made to a previously tested program, a goal of regression testing is to perform retesting based on the modifications while maintaining the same testing coverage as completely retesting the program. This paper presents a novel approach to data flow based regression testing that uses slicing algorithms for the explicit detection of definition-use associations that are affected by a program change. An important benefit of this slicing technique is that, unlike previous techniques, neither data flow history nor recomputation of data flow for the entire program is required to detect affected definition-use associations. The program changes drive the recomputation of the required partial data flow through slicing. Another advantage is that the technique achieves the same testing coverage with respect to the affected definition-use associations as a complete retest of the program, without maintaining a test suite. Thus, the overhead of maintaining and updating a test suite is eliminated.

...read moreread less

Proceedings Article•DOI•

Implementation of an efficient parallel BDD package

[...]

Tony Stornetta¹, Forrest Brewer¹•Institutions (1)

University of California, Santa Barbara¹

01 Jun 1996

TL;DR: This paper presents an efficient parallel BDD package for a distributed environment such as a network of workstations or a distributed memory parallel computer that exploits a number of different forms of parallelism that can be found in depth-first algorithms.

...read moreread less

Abstract: Large BDD applications push completing resources to their limits. One solution to overcoming resource limitations is to distribute the BDD data structure across multiple networked workstations. This paper presents an efficient parallel BDD package for a distributed environment such as a network of workstations (NOW) or a distributed memory parallel computer. The implementation exploits a number of different forms of parallelism that can be found in depth-first algorithms. Significant effort is made to limit the communication overhead, including a two-level distributed hash table and an uncomputed cache. The package simultaneously executes multiple threads of computation on a distributed BDD.

...read moreread less

Journal Article•DOI•

Dynamic reservation multiple access (DRMA): a new multiple access scheme for personal communication systems (PCS)

[...]

Xiaoxin Qiu¹, Victor O. K. Li¹•Institutions (1)

University of Southern California¹

01 Jun 1996-Wireless Networks

TL;DR: A new reservation protocol, called Dynamic Reservation Multiple Access (DRMA), is proposed in this paper, and numerical results indicate that its performance is superior to the existing reservation protocols, especially in the integrated traffic scenario.

...read moreread less

Abstract: To improve the spectrum efficiency of integrated voice and data services in Personal Communication System (PCS), several reservation-type multiple access schemes, such as Packet Reservation Multiple Access (PRMA), Dynamic Time Division Multiple Access (D-TDMA), Resource Auction Multiple Access (RAMA), etc., have been proposed. PRMA uses the data packet itself to make a channel reservation, and is inefficient in that each unsuccessful reservation wastes one slot. However, it does not have a fixed reservation overhead and offers shorter access delay. On the other hand, fixed reservation overhead is unavoidable in both RAMA and D-TDMA. Compared to D-TDMA and PRMA, RAMA is superior in the sense that its slot assignment is independent of the traffic load. But its implementation is difficult. With these observations, a new reservation protocol, called Dynamic Reservation Multiple Access (DRMA), is proposed in this paper. With this new protocol, the success probability of channel access is greatly improved at the expense of slightly increased system complexity. It solves the problem of inefficiency in PRMA, but without introducing the fixed reservation overhead as in D-TDMA and RAMA. In addition, it is more suited to the dynamic behavior of the integrated traffic because there is no fixed boundary between voice and data slots (which is mandatory in D-TDMA and RAMA). Our numerical results indicate that its performance is superior to the existing reservation protocols, especially in the integrated traffic scenario. Moreover, the soft capacity feature is exhibited when the traffic load increases.

...read moreread less

Proceedings Article•DOI•

Stratified random sampling for power estimation

[...]

Chih-Shun Ding¹, Cheng-Ta Hsieh¹, Qing Wu¹, Massoud Pedram¹•Institutions (1)

University of Southern California¹

01 Nov 1996

TL;DR: The designer is provided with options to either improve the accuracy or the execution time when using power macro-modeling in the context of RTL simulation, and a regression estimator is described to reduce the error of the macro- modeling approach.

...read moreread less

Abstract: In this paper, we propose a statistical power evaluation framework at the RT-level. We first discuss the power macro-modeling formulation, and then propose a simple random sampling technique to alleviate the the overhead of macro-modeling during RTL simulation. Next, we describe a regression estimator to reduce the error of the macro-modeling approach. Experimental results indicate that the execution time of the simple random sampling combined with power macro-modeling is 50 X lower than that of conventional macro-modeling while the percentage error of regression estimation combined with power macro-modeling is 16 X lower than that of conventional macro-modeling. Hence, we provide the designer with options to either improve the accuracy or the execution time when using power macro-modeling in the context of RTL simulation.

...read moreread less

Journal Article•DOI•

Portable run-time support for dynamic object-oriented parallel processing

[...]

Andrew S. Grimshaw¹, Jon Weissman¹, W. Timothy Strayer¹•Institutions (1)

University of Virginia¹

01 May 1996-ACM Transactions on Computer Systems

TL;DR: This article presents the Mentat run-time system, an object-oriented parallel processing system designed to simplify the task of writing portable parallel programs for parallel machines and workstation networks, and an analysis of the minimum granularity required for application programs to overcome the run- time overhead.

...read moreread less

Abstract: Mentat is an object-oriented parallel processing system designed to simplify the task of writing portable parallel programs for parallel machines and workstation networks. The Mentat compiler and run-time system work together to automatically manage the communication and synchronization between objects. The run-time system marshals member function arguments, schedules objects on processors, and dynamically constructs and executes large-grain data dependence graphs. In this article we present the Mentat run-time system. We focus on three aspects—the software architecture, including the interface to the compiler and the structure and interaction of the principle components of the run-time system; the run-time overhead on a component-by-component basis for two platforms, a Sun SparcStation 2 and an Intel Paragon; and an analysis of the minimum granularity required for application programs to overcome the run-time overhead.

...read moreread less

Proceedings Article•DOI•

Representing control in the presence of one-shot continuations

[...]

Carl Bruggeman¹, Oscar Waddell¹, R. Kent Dybvig¹•Institutions (1)

Indiana University¹

01 May 1996

TL;DR: One-shot continuations are introduced, how they interact with traditional multi-shot continuation mechanisms are shown, and a stack-based implementation of control is described that handles both one-shot and multi- shot continuations.

...read moreread less

Abstract: Traditional first-class continuation mechanisms allow a captured continuation to be invoked multiple times. Many continuations, however, are invoked only once. This paper introduces one-shot continuations, shows how they interact with traditional multi-shot continuations, and describes a stack-based implementation of control that handles both one-shot and multi-shot continuations. The implementation eliminates the copying overhead for one-shot continuations that is inherent in multi-shot continuations.

...read moreread less

Patent•

Multiple level minimum logic network

[...]

Coke S. Reed

19 Jul 1996

TL;DR: In this article, a data flow technique that is based on timing and positioning of messages communicating through the interconnect structure is proposed to reduce the amount of control and logic structures in the inter-connect structure.

...read moreread less

Abstract: A network or interconnect structure utilizes a data flow technique that is based on timing and positioning of messages communicating through the interconnect structure. Switching control is distributed throughout multiple nodes in the structure so that a supervisory controller providing a global control function and complex logic structures are avoided. The interconnect structure operates as a "deflection" or "hot potato" system in which processing and storage overhead at each node is minimized. Elimination of a global controller and buffering at the nodes greatly reduces the amount of control and logic structures in the interconnect structure, simplifying overall control components and network interconnect components and improving speed performance of message communication.

...read moreread less

Journal Article•DOI•

Output‐SensitiveVisibility Algorithms for Dynamic Scenes with Applications to Virtual Reality

[...]

Oded Sudarsky¹, Craig Gotsman¹•Institutions (1)

Technion – Israel Institute of Technology¹

01 Aug 1996-Computer Graphics Forum

TL;DR: Two main ideas are used: first, update the spatial data structure to reflect the dynamic objects’ current positions; make this update efficient by restricting it to a small part of the data structure, and use temporal bounding volumes (TBVs) to avoid having to consider every dynamic object in each frame.

...read moreread less

Abstract: An output-sensitive visibility algorithm is one whose runtime is proportional to the number of visible graphic primitives in a scene model-not to the total number of primitives, which can be much greater The known practical output-sensitive visibility algorithms are suitable only for static scenes, because they include a heavy preprocessing stage that constructs a spatial data structure which relies on the model objects' positions. Any changes to the scene geometry might cause significant modifications to this data structure. We show how these algorithms may be adapted to dynamic scenes. Two main ideas are used: first, update the spatial data structure to reflect the dynamic objects' current positions; make this update efficient by restricting it to a small part of the data structure. Second, use temporal bounding volumes (TBVs) to avoid having to consider every dynamic object in each frame. The combination of these techniques yields efficient, output-sensitive visibility algorithms for scenes with multiple dynamic objects. The performance of our methods is shown to be significantly better than previous output-sensitive algorithms, intended for static scenes. TBVs can be adapted to applications where no prior knowledge of the objects' trajectories is available, such as virtual reality (VR), simulations etc. Furthermore, they save updates of the scene model itself, not just of the auxiliary data structure used by the visibility algorithm. They can therefore be used to greatly reduce the communications overhead in client-server VR systems, as well as in general distributed virtual environments.

...read moreread less

Proceedings Article•DOI•

Highly fault-tolerant parallel computation

[...]

Daniel A. Spielman¹•Institutions (1)

Massachusetts Institute of Technology¹

14 Oct 1996

TL;DR: Any parallel computation that runs for time t on w processors can be performed reliably on a faulty machine in the coded model and it is shown how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.

...read moreread less

Abstract: We re-introduce the coded model of fault-tolerant computation in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations in which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time t on w processors can be performed reliably on a faulty machine in the coded model using wlog/sup 0(1/)w processors and time tlog/sup 0(1)/w. The failure probability of the computation will be at most t/spl middot/exp(-w/sup 1/4 /). The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in O(nlog/sup 0(1)/n) sequential time and are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.

...read moreread less

Journal Article•DOI•

Optimal operation of distribution networks

[...]

G.J. Peponis¹, M.P. Papadopulos¹, Nikos Hatziargyriou¹•Institutions (1)

National and Kapodistrian University of Athens¹

01 Feb 1996-IEEE Transactions on Power Systems

TL;DR: In this article, the authors outline and validate a methodology for the optimization of MV distribution network operation, taking into account the protective scheme applied, reliability and voltage quality aspects, are attained by the installation of shunt capacitors and reconfiguration of the network.

...read moreread less

Abstract: The objective of the analysis presented is to outline and validate a methodology for the optimization of MV distribution network operation. Loss minimization, taking into account the protective scheme applied, reliability and voltage quality aspects, are attained. Loss minimization is achieved by the installation of shunt capacitors and reconfiguration of the network. Two different reconfiguration methods are applied and compared. Special attention is given to the impact of network reconfiguration on the protective scheme applied, as well as on network voltage quality and reliability. Loads are assumed to be time variable, following typical daily curves. A general optimization method, suitable for overhead and underground networks, is outlined and validated through applications.

...read moreread less

Journal Article•DOI•

Arithmetic additive generators of pseudo-exhaustive test patterns

[...]

S. Gupta¹, Janusz Rajski², Jerzy Tyszer•Institutions (2)

Nortel¹, Mentor Graphics²

01 Aug 1996-IEEE Transactions on Computers

TL;DR: This paper facilitates a BIST strategy for high performance datapath architectures that uses the functionality of existing hardware, is entirely integrated with the circuit under test, and results in at-speed testing with no performance degradation and no area overhead.

...read moreread less

Abstract: Existing built-in self-test (BIST) strategies require the use of specialized test pattern generation hardware which introduces significant area overhead and performance degradation. In this paper, we propose an entirely new approach to generate test patterns. The method is based on adders widely available in data-path architectures used in digital signal processing circuits and general purpose processors. The resultant test patterns, generated by continuously accumulating a constant value, provide a complete state coverage on subspaces of contiguous bits. This new test generation scheme, along with the recently introduced accumulator-based compaction scheme (Rajski and Tyszer, 1993) facilitates a BIST strategy for high performance datapath architectures that uses the functionality of existing hardware, is entirely integrated with the circuit under test, and results in at-speed testing with no performance degradation and no area overhead.

...read moreread less

Proceedings Article•DOI•

An on-line algorithm for checkpoint placement

[...]

Avi Ziv¹, Jehoshua Bruck¹•Institutions (1)

Advanced Technology Center¹

30 Oct 1996

TL;DR: Although the proposed algorithm uses only online knowledge about the cost of checkpointing, its behavior is close to that of the off-line optimal algorithm that uses the complete knowledge of the checkpointing cost.

...read moreread less

Abstract: Checkpointing is a common technique for reducing the time to recover from faults in computer systems. By saving intermediate states of programs in a reliable storage device, checkpointing enables one to reduce the processing time loss caused by faults. The length of the intervals between the checkpoints affects the execution time of the programs. Long intervals lead to a long re-processing time, while too-frequent checkpointing leads to a high checkpointing overhead. In this paper, we present an online algorithm for the placement of checkpoints. The algorithm uses online knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. We show how the execution time of a program using this algorithm can be analyzed. The total overhead of the execution time when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only online knowledge about the cost of checkpointing, its behavior is close to that of the off-line optimal algorithm that uses the complete knowledge of the checkpointing cost.

...read moreread less

Book Chapter•DOI•

Scalable Update Propagation in Epidemic Replicated Databases

[...]

Michael Rabinovich¹, Narain H. Gehani¹, Alex Kononov¹•Institutions (1)

Bell Labs¹

25 Mar 1996

TL;DR: Many distributed databases use an epidemic approach to manage replicated data, in which a separate activity performs periodic pair-wise comparison of data item copies to detect and bring up to date obsolete copies.

...read moreread less

Abstract: Many distributed databases use an epidemic approach to manage replicated data. In this approach, user operations are executed on a single replica. Asynchronously, a separate activity performs periodic pair-wise comparison of data item copies to detect and bring up to date obsolete copies. The overhead due to comparison of data copies grows linearly with the number of data items in the database, which limits the scalability of the system.

...read moreread less

Journal Article•DOI•

The layout of virtual paths in ATM networks

[...]

Ornan Ori Gerstel¹, Israel Cidon², Shmuel Zaks¹•Institutions (2)

Technion – Israel Institute of Technology¹, Sun Microsystems Laboratories²

01 Dec 1996-IEEE ACM Transactions on Networking

TL;DR: An algorithm that finds a layout by decomposing the network into subnetworks and operating on each subnetwork, recursively is presented and an upper bound on the optimality of the resulting layout and a matching lower bound for the problem are proved.

...read moreread less

Abstract: We study the problem of designing a layout of virtual paths (VPs) on a given ATM network. We first define a mathematical model that captures the characteristics of virtual paths. In this model, we define the general VP layout problem, and a more restricted case; while the general case layout should cater connections between any pair of nodes in the network, the restricted case layout should only cater connections between a specific node to the other nodes. For the latter case, we present an algorithm that finds a layout by decomposing the network into subnetworks and operating on each subnetwork, recursively; we prove an upper bound on the optimality of the resulting layout and a matching lower bound for the problem, that are tight under certain realistic assumptions. Finally, we show how the solution for the restricted case is used as a building block in various solutions to more general cases (trees, meshes, K-separable networks, and general topology networks) and prove a lower bound for some of our results. The results exhibit a tradeoff between the efficiency of the call setup and both the utilization of the VP routing tables and the overhead during recovery from link disconnections.

...read moreread less

Proceedings Article•DOI•

A reversible hierarchical scheme for microcellular systems with overlaying macrocells

[...]

Roberto Beraldi, Salvatore Marano, Carlo Mastroianni

24 Mar 1996

TL;DR: A reversible hierarchical scheme characterized by the presence of handover attempts from macrocells to microcells is proposed, conceived so that the microcells are given the majority of the traffic load as they are able to operate with very high capacity, while the macrocells can better carry out their support task.

...read moreread less

Abstract: Future cellular systems are expected to use multilayered, multisized cells to cover non-homogeneous populated areas. An example in literature is given by a 2 level hierarchical architecture in which an overlaying macrocell provides a group of overflow channels utilized when a microcell, which covers a densely populated area, is not able to accommodate a new call, or a handover from another microcell. The macrocell has the higher hierarchical position, meaning that it can receive handover requests from microcells, lower in the hierarchy, as well as from other macrocells. On the contrary, a call served by the macrocell cannot handover to a microcell. This paper proposes a reversible hierarchical scheme characterized by the presence of handover attempts from macrocells to microcells. The scheme is conceived so that the microcells are given the majority of the traffic load as they are able to operate with very high capacity, while the macrocells, having lower channel utilization, can better carry out their support task. An analytical study is carried out showing that the system performance can be improved, at the expense of relatively little increase of network control overhead, when compared with the classical, i.e. nonreversible hierarchical scheme.

...read moreread less

Book Chapter•DOI•

Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized Object-Based Programs

[...]

Pedro C. Diniz¹, Martin Rinard¹•Institutions (1)

University of California, Santa Barbara¹

08 Aug 1996

TL;DR: This paper proposes a method to reduce the overhead of atomic operations in parallel computing systems by coarsening the granularity at which the computation locks objects.

...read moreread less

Abstract: Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks. In an object-based programming system the natural granularity is to give each object its own lock. Each operation can then make its execution atomic by acquiring and releasing the lock for the object that it accesses. But this fine lock granularity may have high synchronization overhead. To achieve good performance it may be necessary to reduce the overhead by coarsening the granularity at which the computation locks objects.

...read moreread less

Collapse