
Showing papers on "Benchmark (computing) published in 1991"


Proceedings ArticleDOI
25 Feb 1991
TL;DR: HITEC is presented, a sequential circuit test generation package to generate test patterns for sequential circuits, without assuming the use of scan techniques or a reset state, and several new techniques are introduced to improve the performance of test generation.
Abstract: This paper presents HITEC, a sequential circuit test generation package to generate test patterns for sequential circuits, without assuming the use of scan techniques or a reset state. Several new techniques are introduced to improve the performance of test generation. A targeted D element technique is presented, which greatly increases the number of possible mandatory assignments and reduces the over-specification of state variables which can sometimes result when using a standard PODEM algorithm. A technique to use the state knowledge of previously generated vectors for state justification, without the memory overhead of a state transition diagram, is presented. For faults that were aborted during the standard test generation phase, knowledge that was gained about fault propagation by the fault simulator is used. These techniques, when used together, produce the best published results for the ISCAS89 sequential benchmark circuits.

673 citations



17 Sep 1991
TL;DR: The authors discuss assumptions and reasons for selecting the HVDC benchmark model as it is now, with the objectives of promoting more discussions on this subject, and evolving the degree of information and exchange among power system engineers.
Abstract: The idea of establishing a benchmark system to study certain phenomena is not new. However this is a first attempt to create a common reference for HVDC studies, especially one related to control strategies and recovery performance. Another benefit that results from such a standard system is the possible comparison of different simulation methods and results. The authors discuss assumptions and reasons for selecting the HVDC benchmark model as it is now, with the objectives of promoting more discussions on this subject, and evolving the degree of information and exchange among power system engineers.

253 citations


01 Jan 1991
TL;DR: Finding the existing application-specific benchmarks difficult to understand, this work set out to develop a benchmark that could be used to evaluate DIRECT both relative to itself and relative to the "university" version of Ingres.
Abstract: In 1981, as we were completing the implementation of the DIRECT database machine [DEWI79, BORA82], attention turned to evaluating its performance. At that time no standard database benchmark existed. There were only a few application-specific benchmarks. While application-specific benchmarks measure which database system is best for a particular application, they were very difficult to understand. We were interested in a benchmark to measure DIRECT's speedup characteristics. Thus, we set out to develop a benchmark that could be used to evaluate DIRECT both relative to itself and relative to the "university" version of Ingres.

189 citations


Proceedings ArticleDOI
11 Nov 1991
TL;DR: The difficult problem of identifying the equivalence of two faults, analogous to the problem of redundancy identification in ATPG, has been solved, and the efficiency of the algorithm is demonstrated by experimental results for a set of benchmark circuits.
Abstract: The authors present an efficient algorithm for the generation of diagnostic test patterns which distinguish between two arbitrary single stuck-at faults. The algorithm is able to extend a given set of test patterns which is generated from the viewpoint of fault detection to a diagnostic test pattern set with a diagnostic resolution down to a fault equivalence class. The difficult problem of identifying the equivalence of two faults, analogous to the problem of redundancy identification in ATPG, has been solved. The efficiency of the algorithm is demonstrated by experimental results for a set of benchmark circuits. DIATEST, the implementation of the algorithm, either generates diagnostic test patterns for all distinguishable pairs of faults or identifies pairs of faults as being equivalent for each of the benchmark circuits.
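To make the distinguishability criterion concrete, here is a minimal Python sketch (not DIATEST itself; the two-gate netlist and the fault sites are invented): a vector distinguishes two stuck-at faults exactly when the two faulty circuits produce different outputs on it, and an empty result over all vectors means the pair is equivalent at this output.

```python
from itertools import product

# Toy combinational netlist, listed in topological order.
# gate name -> (operator, input signal, input signal); primary inputs a, b, c.
GATES = {
    "n1": ("AND", "a", "b"),
    "n2": ("OR", "n1", "c"),   # n2 is the only primary output
}
INPUTS, OUTPUT = ["a", "b", "c"], "n2"

def simulate(vector, fault=None):
    """Evaluate the circuit; fault is (signal, stuck_value) or None."""
    values = dict(vector)
    def read(sig):
        return fault[1] if fault and fault[0] == sig else values[sig]
    for name, (op, x, y) in GATES.items():
        values[name] = (read(x) & read(y)) if op == "AND" else (read(x) | read(y))
    return read(OUTPUT)

def distinguishing_vectors(f1, f2):
    """All input assignments on which the two faulty machines respond differently;
    an empty list means the two faults are equivalent at this output."""
    out = []
    for bits in product([0, 1], repeat=len(INPUTS)):
        vec = dict(zip(INPUTS, bits))
        if simulate(vec, f1) != simulate(vec, f2):
            out.append(vec)
    return out

print(distinguishing_vectors(("n1", 0), ("c", 1)))   # distinguishable: every vector with c = 0
print(distinguishing_vectors(("n1", 0), ("a", 0)))   # [] -> this pair is equivalent
```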

129 citations


Journal ArticleDOI
11 Nov 1991
TL;DR: A novel learning while searching iterative improvement probabilistic algorithm has been developed and is used to resolve the associated NP-complete combinatorial optimization problem.
Abstract: A transformational approach aimed at improving the resource utilization in high level synthesis is introduced. The current implementation combines retiming and associativity in a single framework. This combination of transformations results in considerable area improvements, as is amply demonstrated by benchmark examples. A novel learning while searching iterative improvement probabilistic algorithm has been developed and is used to resolve the associated NP-complete combinatorial optimization problem. The effectiveness of the proposed algorithms and the transformations is demonstrated using standard benchmark examples, with the aid of statistical analysis, and through a comparison with estimated minimal bounds. The proposed algorithm has proven to be very effective in reaching the optimal solution as well as in runtime.

127 citations


Proceedings ArticleDOI
01 Jan 1991

118 citations


Proceedings ArticleDOI
11 Nov 1991
TL;DR: An efficient algorithm, RITUAL (residual iterative technique for updating all Lagrange multipliers), for obtaining a placement of cell-based ICs subject to performance constraints is described and yields very good results, as is shown on a set of real examples.
Abstract: An efficient algorithm, RITUAL (residual iterative technique for updating all Lagrange multipliers), for obtaining a placement of cell-based ICs subject to performance constraints is described. Using sophisticated mathematical techniques, one is able to solve large problems quickly and effectively. The algorithm is very simple and elegant, making it easy to implement. In addition, it yields very good results, as is shown on a set of real examples. The algorithm was tested on the ISCAS set of logic benchmark examples using parameters for 1 µm CMOS technology. On average, there is a 25% improvement in the wire delay for these examples compared to TimberWolf-5.6 with a small impact on the chip area.

117 citations


Book ChapterDOI
07 Aug 1991
TL;DR: The techniques used to hand-parallelize, for the Alliant FX/80, four Fortran programs from the Perfect-Benchmark suite have wide applicability and can be incorporated into existing translators.
Abstract: This paper discusses the techniques used to hand-parallelize, for the Alliant FX/80, four Fortran programs from the Perfect-Benchmark suite. The paper also includes the execution times of the programs before and after the transformations. The four programs considered here were not effectively parallelized by the automatic translators available to the authors. However, most of the techniques used for hand parallelization, and perhaps all of them, have wide applicability and can be incorporated into existing translators.

112 citations


Proceedings ArticleDOI
01 Jun 1991
TL;DR: A high level synthesis for testability method is presented whose objective is to generate self-testable RTL designs from data flow behavioral descriptions based on an underlying structural testability model and its connection rules.
Abstract: A high level synthesis for testability method is presented whose objective is to generate self-testable RTL designs from data flow behavioral descriptions. The approach is formulated as an allocation problem based on an underlying structural testability model and its connection rules. Two allocation techniques have been developed to solve this problem: one based on an efficient heuristic algorithm that generates cost-effective designs, the other based on an integer linear program formulation that generates optimal designs. The allocation algorithms have been implemented and several benchmark examples are presented.

107 citations


Proceedings ArticleDOI
26 Oct 1991
TL;DR: It is shown that for a class of circuits with a high fault compatibility, well-known test set compaction methods do not effectively minimize the test set, and an algorithm based on finding a maximal clique in a graph to estimate the size of a minimum test set is presented.
Abstract: Generating minimal test sets for combinational circuits is an NP-hard problem. In this paper it will be shown that for a class of circuits with a high fault compatibility, well-known test set compaction methods such as dynamic compaction and reverse order fault simulation do not effectively minimize the test set. Furthermore it will be shown for a number of benchmark circuits that it is possible to generate test sets that are significantly smaller than test sets generated by conventional test set compaction methods. This paper will also present an algorithm based on finding a maximal clique in a graph to estimate the size of a minimum test set.
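As a rough illustration of the clique-based estimate (not the paper's algorithm; the fault and test names below are invented): any set of pairwise-incompatible faults, i.e. faults that no single vector detects together, needs one test each, so even a greedily found maximal clique in the incompatibility graph lower-bounds the minimum test set size.

```python
import random

def clique_lower_bound(fault_tests, trials=50, seed=0):
    """fault_tests: fault -> set of test vectors detecting it.
    Two faults are 'incompatible' if no single vector detects both; a clique of
    pairwise-incompatible faults needs one test per fault, so its size is a
    lower bound on the minimum test set."""
    rng = random.Random(seed)
    faults = list(fault_tests)
    def incompatible(f, g):
        return not (fault_tests[f] & fault_tests[g])
    best = 0
    for _ in range(trials):               # greedy maximal clique, randomized restarts
        rng.shuffle(faults)
        clique = []
        for f in faults:
            if all(incompatible(f, g) for g in clique):
                clique.append(f)
        best = max(best, len(clique))
    return best

# Invented data: vectors are just labels here.
example = {"f1": {"t1", "t2"}, "f2": {"t3"}, "f3": {"t2", "t4"}, "f4": {"t5"}}
print(clique_lower_bound(example))        # -> 3, e.g. the clique {f1, f2, f4}
```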


Journal ArticleDOI
TL;DR: An abstract system of benchmark characteristics that makes it possible, in the beginning of the design stage, to design with benchmark performance in mind is presented.
Abstract: An abstract system of benchmark characteristics that makes it possible, in the beginning of the design stage, to design with benchmark performance in mind is presented. The benchmark characteristics for a set of commonly used benchmarks are then shown. The benchmark set used includes some benchmarks from the Systems Performance Evaluation Cooperative. The SPEC programs are industry-standard applications that use specific inputs. Processor, memory-system, and operating-system characteristics are addressed.

Journal ArticleDOI
TL;DR: In this article, the suitability of the Chandy-Misra-Bryant (CMB) algorithm for the domain of digital logic simulation is explored based on results for six realistic benchmark circuits, one of them being the R6000 microprocessor from MIPS.
Abstract: We explore the suitability of the Chandy-Misra-Bryant (CMB) algorithm for the domain of digital logic simulation. Our evaluation is based on results for six realistic benchmark circuits, one of them being the R6000 microprocessor from MIPS. A quantitative evaluation of the concurrency exhibited by the CMB algorithm shows that an average of 42-196 element activations can be evaluated in parallel if arbitrarily many processors are available. One major factor limiting the parallel performance is the large number of deadlocks that occur. We present a classification of the types of deadlocks and describe them in terms of circuit structure. Using domain-specific knowledge, we propose and evaluate several methods for both reducing the number of deadlock occurrences and for reducing the time spent on each occurrence. Running on a 16-processor Encore Multimax we observe speedups of 6-9. While these self-relative speedups are larger than a parallel version of the traditional centralized-time event-driven algorithm, they come at the price of large overheads: significantly more complex element evaluations, extra element evaluations, and deadlock resolution time. These overheads overwhelm the advantages of using distributed time and consistently make the parallel performance of the CMB algorithm about three times slower than that of the traditional parallel event-driven algorithm. Our experience leads us to conclude that the distributed-time CMB algorithm does not present a viable alternative to the centralized-time event-driven algorithm in the domain of parallel digital logic simulation.

Journal ArticleDOI
TL;DR: The design of a benchmark, SLALOM™, that scales automatically to the computing power available is presented; it corrects several deficiencies in various existing benchmarks: it is highly scalable, it solves a real problem, it includes input and output times, and it can be run on parallel machines of all kinds, using any convenient language.
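The fixed-time idea behind this kind of scalable benchmark can be sketched as follows; this is a toy harness, not SLALOM itself (the real benchmark reportedly solves a radiosity problem within a one-minute budget), and the kernel and budget below are invented. Instead of timing a fixed job, the harness reports the largest problem size completed within the budget.

```python
import time

def scaled_benchmark(kernel, budget_seconds=60.0):
    """Fixed-time benchmarking sketch: grow the problem size and report the
    largest n whose end-to-end run still fits inside the time budget."""
    n, best = 1, 0
    while True:
        start = time.perf_counter()
        kernel(n)
        elapsed = time.perf_counter() - start
        if elapsed > budget_seconds:
            return best
        best, n = n, n * 2     # geometric probing; a real harness would refine further

def toy_kernel(n):
    """Stand-in workload (O(n^2) arithmetic), not the radiosity solver."""
    acc = 0.0
    for i in range(n):
        for j in range(n):
            acc += (i * j) % 7
    return acc

print(scaled_benchmark(toy_kernel, budget_seconds=0.5))
```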

Proceedings ArticleDOI
23 Sep 1991
TL;DR: The authors introduce a fast, parallelizable approach to circuit clustering based on analysis of random walks in the netlist that yields good clustering solutions for classes of 'difficult' inputs in the literature and for industry benchmark circuits.
Abstract: The authors introduce a fast, parallelizable approach to circuit clustering based on analysis of random walks in the netlist. The method yields good clustering solutions for classes of 'difficult' inputs in the literature as well as for industry benchmark circuits. The authors characterize their results using a new clustering metric which facilitates comparison with future work. Extensions to a number of other CAD applications are proposed.
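A minimal sketch of the random-walk intuition only (not the paper's clustering procedure; the graph, walk lengths, and scoring are invented): cells that short random walks keep visiting together are good candidates for the same cluster, and a real tool would run a proper clustering step on top of such scores.

```python
import random
from collections import defaultdict

def covisit_similarity(adj, walk_len=8, walks_per_node=200, seed=0):
    """Estimate, for each pair of cells, how often a short random walk started
    at one of them also touches the other.  'adj' maps node -> list of
    neighbours in the netlist graph (hyperedges assumed pre-expanded)."""
    rng = random.Random(seed)
    hits = defaultdict(int)
    for start in adj:
        for _ in range(walks_per_node):
            node, visited = start, {start}
            for _ in range(walk_len):
                node = rng.choice(adj[node])
                visited.add(node)
            for v in visited - {start}:
                hits[frozenset((start, v))] += 1
    total = 2 * walks_per_node       # each unordered pair is probed from both ends
    return {pair: n / total for pair, n in hits.items()}

# Two triangles joined by one bridge edge (3-4); pairs inside a triangle
# generally score higher than pairs drawn from opposite triangles.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
sim = covisit_similarity(adj)
print(sorted(sim.items(), key=lambda kv: -kv[1])[:4])
```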

01 Jan 1991
TL;DR: The Set Query benchmark chooses a list of "basic" set queries from a review of three major types of strategic data applications: document search, direct marketing, and decision support, and results are presented for two leading database products used in large scale operations.
Abstract: Many of the application systems being designed today, variously known as marketing information, decision support, and management reporting systems, aim to exploit the strategic value of operational data of a commercial enterprise. These applications depart from the row-at-a-time update transaction model of the DebitCredit [1] and TPC benchmarks, and are almost wholly dependent for their performance on what we name "set queries", queries which need to refer to data from a potentially large set of table rows for an answer. The Set Query benchmark chooses a list of "basic" set queries from a review of three major types of strategic data applications: document search, direct marketing, and decision support. In Section 1 of what follows, the data and queries used in the Set Query benchmark are explained and motivated. In Section 2, benchmark results are presented for two leading database products used in large scale operations: IBM's DB2 and Computer Corporation of America's (CCA's) MODEL 204. Surprisingly large performance differences, factors of ten or more for some queries, are observed with respect to I/O, CPU and elapsed time, emphasizing the critical value of benchmarks in this area. In Section 3, a detailed explanation is given of how to generate the data and run the benchmark on an independent platform.
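A toy rendition of the set-query style using Python's built-in sqlite3 (not the benchmark itself; the schema only loosely echoes the paper's KSEQ/K2/K100-style columns, and the row count is scaled far down): the defining feature is that each query touches a potentially large set of rows rather than updating one row at a time.

```python
import random
import sqlite3

# A wide table whose columns have very different cardinalities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bench (kseq INTEGER PRIMARY KEY, k2 INTEGER, k100 INTEGER, k10k INTEGER)")
rng = random.Random(0)
conn.executemany(
    "INSERT INTO bench VALUES (?, ?, ?, ?)",
    ((i, rng.randint(1, 2), rng.randint(1, 100), rng.randint(1, 10_000))
     for i in range(1, 100_001)),
)

# "Basic set queries": a low-selectivity count and a grouped aggregate, the kind
# of scan/index decisions that separated the two products in the paper.
print(conn.execute("SELECT COUNT(*) FROM bench WHERE k2 = 1").fetchone())
print(conn.execute(
    "SELECT k100, COUNT(*) FROM bench WHERE k10k < 50 GROUP BY k100 LIMIT 5"
).fetchall())
```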

Journal ArticleDOI
TL;DR: The n-queens problem is often used as a benchmark problem for AI research and in combinatorial optimization and a polynomial time algorithm for finding a solution was presented in this magazine.
Abstract: The n-queens problem is often used as a benchmark problem for AI research and in combinatorial optimization. An example is the recent article [1] in this magazine that presented a polynomial time algorithm for finding a solution. Several CPU-hours were spent finding solutions for some n up to 500,000.
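For concreteness, a small local-search sketch follows. This is a generic min-conflicts repair, not the polynomial-time algorithm of [1], and it would need more careful engineering (incremental conflict counts, restarts) to approach n on the order of 500,000; the board size and step limit below are arbitrary.

```python
import random

def n_queens_min_conflicts(n, max_steps=100_000, seed=0):
    """Keep one queen per column and repeatedly move a conflicted queen to the
    row in its column that minimises conflicts.  Returns rows[c] = row of the
    queen in column c, or None if no solution is found within max_steps."""
    rng = random.Random(seed)
    rows = [rng.randrange(n) for _ in range(n)]

    def conflicts(col, row):
        return sum(1 for c in range(n) if c != col and
                   (rows[c] == row or abs(rows[c] - row) == abs(c - col)))

    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(c, rows[c]) > 0]
        if not conflicted:
            return rows
        col = rng.choice(conflicted)
        # move to the least-conflicted row, breaking ties at random
        rows[col] = min(range(n), key=lambda r: (conflicts(col, r), rng.random()))
    return None

print(n_queens_min_conflicts(32))
```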

Proceedings ArticleDOI
01 Sep 1991
TL;DR: The overall result is that the dynamic and static approaches are comparable in performance, and both methods achieve more than two times speedup over a high performance single-instruction-issue processor.
Abstract: This paper examines two alternative approaches to supporting code scheduling for multiple-instruction-issue processors. One is to provide a set of non-trapping instructions so that the compiler can perform aggressive static code scheduling. The application of this approach to existing commercial architectures typically requires extending the instruction set. The other approach is to support out-of-order execution in the microarchitecture so that the hardware can perform aggressive dynamic code scheduling. This approach usually does not require modifying the instruction set but requires complex hardware support. In this paper, we analyze the performance of the two alternative approaches using a set of important nonnumerical C benchmark programs. A distinguishing feature of the experiment is that the code for the dynamic approach has been optimized and scheduled as much as allowed by the architecture. The hardware is only responsible for the additional reordering that cannot be performed by the compiler. The overall result is that the dynamic and static approaches are comparable in performance. When applied to a four-instruction-issue processor, both methods achieve more than two times speedup over a high performance single-instruction-issue processor. However, the performance of each scheme varies among the benchmark programs. To explain this variation, we have identified the conditions in these programs that make one approach perform better than the other.

Patent
29 Jul 1991
TL;DR: In this paper, a benchmark program is run on an existing host computer, and is monitored to determine the actual sequence of instructions in the instruction set of the host, which is then converted into the corresponding sequence in the instruction set of the target system.
Abstract: A method and apparatus are described for predicting the performance of a computer system. A benchmark program is run on an existing host computer, and is monitored to determine the actual sequence of instructions in the instruction set of the host. These are then converted into the corresponding sequence in the instruction set of the target. The performance of the target system in executing these instructions is then determined.
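The prediction step amounts to weighting the monitored instruction mix by the target's timing; a minimal sketch follows, with invented instruction classes, counts, cycle figures, and clock rate (a real predictor would work at the level of individual instruction sequences, not aggregate classes).

```python
# Dynamic counts measured while running the benchmark on the host (invented).
host_mix = {
    "alu":    12_000_000,
    "load":    4_500_000,
    "store":   2_000_000,
    "branch":  3_000_000,
    "fp":      1_500_000,
}

# Assumed cycles per instruction class on the target, and an assumed 40 MHz clock.
target_cpi = {"alu": 1.0, "load": 2.0, "store": 2.0, "branch": 1.5, "fp": 3.0}
target_clock_hz = 40e6

cycles = sum(count * target_cpi[cls] for cls, count in host_mix.items())
print(f"predicted target run time: {cycles / target_clock_hz:.3f} s")
```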

Journal ArticleDOI
TL;DR: A benchmark parallel version of the Van Slyke and Wets algorithm for two-stage stochastic programs and an implementation of that algorithm on the Sequent/Balance are described and demonstrated, indicating that the benchmark implementation parallelizes well and that even with the use of parallel processing, problems with random variables having large numbers of realizations can take prohibitively large amounts of computation for solution.
Abstract: We describe a benchmark parallel version of the Van Slyke and Wets algorithm for two-stage stochastic programs and an implementation of that algorithm on the Sequent/Balance. We also report results of a numerical experiment using random test problems and our implementation. These performance results, to the best of our knowledge, are the first available for the Van Slyke and Wets algorithm on a parallel processor. They indicate that the benchmark implementation parallelizes well, and that even with the use of parallel processing, problems with random variables having large numbers of realizations can take prohibitively large amounts of computation for solution. Thus, they demonstrate the need for exploiting both parallelization and approximation for the solution of stochastic programs.

Proceedings ArticleDOI
25 Jun 1991
TL;DR: Experimental results demonstrate that for most circuits TSUNAMI can generate test sets for all faults in fairly small amounts of time and is very efficient for hard-to-detect and redundant faults.
Abstract: An algorithm is presented for generating tests for single stuck line faults using a combination of algebraic processing and conventional path oriented search. Unlike conventional test generation algorithms, this algorithm uses algebraic methods to determine the complete set of input assignments that will propagate an error signal through a gate in a path to a primary output. The algorithm uses ordered binary decision diagrams (BDDs) for algebraic processing. For a large number of circuits that are amenable to analysis using BDDs, the algorithm is faster than previous algebraic methods. The algorithm has been implemented as the program TSUNAMI. Experimental results demonstrate that for most circuits TSUNAMI can generate test sets for all faults in fairly small amounts of time and is very efficient for hard-to-detect and redundant faults. Moreover, since a large set of tests is generated for each fault, these sets can be compacted to a very high degree. Using benchmark circuits as a reference, TSUNAMI obtains test sets up to 70% smaller than test sets generated by conventional algorithms.
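The compaction claim follows from having the complete detection set per fault: with those sets in hand, test-set compaction becomes a set-cover problem. A small greedy sketch with invented fault/test data (not TSUNAMI's BDD-based procedure):

```python
def greedy_compaction(detect_sets):
    """detect_sets: fault -> set of all vectors that detect it.  Greedy set
    cover: repeatedly keep the vector detecting the most still-uncovered faults."""
    remaining = dict(detect_sets)
    chosen = []
    while remaining:
        scores = {}
        for tests in remaining.values():
            for t in tests:
                scores[t] = scores.get(t, 0) + 1
        best = max(scores, key=scores.get)
        chosen.append(best)
        remaining = {f: ts for f, ts in remaining.items() if best not in ts}
    return chosen

# Invented data: complete per-fault detection sets allow 2 vectors to cover
# what a one-test-per-fault strategy might have covered with 4.
detect = {
    "f1": {"t1", "t3"},
    "f2": {"t2", "t3"},
    "f3": {"t3"},
    "f4": {"t4", "t5"},
}
print(greedy_compaction(detect))   # -> ['t3', 't4'] or ['t3', 't5']
```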

Proceedings ArticleDOI
11 Nov 1991
TL;DR: The authors consider both the maximum cliques in the horizontal constraint graph and the longest paths in the vertical constraint graph as a basis for choosing the nets to route over the cells and prove that their net selection algorithm is guaranteed to produce a solution within 68% of the optimum.
Abstract: The authors present a novel algorithm for three-layer, over-the-cell channel routing of standard cell designs. The novelty of the proposed approach lies in the use of 'vacant' terminals for over-the-cell routing. Furthermore, the authors consider both the maximum cliques in the horizontal constraint graph and the longest paths in the vertical constraint graph as a basis for choosing the nets to route over the cells. They prove that their net selection algorithm is guaranteed to produce a solution within 68% of the optimum. The proposed algorithm has been implemented and tested on several benchmark examples. For the entire PRIMARY 1 benchmark, they reduce the total routing height by 76% as compared to a two-layer channel router, which leads to a 7% reduction in chip height.
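The "maximum clique in the horizontal constraint graph" is the channel density, which for interval spans can be computed with a sweep; a small sketch with invented net spans (the paper's net-selection step itself is not reproduced here):

```python
def channel_density(nets):
    """Max clique of the horizontal constraint (interval-overlap) graph:
    sweep the columns and count how many net spans [left, right] are open at once."""
    events = []
    for left, right in nets:
        events.append((left, +1))
        events.append((right + 1, -1))   # a span closes after its rightmost column
    density = best = 0
    for _, delta in sorted(events):
        density += delta
        best = max(best, density)
    return best

# Each net is the (leftmost, rightmost) column of its terminals (invented).
print(channel_density([(1, 4), (2, 6), (3, 5), (7, 9)]))   # -> 3
```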

Journal ArticleDOI
01 Dec 1991
TL;DR: The EuroBen benchmark as mentioned in this paper was developed by the EuroBen group to evaluate the performance profile of high-performance scientific computers, especially vector and parallel architectures, using a graded approach to ensure a more general assessment of performance.
Abstract: The EuroBen group was established in mid-1990 by a group of people concerned with obtaining the performance profile of high-performance scientific computers. To this end a benchmark was designed which is distributed among the members of the group. As the founders of EuroBen believe that characterisation of the performance of high-performance scientific computers cannot be done by a single performance measure, especially where vector and parallel architectures are involved, a graded approach was used to ensure a more general assessment of the performance. In this paper we describe the structure of the EuroBen benchmark, its rationale, and the supporting activities of the EuroBen group with regard to the benchmark. In addition, new developments will be discussed.

Proceedings ArticleDOI
25 Feb 1991
TL;DR: This work addresses the problem of area prediction of VLSI layouts by presenting an approach based on two models, analytical and constructive, which permits the user to trade off the accuracy of the prediction versus the runtime of the predictor.
Abstract: The authors address the problem of area prediction of VLSI layouts. They present an approach based on two models, analytical and constructive. A circuit design is recursively partitioned down to a level specified by the user, thus generating a slicing tree. An analytical model is then used to predict the shape functions of the leaf subcircuits. By traversing the tree bottom up, the shape function of the entire layout design can then be constructively predicted. This approach permits the user to trade off the accuracy of the prediction versus the run time of the predictor. Such a scheme is quite useful for high-level synthesis and system level partitioning. The experimental validation results are quite good, indicating an average error of the order of 5% in predicting shape functions for standard cell benchmark designs with sizes ranging from 125 to 12000 cells.
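The constructive half of the predictor combines child shape functions bottom-up through the slicing tree; a minimal sketch with invented leaf shape functions (a real predictor would also add routing-area and wiring estimates):

```python
def combine(shapes_a, shapes_b, cut):
    """Combine two child shape functions (lists of feasible (width, height)
    points) into the parent's shape function for a vertical or horizontal cut."""
    out = []
    for wa, ha in shapes_a:
        for wb, hb in shapes_b:
            if cut == "vertical":          # children sit side by side
                out.append((wa + wb, max(ha, hb)))
            else:                          # horizontal cut: children are stacked
                out.append((max(wa, wb), ha + hb))
    # keep only non-dominated (width, height) points
    out.sort()
    pruned = []
    for w, h in out:
        if not pruned or h < pruned[-1][1]:
            pruned.append((w, h))
    return pruned

leaf1 = [(2, 6), (3, 4), (6, 2)]           # invented leaf shape functions
leaf2 = [(2, 5), (5, 2)]
print(combine(leaf1, leaf2, "vertical"))   # -> [(4, 6), (5, 5), (8, 4), (11, 2)]
```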

Proceedings ArticleDOI
15 Apr 1991
TL;DR: The authors present several methods to enhance the performance of sequential test generation algorithms with a new circuit model, a novel learning technique, new methods to deal with testability measures and a powerful procedure to identify untestable faults.
Abstract: The authors present several methods to enhance the performance of sequential test generation algorithms. Among the innovations proposed are a new circuit model, a novel learning technique, new methods to deal with testability measures and a powerful procedure to identify untestable faults. They use an enhanced implementation of the BACK algorithm together with a set of published benchmark circuits to demonstrate the efficiency of the proposed techniques. The results show that the overall performance of the BACK algorithm is greatly improved. For many of the benchmark circuits, test generation time is reduced by more than one order of magnitude.

Proceedings ArticleDOI
25 Feb 1991
TL;DR: A new global routing algorithm designed specifically for sea-of-gates circuits is described; it has been generalized to handle gate array and standard cell circuits and, on gate array benchmarks, achieves uniform channel densities whose maximum values are the lowest that have ever been reported.
Abstract: The authors describe a new global routing algorithm designed specifically for sea-of-gates circuits. The algorithm has been generalized to handle gate array and standard cell circuits. The main features of the algorithm are: (1) interconnection length minimization using a new Steiner tree generation method, (2) a two-stage coarse global routing method which seeks to even out congestion, (3) a maze routing procedure which removes overflows and reduces the congestion, (4) vertical track assignment, and (5) congestion evening at the detailed global routing level. In tests on the MCNC benchmark circuits, the algorithm produced layouts with an average of 11% fewer routing tracks than the other algorithms. In tests on gate array benchmark circuits, the algorithm not only achieved uniform channel densities, but the maximum channel densities it produced are the lowest values that have ever been reported.
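A cheap stand-in for the interconnection-length step, with invented pin locations: a rectilinear minimum spanning tree built with Prim's algorithm, which is known to be at most 50% longer than the optimal rectilinear Steiner tree (the paper's own Steiner construction is not reproduced here).

```python
def rectilinear_mst_length(pins):
    """Prim's algorithm on Manhattan distances between a net's pins."""
    dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
    in_tree = {pins[0]}
    length = 0
    while len(in_tree) < len(pins):
        d, v = min((dist(u, w), w) for u in in_tree for w in pins if w not in in_tree)
        length += d
        in_tree.add(v)
    return length

# Pin locations of one net on the global-routing grid (invented).
print(rectilinear_mst_length([(0, 0), (4, 1), (2, 3), (5, 5)]))   # -> 14
```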

Journal ArticleDOI
01 Jul 1991
TL;DR: This work proposes a new, three-step approach to simulations of modular networks, in which decomposition and mapping are taken out of the simulation program, and an implementation of a Hopfield network for image restoration is described.
Abstract: While simulation programs for single neural networks, running on parallel machines, always use a fixed problem decomposition and mapping strategy, we show that this is not possible for modular neural networks. We demonstrate this by analysing decomposition and mapping issues for a particular modular neural network model: the entropy-driven artificial neural network. The classic approach to simulations consists of two steps: first a data structure is built, describing the problem to be simulated. For a neural network this data structure contains the network topology, the interconnection strengths, etc. In a second step, this data structure is read into the simulation program, which performs a fixed decomposition and mapping before simulation can take place. Since this approach cannot be used any more for simulations of modular networks, we propose a new, three-step approach, in which decomposition and mapping are taken out of the simulation program. A compiler is used to prepare the problem data structure, a splitter program takes care of problem decomposition, and the simulator program takes the decomposed problem as its input. Since all decisions with respect to decomposition and mapping are taken by the splitter, the simulator program is independent of decomposition and mapping, and hence it can handle any decomposition and mapping. Following this approach, a machine-independent simulation environment was designed, and this design was implemented on a transputer system. To show that our approach is generic (i.e. not limited to simulations of modular networks) an implementation of a Hopfield network for image restoration is described. In spite of the classic preconceptions about generic software, performance analysis and benchmark results show that our novel, generic approach can be implemented efficiently on transputer arrays.

Proceedings ArticleDOI
30 Apr 1991
TL;DR: The Chief project provides an environment for analyzing parallel systems and aims to employ the Perfect Club benchmarks to evaluate performance.
Abstract: The Chief project provides an environment for analyzing parallel systems. The project aims to employ the Perfect Club benchmarks to evaluate performance. A tool (MaxPar) determines the maximum available parallelism in a program by instrumenting each program; the results are computed by running the compiled result. Three different parallel trace generators convert benchmark programs into trace input files for simulation. Parallel simulation kernels based upon event-driven and hybrid time- and event-driven models run on multiprocessors such as the Alliant FX/8. Simulation components are designed in a high-level language, and their modular design encourages reconfiguring existing components into new simulators to explore architectural variations. A graphical user-interface provides a powerful tool for simulator configuration, simulator debugging, and result visualization.

Journal ArticleDOI
TL;DR: The prejump mechanism, implemented as a hardware solution for the jump problem, executes benchmark programs 16.8% faster on average, and optimized microinstructions permit bitmap-manipulation instructions to perform two to five times faster than software loops.
Abstract: A description is given of the Gmicro/100, a 32-b VLSI microprocessor based on the TRON specification. The Gmicro/100 five-stage pipeline, prejump mechanism, and bitmap manipulation are examined. Performance results are reported. They show that the prejump mechanism, implemented as a hardware solution for the jump problem, executes benchmark programs 16.8% faster on the average. Optimized microinstructions permit bitmap-manipulation instructions to perform two to five times faster than the software loops. The application-specific standard product approach used to implement Gmicro/100 is discussed.