
Showing papers on "Benchmark (computing)" published in 1996


Book ChapterDOI
01 Jan 1996
TL;DR: This chapter continues with a detailed computational study of the most powerful algorithm on 162 benchmark problems and discusses the suitability of the algorithm for either very large or very difficult JSP instances.
Abstract: In this chapter we give a survey on the GA approaches considered so far. We continue with a detailed computational study of the most powerful algorithm on 162 benchmark problems. Finally we discuss the suitability of the algorithm for either very large or very difficult JSP instances.

711 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe automatic parallelization techniques in the SUIF (Stanford University Intermediate Format) compiler that result in good multiprocessor performance for array-based numerical programs.
Abstract: This article describes automatic parallelization techniques in the SUIF (Stanford University Intermediate Format) compiler that result in good multiprocessor performance for array-based numerical programs. Parallelizing compilers for multiprocessors face many hurdles. However, SUIF's robust analysis and memory optimization techniques enabled speedups on three fourths of the NAS and SPECfp95 benchmark programs.

592 citations


Journal ArticleDOI
TL;DR: This article presents compiler optimizations to improve data locality based on a simple yet accurate cost model and finds that, although performance improvements were difficult to achieve, the optimizations significantly improved several programs.
Abstract: In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In this article, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations. The cost model drives the application of compound transformations consisting of loop permutation, loop fusion, loop distribution, and loop reversal. To validate our optimization strategy, we implemented our algorithms and ran experiments on a large collection of scientific programs and kernels. Experiments illustrate that for kernels our model and algorithm can select and achieve the best loop structure for a nest. For over 30 complete applications, we executed the original and transformed versions and simulated cache hit rates. We collected statistics about the inherent characteristics of these programs and our ability to improve their data locality. To our knowledge, these studies are the first of such breadth and depth. We found performance improvements were difficult to achieve because benchmark programs typically have high hit rates even for small data caches; however, our optimizations significantly improved several programs.
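For a concrete feel for this kind of cost model, the following is a minimal sketch (not the article's actual model): it counts cache lines touched by a toy 2D loop nest under two loop orders, assuming row-major storage, 8-byte elements, 64-byte lines, and an array too large to stay cache-resident.

```python
# Toy cache-line count for a 2D loop nest under two loop orders.
# Assumptions (ours, not the article's): row-major layout, 8-byte elements,
# 64-byte cache lines, array much larger than the cache.
def lines_touched(n, m, elems_per_line, order):
    if order == "ij":                        # i outer, j inner: rows walked contiguously
        return n * -(-m // elems_per_line)   # ceil(m / elems_per_line) lines per row
    else:                                    # j outer, i inner: each access lands on a new line
        return n * m

n = m = 1024
for order in ("ij", "ji"):
    print(order, lines_touched(n, m, 64 // 8, order))   # ij: 131072, ji: 1048576
```

A model of this flavor would rank the ij order as the better loop organization and drive a loop permutation toward it.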

566 citations


Journal ArticleDOI
TL;DR: A tabu search algorithm for the multi-depot vehicle routing problem with capacity and route length restrictions is described and is shown to outperform existing heuristics.

342 citations


Journal ArticleDOI
TL;DR: A machine-independent model of program execution is developed to characterize both machine performance and program execution, and a metric for program similarity is developed that makes it possible to classify benchmarks with respect to a large set of characteristics.
Abstract: Standard benchmarking provides run-times for given programs on given machines, but fails to provide insight as to why those results were obtained (either in terms of machine or program characteristics) and fails to provide run-times for that program on some other machine, or for some other programs on that machine. We have developed a machine-independent model of program execution to characterize both machine performance and program execution. By merging these machine and program characterizations, we can estimate execution time for arbitrary machine/program combinations. Our technique allows us to identify those operations, either on the machine or in the programs, which dominate the benchmark results. This information helps designers in improving the performance of future machines and users in tuning their applications to better utilize the performance of existing machines. Here we apply our methodology to characterize benchmarks and predict their execution times. We present extensive run-time statistics for a large set of benchmarks including the SPEC and Perfect Club suites. We show how these statistics can be used to identify important shortcomings in the programs. In addition, we give execution time estimates for a large sample of programs and machines and compare these against benchmark results. Finally, we develop a metric for program similarity that makes it possible to classify benchmarks with respect to a large set of characteristics.
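The basic decomposition can be illustrated with a tiny, purely hypothetical example: the program is characterized by abstract-operation counts, the machine by per-operation times, and the prediction is their dot product (all numbers below are invented).

```python
# Hypothetical illustration of the machine/program decomposition; all numbers invented.
program_counts = {"flop": 2.0e9, "mem": 1.5e9, "branch": 3.0e8}   # program characterization
machine_times  = {"flop": 5e-9, "mem": 12e-9, "branch": 2e-9}     # machine characterization (s/op)

predicted = sum(program_counts[op] * machine_times[op] for op in program_counts)
print(f"predicted run time: {predicted:.2f} s")   # 2e9*5ns + 1.5e9*12ns + 3e8*2ns = 28.60 s
```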

230 citations


15 May 1996
TL;DR: The main contributions of this thesis are an 8-fold speedup and 4-fold memory size reduction over the baseline Sphinx-II system, and the improvement in speed is obtained from the following techniques: lexical tree search, phonetic fast match heuristic, and global best path search of the word lattice.
Abstract: Advances in speech technology and computing power have created a surge of interest in the practical application of speech recognition. However, the most accurate speech recognition systems in the research world are still far too slow and expensive to be used in practical, large vocabulary continuous speech applications. Their main goal has been recognition accuracy, with emphasis on acoustic and language modelling. But practical speech recognition also requires the computation to be carried out in real time within the limited resources (CPU power and memory size) of commonly available computers. There has been relatively little work in this direction that also preserves the accuracy of research systems. In this thesis, we focus on efficient and accurate speech recognition. It is easy to improve recognition speed and reduce memory requirements by trading away accuracy, for example by greater pruning, and using simpler acoustic and language models. It is much harder to improve both the recognition speed and reduce main memory size while preserving the accuracy. This thesis presents several techniques for improving the overall performance of the CMU Sphinx-II system. Sphinx-II employs semi-continuous hidden Markov models for acoustics and trigram language models, and is one of the premier research systems of its kind. The techniques in this thesis are validated on several widely used benchmark test sets using two vocabulary sizes of about 20K and 58K words. The main contributions of this thesis are an 8-fold speedup and 4-fold memory size reduction over the baseline Sphinx-II system. The improvement in speed is obtained from the following techniques: lexical tree search, phonetic fast match heuristic, and global best path search of the word lattice.
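The "global best path search of the word lattice" is, at heart, a best-path dynamic program over a DAG of word hypotheses. The sketch below is illustrative only (it is not Sphinx-II code); the lattice, scores, and node numbering are invented, and nodes are assumed to be numbered in topological order.

```python
# Toy word lattice: edges are (start_node, end_node, word, log_score); nodes are
# assumed topologically numbered, so a single pass in node order suffices.
edges = [
    (0, 1, "the", -1.0), (0, 1, "a", -1.4),
    (1, 2, "cat", -2.0), (1, 2, "cap", -2.3),
    (2, 3, "</s>", -0.1),
]
best = {0: (0.0, [])}                               # node -> (best score, best word sequence)
for u, v, word, score in sorted(edges):
    if u in best:
        cand = (best[u][0] + score, best[u][1] + [word])
        if v not in best or cand[0] > best[v][0]:
            best[v] = cand
print(best[3])                                      # (-3.1, ['the', 'cat', '</s>'])
```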

221 citations


Book
01 Jan 1996
TL;DR: Contents: Introduction: Mixed Analog-Digital Chips; The MOSFET: Introduction and Qualitative View.
Abstract: Contents: Introduction: Mixed Analog-Digital Chips; The MOSFET: Introduction and Qualitative View; MOSFET DC Modeling; MOSFET Small-Signal Modeling; Technology and Available Circuit Components; Layout. Appendices: Additional MOS Transistor Modeling Information; A Set of Benchmark Tests for Evaluating MOSFET Models for Analog Design; A Sample Spice Input File.

195 citations


Journal ArticleDOI
TL;DR: An efficient method for selecting important input variables when building a fuzzy model from data by systematically removing premises in the fuzzy rules of this initial model to search for the best simplified model without actually generating any new models.
Abstract: We present an efficient method for selecting important input variables when building a fuzzy model from data. Past methods for input variable selection require generating different models while searching for the optimal combination of variables; our method requires generating only one model that employs all possible input variables. To determine the important variables, premises in the fuzzy rules of this initial model are systematically removed to search for the best simplified model without actually generating any new models. We also present an efficient method for generating the initial model that typically must incorporate a large number of input variables. These methods are illustrated through application to the benchmark Box and Jenkins gas furnace data; the results are compared with those of other fuzzy models found in the literature.
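A hedged sketch of the general idea (not the paper's actual model or rule form): in a tiny zero-order Sugeno-style fuzzy model with Gaussian premises, "removing" a premise simply drops its factor from the rule's firing strength, so simplified models can be scored without building any new model. The rule parameters and inputs below are invented.

```python
import math

def fire(rule, x, removed, ridx):
    """Firing strength of one rule; premises listed in `removed` are skipped."""
    s = 1.0
    for i, (center, sigma) in enumerate(rule["premises"]):
        if (ridx, i) in removed:                    # premise pruned from this rule
            continue
        s *= math.exp(-((x[i] - center) ** 2) / (2 * sigma ** 2))
    return s

def predict(rules, x, removed=frozenset()):
    w = [fire(r, x, removed, ri) for ri, r in enumerate(rules)]
    return sum(wi * r["out"] for wi, r in zip(w, rules)) / (sum(w) or 1.0)

rules = [                                           # two invented rules over two inputs
    {"premises": [(0.0, 1.0), (0.0, 1.0)], "out": 1.0},
    {"premises": [(2.0, 1.0), (2.0, 1.0)], "out": 3.0},
]
x = (1.0, 2.0)
print(predict(rules, x))                            # full initial model
print(predict(rules, x, removed={(0, 1)}))          # rule 0 with its second premise removed
```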

141 citations


Book ChapterDOI
01 Jan 1996
TL;DR: A Parallel Tabu Search algorithm for the vehicle routing problem under capacity and route length restrictions is described; in the neighborhood search, the algorithm uses compound moves generated by an ejection chain process.
Abstract: In this paper we describe a Parallel Tabu Search algorithm for the vehicle routing problem under capacity and distance restrictions. In the neighborhood search, the algorithm uses compound moves generated by an ejection chain process. Parallel processing is used to explore the solution space more extensively and different parallel techniques are used to accelerate the search process. Tests were carried out on a network of Sun SPARC workstations and computational results for a set of benchmark problems prove the efficiency of the algorithm proposed.
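As background, the skeleton of a plain (sequential, single-swap) tabu search looks like the sketch below; it is a generic illustration on a toy permutation problem, not the paper's parallel ejection-chain algorithm, and all parameters are invented.

```python
import itertools, random

def cost(tour, d):                                   # toy tour length
    return sum(d[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def tabu_search(d, iters=200, tenure=7):
    n = len(d)
    cur = list(range(n)); random.shuffle(cur)
    best, tabu = cur[:], {}
    for it in range(iters):
        moves = []
        for i, j in itertools.combinations(range(n), 2):        # simple swap neighborhood
            cand = cur[:]; cand[i], cand[j] = cand[j], cand[i]
            c = cost(cand, d)
            if tabu.get((i, j), -1) < it or c < cost(best, d):   # tabu check + aspiration
                moves.append((c, (i, j), cand))
        c, (i, j), cur = min(moves)                              # best admissible move
        tabu[(i, j)] = it + tenure                               # forbid reversing it for a while
        if c < cost(best, d):
            best = cur[:]
    return best, cost(best, d)

random.seed(0)
d = [[abs(i - j) for j in range(8)] for i in range(8)]           # invented distances
print(tabu_search(d))
```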

133 citations


Proceedings ArticleDOI
01 Sep 1996
TL;DR: It is demonstrated that compiler-directed page coloring can lead to significant performance improvements over two commonly used page mapping strategies for machines with either direct-mapped or two-way set-associative caches, and is complementary to latency-hiding techniques such as prefetching.
Abstract: This paper presents a new technique, compiler-directed page coloring, that eliminates conflict misses in multiprocessor applications. It enables applications to make better use of the increased aggregate cache size available in a multiprocessor. This technique uses the compiler's knowledge of the access patterns of the parallelized applications to direct the operating system's virtual memory page mapping strategy. We demonstrate that this technique can lead to significant performance improvements over two commonly used page mapping strategies for machines with either direct-mapped or two-way set-associative caches. We also show that it is complementary to latency-hiding techniques such as prefetching. We implemented compiler-directed page coloring in the SUIF parallelizing compiler and on two commercial operating systems. We applied the technique to the SPEC95fp benchmark suite, a representative set of numeric programs. We used the SimOS machine simulator to analyze the applications and isolate their performance bottlenecks. We also validated these results on a real machine, an eight-processor 350 MHz Digital AlphaServer. Compiler-directed page coloring leads to significant performance improvements for several applications. Overall, our technique improves the SPEC95fp rating for eight processors by 8% over Digital UNIX's page mapping policy and by 20% over page coloring, a standard page mapping policy. The SUIF compiler achieves a SPEC95fp ratio of 57.4, the highest ratio to date.
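The mechanism itself is simple to sketch (this illustrates the page-coloring idea, not the SUIF or OS implementation; cache and page sizes are assumed): in a physically indexed cache, the index bits above the page offset define a page "color", and pages of the same color contend for the same cache region.

```python
PAGE = 4096                           # assumed page size
CACHE = 1 << 20                       # assumed 1 MB direct-mapped cache
COLORS = CACHE // PAGE                # 256 page colors

def color(frame):                     # physical frame number -> page color
    return frame % COLORS

naive   = [0, COLORS, 2 * COLORS, 3 * COLORS]   # frames that alias in the cache
colored = [0, 1, 2, 3]                          # frames chosen to get distinct colors
print([color(f) for f in naive])      # [0, 0, 0, 0] -> these pages all conflict
print([color(f) for f in colored])    # [0, 1, 2, 3] -> conflict-free
```

Compiler-directed page coloring uses the compiler's knowledge of which pages a parallel loop touches together to steer the operating system toward the second kind of frame assignment.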

123 citations


Journal ArticleDOI
TL;DR: A prototype system named GATTO is used to assess the effectiveness of the approach in terms of result quality and CPU time requirements and the results are the best ones reported in the literature for most of the largest standard benchmark circuits.
Abstract: This paper deals with automated test pattern generation for large synchronous sequential circuits and describes an approach based on genetic algorithms. A prototype system named GATTO is used to assess the effectiveness of the approach in terms of result quality and CPU time requirements. An account is also given of a distributed version of the same algorithm, named GATTO*. Being based on the PVM library, it runs on any network of workstations and is able to either reduce the required time, or improve the result quality with respect to the monoprocessor version. In the latter case, in terms of Fault Coverage, the results are the best ones reported in the literature for most of the largest standard benchmark circuits. The flexibility of GATTO enables users to easily trade off fault coverage and CPU time to suit their needs.

Journal ArticleDOI
TL;DR: The results on benchmark and real circuits indicate that a large number of redundancies are found, much faster than a test-generation-based approach for redundancy identification, however, FIRE is not guaranteed to identify all redundancies in a circuit.
Abstract: FIRE is a novel Fault-Independent algorithm for combinational REdundancy identification. The algorithm is based on a simple concept that a fault which requires a conflict as a necessary condition for its detection is undetectable and hence redundant. FIRE does not use the backtracking-based exhaustive search performed by fault-oriented automatic test generation algorithms, and identifies redundant faults without any search. Our results on benchmark and real circuits indicate that we find a large number of redundancies (about 80% of the combinational redundancies in benchmark circuits), much faster than a test-generation-based approach for redundancy identification. However, FIRE is not guaranteed to identify all redundancies in a circuit.

Book ChapterDOI
01 Jul 1996
TL;DR: This work describes how to model and verify real-time systems using the formal verification tool Cospan, which supports automata-theoretic verification of coordinating processes with timing constraints.
Abstract: We describe how to model and verify real-time systems using the formal verification tool Cospan. The verifier supports automata-theoretic verification of coordinating processes with timing constraints. We discuss different heuristics, and our experiences with the tool for certain benchmark problems appearing in the verification literature.

Proceedings ArticleDOI
15 Feb 1996
TL;DR: Preliminary results indicate that compared to LUT-based FPGAs the Hybrid offers savings of more than a factor of two in terms of chip area.
Abstract: This paper proposes a new field-programmable architecture that is a combination of two existing technologies: Field Programmable Gate Arrays (FPGAs) based on LookUp Tables (LUTs), and Complex Programmable Logic Devices based on PALs/PLAs. The methodology used for development of the new architecture, called Hybrid FPGA, is based on analysis of a large set of benchmark circuits, in which we determine what types of logic resources best match the needs of the circuits. The proposed Hybrid FPGA is evaluated by manually technology mapping a set of circuits into the new architecture and estimating the total chip area needed for each circuit, compared to the area that would be required if only LUTs were available. Preliminary results indicate that compared to LUT-based FPGAs the Hybrid offers savings of more than a factor of two in terms of chip area.

Patent
28 Feb 1996
TL;DR: A benchmarking application for testing the performance of a database server (14) includes a plurality of execution parameters (82) and a program (78) operable to read the execution parameters as discussed by the authors.
Abstract: A benchmarking application for testing the performance of a database server (14) includes a plurality of execution parameters (82) and a program (78) operable to read the execution parameters (82). Processes (56, 58, 60) are generated by the program (78) in accordance with the execution parameters (82). Each process (56, 58, 60) represents a user (16, 18, 20) of the database server (14) and generates benchmark transactions (108) for submission to the database server (14).
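A minimal sketch of the shape such a harness could take (invented names and parameters, not the patent's implementation): read execution parameters, spawn one process per simulated user, and have each process submit its benchmark transactions.

```python
import multiprocessing, random, time

def user(user_id, txns, think_time):
    for _ in range(txns):
        time.sleep(think_time * random.random())          # simulated think time
        # a real harness would submit a SQL transaction to the database server here
        print(f"user {user_id}: transaction submitted")

if __name__ == "__main__":
    params = {"users": 4, "txns_per_user": 3, "think_time": 0.01}   # execution parameters
    procs = [multiprocessing.Process(target=user,
                                     args=(i, params["txns_per_user"], params["think_time"]))
             for i in range(params["users"])]
    for p in procs: p.start()
    for p in procs: p.join()
```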

Proceedings ArticleDOI
03 Feb 1996
TL;DR: The introduction of a new metric, called the R-metric, to evaluate the representativeness of reduced traces when applied to a wide class of processor designs and the development of a novel graph-based heuristic to generate reduced traces based on the notions incorporated in the metric.
Abstract: Performance evaluation of processor designs using dynamic instruction traces is a critical part of the iterative design process. The widening gap between the billions of instructions in such traces for benchmark programs and the throughput of timers performing the analysis in the tens of thousands of instructions per second has led to the use of reduced traces during design. This opens up the issue of whether these traces are truly representative of the actual workload in these benchmark programs. The first key result in this paper is the introduction of a new metric, called the R-metric, to evaluate the representativeness of these reduced traces when applied to a wide class of processor designs. The second key result is the development of a novel graph-based heuristic to generate reduced traces based on the notions incorporated in the metric. These ideas have been implemented in a prototype system (SMART) for generating representative and reduced traces. Extensive experimental results are presented on various benchmarks to demonstrate the quality of the synthetic traces and the uses of the R-metric.

Book
01 Dec 1996
TL;DR: Reports on work done to develop a benchmark problem for genetic programming, the royal tree: a function that accounts for tree shape as part of its evaluation function, thus controlling for a parameter not often found in the GP literature.
Abstract: We report on work done to develop a benchmark problem for genetic programming, both as a difficult problem to test GP abilities and as a platform for tuning GP parameters. This benchmark, the royal tree, is a function that accounts for tree shape as part of its evaluation function, thus it controls for a parameter not often found in the GP literature. It also is a progressive function, allowing the user to set the difficulty of the problem attempted. We not only describe the function, but also report on results of using island parallelism for solving GP problems. The results obtained are somewhat surprising, as it appears that a single large population outperforms a group of smaller populations under all the conditions tested. 15.1 Introduction Given the multiplicity of GP programs that could produce the correct solution for a particular problem, it is difficult to judge the effectiveness of various architectural changes or parameter settings on the performance of a GP system. We encountered these problems directly in the design of our genetic programming tool lilgp. When lilgp was completed, we wanted to test how well it solved a set of standard GP problems. In fact, for a new GP system it is difficult to judge whether it is performing as intended or not, since the programs it generates are not necessarily identical to those generated by other GP systems. This raised two questions: what constitutes a "standard" problem in GP, and how do we rate the performance of a system on such a problem. One of the goals of this research was to create a benchmark problem to test how well a particular GP configuration would perform as compared to other configurations. Such benchmarks have existed for some time in the GA field, in particular the royal road problems of Holland [Jones 1994]. In creating the royal road, Holland addressed three issues. First, the royal road provides a proof-of-principle for the kind of difficult problems, exhibiting deception, that a genetic algorithm is capable of solving. Second, it serves as a benchmark of performance for tuning GA parameters. For example, at ICGA93, Holland claimed a specialized, properly tuned GA

Journal ArticleDOI
TL;DR: Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology, and the principal aspects studied are compile time and execution time for the various implementations that were benchmarked.
Abstract: Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations that were benchmarked. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation. With few exceptions, the compilers take a significant amount of time to compile this program, though most compilers were faster than the then current GNU C compiler (GCC version 2.5.8). Compilers that generate C or Lisp are often slower than those that generate native code directly: the cost of compiling the intermediate form is normally a large fraction of the total compilation time. There is no clear distinction between the runtime performance of eager and lazy implementations when appropriate annotations are used: lazy implementations have clearly come of age when it comes to implementing largely strict applications, such as the Pseudoknot program. The speed of C can be approached by some implementations, but to achieve this performance, special measures such as strictness annotations are required by non-strict implementations. The benchmark results have to be interpreted with care. Firstly, a benchmark based on a single program cannot cover a wide spectrum of 'typical' applications. Secondly, the compilers vary in the kind and level of optimisations offered, so the effort required to obtain an optimal version of the program is similarly varied.

Journal ArticleDOI
TL;DR: The proposed Genetic Algorithm for the Floorplan Area Optimization problem is based on suitable techniques for solution encoding and evaluation function definition, effective cross-over and mutation operators, and heuristic operators which further improve the method's effectiveness.
Abstract: The paper describes a Genetic Algorithm for the Floorplan Area Optimization problem. The algorithm is based on suitable techniques for solution encoding and evaluation function definition, effective cross-over and mutation operators, and heuristic operators which further improve the method's effectiveness. An adaptive approach automatically provides the optimal values for the activation probabilities of the operators. Experimental results show that the proposed method is competitive with the most effective ones as far as CPU time requirements and result accuracy are concerned, but it also presents some advantages. It requires a limited amount of memory, it is not sensitive to special structures which are critical for other methods, and has a complexity which grows linearly with the number of implementations. Finally, we demonstrate that the method is able to handle floorplans much larger (in terms of number of basic rectangles) than any benchmark previously considered in the literature.

Proceedings ArticleDOI
03 Jan 1996
TL;DR: This paper proposes a novel approach to obtain a lower bound of the maximum power consumption using Automatic Test Generation (ATG) technique and shows that this approach generates the lower bound with the quality which cannot be achieved using simulation-based techniques.
Abstract: Excessive instantaneous power consumption in VLSI circuits may reduce the reliability and performance of VLSI chips. Hence, to synthesize circuits with high reliability, it is essential to efficiently obtain a precise estimation of the maximum power dissipation. However, due to the inherent input-pattern dependence of the problem, it is intractable to conduct an exhaustive search for circuits with a large number of primary inputs. Hence, the practical approach is to generate a tight lower bound and an upper bound for maximum power dissipation within a reasonable amount of CPU time. In this paper, instead of using the traditional simulation-based techniques, we propose a novel approach to obtain a lower bound of the maximum power consumption using Automatic Test Generation (ATG) technique. Experiments with MCNC and ISCAS-85 benchmark circuits show that our approach generates the lower bound with the quality which cannot be achieved using simulation-based techniques. In addition, a Monte Carlo based technique to estimate maximum power dissipation is described. It not only serves as a comparison version for our ATG approach, but also generates a metric to measure the quality of a lower bound from a statistical point of view.
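The Monte Carlo baseline mentioned at the end is easy to sketch (illustrative only: a toy netlist and zero-delay toggle counting, not the paper's power model): apply random vector pairs and keep the largest number of gate-output toggles seen as a lower bound on peak switching activity.

```python
import random

def evaluate(netlist, inputs):
    """Zero-delay evaluation of a topologically ordered AND/OR netlist."""
    vals = dict(inputs)
    for gate, (op, a, b) in netlist.items():
        vals[gate] = (vals[a] & vals[b]) if op == "AND" else (vals[a] | vals[b])
    return vals

netlist = {"g1": ("AND", "a", "b"), "g2": ("OR", "g1", "c")}   # toy circuit
random.seed(1)
best = 0
for _ in range(1000):
    v1 = {x: random.randint(0, 1) for x in "abc"}
    v2 = {x: random.randint(0, 1) for x in "abc"}
    o1, o2 = evaluate(netlist, v1), evaluate(netlist, v2)
    best = max(best, sum(o1[g] != o2[g] for g in netlist))     # gate-output toggles
print("max toggles observed (lower bound):", best)
```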

Proceedings ArticleDOI
15 Feb 1996
TL;DR: This paper presents a method for generating large random circuits with a fixed number of inputs, outputs, blocks, pins per cell, and approximate Rent exponent, and finds that routability is best predicted by estimating the total wirelength in the circuit, not the mean wirelength times pins per cell.
Abstract: FPLD architectures are often designed based on the results of experiments with "typical" benchmark circuits. For very large FPLDs, it may be difficult to obtain enough benchmark circuits to accurately evaluate an architecture. In this paper, we present a method for generating large random circuits with a fixed number of inputs, outputs, blocks, pins per cell, and approximate Rent exponent. The circuits generated are used to evaluate several routability measures. We find that routability is best predicted by estimating the total wirelength in the circuit, not the mean wirelength times pins per cell.

Journal ArticleDOI
TL;DR: A near-optimum parallel algorithm for solving the facility layout problem, which is NP-complete, is presented in this paper; the algorithm has given improved solutions for several benchmark problems over the best existing algorithms.

Proceedings ArticleDOI
01 Jun 1996
TL;DR: It is claimed that tree based algorithms, like the one described in this paper, should be the technique of choice for basic blocks code generation with heterogeneous memory register architectures.
Abstract: In this paper we address the problem of code generation for basic blocks in heterogeneous memory-register DSP processors. We propose a new technique, based on register-transfer paths, that can be used for efficiently dismantling basic block DAGs (Directed Acyclic Graphs) into expression trees. This approach builds on recent results which report an optimal code generation algorithm for expression trees for these architectures. This technique has been implemented and experimentally validated for the TMS320C25, a popular fixed point DSP processor. The results show that good code quality can be obtained using the proposed technique. An analysis of the type of DAGs found in the DSPstone benchmark programs reveals that the majority of basic blocks in this benchmark set are expression trees and leaf DAGs. This leads to our claim that tree based algorithms, like the one described in this paper, should be the technique of choice for basic blocks code generation with heterogeneous memory register architectures.
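Dismantling a DAG into expression trees can be pictured with the toy sketch below: cut every interior node that has more than one use, and let each cut point root its own tree. This shows only the general idea; the paper's actual criterion is based on register-transfer paths, and the DAG here is invented.

```python
from collections import defaultdict

dag = {                      # node -> operand nodes (leaves have no operands)
    "t1": ["a", "b"], "t2": ["t1", "c"], "t3": ["t1", "d"],   # t1 has two uses
    "a": [], "b": [], "c": [], "d": [],
}
uses = defaultdict(int)
for ops in dag.values():
    for o in ops:
        uses[o] += 1

shared = {n for n, ops in dag.items() if ops and uses[n] > 1}   # interior nodes to cut
roots = shared | {n for n in dag if dag[n] and uses[n] == 0}    # roots of the resulting trees

def tree(n, at_root=True):
    if not at_root and n in shared:
        return n                          # reference to the value computed by another tree
    return (n, [tree(o, False) for o in dag[n]]) if dag[n] else n

for r in sorted(roots):
    print(r, "->", tree(r))               # t1, t2, t3 each become an expression tree
```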

Journal ArticleDOI
TL;DR: In this paper, the generalized integral transform technique (GITT) is employed to handle the steady two-dimensional incompressible Navier-Stokes equations in stream function-only formulation.

Journal ArticleDOI
TL;DR: This paper evaluates the IBM SP2 architecture, the AIX parallel programming environment, and the IBM message-passing library through STAP (Space-Time Adaptive Processing) benchmark experiments, and conducts a scalability analysis to reveal the performance growth rate as a function of machine size and STAP problem size.
Abstract: This paper evaluates the IBM SP2 architecture, the AIX parallel programming environment, and the IBM message-passing library (MPL) through STAP (Space-Time Adaptive Processing) benchmark experiments. Only coarse-grain parallelism was exploited on the SP2 due to its high communication overhead. A new parallelization scheme is developed for programming message passing multicomputers. Parallel STAP benchmark structures are illustrated with domain decomposition, efficient mapping of partitioned programs, and optimization of collective communication operations. We measure the SP2 performance in terms of execution time, Gflop/s rate, speedup over a single SP2 node, and overall system utilization. With 256 nodes, the Maui SP2 demonstrated the best performance of 23 Gflop/s in executing the High-Order Post-Doppler program, corresponding to a 34% system utilization. We have conducted a scalability analysis to reveal the performance growth rate as a function of machine size and STAP problem size. Important lessons learned from these parallel processing benchmark experiments are discussed in the context of real-time, adaptive, radar signal processing on massively parallel processors (MPP).
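The quoted 34% utilization is consistent with a simple back-of-the-envelope check, assuming each SP2 node peaks at roughly 266 Mflop/s (a POWER2 figure that is our assumption, not stated in the abstract):

```python
# Utilization = achieved rate / (number of nodes * per-node peak); peak per node is assumed.
nodes, achieved_gflops, per_node_peak_gflops = 256, 23.0, 0.266
utilization = achieved_gflops / (nodes * per_node_peak_gflops)
print(f"{utilization:.0%}")   # ~34%
```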


Proceedings ArticleDOI
12 Jun 1996
TL;DR: This paper discusses the applicability of the wireless MAC protocols and validates their usefulness for real-time communications based on the parameters in the benchmark table.
Abstract: This paper presents the performance analysis of the wireless medium access control (WMAC) protocol and the remote frame medium access control (RFMAC) protocol for a wireless control area network (WCAN). These two MAC protocols are suggested for distributed and centralised wireless communications respectively, as part of the complete WCAN project. The performances of the protocols are evaluated by simulating the models of the protocols using a set of signals called the "SAE benchmark". The benchmark table provides an example to illustrate the application of a control area network (CAN) system. This paper discusses the applicability of the wireless MAC protocols and validates their usefulness for real-time communications based on the parameters in the benchmark table.

01 Jan 1996
TL;DR: The benchmark results based on a variety of DSP algorithms in video processing, digital communication, digital filtering, and speech recognition confirm the performance, efficiency and generality of the PADDI-2 architecture.
Abstract: A data-driven multiprocessor architecture called PADDI-2 specially designed for rapid prototyping of high throughput digital signal processing algorithms is presented. Characteristics of typical high speed DSP systems were examined and the efficiencies and deficiencies of existing traditional architectures were studied to establish the architectural requirements, and to guide the architectural design. The proposed PADDI-2 architecture is a highly scalable and modular, multiple-instruction stream multiple-data stream (MIMD) architecture. It consists of a large number of fine-grain processing elements called nanoprocessors interconnected by a flexible and high-bandwidth communication network. The basic idea is that a data flow graph representing a DSP algorithm is directly mapped onto a network of nanoprocessors. The algorithm is executed by the nanoprocessors executing the operations associated with the assigned data flow nodes in a data-driven manner. High computation power is achieved by using multiple nanoprocessors to exploit the large amount of fine-grain parallelism inherent in the target algorithms. Programming flexibility is provided by the MIMD control strategy and the flexible interconnection network which can be reconfigured to handle a wide range of DSP algorithms, including those with heterogeneous communication patterns. As a proof of concept, a single-chip multiprocessor integrated circuit containing 48 16-bit nanoprocessors was designed and fabricated in a 2-metal 1-µm CMOS technology. A 2-level, area-efficient communication network occupying only 17% of the core area provides flexible and high-bandwidth inter-processor communications. Running at 50 MHz, the chip achieves 2.4 GOPS peak performance and 800 MBytes per second I/O bandwidth. An integrated development system including an assembler, a VHDL-based system simulator, and a demonstration board has been developed for PADDI-2 program development and demonstration. The benchmark results based on a variety of DSP algorithms in video processing, digital communication, digital filtering, and speech recognition confirm the performance, efficiency and generality of the architecture. Moreover, when compared with several competitive architectures that target the same application domain, PADDI-2 is in general about 2 to 3 times better in terms of hardware efficiency.
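The data-driven execution model described here can be illustrated conceptually (this sketch has nothing to do with PADDI-2's actual hardware or tools): each node of a dataflow graph fires as soon as all of its input tokens are available.

```python
graph = {                       # node -> (function, input nodes); "x" and "y" are external inputs
    "add": (lambda a, b: a + b, ["x", "y"]),
    "mul": (lambda a, b: a * b, ["add", "y"]),
}
tokens = {"x": 3, "y": 4}       # initial input tokens
pending = dict(graph)
while pending:
    fired = False
    for name, (fn, ins) in list(pending.items()):
        if all(i in tokens for i in ins):           # all operands have arrived -> fire
            tokens[name] = fn(*(tokens[i] for i in ins))
            del pending[name]
            fired = True
    if not fired:
        raise RuntimeError("no node can fire")
print(tokens["mul"])                                # (3 + 4) * 4 = 28
```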

Journal ArticleDOI
TL;DR: A way to obtain the entire cost versus delay tradeoff curve of a combinational logic circuit in an efficient way is described, and every point on the resulting curve is the global optimum of the corresponding gate sizing problem.
Abstract: The gate sizing problem is the problem of finding load drive capabilities for all gates in a given Boolean network such that a given delay limit is kept, and the necessary cost in terms of active area usage and/or power consumption is minimal. This paper describes a way to obtain the entire cost versus delay tradeoff curve of a combinational logic circuit in an efficient way. Every point on the resulting curve is the global optimum of the corresponding gate sizing problem. The problem is solved by mapping it onto piecewise linear models in such a way that a piecewise linear (circuit) simulator can do the job. It is shown that this setup is very efficient, and can produce tradeoff curves for large circuits (thousands of gates) in a few minutes. Benchmark results for the entire set of MCNC '91 two-level examples are given.
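What such a cost-versus-delay curve looks like can be shown with a brute-force sketch on a toy gate chain (illustrative only: discrete sizes, an invented per-stage delay model, and exhaustive enumeration instead of the paper's piecewise-linear formulation):

```python
import itertools

sizes = [1, 2, 4]                                    # hypothetical drive strengths
def delay(chain):                                    # invented per-stage delay model
    return sum(1.0 / s + 0.2 * s for s in chain)
def area(chain):
    return sum(chain)

points = sorted((delay(c), area(c)) for c in itertools.product(sizes, repeat=3))
pareto, best_area = [], float("inf")
for d, a in points:                                  # keep the lower envelope (Pareto points)
    if a < best_area:
        pareto.append((round(d, 3), a))
        best_area = a
print(pareto)    # each point: the minimum area achievable under that delay limit
```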

Proceedings ArticleDOI
10 Nov 1996
TL;DR: Two hierarchical strategies for avoiding local optima during iterative improvement are proposed: (1) Partial Clustering, and (2) Module Restructuring, which are successful in reducing both area and wire length in addition to reducing the computational time required for optimization.
Abstract: In this paper, we propose a hybrid floorplanning methodology. Two hierarchical strategies for avoiding local optima during iterative improvement are proposed: (1) Partial Clustering, and (2) Module Restructuring. These strategies work for localizing nets connecting small modules in small regions, and conceal such small modules and their nets during the iterative improvement phase. This method is successful in reducing both area and wire length in addition to reducing the computational time required for optimization. Although our method only searches slicing floorplans, the results are superior to the results obtained even with non-slicing floorplans. We applied our method to the largest MCNC floorplan benchmark example, ami49, and industrial data. For the ami49 benchmark, we obtained results superior to any published results for both estimated area and routing results.