
Showing papers on "Benchmark (computing)" published in 1998


Proceedings ArticleDOI
10 Aug 1998
TL;DR: This paper presents a foundation for the simulation and analysis of DVS algorithms applied to a benchmark suite specifically targeted for PDA devices.
Abstract: The reduction of energy consumption in microprocessors can be accomplished without impacting the peak performance through the use of dynamic voltage scaling (DVS). This approach varies the processor voltage under software control to meet dynamically varying performance requirements. This paper presents a foundation for the simulation and analysis of DVS algorithms. These algorithms are applied to a benchmark suite specifically targeted for PDA devices.

593 citations
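
The interval-based policies such a simulation framework evaluates are simple to state in code. Below is a minimal, hypothetical voltage-scaling policy (my sketch, not one of the paper's algorithms): each interval it measures utilization and rescales the clock so the observed work would just fill the next interval.

# Minimal sketch of an interval-based DVS policy (hypothetical; not one of
# the paper's algorithms). Supply voltage is assumed to track frequency.
F_MIN, F_MAX = 30e6, 200e6          # assumed frequency range in Hz

def next_frequency(busy_time, interval, f_current):
    """Rescale the clock so last interval's work would just fill an interval."""
    utilization = busy_time / interval      # fraction of the interval spent busy
    f_target = f_current * utilization      # stretch the work to fill the time
    return min(F_MAX, max(F_MIN, f_target))

# Example: 40% utilization at 200 MHz suggests 80 MHz for the next interval.
print(next_frequency(busy_time=0.004, interval=0.010, f_current=200e6))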


Proceedings ArticleDOI
01 Nov 1998
TL;DR: In this paper, two new algorithms, redundant vector elimination (RVE) and essential fault reduction (EFR), were proposed for generating compact test sets for combinational circuits under the single stuck-at fault model.
Abstract: This paper presents two new algorithms, Redundant Vector Elimination (RVE) and Essential Fault Reduction (EFR), for generating compact test sets for combinational circuits under the single stuck-at fault model, and a new heuristic for estimating the minimum single stuck-at fault test set size. These algorithms, together with the dynamic compaction algorithm, are incorporated into an advanced ATPG system for combinational circuits, called MinTest. MinTest found better lower bounds and generated smaller test sets than the previously published results for the ISCAS85 and the full-scan versions of the ISCAS89 benchmark circuits.

451 citations
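
The paper's Redundant Vector Elimination idea (drop a test vector when every fault it detects is also detected by the remaining vectors) can be sketched as a simple set computation. The fault-coverage data below is illustrative, not from MinTest.

# Sketch of redundant vector elimination (illustrative; not the MinTest code).
# A vector is redundant if every fault it detects is detected by some other
# vector still in the test set; such vectors can be removed one at a time.
def eliminate_redundant(coverage):
    """coverage maps vector name -> set of detected faults."""
    kept = dict(coverage)
    changed = True
    while changed:
        changed = False
        for v in sorted(kept, key=lambda v: len(kept[v])):   # try weakest first
            others = set().union(*(kept[u] for u in kept if u != v)) if len(kept) > 1 else set()
            if kept[v] <= others:
                del kept[v]          # every fault of v is covered elsewhere
                changed = True
                break
    return kept

tests = {"t1": {"f1", "f2"}, "t2": {"f2", "f3"}, "t3": {"f1", "f3"}}
print(sorted(eliminate_redundant(tests)))   # t1 is redundant: -> ['t2', 't3']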


Proceedings ArticleDOI
Charles J. Alpert1
01 Apr 1998
TL;DR: The ISPD98 benchmark suite is introduced which consists of 18 circuits with sizes ranging from 13,000 to 210,000 modules and Experimental results for three existing partitioners are presented so that future researchers in partitioning can more easily evaluate their heuristics.
Abstract: From 1985 to 1993, the MCNC regularly introduced and maintained circuit benchmarks for use by the Design Automation community. However, during the last five years, no new circuits have been introduced that can be used for developing fundamental physical design applications, such as partitioning and placement. The largest circuit in the existing set of benchmark suites has over 100,000 modules, but the second largest has just over 25,000 modules, which is small by today's standards. This paper introduces the ISPD98 benchmark suite, which consists of 18 circuits with sizes ranging from 13,000 to 210,000 modules. Experimental results for three existing partitioners are presented so that future researchers in partitioning can more easily evaluate their heuristics.

318 citations


Journal ArticleDOI
TL;DR: Modularity of the method is intended to fit the human organization and map well on the computing technology of concurrent processing.
Abstract: BLISS is a method for optimization of engineering systems by decomposition. It separates the system level optimization, having a relatively small number of design variables, from the potentially numerous subsystem optimizations that may each have a large number of local design variables. The subsystem optimizations are autonomous and may be conducted concurrently. Subsystem and system optimizations alternate, linked by sensitivity data, producing a design improvement in each iteration. Starting from a best guess initial design, the method improves that design in iterative cycles, each cycle comprised of two steps. In step one, the system level variables are frozen and the improvement is achieved by separate, concurrent, and autonomous optimizations in the local variable subdomains. In step two, further improvement is sought in the space of the system level variables. Optimum sensitivity data link the second step to the first. The method prototype was implemented using MATLAB and iSIGHT programming software and tested on a simplified, conceptual level supersonic business jet design, and a detailed design of an electronic device. Satisfactory convergence and favorable agreement with the benchmark results were observed. Modularity of the method is intended to fit the human organization and map well on the computing technology of concurrent processing.

263 citations
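
A toy sketch of the two-step cycle, using scipy for the one-dimensional minimizations, is shown below. The objective is made up for illustration, and the real method couples the two steps through optimum sensitivity derivatives rather than simple re-optimization; only the alternating control flow is faithful.

# Toy sketch of a BLISS-style two-step cycle (illustrative objective only).
from scipy.optimize import minimize_scalar

def local_opt(z):
    """Step 1: optimize one subsystem's local variable x with z frozen."""
    res = minimize_scalar(lambda x: (x - z) ** 2 + 0.1 * x ** 2)
    return res.x

z = 5.0                                    # system-level design variable
for cycle in range(5):
    x1 = local_opt(z)                      # subsystem optimizations are
    x2 = local_opt(-z)                     # autonomous; could run concurrently
    # Step 2: improve the system variable with the local optima frozen.
    res = minimize_scalar(lambda s: (x1 - s) ** 2 + (x2 + s) ** 2 + 0.01 * s ** 2)
    z = res.x
    print(f"cycle {cycle}: z = {z:.4f}")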


Proceedings ArticleDOI
19 Aug 1998
TL;DR: Modularity of the method is intended to fit the human organization and map well on the computing technology of concurrent processing.
Abstract: BLISS is a method for optimization of engineering systems by decomposition. It separates the system level optimization, having a relatively small number of design variables, from the potentially numerous subsystem optimizations that may each have a large number of local design variables. The subsystem optimizations are autonomous and may be conducted concurrently. Subsystem and system optimizations alternate, linked by sensitivity data, producing a design improvement in each iteration. Starting from a best guess initial design, the method improves that design in iterative cycles, each cycle comprised of two steps. In step one, the system level variables are frozen and the improvement is achieved by separate, concurrent, and autonomous optimizations in the local variable subdomains. In step two, further improvement is sought in the space of the system level variables. Optimum sensitivity data link the second step to the first. The method prototype was implemented using MATLAB and iSIGHT programming software and tested on a simplified, conceptual level supersonic business jet design, and a detailed design of an electronic device. Satisfactory convergence and favorable agreement with the benchmark results were observed. Modularity of the method is intended to fit the human organization and map well on the computing technology of concurrent processing.

241 citations


Journal ArticleDOI
TL;DR: In this paper, a benchmark problem for tracking maneuvering targets is presented, where the best tracking algorithm is the one that minimizes a weighted average of the radar energy and radar time, while satisfying a constraint of 4% on the maximum number of lost tracks.
Abstract: A benchmark problem for tracking maneuvering targets is presented. The benchmark problem involves beam pointing control of a phased array (i.e., agile beam) radar against highly maneuvering targets in the presence of false alarms (FAs) and electronic countermeasures (ECM). The testbed simulation described includes the effects of target amplitude fluctuations, beamshape, missed detections, FAs, finite resolution, target maneuvers, and track loss. Multiple waveforms are included in the benchmark so that the radar energy can be coordinated with the tracking algorithm. The ECM includes a standoff jammer (SOJ) broadcasting wideband noise and targets attempting range gate pull off (RGPO). The limits on the position and maneuverability of the targets are given along with descriptions of six target trajectories. The "best" tracking algorithm is the one that minimizes a weighted average of the radar energy and radar time, while satisfying a constraint of 4% on the maximum number of lost tracks. The radar model, the ECM techniques, the target scenarios, and performance criteria for the benchmark are presented.

200 citations
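
The paper's figure of merit, a weighted average of radar energy and radar time subject to a track-loss ceiling, is easy to state as a scoring function. The weights and field names below are placeholders, not the benchmark's actual values.

# Sketch of the benchmark's figure of merit (weights are placeholders).
def score(runs, w_energy=0.5, w_time=0.5, max_lost_fraction=0.04):
    """runs: list of dicts with 'energy', 'radar_time', 'lost' per trial."""
    lost_fraction = sum(r["lost"] for r in runs) / len(runs)
    if lost_fraction > max_lost_fraction:
        return float("inf")                 # constraint violated: disqualified
    avg_energy = sum(r["energy"] for r in runs) / len(runs)
    avg_time = sum(r["radar_time"] for r in runs) / len(runs)
    return w_energy * avg_energy + w_time * avg_time    # lower is better

runs = [{"energy": 1.2, "radar_time": 0.8, "lost": False} for _ in range(100)]
print(score(runs))   # -> 1.0 with the placeholder weights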


Journal ArticleDOI
TL;DR: In this article, the authors present the overview and problem definition for a benchmark structural control problem, which is a scale model of a three-storey building employing an active mass driver.
Abstract: This paper presents the overview and problem definition for a benchmark structural control problem. The structure considered—chosen because of the widespread interest in this class of systems—is a scale model of a three-storey building employing an active mass driver. A model for this structural system, including the actuator and sensors, has been developed directly from experimentally obtained data and will form the basis for the benchmark study. Control constraints and evaluation criteria are presented for the design problem. A simulation program has been developed and made available to facilitate comparison of the efficiency and merit of various control strategies. A sample control design is given to illustrate some of the design challenges. © 1998 John Wiley & Sons, Ltd.

196 citations


Book ChapterDOI
30 Mar 1998
TL;DR: It is argued that the focus should be on on-line open systems, and proposed that a standard workload should be used as a benchmark for schedulers, which will specify distributions of parallelism and runtime, as found by analyzing accounting traces.
Abstract: The evaluation of parallel job schedulers hinges on two things: the use of appropriate metrics, and the use of appropriate workloads on which the scheduler can operate. We argue that the focus should be on on-line open systems, and propose that a standard workload should be used as a benchmark for schedulers. This benchmark will specify distributions of parallelism and runtime, as found by analyzing accounting traces, and also internal structures that create different speedup and synchronization characteristics. As for metrics, we present some problems with slowdown and bounded slowdown that have been proposed recently.

191 citations
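
For reference, the slowdown metrics the paper critiques are commonly defined as follows (standard formulations; the threshold \tau is a chosen constant, e.g. 10 seconds, which is my example value, not the paper's):

\[
\mathrm{slowdown} = \frac{t_{\mathrm{wait}} + t_{\mathrm{run}}}{t_{\mathrm{run}}},
\qquad
\mathrm{bounded\ slowdown} = \max\left(1,\ \frac{t_{\mathrm{wait}} + t_{\mathrm{run}}}{\max(t_{\mathrm{run}},\ \tau)}\right)
\]

Bounded slowdown exists precisely because the raw slowdown of very short jobs explodes under even modest waits; the paper presents cases where both variants remain problematic.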


Journal ArticleDOI
TL;DR: This paper proposes a new multilevel partitioning algorithm that exploits some of the latest innovations of classical iterative partitioning approaches and presents quadrisection results which compare favorably to the partitionings obtained by the GORDIAN cell placement tool.
Abstract: Many previous works in partitioning have used some underlying clustering algorithm to improve performance. As problem sizes reach new levels of complexity, a single application of a clustering algorithm is insufficient to produce excellent solutions. Recent work has illustrated the promise of multilevel approaches. A multilevel partitioning algorithm recursively clusters the instance until its size is smaller than a given threshold, then unclusters the instance, while applying a partitioning refinement algorithm. In this paper, we propose a new multilevel partitioning algorithm that exploits some of the latest innovations of classical iterative partitioning approaches. Our method also uses a new technique to control the number of levels in our matching-based clustering algorithm. Experimental results show that our heuristic outperforms numerous existing bipartitioning heuristics with improvements ranging from 6.9 to 27.9% for 100 runs and 3.0 to 20.6% for just ten runs (while also using less CPU time). Further, our algorithm generates solutions better than the best known mincut bipartitionings for seven of the ACM/SIGDA benchmark circuits, including golem3 (which has over 100,000 cells). We also present quadrisection results which compare favorably to the partitionings obtained by the GORDIAN cell placement tool. Our work in multilevel quadrisection has been used as the basis for an effective cell placement package.

171 citations
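
The multilevel scheme the paper builds on has a compact skeleton: cluster (coarsen) the instance until it is small, partition the coarsest instance, then uncluster while refining. The sketch below is a generic illustration with heavy-edge matching and a trivial refinement pass that ignores balance; it is not the authors' algorithm.

# Generic multilevel bipartitioning skeleton (illustrative; not the paper's
# algorithm). A graph is a dict: node -> {neighbor: edge weight}.
def coarsen(graph):
    """Heavy-edge matching: merge each node with its heaviest unmatched neighbor."""
    matched, mapping = set(), {}
    for u in graph:
        if u in matched:
            continue
        nbrs = [v for v in graph[u] if v not in matched and v != u]
        v = max(nbrs, key=lambda n: graph[u][n]) if nbrs else u
        matched |= {u, v}
        mapping[u] = mapping[v] = u               # cluster is named after u
    coarse = {mapping[u]: {} for u in graph}
    for u, nb in graph.items():
        for v, w in nb.items():
            cu, cv = mapping[u], mapping[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, mapping

def bipartition(graph, threshold=4):
    if len(graph) > threshold:
        coarse, mapping = coarsen(graph)
        if len(coarse) < len(graph):              # recurse while coarsening helps
            coarse_side = bipartition(coarse, threshold)
            side = {u: coarse_side[mapping[u]] for u in graph}   # project back
            for u in graph:                       # toy refinement: flip a node
                gain = sum(w if side[v] != side[u] else -w       # if it reduces
                           for v, w in graph[u].items())         # the cut
                if gain > 0:
                    side[u] = 1 - side[u]
            return side
    nodes = sorted(graph)                         # base case: alternate sides
    return {u: i % 2 for i, u in enumerate(nodes)}

g = {0: {1: 3, 2: 1}, 1: {0: 3, 3: 1}, 2: {0: 1, 3: 3}, 3: {1: 1, 2: 3}}
print(bipartition(g, threshold=2))   # clusters {0,1} vs {2,3}: cut weight 2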


Proceedings ArticleDOI
10 Aug 1998
TL;DR: The main features of the proposed cell include a rich local-interconnect network, which drastically reduces the energy dissipated in the wiring, and a dual-voltage scheme that allows pass-transistor networks to operate at low-voltages yet maintains decent performance.
Abstract: This paper introduces an energy-efficient FPGA module, intended for embedded implementations. The main features of the proposed cell include a rich local-interconnect network, which drastically reduces the energy dissipated in the wiring, and a dual-voltage scheme that allows pass-transistor networks to operate at low-voltages yet maintains decent performance. Simulations on a benchmark set demonstrate that the proposed module succeeds in its goal of reducing energy consumption by an order of magnitude over existing implementations.

151 citations


Proceedings ArticleDOI
16 Apr 1998
TL;DR: This paper examines the performance of desktop applications running on the Microsoft Windows NT operating system on Intel x86 processors, and contrasts these applications to the programs in the integer SPEC95 benchmark suite, and shows that the desktop applications have similar characteristics to theinteger SPEC95 benchmarks for many of these metrics.
Abstract: This paper examines the performance of desktop applications running on the Microsoft Windows NT operating system on Intel x86 processors, and contrasts these applications to the programs in the integer SPEC95 benchmark suite. We present measurements of basic instruction set and program characteristics, and detailed simulation results of the way these programs use the memory system and processor branch architecture. We show that the desktop applications have similar characteristics to the integer SPEC95 benchmarks for many of these metrics. However, compared to the integer SPEC95 applications, desktop applications have larger instruction working sets, execute instructions in a greater number of unique functions, cross DLL boundaries frequently, and execute a greater number of indirect calls.

Journal ArticleDOI
TL;DR: In this paper, the design and implementation of robust controllers for the focus servo of a compact disk and the tracking servos of a hard disk mechanism are investigated, with emphasis on track-following performance in the presence of disk disturbances.

Journal ArticleDOI
TL;DR: The minimum-relative-entropy algorithm is a special case of a general class of algorithms for calibrating models based on stochastic control and convex optimization, and it has a unique solution which is stable, i.e. it depends smoothly on the input prices.
Abstract: We present an algorithm for calibrating asset-pricing models to the prices of benchmark securities. The algorithm computes the probability that minimizes the relative entropy with respect to a prior distribution and satisfies a finite number of moment constraints. These constraints arise from fitting the model to the prices of benchmark instruments. The sensitivities of the values of contingent claims with respect to variations in the prices of the benchmark instruments are studied in detail. We find that the sensitivities can be interpreted as regression coefficients of the payoffs of contingent claims on the set of payoffs of the benchmark instruments, in the risk-neutral measure. We show that the algorithm has a unique solution which is stable, i.e. it depends smoothly on the input prices. We also show that the minimum-relative-entropy algorithm is a special case of a general class of algorithms for calibrating models based on stochastic control and convex optimization. As an illustration, we use minimum-relative-entropy to construct a smooth curve of instantaneous forward rates from US LIBOR swap/FRA data and to study the corresponding sensitivities of fixed-income securities to variations in input prices.
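
In a standard discrete formulation of the paper's core optimization (notation mine, not the paper's), the calibrated measure solves

\[
\min_{p}\ \sum_i p_i \ln\frac{p_i}{q_i}
\quad\text{s.t.}\quad
\sum_i p_i\, g_j(\omega_i) = c_j,\ \ j = 1,\dots,m,
\qquad \sum_i p_i = 1,
\]

and Lagrangian duality gives the exponential-family form

\[
p_i = \frac{q_i \exp\bigl(\sum_j \lambda_j g_j(\omega_i)\bigr)}
           {\sum_k q_k \exp\bigl(\sum_j \lambda_j g_j(\omega_k)\bigr)},
\]

with the multipliers \lambda_j chosen so that the m benchmark prices c_j are matched. The regression-coefficient interpretation of the sensitivities follows from differentiating this solution with respect to the c_j.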

Journal ArticleDOI
TL;DR: Augmenting the estimation technique to a conventional systolic-architecture-based VLSI motion estimation reduces the power consumption by a factor of 2, while still preserving the optimal solution and the throughput.
Abstract: This paper presents an architectural enhancement to reduce the power consumption of the full-search block-matching (FSBM) motion estimation. Our approach is based on eliminating unnecessary computation using conservative approximation. Augmenting the estimation technique to a conventional systolic-architecture-based VLSI motion estimation reduces the power consumption by a factor of 2, while still preserving the optimal solution and the throughput. A register-transfer level implementation as well as simulation results on benchmark video clips are presented.
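
The idea of eliminating unnecessary computation by conservative approximation has a well-known software analogue in block matching: stop accumulating a candidate's sum of absolute differences (SAD) once it already exceeds the best match found so far. The sketch below shows that analogue; the paper's contribution is a systolic hardware scheme, not this code.

# Software analogue of cutting unnecessary SAD work in full-search block
# matching: abandon a candidate once its partial SAD exceeds the current best.
def sad_with_early_exit(block, candidate, best_so_far):
    """Row-by-row SAD; returns None if the candidate cannot beat best_so_far."""
    total = 0
    for row_b, row_c in zip(block, candidate):
        total += sum(abs(a - b) for a, b in zip(row_b, row_c))
        if total >= best_so_far:      # conservative: partial SAD only grows
            return None
    return total

def full_search(block, candidates):
    best, best_idx = float("inf"), None
    for i, cand in enumerate(candidates):
        s = sad_with_early_exit(block, cand, best)
        if s is not None:
            best, best_idx = s, i
    return best_idx, best

block = [[10, 10], [10, 10]]
cands = [[[0, 0], [0, 0]], [[10, 9], [10, 10]], [[10, 10], [10, 10]]]
print(full_search(block, cands))      # -> (2, 0): exact match found last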

Journal ArticleDOI
TL;DR: In this article, the exact analytical truss solutions for some "benchmark" problems, which are often used as test examples in both discretized layout optimization of trusses and variable topology shape optimization of perforated plates under plane stress, are provided.
Abstract: The aim of this paper is to provide the exact analytical truss solutions for some “benchmark” problems, which are often used as test examples in both discretized layout optimization of trusses and variable topology (or generalized) shape optimization of perforated plates under plane stress.

Proceedings ArticleDOI
16 Apr 1998
TL;DR: This paper presents Selective Eager Execution (SEE), an execution model to overcome mis-speculation penalties by executing both paths after diffident branches, and presents the micro-architecture of the PolyPath processor, which is an extension of an aggressive superscalar, out-of-order architecture.
Abstract: Control-flow misprediction penalties are a major impediment to high performance in wide-issue superscalar processors. In this paper we present Selective Eager Execution (SEE), an execution model to overcome mis-speculation penalties by executing both paths after diffident branches. We present the micro-architecture of the PolyPath processor, which is an extension of an aggressive superscalar, out-of-order architecture. The PolyPath architecture uses a novel instruction tagging and register renaming mechanism to execute instructions from multiple paths simultaneously in the same processor pipeline, while retaining maximum resource availability for single-path code sequences. Results of our execution-driven, pipeline-level simulations show that SEE can improve performance by as much as 36% for the go benchmark, and an average of 14% on SPECint95, when compared to a normal superscalar, out-of-order, speculative execution, monopath processor. Moreover, our architectural model is both elegant and practical to implement, using a small amount of additional state and control logic.

Proceedings ArticleDOI
01 Dec 1998
TL;DR: MPI-SIM as mentioned in this paper is a library for the execution-driven parallel simulation of MPI programs, which can be used to predict the performance of existing MPI applications as a function of architectural characteristics, including number of processors and message communication latencies.
Abstract: This paper describes the design and implementation of MPI-SIM, a library for the execution-driven parallel simulation of MPI programs. MPI-LITE, a portable library that supports multithreaded MPI, is also described. MPI-SIM, built on top of MPI-LITE, can be used to predict the performance of existing MPI programs as a function of architectural characteristics, including number of processors and message communication latencies. The simulation models can be executed sequentially or in parallel. Parallel executions of MPI-SIM models are synchronized using a set of asynchronous conservative protocols. MPI-SIM reduces synchronization overheads by exploiting the communication characteristics of the program it simulates. This paper presents validation and performance results from the use of MPI-SIM to simulate applications from the NAS Parallel Benchmark suite. Using the techniques described here, we are able to reduce the number of synchronizations in the parallel simulation as compared with the synchronous quantum protocol and are able to achieve speedups ranging from 3.2 to 11.9 in going from sequential to parallel simulation using 16 processors on the IBM SP2.
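
The basic trick of execution-driven performance prediction, advancing a virtual clock per simulated process and charging modeled latencies for communication instead of real network time, can be illustrated with a tiny sequential simulator. Everything below (the program model, the latency number) is a made-up example, not MPI-SIM's implementation.

# Toy execution-driven prediction of an MPI-style program (illustrative only).
# Each process has a virtual clock; messages charge a modeled latency.
LATENCY = 50e-6       # assumed per-message latency in seconds

def simulate_ring(num_procs, compute_time, rounds):
    """Predict completion time of alternating compute and ring-shift rounds."""
    clock = [0.0] * num_procs
    for _ in range(rounds):
        clock = [t + compute_time for t in clock]          # local computation
        # Each process receives from its left neighbor; the receive completes
        # no earlier than the sender's send time plus the modeled latency.
        sends = [t + LATENCY for t in clock]
        clock = [max(clock[i], sends[(i - 1) % num_procs]) for i in range(num_procs)]
    return max(clock)

# With uniform compute times the prediction is rounds * (compute + latency),
# independent of the process count, which the model makes easy to see.
for p in (4, 16, 64):
    print(p, simulate_ring(p, compute_time=1e-3, rounds=10))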

Journal ArticleDOI
TL;DR: Spencer et al. as mentioned in this paper defined a benchmark structural control problem for a model building configured with an active mass driver (AMD), based on a high-fidelity analytical model of a three-storey tendon-controlled structure at the National Center for Earthquake Engineering Research (NCEER).
Abstract: In a companion paper (Spencer et al.), an overview and problem definition was presented for a well-defined benchmark structural control problem for a model building configured with an Active Mass Driver (AMD). A second benchmark problem is posed here based on a high-fidelity analytical model of a three-storey, tendon-controlled structure at the National Center for Earthquake Engineering Research (NCEER). The purpose of formulating this problem is to provide another setting in which to evaluate the relative effectiveness and implementability of various structural control algorithms. To achieve a high level of realism, an evaluation model is presented in the problem definition which is derived directly from experimental data obtained for the structure. This model accurately represents the behaviour of the laboratory structure and fully incorporates actuator/sensor dynamics. As in the companion paper, the evaluation model will be considered as the real structural system. In general, controllers that are successfully implemented on the evaluation model can be expected to perform similarly in the laboratory setting. Several evaluation criteria are given, along with the associated control design constraints. © 1998 John Wiley & Sons, Ltd.

01 Jan 1998
Abstract: SKaMPI is a benchmark for MPI implementations. Its purpose is the detailed analysis of the runtime of individual MPI operations and comparison of these for different implementations of MPI. SKaMPI can be configured and tuned in many ways: operations, measurement precision, communication modes, packet sizes, number of processors used etc. The technically most interesting feature of SKaMPI are measurement mechanisms which combine accuracy, efficiency and robustness. Postprocessors support graphical presentation and comparisons of different sets of results which are collected in a public web-site. We describe the SKaMPI design and implementation and illustrate its main aspects with actual measurements.
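
SKaMPI's measurement mechanism repeats an operation until the result is statistically trustworthy. A generic version of that idea (my sketch; SKaMPI's actual mechanism and parameters differ, and it measures MPI operations rather than a local function) looks like this:

# Generic adaptive timing loop in the spirit of SKaMPI's measurements
# (illustrative; SKaMPI's real mechanism and parameters differ).
import time, statistics

def measure(op, rel_error=0.05, min_reps=8, max_reps=1000):
    """Repeat op() until the standard error is a small fraction of the mean."""
    samples = []
    while len(samples) < max_reps:
        t0 = time.perf_counter()
        op()
        samples.append(time.perf_counter() - t0)
        if len(samples) >= min_reps:
            mean = statistics.mean(samples)
            stderr = statistics.stdev(samples) / len(samples) ** 0.5
            if stderr < rel_error * mean:
                break
    return statistics.mean(samples), len(samples)

mean, reps = measure(lambda: sum(range(10000)))
print(f"{mean * 1e6:.1f} us over {reps} repetitions")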

Journal ArticleDOI
TL;DR: In this paper, the authors present a methodology for the evaluation of respirometry-based control strategies in a full-scale environment, comprising a simulation model, plant layout, controller, and test procedure.

Proceedings ArticleDOI
01 Nov 1998
TL;DR: This paper evaluates the X86 architecture's multimedia extension (MMX) instruction set on a set of benchmarks to understand which aspects of native signal processing instruction sets are most useful, the current limitations, and how they can be utilized most efficiently.
Abstract: Many current general purpose processors are using extensions to the instruction set architecture to enhance the performance of digital signal processing (DSP) and multimedia applications. In this paper, we evaluate the X86 architecture's multimedia extension (MMX) instruction set on a set of benchmarks. Our benchmark suite includes kernels (filtering, fast Fourier transforms, and vector arithmetic) and applications (JPEG compression, Doppler radar processing, imaging, and G.722 speech encoding). Each benchmark has at least one non-MMX version in C and an MMX version that makes calls to an MMX assembly library. The versions differ in the implementation of filtering, vector arithmetic, and other relevant kernels. The observed speedup for the MMX versions of the suite ranges from less than 1.0 to 6.1. In addition to quantifying the speedup, we perform detailed instruction level profiling using Intel's VTune profiling tool. Using VTune, we profile static and dynamic instructions, microarchitecture operations, and data references to isolate the specific reasons for speedup or lack thereof. This analysis allows one to understand which aspects of native signal processing instruction sets are most useful, the current limitations, and how they can be utilized most efficiently.


Journal ArticleDOI
TL;DR: An algorithm for automatically restructuring the controllers of the data paths in which variable-latency units have been introduced is formulated, and results show an average throughput improvement exceeding 27%, at the price of a modest area increase.
Abstract: This paper introduces a novel optimization paradigm for increasing the throughput of digital systems. The basic idea consists of transforming fixed-latency units into variable-latency ones that run with a faster clock cycle. The transformation is fully automatic and can be used in conjunction with traditional design techniques to improve the overall performance of speed-critical units. In addition, we introduce procedures for reducing the area overhead of the modified units, and we formulate an algorithm for automatically restructuring the controllers of the data paths in which variable-latency units have been introduced. Results, obtained on a large set of benchmark circuits, show an average throughput improvement exceeding 27%, at the price of a modest area increase (less than 8% on average).


Journal Article
TL;DR: In this article, the authors present a new method that integrates path and timing analysis to estimate worst-case execution time on contemporary processors with complex pipelines and multi-level memory hierarchies.
Abstract: Previously published methods for estimation of the worst-case execution time on contemporary processors with complex pipelines and multi-level memory hierarchies result in overestimations owing to insufficient path and/or timing analysis. This paper presents a new method that integrates path and timing analysis to address these limitations. First, it is based on instruction-level architecture simulation techniques and thus has a potential to perform arbitrarily detailed timing analysis of hardware platforms. Second, by extending the simulation technique with the capability of handling unknown input data values, it is possible to exclude infeasible (or false) program paths in many cases, and also calculate path information, such as bounds on number of loop iterations, without the need for annotating the programs. Finally, in order to keep the number of program paths to be analyzed at a manageable level, we have extended the simulator with a path-merging strategy. This paper presents the method and particularly evaluates its capability to exclude infeasible paths based on seven benchmark programs.
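
The path-merging idea is what keeps the simulation tractable: when two simulated paths reach the same program point, keep a single state whose time bound is the worse of the two. A toy worst-case execution time walk over a branch-annotated program (hypothetical program representation, not the authors' simulator):

# Toy worst-case execution time walk with path merging (illustrative only).
# A program is a list of items: ('op', cycles) or ('branch', then_prog, else_prog).
def wcet(program):
    t = 0
    for item in program:
        if item[0] == "op":
            t += item[1]
        else:                           # unknown input: explore both paths,
            _, then_p, else_p = item    # then merge at the join with max time
            t += max(wcet(then_p), wcet(else_p))
    return t

prog = [("op", 5),
        ("branch", [("op", 10)], [("op", 2), ("op", 3)]),
        ("op", 1)]
print(wcet(prog))    # 5 + max(10, 5) + 1 = 16

Merging this way is conservative: it may lose correlations between branches, which is the price paid for not enumerating every program path.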

Journal ArticleDOI
TL;DR: An overview of the VelociTI including architectural principles, data path, instruction set, and pipeline operation is presented, and both the C62x fixed-point CPU and the C67x floating-point CPUs are described.
Abstract: The Texas Instruments VelociTI architecture is a very long instruction word (VLIW) architecture. The TMS320C6x family of digital signal processors (DSPs) is the first to employ the VelociTI architecture, with the TMS320C6201 (C6201) being the first device in this family. The C6201 is based on the fixed-point TMS320C62x (C62x) CPU. This article describes the VelociTI VLIW architecture and discusses the C62x, C67x, C6201, and the VelociTI development tools. An overview of the VelociTI including architectural principles, data path, instruction set, and pipeline operation is presented, and both the C62x fixed-point CPU and the C67x floating-point CPU are described. A summary of the C62x benchmark performance is also presented. The chip-level support outside the CPU that allows the C6201 to operate in a variety of high-performance DSP environments is also described. An overview of the C6x development environment is also given, demonstrating the breadth of the development environment and illustrating the programming methodology. The article concludes with a performance analysis of the C compiler.

Journal ArticleDOI
TL;DR: In this paper, a more practical benchmark, which is specified in terms of desired closed-loop dynamics, is proposed for performance assessment of feedback controllers in the H2 framework for MIMO processes.

Journal ArticleDOI
TL;DR: In this paper, a series expansion solution to the Hamilton-Jacobi-Isaacs Equation associated with the nonlinear disturbance attenuation problem was obtained for a nonlinear controller.
Abstract: In this paper, we use the theory of L2 disturbance attenuation for linear (H∞) and nonlinear systems to obtain solutions to the Nonlinear Benchmark Problem (NLBP) proposed in the paper by Bupp et al. [1]. By considering a series expansion solution to the Hamilton-Jacobi-Isaacs equation associated with the nonlinear disturbance attenuation problem, we obtain a series expansion solution for a nonlinear controller. Numerical simulations compare the performance of the third-order approximation of the nonlinear controller with its first-order approximation (which is the same as the linear H∞ controller obtained from the linearized problem).
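
For context, the Hamilton-Jacobi-Isaacs equation referred to here, written for a system \dot{x} = f(x) + g_1(x)w + g_2(x)u with penalty output h(x) and attenuation level \gamma (standard textbook form and notation, not copied from the paper; factors of 1/2 vary by convention), is

\[
V_x f(x) + \tfrac{1}{2}\, V_x \left( \tfrac{1}{\gamma^2}\, g_1 g_1^{\top} - g_2 g_2^{\top} \right) V_x^{\top} + \tfrac{1}{2}\, h^{\top} h = 0,
\qquad u^{*}(x) = -g_2^{\top}(x)\, V_x^{\top}(x).
\]

Expanding V in polynomial terms V = V^{(2)} + V^{(3)} + \cdots makes the second-order term reproduce the Riccati solution of the linearized H∞ problem, with higher-order terms supplying the nonlinear corrections; this is why the first-order controller approximation coincides with the linear H∞ controller.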

Book ChapterDOI
TL;DR: Three new techniques were derived that produced good speedups when manually applied to the authors' benchmark codes and can be implemented in a parallelizing compiler and applied automatically.
Abstract: Automatic parallelization is usually believed to be less effective at exploiting implicit parallelism in sparse/irregular programs than in their dense/regular counterparts. However, not much is really known because there have been few research reports on this topic. In this work, we have studied the possibility of using an automatic parallelizing compiler to detect the parallelism in sparse/irregular programs. The study with a collection of sparse/irregular programs led us to some common loop patterns. Based on these patterns, three new techniques were derived that produced good speedups when manually applied to our benchmark codes. More importantly, these parallelization methods can be implemented in a parallelizing compiler and can be applied automatically.
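
A typical loop pattern from such sparse codes is the compressed-sparse-row matrix-vector product: the inner subscripts are only known at run time, yet the outer iterations write disjoint outputs and can run in parallel. A small illustration (my example, not one of the paper's benchmark codes):

# CSR sparse matrix-vector product: a classic sparse/irregular loop pattern.
# Outer iterations write disjoint y[i], so they can run in parallel even
# though the inner subscripts col[j] are only known at run time.
def spmv_csr(row_ptr, col, val, x):
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):                      # parallelizable outer loop
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[j] * x[col[j]]           # irregular (indirect) access
    return y

# 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form:
row_ptr, col, val = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col, val, [1.0, 1.0, 1.0]))   # -> [3.0, 3.0]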

Book ChapterDOI
01 Jun 1998
TL;DR: A new method that integrates path and timing analysis to address limitations of previously published methods for estimation of the worst-case execution time on contemporary processors with complex pipelines and multi-level memory hierarchies is presented.
Abstract: Previously published methods for estimation of the worst-case execution time on contemporary processors with complex pipelines and multi-level memory hierarchies result in overestimations owing to insufficient path and/or timing analysis. This paper presents a new method that integrates path and timing analysis to address these limitations. First, it is based on instruction-level architecture simulation techniques and thus has a potential to perform arbitrarily detailed timing analysis of hardware platforms. Second, by extending the simulation technique with the capability of handling unknown input data values, it is possible to exclude infeasible (or false) program paths in many cases, and also calculate path information, such as bounds on number of loop iterations, without the need for annotating the programs. Finally, in order to keep the number of program paths to be analyzed at a manageable level, we have extended the simulator with a path-merging strategy. This paper presents the method and particularly evaluates its capability to exclude infeasible paths based on seven benchmark programs.