
Showing papers on "Benchmark (computing)" published in 1993


Proceedings ArticleDOI
01 Jun 1993
TL;DR: The OO7 Benchmark is described, and it is hoped that it will provide useful insight for end-users evaluating the performance of OODBMS systems, and that the research community will find that OO7 provides a database schema, instance, and workload useful for evaluating new techniques and algorithms for OODBMS implementation.
Abstract: The OO7 Benchmark represents a comprehensive test of OODBMS performance. In this paper we describe the benchmark and present performance results from its implementation in three OODBMS systems. It is our hope that the OO7 Benchmark will provide useful insight for end-users evaluating the performance of OODBMS systems; we also hope that the research community will find that OO7 provides a database schema, instance, and workload that is useful for evaluating new techniques and algorithms for OODBMS implementation.

292 citations


Proceedings Article
01 Jan 1993

273 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: This paper presents a benchmark that concisely captures the data base requirements of a collection of Earth Scientists working in the SEQUOIA 2000 project, and uses real data sets and real queries that are representative of Earth Science tasks.
Abstract: This paper presents a benchmark that concisely captures the data base requirements of a collection of Earth Scientists working in the SEQUOIA 2000 project on various aspects of global change research. This benchmark has the novel characteristic that it uses real data sets and real queries that are representative of Earth Science tasks. Because it appears that Earth Science problems are typical of the problems of engineering and scientific DBMS users, we claim that this benchmark represents the needs of this more general community. Also included in the paper are benchmark results for three example DBMSs: GRASS, IPW and POSTGRES.

229 citations


Patent
23 Dec 1993
TL;DR: In this article, the authors present a system and method for real-time establishment and maintenance of a standard of operation for a data communications network; the standard is a data set of network activity historically categorized by traffic type and by activity.

Abstract: The invention features a system and method to enable real-time establishment and maintenance of a standard of operation for a data communications network. The standard is a data set of network activity that is historically categorized by traffic type and by activity. The process begins with monitoring the network media or some network component over some period of time. The monitoring information is used to build benchmark data sets, which contain a standard of operation for the network, historically categorized by traffic type or activity. This standard of operation is constantly built by the intelligent monitoring facilities. After some period of time, referred to as the benchmark data set refresh interval, the benchmark that was created is used to determine whether the data taken from the current monitoring activity indicates normal network behavior. If the current network operating characteristics are outside the bounds of normal behavior, then alerts and logs of information can be sent to the expert system, which can then effect some network control. In this manner, auto-benchmarking is accomplished with self-customization.

152 citations
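
As a rough sketch of the idea, assuming a simple mean-and-standard-deviation baseline per traffic category and a 3-sigma bound (the patent does not prescribe particular statistics), baseline construction and anomaly checking might look like this:

```c
/* Minimal sketch of baseline ("benchmark data set") construction and
 * anomaly checking for one traffic category.  The mean/std-dev model
 * and the 3-sigma bound are illustrative assumptions, not taken from
 * the patent text. */
#include <math.h>
#include <stdio.h>

typedef struct {
    double sum, sumsq;
    long   n;
} baseline_t;

/* Add one monitored sample (e.g. packets/s for this traffic type)
 * collected during the benchmark data set refresh interval. */
static void baseline_add(baseline_t *b, double sample)
{
    b->sum   += sample;
    b->sumsq += sample * sample;
    b->n++;
}

/* Return nonzero if the current sample lies outside the bounds of
 * normal behavior learned so far (here: mean +/- 3 std-devs). */
static int baseline_is_anomalous(const baseline_t *b, double sample)
{
    if (b->n < 2) return 0;                     /* not enough history yet */
    double mean = b->sum / b->n;
    double var  = (b->sumsq - b->n * mean * mean) / (b->n - 1);
    double sd   = sqrt(var > 0.0 ? var : 0.0);
    return fabs(sample - mean) > 3.0 * sd;
}

int main(void)
{
    baseline_t http = {0.0, 0.0, 0};
    double history[] = {100, 105, 98, 103, 99, 101, 97, 104};
    for (unsigned i = 0; i < sizeof history / sizeof history[0]; i++)
        baseline_add(&http, history[i]);

    printf("310 pkt/s anomalous? %d\n", baseline_is_anomalous(&http, 310.0));
    printf("102 pkt/s anomalous? %d\n", baseline_is_anomalous(&http, 102.0));
    return 0;
}
```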


Proceedings ArticleDOI
01 Jun 1993
TL;DR: A modelling study of the TPC-C benchmark for both single node and distributed database management systems is presented and it is shown that close to linear scale-up can be achieved in a distributed system, assuming replication of a read-only table.
Abstract: The TPC-C benchmark is a new benchmark approved by the TPC council intended for comparing database platforms running a medium complexity transaction processing workload. Some key aspects in which this new benchmark differs from the TPC-A benchmark are in having several transaction types, some of which are more complex than that in TPC-A, and in having data access skew. In this paper we present results from a modelling study of the TPC-C benchmark for both single node and distributed database management systems. We simulate the TPC-C workload to determine expected buffer miss rates assuming an LRU buffer management policy. These miss rates are then used as inputs to a throughput model. From these models we show the following: (i) We quantify the data access skew as specified in the benchmark and show what fraction of the accesses go to what fraction of the data. (ii) We quantify the resulting buffer hit ratios for each relation as a function of buffer size. (iii) We show that close to linear scale-up (about 3% from the ideal) can be achieved in a distributed system, assuming replication of a read-only table. (iv) We examine the effect of packing hot tuples into pages and show that significant price/performance benefit can be thus achieved. (v) Finally, by coupling the buffer simulations with the throughput model, we examine typical disk/memory configurations that maximize the overall price/performance.

141 citations
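
A toy version of the buffer simulation conveys the idea: generate a skewed page reference string and replay it through an LRU buffer to estimate the hit ratio for a given buffer size. The 80/20 skew below is an illustrative stand-in; TPC-C specifies its own access-skew rules, and the study's models cover several relations plus a throughput model besides.

```c
/* Toy LRU buffer simulation over a skewed reference string, in the
 * spirit of the modelling study above.  The 80/20 access skew is an
 * illustrative assumption; TPC-C specifies its own skew rules. */
#include <stdio.h>
#include <stdlib.h>

#define NPAGES   10000      /* pages in the (single) relation        */
#define BUFSLOTS 1000       /* LRU buffer size, in pages             */
#define NREFS    200000     /* length of the reference string        */

static int buf[BUFSLOTS];   /* buf[0] is most recently used          */
static int used = 0;

/* Reference a page; return 1 on a hit, 0 on a miss (LRU policy). */
static int lru_ref(int page)
{
    int i, hit = 0;
    for (i = 0; i < used; i++)
        if (buf[i] == page) { hit = 1; break; }
    if (!hit && used < BUFSLOTS) used++;        /* grow until full   */
    if (i >= used) i = used - 1;                /* victim = LRU slot */
    for (; i > 0; i--) buf[i] = buf[i - 1];     /* shift down        */
    buf[0] = page;                              /* move to front     */
    return hit;
}

int main(void)
{
    long hits = 0;
    srand(1);
    for (long r = 0; r < NREFS; r++) {
        /* 80% of references go to the "hot" 20% of pages. */
        int hot  = (rand() % 100) < 80;
        int page = hot ? rand() % (NPAGES / 5)
                       : NPAGES / 5 + rand() % (NPAGES - NPAGES / 5);
        hits += lru_ref(page);
    }
    printf("buffer = %d pages, hit ratio = %.3f\n",
           BUFSLOTS, (double)hits / NREFS);
    return 0;
}
```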


Patent
Jon A. Frankle1, Mon-Ren Chene1
27 May 1993
TL;DR: In this paper, suggested delay limits are presented for use by layout tools that cause a programmable integrated circuit device to implement a logic design; the limits can be used by such tools as an initial placement algorithm, a placement improvement algorithm, and a routing algorithm for evaluating and guiding potential layouts.
Abstract: The present invention provides suggested delay limits for use by layout tools which cause a programmable integrated circuit device to implement a logic design. The suggested delay limits can be used by such tools as an initial placement algorithm, a placement improvement algorithm, and a routing algorithm for evaluating and guiding potential layouts. The suggested delay limits take into account characteristics of the programmable device being used by estimating lower bound delays for each connection in a logic design, and take into account any previously achieved delays or achievable delays for each connection in calculating the suggested limits. Results of routing benchmark designs using the novel suggested limits show improved timing performance for all benchmark cases tested.

132 citations


Journal ArticleDOI
TL;DR: Benchmark results are presented for the Numerical Aerodynamic Simulation (NAS) Program at NASA Ames Research Center, which is dedicated to advancing the science of computational aerodynamics; the results reflect improvements both in compilers and in implementations.
Abstract: Benchmark results for the Numerical Aerodynamic Simulation (NAS) Program at NASA Ames Research Center, which is dedicated to advancing the science of computational aerodynamics, are presented. The benchmark performance results are for the Y-MP, Y-MP EL, and C-90 systems from Cray Research; the TC2000 from Bolt Beranek and Newman; the Gamma iPSC/860 from Intel; the CM-2, CM-200, and CM-5 from Thinking Machines; the CS-1 from Meiko Scientific; the MP-1 and MP-2 from MasPar Computer; and the KSR-1 from Kendall Square Research. The results for the MP-1 and -2, the KSR-1, and the CM-5 have not been published before. Many of the other results are improved from previous listings, reflecting improvements both in compilers and in implementations.

122 citations


Proceedings ArticleDOI
07 Nov 1993
TL;DR: An optimized BIST scheme based on reseeding of multiple-polynomial Linear Feedback Shift Registers (LFSRs) is described; it allows an excellent trade-off between test data storage and test application time (number of test patterns) with very small hardware overhead.
Abstract: In this paper we describe an optimized BIST scheme based on reseeding of multiple-polynomial Linear Feedback Shift Registers (LFSRs). The same LFSR that is used to generate pseudo-random patterns is loaded with seeds from which it produces vectors that cover the test cubes of difficult-to-test faults. The scheme is compatible with scan design and achieves full coverage, as it is based on random patterns combined with a deterministic test set. A method for processing the test set to allow for efficient encoding by the scheme is described. Algorithms for calculating LFSR seeds from the test set and for the selection and ordering of polynomials are described. Experimental results are provided for ISCAS-89 benchmark circuits to demonstrate the effectiveness of the scheme. The scheme allows an excellent trade-off between test data storage and test application time (number of test patterns) with a very small hardware overhead. We show the trade-off between test data storage and number of test patterns under the scheme.

113 citations
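
A minimal sketch of the pattern-generation side of such a scheme follows, assuming a 16-bit LFSR, one common maximal-length feedback polynomial, and a brute-force seed search standing in for the paper's linear-equation-based seed calculation:

```c
/* Sketch of an LFSR-reseeding BIST scheme in the spirit of the paper:
 * one LFSR produces pseudo-random patterns, and is periodically
 * reloaded with a precomputed seed whose output sequence covers the
 * test cube of a hard-to-test fault.  The 16-bit width, the feedback
 * polynomial, and the brute-force seed search (the paper solves
 * linear equations instead) are all illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

/* One step of a 16-bit Fibonacci LFSR, taps 16,14,13,11 (maximal length). */
static uint16_t lfsr_step(uint16_t s)
{
    uint16_t bit = ((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;
    return (uint16_t)((s >> 1) | (bit << 15));
}

/* A test cube: care-bit mask plus required values at the care bits. */
static int covers_cube(uint16_t pattern, uint16_t mask, uint16_t value)
{
    return (pattern & mask) == (value & mask);
}

/* Brute-force "seed calculation": find a seed whose first `window`
 * patterns include one covering the cube. */
static uint16_t find_seed(uint16_t mask, uint16_t value, int window)
{
    for (uint32_t seed = 1; seed <= 0xFFFFu; seed++) {
        uint16_t s = (uint16_t)seed;
        for (int i = 0; i < window; i++) {
            if (covers_cube(s, mask, value)) return (uint16_t)seed;
            s = lfsr_step(s);
        }
    }
    return 0;   /* no seed found within the window */
}

int main(void)
{
    uint16_t state = 0xACE1u;                    /* power-up seed       */
    for (int i = 0; i < 4; i++) {                /* pseudo-random phase */
        printf("random pattern %d: 0x%04X\n", i, state);
        state = lfsr_step(state);
    }

    uint16_t mask = 0xF00Fu, value = 0xA005u;    /* 8 specified bits    */
    uint16_t seed = find_seed(mask, value, 8);   /* stored on chip      */
    printf("reseed with 0x%04X to cover cube (mask 0x%04X, value 0x%04X)\n",
           seed, mask, value);
    return 0;
}
```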


Proceedings Article
01 Jan 1993

105 citations


Proceedings Article
21 Jun 1993
TL;DR: The LADDIS NFS file server benchmark has been developed to resolve nhfsstone's shortcomings and provide new functionality; the major technical issues involved and the rationale used to establish default 097.LADDIS workload parameter values are described.
Abstract: The ability to compare the performance of various NFS file server configurations from several vendors is critically important to a computing facility when selecting an NFS file server. To date, nhfsstone has been a popular means of characterizing NFS file server performance. However, several deficiencies have been found in nhfsstone. The LADDIS NFS file server benchmark has been developed to resolve nhfsstone's shortcomings and provide new functionality. The Standard Performance Evaluation Corporation (SPEC) released the System File Server (SFS) Release 1.0 benchmark suite, which contains 097.LADDIS, as an industry-standard NFS file server benchmark in April 1993. This paper describes the major technical issues involved in developing the benchmark and the rationale used to establish default 097.LADDIS workload parameter values. Where appropriate, areas for further research are identified and encouraged.

104 citations


Proceedings ArticleDOI
29 Jun 1993
TL;DR: Experiments find a surprising amount of trivial and redundant operation, and various architectural means of exploiting this knowledge to improve computational efficiency include detection of trivial operands and the result cache.
Abstract: The notion of trivial computation, in which the appearance of simple operands renders potentially complex operations simple, is discussed. An example of a trivial operation is integer division, where the divisor is two; the division becomes a simple shift operation. The concept of redundant computation, in which some operation repeatedly does the same function because it repeatedly sees the same operands, is also discussed. Experiments on two separate benchmark suites, the SPEC benchmarks and the Perfect Club, find a surprising amount of trivial and redundant operation. Various architectural means of exploiting this knowledge to improve computational efficiency include detection of trivial operands and the result cache. Further experimentation shows significant speedup from these techniques, as measured on three different styles of machine architecture.
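
A software analogue of the two ideas, assuming integer division as the example operation and an arbitrary direct-mapped result-cache size, might look like this:

```c
/* Software sketch of the two ideas above: (1) detect trivial operands
 * before doing the full operation, and (2) a small direct-mapped
 * "result cache" that returns a previously computed result when the
 * same operands reappear.  Sizes and the choice of integer division
 * as the example operation are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define RC_SLOTS 256                     /* result-cache entries       */

typedef struct { int32_t a, b, result; int valid; } rc_entry;
static rc_entry rcache[RC_SLOTS];

static long trivial_hits, cache_hits, full_ops;

static int32_t cached_divide(int32_t a, int32_t b)
{
    /* Trivial-operand detection: these cases need no divider at all. */
    if (b == 1)           { trivial_hits++; return a; }
    if (a == 0)           { trivial_hits++; return 0; }
    if (b == 2 && a >= 0) { trivial_hits++; return a >> 1; }   /* shift */

    /* Redundant-computation detection via a direct-mapped result cache. */
    unsigned idx = (unsigned)(a * 31u + (unsigned)b) % RC_SLOTS;
    rc_entry *e = &rcache[idx];
    if (e->valid && e->a == a && e->b == b) { cache_hits++; return e->result; }

    int32_t r = a / b;                   /* the "expensive" operation  */
    full_ops++;
    *e = (rc_entry){ a, b, r, 1 };
    return r;
}

int main(void)
{
    /* A loop that repeatedly divides by the same few divisors, as the
     * benchmark studies found real programs often do. */
    for (int i = 0; i < 100000; i++)
        (void)cached_divide(i % 64, (i % 5) + 1);

    printf("trivial: %ld  cache hits: %ld  full divides: %ld\n",
           trivial_hits, cache_hits, full_ops);
    return 0;
}
```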

Proceedings ArticleDOI
05 Jan 1993
TL;DR: A data-driven multiprocessor architecture for the rapid prototyping of complex DSP algorithms, based on direct execution of data-flow graphs, is presented, which confirms the performance efficiency and generality of the architecture.
Abstract: A data-driven multiprocessor architecture for the rapid prototyping of complex DSP algorithms, based on direct execution of data-flow graphs, is presented. High computation bandwidth is achieved by exploiting the fine-grain parallelism inherent in the target algorithms using simple processing elements called nanoprocessors interconnected by a configurable static communication network. The use of distributed control and the data-driven execution approach resulted in a highly scalable and modular architecture. A prototype chip, which is currently being designed, contains 64 nanoprocessors, 1 kByte of memory in four banks and eight 16-bit I/O ports, and provides 3.2 GOPS peak when running at 50 MHz. The benchmark results, based on a variety of DSP algorithms in video processing, digital communication, digital filtering and speech recognition, confirm the performance efficiency and generality of the architecture.
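
As a sanity check on the quoted peak rate, and assuming one operation per nanoprocessor per clock (an assumption; the abstract does not state the per-cycle issue rate):

    64 nanoprocessors × 50 MHz × 1 op/cycle = 3.2 × 10^9 ops/s = 3.2 GOPS peak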

01 Jan 1993
TL;DR: A generic version of a real-time railroad crossing system is developed to provide insight into the utility of different methods for solving real- time problems and to use this example as a benchmark for comparing different formalisms.
Abstract: To be considered correct or useful, real-time systems must deliver results within specified time intervals, either without exception or with high probability. Recently, a large number of formal methods have been invented for specifying and verifying real-time systems. It has been suggested that these formal methods need to be tested out on actual real-time systems. Such testing will allow the scalability of the methods to be assessed and also will uncover new problems requiring a formal solution. However, before these methods can be productively applied to industrial systems, greater understanding is needed about how they compare (e.g., what classes of problems they are designed to solve, the availability of mechanical support, etc.). To provide insight into the utility of different methods for solving real-time problems, the authors have developed a generic version of a real-time railroad crossing system. Their plan is to use this example as a benchmark for comparing different formalisms. In this paper, the authors define the problem, describe three classes of formalisms that can be applied, and summarize efforts currently in progress to specify the system of interest and prove properties about its behavior.

Proceedings ArticleDOI
22 Jun 1993
TL;DR: An initial attempt at developing a set of benchmarks to gauge a system's robustness, as measured by its ability to tolerate errors, is presented; several primitive benchmarks are defined that can be combined into a robustness benchmark suite.
Abstract: An initial attempt at the development of a set of benchmarks to gauge a system's robustness as measured by its ability to tolerate errors is presented. Due to the large domain of system components whose intolerance to errors can lead to system failure, several primitive benchmarks that can be combined into a robustness benchmark suite are presented. Each primitive benchmark targets a system functionality and measures its behavior given erroneous inputs. Four primitive benchmarks have been implemented in this initial effort. They target the file management system, memory access, user application, and the C library functions. The motivation and experimental results of each of these primitive benchmarks are presented in detail, followed by an analysis of the results. A methodology to combine the primitive benchmarks to form an overall robustness figure is presented. A list of additional primitive benchmarks is suggested.
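
One primitive benchmark of this kind, calling a target function with erroneous inputs and classifying the outcome, might be sketched as follows; the chosen fopen test cases and the signal-based crash detection are illustrative assumptions rather than the paper's actual test set:

```c
/* Minimal sketch of one "primitive robustness benchmark": call a
 * target function with erroneous inputs and record whether it reports
 * an error, silently succeeds, or crashes the process.  The chosen
 * inputs and the signal-based crash detection are illustrative. */
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static sigjmp_buf recover;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(recover, 1);              /* unwind back to the harness */
}

/* Run one test case: fopen() with a deliberately erroneous path. */
static const char *try_fopen(const char *path)
{
    if (sigsetjmp(recover, 1))
        return "CRASH";
    errno = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return errno ? "ERROR REPORTED" : "FAILED, NO ERRNO";
    fclose(f);
    return "SUCCEEDED";
}

int main(void)
{
    signal(SIGSEGV, on_fault);           /* catch wild dereferences    */
    signal(SIGBUS,  on_fault);

    const char *cases[] = {
        "/definitely/not/a/real/path",
        "",                              /* empty file name            */
        "/dev/null",
        NULL,                            /* NULL pointer, may crash    */
    };
    for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++)
        printf("fopen(%s) -> %s\n",
               cases[i] ? cases[i] : "NULL", try_fopen(cases[i]));
    return 0;
}
```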

Proceedings ArticleDOI
01 Jun 1993
TL;DR: A self-scaling benchmark that dynamically adjusts aspects of its workload according to the performance characteristic of the system being measured, which gives a far more accurate comparative performance evaluation than traditional single point benchmarks.
Abstract: Current I/O benchmarks suffer from several chronic problems: they quickly become obsolete, they do not stress the I/O system, and they do not help in understanding I/O system performance. We propose a new approach to I/O performance analysis. First, we propose a self-scaling benchmark that dynamically adjusts aspects of its workload according to the performance characteristic of the system being measured. By doing so, the benchmark automatically scales across current and future systems. The evaluation aids in understanding system performance by reporting how performance varies according to each of five workload parameters. Second, we propose predicted performance, a technique for using the results from the self-scaling evaluation to quickly estimate the performance for workloads that have not been measured. We show that this technique yields reasonably accurate performance estimates and argue that this method gives a far more accurate comparative performance evaluation than traditional single point benchmarks. We apply our new evaluation technique by measuring a SPARCstation 1+ with one SCSI disk, an HP 730 with one SCSI-II disk, a Sprite LFS DECstation 5000/200 with a three-disk disk array, a Convex C240 minisupercomputer with a four-disk disk array, and a Solbourne 5E/905 fileserver with a two-disk disk array.
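
The flavor of the approach can be sketched by sweeping a single workload parameter (here the read request size) and reporting throughput as a function of it. The scratch-file workload below is an illustrative assumption; the actual benchmark varies each workload parameter in turn and locates focal points automatically.

```c
/* Sketch of the self-scaling idea: sweep one workload parameter
 * (here, the read request size) while holding the rest of the
 * workload fixed, and report throughput as a function of that
 * parameter.  The file name, file size and sequential-read workload
 * are illustrative assumptions, not the paper's actual workload. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define FILE_SIZE (16 * 1024 * 1024)     /* 16 MB test file            */

static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    const char *path = "selfscale.tmp";  /* scratch file (assumption)  */
    char *buf = malloc(1 << 20);
    memset(buf, 0xAB, 1 << 20);

    /* Create the test file once. */
    FILE *f = fopen(path, "wb");
    if (!f) { perror("fopen"); return 1; }
    for (long written = 0; written < FILE_SIZE; written += 1 << 20)
        fwrite(buf, 1, 1 << 20, f);
    fclose(f);

    /* Sweep the request size; each point is one workload the
     * self-scaling benchmark would measure and report. */
    for (size_t req = 4096; req <= (1 << 20); req *= 4) {
        double t0 = seconds();
        f = fopen(path, "rb");
        size_t n;
        long total = 0;
        while ((n = fread(buf, 1, req, f)) > 0)
            total += (long)n;
        fclose(f);
        double mbps = total / (1024.0 * 1024.0) / (seconds() - t0);
        printf("request %7zu bytes: %8.1f MB/s\n", req, mbps);
    }
    remove(path);
    free(buf);
    return 0;
}
```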

Proceedings ArticleDOI
01 Jul 1993
TL;DR: Five implementations of different lazy functional languages are compared using a common benchmark of a dozen medium size programs so that one set of programs can be translated automatically into different languages, thus allowing a fair comparison of the quality of compilers.
Abstract: Five implementations of different lazy functional languages are compared using a common benchmark of a dozen medium-size programs. The benchmarking procedure has been designed such that one set of programs can be translated automatically into different languages, thus allowing a fair comparison of the quality of compilers for different lazy functional languages. Aspects studied include compile time, execution time, and ease of programming, as determined by the availability of certain key features.

Proceedings ArticleDOI
01 Dec 1993
TL;DR: The micro benchmark approach is used to analyze the KSR1 and, in particular, the ALLCACHE memory architecture and ring interconnection and has enabled the authors to identify and characterize parts of the memory design not described by Kendall Square Research.
Abstract: The micro benchmark approach is used to analyze the KSR1 and, in particular, the ALLCACHE memory architecture and ring interconnection. The authors have been able to elucidate many facets of memory performance. The technique has enabled them to identify and characterize parts of the memory design not described by Kendall Square Research. The results show that a miss in the local cache can incur a penalty ranging from 7.5 to 500 ms (when a dirty "page" in the local cache must be evicted). The programmer must be very careful in placement and accessing of data to obtain maximum performance from the KSR1; the data presented will help in understanding the performance actually obtained.
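
The micro-benchmark technique itself is generic: time a chain of dependent loads over working sets of increasing size, so that the latency of each level of the memory hierarchy shows up as a plateau. A minimal pointer-chasing version, with arbitrary sizes and iteration counts rather than those used in the KSR1 study, follows:

```c
/* Generic memory-latency micro-benchmark of the kind used above:
 * chase a chain of dependent pointers through arrays of increasing
 * size, so the average latency per load jumps whenever the working
 * set no longer fits in a cache level.  Sizes and iteration counts
 * are arbitrary illustrative choices. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    const long iters = 10 * 1000 * 1000;
    srand(42);

    for (size_t kb = 16; kb <= 64 * 1024; kb *= 4) {
        size_t n = kb * 1024 / sizeof(void *);
        void **chain = malloc(n * sizeof(void *));
        size_t *perm = malloc(n * sizeof(size_t));

        /* Link the elements into one randomly ordered cycle so that
         * hardware prefetching cannot hide the load latency. */
        for (size_t i = 0; i < n; i++) perm[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (size_t i = 0; i < n; i++)
            chain[perm[i]] = &chain[perm[(i + 1) % n]];

        void **p = chain;
        double t0 = seconds();
        for (long i = 0; i < iters; i++)
            p = (void **)*p;                     /* dependent load      */
        double ns = (seconds() - t0) / iters * 1e9;

        /* Printing p keeps the chase from being optimized away. */
        printf("working set %6zu KB: %6.1f ns/load (%p)\n", kb, ns, (void *)p);
        free(perm);
        free(chain);
    }
    return 0;
}
```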

Proceedings ArticleDOI
J.W. Davis1
10 May 1993
TL;DR: The author proposes a strategy and technique for benchmarking computer energy efficiency from an end user's perspective and might be useful in distinguishing products on the competitive issue.
Abstract: The author proposes a strategy and technique for benchmarking computer energy efficiency from an end user's perspective. It is noted that performance benchmarking is complicated by the various classes of computers, even within the narrower area of personal computers, and by how performance benchmarks relate to the end user's expectations of the computer. This is further aggravated when the computer under scrutiny is performing with power management techniques to maximize its energy efficiency. The proposed strategy might be useful in distinguishing products on this competitive issue.

Proceedings ArticleDOI
01 Jul 1993
TL;DR: An efficient heuristic partitioning procedure of polynomial complexity is presented and experiments on combinational benchmark circuits validate the efficiency and quality of this approach.
Abstract: Pseudo-exhaustive testing involves applying all possible input patterns to individual output cones of a circuit. Circuits with output cones driven by a large number of inputs often need to be partitioned to reduce the test application time. In this paper, we present an efficient heuristic partitioning procedure of polynomial complexity. Experiments on combinational benchmark circuits validate the efficiency and quality of our approach.
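
The motivation for partitioning follows from simple arithmetic: a cone driven by k inputs needs all 2^k patterns, so test length grows exponentially in the widest cone. For example (illustrative figures, not the paper's circuits):

    2^30 ≈ 1.1 × 10^9 patterns for one 30-input cone, versus
    2 × 2^15 ≈ 6.6 × 10^4 patterns after splitting it into two 15-input segments.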

Journal ArticleDOI
TL;DR: A general purpose neurocomputer, SYNAPSE-1, which exhibits a multiprocessor and memory architecture is presented, which offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude--including learning.
Abstract: A general purpose neurocomputer, SYNAPSE-1, which exhibits a multiprocessor and memory architecture, is presented. It offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude, including learning. The computational power is provided by a 2-dimensional systolic array of neural signal processors. Since the weights are stored outside these NSPs, memory size and processing power can be adapted individually to the application needs. A neural algorithms programming language, embedded in C++, has been defined for the user to cope with the neurocomputer. In a benchmark test, the prototype of SYNAPSE-1 was 8000 times as fast as a standard workstation.

Book
02 Jan 1993
TL;DR: In this article, a methodology is proposed for the analysis of computer benchmark results, and illustrated with results from a parallelised particle/mesh (PIC) code, the Genesis LPM1 benchmark.
Abstract: A methodology is proposed for the sensible analysis of computer benchmark results, and illustrated with results from a parallelised particle/mesh (PIC) code, the Genesis LPM1 benchmark. The importance of choosing the correct performance metric is emphasised, and the temporal, simulation, benchmark and hardware performance metrics are defined. The use of speedup and Mflop/s as figures of merit is strongly discouraged.
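
One common formulation of these metrics (a paraphrase of the Genesis methodology; the book's precise definitions may differ) is, for problem size N on p processors with execution time T(N;p):

    temporal performance:   R_T(N;p) = 1 / T(N;p)                  (complete solutions per second)
    simulation performance: R_S(N;p) = simulated timesteps / T(N;p)
    benchmark performance:  R_B(N;p) = F_B(N) / T(N;p)             (F_B(N) a fixed, agreed operation count for the problem)
    hardware performance:   operations actually executed / T(N;p)

Because F_B(N) is fixed by the benchmark definition rather than by the implementation, R_B cannot be inflated by doing extra work, which is one reason raw Mflop/s is discouraged as a figure of merit.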


Proceedings ArticleDOI
01 Jul 1993
TL;DR: New algorithms for performing probabilistic simulation that provide significant improvements over existing ones, both in accuracy and speed are presented.
Abstract: Probabilistic simulation has been shown to be a very cost-effective approach to the computation of voltage and current waveform statistics in CMOS digital circuits, compared to exhaustive simulation. This approach is particularly attractive when long-term reliability issues such as electromigration, hot-carrier effects, and average power, are to be estimated over all possible input signals. In this paper we present new algorithms for performing probabilistic simulation that provide significant improvements over existing ones, both in accuracy and speed. The improvements are carried out at the subcircuit level, where the statistics of the current and voltage waveforms and the delays are computed more accurately, and at the global level, where signal correlations are considered. The new algorithms have been implemented in a computer program and tested on a number of large benchmark circuits.

Proceedings ArticleDOI
Sharat Prasad1, K. Roy1
18 Feb 1993
TL;DR: The authors describe an efficient implementation of a general algorithm to compute the expected number of transitions per unit time at circuit nodes, which are in turn used to compute power dissipation.
Abstract: The problem of optimization of multilevel combinational logic to achieve low power dissipation as well as low area is considered, wherein it is assumed that static CMOS gates are used. Given a multilevel Boolean network as a collection of functions, the system determines a new function at a time, adds it to the collection, and expresses the existing functions in terms of it. In selecting the new function, the effect on power dissipation as well as area is considered. The authors describe an efficient implementation of a general algorithm to compute the expected number of transitions per unit time at circuit nodes. These numbers are in turn used to compute power dissipation. A prototype multilevel logic optimization system has been implemented. Results are given for a selection of benchmark examples.
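
The step from expected transition counts to power is the standard dynamic-power relation for static CMOS; roughly (a paraphrase, not necessarily the paper's exact formulation):

    P_avg ≈ (1/2) · V_dd^2 · Σ_i C_i · E[n_i]

where C_i is the capacitance switched at node i and E[n_i] is the expected number of transitions per unit time at that node, so higher expected switching at heavily loaded nodes translates directly into higher average power.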

01 Nov 1993
TL;DR: A genetic algorithm needs to be heavily customized to work "well" for the clique problem, and a GA is computationally very expensive; its use is only recommended if it is known to find larger cliques than other algorithms.
Abstract: This paper investigates the power of genetic algorithms at solving the MAX-CLIQUE problem. We measure the performance of a standard genetic algorithm on an elementary set of problem instances consisting of embedded cliques in random graphs. We indicate the need for improvement, and introduce a new genetic algorithm, the multi-phase annealed GA, which exhibits superior performance on the same problem set. As we scale up the problem size and test on "hard" benchmark instances, we notice a degraded performance in the algorithm caused by premature convergence to local minima. To alleviate this problem, a sequence of modifications is implemented, ranging from changes in input representation to systematic local search. The most recent version, called union GA, incorporates the features of union cross-over, greedy replacement, and diversity enhancement. It shows a marked speed-up in the number of iterations required to find a given solution, as well as some improvement in the clique size found. We discuss issues related to the SIMD implementation of the genetic algorithms on a Thinking Machines CM-5, which was necessitated by the intrinsically high time complexity (O(n^3)) of the serial algorithm for computing one iteration. Our preliminary conclusions are: (1) a genetic algorithm needs to be heavily customized to work "well" for the clique problem; (2) a GA is computationally very expensive, and its use is only recommended if it is known to find larger cliques than other algorithms; (3) although our customization effort is bringing forth continued improvements, there is no clear evidence, at this time, that a GA will have better success in circumventing local minima.
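
The evaluation step such a GA needs can be made concrete: decode a bit-string individual into a vertex subset, repair it into a clique, and take the clique size as fitness. The greedy repair rule and adjacency-matrix representation below are illustrative; they are not the paper's multi-phase annealed or union GA operators.

```c
/* Sketch of the decode/repair/fitness step a GA for MAX-CLIQUE needs:
 * an individual is a bit-string over the vertices; it is greedily
 * repaired into a clique (dropping the most conflicted vertex first)
 * and its fitness is the resulting clique size.  The repair rule and
 * representation are illustrative, not the paper's union GA operators. */
#include <stdio.h>

#define NV 8

/* Small undirected graph as an adjacency matrix (illustrative). */
static const int adj[NV][NV] = {
    /*        0  1  2  3  4  5  6  7 */
    /* 0 */ { 0, 1, 1, 1, 0, 0, 0, 0 },
    /* 1 */ { 1, 0, 1, 1, 0, 0, 0, 1 },
    /* 2 */ { 1, 1, 0, 1, 1, 0, 0, 0 },
    /* 3 */ { 1, 1, 1, 0, 0, 1, 0, 0 },
    /* 4 */ { 0, 0, 1, 0, 0, 1, 1, 0 },
    /* 5 */ { 0, 0, 0, 1, 1, 0, 1, 0 },
    /* 6 */ { 0, 0, 0, 0, 1, 1, 0, 1 },
    /* 7 */ { 0, 1, 0, 0, 0, 0, 1, 0 },
};

/* Count conflicts (selected but non-adjacent partners) of vertex v. */
static int conflicts(const int sel[NV], int v)
{
    int c = 0;
    for (int u = 0; u < NV; u++)
        if (u != v && sel[u] && !adj[v][u]) c++;
    return c;
}

/* Repair a selection in place until it is a clique; return its size. */
static int repair_and_score(int sel[NV])
{
    for (;;) {
        int worst = -1, worst_c = 0;
        for (int v = 0; v < NV; v++) {
            if (!sel[v]) continue;
            int c = conflicts(sel, v);
            if (c > worst_c) { worst_c = c; worst = v; }
        }
        if (worst < 0) break;            /* no conflicts: it is a clique */
        sel[worst] = 0;                  /* drop the most conflicted one */
    }
    int size = 0;
    for (int v = 0; v < NV; v++) size += sel[v];
    return size;
}

int main(void)
{
    int individual[NV] = { 1, 1, 1, 1, 1, 0, 0, 1 };   /* from crossover */
    int fitness = repair_and_score(individual);
    printf("fitness (clique size) = %d, members:", fitness);
    for (int v = 0; v < NV; v++)
        if (individual[v]) printf(" %d", v);
    printf("\n");
    return 0;
}
```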

Proceedings ArticleDOI
05 Jan 1993
TL;DR: A synthetic benchmark was developed to allow more detailed testing of the DISC architecture, and was designed and run in the Verilog simulation language.
Abstract: DISC is a simple processor architecture targeted for real-time applications. The architecture is based on dynamic fine-grained multithreading, where the next instruction is fetched from one of several possible simultaneously active threads. The DISC architecture uses a combination of concepts, including a register stack file, a four-stage pipeline, up to four active threads, a dynamic scheduler, and special input/output (I/O) and interrupt constructs, to allow maximization of performance for real-time control applications. Previous stochastic results were very encouraging, so a synthetic benchmark was developed to allow more detailed testing. The benchmark was based on a Hughes Aircraft Company satellite control system and assembled with the DISC assembler. The model was designed and run in the Verilog simulation language.

Book ChapterDOI
13 Sep 1993
TL;DR: A general purpose neurocomputer, SYNAPSE-1, is presented which exhibits a multiprocessor and memory architecture that offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude, including learning.
Abstract: A general purpose neurocomputer, SYNAPSE-1, is presented which exhibits a multiprocessor and memory architecture. It offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude, including learning. The computational power is provided by a 2-dimensional systolic array of neural signal processors. Since the weights are stored outside these NSPs, memory size and processing power can be adapted individually to the application needs. A neural algorithms programming language, embedded in C++, has been defined for the user to cope with the neurocomputer. In a benchmark test, the prototype of SYNAPSE-1 was 8000 times as fast as a standard workstation.

Book
02 Jan 1993
TL;DR: The structure of the EuroBen benchmark, its rationale, and the supporting activities of the EuroBen group with regard to the benchmark are described.

Journal ArticleDOI
TL;DR: This first paper defines the methodology to be used to analyse the benchmark results and gives an example of a fully analysed application benchmark from General Relativity (GR1); it treats the execution time and absolute performance as functions of at least two variables, namely the problem size and the number of processors.
Abstract: This is the first of a series of papers on the Genesis distributed-memory benchmarks, which were developed under the European ESPRIT research program. The benchmarks provide a standard reference Fortran77 uniprocessor version, a distributed-memory MIMD version, and in some cases a Fortran90 version suitable for SIMD computers. The problems selected all have a scientific origin (mostly from physics or theoretical chemistry), and range from synthetic code fragments designed to measure the basic hardware properties of the computer (especially communication and synchronisation overheads), through commonly used library subroutines, to full application codes. This first paper defines the methodology to be used to analyse the benchmark results, and gives an example of a fully analysed application benchmark from General Relativity (GR1). First, suitable absolute performance metrics are carefully defined; then the performance analysis treats the execution time and absolute performance as functions of at least two variables, namely the problem size and the number of processors. The theoretical predictions are compared with, or fitted to, the measured results, and then used to predict (with due caution) how the performance might scale for larger problems and more processors than were actually available during the benchmarking. Benchmark measurements are given primarily for the German SUPRENUM computer, but also for the IBM 3083J, Convex C210 and a Parsys Supernode with 32 T800-20 transputers.

Journal ArticleDOI
TL;DR: A method for integrated control/structure optimization by multilevel decomposition is presented and shows that several previously reported methods were actually partial decompositions wherein only the control was decomposed into a subsystem design.
Abstract: A method for integrated control/structure optimization by multilevel decomposition is presented. It is shown that several previously reported methods were actually partial decompositions wherein only the control was decomposed into a subsystem design. One of these partially decomposed problems was selected as a benchmark example for comparison. The system is fully decomposed into structural and control subsystem designs and an improved design is produced. Theory, implementation, and results for the method are presented and compared with the benchmark example.