
Showing papers on "Benchmark (computing)" published in 2001


Proceedings ArticleDOI
02 Dec 2001
TL;DR: A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors.
Abstract: This paper examines a set of commercially representative embedded programs and compares them to an existing benchmark suite, SPEC2000. A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors. Several characteristics distinguish the representative embedded programs from the existing SPEC benchmarks including instruction distribution, memory behavior, and available parallelism. The embedded benchmarks, called MiBench, are freely available to all researchers.

3,548 citations


Book ChapterDOI
03 Sep 2001
TL;DR: The goal of this paper is to provide an experimental comparison of the efficiency of min-cut/max-flow algorithms for energy minimization in vision, comparing the running times of several standard algorithms as well as a recently developed new algorithm.
Abstract: After [10, 15, 12, 2, 4] minimum cut/maximum flow algorithms on graphs emerged as an increasingly useful tool for exact or approximate energy minimization in low-level vision. The combinatorial optimization literature provides many min-cut/max-flow algorithms with different polynomial time complexity. Their practical efficiency, however, has to date been studied mainly outside the scope of computer vision. The goal of this paper is to provide an experimental comparison of the efficiency of min-cut/max flow algorithms for energy minimization in vision. We compare the running times of several standard algorithms, as well as a new algorithm that we have recently developed. The algorithms we study include both Goldberg-style "push-relabel" methods and algorithms based on Ford-Fulkerson style augmenting paths. We benchmark these algorithms on a number of typical graphs in the contexts of image restoration, stereo, and interactive segmentation. In many cases our new algorithm works several times faster than any of the other methods making near real-time performance possible.

3,099 citations
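For readers who want a concrete reference point for the augmenting-path family compared above, the following is a minimal Edmonds-Karp max-flow sketch in Python. It is a generic textbook algorithm rather than the authors' new method, and the tiny two-pixel graph and its capacities are invented purely for illustration.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths.

    capacity: dict-of-dicts, capacity[u][v] = edge capacity.
    Returns the value of the maximum s-t flow (= min-cut capacity)."""
    # Residual capacities, with reverse edges initialized to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow is maximum
        # Bottleneck capacity along the path, then augment.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Toy two-pixel segmentation graph (hypothetical numbers): s/t are the labels,
# "p" and "q" are pixels with a smoothness edge between them.
graph = {"s": {"p": 4, "q": 1}, "p": {"q": 2, "t": 1}, "q": {"t": 5}, "t": {}}
print(max_flow(graph, "s", "t"))  # min-cut capacity of this toy graph
```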


Proceedings ArticleDOI
08 Sep 2001
TL;DR: This paper proposes Basic Block Distribution Analysis as an automated approach for finding small portions of a program to simulate that are representative of the entire program's execution, and shows that the periodicity of the basic block frequency profile reflects the periodicity of detailed simulation across several different architectural metrics.
Abstract: Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To overcome this problem researchers choose a very small portion of a program's execution to evaluate their results, rather than simulating the entire program. In this paper we propose Basic Block Distribution Analysis as an automated approach for finding these small portions of the program to simulate that are representative of the entire program's execution. This approach is based upon using profiles of a program's code structure (basic blocks) to uniquely identify different phases of execution in the program. We show that the periodicity of the basic block frequency profile reflects the periodicity of detailed simulation across several different architectural metrics (e.g., IPC, branch miss rate, cache miss rate, value misprediction, address misprediction, and reorder buffer occupancy). Since basic block frequencies can be collected using very fast profiling tools, our approach provides a practical technique for finding the periodicity and simulation points in applications.

571 citations
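A rough sketch of the interval-profiling idea described above, under my own simplifying assumptions: split execution into fixed-length instruction intervals, record a basic-block frequency vector per interval, and pick the interval whose normalized vector is closest to the whole-program vector as the simulation point. The Manhattan distance and all names below are illustrative choices, not definitions from the paper.

```python
import numpy as np

def pick_simulation_point(bb_counts):
    """bb_counts[i, b] = executed instructions attributed to basic block b
    during interval i (each interval covers a fixed number of dynamic
    instructions).  Returns the interval whose normalized basic-block
    frequency vector is closest to the whole-program vector."""
    per_interval = bb_counts / bb_counts.sum(axis=1, keepdims=True)
    whole_program = bb_counts.sum(axis=0) / bb_counts.sum()
    # Manhattan distance between each interval's profile and the full run.
    dist = np.abs(per_interval - whole_program).sum(axis=1)
    return int(dist.argmin()), dist

# Hypothetical profile: 6 intervals x 4 basic blocks.
counts = np.array([[90, 5, 3, 2],
                   [10, 60, 20, 10],
                   [12, 58, 20, 10],
                   [11, 59, 21, 9],
                   [80, 10, 6, 4],
                   [30, 40, 20, 10]])
best, dist = pick_simulation_point(counts)
print("simulate interval", best, "distances:", np.round(dist, 3))
```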




Journal ArticleDOI
TL;DR: A new method for generating motion fields from real sequences containing polyhedral objects is presented, together with a test suite for benchmarking optical flow algorithms consisting of complex synthetic sequences and real scenes with ground truth.

294 citations


Journal ArticleDOI
TL;DR: A stochastic formulation of watermarking attacks using an estimation-based concept and a new method of evaluating image quality based on the Watson metric, which overcomes the limitations of the PSNR, are proposed.

283 citations


Book ChapterDOI
30 Jul 2001
TL;DR: An overview of a new benchmark suite for parallel computers, SPEComp, which targets mid-size parallel servers and includes a number of science/engineering and data processing applications, is presented.
Abstract: We present a new benchmark suite for parallel computers. SPEComp targets mid-size parallel servers. It includes a number of science/engineering and data processing applications. Parallelism is expressed in the OpenMP API. The suite includes two data sets, Medium and Large, of approximately 1.6 and 4 GB in size. Our overview also describes the organization developing SPEComp, issues in creating OpenMP parallel benchmarks, the benchmarking methodology underlying SPEComp, and basic performance characteristics.

239 citations


Book ChapterDOI
25 Apr 2001
TL;DR: A second generation benchmark for image watermarking is proposed which includes attacks that take into account powerful prior information about the watermark and the watermarking algorithms, and presents results as a function of application.
Abstract: Digital image watermarking techniques for copyright protection have become increasingly robust. The best algorithms perform well against the now standard benchmark tests included in the Stirmark package. However, the Stirmark tests are limited since in general they do not properly model the watermarking process and consequently have limited potential to remove the best watermarks. Here we propose a second generation benchmark for image watermarking which includes attacks that take into account powerful prior information about the watermark and the watermarking algorithms. We follow the model of the Stirmark benchmark and propose several new categories of tests including: denoising (ML and MAP), wavelet compression, watermark copy attack, active desynchronization, denoising, geometrical attacks, and denoising followed by perceptual remodulation. In addition, we take the important step of presenting results as a function of application. This is an important contribution since it is unlikely that one technology will be suitable for all applications.

193 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: A mechanism to dynamically, simultaneously and independently adjust the sizes of the issue queue (IQ), the reorder buffer (ROB) and the load/store queue (LSQ) based on the periodic sampling of their occupancies to achieve significant power savings with minimal impact on performance is proposed.
Abstract: The "one-size-fits-all" philosophy used for permanently allocating datapath resources in today's superscalar CPUs to maximize performance across a wide range of applications results in the overcommitment of resources in general. To reduce power dissipation in the datapath, the resource allocations can be dynamically adjusted based on the demands of applications. We propose a mechanism to dynamically, simultaneously and independently adjust the sizes of the issue queue (IQ), the reorder buffer (ROB) and the load/store queue (LSQ) based on the periodic sampling of their occupancies to achieve significant power savings with minimal impact on performance. Resource upsizing is done more aggressively (compared to downsizing) using the relative rate of blocked dispatches to limit the performance penalty. Our results are validated by the execution of SPEC 95 benchmark suite on a substantially modified version of Simplescalar simulator, where the IQ, the ROB, the LSQ and the register files are implemented as separate structures, as is the case with most practical implementations. For the SPEC 95 benchmarks, the use of our technique in a 4-way superscalar processor results in a power savings in excess of 70% within individual components and an average power savings of 53% for the IQ, LSQ and ROB combined for the entire benchmark suite with an average performance penalty of only 5%.

176 citations
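A toy model of the occupancy-sampling control loop the abstract describes: shrink a resource when its sampled occupancy stays well below the enabled size, and grow it back more aggressively when blocked dispatches indicate it is too small. The thresholds, step sizes, and interface are illustrative guesses, not the paper's parameters.

```python
class ResizableQueue:
    """Toy model of one datapath resource (IQ, ROB or LSQ) whose active size
    is adjusted periodically from occupancy samples and blocked dispatches."""

    def __init__(self, max_size=64, step=8):
        self.max_size = max_size
        self.step = step                 # resize granularity (one partition)
        self.size = max_size             # currently enabled entries
        self.occupancy_samples = []
        self.blocked_dispatches = 0
        self.samples = 0

    def sample(self, occupancy, dispatch_blocked):
        self.occupancy_samples.append(occupancy)
        self.samples += 1
        self.blocked_dispatches += dispatch_blocked

    def resize(self, downsize_slack=0.7, upsize_block_rate=0.05):
        """Called at the end of each sampling period."""
        avg_occ = sum(self.occupancy_samples) / len(self.occupancy_samples)
        block_rate = self.blocked_dispatches / max(self.samples, 1)
        if block_rate > upsize_block_rate:
            # Upsize aggressively: jump by two partitions when dispatch blocks.
            self.size = min(self.max_size, self.size + 2 * self.step)
        elif avg_occ < downsize_slack * self.size:
            # Downsize conservatively: release one unused partition.
            self.size = max(self.step, self.size - self.step)
        self.occupancy_samples.clear()
        self.blocked_dispatches = self.samples = 0
        return self.size

iq = ResizableQueue(max_size=64)
for cycle in range(1000):
    iq.sample(occupancy=20, dispatch_blocked=False)
print(iq.resize())   # low occupancy, no stalls -> downsized to 56
```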


Proceedings Article
04 Aug 2001
TL;DR: This work presents a hybrid approach for the 0-1 multidimensional knapsack problem that combines linear programming and Tabu Search and improves significantly on the best known results of a set of more than 150 benchmark instances.
Abstract: We present a hybrid approach for the 0-1 multidimensional knapsack problem. The proposed approach combines linear programming and Tabu Search. The resulting algorithm improves significantly on the best known results of a set of more than 150 benchmark instances.
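As a hedged illustration of the "linear programming first, discrete search second" idea, the sketch below solves the LP relaxation of the 0-1 multidimensional knapsack with SciPy and then builds a feasible 0-1 solution greedily from the fractional values. The paper's algorithm performs Tabu Search around the LP solution; this sketch deliberately omits that part, and the five-item instance is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def mkp_lp_then_greedy(profits, weights, capacities):
    """0-1 multidimensional knapsack: maximize profits @ x subject to
    weights @ x <= capacities, x in {0,1}^n.

    Solve the LP relaxation, then build a feasible 0-1 solution by taking
    items in decreasing order of their fractional LP values.  (Not the
    paper's Tabu Search algorithm, only the LP front end.)"""
    n = len(profits)
    res = linprog(-np.asarray(profits), A_ub=weights, b_ub=capacities,
                  bounds=[(0, 1)] * n, method="highs")
    order = np.argsort(-res.x)                    # most "wanted" items first
    x = np.zeros(n, dtype=int)
    load = np.zeros(len(capacities))
    for i in order:
        if np.all(load + weights[:, i] <= capacities):
            x[i] = 1
            load += weights[:, i]
    return x, float(profits @ x)

# Tiny hypothetical instance: 5 items, 2 resource constraints.
profits = np.array([10, 13, 7, 8, 4])
weights = np.array([[3, 4, 2, 3, 1],     # resource 1 usage per item
                    [2, 3, 3, 1, 2]])    # resource 2 usage per item
capacities = np.array([7, 6])
print(mkp_lp_then_greedy(profits, weights, capacities))
```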

Journal ArticleDOI
TL;DR: In this article, the authors investigate the impact of reordering on data reuse at different levels in the memory hierarchy and introduce a new architecture-independent multi-level blocking strategy for irregular applications.
Abstract: The performance of irregular applications on modern computer systems is hurt by the wide gap between CPU and memory speeds because these applications typically under-utilize multi-level memory hierarchies, which help hide this gap. This paper investigates using data and computation reorderings to improve memory hierarchy utilization for irregular applications. We evaluate the impact of reordering on data reuse at different levels in the memory hierarchy. We focus on coordinated data and computation reordering based on space-filling curves and we introduce a new architecture-independent multi-level blocking strategy for irregular applications. For two particle codes we studied, the most effective reorderings reduced overall execution time by a factor of two and four, respectively. Preliminary experience with a scatter benchmark derived from a large unstructured mesh application showed that careful data and computation ordering reduced primary cache misses by a factor of two compared to a random ordering.
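A minimal sketch of data reordering along a space-filling curve, assuming a Morton (Z-order) curve and non-negative 2-D particle coordinates; the paper's actual curve, blocking strategy, and data structures may differ.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of integer grid coordinates (x, y) to form a
    Morton (Z-order) key; sorting by this key groups spatially nearby
    particles together in memory."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def reorder_particles(particles, cell=1.0):
    """particles: list of (x, y) float positions with non-negative coordinates.
    Returns the particles sorted along a Z-order curve over a `cell`-sized
    grid, so loops over the reordered array touch memory with better locality."""
    return sorted(particles, key=lambda p: morton_key(int(p[0] / cell),
                                                      int(p[1] / cell)))

# Hypothetical particle cloud.
pts = [(9.2, 1.1), (0.3, 0.4), (8.7, 1.5), (0.9, 0.2), (4.4, 7.8)]
print(reorder_particles(pts))
```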

Book ChapterDOI
07 Mar 2001
TL;DR: A scalable multi-user benchmark called XMach-1 (XML Data Management benchmark) is proposed. Based on a web application, it considers different types of XML data, in particular text documents, schema-less data and structured data, and measures the query throughput of a system under response time constraints.
Abstract: We propose a scalable multi-user benchmark called XMach-1 (XML Data Management benchmark) for evaluating the performance of XML data management systems. It is based on a web application and considers different types of XML data, in particular text documents, schema-less data and structured data. We specify the structure of the benchmark database and the generation of its contents. Furthermore, we define a mix of XML queries and update operations for which system performance is determined. The primary performance metric, Xqps, measures the query throughput of a system under response time constraints. We will use XMach-1 to evaluate both native XML data management systems and XML-enabled relational DBMS.

Journal ArticleDOI
TL;DR: The paper presents a new simulated annealing (SA)-based algorithm for the assembly line-balancing problem with a U-type configuration that employs an intelligent mechanism to search a large solution space.
Abstract: The paper presents a new simulated annealing (SA)-based algorithm for the assembly line-balancing problem with a U-type configuration. The proposed algorithm employs an intelligent mechanism to search a large solution space. U-type assembly systems are becoming increasingly popular in today's modern production environments since they are more general than the traditional assembly systems. In these systems, tasks are to be allocated into stations by moving forward and backward through the precedence diagram in contrast to a typical forward move in the traditional assembly systems. The performance of the algorithm is measured by solving a large number of benchmark problems available in the literature. The results of the computational experiments indicate that the proposed SA-based algorithm performs quite effectively. It also yields the optimal solution for most problem instances. Future research directions and a comprehensive bibliography are also provided here.
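A generic simulated annealing skeleton of the kind the abstract builds on; the U-line-specific pieces (a feasible task-to-station assignment and a neighborhood move that respects the forward and backward precedence rules) are not reproduced here, so the toy usage at the end just minimizes a quadratic.

```python
import math, random

def simulated_annealing(initial, neighbor, cost, t0=10.0, alpha=0.95,
                        iters_per_temp=100, t_min=1e-3):
    """Generic SA loop.  For the paper's U-line balancing problem, `initial`
    would be a feasible task-to-station assignment and `neighbor` a
    precedence-respecting move; those problem-specific parts are omitted."""
    current = best = initial
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            cand = neighbor(current)
            delta = cost(cand) - cost(current)
            # Accept improvements always, worse moves with Boltzmann probability.
            if delta <= 0 or random.random() < math.exp(-delta / t):
                current = cand
            if cost(current) < cost(best):
                best = current
        t *= alpha                      # geometric cooling schedule
    return best

# Toy usage: minimize (x - 3)^2 over integers by +/-1 moves; typically prints 3.
print(simulated_annealing(0, lambda x: x + random.choice((-1, 1)),
                          lambda x: (x - 3) ** 2))
```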

Journal ArticleDOI
TL;DR: It is found that the benchmark calculations are highly dependent on the choice of the dose-effect function and the definition of the benchmark dose, and it is recommended that several sets of biologically relevant default settings be used to illustrate the effect on the benchmark results.
Abstract: A threshold for dose-dependent toxicity is crucial for standards setting but may not be possible to specify from empirical studies. Crump (1984) instead proposed calculating the lower statistical confidence bound of the benchmark dose, which he defined as the dose that causes a small excess risk. This concept has several advantages and has been adopted by regulatory agencies for establishing safe exposure limits for toxic substances such as mercury. We have examined the validity of this method as applied to an epidemiological study of continuous response data associated with mercury exposure. For models that are linear in the parameters, we derived an approximate expression for the lower confidence bound of the benchmark dose. We find that the benchmark calculations are highly dependent on the choice of the dose-effect function and the definition of the benchmark dose. We therefore recommend that several sets of biologically relevant default settings be used to illustrate the effect on the benchmark results and to stimulate research that will guide an a priori choice of proper default settings.
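For readers unfamiliar with the benchmark-dose concept, a standard formulation in my own notation (not copied from this paper): for a response Y with abnormality cutoff c and a small predefined benchmark response level BMR, the benchmark dose and its lower confidence bound are defined as follows.

```latex
% P_0  : background probability of an abnormal response at dose 0
% BMR  : benchmark response (small predefined excess risk, e.g. 5%)
% BMDL : lower statistical confidence bound of the benchmark dose
\[
  P_0 = \Pr\bigl[Y > c \mid d = 0\bigr], \qquad
  \Pr\bigl[Y > c \mid d = \mathrm{BMD}\bigr] - P_0 = \mathrm{BMR},
\]
\[
  \mathrm{BMDL} = \text{lower confidence limit of } \mathrm{BMD},
  \ \text{used as the point of departure for exposure limits.}
\]
```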

Book ChapterDOI
04 Sep 2001
TL;DR: In the process of reducing and verifying the SPEC 2000 benchmark datasets, this work obtains instruction mix, memory behavior, and instructions per cycle characterization information about each benchmark program.
Abstract: The large input datasets in the SPEC 2000 benchmark suite result in unreasonably long simulation times when using detailed execution-driven simulators for evaluating future computer architecture ideas. To address this problem, we have an ongoing project to reduce the execution times of the SPEC 2000 benchmarks in a quantitatively defensible way. Upon completion of this work, we will have smaller input datasets for several SPEC 2000 benchmarks. The programs using our reduced input datasets will produce execution profiles that accurately reflect the program behavior of the full reference dataset, as measured using standard statistical tests. In the process of reducing and verifying the SPEC 2000 benchmark datasets, we also obtain instruction mix, memory behavior, and instructions per cycle characterization information about each benchmark program.

Proceedings ArticleDOI
01 May 2001
TL;DR: This work presents a general model for schedules with pipelining, and shows that finding a valid schedule with minimum cost is NP-hard, and presents a greedy heuristic for finding good schedules.
Abstract: Database systems frequently have to execute a set of related queries, which share several common subexpressions. Multi-query optimization exploits this, by finding evaluation plans that share common results. Current approaches to multi-query optimization assume that common subexpressions are materialized. Significant performance benefits can be had if common subexpressions are pipelined to their uses, without being materialized. However, plans with pipelining may not always be realizable with limited buffer space, as we show. We present a general model for schedules with pipelining, and present a necessary and sufficient condition for determining validity of a schedule under our model. We show that finding a valid schedule with minimum cost is NP-hard. We present a greedy heuristic for finding good schedules. Finally, we present a performance study that shows the benefit of our algorithms on batches of queries from the TPCD benchmark.

Proceedings ArticleDOI
13 Mar 2001
TL;DR: A technique based upon a statistical approach is described that improves existing estimation techniques and provides a degree of reliability in the error of the estimated execution time.
Abstract: Estimates of the execution time of embedded software play an important role in function-architecture co-design. This paper describes a technique based upon a statistical approach that improves existing estimation techniques. Our approach provides a degree of reliability in the error of the estimated execution time. We illustrate the technique using both control-oriented and computation-dominated benchmark programs.

Journal ArticleDOI
TL;DR: The main advantage of the proposed procedure is that it identifies critical activities without requiring too much information, and it is shown that the information that DEA provides for inefficient DMUs is in general not sufficient to improve their activities.

Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of simulation problem analysis and research kernel (SPARK) and the HVACSIM+ programs by means of benchmark testing and showed that the graph-theoretic techniques employed in SPARK offer significant speed advantages over the other methods for significantly reducible problems and that even problem portions with little reduction potential can be solved efficiently.

Proceedings Article
27 Aug 2001
TL;DR: Using simulations, the performance potential of the MOLEN ρµ-coded processor, which comprises hardwired and microcoded reconfigurable units, is established and it is indicated that the execution cycles of the superscalar machine can be reduced by 30% for the JPEG benchmark and by 32% for the MPEG-2 benchmark using the proposed processor organization.
Abstract: In this paper, we introduce the MOLEN ρµ-coded processor which comprises hardwired and microcoded reconfigurable units. At the expense of three new instructions, the proposed mechanisms allow instructions, entire pieces of code, or their combination to execute in a reconfigurable manner. The reconfiguration of the hardware and the execution on the reconfigured hardware are performed by ρ-microcode (an extension of the classical microcode to allow reconfiguration capabilities). We include fixed and pageable microcode hardware features to extend the flexibility and improve the performance. The scheme allows partial reconfiguration and includes caching mechanisms for non-frequently used reconfiguration and execution microcode. Using simulations, we establish the performance potential of the proposed processor assuming the JPEG and MPEG-2 benchmarks, the ALTERA APEX20K boards for the implementation, and a hardwired superscalar processor. After implementation, cycle time estimations and normalization, our simulations indicate that the execution cycles of the superscalar machine can be reduced by 30% for the JPEG benchmark and by 32% for the MPEG-2 benchmark using the proposed processor organization.

Book ChapterDOI
18 Apr 2001
TL;DR: This paper systematically configures an ILS algorithm by optimizing the individual procedures of ILS and then optimizing their interaction, arriving at a highly effective ILS approach that outperforms the authors' implementation of the iterated dynasearch algorithm on the hardest benchmark instances.
Abstract: In this article we investigate the application of iterated local search (ILS) to the single machine total weighted tardiness problem. Our research is inspired by the recently proposed iterated dynasearch approach, which was shown to be a very effective ILS algorithm for this problem. In this paper we systematically configure an ILS algorithm by optimizing the individual procedures of ILS and then optimizing their interaction. We arrive at a highly effective ILS approach, which outperforms our implementation of the iterated dynasearch algorithm on the hardest benchmark instances.
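A barebones ILS control loop for the single machine total weighted tardiness problem, to make the structure concrete. The descent neighborhood, perturbation, and acceptance rule below are simple illustrative choices, not the configuration the paper arrives at, and the five-job instance is hypothetical.

```python
import random

def twt(seq, jobs):
    """Total weighted tardiness of a job sequence.
    jobs[j] = (processing_time, due_date, weight)."""
    t = total = 0
    for j in seq:
        p, d, w = jobs[j]
        t += p
        total += w * max(0, t - d)
    return total

def local_search(seq, jobs):
    """First-improvement descent over adjacent pairwise swaps."""
    improved = True
    while improved:
        improved = False
        for i in range(len(seq) - 1):
            cand = seq[:i] + [seq[i + 1], seq[i]] + seq[i + 2:]
            if twt(cand, jobs) < twt(seq, jobs):
                seq, improved = cand, True
    return seq

def iterated_local_search(jobs, iterations=200, seed=0):
    """Barebones ILS: descend, perturb by shuffling a random 3-job window,
    descend again, and keep the better of the two local optima."""
    random.seed(seed)
    seq = local_search(list(range(len(jobs))), jobs)
    best = seq
    for _ in range(iterations):
        pert = seq[:]
        i = random.randrange(len(pert) - 2)
        window = pert[i:i + 3]
        random.shuffle(window)                  # perturbation step
        pert[i:i + 3] = window
        cand = local_search(pert, jobs)
        if twt(cand, jobs) <= twt(best, jobs):  # acceptance criterion
            best = seq = cand
    return best, twt(best, jobs)

# Hypothetical 5-job instance: (processing time, due date, weight).
jobs = [(4, 5, 2), (3, 4, 1), (7, 10, 3), (2, 3, 2), (5, 12, 1)]
print(iterated_local_search(jobs))
```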

Journal ArticleDOI
TL;DR: This paper offers a framework for a spatio-temporal data sets generator, a first step towards a full benchmark for the large real world application field of “smoothly” moving objects with few or no restrictions in motion.
Abstract: The spatio-temporal database research community has just started to investigate benchmarking issues. On one hand we would rather have a benchmark that is representative of real world applications, in order to verify the expressiveness of proposed models. On the other hand, we would like a benchmark that offers a sizeable workload of data and query sets, which could obviously stress the strengths and weaknesses of a broad range of data access methods. This paper offers a framework for a spatio-temporal data sets generator, a first step towards a full benchmark for the large real world application field of “smoothly” moving objects with few or no restrictions in motion. The driving application is the modeling of fishing ships where the ships go in the direction of the most attractive shoals of fish while trying to avoid storm areas. Shoals are themselves attracted by plankton areas. Ships are moving points; plankton or storm areas are regions with fixed center but moving shape; and shoals are moving regions. The specification is written in such a way that the users can easily adjust generation model parameters.
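A toy sketch of the kind of generator the abstract describes: moving points that drift smoothly toward an attractor plus noise. The single static attractor and all parameters are drastic simplifications of the paper's ship/shoal/plankton/storm model.

```python
import random

def generate_ship_tracks(n_ships, n_steps, attractor=(50.0, 50.0), pull=0.1,
                         noise=2.0, seed=0):
    """Toy generator for 'smoothly' moving points: each ship drifts toward a
    fixed attractor (standing in for an attractive shoal) plus Gaussian noise.
    Returns one list of (t, x, y) samples per ship."""
    rng = random.Random(seed)
    tracks = []
    for _ in range(n_ships):
        x, y = rng.uniform(0, 100), rng.uniform(0, 100)
        track = [(0, x, y)]
        for t in range(1, n_steps):
            x += pull * (attractor[0] - x) + rng.gauss(0, noise)
            y += pull * (attractor[1] - y) + rng.gauss(0, noise)
            track.append((t, x, y))
        tracks.append(track)
    return tracks

print(generate_ship_tracks(2, 3)[0])   # first ship's short track
```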

Journal ArticleDOI
TL;DR: Functional cache miss ratios and related statistics for selected benchmarks in the SPEC CPU2000 suite are presented.
Abstract: The SPEC CPU2000 benchmark suite (http://www.spec.org/osg/cpu2000) is a collection of 26 compute-intensive, non-trivial programs used to evaluate the performance of a computer's CPU, memory system, and compilers. The benchmarks in this suite were chosen to represent real-world applications, and thus exhibit a wide range of runtime behaviors. On this webpage, we present functional cache miss ratios and related statistics for selected benchmarks in the SPEC CPU2000 suite. In particular, we consider split L1 caches with sizes ranging from 4KB to 1MB, 64B blocks, and associativities of 1, 2, 4, 8, and full. Most of this data was collected at the University of Wisconsin-Madison with the aid of the Simplescalar toolset (http://www.simplescalar.org).
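A small functional cache simulator of the sort that produces such miss ratios, assuming an LRU set-associative cache and a synthetic address trace; the published data were gathered with the SimpleScalar toolset, not with this sketch.

```python
from collections import OrderedDict

def miss_ratio(trace, size_bytes, assoc, block_bytes=64):
    """Functional simulation of one LRU set-associative cache level.
    trace: iterable of byte addresses.  Returns misses / accesses."""
    n_sets = size_bytes // (block_bytes * assoc)
    sets = [OrderedDict() for _ in range(n_sets)]   # per-set LRU stacks
    misses = accesses = 0
    for addr in trace:
        block = addr // block_bytes
        s = sets[block % n_sets]
        accesses += 1
        if block in s:
            s.move_to_end(block)                    # hit: refresh LRU position
        else:
            misses += 1
            s[block] = True
            if len(s) > assoc:
                s.popitem(last=False)               # evict least recently used
    return misses / accesses

# Hypothetical toy trace: a sequential sweep over a 32KB array, run twice.
trace = [i * 64 for i in range(512)] * 2
print(miss_ratio(trace, size_bytes=16 * 1024, assoc=2))   # too small: all miss
print(miss_ratio(trace, size_bytes=64 * 1024, assoc=2))   # fits: 2nd pass hits
```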


Journal ArticleDOI
TL;DR: A checkpointing technique is presented in which the selection of checkpoint positions is based on a checkpointing-recovery cost model; it allows faster execution and, in some cases, has the additional advantage that less memory is required for recording state vectors.
Abstract: Recent papers have shown that the performance of Time Warp simulators can be improved by appropriately selecting the positions of checkpoints, instead of taking them on a periodic basis. In this paper, we present a checkpointing technique in which the selection of the positions of checkpoints is based on a checkpointing-recovery cost model. Given the current state S, the model determines the convenience of recording S as a checkpoint before the next event is executed. This is done by taking into account the position of the last taken checkpoint, the granularity (i.e., the execution time) of intermediate events, and using an estimate of the probability that S will have to be restored due to rollback in the future of the execution. A synthetic benchmark in different configurations is used for evaluating and comparing this approach to classical periodic techniques. As a testing environment we used a cluster of PCs connected through a Myrinet switch coupled with a fast communication layer specifically designed to exploit the potential of this type of switch. The obtained results point out that our solution allows faster execution and, in some cases, exhibits the additional advantage that less memory is required for recording state vectors. This possibly contributes to further performance improvements when memory is a critical resource for the specific application. A performance study for the case of a cellular phone system simulation is finally reported to demonstrate the effectiveness of this solution for a real world application.
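The general shape of such a checkpointing decision, written in my own notation rather than the paper's: record the current state S only when saving it now is expected to be cheaper than re-building it later by re-executing the events since the previous checkpoint.

```latex
% delta_s : cost of saving one state vector
% p_rb(S) : estimated probability that state S will be restored by a rollback
% g_i     : granularity (execution time) of intermediate event i
\[
  \text{take a checkpoint at } S
  \quad\Longleftrightarrow\quad
  \delta_s \;<\; p_{\mathrm{rb}}(S) \sum_{i \in \mathcal{E}(S)} g_i ,
\]
where $\mathcal{E}(S)$ is the set of events executed since the last recorded
checkpoint, i.e. the work that coasting forward would have to repeat.
```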

Proceedings ArticleDOI
17 Jun 2001
TL;DR: This paper presents an optimal algorithm to combine loop shifting, loop fusion and array contraction to reduce the temporary array storage required to execute a collection of loops.
Abstract: In this paper, we propose memory reduction as a new approach to data locality enhancement. Under this approach, we use the compiler to reduce the size of the data repeatedly referenced in a collection of nested loops. Between their reuses, the data will more likely remain in higher-speed memory devices, such as the cache. Specifically, we present an optimal algorithm to combine loop shifting, loop fusion and array contraction to reduce the temporary array storage required to execute a collection of loops. When applied to 20 benchmark programs, our technique reduces the memory requirement, counting both the data and the code, by 51% on average. The transformed programs gain a speedup of 1.40 on average, due to the reduced footprint and, consequently, the improved data locality.
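A generic before/after illustration of loop fusion plus array contraction (a textbook example, not one of the paper's 20 benchmarks): fusing the two loops lets the full-size temporary array collapse to a scalar.

```python
n = 100_000
a = [float(i) for i in range(n)]
b = [1.0] * n

def before(a, b):
    """Two loops communicating through a full-size temporary array."""
    n = len(a)
    t = [0.0] * n            # O(n) temporary storage, written then re-read
    c = [0.0] * n
    for i in range(n):
        t[i] = a[i] + b[i]
    for i in range(n):
        c[i] = 2.0 * t[i]
    return c

def after(a, b):
    """Same computation after loop fusion + array contraction: the temporary
    shrinks to a scalar, so each value is consumed while it is still hot."""
    n = len(a)
    c = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]      # contracted temporary
        c[i] = 2.0 * t
    return c

assert before(a, b) == after(a, b)
```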


Journal ArticleDOI
TL;DR: This paper illustrates a new approach to evaluating portfolios in the context of multiple performance measures based upon linear programming techniques and identifies the n-dimensional efficient portfolio frontier.
Abstract: This paper illustrates a new approach to evaluating portfolios in the context of multiple performance measures. The approach is based upon linear programming techniques and identifies the n-dimensional efficient portfolio frontier. An illustrative example with commodity trading advisor (CTA) returns shows that benchmarks can be identified for each individual portfolio.
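One standard way to cast the identification of an n-dimensional efficient frontier as a linear program is a DEA-style model; the paper's exact formulation may differ, so the notation below is illustrative only. Portfolio k, with risk measures x_{ik} as inputs and performance measures y_{rk} as outputs, lies on the frontier exactly when the optimum theta* equals 1.

```latex
\[
  \max_{\theta,\ \lambda \ge 0} \ \theta
  \quad \text{s.t.} \quad
  \sum_{j} \lambda_j \, y_{rj} \ \ge\ \theta \, y_{rk} \ \ \forall r,
  \qquad
  \sum_{j} \lambda_j \, x_{ij} \ \le\ x_{ik} \ \ \forall i ,
\]
% theta^* = 1: portfolio k is efficient; theta^* > 1 would mean a convex
% combination of other portfolios dominates it on every measure.
```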
