
Showing papers on "Benchmark (computing) published in 2006"


Journal ArticleDOI
TL;DR: The results show that the algorithm with self-adaptive control parameter settings is better than, or at least comparable to, the standard DE algorithm and evolutionary algorithms from literature when considering the quality of the solutions obtained.
Abstract: We describe an efficient technique for adapting control parameter settings associated with differential evolution (DE). The DE algorithm has been used in many practical cases and has demonstrated good convergence properties. It has only a few control parameters, which are kept fixed throughout the entire evolutionary process. However, it is not an easy task to properly set control parameters in DE. We present an algorithm, a new version of the DE algorithm, for obtaining self-adaptive control parameter settings that show good performance on numerical benchmark problems. The results show that our algorithm with self-adaptive control parameter settings is better than, or at least comparable to, the standard DE algorithm and evolutionary algorithms from the literature when considering the quality of the solutions obtained.

2,820 citations
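The self-adaptation scheme described above (control parameters encoded with each individual and occasionally regenerated) can be sketched as follows. The rates and bounds shown (tau = 0.1, F in [0.1, 1.0], CR in [0, 1]) follow the paper's published settings, but treat this as an illustrative sketch rather than the authors' exact code:

```python
import random

def adapt_parameters(F, CR, tau1=0.1, tau2=0.1, F_low=0.1, F_up=0.9):
    """Self-adaptive DE control parameters: with probability tau,
    replace the individual's parameter by a fresh random value;
    otherwise it is inherited unchanged from the parent."""
    if random.random() < tau1:
        F = F_low + random.random() * F_up   # new F in [0.1, 1.0)
    if random.random() < tau2:
        CR = random.random()                 # new CR in [0, 1)
    return F, CR
```

In the full algorithm this update runs once per individual per generation, before mutation and crossover, so good parameter values propagate with the individuals that survive selection.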


Journal ArticleDOI
John L. Henning1
TL;DR: On August 24, 2006, the Standard Performance Evaluation Corporation (SPEC) announced CPU2006, which replaces CPU2000, and the SPEC CPU benchmarks are widely used in both industry and academia.
Abstract: On August 24, 2006, the Standard Performance Evaluation Corporation (SPEC) announced CPU2006 [2], which replaces CPU2000. The SPEC CPU benchmarks are widely used in both industry and academia [3].

1,864 citations


Journal ArticleDOI
TL;DR: Algorithmic techniques are presented that substantially improve the running time of the loopy belief propagation approach and reduce the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as image restoration that have a large label set.
Abstract: Markov random field models provide a robust and unified framework for early vision problems such as stereo and image restoration. Inference algorithms based on graph cuts and belief propagation have been found to yield accurate results, but despite recent advances are often too slow for practical use. In this paper we present some algorithmic techniques that substantially improve the running time of the loopy belief propagation approach. One of the techniques reduces the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as image restoration that have a large label set. Another technique speeds up and reduces the memory requirements of belief propagation on grid graphs. A third technique is a multi-grid method that makes it possible to obtain good results with a small fixed number of message passing iterations, independent of the size of the input images. Taken together these techniques speed up the standard algorithm by several orders of magnitude. In practice we obtain results that are as accurate as those of other global methods (e.g., using the Middlebury stereo benchmark) while being nearly as fast as purely local methods.

1,560 citations
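The linear-time message update mentioned above can be illustrated for a truncated linear pairwise cost V(a, b) = min(s*|a-b|, d), one of the cost models the paper treats: two min-passes over the labels plus a truncation replace the quadratic minimization over label pairs. A minimal sketch (variable names are mine):

```python
def message_update(h, s, d):
    """O(k) belief-propagation message for a truncated linear cost
    V(a, b) = min(s*|a-b|, d). h[a] is the data cost plus incoming
    messages at label a; returns the outgoing message per label."""
    k = len(h)
    m = list(h)
    # forward pass: propagate minima to higher labels
    for a in range(1, k):
        m[a] = min(m[a], m[a - 1] + s)
    # backward pass: propagate minima to lower labels
    for a in range(k - 2, -1, -1):
        m[a] = min(m[a], m[a + 1] + s)
    # truncation: no message exceeds the global minimum plus d
    cap = min(h) + d
    return [min(v, cap) for v in m]
```

The result is identical to the naive double loop over label pairs, but each message costs O(k) rather than O(k^2), which is exactly what matters for large label sets such as image restoration.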


Proceedings ArticleDOI
09 Dec 2006
TL;DR: The results show that the best architected policies can come within 1% of the performance of an ideal oracle, while meeting a given chip-level power budget, and are significantly better than static management, even if static scheduling is given oracular knowledge.
Abstract: Chip-level power and thermal implications will continue to rule as one of the primary design constraints and performance limiters. The gap between average and peak power actually widens with increased levels of core integration. As such, if per-core control of power levels (modes) is possible, a global power manager should be able to dynamically set the modes suitably. This would be done in tune with the workload characteristics, in order to always maintain a chip-level power that is below the specified budget. Furthermore, this should be possible without significant degradation of chip-level throughput performance. We analyze and validate this concept in detail in this paper. We assume a per-core DVFS (dynamic voltage and frequency scaling) knob to be available to such a conceptual global power manager. We evaluate several different policies for global multi-core power management. In this analysis, we consider various different objectives such as prioritization and optimized throughput. Overall, our results show that in the context of a workload comprised of SPEC benchmark threads, our best architected policies can come within 1% of the performance of an ideal oracle, while meeting a given chip-level power budget. Furthermore, we show that these global dynamic management policies perform significantly better than static management, even if static scheduling is given oracular knowledge.

667 citations
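One way such a global manager could pick per-core power modes is sketched below: a search over per-core DVFS mode combinations that maximizes estimated throughput while keeping chip power under the budget. This is only a hedged illustration of the idea; `core_profiles`, the (throughput, power) tuples, and the exhaustive search are assumptions for clarity, not the authors' implementation (the paper evaluates several policies, including scalable ones):

```python
from itertools import product

def best_modes(core_profiles, power_budget):
    """core_profiles[i] is a list of hypothetical (throughput, power)
    estimates for core i's DVFS modes. Returns (best_throughput,
    mode_indices) maximizing throughput under the chip power budget."""
    best = None
    for modes in product(*[range(len(p)) for p in core_profiles]):
        bips = sum(core_profiles[i][m][0] for i, m in enumerate(modes))
        watts = sum(core_profiles[i][m][1] for i, m in enumerate(modes))
        if watts <= power_budget and (best is None or bips > best[0]):
            best = (bips, modes)
    return best
```

The exhaustive search is exponential in the core count; a real per-interval manager would prune or predict, but the objective (throughput subject to a chip-level power cap) is the same.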


Journal ArticleDOI
TL;DR: It is found that background correction, one of the main steps in preprocessing, has the largest effect on performance and, in particular, background correction appears to improve accuracy but, in general, worsen precision.
Abstract: Motivation: In the Affymetrix GeneChip system, preprocessing occurs before one obtains expression level measurements. Because the number of competing preprocessing methods was large and growing, we developed a benchmark to help users identify the best method for their application. A webtool was made available for developers to benchmark their procedures. At the time of writing over 50 methods had been submitted. Results: We benchmarked 31 probe set algorithms using a U95A dataset of spike-in controls. Using this dataset, we found that background correction, one of the main steps in preprocessing, has the largest effect on performance. In particular, background correction appears to improve accuracy but, in general, worsen precision. The benchmark results put this balance in perspective. Furthermore, we have improved some of the original benchmark metrics to provide more detailed information regarding precision and accuracy. A handful of methods stand out as providing the best balance using spike-in data with the older U95A array, although different experiments on more current arrays may benchmark differently. Availability: The affycomp package, now version 1.5.2, continues to be available as part of the Bioconductor project (http://www.bioconductor.org). The webtool continues to be available at http://affycomp.biostat.jhsph.edu. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

349 citations


Proceedings ArticleDOI
14 Jun 2006
TL;DR: The key idea is simple: an adaptive aggregation step in a dynamic-programming (DP) stereo framework is introduced, which reduces the typical "streaking" artifacts without the penalty of blurry object boundaries.
Abstract: We present a stereo algorithm that achieves high quality results while maintaining real-time performance. The key idea is simple: we introduce an adaptive aggregation step in a dynamic-programming (DP) stereo framework. The per-pixel matching cost is aggregated in the vertical direction only. Compared to traditional DP, our approach reduces the typical "streaking" artifacts without the penalty of blurry object boundaries. Evaluation using the benchmark Middlebury stereo database shows that our approach is among the best (ranked first in the new evaluation system) for DP-based approaches. The performance gain mainly comes from a computationally expensive weighting scheme based on color and distance proximity. We utilize the vector processing capability and parallelism in commodity graphics hardware to speed up this process over two orders of magnitude. Over 50 million disparity evaluations per second (MDE/s) are achieved in our current implementation.

317 citations


Journal ArticleDOI
TL;DR: This paper introduces a modification to original DE that enhances the convergence rate without compromising solution quality; it maintains only one population set, as against two sets in original DE, at any given point in a generation.

286 citations


Journal ArticleDOI
TL;DR: A new engineering shape benchmark is developed and the effectiveness of different shape representations for classes of engineering parts is assessed; view-based representations yielded better retrieval results for a majority of shape classes.
Abstract: Three-dimensional shape retrieval is a problem of current interest in several different fields, especially in the mechanical engineering domain. There exists a large body of work in developing representations for 3D shapes. However, there has been limited work done in developing domain-dependent benchmark databases for 3D shape searching. We propose a benchmark database for evaluating shape-based search methods relevant to the mechanical engineering domain. Twelve different shape descriptors belonging to three categories, namely: (1) feature vector-based, (2) histogram-based, and (3) view-based, are compared using the benchmark database. The main contributions of this paper are the development of a new engineering shape benchmark and an understanding of the effectiveness of different shape representations for classes of engineering parts. Overall, it was found that view-based representations yielded better retrieval results for a majority of shape classes, while no single method performed best for all shape categories.

265 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present the benchmark problem definition for seismically excited base-isolated buildings and provide a well-defined base isolated building with a broad set of carefully chosen parameter sets, performance measures and guidelines to the participants, so that they can evaluate their control algorithms.
Abstract: This paper presents the benchmark problem definition for seismically excited base-isolated buildings. The objective of this benchmark study is to provide a well-defined base-isolated building with a broad set of carefully chosen parameter sets, performance measures and guidelines to the participants, so that they can evaluate their control algorithms. The control algorithms may be passive, active or semi-active. The benchmark structure considered is an eight-storey base-isolated building similar to existing buildings in Los Angeles, California. The base isolation system includes both linear and nonlinear bearings and control devices. The superstructure is considered to be a linear elastic system with lateral–torsional behavior. A new nonlinear dynamic analysis program has been developed and made available to facilitate direct comparison of results of different control algorithms. Copyright © 2005 John Wiley & Sons, Ltd.

247 citations


Proceedings ArticleDOI
01 Oct 2006
TL;DR: MineBench is presented, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories such as clustering, classification, and association rule mining that will be of use to those looking to characterize and accelerate data mining workloads.
Abstract: Data mining constitutes an important class of scientific and commercial applications. Recent advances in data extraction techniques have created vast data sets, which require increasingly complex data mining algorithms to sift through them to generate meaningful information. The disproportionately slower rate of growth of computer systems has led to a sizeable performance gap between data mining systems and algorithms. The first step in closing this gap is to analyze these algorithms and understand their bottlenecks. With this knowledge, current computer architectures can be optimized for data mining applications. In this paper, we present MineBench, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories such as clustering, classification, and association rule mining. We believe that MineBench will be of use to those looking to characterize and accelerate data mining workloads.

242 citations


Book ChapterDOI
Li Ma1, Yang Yang1, Zhaoming Qiu1, Guotong Xie1, Yue Pan1, Shengping Liu1 
11 Jun 2006
TL;DR: In this article, the authors extend the well-known Lehigh University Benchmark in terms of inference and scalability testing; the extended benchmark, named University Ontology Benchmark (UOBM), includes both OWL Lite and OWL DL ontologies.
Abstract: Aiming to build a complete benchmark for better evaluation of existing ontology systems, we extend the well-known Lehigh University Benchmark in terms of inference and scalability testing. The extended benchmark, named University Ontology Benchmark (UOBM), includes both OWL Lite and OWL DL ontologies covering a complete set of OWL Lite and DL constructs, respectively. We also add necessary properties to construct effective instance links and improve instance generation methods to make the scalability testing more convincing. Several well-known ontology systems are evaluated on the extended benchmark and detailed discussions on both existing ontology systems and future benchmark development are presented.

Proceedings ArticleDOI
11 Sep 2006
TL;DR: A new fitness value is defined in the normalized fitness-constraint violation space, and two penalty values are applied to infeasible individuals so that the algorithm can identify the best infeasible individuals in the current population.
Abstract: This paper proposes a self-adaptive penalty function for solving constrained optimization problems using genetic algorithms. In the proposed method, a new fitness value, called the distance value, is computed in the normalized fitness-constraint violation space, and two penalty values are applied to infeasible individuals so that the algorithm is able to identify the best infeasible individuals in the current population. The method aims to encourage infeasible individuals with low objective function value and low constraint violation. The number of feasible individuals in the population is used to guide the search process either toward finding more feasible solutions or toward finding the optimum solution. The proposed method is simple to implement and does not need parameter tuning. The performance of the algorithm is tested on 13 benchmark functions from the literature. The results show that the approach is able to find very good solutions comparable to other state-of-the-art designs. Furthermore, it is able to find feasible solutions in every run for all of the benchmark functions.
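The "distance value" idea can be sketched as follows, assuming fitness and constraint violation are already normalized to [0, 1]. The fallback to violation alone when the population holds no feasible individuals follows the description above; the exact formula here is an illustrative reading, not necessarily the authors' precise formulation:

```python
import math

def distance_value(f_norm, viol_norm, feasible_ratio):
    """Combined ranking value in the normalized fitness-violation
    space. With no feasible individuals in the population, rank
    purely by constraint violation; otherwise balance both."""
    if feasible_ratio == 0:
        return viol_norm
    return math.sqrt(f_norm ** 2 + viol_norm ** 2)
```

Under this reading, an infeasible individual with both low objective value and low violation gets a small distance value, which is exactly the kind of individual the method wants to keep in the population.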

Proceedings ArticleDOI
16 Oct 2006
TL;DR: In this article, the authors proposed a benchmark for integrating distributed generation in medium voltage distribution networks, which is representative of a real network while it is also designed for ease of use.
Abstract: The widespread use of distributed generation (DG) relies on methods and techniques aimed at facilitating the network integration of DG. In this context a methodology for the evaluation of the quality and relative merits of these methods and techniques is missing. CIGRE Task Force C6.04.02, which is affiliated with CIGRE Study Committee C6, has addressed this problem by proposing a set of resource and network benchmarks. In the present paper, the benchmark for integrating DG in medium voltage distribution networks is described. The proposed benchmark is representative of a real network while it is also designed for ease of use. The application of the benchmark is described through several case studies that show the impact of DG on power flow and voltage profiles at the medium voltage level.

Journal ArticleDOI
TL;DR: This paper proposes to use the gradient information derived from the constraint set to systematically repair infeasible solutions; the repair procedure is embedded into a simple GA as a special operator.

Journal ArticleDOI
TL;DR: In this paper, a new computer model called Genetic Algorithm Pipe Network Optimization Model (GENOME) has been developed with the aim of optimizing the design of new looped irrigation water distribution networks.
Abstract: A new computer model called Genetic Algorithm Pipe Network Optimization Model (GENOME) has been developed with the aim of optimizing the design of new looped irrigation water distribution networks. The model is based on a genetic algorithm method, although relevant modifications and improvements have been implemented to adapt the model to this specific problem. It makes use of the robust network solver EPANET. The model has been tested and validated by applying it to the least cost optimization of several benchmark networks reported in the literature. The results obtained with GENOME have been compared with those found in previous works, obtaining the same results as the best published in the literature to date. Once the model was validated, the optimization of a real complex irrigation network has been carried out to evaluate the potential of the genetic algorithm for the optimal design of large-scale networks. Although satisfactory results have been obtained, some adjustments would be desirable to improve the performance of genetic algorithms when the complexity of the network requires it.


Proceedings ArticleDOI
11 Nov 2006
TL;DR: An MPI runtime system that dynamically reduces CPU performance during communication phases in MPI programs and, without profiling or training, selects the CPU frequency in order to minimize energy-delay product is presented.
Abstract: Although users of high-performance computing are most interested in raw performance, both energy and power consumption have become critical concerns. Some microprocessors allow frequency and voltage scaling, which enables a system to reduce CPU performance and power when the CPU is not on the critical path. When properly directed, such dynamic frequency and voltage scaling can produce significant energy savings with little performance penalty. This paper presents an MPI runtime system that dynamically reduces CPU performance during communication phases in MPI programs. It dynamically identifies such phases and, without profiling or training, selects the CPU frequency in order to minimize energy-delay product. All analysis and subsequent frequency and voltage scaling is within MPI and so is entirely transparent to the application. This means that the large number of existing MPI programs, as well as new ones being developed, can use our system without modification. Results show that the average reduction in energy-delay product over the NAS benchmark suite is 10%: the average energy reduction is 12%, while the average execution time increase is only 2.1%.
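Once per-frequency estimates of a phase's duration and average power are available, minimizing energy-delay product reduces to an argmin over the frequency set. A hedged sketch (the estimate tables here are hypothetical inputs; the paper derives such information online, without profiling or training):

```python
def pick_frequency(freqs, est_time, est_power):
    """Choose the CPU frequency minimizing energy-delay product,
    EDP = energy * time = (power * time) * time, given hypothetical
    per-frequency estimates of phase duration and average power."""
    def edp(f):
        t, p = est_time[f], est_power[f]
        return p * t * t
    return min(freqs, key=edp)
```

The EDP objective is why scaling down during communication phases pays off: power drops sharply while the phase's duration barely grows, so the product shrinks.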

Journal ArticleDOI
TL;DR: Since benchmarks drive computer science research and industry product development, which ones the authors use and how they evaluate them are key questions for the community.
Abstract: Since benchmarks drive computer science research and industry product development, which ones we use and how we evaluate them are key questions for the community. Despite complex runtime tradeoffs ...

Proceedings ArticleDOI
27 Mar 2006
TL;DR: Experimental results indicate that FASER achieves good accuracy compared to the SPICE-based simulation method, and can be further improved by more accurate cell library characterization.
Abstract: This paper is concerned with statically analyzing the susceptibility of arbitrary combinational circuits to single event upsets that are becoming a significant concern for reliability of commercial electronics. For the first time, a fast and accurate methodology FASER based on static, vector-less analysis of error rates due to single event upsets in general combinational circuits is proposed. Accurate models are based on STA-like pre-characterization methods, and logical masking is computed via binary decision diagrams with circuit partitioning. Experimental results indicate that FASER achieves good accuracy compared to the SPICE-based simulation method. The average error across the benchmark circuits is 12% at over 90,000X speed-up. The accuracy can be further improved by more accurate cell library characterization. The run-time for ISCAS '85 benchmark circuits ranges from 10 to 120 minutes. The estimated bit error rate (BER) for the ISCAS'85 benchmark circuits implemented in the 100nm CMOS technology is about 10^-5 FIT.

Proceedings ArticleDOI
16 Sep 2006
TL;DR: This paper proposes and evaluates three approaches to transform the raw data set of microarchitecture-independent characteristics into a benchmark space in which the relative distance is a measure for the relative performance differences.
Abstract: A key challenge in benchmarking is to predict the performance of an application of interest on a number of platforms in order to determine which platform yields the best performance. This paper proposes an approach for doing this. We measure a number of microarchitecture-independent characteristics from the application of interest, and relate these characteristics to the characteristics of the programs from a previously profiled benchmark suite. Based on the similarity of the application of interest with programs in the benchmark suite, we make a performance prediction of the application of interest. We propose and evaluate three approaches (normalization, principal components analysis and genetic algorithm) to transform the raw data set of microarchitecture-independent characteristics into a benchmark space in which the relative distance is a measure for the relative performance differences. We evaluate our approach using all of the SPEC CPU2000 benchmarks and real hardware performance numbers from the SPEC website. Our framework estimates per-benchmark machine ranks with a 0.89 average and a 0.80 worst case rank correlation coefficient.
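The transformation of raw characteristics into a benchmark space can be illustrated with the simplest of the three approaches, normalization: z-score each microarchitecture-independent characteristic, then treat Euclidean distance in the normalized space as a proxy for performance similarity. The helper names below are invented for illustration:

```python
import math

def zscore(matrix):
    """Normalize each characteristic (column) of a benchmark-by-
    characteristic matrix to zero mean and unit variance."""
    cols = list(zip(*matrix))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in matrix]

def nearest_benchmark(app_vec, bench_vecs):
    """Index of the benchmark closest to the application of interest
    in the normalized space; its measured performance on each machine
    serves as the prediction for the application."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(bench_vecs)),
               key=lambda i: dist(app_vec, bench_vecs[i]))
```

The paper's other two transformations (principal components analysis and a genetic algorithm that weights characteristics) refine this same nearest-neighbor idea so that distance correlates better with performance differences.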

Book ChapterDOI
02 Nov 2006
TL;DR: An unbalanced tree search (UTS) benchmark is designed to evaluate the performance and ease of programming for parallel applications requiring dynamic load balancing; versions of UTS are created in two parallel languages, OpenMP and Unified Parallel C, using work stealing as the mechanism for reducing load imbalance.
Abstract: This paper presents an unbalanced tree search (UTS) benchmark designed to evaluate the performance and ease of programming for parallel applications requiring dynamic load balancing. We describe algorithms for building a variety of unbalanced search trees to simulate different forms of load imbalance. We created versions of UTS in two parallel languages, OpenMP and Unified Parallel C (UPC), using work stealing as the mechanism for reducing load imbalance. We benchmarked the performance of UTS on various parallel architectures, including shared-memory systems and PC clusters. We found it simple to implement UTS in both UPC and OpenMP, due to UPC's shared-memory abstractions. Results show that both UPC and OpenMP can support efficient dynamic load balancing on shared-memory architectures. However, UPC cannot alleviate the underlying communication costs of distributed-memory systems. Since dynamic load balancing requires intensive communication, performance portability remains difficult for applications such as UTS and performance degrades on PC clusters. By varying key work stealing parameters, we expose important tradeoffs between the granularity of load balance, the degree of parallelism, and communication costs.

Journal ArticleDOI
TL;DR: A new heuristic algorithm is presented for the two-dimensional irregular stock-cutting problem, which generates significantly better results than the previous state of the art on a wide range of established benchmark problems.
Abstract: This paper presents a new heuristic algorithm for the two-dimensional irregular stock-cutting problem, which generates significantly better results than the previous state of the art on a wide range of established benchmark problems. The developed algorithm is able to pack shapes with a traditional line representation, and it can also pack shapes that incorporate circular arcs and holes. This in itself represents a significant improvement upon the state of the art. By utilising hill climbing and tabu local search methods, the proposed technique produces 25 new best solutions for 26 previously reported benchmark problems drawn from over 20 years of cutting and packing research. These solutions are obtained using reasonable time frames, the majority of problems being solved within five minutes. In addition to this, we also present 10 new benchmark problems, which involve both circular arcs and holes. These are provided because of a shortage of realistic industrial style benchmark problems within the literature and to encourage further research and greater comparison between this and future methods.

Journal ArticleDOI
TL;DR: The first phase of the seismically excited base-isolated benchmark building was received well by the structural control community, culminating in the March 2006 journal special issue, which contained contributions from over a dozen participants world-wide.
Abstract: The first phase of the seismically excited base-isolated benchmark building was received well by the structural control community, culminating in the March 2006 journal special issue. The special issue contained contributions from over a dozen participants world-wide. While the focus of the Phase I effort was on linear isolation systems, Phase II attempts to galvanize research efforts on control of base-isolated buildings with nonlinear isolation systems. Primarily, friction and hysteretic lead–rubber-bearing (LRB) isolation systems are included in this effort. The superstructure and the control framework remain the same as the Phase I benchmark. The main difference will be in the nonlinear isolation systems used, and consequently the controllers necessary to control such systems. The primary objective of this paper is to present the Phase II benchmark problem definition along with a sample controller for the friction isolation system. A sample controller for the LRB isolation system was presented earlier as a part of the Phase I special issue. Included in this paper is a broad set of carefully chosen performance measures, which remain the same as Phase I, so that the participants may evaluate their respective control designs. The control algorithms may be passive, active or semiactive. The benchmark structure considered in the Phase II study is an eight-story base-isolated building that is identical to the one considered for the Phase I study. The base isolation system consists of a combination of linear, nonlinear bearings and control devices. The superstructure is considered to be a linear elastic system with lateral–torsional behavior. The nonlinearities due to the isolators and control devices are limited to the isolation level only. A nonlinear dynamic analysis program and sample controllers are made available to the participants to facilitate direct comparison of results of different control algorithms. Copyright © 2008 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The motivation for the extension is the increasing interest and need to operate and control wastewater treatment systems not only at an individual process level but also on a plant-wide basis; to facilitate the changes, the evaluation period has been extended to one year.

Proceedings ArticleDOI
09 Dec 2006
TL;DR: The use of empirical non-linear modeling techniques is proposed to assist processor architects in making design decisions and resolving complex trade-offs; the resulting models can potentially replace detailed simulation for common tasks such as the analysis of key microarchitectural trends or searches for optimal processor design points.
Abstract: Designing and optimizing high performance microprocessors is an increasingly difficult task due to the size and complexity of the processor design space, high cost of detailed simulation and several constraints that a processor design must satisfy. In this paper, we propose the use of empirical non-linear modeling techniques to assist processor architects in making design decisions and resolving complex trade-offs. We propose a procedure for building accurate non-linear models that consists of the following steps: (i) selection of a small set of representative design points spread across processor design space using latin hypercube sampling, (ii) obtaining performance measures at the selected design points using detailed simulation, (iii) building non-linear models for performance using the function approximation capabilities of radial basis function networks, and (iv) validating the models using an independently and randomly generated set of design points. We evaluate our model building procedure by constructing non-linear performance models for programs from the SPEC CPU2000 benchmark suite with a microarchitectural design space that consists of 9 key parameters. Our results show that the models, built using a relatively small number of simulations, achieve high prediction accuracy (only 2.8% error in CPI estimates on average) across a large processor design space. Our models can potentially replace detailed simulation for common tasks such as the analysis of key microarchitectural trends or searches for optimal processor design points.

Journal ArticleDOI
TL;DR: Computational results on various test problems generalized from a set of static benchmark problems in the literature show that the column-generation-based dynamic approach outperforms the insertion-based heuristic on most test problems.
Abstract: We consider a dynamic vehicle routing problem with hard time windows, in which a set of customer orders arrives randomly over time to be picked up within their time windows. The dispatcher does not have any deterministic or probabilistic information on the location and size of a customer order until it arrives. The objective is to minimize the sum of the total distance of the routes used to cover all the orders. We propose a column-generation-based dynamic approach for the problem. The approach generates single-vehicle trips (i.e., columns) over time in a real-time fashion by utilizing existing columns, and solves at each decision epoch a set-partitioning-type formulation of the static problem consisting of the columns generated up to this time point. We evaluate the performance of our approach by comparing it to an insertion-based heuristic and an approach similar to ours, but without computational time limit for handling the static problem at each decision epoch. Computational results on various test problems generalized from a set of static benchmark problems in the literature show that our approach outperforms the insertion-based heuristic on most test problems.

Journal ArticleDOI
TL;DR: This paper proposes a fully dynamic and online algorithm selection technique, with no separate training phase: all candidate algorithms are run in parallel, while a model incrementally learns their runtime distributions.
Abstract: Algorithm selection can be performed using a model of runtime distribution, learned during a preliminary training phase. There is a trade-off between the performance of model-based algorithm selection, and the cost of learning the model. In this paper, we treat this trade-off in the context of bandit problems. We propose a fully dynamic and online algorithm selection technique, with no separate training phase: all candidate algorithms are run in parallel, while a model incrementally learns their runtime distributions. A redundant set of time allocators uses the partially trained model to propose machine time shares for the algorithms. A bandit problem solver mixes the model-based shares with a uniform share, gradually increasing the impact of the best time allocators as the model improves. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark; and with a set of solvers for the Auction Winner Determination problem.
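A minimal sketch of the share-mixing idea: a model of per-algorithm runtimes is learned incrementally while a uniform share guarantees exploration, and the uniform component fades as evidence accumulates. The exponential runtime model, the inverse-mean time allocator, and the 1/sqrt(t) schedule are all illustrative assumptions, not the paper's exact mechanism:

```python
import numpy as np

rng = np.random.default_rng(1)

n_algos = 3
true_means = np.array([2.0, 1.0, 3.0])   # unknown mean runtimes (toy data)

runs = np.zeros(n_algos)     # incremental runtime model:
totals = np.zeros(n_algos)   # per-algorithm run counts and runtime sums

for t in range(1, 501):
    est = np.where(runs > 0, totals / np.maximum(runs, 1), 1.0)
    # Model-based time allocator: more machine time to faster algorithms.
    model_share = (1.0 / est) / (1.0 / est).sum()
    uniform = np.full(n_algos, 1.0 / n_algos)
    eps = 1.0 / np.sqrt(t)   # uniform share fades as the model improves
    share = eps * uniform + (1 - eps) * model_share
    a = rng.choice(n_algos, p=share)          # run one algorithm this slice
    totals[a] += rng.exponential(true_means[a])
    runs[a] += 1
```

After enough rounds, the fastest algorithm (index 1 here) dominates the allocation while the slower ones still receive occasional probes.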

Journal ArticleDOI
TL;DR: From the study of the similarity between the four generations of SPEC CPU benchmark suites, it is found that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained unchanged.
Abstract: This paper proposes a methodology for measuring the similarity between programs based on their inherent microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark suites and 2) studying the evolution of four generations of SPEC CPU benchmark suites. Using the proposed methodology, we find a representative subset of programs from three popular benchmark suites - SPEC CPU2000, MediaBench, and MiBench. We show that this subset of representative programs can be effectively used to estimate the average benchmark suite IPC, L1 data cache miss-rates, and speedup on 11 machines with different ISAs and microarchitectures - this enables one to save simulation time with little loss in accuracy. From our study of the similarity between the four generations of SPEC CPU benchmark suites, we find that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained unchanged.
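One way to realize "find a representative subset" from such characteristics is to normalize the feature vectors, cluster them, and keep the program nearest each cluster centre. The program names and the random 4-dimensional feature matrix below are placeholders for real measured characteristics (e.g. ILP, branch behaviour, locality scores):

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder microarchitecture-independent features per program.
programs = ["gcc", "mcf", "art", "equake", "bzip2", "vortex"]
features = rng.random((6, 4))

# Normalize each characteristic to zero mean, unit variance so that no
# single feature dominates the distance metric.
Z = (features - features.mean(0)) / features.std(0)

def kmeans(Z, k, iters=50, rng=rng):
    """Plain Lloyd's k-means on the normalized feature vectors."""
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Z[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

labels, centers = kmeans(Z, k=3)
# Representative subset: the program closest to each cluster centre.
reps = [programs[np.argmin(((Z - centers[j]) ** 2).sum(-1))] for j in range(3)]
```

Simulating only the representatives and weighting each by its cluster size is what yields the suite-level IPC and miss-rate estimates at a fraction of the simulation cost.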

Journal ArticleDOI
TL;DR: The experiences gained from a Matlab/Simulink implementation of ADM1 into the extended COST/IWA Benchmark Simulation Model (BSM2) are presented, and the main conclusion is that, if implemented properly, the ADM1 will also produce high-quality results in dynamic plant-wide simulations including noise, discrete sub-systems, etc. without imposing any major restrictions due to extensive computational efforts.

Journal Article
TL;DR: Parallel software for solving the quadratic program arising in training support vector machines for classification problems implements an iterative decomposition technique and exploits both the storage and the computing resources available on multiprocessor systems.
Abstract: Parallel software for solving the quadratic program arising in training support vector machines for classification problems is introduced. The software implements an iterative decomposition technique and exploits both the storage and the computing resources available on multiprocessor systems, by distributing the heaviest computational tasks of each decomposition iteration. Based on a wide range of recent theoretical advances, relevant decomposition issues, such as the quadratic subproblem solution, the gradient updating, the working set selection, are systematically described and their careful combination to get an effective parallel tool is discussed. A comparison with state-of-the-art packages on benchmark problems demonstrates the good accuracy and the remarkable time saving achieved by the proposed software. Furthermore, challenging experiments on real-world data sets with millions of training samples highlight how the software makes large scale standard nonlinear support vector machines effectively tractable on common multiprocessor systems. This feature is not shown by any of the available codes.
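Working-set selection is one of the decomposition issues the abstract names. A common rule in SMO-type decomposition solvers is the maximal-violating-pair rule, sketched generically below for the standard SVM dual (min ½αᵀQα − eᵀα s.t. yᵀα = 0, 0 ≤ α ≤ C); this is a textbook sketch, not necessarily this particular software's selection rule:

```python
import numpy as np

def most_violating_pair(grad, alpha, y, C, tol=1e-3):
    """Pick the index pair (i, j) that most violates the KKT optimality
    conditions of the SVM dual; return None when optimal within tol.
    grad is the dual gradient Q @ alpha - e at the current alpha."""
    # Index sets from the box constraints 0 <= alpha <= C:
    up   = ((alpha < C) & (y > 0)) | ((alpha > 0) & (y < 0))
    down = ((alpha < C) & (y < 0)) | ((alpha > 0) & (y > 0))
    F = -y * grad                       # -y_t * gradient component
    i = np.where(up,   F, -np.inf).argmax()
    j = np.where(down, F,  np.inf).argmin()
    violation = F[i] - F[j]
    return (i, j) if violation > tol else None

# Tiny usage example: at alpha = 0 the dual gradient is -e, so the first
# positive and first negative example form the most violating pair.
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha = np.zeros(4)
grad = -np.ones(4)
pair = most_violating_pair(grad, alpha, y, C=1.0)
```

In a parallel decomposition solver the expensive parts wrapped around this selection (the kernel rows for the gradient update, the subproblem solve) are what get distributed across processors.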