
Showing papers on "Benchmark (computing) published in 2011"


Journal ArticleDOI
TL;DR: This work introduces a novel architecture that reduces the usually required large number of elements to a single nonlinear node with delayed feedback and proves that delay-dynamical systems, even in their simplest manifestation, can perform efficient information processing.
Abstract: Novel methods for information processing are highly desired in our information-driven society. Inspired by the brain's ability to process information, the recently introduced paradigm known as 'reservoir computing' shows that complex networks can efficiently perform computation. Here we introduce a novel architecture that reduces the usually required large number of elements to a single nonlinear node with delayed feedback. Through an electronic implementation, we experimentally and numerically demonstrate excellent performance in a speech recognition benchmark. Complementary numerical studies also show excellent performance for a time series prediction benchmark. These results prove that delay-dynamical systems, even in their simplest manifestation, can perform efficient information processing. This finding paves the way to feasible and resource-efficient technological implementations of reservoir computing.
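
As a rough illustration of the delay-based reservoir idea, the sketch below time-multiplexes a single tanh node with delayed feedback into a set of "virtual" nodes and trains a linear ridge readout on a toy one-step-ahead prediction task. All parameter values, the nonlinearity, and the task are illustrative assumptions, not the paper's electronic implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_virtual = 50                                  # virtual nodes per delay loop
mask = rng.uniform(-1.0, 1.0, n_virtual)        # fixed random input mask

def reservoir_states(u, eta=0.5, gamma=0.05):
    """Drive a single tanh node with delayed feedback by the input series u."""
    states = np.zeros((len(u), n_virtual))
    prev = np.zeros(n_virtual)                  # node values one delay ago
    for t, u_t in enumerate(u):
        cur = np.empty(n_virtual)
        for k in range(n_virtual):
            # nonlinearity of (delayed feedback + masked, time-multiplexed input)
            cur[k] = np.tanh(eta * prev[k] + gamma * mask[k] * u_t)
        states[t] = cur
        prev = cur
    return states

# Toy task: one-step-ahead prediction of a noisy sine wave with a ridge readout.
u = np.sin(0.2 * np.arange(1000)) + 0.05 * rng.standard_normal(1000)
X, y = reservoir_states(u[:-1]), u[1:]
W = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_virtual), X.T @ y)
print("train MSE:", float(np.mean((X @ W - y) ** 2)))
```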

1,121 citations


01 Jan 2011
TL;DR: A methodology to design effective benchmark suites is developed and its effectiveness is demonstrated by developing and deploying a benchmark suite for evaluating multiprocessors called PARSEC, which has been adopted by many architecture groups in both research and industry.
Abstract: Benchmarking has become one of the most important methods for quantitative performance evaluation of processor and computer system designs. Benchmarking of modern multiprocessors such as chip multiprocessors is challenging because of their application domain, scalability and parallelism requirements. In my thesis, I have developed a methodology to design effective benchmark suites and demonstrated its effectiveness by developing and deploying a benchmark suite for evaluating multiprocessors. More specifically, this thesis includes several contributions. First, the thesis shows that a new benchmark suite for multiprocessors is needed because the behavior of modern parallel programs is significantly different from those represented by SPLASH-2, the most popular parallel benchmark suite developed over ten years ago. Second, the thesis quantitatively describes the requirements and characteristics of a set of multithreaded programs and their underlying technology trends. Third, the thesis presents a systematic approach to scale and select benchmark inputs with the goal of optimizing benchmarking accuracy subject to constrained execution or simulation time. Finally, the thesis describes a parallel benchmark suite called PARSEC for evaluating modern shared-memory multiprocessors. Since its initial release, PARSEC has been adopted by many architecture groups in both research and industry.

1,043 citations


Journal ArticleDOI
TL;DR: A benchmark for evaluating the performance of large-scale sketch-based image retrieval systems is introduced and new descriptors based on the bag-of-features approach are developed that significantly outperform other descriptors in the literature.
Abstract: We introduce a benchmark for evaluating the performance of large-scale sketch-based image retrieval systems. The necessary data are acquired in a controlled user study where subjects rate how well given sketch/image pairs match. We suggest how to use the data for evaluating the performance of sketch-based image retrieval systems. The benchmark data as well as the large image database are made publicly available for further studies of this type. Furthermore, we develop new descriptors based on the bag-of-features approach and use the benchmark to demonstrate that they significantly outperform other descriptors in the literature.
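
As a rough sketch of the bag-of-features pipeline mentioned above, the snippet below quantizes local descriptors against a k-means vocabulary and compares two images (or a sketch and an image) by histogram distance. The descriptor extraction step is abstracted away and the vocabulary size is an arbitrary assumption; this is not the paper's actual descriptor.

```python
import numpy as np

def build_vocabulary(descriptors, k=64, iters=20, seed=0):
    """Plain k-means over a pool of local descriptors -> visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((descriptors[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bof_histogram(descriptors, centers):
    """Quantize descriptors to their nearest visual word and normalize counts."""
    labels = np.argmin(((descriptors[:, None] - centers) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Toy usage with random vectors standing in for real local feature descriptors.
rng = np.random.default_rng(1)
pool = rng.standard_normal((2000, 32))
vocab = build_vocabulary(pool)
h1, h2 = bof_histogram(pool[:300], vocab), bof_histogram(pool[300:600], vocab)
print("L1 histogram distance:", float(np.abs(h1 - h2).sum()))
```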

419 citations


Journal ArticleDOI
TL;DR: In this paper, the main components, operation/protection modes, and control layers/schemes of medium and high-power PV systems are introduced to assist power engineers in developing circuit-based simulation models for impact assessment studies, analysis, and identification of potential issues with respect to the grid integration of PV systems.
Abstract: This paper presents modeling guidelines and a benchmark system for power system simulation studies of grid-connected, three-phase, single-stage Photovoltaic (PV) systems that employ a voltage-sourced converter (VSC) as the power processor. The objective of this work is to introduce the main components, operation/protection modes, and control layers/schemes of medium- and high-power PV systems, to assist power engineers in developing circuit-based simulation models for impact assessment studies, analysis, and identification of potential issues with respect to the grid integration of PV systems. Parameter selection, control tuning, and design guidelines are also briefly discussed. The usefulness of the benchmark system is demonstrated through a fairly comprehensive set of test cases, conducted in the PSCAD/EMTDC software environment. However, the models and techniques presented in this paper are independent of any specific circuit simulation software package. Also, they may not fully conform to the methods exercised by all manufacturers, due to the proprietary nature of the industry.

348 citations


01 Jan 2011
TL;DR: LegUp as discussed by the authors is a high-level synthesis tool that allows software techniques to be used for hardware design, which can synthesize most of the C language to hardware, including fixed-sized multi-dimensional arrays, structs, global variables and pointer arithmetic.
Abstract: It is generally accepted that a custom hardware implementation of a set of computations will provide superior speed and energy-efficiency relative to a software implementation. However, the cost and difficulty of hardware design is often prohibitive, and consequently, a software approach is used for most applications. In this paper, we introduce a new high-level synthesis tool called LegUp that allows software techniques to be used for hardware design. LegUp accepts a standard C program as input and automatically compiles the program to a hybrid architecture containing an FPGA-based MIPS soft processor and custom hardware accelerators that communicate through a standard bus interface. In the hybrid processor/accelerator architecture, program segments that are unsuitable for hardware implementation can execute in software on the processor. LegUp can synthesize most of the C language to hardware, including fixed-sized multi-dimensional arrays, structs, global variables and pointer arithmetic. Results show that the tool produces hardware solutions of comparable quality to a commercial high-level synthesis tool. We also give results demonstrating the ability of the tool to explore the hardware/software co-design space by varying the amount of a program that runs in software vs. hardware. LegUp, along with a set of benchmark C programs, is open source and freely downloadable, providing a powerful platform that can be leveraged for new research on a wide range of high-level synthesis topics.

250 citations


Journal ArticleDOI
TL;DR: Standard methods for the empirical evaluation of multiobjective reinforcement learning algorithms are proposed; two classes of algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each.
Abstract: While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms.
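
One widely used metric for comparing an algorithm's solutions against a known Pareto front is the hypervolume; the minimal two-objective (maximization) sketch below illustrates that kind of measure. It is an illustrative choice of metric, not necessarily the one proposed in the paper.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` relative to `ref` when both objectives are maximized."""
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: p[0])
    hv, prev_y = 0.0, ref[1]
    for x, y in reversed(pts):        # sweep from largest to smallest first objective
        if y > prev_y:
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

# Toy front approximation found by a learning algorithm, plus a reference point.
front = [(1.0, 5.0), (2.0, 4.0), (3.0, 2.5), (4.0, 1.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # 12.5
```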

248 citations


Journal ArticleDOI
TL;DR: A discrete artificial bee colony algorithm hybridized with a variant of iterated greedy algorithms to find the permutation that gives the smallest total flowtime is presented.
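
For context, the sketch below computes the total flowtime of a permutation flow shop schedule and applies a simple greedy insertion heuristic of the kind iterated-greedy and bee-colony methods build on. The instance data and the heuristic details are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def total_flowtime(perm, p):
    """p[j, m] = processing time of job j on machine m; jobs run in order `perm`."""
    completion = np.zeros(p.shape[1])
    flowtime = 0.0
    for j in perm:
        for m in range(p.shape[1]):
            prev_machine = completion[m - 1] if m else 0.0
            completion[m] = max(completion[m], prev_machine) + p[j, m]
        flowtime += completion[-1]
    return flowtime

def greedy_insertion(p):
    """Insert jobs one at a time at the position that minimizes total flowtime."""
    order = [int(j) for j in np.argsort(p.sum(axis=1))]   # shortest total work first
    perm = [order[0]]
    for j in order[1:]:
        perm = min((perm[:i] + [j] + perm[i:] for i in range(len(perm) + 1)),
                   key=lambda q: total_flowtime(q, p))
    return perm

rng = np.random.default_rng(0)
p = rng.integers(1, 20, size=(6, 3)).astype(float)        # 6 jobs, 3 machines (made up)
perm = greedy_insertion(p)
print(perm, total_flowtime(perm, p))
```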

241 citations


Journal ArticleDOI
TL;DR: Milepost GCC is described, the first publicly-available open-source machine learning-based compiler that automatically adapts the internal optimization heuristic at function-level granularity to improve execution time, code size and compilation time of a new program on a given architecture.
Abstract: Tuning compiler optimizations for rapidly evolving hardware makes porting and extending an optimizing compiler for each new platform extremely challenging. Iterative optimization is a popular approach to adapting programs to a new architecture automatically using feedback-directed compilation. However, the large number of evaluations required for each program has prevented iterative compilation from widespread take-up in production compilers. Machine learning has been proposed to tune optimizations across programs systematically but is currently limited to a few transformations, long training phases and critically lacks publicly released, stable tools. Our approach is to develop a modular, extensible, self-tuning optimization infrastructure to automatically learn the best optimizations across multiple programs and architectures based on the correlation between program features, run-time behavior and optimizations. In this paper we describe Milepost GCC, the first publicly-available open-source machine learning-based compiler. It consists of an Interactive Compilation Interface (ICI) and plugins to extract program features and exchange optimization data with the cTuning.org open public repository. It automatically adapts the internal optimization heuristic at function-level granularity to improve execution time, code size and compilation time of a new program on a given architecture. Part of the MILEPOST technology together with low-level ICI-inspired plugin framework is now included in the mainline GCC. We developed machine learning plugins based on probabilistic and transductive approaches to predict good combinations of optimizations. Our preliminary experimental results show that it is possible to automatically reduce the execution time of individual MiBench programs, some by more than a factor of 2, while also improving compilation time and code size. On average we are able to reduce the execution time of the MiBench benchmark suite by 11% for the ARC reconfigurable processor. We also present a realistic multi-objective optimization scenario for Berkeley DB library using Milepost GCC and improve execution time by approximately 17%, while reducing compilation time and code size by 12% and 7% respectively on Intel Xeon processor.
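
The core idea of predicting good optimizations from program features can be sketched with a simple nearest-neighbour model: given static features of a new program, reuse the flag combination that worked best for the most similar previously tuned program. The features, flags, and training data below are made up, and this is not Milepost's actual probabilistic/transductive model or its ICI plugin interface.

```python
import numpy as np

# (program features, best-found flag combination) from prior iterative tuning
training = [
    (np.array([120.0, 14.0, 3.0, 0.62]), "-O3 -funroll-loops"),
    (np.array([ 40.0,  2.0, 1.0, 0.10]), "-Os"),
    (np.array([300.0, 55.0, 9.0, 0.81]), "-O3 -ftree-vectorize -funroll-loops"),
]

def predict_flags(features):
    """1-nearest-neighbour over normalized feature vectors."""
    feats = np.array([f for f, _ in training])
    scale = feats.max(axis=0) + 1e-12
    dists = np.linalg.norm(feats / scale - features / scale, axis=1)
    return training[int(np.argmin(dists))][1]

# e.g. features: #instructions, #loops, max loop depth, fraction of memory ops
print(predict_flags(np.array([260.0, 40.0, 7.0, 0.75])))
```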

230 citations


Journal ArticleDOI
TL;DR: This study provides the first cardiac tissue electrophysiology simulation benchmark to allow these codes to be verified and was successfully evaluated on 11 simulation platforms to generate a consensus gold-standard converged solution.
Abstract: Ongoing developments in cardiac modelling have resulted, in particular, in the development of advanced and increasingly complex computational frameworks for simulating cardiac tissue electrophysiology. The goal of these simulations is often to represent the detailed physiology and pathologies of the heart using codes that exploit the computational potential of high-performance computing architectures. These developments have rapidly progressed the simulation capacity of cardiac virtual physiological human style models; however, they have also made it increasingly challenging to verify that a given code provides a faithful representation of the purported governing equations and corresponding solution techniques. This study provides the first cardiac tissue electrophysiology simulation benchmark to allow these codes to be verified. The benchmark was successfully evaluated on 11 simulation platforms to generate a consensus gold-standard converged solution. The benchmark definition in combination with the gold-standard solution can now be used to verify new simulation codes and numerical methods in the future.
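
The benchmark itself prescribes a monodomain problem with a detailed ionic model on a tissue slab; as a much-reduced illustration of the numerics involved, the sketch below integrates a 1D monodomain-style cable equation with a simple cubic ionic current by explicit finite differences. Parameters and the ionic model are illustrative, not the benchmark specification.

```python
import numpy as np

nx, dx, dt, steps = 200, 0.05, 0.005, 4000
D = 0.1                                   # "diffusion" (conductivity / Cm) coefficient
V = np.zeros(nx)
V[:10] = 1.0                              # stimulate one end of the cable

def i_ion(v, a=0.1):
    return v * (v - a) * (v - 1.0)        # cubic excitable nonlinearity (toy ionic model)

for _ in range(steps):
    lap = np.zeros_like(V)
    lap[1:-1] = (V[2:] - 2 * V[1:-1] + V[:-2]) / dx**2
    lap[0], lap[-1] = lap[1], lap[-2]     # crude no-flux boundaries
    V = V + dt * (D * lap - i_ion(V))     # dV/dt = D * d2V/dx2 - I_ion(V)

print("activated fraction:", float((V > 0.5).mean()))
```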

229 citations


Proceedings ArticleDOI
10 Oct 2011
TL;DR: Evaluates how well compilers vectorize a synthetic benchmark consisting of 151 loops, two applications from Petascale Application Collaboration Teams (PACT), and eight applications from Media Bench II; the results show that despite all the work done on vectorization in the last 40 years, 45-71% of the loops in the synthetic benchmark and only a few loops from the real applications are vectorized by the compilers.
Abstract: Most of today's processors include vector units that have been designed to speed up single-threaded programs. Although vector instructions can deliver high performance, writing vector code in assembly language or using intrinsics in high-level languages is a time-consuming and error-prone task. The alternative is to automate the process of vectorization by using vectorizing compilers. This paper evaluates how well compilers vectorize a synthetic benchmark consisting of 151 loops, two applications from Petascale Application Collaboration Teams (PACT), and eight applications from Media Bench II. We evaluated three compilers: GCC (version 4.7.0), ICC (version 12.0) and XLC (version 11.01). Our results show that despite all the work done on vectorization in the last 40 years, 45-71% of the loops in the synthetic benchmark and only a few loops from the real applications are vectorized by the compilers we evaluated.

209 citations


Journal ArticleDOI
TL;DR: This article focuses on the presentation of four typical benchmark problems whilst highlighting important and challenging aspects of technical process control: nonlinear dynamics; varying set-points; long-term dynamic effects; influence of external variables; and the primacy of precision.
Abstract: Technical process control is a highly interesting application area with high practical impact. Since classical controller design is, in general, a demanding job, this area constitutes a highly attractive domain for the application of learning approaches--in particular, reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by cleverly exploiting information from interactions with the process, can acquire high-quality control behaviour from scratch. This article focuses on the presentation of four typical benchmark problems whilst highlighting important and challenging aspects of technical process control: nonlinear dynamics; varying set-points; long-term dynamic effects; influence of external variables; and the primacy of precision. We propose performance measures for controller quality that apply both to classical control design and learning controllers, measuring precision, speed, and stability of the controller. A second set of key figures describes the performance from the perspective of a learning approach while providing information about the efficiency of the method with respect to the learning effort needed. For all four benchmark problems, extensive and detailed information is provided with which to carry out the evaluations outlined in this article. A close evaluation of our own RL learning scheme, NFQCA (Neural Fitted Q Iteration with Continuous Actions), in accordance with the proposed scheme on all four benchmarks thereby provides performance figures on both control quality and learning behaviour.
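
As an illustration of trajectory-based controller quality measures of the kind discussed (precision, speed, stability), the sketch below computes a steady-state error, a settling time, and an overshoot from a closed-loop step response. The exact formulas are illustrative assumptions, not the paper's key figures.

```python
import numpy as np

def controller_metrics(y, setpoint, tol=0.02, dt=0.01):
    """y: sampled closed-loop output; returns (steady-state error, settling time, overshoot)."""
    err = np.abs(y - setpoint)
    precision = float(err[-len(y) // 10:].mean())        # mean error over the last 10%
    band = tol * abs(setpoint)
    outside = np.nonzero(err > band)[0]
    settling = float((outside[-1] + 1) * dt) if len(outside) else 0.0
    overshoot = max(0.0, float((y.max() - setpoint) / abs(setpoint)))
    return precision, settling, overshoot

t = np.arange(0, 5, 0.01)
y = 1.0 - np.exp(-2 * t) * np.cos(6 * t)                  # toy step response
print(controller_metrics(y, setpoint=1.0))
```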

Journal ArticleDOI
TL;DR: An effective memetic differential evolution (DE) algorithm is presented that utilizes a chaotic local search (CLS) with a 'shrinking' strategy; it is significantly better than, or at least comparable to, the other optimizers in terms of convergence performance and solution accuracy.
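
A hedged sketch of the general scheme named in the summary: classic DE/rand/1/bin combined with a chaotic local search around the current best solution, driven by a logistic map within a radius that shrinks over generations. The parameters, shrinking rule, and test function are illustrative, not the paper's algorithm.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def memetic_de(f, dim=10, pop=30, gens=200, F=0.5, CR=0.9, lo=-5.0, hi=5.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, (pop, dim))
    fit = np.array([f(x) for x in X])
    z = 0.7                                        # logistic-map state for the CLS
    for g in range(gens):
        for i in range(pop):                       # classic DE/rand/1/bin step
            a, b, c = rng.choice([j for j in range(pop) if j != i], 3, replace=False)
            trial = np.where(rng.random(dim) < CR, X[a] + F * (X[b] - X[c]), X[i])
            ft = f(trial)
            if ft < fit[i]:
                X[i], fit[i] = trial, ft
        # chaotic local search around the best individual, within a shrinking radius
        best = int(np.argmin(fit))
        radius = 0.1 * (hi - lo) * (1.0 - g / gens)
        for _ in range(5):
            zs = np.empty(dim)
            for d in range(dim):
                z = 4.0 * z * (1.0 - z)            # logistic map in (0, 1)
                zs[d] = z
            cand = np.clip(X[best] + radius * (2.0 * zs - 1.0), lo, hi)
            fc = f(cand)
            if fc < fit[best]:
                X[best], fit[best] = cand, fc
    best = int(np.argmin(fit))
    return X[best], float(fit[best])

x_best, f_best = memetic_de(sphere)
print("best objective:", f_best)
```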

Book ChapterDOI
01 Jan 2011
TL;DR: This chapter presents the state of the art probabilistic models used in activity recognition and shows their performance on several real world datasets so that they can be used as a baseline for comparing the performance of other pattern recognition methods.
Abstract: Although activity recognition is an active area of research, no common benchmark for evaluating the performance of activity recognition methods exists. In this chapter we present the state-of-the-art probabilistic models used in activity recognition and show their performance on several real-world datasets. Our results can be used as a baseline for comparing the performance of other pattern recognition methods (both probabilistic and non-probabilistic). The datasets used in this chapter are made public, together with the source code of the probabilistic models used.
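
As a flavour of the probabilistic baselines such a chapter covers, the sketch below runs Viterbi decoding in a tiny discrete hidden Markov model to infer an activity sequence from already-discretized sensor observations. The transition and emission numbers are toy values, not parameters learned from the chapter's datasets.

```python
import numpy as np

A = np.array([[0.9, 0.1],        # activity transition probabilities
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],        # P(observation | activity)
              [0.1, 0.9]])
pi = np.array([0.6, 0.4])        # initial activity distribution

def viterbi(obs):
    """Most likely activity sequence for a list of discrete observation codes."""
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = logd[:, None] + np.log(A) + np.log(B[:, o])[None, :]
        back.append(scores.argmax(axis=0))     # best previous state per current state
        logd = scores.max(axis=0)
    path = [int(logd.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1, 1, 0]))
```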

Proceedings ArticleDOI
12 Jun 2011
TL;DR: This paper compares data generated with existing RDF benchmarks and data found in widely used real RDF datasets and shows that simple primitive data metrics are inadequate to flesh out the fundamental differences between real and benchmark data.
Abstract: The widespread adoption of the Resource Description Framework (RDF) for the representation of both open web and enterprise data is the driving force behind the increasing research interest in RDF data management. As RDF data management systems proliferate, so do benchmarks to test the scalability and performance of these systems under data and workloads with various characteristics. In this paper, we compare data generated with existing RDF benchmarks and data found in widely used real RDF datasets. The results of our comparison illustrate that existing benchmark data have little in common with real data. Therefore any conclusions drawn from existing benchmark tests might not actually translate to expected behaviours in real settings. In terms of the comparison itself, we show that simple primitive data metrics are inadequate to flesh out the fundamental differences between real and benchmark data. We make two contributions in this paper: (1) To address the limitations of the primitive metrics, we introduce intuitive and novel metrics that can indeed highlight the key differences between distinct datasets; (2) To address the limitations of existing benchmarks, we introduce a new benchmark generator with the following novel characteristics: (a) the generator can use any (real or synthetic) dataset and convert it into a benchmark dataset; (b) the generator can generate data that mimic the characteristics of real datasets with user-specified data properties. On the technical side, we formulate the benchmark generation problem as an integer programming problem whose solution provides us with the desired benchmark datasets. To our knowledge, this is the first methodological study of RDF benchmarks, as well as the first attempt at generating RDF benchmarks in a principled way.

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper characterize the microarchitectural behavior of representative smartphone applications on a current-generation mobile platform to identify trends that might impact future designs, and measures a suite of widely available mobile applications for audio, video, and interactive gaming.
Abstract: Smartphones have recently overtaken PCs as the primary consumer computing device in terms of annual unit shipments. Given this rapid market growth, it is important that mobile system designers and computer architects analyze the characteristics of the interactive applications users have come to expect on these platforms. With the introduction of high-performance, low-power, general purpose CPUs in the latest smartphone models, users now expect PC-like performance and a rich user experience, including high-definition audio and video, high-quality multimedia, dynamic web content, responsive user interfaces, and 3D graphics. In this paper, we characterize the microarchitectural behavior of representative smartphone applications on a current-generation mobile platform to identify trends that might impact future designs. To this end, we measure a suite of widely available mobile applications for audio, video, and interactive gaming. To complete this suite we developed BBench, a new fully-automated benchmark to assess a web-browser's performance when rendering some of the most popular and complex sites on the web. We contrast these applications' characteristics with those of the SPEC CPU2006 benchmark suite. We demonstrate that real-world interactive smartphone applications differ markedly from the SPEC suite. Specifically the instruction cache, instruction TLB, and branch predictor suffer from poor performance. We conjecture that this is due to the applications' reliance on numerous high level software abstractions (shared libraries and OS services). Similar trends have been observed for UI-intensive interactive applications on the desktop.

Journal ArticleDOI
TL;DR: The objective of this paper is to design and implement, in a four-tank process, several distributed control algorithms that are under investigation in the research groups of the authors within the European project HD-MPC.
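
For reference, the quadruple-tank process typically used in such studies can be simulated in a few lines; the sketch below integrates the standard nonlinear four-tank dynamics under constant pump inputs. Parameter values follow the usual textbook form of the model but are illustrative, not necessarily those of this benchmark.

```python
import numpy as np

A_tank = np.array([28.0, 32.0, 28.0, 32.0])      # tank cross-sections (cm^2)
a_hole = np.array([0.071, 0.057, 0.071, 0.057])  # outlet areas (cm^2)
g, k1, k2 = 981.0, 3.33, 3.35                    # gravity (cm/s^2), pump gains
gamma1, gamma2 = 0.7, 0.6                        # valve split ratios

def four_tank_rhs(h, v1, v2):
    """Right-hand side of the standard quadruple-tank model (heights in cm)."""
    h = np.maximum(h, 0.0)
    out = a_hole * np.sqrt(2 * g * h)
    dh = np.empty(4)
    dh[0] = (-out[0] + out[2] + gamma1 * k1 * v1) / A_tank[0]
    dh[1] = (-out[1] + out[3] + gamma2 * k2 * v2) / A_tank[1]
    dh[2] = (-out[2] + (1 - gamma2) * k2 * v2) / A_tank[2]
    dh[3] = (-out[3] + (1 - gamma1) * k1 * v1) / A_tank[3]
    return dh

h, dt = np.array([12.0, 13.0, 5.0, 5.0]), 0.1
for _ in range(6000):                            # 6000 steps x 0.1 s = 10 minutes
    h = h + dt * four_tank_rhs(h, v1=3.0, v2=3.0)  # constant pump voltages
print(np.round(h, 2))
```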

Book ChapterDOI
23 Oct 2011
TL;DR: This paper presents FedBench, a comprehensive benchmark suite for testing and analyzing the performance of federated query processing strategies on semantic data, which can be customized to accommodate a variety of use cases and compare competing approaches.
Abstract: In this paper we present FedBench, a comprehensive benchmark suite for testing and analyzing the performance of federated query processing strategies on semantic data. The major challenge lies in the heterogeneity of semantic data use cases, where applications may face different settings at both the data and query level, such as varying data access interfaces, incomplete knowledge about data sources, availability of different statistics, and varying degrees of query expressiveness. Accounting for this heterogeneity, we present a highly flexible benchmark suite, which can be customized to accommodate a variety of use cases and compare competing approaches. We discuss design decisions, highlight the flexibility in customization, and elaborate on the choice of data and query sets. The practicability of our benchmark is demonstrated by a rigorous evaluation of various application scenarios, where we indicate both the benefits as well as limitations of the state-of-the-art federated query processing strategies for semantic data.

Journal ArticleDOI
TL;DR: An algorithm framework that uses multiple search operators in each generation is proposed; experiments demonstrated that both the GA- and DE-based algorithms show competitive, if not better, performance compared to the state-of-the-art algorithms.

Book ChapterDOI
05 Sep 2011
TL;DR: A multi-level graph partitioning algorithm using novel local improvement algorithms and global search strategies transferred from multigrid linear solvers, which is both fast and able to improve the best known partitioning results for many inputs.
Abstract: We present a multi-level graph partitioning algorithm using novel local improvement algorithms and global search strategies transferred from multigrid linear solvers. Local improvement algorithms are based on max-flow min-cut computations and more localized FM searches. By combining these techniques, we obtain an algorithm that is fast and, at the same time, able to improve the best known partitioning results for many inputs. For example, in Walshaw's well-known benchmark tables we achieve 317 improvements for the tables at 1%, 3% and 5% imbalance. Moreover, in 118 out of the 295 remaining cases we have been able to reproduce the best cut in this benchmark.

Proceedings ArticleDOI
10 Apr 2011
TL;DR: A benchmark that simulates the feature detection and description stages of feature-based shape retrieval algorithms under a wide variety of transformations is presented.
Abstract: Feature-based approaches have recently become very popular in computer vision and image analysis applications, and are becoming a promising direction in shape retrieval. The SHREC'11 robust feature detection and description benchmark simulates the feature detection and description stages of feature-based shape retrieval algorithms. The benchmark tests the performance of shape feature detectors and descriptors under a wide variety of transformations. The benchmark allows evaluating how well algorithms cope with certain classes of transformations and what strength of transformation they can deal with. The present paper is a report of the SHREC'11 robust feature detection and description benchmark results.

Journal ArticleDOI
TL;DR: This paper introduces Map-Join-Reduce, a system that extends and improves the MapReduce runtime framework to efficiently process complex data analysis tasks on large clusters, and presents a new data processing strategy which performs filtering-join-aggregation tasks in two successive MapReduce jobs.
Abstract: Data analysis is an important functionality in cloud computing which allows a huge amount of data to be processed over very large clusters. MapReduce is recognized as a popular way to handle data in the cloud environment due to its excellent scalability and good fault tolerance. However, compared to parallel databases, the performance of MapReduce is slower when it is adopted to perform complex data analysis tasks that require the joining of multiple data sets in order to compute certain aggregates. A common concern is whether MapReduce can be improved to produce a system with both scalability and efficiency. In this paper, we introduce Map-Join-Reduce, a system that extends and improves MapReduce runtime framework to efficiently process complex data analysis tasks on large clusters. We first propose a filtering-join-aggregation programming model, a natural extension of MapReduce's filtering-aggregation programming model. Then, we present a new data processing strategy which performs filtering-join-aggregation tasks in two successive MapReduce jobs. The first job applies filtering logic to all the data sets in parallel, joins the qualified tuples, and pushes the join results to the reducers for partial aggregation. The second job combines all partial aggregation results and produces the final answer. The advantage of our approach is that we join multiple data sets in one go and thus avoid frequent checkpointing and shuffling of intermediate results, a major performance bottleneck in most of the current MapReduce-based systems. We benchmark our system against Hive, a state-of-the-art MapReduce-based data warehouse on a 100-node cluster on Amazon EC2 using TPC-H benchmark. The results show that our approach significantly boosts the performance of complex analysis queries.
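
The filtering-join-aggregation idea can be illustrated with a toy, in-memory analogue of the two-job strategy: the first pass filters both datasets, joins qualifying tuples on the key, and emits partial aggregates; the second pass combines the partial aggregates. The data and filter predicates below are made up, and this is a sketch of the programming model only, not the distributed runtime.

```python
from collections import defaultdict

orders = [  # (order_id, customer_id, price)
    (1, "c1", 10.0), (2, "c2", 99.0), (3, "c1", 25.0), (4, "c3", 7.0)]
customers = [  # (customer_id, country)
    ("c1", "DE"), ("c2", "DE"), ("c3", "US")]

# "Job 1": filter both datasets, join on customer_id, partially aggregate price per country.
by_customer = defaultdict(lambda: {"orders": [], "country": None})
for oid, cid, price in orders:
    if price >= 10.0:                       # filtering logic on one dataset
        by_customer[cid]["orders"].append(price)
for cid, country in customers:
    if country != "US":                     # filtering logic on the other dataset
        by_customer[cid]["country"] = country

partial = defaultdict(float)
for cid, rec in by_customer.items():        # the "join" plus partial aggregation
    if rec["country"] is not None:
        partial[rec["country"]] += sum(rec["orders"])

# "Job 2": combine partial aggregates into the final answer (trivial here: one partition).
final = dict(partial)
print(final)                                # {'DE': 134.0}
```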

Journal ArticleDOI
TL;DR: A novel multi-objective evolutionary algorithm is proposed, which incorporates methods for measuring the similarity of solutions, to solve the multi-objective problem and achieves highly competitive results compared with previously published studies and those from a popular evolutionary multi-objective optimizer.

Posted Content
TL;DR: HyFlex is presented, a software framework for the development of cross-domain search methodologies that features a common software interface for dealing with different combinatorial optimisation problems and provides the algorithm components that are problem specific.
Abstract: Automating the design of heuristic search methods is an active research field within computer science, artificial intelligence and operational research. In order to make these methods more generally applicable, it is important to eliminate or reduce the role of the human expert in the process of designing an effective methodology to solve a given computational search problem. Researchers developing such methodologies are often constrained by the number of problem domains on which to test their adaptive, self-configuring algorithms, which can be explained by the inherent difficulty of implementing the corresponding domain-specific software components. This paper presents HyFlex, a software framework for the development of cross-domain search methodologies. The framework features a common software interface for dealing with different combinatorial optimisation problems, and provides the algorithm components that are problem specific. In this way, the algorithm designer does not require detailed knowledge of the problem domains, and can thus concentrate his/her efforts on designing adaptive general-purpose heuristic search algorithms. Four hard combinatorial problems are fully implemented (maximum satisfiability, one dimensional bin packing, permutation flow shop and personnel scheduling), each containing a varied set of instance data (including real-world industrial applications) and an extensive set of problem-specific heuristics and search operators. The framework forms the basis for the first International Cross-domain Heuristic Search Challenge (CHeSC), and it is currently in use by the international research community. In summary, HyFlex represents a valuable new benchmark of heuristic search generality, with which adaptive cross-domain algorithms are being easily developed, and reliably compared.
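
The kind of common interface such a framework exposes can be sketched as an abstract problem-domain class: the hyper-heuristic sees only opaque heuristic indices and objective values, never the domain internals. Class and method names below are illustrative (the real HyFlex API is in Java), and the toy domain and acceptance rule are assumptions for the sake of a runnable example.

```python
import random
from abc import ABC, abstractmethod

class ProblemDomain(ABC):
    @abstractmethod
    def initial_solution(self): ...
    @abstractmethod
    def num_heuristics(self): ...
    @abstractmethod
    def apply_heuristic(self, h, solution): ...
    @abstractmethod
    def objective(self, solution): ...

class OneMax(ProblemDomain):
    """Toy domain: maximize the number of ones (cost = number of zeros)."""
    def __init__(self, n=50):
        self.n = n
    def initial_solution(self):
        return [random.randint(0, 1) for _ in range(self.n)]
    def num_heuristics(self):
        return 2
    def apply_heuristic(self, h, s):
        s = s[:]
        i = random.randrange(self.n)
        s[i] = 1 - s[i] if h == 0 else 1    # h=0: flip a bit, h=1: set a bit to one
        return s
    def objective(self, s):
        return s.count(0)

def simple_hyper_heuristic(domain, iters=2000):
    """Domain-independent loop: pick a heuristic at random, keep non-worsening moves."""
    sol = domain.initial_solution()
    cost = domain.objective(sol)
    for _ in range(iters):
        h = random.randrange(domain.num_heuristics())
        cand = domain.apply_heuristic(h, sol)
        cand_cost = domain.objective(cand)
        if cand_cost <= cost:
            sol, cost = cand, cand_cost
    return cost

random.seed(0)
print("final cost (#zeros left):", simple_hyper_heuristic(OneMax()))
```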

Journal ArticleDOI
TL;DR: A tool of this kind based on graph theory is developed and demonstrated that divides the system into clusters according to the flow directions in pipes and can be utilized for different purposes such as water security enhancements by sensor placements at clusters, or efficient isolation of a contaminant intrusion.
Abstract: Municipal water distribution systems may consist of thousands to tens of thousands of hydraulic components such as pipelines, valves, tanks, hydrants, and pumping units. With the capabilities of today's computers and database management software, "all pipe" hydraulic simulation models can be easily constructed. However, the uncertainty and complexity of water distribution system interrelationships make it difficult to predict performance under various conditions such as failure scenarios, detection of sources of contamination intrusions, sensor placement locations, etc. A possible way to cope with these difficulties is to gain insight into the system behavior by simplifying its operation through topological/connectivity analysis. In this study a tool of this kind, based on graph theory, is developed and demonstrated. The algorithm divides the system into clusters according to the flow directions in pipes. The resulting clustering is generic and can be utilized for different purposes such as water security enhancements by sensor placements at clusters, or efficient isolation of a contaminant intrusion. The methodology is demonstrated on a benchmark water distribution system from the research literature.
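
One plausible way to cluster a network by flow directions, sketched below, is to treat each pipe with a known direction as a directed edge and group nodes into strongly connected components. The toy network is made up, and the use of strongly connected components is an assumption for illustration, not necessarily the paper's exact decomposition.

```python
from collections import defaultdict

# Directed edges follow the (assumed known) flow direction in each pipe.
edges = [("A", "B"), ("B", "C"), ("C", "A"),   # a looped zone where flow recirculates
         ("C", "D"), ("D", "E"), ("E", "D")]   # a downstream zone with its own loop

def flow_clusters(edges):
    """Kosaraju's algorithm: group nodes into strongly connected components."""
    graph, rgraph, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        rgraph[v].append(u)
        nodes |= {u, v}

    order, seen = [], set()
    def dfs_order(u):
        seen.add(u)
        for w in graph[u]:
            if w not in seen:
                dfs_order(w)
        order.append(u)
    for u in nodes:
        if u not in seen:
            dfs_order(u)

    clusters, assigned = [], set()
    def dfs_collect(u, cluster):
        assigned.add(u)
        cluster.add(u)
        for w in rgraph[u]:
            if w not in assigned:
                dfs_collect(w, cluster)
    for u in reversed(order):
        if u not in assigned:
            cluster = set()
            dfs_collect(u, cluster)
            clusters.append(cluster)
    return clusters

print(flow_clusters(edges))    # e.g. [{'A', 'B', 'C'}, {'D', 'E'}]
```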

Journal ArticleDOI
TL;DR: Techniques used to implement an unstructured grid solver on modern graphics hardware, and the performance of the solver is demonstrated on two benchmark cases: a NACA0012 wing and a missile.
Abstract: Techniques used to implement an unstructured grid solver on modern graphics hardware are described. The three-dimensional Euler equations for inviscid, compressible flow are considered. Effective memory bandwidth is improved by reducing total global memory access and overlapping redundant computation, as well as using an appropriate numbering scheme and data layout. The applicability of per-block shared memory is also considered. The performance of the solver is demonstrated on two benchmark cases: a missile and the NACA0012 wing. For a variety of mesh sizes, an average speed-up factor of roughly 9.5x is observed over the equivalent parallelized OpenMP code running on a quad-core CPU, and roughly 33x over the equivalent code running in serial.

Journal ArticleDOI
TL;DR: In this paper, the authors define benchmark models for SUSY searches at the LHC, including the CMSSM, NUHM, mGMSB, mAMSB, MM-AMSB and p19MSSM.
Abstract: We define benchmark models for SUSY searches at the LHC, including the CMSSM, NUHM, mGMSB, mAMSB, MM-AMSB and p19MSSM, as well as models with R-parity violation and the NMSSM. Within the parameter spaces of these models, we propose benchmark subspaces, including planes, lines and points along them. The planes may be useful for presenting results of the experimental searches in different SUSY scenarios, while the specific benchmark points may serve for more detailed detector performance tests and comparisons. We also describe algorithms for defining suitable benchmark points along the proposed lines in the parameter spaces, and we define a few benchmark points motivated by recent fits to existing experimental data.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: In this paper, the authors compare Response Surface (RS) and Differential Evolutionary (DE) algorithms on a permanent magnet synchronous motor (PMSM) design with 5 independent variables and a strong non-linear multi-objective Pareto front and on a function with 11 independent variables.
Abstract: The paper systematically covers the significant developments of the last decade, including surrogate modelling of electrical machines, direct and stochastic search algorithms for both single- and multi- objective design optimization problems. The specific challenges and the dedicated algorithms for electric machine design are discussed, followed by benchmark studies comparing Response Surface (RS) and Differential Evolutionary (DE) algorithms on a permanent magnet synchronous motor (PMSM) design with 5 independent variables and a strong non-linear multi-objective Pareto front and on a function with 11 independent variables. The results show that RS and DE are comparable when the optimization employs only a small number of design candidates and DE performs better when more candidates are included.

Proceedings ArticleDOI
19 Jul 2011
TL;DR: This paper empirically characterizes and analyzes the efficacy of AMD Fusion, an architecture that combines general-purpose x86 cores and programmable accelerator cores on the same silicon die, and characterizes its performance via a set of micro-benchmarks.
Abstract: The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers between the CPU and GPU over PCIe. Emerging heterogeneous computing architectures that "fuse" the functionality of the CPU and GPU, e.g., AMD Fusion and Intel Knights Ferry, hold the promise of addressing the PCIe bottleneck. In this paper, we empirically characterize and analyze the efficacy of AMD Fusion, an architecture that combines general-purpose x86 cores and programmable accelerator cores on the same silicon die. We characterize its performance via a set of micro-benchmarks (e.g., PCIe data transfer), kernel benchmarks (e.g., reduction), and actual applications (e.g., molecular dynamics). Depending on the benchmark, our results show that Fusion produces a 1.7 to 6.0-fold improvement in the data-transfer time, when compared to a discrete GPU. In turn, this improvement in data-transfer performance can significantly enhance application performance. For example, running a reduction benchmark on AMD Fusion with its mere 80 GPU cores improves performance by 3.5-fold over the discrete AMD Radeon HD 5870 GPU with its 1600 more powerful GPU cores.

Journal ArticleDOI
TL;DR: This paper presents the initial version of a benchmark set of testing methods for calculating free energies of molecular transformation in solution based on molecular changes common to many molecular design problems, and demonstrates that bootstrap error estimation is a robust and useful technique for estimating statistical variance for all free energy methods studied.
Abstract: There is a significant need for improved tools to validate thermophysical quantities computed via molecular simulation. In this paper we present the initial version of a benchmark set of testing methods for calculating free energies of molecular transformation in solution. This set is based on molecular changes common to many molecular design problems, such as insertion and deletion of atomic sites and changing atomic partial charges. We use this benchmark set to compare the statistical efficiency, reliability, and quality of uncertainty estimates for a number of published free energy methods, including thermodynamic integration, free energy perturbation, the Bennett acceptance ratio (BAR) and its multistate equivalent MBAR. We identify MBAR as the consistently best performing method, though other methods are frequently comparable in reliability and accuracy in many cases. We demonstrate that assumptions of Gaussian distributed errors in free energies are usually valid for most methods studied. We demonstrate that bootstrap error estimation is a robust and useful technique for estimating statistical variance for all free energy methods studied. This benchmark set is provided in a number of different file formats with the hope of becoming a useful and general tool for method comparisons.
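
As a small illustration of one of the estimators compared, the sketch below solves the Bennett acceptance ratio (BAR) self-consistency equation by bisection on synthetic Gaussian work data with a known free-energy difference (in units of kT). The toy data and solver are illustrative and are not the paper's benchmark set or the pymbar implementation.

```python
import numpy as np

def fermi(x):
    return 1.0 / (1.0 + np.exp(x))

def bar(w_forward, w_reverse, lo=-50.0, hi=50.0, tol=1e-8):
    """Solve sum_F fermi(M + W_F - dF) = sum_R fermi(-M + W_R + dF) for dF by bisection."""
    M = np.log(len(w_forward) / len(w_reverse))
    def residual(dF):
        return fermi(M + w_forward - dF).sum() - fermi(-M + w_reverse + dF).sum()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:     # residual increases with dF, so bisect on its sign
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
true_dF, sigma = 2.0, 1.0
w_f = rng.normal(true_dF + sigma**2 / 2, sigma, 5000)    # forward work samples
w_r = rng.normal(-true_dF + sigma**2 / 2, sigma, 5000)   # reverse work samples
print("BAR estimate:", bar(w_f, w_r), "true:", true_dF)
```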