
Showing papers on "Benchmark (computing) published in 2010"


Proceedings ArticleDOI
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears
10 Jun 2010
TL;DR: This work presents the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems, and defines a core set of benchmarks and reports results for four widely used systems.
Abstract: While the use of MapReduce systems (such as Hadoop) for large scale data analysis has been widely recognized and studied, we have recently seen an explosion in the number of systems developed for cloud data serving. These newer systems address "cloud OLTP" applications, though they typically do not support ACID transactions. Examples of systems proposed for cloud serving use include BigTable, PNUTS, Cassandra, HBase, Azure, CouchDB, SimpleDB, Voldemort, and many others. Further, they are being applied to a diverse range of applications that differ considerably from traditional (e.g., TPC-C like) serving workloads. The number of emerging cloud serving systems and the wide range of proposed applications, coupled with a lack of apples-to-apples performance comparisons, makes it difficult to understand the tradeoffs between systems and the workloads for which they are suited. We present the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems. We define a core set of benchmarks and report results for four widely used systems: Cassandra, HBase, Yahoo!'s PNUTS, and a simple sharded MySQL implementation. We also hope to foster the development of additional cloud benchmark suites that represent other classes of applications by making our benchmark tool available via open source. In this regard, a key feature of the YCSB framework/tool is that it is extensible--it supports easy definition of new workloads, in addition to making it easy to benchmark new systems.
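The core idea of a YCSB-style workload — a mix of simple operations drawn from a skewed key distribution — can be sketched in a few lines (a toy Python sketch; the inverse-CDF Zipf sampler and the 95/5 read/update mix are illustrative stand-ins, not YCSB's actual implementation):

```python
import random

random.seed(7)

def zipf_key(n, theta=0.99):
    """Sample a key in [0, n) from an approximate Zipfian distribution,
    so a few hot keys receive most of the traffic."""
    weights = [1.0 / (i + 1) ** theta for i in range(n)]
    r = random.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return n - 1

def next_operation(read_fraction=0.95):
    """Pick the next operation for a read-heavy mix (95% reads /
    5% updates, similar in spirit to a YCSB core workload)."""
    return "read" if random.random() < read_fraction else "update"

# Generate 1000 operations against a table of 1000 keys.
ops = [(next_operation(), zipf_key(1000)) for _ in range(1000)]
reads = sum(1 for op, _ in ops if op == "read")
```

Extending the framework with a new workload amounts to swapping the operation mix and key distribution, which is the extensibility point the abstract highlights.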

3,276 citations


Journal ArticleDOI
TL;DR: The aim of this paper is to present OpenMEEG, both from the theoretical and the practical point of view, and to compare its performance with other competing software packages, showing that it represents the state of the art for forward computations.
Abstract: Interpreting and controlling bioelectromagnetic phenomena require realistic physiological models and accurate numerical solvers. A semi-realistic model often used in practice is the piecewise constant conductivity model, for which only the interfaces have to be meshed. This simplified model makes it possible to use Boundary Element Methods. Unfortunately, most Boundary Element solutions are confronted with accuracy issues when the conductivity ratio between neighboring tissues is high, as for instance the scalp/skull conductivity ratio in electro-encephalography. To overcome this difficulty, we proposed a new method called the symmetric BEM, which is implemented in the OpenMEEG software. The aim of this paper is to present OpenMEEG, both from the theoretical and the practical point of view, and to compare its performance with other competing software packages. We have run a benchmark study in the field of electro- and magneto-encephalography, in order to compare the accuracy of OpenMEEG with other freely distributed forward solvers. We considered spherical models, for which analytical solutions exist, and we designed randomized meshes to assess the variability of the accuracy. Two measures were used to characterize the accuracy: the Relative Difference Measure and the Magnitude Ratio. The comparisons were run either with a constant number of mesh nodes or a constant number of unknowns across methods. Computing times were also compared. We observed more pronounced differences in accuracy in electroencephalography than in magnetoencephalography. The methods could be classified in three categories: the linear collocation methods, which run very fast but with low accuracy; the linear collocation methods with the isolated skull approach, for which the accuracy is improved; and OpenMEEG, which clearly outperforms the others.
As far as speed is concerned, OpenMEEG is on par with the other methods for a constant number of unknowns, and is hence faster for a prescribed accuracy level. This study clearly shows that OpenMEEG represents the state of the art for forward computations. Moreover, our software development strategies have made it handy to use and to integrate with other packages. The bioelectromagnetic research community should therefore be able to benefit from OpenMEEG with a limited development effort.

914 citations


Proceedings ArticleDOI
Shengsheng Huang, Jie Huang, Jinquan Dai, Tao Xie, Bo Huang
01 Mar 2010
TL;DR: This paper presents the benchmarking, evaluation and characterization of Hadoop, an open-source implementation of MapReduce, and introduces HiBench, a new benchmark suite for Hadoop that evaluates and characterizes the Hadoop framework in terms of speed, throughput, and system resource utilization.
Abstract: The MapReduce model is becoming prominent for the large-scale data analysis in the cloud. In this paper, we present the benchmarking, evaluation and characterization of Hadoop, an open-source implementation of MapReduce. We first introduce HiBench, a new benchmark suite for Hadoop. It consists of a set of Hadoop programs, including both synthetic micro-benchmarks and real-world Hadoop applications. We then evaluate and characterize the Hadoop framework using HiBench, in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory and I/O) utilizations, and data access patterns.

750 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: This paper develops light-weight cooperative cache management algorithms aimed at maximizing the traffic volume served from cache and minimizing the bandwidth cost, and establishes that the performance of the proposed algorithms is guaranteed to be within a constant factor from the globally optimal performance.
Abstract: The delivery of video content is expected to gain huge momentum, fueled by the popularity of user-generated clips, growth of VoD libraries, and wide-spread deployment of IPTV services with features such as CatchUp/PauseLive TV and NPVR capabilities. The `time-shifted' nature of these personalized applications defies the broadcast paradigm underlying conventional TV networks, and increases the overall bandwidth demands by orders of magnitude. Caching strategies provide an effective mechanism for mitigating these massive bandwidth requirements by replicating the most popular content closer to the network edge, rather than storing it in a central site. The reduction in the traffic load lessens the required transport capacity and capital expense, and alleviates performance bottlenecks. In the present paper, we develop light-weight cooperative cache management algorithms aimed at maximizing the traffic volume served from cache and minimizing the bandwidth cost. As a canonical scenario, we focus on a cluster of distributed caches, either connected directly or via a parent node, and formulate the content placement problem as a linear program in order to benchmark the globally optimal performance. Under certain symmetry assumptions, the optimal solution of the linear program is shown to have a rather simple structure. Besides being interesting in its own right, the optimal structure offers valuable guidance for the design of low-complexity cache management and replacement algorithms. We establish that the performance of the proposed algorithms is guaranteed to be within a constant factor of the globally optimal performance, with far more benign worst-case ratios than in prior work, even in asymmetric scenarios. Numerical experiments for typical popularity distributions reveal that the actual performance is far better than the worst-case conditions indicate.
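The intuition behind the placement objective — serve as much traffic as possible from cache — can be illustrated with a greedy stand-in for the paper's linear program (a toy Python sketch; unit-size items and the popularity values are assumptions, and the real formulation also covers cooperative multi-cache clusters):

```python
def place_greedy(popularity, cache_size):
    """Fill the cache with the most popular items and report the
    fraction of requests served from cache. A greedy stand-in for
    the LP benchmark; unit-size items are assumed."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    cached = set(ranked[:cache_size])
    hit_rate = sum(popularity[i] for i in cached) / sum(popularity.values())
    return cached, hit_rate

# Toy request counts per video (illustrative popularity distribution).
pop = {"a": 50, "b": 30, "c": 15, "d": 4, "e": 1}
cached, hit_rate = place_greedy(pop, cache_size=2)  # caches "a" and "b"
```

With the heavy-tailed popularity above, caching just two of five items already serves 80% of requests locally, which is why edge replication of popular content cuts transport capacity so sharply.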

727 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method, using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark.
Abstract: Strand-specific, massively parallel cDNA sequencing (RNA-seq) is a powerful tool for transcript discovery, genome annotation and expression profiling. There are multiple published methods for strand-specific RNA-seq, but no consensus exists as to how to choose between them. Here we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library-construction protocols, including both published and our own methods. We found marked differences in strand specificity, library complexity, evenness and continuity of coverage, agreement with known annotations and accuracy for expression profiling. Weighing each method's performance and ease, we identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.
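Metrics such as continuity and evenness of coverage can be computed from a per-base depth vector along a transcript (a toy Python sketch; these exact formulas are illustrative, not the paper's pipeline definitions):

```python
from statistics import mean, pstdev

def coverage_metrics(depths):
    """Per-transcript coverage summaries in the spirit of the
    continuity/evenness criteria: fraction of bases covered, and the
    coefficient of variation of depth (lower = more even)."""
    continuity = sum(d > 0 for d in depths) / len(depths)
    m = mean(depths)
    cv = pstdev(depths) / m if m else float("inf")
    return continuity, cv

# Toy per-base read depths along one transcript (two uncovered gaps).
cont, cv = coverage_metrics([10, 12, 0, 11, 9, 0, 10, 8])
```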

714 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: The Scalable HeterOgeneous Computing benchmark suite (SHOC) is a spectrum of programs that test the performance and stability of scalable heterogeneous computing systems and includes benchmark implementations in both OpenCL and CUDA in order to provide a comparison of these programming models.
Abstract: Scalable heterogeneous computing systems, which are composed of a mix of compute devices, such as commodity multicore processors, graphics processors, reconfigurable processors, and others, are gaining attention as one approach to continuing performance improvement while managing the new challenge of energy efficiency. As these systems become more common, it is important to be able to compare and contrast architectural designs and programming systems in a fair and open forum. To this end, we have designed the Scalable HeterOgeneous Computing benchmark suite (SHOC). SHOC's initial focus is on systems containing graphics processing units (GPUs) and multi-core processors, and on the new OpenCL programming standard. SHOC is a spectrum of programs that test the performance and stability of these scalable heterogeneous computing systems. At the lowest level, SHOC uses microbenchmarks to assess architectural features of the system. At higher levels, SHOC uses application kernels to determine system-wide performance including many system features such as intranode and internode communication among devices. SHOC includes benchmark implementations in both OpenCL and CUDA in order to provide a comparison of these programming models.

620 citations


Journal ArticleDOI
01 Sep 2010
TL;DR: By carefully tuning these factors, the overall performance of Hadoop can be improved by a factor of 2.5 to 3.5, and is thus more comparable to that of parallel database systems.
Abstract: MapReduce has been widely used for large-scale data analysis in the Cloud. The system is well recognized for its elastic scalability and fine-grained fault tolerance, although its performance has been noted to be suboptimal in the database context. According to a recent study [19], Hadoop, an open source implementation of MapReduce, is slower than two state-of-the-art parallel database systems in performing a variety of analytical tasks by a factor of 3.1 to 6.5. MapReduce can achieve better performance with the allocation of more compute nodes from the cloud to speed up computation; however, this approach of "renting more nodes" is not cost effective in a pay-as-you-go environment. Users desire an economical elastically scalable data processing system, and therefore, are interested in whether MapReduce can offer both elastic scalability and efficiency. In this paper, we conduct a performance study of MapReduce (Hadoop) on a 100-node cluster of Amazon EC2 with various levels of parallelism. We identify five design factors that affect the performance of Hadoop, and investigate alternative but known methods for each factor. We show that by carefully tuning these factors, the overall performance of Hadoop can be improved by a factor of 2.5 to 3.5 for the same benchmark used in [19], and is thus more comparable to that of parallel database systems. Our results show that it is therefore possible to build a cloud data processing system that is both elastically scalable and efficient.

426 citations


Journal ArticleDOI
TL;DR: In this article, a quantum-inspired particle swarm optimization (QPSO) is proposed, which has stronger search ability and quicker convergence speed, not only because of the introduction of quantum computing theory, but also due to two special implementations: self-adaptive probability selection and chaotic sequences mutation.
Abstract: Economic load dispatch (ELD) is an important topic in the operation of power plants which can help to build up effective generating management plans. The ELD problem has nonsmooth cost function with equality and inequality constraints which make it difficult to be effectively solved. Different heuristic optimization methods have been proposed to solve this problem in previous study. In this paper, quantum-inspired particle swarm optimization (QPSO) is proposed, which has stronger search ability and quicker convergence speed, not only because of the introduction of quantum computing theory, but also due to two special implementations: self-adaptive probability selection and chaotic sequences mutation. The proposed approach is tested with five standard benchmark functions and three power system cases consisting of 3, 13, and 40 thermal units. Comparisons with similar approaches including the evolutionary programming (EP), genetic algorithm (GA), immune algorithm (IA), and other versions of particle swarm optimization (PSO) are given. The promising results illustrate the efficiency of the proposed method and show that it could be used as a reliable tool for solving ELD problems.
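For context, a plain PSO run on a standard benchmark function looks like the following (a minimal Python sketch without the paper's quantum behaviour, self-adaptive probability selection, or chaotic-sequence mutation; all hyperparameters are illustrative):

```python
import random

def sphere(x):
    """Standard benchmark function: global minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso(f, dim=2, n_particles=20, iters=200, seed=0):
    """Minimal standard particle swarm optimization."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # personal bests
    gbest = min(pbest, key=f)[:]             # global best
    w, c1, c2 = 0.7, 1.5, 1.5                # inertia, acceleration weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest, f(gbest)

best, best_val = pso(sphere)
```

QPSO-style variants modify the position-update rule and add mutation to escape local optima, which matters for the nonsmooth ELD cost functions the paper targets.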

288 citations


ReportDOI
01 Sep 2010
TL;DR: The House Simulation Protocol document was developed to track and manage progress toward Building America's multi-year, average whole-building energy reduction research goals for new construction and existing homes, using a consistent analytical reference point.
Abstract: The House Simulation Protocol document was developed to track and manage progress toward Building America's multi-year, average whole-building energy reduction research goals for new construction and existing homes, using a consistent analytical reference point. This report summarizes the guidelines for developing and reporting these analytical results in a consistent and meaningful manner for all home energy uses using standard operating conditions.

284 citations


Proceedings ArticleDOI
16 May 2010
TL;DR: Benchmark results of execution time and memory for the Google Chrome V8 benchmark suite show that the approach is practical for a mainstream browser setting; extensions of secure multi-execution to language features such as exceptions, concurrency, and nondeterminism are also discussed.
Abstract: A program is defined to be noninterferent if its outputs cannot be influenced by inputs at a higher security level than their own. Various researchers have demonstrated how this property (or closely related properties) can be achieved through information flow analysis, using either a static analysis (with a type system or otherwise), or using a dynamic monitoring system. We propose an alternative approach, based on a technique we call secure multi-execution. The main idea is to execute a program multiple times, once for each security level, using special rules for I/O operations. Outputs are only produced in the execution linked to their security level. Inputs are replaced by default inputs except in executions linked to their security level or higher. Input side effects are supported by making higher-security-level executions reuse inputs obtained in lower-security-level threads. We show that this approach is interesting from both a theoretical and practical viewpoint. Theoretically, we prove for a simple deterministic language with I/O operations, that this approach guarantees complete soundness (even for the timing and termination covert channels), as well as good precision (identical I/O for terminating runs of termination-sensitively noninterferent programs). On the practical side, we present an experiment implementing secure multi-execution in the mainstream SpiderMonkey JavaScript engine, exploiting parallelism on a current multi-core computer. Benchmark results of execution time and memory for the Google Chrome V8 benchmark suite show that the approach is practical for a mainstream browser setting. Certain programs are even executed faster under secure multi-execution than under the standard execution. We discuss challenges and propose possible solutions for implementing the technique in a real browser, in particular handling the DOM tree and browser callback functions. Finally, we discuss how secure multi-execution can be extended to handle language features like exceptions, concurrency, or nondeterminism.
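The I/O rules of secure multi-execution can be illustrated with a two-level toy interpreter (a Python sketch; the channel names, example program, and the None default input are illustrative simplifications of the paper's semantics):

```python
def run_sme(program, inputs):
    """Toy secure multi-execution for two levels. Each run only sees
    inputs at or below its level (higher inputs are replaced by a
    default), and only emits outputs at exactly its own level."""
    order = {"low": 0, "high": 1}
    outputs = {}
    for run_level in ("low", "high"):
        def read(channel):
            level, value = inputs[channel]
            return value if order[level] <= order[run_level] else None
        def write(channel, value, level):
            if level == run_level:  # only the matching run may output
                outputs[channel] = value
        program(read, write)
    return outputs

def example_program(read, write):
    secret = read("secret")        # high input
    public = read("public")        # low input
    write("net", public, "low")    # low output: independent of the secret
    write("log", secret, "high")   # high output: may use the secret

out = run_sme(example_program, {"secret": ("high", 42), "public": ("low", 7)})
```

In the low run, `read("secret")` returns the default, so the low-level output can never depend on the secret; the high run sees both inputs but may only write high outputs.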

278 citations


Journal ArticleDOI
TL;DR: The proposed algorithm not only improved several of the known solutions, but also exhibited very satisfying scalability.

Journal ArticleDOI
TL;DR: The finalised plant layout is summarised and, as was done for BSM1, a default control strategy is proposed; a demonstration of how BSM2 can be used to evaluate control strategies is also given.

Proceedings ArticleDOI
01 May 2010
TL;DR: A benchmark is presented, which is created through the manual inspection of a statistically significant number of e-mails pertaining to six unrelated software systems, and is used to measure the effectiveness of a number of approaches, ranging from lightweight approaches based on regular expressions to full-fledged information retrieval approaches.
Abstract: E-mails concerning the development issues of a system constitute an important source of information about high-level design decisions, low-level implementation concerns, and the social structure of developers. Establishing links between e-mails and the software artifacts they discuss is a non-trivial problem, due to the inherently informal nature of human communication. Different approaches can be brought into play to tackle this traceability issue, but the question of how they can be evaluated remains unaddressed, as there is no recognized benchmark against which they can be compared. In this article we present such a benchmark, which we created through the manual inspection of a statistically significant number of e-mails pertaining to six unrelated software systems. We then use our benchmark to measure the effectiveness of a number of approaches, ranging from lightweight approaches based on regular expressions to full-fledged information retrieval approaches.
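The lightweight end of that spectrum — regex-based linking of e-mails to code artifacts — can be sketched as follows (the class names and e-mail text are hypothetical; real approaches must also handle partial and ambiguous mentions):

```python
import re

def link_artifacts(email_text, class_names):
    """Flag which known code artifacts an e-mail mentions, using plain
    word-boundary matching on artifact names."""
    return {name for name in class_names
            if re.search(r"\b" + re.escape(name) + r"\b", email_text)}

mail = "The bug is in ParserUtil, not in HtmlRenderer.render()."
links = link_artifacts(mail, ["ParserUtil", "HtmlRenderer", "CacheManager"])
```

A benchmark like the paper's then scores such a linker by comparing `links` against manually established ground-truth links for each e-mail.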

Journal ArticleDOI
TL;DR: A smooth state-feedback controller is given to guarantee that the solution process is bounded in probability and the error signal between the output and the reference signal can be regulated into a small neighborhood of the origin in probability.
Abstract: This note considers output tracking of high-order stochastic nonlinear systems without imposing any restriction on the high-order and the drift and diffusion terms. By using the backstepping design technique, a smooth state-feedback controller is given to guarantee that the solution process is bounded in probability and the error signal between the output and the reference signal can be regulated into a small neighborhood of the origin in probability. A practical example of stochastic benchmark mechanical system and simulation are provided to demonstrate the effectiveness of the control scheme.

Proceedings ArticleDOI
11 Sep 2010
TL;DR: This work characterize a large set of stream programs that was implemented directly in a stream programming language, allowing new insights into the high-level structure and behavior of the applications.
Abstract: Stream programs represent an important class of high-performance computations. Defined by their regular processing of sequences of data, stream programs appear most commonly in the context of audio, video, and digital signal processing, though also in networking, encryption, and other areas. In order to develop effective compilation techniques for the streaming domain, it is important to understand the common characteristics of these programs. Prior characterizations of stream programs have examined legacy implementations in C, C++, or FORTRAN, making it difficult to extract the high-level properties of the algorithms. In this work, we characterize a large set of stream programs that was implemented directly in a stream programming language, allowing new insights into the high-level structure and behavior of the applications. We utilize the StreamIt benchmark suite, consisting of 65 programs and 33,600 lines of code. We characterize the bottlenecks to parallelism, the data reference patterns, the input/output rates, and other properties. The lessons learned have implications for the design of future architectures, languages and compilers for the streaming domain.
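The filter-pipeline structure that a stream language makes explicit can be mimicked with Python generators (a toy sketch; StreamIt filters additionally declare static push/pop/peek rates that the compiler exploits for parallelization):

```python
def source(n):
    """Produce a fixed sequence of items into the stream."""
    yield from range(n)

def scale(stream, k):
    """A stateless filter with a static 1-in/1-out rate."""
    for item in stream:
        yield item * k

def moving_sum(stream, window=3):
    """A peeking filter: consumes one item at a time but computes
    over a sliding window of recent items."""
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) > window:
            buf.pop(0)
        if len(buf) == window:
            yield sum(buf)

# A three-stage pipeline: source -> scale -> moving_sum.
out = list(moving_sum(scale(source(6), 2)))
```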

Proceedings ArticleDOI
19 Apr 2010
TL;DR: A novel power modelling technique, VMeter, is presented, based on online monitoring of system resources that correlate highly with total power consumption; it predicts the instantaneous power consumption of an individual VM hosted on a physical node as well as that of the full system.
Abstract: Datacenters are seeing unprecedented growth in recent years. The energy requirements to operate these large scale facilities are increasing significantly, both in terms of operation cost as well as their indirect impact on ecology due to high carbon emissions. There are several ongoing research efforts towards the development of an integrated cloud management system to provide comprehensive online monitoring of resource utilization along with the implementation of power-aware policies to reduce the total energy consumption. However, most of these techniques provide online power monitoring based on the power consumption of a physical node running one or more Virtual Machines (VM). They lack a fine-grained mechanism to profile the power of an individual hosted VM. In this work we present a novel power modelling technique, VMeter, based on online monitoring of system resources having high correlation with the total power consumption. The monitored system sub-components include: CPU, cache, disk, and DRAM. The proposed model predicts the instantaneous power consumption of an individual VM hosted on a physical node, as well as the full system power consumption. Our model is validated using computationally diverse and industry standard benchmark programs. Our evaluation results show that our model is able to predict instantaneous power with an average mean and median accuracy of 93% and 94%, respectively, against the actual measured power using an externally attached power meter.
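The modelling step reduces to fitting measured power against monitored resource counters; a one-counter toy version (ordinary least squares on synthetic CPU-utilisation data, not VMeter's actual multi-component model) looks like this:

```python
def fit_linear(x, y):
    """Ordinary least squares for power = a + b * utilisation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b  # intercept (idle power), slope

util = [0.1, 0.3, 0.5, 0.7, 0.9]             # CPU utilisation samples
power = [102.0, 106.0, 110.0, 114.0, 118.0]  # synthetic measured watts
a, b = fit_linear(util, power)
predicted = a + b * 0.6                      # estimate power at 60% load
```

The full model extends this to several counters (CPU, cache, disk, DRAM), and per-VM power follows by evaluating the fitted model on each VM's share of those counters.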

Journal ArticleDOI
TL;DR: The proposed ACO algorithm is applied to several real-world benchmark datasets to validate its feasibility and efficiency, showing that the new ACO-SVM model can yield promising results.
Abstract: One of the significant research problems in support vector machines (SVM) is the selection of optimal parameters that can establish an efficient SVM so as to attain desired output with an acceptable level of accuracy. The present study adopts the ant colony optimization (ACO) algorithm to develop a novel ACO-SVM model to solve this problem. The proposed algorithm is applied to several real-world benchmark datasets to validate its feasibility and efficiency, which shows that the new ACO-SVM model can yield promising results.

Proceedings ArticleDOI
02 Jun 2010
TL;DR: This paper presents a methodology to produce decomposable PMC-based power models on current multicore architectures and demonstrates that the proposed methodology produces more accurate and responsive power models.
Abstract: Power modeling based on performance monitoring counters (PMCs) attracted the interest of researchers since it became a quick approach to understand and analyse power behavior on real systems. As a result, several power-aware policies use power models to guide their decisions and to trigger low-level mechanisms such as voltage and frequency scaling. Hence, the presence of power models that are informative, accurate and capable of detecting power phases is critical to advance power-aware research and to improve the success of power-saving techniques based on them. In addition, the design of current processors has varied considerably with the inclusion of multiple cores with some resources shared on a single die. As a result, PMC-based power models warrant further investigation on current energy-efficient multi-core processors. In this paper, we present a methodology to produce decomposable PMC-based power models on current multicore architectures. Apart from being able to estimate the power consumption accurately, the models provide per-component power consumption, supplying extra insights about power behavior. Moreover, we validate their responsiveness (the capacity to detect power phases). Specifically, we produce a set of power models for an Intel® Core™ 2 Duo. We model one and two cores for a wide set of DVFS configurations. The models are empirically validated by using the SPEC-cpu2006 benchmark suite and we compare them to other models built using existing approaches. Overall, we demonstrate that the proposed methodology produces more accurate and responsive power models. Concretely, our models show an error range of 1.89% to 6% and almost 100% accuracy in detecting phase variations above 0.5 watts.

Proceedings ArticleDOI
13 Sep 2010
TL;DR: A decentralized affinity-aware migration technique that incorporates heterogeneity and dynamism in network topology and job communication patterns to allocate virtual machines on the available physical resources and provides both good performance and low network contention with minimal overhead is presented.
Abstract: Virtualization is being widely used in large-scale computing environments, such as clouds, data centers, and grids, to provide application portability and facilitate resource multiplexing while retaining application isolation. In many existing virtualized platforms, it has been found that the network bandwidth often becomes the bottleneck resource, causing both high network contention and reduced performance for communication and data-intensive applications. In this paper, we present a decentralized affinity-aware migration technique that incorporates heterogeneity and dynamism in network topology and job communication patterns to allocate virtual machines on the available physical resources. Our technique monitors network affinity between pairs of VMs and uses a distributed bartering algorithm, coupled with migration, to dynamically adjust VM placement such that communication overhead is minimized. Our experimental results running the Intel MPI benchmark and a scientific application on a 7-node Xen cluster show that we can get up to 42% improvement in the runtime of the application over a no-migration technique, while achieving up to 85% reduction in network communication cost. In addition, our technique is able to adjust to dynamic variations in communication patterns and provides both good performance and low network contention with minimal overhead.
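The quantity such a migration scheme tries to shrink is the traffic that crosses host boundaries; a toy Python sketch (the hosts, VM names, and traffic matrix are illustrative, and the real system discovers affinities online and barters placements in a decentralized way):

```python
def cross_host_traffic(placement, traffic):
    """Total traffic between VM pairs placed on different hosts,
    i.e. the network cost an affinity-aware placement minimises."""
    return sum(t for (u, v), t in traffic.items()
               if placement[u] != placement[v])

# Toy pairwise traffic rates between three VMs.
traffic = {("vm1", "vm2"): 90, ("vm1", "vm3"): 5, ("vm2", "vm3"): 5}
before = cross_host_traffic({"vm1": "hostA", "vm2": "hostB", "vm3": "hostB"},
                            traffic)
after = cross_host_traffic({"vm1": "hostA", "vm2": "hostA", "vm3": "hostB"},
                           traffic)
# Migrating vm2 next to its high-affinity partner vm1 cuts
# cross-host traffic from 95 to 10.
```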

Proceedings ArticleDOI
01 Mar 2010
TL;DR: Parallax builds on recently-developed techniques for estimating the progress of single-site SQL queries, but focuses on the challenges related to parallelism and variable execution speeds.
Abstract: In parallel query-processing environments, accurate, time-oriented progress indicators could provide much utility given that inter- and intra-query execution times can have high variance. However, none of the techniques used by existing tools or available in the literature provide non-trivial progress estimation for parallel queries. In this paper, we introduce Parallax, the first such indicator. While several parallel data processing systems exist, the work in this paper targets environments where queries consist of a series of MapReduce jobs. Parallax builds on recently-developed techniques for estimating the progress of single-site SQL queries, but focuses on the challenges related to parallelism and variable execution speeds. We have implemented our estimator in the Pig system and demonstrate its performance through experiments with the PigMix benchmark and other queries running in a real, small-scale cluster.
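At its simplest, a tuple-based progress estimate for a multi-job query is completed work over total expected work (a toy Python sketch; Parallax additionally models variable per-tuple processing speeds and parallel task scheduling, and the job figures here are made up):

```python
def pipeline_progress(jobs):
    """Fraction of total expected tuples already processed across the
    MapReduce jobs that make up one query."""
    return sum(j["processed"] for j in jobs) / sum(j["expected"] for j in jobs)

jobs = [
    {"processed": 1000, "expected": 1000},  # job 1: finished
    {"processed": 250, "expected": 1000},   # job 2: a quarter done
    {"processed": 0, "expected": 2000},     # job 3: not started
]
progress = pipeline_progress(jobs)  # 1250 / 4000 = 0.3125
```

Turning this fraction into a time estimate is where the hard problems live: expected tuple counts for downstream jobs must themselves be estimated, and execution speed varies across the cluster.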

Posted Content
TL;DR: In this article, a multi-level graph partitioning algorithm is presented that uses novel local improvement algorithms, based on max-flow min-cut computations and more localized FM searches, together with global search strategies transferred from the multi-grid community.
Abstract: We present a multi-level graph partitioning algorithm using novel local improvement algorithms and global search strategies transferred from the multi-grid community. Local improvement algorithms are based on max-flow min-cut computations and more localized FM searches. By combining these techniques, we obtain an algorithm that is fast on the one hand and, on the other hand, is able to improve the best known partitioning results for many inputs. For example, in Walshaw's well-known benchmark tables we achieve 317 improvements for the tables with 1%, 3% and 5% imbalance. Moreover, in 118 additional cases we have been able to reproduce the best cut in this benchmark.

Journal ArticleDOI
TL;DR: A multiobjective genetic fuzzy system (GFS) is presented that learns the granularities of fuzzy partitions, tunes the membership functions (MFs), and learns the fuzzy rules; it uses dynamic constraints, which enable three-parameter MF tuning to improve the accuracy while guaranteeing the transparency of fuzzy partitions.
Abstract: In this paper, a multiobjective genetic fuzzy system (GFS) to learn the granularities of fuzzy partitions, tuning the membership functions (MFs), and learning the fuzzy rules is presented. It uses dynamic constraints, which enable three-parameter MF tuning to improve the accuracy while guaranteeing the transparency of fuzzy partitions. The fuzzy models (FMs) are initialized by a method that combines the benefits of Wang-Mendel (WM) and decision-tree algorithms. Thus, the initial FMs have less rules, rule conditions, and input variables than if WM initialization were to be used. Moreover, the fuzzy partitions of initial FMs are always transparent. Our approach is tested against recent multiobjective and monoobjective GFSs on six benchmark problems. It is concluded that the accuracy and interpretability of our FMs are always comparable or better than those in the comparative studies. Furthermore, on some benchmark problems, our approach clearly outperforms some comparative approaches. Suitability of our approach for higher dimensional problems is shown by studying three benchmark problems that have up to 21 input variables.

Journal ArticleDOI
TL;DR: A new iterative route construction and improvement algorithm to solve vehicle routing problems with soft time windows that is intuitive and able to accommodate general cost and penalty functions is proposed.
Abstract: The solution of routing problems with soft time windows has valuable practical applications. Soft time window solutions are needed when: (a) the number of routes needed for hard time windows exceeds the number of available vehicles, (b) a study of cost-service tradeoffs is required, or (c) the dispatcher has qualitative information regarding the relative importance of hard time-window constraints across customers. This paper proposes a new iterative route construction and improvement algorithm to solve vehicle routing problems with soft time windows. Due to its modular and hierarchical design, the solution algorithm is intuitive and able to accommodate general cost and penalty functions. Experimental results indicate that the average run time performance is of order O(n²). The solution quality and computational time of the new algorithm have been compared against existing results on benchmark problems. The presented algorithm has improved thirty benchmark problem solutions for the vehicle routing problems with soft time windows.
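A soft time window replaces a hard feasibility constraint with a penalty term in the route cost. A common formulation (the linear rates below are illustrative; the paper supports general penalty functions) looks like:

```python
def soft_tw_penalty(arrival, earliest, latest, early_rate=1.0, late_rate=2.0):
    """Linear soft time-window penalty: earliness is charged at
    early_rate per time unit, lateness at late_rate per time unit;
    arrivals inside the window cost nothing."""
    if arrival < earliest:
        return early_rate * (earliest - arrival)
    if arrival > latest:
        return late_rate * (arrival - latest)
    return 0.0

def route_cost(arrivals, windows, travel_cost):
    """Total route cost = travel cost + sum of time-window penalties."""
    return travel_cost + sum(
        soft_tw_penalty(t, lo, hi) for t, (lo, hi) in zip(arrivals, windows))
```

With such a cost, a dispatcher can trade service quality against fleet size by scaling the penalty rates rather than rejecting routes outright.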

Journal ArticleDOI
TL;DR: This work shows that effective use of the GPU requires a novel reformulation of the Smith-Waterman algorithm, and indicates that for large problems a single GPU is up to 45 times faster than a CPU for this application.
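The CPU baseline that the GPU reformulation is measured against is the classic Smith-Waterman local-alignment dynamic program; a plain reference version (scoring parameters here are illustrative, not the paper's) is:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score via the standard O(len(a)
    * len(b)) dynamic program; returns the best local score."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # first row/column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are floored at zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The anti-diagonal data dependencies of H are exactly what makes a naive GPU port inefficient and motivates the reformulation the paper describes.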

Proceedings ArticleDOI
13 Jun 2010
TL;DR: The RAMP Gold prototype is a high-throughput, cycle-accurate full-system simulator that runs on a single Xilinx Virtex-5 FPGA board, and which simulates a 64-core shared-memory target machine capable of booting real operating systems.
Abstract: We present RAMP Gold, an economical FPGA-based architecture simulator that allows rapid early design-space exploration of manycore systems. The RAMP Gold prototype is a high-throughput, cycle-accurate full-system simulator that runs on a single Xilinx Virtex-5 FPGA board, and which simulates a 64-core shared-memory target machine capable of booting real operating systems. To improve FPGA implementation efficiency, functionality and timing are modeled separately and host multithreading is used in both models. We evaluate the prototype's performance using a modern parallel benchmark suite running on our manycore research operating system, achieving two orders of magnitude speedup compared to a widely-used software-based architecture simulator.
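Host multithreading, as used in RAMP Gold, time-multiplexes one physical host pipeline over the architectural state of many target cores. A toy sketch of the idea (the state and "cycle" here are stand-ins, not RAMP Gold's actual model):

```python
def host_multithreaded_sim(cores, host_passes):
    """Toy round-robin host-multithreaded simulation loop: one host
    pipeline advances each target core by one target cycle per pass,
    so all cores stay in lockstep without per-core hardware."""
    for _ in range(host_passes):
        for core in cores:       # round-robin over target-core state
            core["pc"] += 4      # stand-in for simulating one cycle
    return [c["pc"] for c in cores]
```

Because only one core's state is active in the host pipeline at a time, a single FPGA pipeline can model 64 target cores at high utilization.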

Journal ArticleDOI
TL;DR: The effect of parameter scalability in a number of state-of-the-art multiobjective metaheuristics is studied and it is concluded that the two analyzed algorithms based on particle swarm optimization and differential evolution yield the best overall results.
Abstract: To evaluate the search capabilities of a multiobjective algorithm, the usual approach is to choose a benchmark of known problems, to perform a fixed number of function evaluations, and to apply a set of quality indicators. However, while real problems could have hundreds or even thousands of decision variables, current benchmarks are normally adopted with relatively few decision variables (normally from 10 to 30). Furthermore, performing a constant number of evaluations does not provide information about the effort required by an algorithm to get a satisfactory set of solutions; this information would also be of interest in real scenarios, where evaluating the functions defining the problem can be computationally expensive. In this paper, we study the effect of parameter scalability in a number of state-of-the-art multiobjective metaheuristics. We adopt a benchmark of parameter-wise scalable problems (the Zitzler-Deb-Thiele test suite) and analyze the behavior of eight multiobjective metaheuristics on these test problems when using a number of decision variables that range from 8 up to 2048. By using the hypervolume indicator as a stopping condition, we also analyze the computational effort required by each algorithm in order to reach the Pareto front. We conclude that the two analyzed algorithms based on particle swarm optimization and differential evolution yield the best overall results.
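The hypervolume indicator used as the stopping condition measures the objective-space volume dominated by a front, bounded by a reference point. For a two-objective minimization problem it reduces to a sum of rectangle areas (this 2-D sweep assumes the points are mutually non-dominated):

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-D front for minimization: total area
    dominated by the front and bounded by reference point `ref`.
    Points must be mutually non-dominated."""
    pts = sorted(front)              # ascending f1 implies descending f2
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)   # one vertical strip
        prev_f2 = f2
    return hv
```

Running an algorithm until its front's hypervolume reaches a target fraction of the true front's hypervolume, rather than for a fixed evaluation budget, is what lets the study compare the effort each metaheuristic needs.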

Journal ArticleDOI
01 Sep 2010
TL;DR: The method is tested with several multimodal benchmark functions and the results show it usually converges to the global minima faster than other evolutionary methods such as Genetic Algorithm and Artificial Bee Colony.
Abstract: This work presents a new optimization technique called Grenade Explosion Method (GEM). The fundamental concepts and ideas which underlie the method are fully explained. It is seen that this simple and robust algorithm is quite powerful in finding all global and some local optima of multimodal functions. The method is tested with several multimodal benchmark functions and the results show it usually converges to the global minima faster than other evolutionary methods such as Genetic Algorithm (GA) and Artificial Bee Colony (ABC). Its performance on these classical benchmark functions suggests the method can also be applied efficiently to engineering problems.
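The core GEM metaphor can be sketched as follows: each "grenade" scatters shrapnel points in a neighbourhood, moves to the best improving point, and the explosion radius shrinks over time to shift from exploration to exploitation. The parameter names and the simple geometric radius schedule below are our simplifications, not the paper's exact formulation:

```python
import random

def gem_minimize(f, lo, hi, dim=2, n_grenades=3, n_shrapnel=20,
                 iters=200, shrink=0.98, seed=0):
    """Minimal Grenade-Explosion-style search: each grenade throws
    shrapnel uniformly inside its current radius, keeps any improving
    point, and the radius decays geometrically each iteration."""
    rng = random.Random(seed)
    grenades = [[rng.uniform(lo, hi) for _ in range(dim)]
                for _ in range(n_grenades)]
    radius = (hi - lo) / 2.0
    for _ in range(iters):
        for gi in range(n_grenades):
            g = grenades[gi]
            for _ in range(n_shrapnel):
                # shrapnel point, clipped to the search box
                p = [min(hi, max(lo, x + rng.uniform(-radius, radius)))
                     for x in g]
                if f(p) < f(g):          # grenade moves to better point
                    grenades[gi] = g = p
        radius *= shrink
    best = min(grenades, key=f)
    return best, f(best)
```

On a simple unimodal test such as the sphere function the sketch converges close to the optimum within a few hundred iterations.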

Dissertation
01 Jan 2010
TL;DR: A new dynamic selection procedure improves the Bees Algorithm by reducing the number of parameters, and new neighbourhood search methods are adopted to optimise the Pareto front.
Abstract: In the real world, there are many problems requiring the best solution to satisfy numerous objectives and therefore a need for suitable Multi-Objective Optimisation methods. Various Multi-Objective solvers have been developed recently. The classical method is easily implemented but requires repetitive program runs and does not generate a true "Pareto" optimal set. Intelligent methods are increasingly employed, especially population-based optimisation methods to generate the Pareto front in a single run. The Bees Algorithm is a newly developed population-based optimisation algorithm which has been verified in many fields. However, it is limited to solving single-objective optimisation problems. To apply the Bees Algorithm to a Multi-Objective Optimisation Problem, either the problem is converted to single-objective optimisation or the Bees Algorithm must be modified to function as a Multi-Objective solver. To convert a problem into a single-objective one, the weighted sum method is employed. However, due to failings of this classical method, a new approach is developed to generate a true Pareto front in a single run. This work also introduces an enhanced Bees Algorithm. A new dynamic selection procedure improves the Bees Algorithm by reducing the number of parameters, and new neighbourhood search methods are adopted to optimise the Pareto front. The enhanced algorithm has been tested on Multi-Objective benchmark functions and the classical Environmental/Economic power Dispatch Problem (EEDP). The results obtained compare well with those produced by other population-based algorithms. Due to recent trends in renewable energy systems, it is necessary to have a new model of the EEDP. Therefore, the EEDP was amended in conjunction with the Bees Algorithm to identify the best design in terms of energy performance and carbon emission reduction by adopting zero and low carbon technologies.
This computer-based tool supports the decision making process in the design of a Low-Carbon City.
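The two approaches the thesis contrasts, weighted-sum scalarization versus keeping a true Pareto set, can be shown in a few lines (generic textbook definitions, not the thesis's enhanced algorithm):

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

def weighted_sum(point, weights):
    """Classical scalarization: collapse the objectives into one
    score. It needs one run per weight vector and cannot reach
    non-convex regions of the front, the failings noted above."""
    return sum(w * x for w, x in zip(weights, point))
```

A population-based solver maintains and refines the `pareto_front` of its population directly, which is why it can return the whole trade-off surface in a single run.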

Book ChapterDOI
11 Sep 2010
TL;DR: Methods are presented which can be used to analyse the raw data of a benchmark experiment and derive some insight regarding the answers to two basic questions that arise when benchmarking optimization algorithms.
Abstract: We present methods to answer two basic questions that arise when benchmarking optimization algorithms. The first is: which algorithm is the 'best' one? The second: which algorithm should I use for my real-world problem? Both are connected and neither is easy to answer. We present methods which can be used to analyse the raw data of a benchmark experiment and derive some insight regarding the answers to these questions. We employ the presented methods to analyse the BBOB'09 benchmark results and present some initial findings.
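One elementary way to summarise raw benchmark data of the kind discussed here is to rank the algorithms on each problem and average the ranks across problems (a generic consensus-ranking sketch, not the paper's specific analysis):

```python
def average_ranks(results):
    """results[problem][algorithm] = performance (lower is better).
    Rank algorithms within each problem, giving tied scores their
    average rank, then average the ranks across problems."""
    algos = sorted(next(iter(results.values())))
    totals = {a: 0.0 for a in algos}
    for scores in results.values():
        ordered = sorted(algos, key=lambda a: scores[a])
        i = 0
        while i < len(ordered):
            j = i
            # extend j over the group of algorithms tied with ordered[i]
            while j + 1 < len(ordered) and \
                    scores[ordered[j + 1]] == scores[ordered[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1
            for k in range(i, j + 1):
                totals[ordered[k]] += avg_rank
            i = j + 1
    n = len(results)
    return {a: totals[a] / n for a in algos}
```

Averaged ranks answer the "which is best overall" question, but, as the paper stresses, not the "which should I use for my problem" question, which depends on the problem class.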

Proceedings ArticleDOI
18 Jul 2010
TL;DR: Sferes v2 is a C++ framework designed to help researchers in evolutionary computation make their code run as fast as possible on a multi-core computer. It is based on three main concepts: (1) including multi-core optimizations from the start of the design process; (2) providing state-of-the-art implementations of well-selected current evolutionary algorithms (EAs), and especially multiobjective EAs; (3) being based on modern (template-based) C++ techniques to be both abstract and efficient.
Abstract: This paper introduces and benchmarks Sferes v2, a C++ framework designed to help researchers in evolutionary computation make their code run as fast as possible on a multi-core computer. It is based on three main concepts: (1) including multi-core optimizations from the start of the design process; (2) providing state-of-the-art implementations of well-selected current evolutionary algorithms (EAs), and especially multiobjective EAs; (3) being based on modern (template-based) C++ techniques to be both abstract and efficient. Benchmark results show that when a single core is used, the running times of the classic EAs included in Sferes v2 (NSGA-II and CMA-ES) are of the same order of magnitude as specialized C code. When n cores are used, typical speed-ups range from 0.75n to 0.9n; however, parallelization efficiency critically depends on the time needed to evaluate the fitness function.
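The reported 0.75n-0.9n speed-ups are consistent with Amdahl's law when most, but not all, of an EA's runtime (chiefly fitness evaluation) parallelizes; a quick estimate (the parallel fraction below is an illustrative assumption, not a figure from the paper):

```python
def expected_speedup(n_cores, parallel_fraction):
    """Amdahl's-law speedup estimate: only `parallel_fraction` of the
    runtime scales with core count; the rest stays serial."""
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / n_cores)
```

With roughly 97% of the runtime parallel, 8 cores give about a 6.6x speedup, i.e. about 0.83n, squarely inside the 0.75n-0.9n band the benchmarks report.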