
Showing papers on "Benchmark (computing)" published in 2012


Journal ArticleDOI
TL;DR: A comparison study of the basic data-driven methods for process monitoring and fault diagnosis (PM–FD); their original ideas, implementation conditions, off-line design and on-line computation algorithms, as well as computational complexity, are discussed in detail.

1,116 citations


Proceedings ArticleDOI
18 Jun 2012
TL;DR: A new dataset - recorded from 18 activities performed by 9 subjects, wearing 3 IMUs and an HR-monitor - is created and made publicly available, showing the difficulty of the classification tasks and exposing new challenges for physical activity monitoring.
Abstract: This paper addresses the lack of a commonly used, standard dataset and established benchmarking problems for physical activity monitoring. A new dataset -- recorded from 18 activities performed by 9 subjects, wearing 3 IMUs and an HR-monitor -- is created and made publicly available. Moreover, 4 classification problems are benchmarked on the dataset, using a standard data processing chain and 5 different classifiers. The benchmark shows the difficulty of the classification tasks and exposes new challenges for physical activity monitoring.

902 citations
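To make the benchmarked "standard data processing chain" concrete, below is a minimal Python sketch of such a chain: sliding-window segmentation of the sensor signals, simple time-domain features, and an off-the-shelf classifier. The window length, features, classifier choice, and the synthetic data are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a sliding-window activity-recognition chain. Window length,
# features and classifier are illustrative assumptions, not the paper's configuration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def windows(signal, labels, win=512, step=256):
    """Segment a (num_samples, num_channels) signal into fixed-length windows."""
    X, y = [], []
    for start in range(0, len(signal) - win + 1, step):
        seg = signal[start:start + win]
        lab = labels[start:start + win]
        # simple time-domain features per channel: mean and standard deviation
        X.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
        # majority label of the window
        y.append(np.bincount(lab).argmax())
    return np.array(X), np.array(y)

# toy stand-in for 3 IMUs plus heart rate (40 channels) with 5 activity classes
rng = np.random.default_rng(0)
signal = rng.normal(size=(20000, 40))
labels = rng.integers(0, 5, size=20000)

X, y = windows(signal, labels)
split = len(X) // 2
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:split], y[:split])
print("accuracy:", accuracy_score(y[split:], clf.predict(X[split:])))
```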


Proceedings ArticleDOI
03 Mar 2012
TL;DR: This work identifies the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.
Abstract: Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads. In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor micro-architecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture. Moreover, while today's predominant micro-architecture is inefficient when executing scale-out workloads, we find that continuing the current trends will further exacerbate the inefficiency in the future. In this work, we identify the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.

860 citations


01 Jan 2012
TL;DR: By including versions of varying levels of optimization of the same fundamental algorithm, the Parboil benchmarks present opportunities to demonstrate tools and architectures that help programmers get the most out of their parallel hardware.
Abstract: The Parboil benchmarks are a set of throughput computing applications useful for studying the performance of throughput computing architecture and compilers. The name comes from the culinary term for a partial cooking process, which represents our belief that useful throughput computing benchmarks must be “cooked”, or preselected to implement a scalable algorithm with fine-grained parallel tasks. But useful benchmarks for this field cannot be “fully cooked”, because the architectures and programming models and supporting tools are evolving rapidly enough that static benchmark codes will lose relevance very quickly. We have collected benchmarks from throughput computing application researchers in many different scientific and commercial fields including image processing, biomolecular simulation, fluid dynamics, and astronomy. Each benchmark includes several implementations. Some implementations we provide as readable base implementations from which new optimization efforts can begin, and others as examples of the current state-of-the-art targeting specific CPU and GPU architectures. As we continue to optimize these benchmarks for new and existing architectures ourselves, we will also gladly accept new implementations and benchmark contributions from developers to recognize those at the frontier of performance optimization on each architecture. Finally, by including versions of varying levels of optimization of the same fundamental algorithm, the benchmarks present opportunities to demonstrate tools and architectures that help programmers get the most out of their parallel hardware. Less optimized versions are presented as challenges to the compiler and architecture research communities: to develop the technology that automatically raises the performance of simpler implementations to the performance level of sophisticated programmer-optimized implementations, or demonstrate any other performance or programmability improvements. We hope that these benchmarks will facilitate effective demonstrations of such technology.

695 citations


13 Jan 2012
TL;DR: A benchmark data set containing 300 natural images with eye tracking data from 39 observers is proposed to compare model performances and it is shown that human performance increases with the number of humans to a limit.
Abstract: Many computational models of visual attention have been created from a wide variety of different approaches to predict where people look in images. Each model is usually introduced by demonstrating performances on new images, and it is hard to make immediate comparisons between models. To alleviate this problem, we propose a benchmark data set containing 300 natural images with eye tracking data from 39 observers to compare model performances. We calculate the performance of 10 models at predicting ground truth fixations using three different metrics. We provide a way for people to submit new models for evaluation online. We find that the Judd et al. and Graph-based visual saliency models perform best. In general, models with blurrier maps and models that include a center bias perform well. We add and optimize a blur and center bias for each model and show improvements. We compare performances to baseline models of chance, center and human performance. We show that human performance increases with the number of humans to a limit. We analyze the similarity of different models using multidimensional scaling and explore the relationship between model performance and fixation consistency. Finally, we offer observations about how to improve saliency models in the future.

564 citations
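The abstract notes that blurrier maps and a center bias improve scores, and that models are evaluated against ground-truth fixations. The sketch below illustrates that post-processing and a simple AUC-style score; the Gaussian widths, mixing weight, and negative-sampling scheme are assumptions rather than the benchmark's exact protocol.

```python
# Hedged sketch: blur a saliency map, add a center-bias prior, and score it against
# fixation points with a simple ROC-AUC. Parameter choices are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.metrics import roc_auc_score

def add_blur_and_center_bias(sal_map, blur_sigma=10.0, center_weight=0.5):
    h, w = sal_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    center = np.exp(-(((yy - h / 2) ** 2) / (2 * (h / 4) ** 2)
                      + ((xx - w / 2) ** 2) / (2 * (w / 4) ** 2)))
    blurred = gaussian_filter(sal_map, blur_sigma)
    mixed = (1 - center_weight) * blurred + center_weight * center
    return (mixed - mixed.min()) / (np.ptp(mixed) + 1e-12)

def auc_score(sal_map, fixations, num_negatives=1000, seed=0):
    """fixations: array of (row, col) ground-truth fixation locations."""
    rng = np.random.default_rng(seed)
    pos = sal_map[fixations[:, 0], fixations[:, 1]]
    neg = sal_map[rng.integers(0, sal_map.shape[0], num_negatives),
                  rng.integers(0, sal_map.shape[1], num_negatives)]
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return roc_auc_score(labels, scores)

sal = np.random.default_rng(1).random((240, 320))
fix = np.random.default_rng(2).integers(0, [240, 320], size=(50, 2))
print("AUC:", auc_score(add_blur_and_center_bias(sal), fix))
```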


Journal ArticleDOI
TL;DR: The results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
Abstract: Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.

536 citations
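The benchmark evaluates approaches both as classifiers of defect-prone entities and as rankers, with and without review effort. The following sketch shows that style of evaluation on synthetic data; the learner and the two indicators used here (AUC, and defects found within a 20% effort budget) are common choices, not necessarily the paper's exact indicators.

```python
# Hedged sketch of defect-prediction evaluation: classification AUC plus a simple
# effort-aware ranking measure. Data and metric details are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 400
metrics = rng.normal(size=(n, 5))              # per-entity code/change metrics
defective = (metrics[:, 0] + rng.normal(size=n) > 1).astype(int)
effort = rng.integers(50, 2000, size=n)        # e.g. lines of code to review

train, test = slice(0, 200), slice(200, n)
model = LogisticRegression(max_iter=1000).fit(metrics[train], defective[train])
prob = model.predict_proba(metrics[test])[:, 1]

# 1) classification of entities as defect-prone or not: rank-based AUC
print("AUC:", roc_auc_score(defective[test], prob))

# 2) effort-aware ranking: defects found when reviewing entities by predicted score
#    until 20% of the total review effort is spent
order = np.argsort(-prob)
cum_effort = np.cumsum(effort[test][order])
budget = 0.2 * effort[test].sum()
found = defective[test][order][cum_effort <= budget].sum()
print("defects found at 20% effort:", found, "of", defective[test].sum())
```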


Proceedings ArticleDOI
19 Sep 2012
TL;DR: This paper presents Multi2Sim, an open-source, modular, and fully configurable toolset that enables ISA-level simulation of an x86 CPU and an AMD Evergreen GPU, and addresses program emulation correctness, as well as architectural simulation accuracy, using AMD's OpenCL benchmark suite.
Abstract: Accurate simulation is essential for the proper design and evaluation of any computing platform. Upon the current move toward the CPU-GPU heterogeneous computing era, researchers need a simulation framework that can model both kinds of computing devices and their interaction. In this paper, we present Multi2Sim, an open-source, modular, and fully configurable toolset that enables ISA-level simulation of an x86 CPU and an AMD Evergreen GPU. Focusing on a model of the AMD Radeon 5870 GPU, we address program emulation correctness, as well as architectural simulation accuracy, using AMD's OpenCL benchmark suite. Simulation capabilities are demonstrated with a preliminary architectural exploration study, and workload characterization examples. The project source code, benchmark packages, and a detailed user's guide are publicly available at www.multi2sim.org.

440 citations


Posted Content
TL;DR: Heuristic search value iteration (HSVI) as mentioned in this paper is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy, which can be used to solve POMDP problems.
Abstract: We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of greater than 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.

439 citations


Journal ArticleDOI
TL;DR: This paper presents an efficient implementation of the particle–particle particle-mesh method based on the work by Harvey and De Fabritiis, and provides a performance comparison of the same kernels compiled with both CUDA and OpenCL.

381 citations


Proceedings ArticleDOI
25 Mar 2012
TL;DR: This work proposes an efficient online algorithm for joint tenant placement and routing; evaluation on real data center traffic traces, under a spectrum of elephant and mice flows, demonstrates a consistent and significant improvement over the benchmark achieved by common heuristics.
Abstract: Today's data centers need efficient traffic management to improve resource utilization in their networks. In this work, we study a joint tenant (e.g., server or virtual machine) placement and routing problem to minimize traffic costs. These two complementary degrees of freedom—placement and routing—are mutually dependent; however, they are often optimized separately in today's data centers. Leveraging and expanding the technique of Markov approximation, we propose an efficient online algorithm in a dynamic environment under changing traffic loads. The algorithm requires a very small number of virtual machine migrations and is easy to implement in practice. Performance evaluation that employs real data center traffic traces under a spectrum of elephant and mice flows demonstrates a consistent and significant improvement over the benchmark achieved by common heuristics.

377 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: This work proposes a new technique to extend an existing training set that allows us to explicitly control pose and shape variations, and defines a new challenge of combined articulated human detection and pose estimation in real-world scenes.
Abstract: State-of-the-art methods for human detection and pose estimation require many training samples for best performance. While large, manually collected datasets exist, the captured variations w.r.t. appearance, shape and pose are often uncontrolled thus limiting the overall performance. In order to overcome this limitation we propose a new technique to extend an existing training set that allows us to explicitly control pose and shape variations. For this we build on recent advances in computer graphics to generate samples with realistic appearance and background while modifying body shape and pose. We validate the effectiveness of our approach on the task of articulated human detection and articulated pose estimation. We report close to state-of-the-art results on the popular Image Parsing [25] human pose estimation benchmark and demonstrate superior performance for articulated human detection. In addition we define a new challenge of combined articulated human detection and pose estimation in real-world scenes.

Journal ArticleDOI
TL;DR: This article proposes a distributed technique to perform materialization under the RDFS and OWL ter Horst semantics using the MapReduce programming model and shows that it scales linearly and vastly outperforms current systems in terms of maximum data size and inference speed.

Proceedings ArticleDOI
Jung-Won Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee
25 Jun 2012
TL;DR: It is shown that the original OpenCL semantics naturally fits to the heterogeneous cluster programming environment, and the framework achieves high performance and ease of programming.
Abstract: In this paper, we propose SnuCL, an OpenCL framework for heterogeneous CPU/GPU clusters. We show that the original OpenCL semantics naturally fits to the heterogeneous cluster programming environment, and the framework achieves high performance and ease of programming. The target cluster architecture consists of a designated, single host node and many compute nodes. They are connected by an interconnection network, such as Gigabit Ethernet and InfiniBand switches. Each compute node is equipped with multicore CPUs and multiple GPUs. A set of CPU cores or each GPU becomes an OpenCL compute device. The host node executes the host program in an OpenCL application. SnuCL provides a system image running a single operating system instance for heterogeneous CPU/GPU clusters to the user. It allows the application to utilize compute devices in a compute node as if they were in the host node. No communication API, such as the MPI library, is required in the application source. SnuCL also provides collective communication extensions to OpenCL to facilitate manipulating memory objects. With SnuCL, an OpenCL application becomes portable not only between heterogeneous devices in a single node, but also between compute devices in the cluster environment. We implement SnuCL and evaluate its performance using eleven OpenCL benchmark applications.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper proposes a fusion algorithm which outputs enhanced metrics by combining multiple given metrics (similarity measures) through a diffusion process in an unsupervised way and has a wide range of applications in machine learning and computer vision.
Abstract: Metric learning is a fundamental problem in computer vision. Different features and algorithms may tackle a problem from different angles, and thus often provide complementary information. In this paper, we propose a fusion algorithm which outputs enhanced metrics by combining multiple given metrics (similarity measures). Unlike traditional co-training style algorithms where multi-view features or multiple data subsets are used for classification or regression, we focus on fusing multiple given metrics through a diffusion process in an unsupervised way. Our algorithm has its particular advantage when the input similarity matrices are the outputs from diverse algorithms. We provide both theoretical and empirical explanations to our method. Significant improvements over the state-of-the-art results have been observed on various benchmark datasets. For example, we have achieved 100% accuracy (no longer the bull's eye measure) on the MPEG-7 shape dataset. Our method has a wide range of applications in machine learning and computer vision.
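A rough sketch of the general idea, fusing similarity matrices by a cross-diffusion process in which each matrix is repeatedly propagated through the other, is given below. The normalization, iteration count, and final averaging are assumptions and should not be read as the paper's exact update rule.

```python
# Hedged sketch of fusing two similarity matrices by cross-diffusion: each matrix is
# diffused through the other and the results are averaged. Normalization, iteration
# count and the final fusion are assumptions, not the paper's exact formulation.
import numpy as np

def row_normalize(W):
    return W / W.sum(axis=1, keepdims=True)

def cross_diffuse(W1, W2, iterations=10):
    S1, S2 = row_normalize(W1), row_normalize(W2)
    P1, P2 = S1.copy(), S2.copy()
    for _ in range(iterations):
        P1_next = S1 @ P2 @ S1.T     # diffuse metric 1 through metric 2
        P2_next = S2 @ P1 @ S2.T     # diffuse metric 2 through metric 1
        P1, P2 = P1_next, P2_next
    return (P1 + P2) / 2.0

# toy example: two noisy similarity matrices over the same 6 items
rng = np.random.default_rng(0)
base = np.eye(6) + 0.3 * np.ones((6, 6))
W1 = base + 0.05 * rng.random((6, 6))
W2 = base + 0.05 * rng.random((6, 6))
W1, W2 = (W1 + W1.T) / 2, (W2 + W2.T) / 2   # keep them symmetric
print(np.round(cross_diffuse(W1, W2), 3))
```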

Journal ArticleDOI
TL;DR: The Mario AI benchmark is described, a game-based benchmark for reinforcement learning algorithms and game AI techniques developed by the authors, intended as the definitive point of reference for those using the benchmark for research or teaching.
Abstract: This paper describes the Mario AI benchmark, a game-based benchmark for reinforcement learning algorithms and game AI techniques developed by the authors. The benchmark is based on a public domain clone of Nintendo's classic platform game Super Mario Bros, and completely open source. During the last two years, the benchmark has been used in a number of competitions associated with international conferences, and researchers and students from around the world have contributed diverse solutions to try to beat the benchmark. The paper summarizes these contributions, gives an overview of the state of the art in Mario-playing AIs, and chronicles the development of the benchmark. This paper is intended as the definitive point of reference for those using the benchmark for research or teaching.

Journal ArticleDOI
TL;DR: This paper compares the performance of RCCRO with a large number of optimization techniques on a large set of standard continuous benchmark functions and finds that RCCRO outperforms all the others on the average, showing that CRO is suitable for solving problems in the continuous domain.
Abstract: Optimization problems can generally be classified as continuous and discrete, based on the nature of the solution space. A recently developed chemical-reaction-inspired metaheuristic, called chemical reaction optimization (CRO), has been shown to perform well in many optimization problems in the discrete domain. This paper is dedicated to proposing a real-coded version of CRO, namely, RCCRO, to solve continuous optimization problems. We compare the performance of RCCRO with a large number of optimization techniques on a large set of standard continuous benchmark functions. We find that RCCRO outperforms all the others on the average. We also propose an adaptive scheme for RCCRO which can improve the performance effectively. This shows that CRO is suitable for solving problems in the continuous domain.

Journal ArticleDOI
TL;DR: In this article, an algorithm that can tackle time dependent vehicle routing problems with hard or soft time windows without any alteration in its structure is presented, and experimental results indicate that average computational time increases proportionally to the number of customers squared.
Abstract: An algorithm that can tackle time dependent vehicle routing problems with hard or soft time windows without any alteration in its structure is presented. Analytical and experimental results indicate that average computational time increases proportionally to the number of customers squared. New replicable test problems that capture the typical speed variations of congested urban settings are proposed. Solution quality, time window perturbations, and computational time results are discussed as well as a method to study the impact of perturbations by problem type. The algorithm's efficiency and simplicity are well suited for urban areas where fast running times may be required.

Journal ArticleDOI
TL;DR: A Variable Neighborhood Search (VNS) procedure based on the idea of exploring, most of the time, granular instead of complete neighborhoods in order to improve the algorithm’s efficiency without losing effectiveness is proposed.

Journal ArticleDOI
TL;DR: The proposed hybrid algorithm is composed by an Iterated Local Search (ILS) based heuristic and a Set Partitioning (SP) formulation, which is solved by means of a Mixed Integer Programming solver that interactively calls the ILS heuristic during its execution.

Journal ArticleDOI
TL;DR: This paper aims at providing a picture – as complete as possible – of the present state of the art in the semi-active suspension control field in terms of comfort and road-holding performance evaluation and trade-off.

Book ChapterDOI
11 Nov 2012
TL;DR: SRBench is introduced, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud, which defines a concise, yet comprehensive set of queries that cover the major aspects of strRS processing.
Abstract: We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud. With the increasing problem of too much streaming data but not enough tools to gain knowledge from them, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the abilities of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen, such that they represent a realistic and relevant usage of streaming data. The benchmark defines a concise, yet comprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation on three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state-of-the-art.
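For readers unfamiliar with streaming RDF/SPARQL, the sketch below imitates, in plain Python, the kind of sliding-window selection over timestamped triples that engines such as SPARQLStream, C-SPARQL and CQELS evaluate declaratively. The triples and the window width are invented for illustration; this is not SRBench query syntax.

```python
# Hedged sketch: a sliding-window selection over a stream of timestamped RDF-style
# triples. The triples and the 10-second window are illustrative assumptions.
from collections import deque

class SlidingWindow:
    def __init__(self, width_seconds):
        self.width = width_seconds
        self.buffer = deque()            # (timestamp, subject, predicate, object)

    def push(self, timestamp, s, p, o):
        self.buffer.append((timestamp, s, p, o))
        while self.buffer and self.buffer[0][0] <= timestamp - self.width:
            self.buffer.popleft()        # expire triples that fell out of the window

    def match(self, predicate):
        """Return (subject, object) pairs for triples in the window with this predicate."""
        return [(s, o) for (_, s, p, o) in self.buffer if p == predicate]

window = SlidingWindow(width_seconds=10)
stream = [
    (1, "station/42", "hasWindSpeed", 7.2),
    (4, "station/17", "hasWindSpeed", 12.5),
    (9, "station/42", "hasTemperature", 18.0),
    (14, "station/42", "hasWindSpeed", 9.1),
]
for t, s, p, o in stream:
    window.push(t, s, p, o)
    print(f"t={t}: wind readings in last 10s -> {window.match('hasWindSpeed')}")
```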

Proceedings ArticleDOI
11 Jun 2012
TL;DR: This paper presents a dynamic program analysis that supports the programmer in finding accuracy problems; it uses binary translation to perform every floating-point computation side by side in higher precision, and a lightweight slicing approach to track the evolution of errors.
Abstract: Programs using floating-point arithmetic are prone to accuracy problems caused by rounding and catastrophic cancellation. These phenomena provoke bugs that are notoriously hard to track down: the program does not necessarily crash and the results are not necessarily obviously wrong, but often subtly inaccurate. Further use of these values can lead to catastrophic errors. In this paper, we present a dynamic program analysis that supports the programmer in finding accuracy problems. Our analysis uses binary translation to perform every floating-point computation side by side in higher precision. Furthermore, we use a lightweight slicing approach to track the evolution of errors. We evaluate our analysis by demonstrating that it catches well-known floating-point accuracy problems and by analyzing the SPEC CFP2006 floating-point benchmark. In the latter, we show how our tool tracks down a catastrophic cancellation that causes a complete loss of accuracy leading to a meaningless program result. Finally, we apply our program to a complex, real-world bioinformatics application in which our program detected a serious cancellation. Correcting the instability led not only to improved quality of the result, but also to an improvement of the program's run time.
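The core mechanism, executing every floating-point operation both natively and in higher precision and flagging large relative divergence, can be illustrated with a small shadow-value class. The sketch uses Python's decimal module as the shadow precision; the actual tool works on binaries via binary translation, so this is only a conceptual analogue with an assumed error threshold.

```python
# Conceptual analogue of the shadow-value idea: each value carries both its native
# double-precision result and a high-precision "shadow"; arithmetic is done on both,
# and a large relative divergence is reported. Threshold and precision are assumptions.
from decimal import Decimal, getcontext

getcontext().prec = 50

class Shadow:
    def __init__(self, value, shadow=None):
        self.v = float(value)                                   # native double
        self.s = Decimal(value) if shadow is None else shadow   # high precision

    def _check(self, result, label):
        if result.s != 0:
            rel = abs((Decimal(result.v) - result.s) / result.s)
            if rel > Decimal("1e-6"):
                print(f"[warn] {label}: relative error {rel:.2E}")
        return result

    def __add__(self, other):
        return self._check(Shadow(self.v + other.v, self.s + other.s), "add")

    def __sub__(self, other):
        return self._check(Shadow(self.v - other.v, self.s - other.s), "sub")

# classic catastrophic cancellation: (1 + eps) - 1 with eps below double precision
eps = Shadow("1e-20")
one = Shadow(1)
result = (one + eps) - one
print("native:", result.v, " shadow:", result.s)
```

Running this prints a warning on the subtraction: the native result is exactly 0.0 while the shadow retains 1e-20, which is precisely the kind of silent accuracy loss the analysis is designed to surface.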

Proceedings ArticleDOI
24 Jun 2012
TL;DR: This work presents the SmartScale automated scaling framework, a combination of vertical and horizontal scaling that ensures that the application is scaled in a manner that optimizes both resource usage and the reconfiguration cost incurred due to scaling.
Abstract: Enterprise clouds today support an on demand resource allocation model and can provide resources requested by applications in a near online manner using virtual machine resizing or cloning. However, in order to take advantage of an on demand resource model, enterprise applications need to be automatically scaled in a way that makes the most efficient use of resources. In this work, we present the SmartScale automated scaling framework. SmartScale uses a combination of vertical (adding more resources to existing VM instances) and horizontal (adding more VM instances) scaling to ensure that the application is scaled in a manner that optimizes both resource usage and the reconfiguration cost incurred due to scaling. The SmartScale methodology is proactive and ensures that the application converges quickly to the desired scaling level even when the workload intensity changes significantly. We evaluate SmartScale using real production traces on Olio, an emerging cloud benchmark, running on a ???-based cloud testbed. We present both theoretical and experimental evidence that comprehensively establish the effectiveness of SmartScale.
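The trade-off SmartScale optimizes, meeting demand while balancing resource usage against the reconfiguration cost of scaling, can be shown with a toy search over (instance count, instance size) configurations. The cost model and constants below are invented for illustration and are not the paper's formulation.

```python
# Toy illustration of combined vertical + horizontal scaling: pick the number of VM
# instances and the per-instance size that meet demand while balancing resource usage
# against reconfiguration cost. The cost model and constants are invented assumptions.
from itertools import product

def scaling_decision(current_instances, current_size, demand,
                     capacity_per_unit=100, migration_cost=5.0, resize_cost=2.0):
    best, best_cost = None, float("inf")
    for instances, size in product(range(1, 11), range(1, 9)):   # candidate configs
        if instances * size * capacity_per_unit < demand:
            continue                                             # cannot serve demand
        resource_cost = instances * size                         # resource usage
        reconfig_cost = (migration_cost * abs(instances - current_instances)
                         + resize_cost * abs(size - current_size))
        cost = resource_cost + reconfig_cost
        if cost < best_cost:
            best, best_cost = (instances, size), cost
    return best, best_cost

config, cost = scaling_decision(current_instances=2, current_size=2, demand=900)
print("scale to (instances, size):", config, "with cost", cost)
```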

Proceedings ArticleDOI
17 Sep 2012
TL;DR: The PRISM benchmark suite is presented: a collection of probabilistic models and property specifications, designed to facilitate testing, benchmarking and comparisons of Probabilistic verification tools and implementations.
Abstract: We present the PRISM benchmark suite: a collection of probabilistic models and property specifications, designed to facilitate testing, benchmarking and comparisons of probabilistic verification tools and implementations.

Journal ArticleDOI
01 Feb 2012
TL;DR: SharedDB as mentioned in this paper is a new database architecture based on batching queries and shared computation across possibly hundreds of concurrent queries and updates, and is shown to be robust across a wide range of dynamic workloads.
Abstract: Traditional database systems are built around the query-at-a-time model. This approach tries to optimize performance in a best-effort way. Unfortunately, best effort is not good enough for many modern applications. These applications require response time guarantees in high load situations. This paper describes the design of a new database architecture that is based on batching queries and shared computation across possibly hundreds of concurrent queries and updates. Performance experiments with the TPC-W benchmark show that the performance of our implementation, SharedDB, is indeed robust across a wide range of dynamic workloads.
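The batching-and-sharing idea can be sketched compactly: queries that arrive in the same batch are answered from a single shared pass over the data instead of one scan per query. The toy table, query shapes, and batching policy below are illustrative assumptions, not SharedDB's actual execution model.

```python
# Conceptual sketch of batched, shared query processing: queries that arrive in the
# same batch are answered from one shared scan of the table instead of one scan each.
TABLE = [  # toy "orders" table
    {"id": 1, "customer": "alice", "total": 120.0},
    {"id": 2, "customer": "bob",   "total": 35.5},
    {"id": 3, "customer": "alice", "total": 78.0},
    {"id": 4, "customer": "carol", "total": 220.0},
]

def shared_scan(batch):
    """batch: list of (query_id, predicate) pairs; one pass over TABLE serves all."""
    results = {qid: [] for qid, _ in batch}
    for row in TABLE:                      # single shared scan
        for qid, predicate in batch:       # route each row to every matching query
            if predicate(row):
                results[qid].append(row)
    return results

# three concurrent queries answered by one scan
batch = [
    ("q1", lambda r: r["customer"] == "alice"),
    ("q2", lambda r: r["total"] > 100),
    ("q3", lambda r: r["id"] == 2),
]
for qid, rows in shared_scan(batch).items():
    print(qid, "->", [r["id"] for r in rows])
```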

Posted Content
TL;DR: This paper describes the design of a new database architecture that is based on batching queries and shared computation across possibly hundreds of concurrent queries and updates, and shows that the implementation, SharedDB, is robust across a wide range of dynamic workloads.
Abstract: Traditional database systems are built around the query-at-a-time model. This approach tries to optimize performance in a best-effort way. Unfortunately, best effort is not good enough for many modern applications. These applications require response time guarantees in high load situations. This paper describes the design of a new database architecture that is based on batching queries and shared computation across possibly hundreds of concurrent queries and updates. Performance experiments with the TPC-W benchmark show that the performance of our implementation, SharedDB, is indeed robust across a wide range of dynamic workloads.

Journal ArticleDOI
TL;DR: This paper proposes an analysis and comparison of state-of-the-art algorithms for full search equivalent pattern matching and proposes extensions of the evaluated algorithms that show that they outperform the original formulations.
Abstract: Pattern matching is widely used in signal processing, computer vision, and image and video processing. Full search equivalent algorithms accelerate the pattern matching process and, in the meantime, yield exactly the same result as the full search. This paper proposes an analysis and comparison of state-of-the-art algorithms for full search equivalent pattern matching. Our intention is that the data sets and tests used in our evaluation will be a benchmark for testing future pattern matching algorithms, and that the analysis concerning state-of-the-art algorithms could inspire new fast algorithms. We also propose extensions of the evaluated algorithms and show that they outperform the original formulations.
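A full-search-equivalent algorithm must return exactly the same best match as exhaustive search, only faster. The brute-force SSD reference below is that baseline; pruning-based equivalents may skip candidates using bounds but must not change its argmin. Toy data, no claim about the paper's code.

```python
# Reference full-search template matching by sum of squared differences (SSD).
# Full-search-equivalent algorithms must return exactly this argmin, only faster
# (e.g. by pruning candidates with bounds).
import numpy as np

def full_search_ssd(image, template):
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = np.inf, None
    for y in range(ih - th + 1):           # every candidate position is evaluated
        for x in range(iw - tw + 1):
            window = image[y:y + th, x:x + tw]
            ssd = np.sum((window - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best

rng = np.random.default_rng(0)
image = rng.random((64, 64))
template = image[20:28, 33:41].copy()      # plant the template so the answer is known
pos, score = full_search_ssd(image, template)
print("best match at", pos, "with SSD", score)   # expect (20, 33) with SSD 0.0
```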

Journal ArticleDOI
TL;DR: In this paper, a new heuristic random search algorithm named the state transition algorithm is proposed; for continuous function optimization problems, four special transformation operators called rotation, translation, expansion and axesion are designed.
Abstract: In terms of the concepts of state and state transition, a new heuristic random search algorithm named the state transition algorithm is proposed. For continuous function optimization problems, four special transformation operators called rotation, translation, expansion and axesion are designed. Adjusting measures of the transformations are mainly studied to keep the balance of exploration and exploitation. Convergence analysis of the algorithm is also discussed based on random search theory. Meanwhile, to strengthen the search ability in high-dimensional space, a communication strategy is introduced into the basic algorithm and intermittent exchange is presented to prevent premature convergence. Finally, experiments are carried out on 10 common unconstrained continuous benchmark functions; the results show that state transition algorithms are promising due to their good global search capability and convergence property when compared with some popular algorithms.
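The four operators can be written down compactly. The sketch below follows the general form usually given for the state transition algorithm: rotation within a hypersphere around the current state, translation along the previous search direction, and expansion and axesion as random stretches of all or one coordinate. The exact scaling factors, random-matrix definitions, and parameter values are assumptions rather than a verbatim transcription of the paper.

```python
# Hedged sketch of the four state transition operators (rotation, translation,
# expansion, axesion) in their commonly stated general form. Scaling constants,
# random-matrix definitions and parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def rotation(x, alpha=1.0):
    """Search within a hypersphere of radius alpha around x."""
    n = len(x)
    R = rng.uniform(-1, 1, size=(n, n))
    return x + alpha * (R @ x) / (n * np.linalg.norm(x) + 1e-12)

def translation(x, x_prev, beta=1.0):
    """Line search along the direction from the previous state to the current one."""
    d = x - x_prev
    return x + beta * rng.uniform(0, 1) * d / (np.linalg.norm(d) + 1e-12)

def expansion(x, gamma=1.0):
    """Stretch every coordinate by an independent Gaussian factor (global search)."""
    return x + gamma * rng.normal(size=len(x)) * x

def axesion(x, delta=1.0):
    """Stretch a single, randomly chosen coordinate (search along one axis)."""
    step = np.zeros(len(x))
    i = rng.integers(len(x))
    step[i] = delta * rng.normal() * x[i]
    return x + step

x_prev = np.array([2.0, -1.0, 0.5])
x = np.array([1.5, -0.8, 0.7])
print(np.round(rotation(x), 3))
print(np.round(translation(x, x_prev), 3))
print(np.round(expansion(x), 3))
print(np.round(axesion(x), 3))
```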

Proceedings ArticleDOI
05 Sep 2012
TL;DR: This work introduces an open benchmark dataset to investigate inertial sensor displacement effects in activity recognition, and introduces a concept of gradual sensor displacement conditions, including ideal, self-placement of a user, and mutual displacement deployments.
Abstract: This work introduces an open benchmark dataset to investigate inertial sensor displacement effects in activity recognition. While sensor position displacements such as rotations and translations have been recognised as a key limitation for the deployment of wearable systems, a realistic dataset is lacking. We introduce a concept of gradual sensor displacement conditions, including ideal, self-placement of a user, and mutual displacement deployments. These conditions were analysed in the dataset considering 33 fitness activities, recorded using 9 inertial sensor units from 17 participants. Our statistical analysis of acceleration features quantified relative effects of the displacement conditions. We expect that the dataset can be used to benchmark and compare recognition algorithms in the future.

Journal ArticleDOI
Ling Wang, Shengyao Wang, Ye Xu, Gang Zhou, Min Liu
TL;DR: Comparisons between BEDA and some existing algorithms as well as the single-population based EDA demonstrate the effectiveness of the proposed BEDA in solving the FJSP.