Showing papers on "Software portability" published in 2012

Book

William E. Hart¹, Carl D. Laird¹, Jean-Paul Watson², David L. Woodruff³•Institutions (3)

Texas A&M University¹, Sandia National Laboratories², University of California, Davis³

15 Feb 2012

TL;DR: This book provides a complete and comprehensive reference/guide to Pyomo (Python Optimization Modeling Objects) for both beginning and advanced modelers, including students at the undergraduate and graduate levels, academic researchers, and practitioners.

Abstract: This book provides a complete and comprehensive reference/guide to Pyomo (Python Optimization Modeling Objects) for both beginning and advanced modelers, including students at the undergraduate and graduate levels, academic researchers, and practitioners. The text illustrates the breadth of the modeling and analysis capabilities that are supported by the software and support of complex real-world applications. Pyomo is an open source software package for formulating and solving large-scale optimization and operations research problems. The text begins with a tutorial on simple linear and integer programming models. A detailed reference of Pyomo's modeling components is illustrated with extensive examples, including a discussion of how to load data from data sources like spreadsheets and databases. Chapters describing advanced modeling capabilities for nonlinear and stochastic optimization are also included. The Pyomo software provides familiar modeling features within Python, a powerful dynamic programming language that has a very clear, readable syntax and intuitive object orientation. Pyomo includes Python classes for defining sparse sets, parameters, and variables, which can be used to formulate algebraic expressions that define objectives and constraints. Moreover, Pyomo can be used from a command-line interface and within Python's interactive command environment, which makes it easy to create Pyomo models, apply a variety of optimizers, and examine solutions. The software supports a different modeling approach than commercial AML (Algebraic Modeling Languages) tools, and is designed for flexibility, extensibility, portability, and maintainability but also maintains the central ideas in modern AMLs.

683 citations

Journal Article

Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge

Efraim Rotem¹, Alon Naveh¹, Doron Rajwan¹, Avinash N. Ananthakrishnan¹, Eliezer Weissmann¹•Institutions (1)

Intel¹

01 Mar 2012-IEEE Micro

TL;DR: This article describes power-management innovations introduced on Intel's Sandy Bridge microprocessor, and argues that an architectural approach that is adaptive to and cognizant of workload behavior and platform physical constraints is indispensable to meeting performance and efficiency goals.

Abstract: Modern microprocessors are evolving into system-on-a-chip designs with high integration levels, catering to ever-shrinking form factors. Portability without compromising performance is a driving market need. An architectural approach that's adaptive to and cognizant of workload behavior and platform physical constraints is indispensable to meeting these performance and efficiency goals. This article describes power-management innovations introduced on Intel's Sandy Bridge microprocessor.

452 citations

Journal Article

Arduino: a low-cost multipurpose lab equipment.

Alessandro D'Ausilio¹•Institutions (1)

Istituto Italiano di Tecnologia¹

01 Jun 2012-Behavior Research Methods

TL;DR: Accuracy tests on a low-cost, open-source I/O board (Arduino family) show that Arduino boards may be an inexpensive tool for many psychological and neurophysiological labs.

Abstract: Typical experiments in psychological and neurophysiological settings often require the accurate control of multiple input and output signals. These signals are often generated or recorded via computer software and/or external dedicated hardware. Dedicated hardware is usually very expensive and requires additional software to control its behavior. In the present article, I present some accuracy tests on a low-cost and open-source I/O board (Arduino family) that may be useful in many lab environments. One of the strengths of Arduinos is the possibility they afford to load the experimental script on the board’s memory and let it run without interfacing with computers or external software, thus granting complete independence, portability, and accuracy. Furthermore, a large community has arisen around the Arduino idea and offers many hardware add-ons and hundreds of free scripts for different projects. Accuracy tests show that Arduino boards may be an inexpensive tool for many psychological and neurophysiological labs.

329 citations

Journal Article

Decoupling algorithms from schedules for easy optimization of image processing pipelines

Jonathan Ragan-Kelley¹, Andrew Adams¹, Sylvain Paris², Marc Levoy³, Saman Amarasinghe¹, Frédo Durand¹•Institutions (3)

Massachusetts Institute of Technology¹, Adobe Systems², Stanford University³

01 Jul 2012

TL;DR: This work proposes a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high-performance without sacrificing code clarity, and demonstrates the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide and compiling them for ARM, x86, and GPUs.

Abstract: Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism. We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code. We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.
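
Illustration: the separation of algorithm and schedule described in this abstract can be sketched in plain Python. This is not Halide's API; the names `DATA`, `image`, and `run`, and the two schedule labels, are hypothetical and chosen only for this example. The two stages are pure functions over integer coordinates, and the only thing that changes between schedules is whether first-stage values are recomputed at each use or stored.

```python
from functools import lru_cache

# Hypothetical 1-D input "image"; coordinates are clamped so the function
# is defined for every integer and callers need no boundary handling.
DATA = [3, 1, 4, 1, 5, 9, 2, 6]

def image(x):
    return DATA[min(max(x, 0), len(DATA) - 1)]

def run(schedule):
    """Evaluate a two-stage 3-tap sum pipeline under the given schedule.

    schedule: "recompute" re-evaluates stage 1 at every use;
              "store" memoizes stage 1 values.
    Returns (output values, number of stage-1 evaluations).
    """
    calls = 0

    # Algorithm, stage 1: defined purely in terms of the coordinate x.
    def stage1(x):
        nonlocal calls
        calls += 1
        return image(x - 1) + image(x) + image(x + 1)

    # Schedule decision: recomputation vs. storage of stage 1.
    producer = lru_cache(maxsize=None)(stage1) if schedule == "store" else stage1

    # Algorithm, stage 2: identical under both schedules.
    def stage2(x):
        return producer(x - 1) + producer(x) + producer(x + 1)

    output = [stage2(x) for x in range(len(DATA))]
    return output, calls

out_recompute, n_recompute = run("recompute")
out_store, n_store = run("store")
print(out_recompute == out_store, n_recompute, n_store)
```

Both schedules produce the same output; they differ only in how much first-stage work is done, which is the kind of trade-off the abstract says programmers can explore without touching the algorithmic code.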

256 citations

Journal Article

Portable Cloud Services Using TOSCA

Tobias Binz¹, Gerd Breiter², F. Leymann¹, Thomas Spatzier²•Institutions (2)

University of Stuttgart¹, IBM²

01 May 2012-IEEE Internet Computing

TL;DR: The authors show how plans in the Topology and Orchestration Specification for Cloud Applications (TOSCA) can enable portability of a cloud service's operational (management) aspects, alongside the application components themselves.

Abstract: For cloud services to be portable, their management must also be portable to the targeted environment, as must the application components themselves. Here, the authors show how plans in the Topology and Orchestration Specification for Cloud Applications (TOSCA) can enable portability of these operational aspects.

233 citations

Proceedings Article

The state of the art of mobile application usability evaluation

Fatih Nayebi¹, Jean-Marc Desharnais¹, Alain Abran¹•Institutions (1)

École de technologie supérieure¹

22 Oct 2012

TL;DR: This paper presents the state of the art of the evaluation and measurement of mobile application usability, surveying prior studies that defined mobile application usability and proposed methods to evaluate it.

Abstract: Mobile devices and applications provide significant advantages to their users, in terms of portability, location awareness, and accessibility. A number of studies have examined usability challenges in the mobile context, and proposed definitions of mobile application usability and methods to evaluate it. This paper presents the state of the art of the evaluation and measurement of mobile application usability.

224 citations

Proceedings Article

MODAClouds: a model-driven approach for the design and execution of applications on multiple clouds

Danilo Ardagna¹, Elisabetta Di Nitto¹, Giuliano Casale², Dana Petcu, Parastoo Mohagheghi³, Sébastien Mosser³, Peter Matthews, Anke Gericke, Cyril Ballagny⁴, Francesco D'Andria⁵, Cosmin-Septimiu Nechifor⁶, Craig Sheridan•Institutions (6)

Polytechnic University of Milan¹, Imperial College London², SINTEF³, Softeam⁴, Atos⁵, Siemens⁶

02 Jun 2012

TL;DR: It is argued that Model-Driven Development can be helpful in this context as it would allow developers to design software systems in a cloud-agnostic way and to be supported by model transformation techniques into the process of instantiating the system into specific, possibly, multiple Clouds.

Abstract: Cloud computing is emerging as a major trend in the ICT industry. While most of the attention of the research community is focused on considering the perspective of the Cloud providers, offering mechanisms to support scaling of resources and interoperability and federation between Clouds, the perspective of developers and operators willing to choose the Cloud without being strictly bound to a specific solution is mostly neglected. We argue that Model-Driven Development can be helpful in this context as it would allow developers to design software systems in a cloud-agnostic way and to be supported by model transformation techniques into the process of instantiating the system into specific, possibly, multiple Clouds. The MODAClouds (MOdel-Driven Approach for the design and execution of applications on multiple Clouds) approach we present here is based on these principles and aims at supporting system developers and operators in exploiting multiple Clouds for the same system and in migrating (part of) their systems from Cloud to Cloud as needed. MODAClouds offers a quality-driven design, development and operation method and features a Decision Support System to enable risk analysis for the selection of Cloud providers and for the evaluation of the Cloud adoption impact on internal business processes. Furthermore, MODAClouds offers a run-time environment for observing the system under execution and for enabling a feedback loop with the design environment. This allows system developers to react to performance fluctuations and to re-deploy applications on different Clouds on the long term.

223 citations

Proceedings Article

Brief announcement: the problem based benchmark suite

Julian Shun¹, Guy E. Blelloch¹, Jeremy T. Fineman², Phillip B. Gibbons³, Aapo Kyrola¹, Harsha Vardhan Simhadri¹, Kanat Tangwongsan¹•Institutions (3)

Carnegie Mellon University¹, Georgetown University², Intel³

25 Jun 2012

TL;DR: This announcement describes the problem based benchmark suite (PBBS), a set of benchmarks designed for comparing parallel algorithmic approaches, parallel programming language styles, and machine architectures across a broad set of problems.

Abstract: This announcement describes the problem based benchmark suite (PBBS). PBBS is a set of benchmarks designed for comparing parallel algorithmic approaches, parallel programming language styles, and machine architectures across a broad set of problems. Each benchmark is defined concretely in terms of a problem specification and a set of input distributions. No requirements are made in terms of algorithmic approach, programming language, or machine architecture. The goal of the benchmarks is not only to compare runtimes, but also to be able to compare code and other aspects of an implementation (e.g., portability, robustness, determinism, and generality). As such the code for an implementation of a benchmark is as important as its runtime, and the public PBBS repository will include both code and performance results. The benchmarks are designed to make it easy for others to try their own implementations, or to add new benchmark problems. Each benchmark problem includes the problem specification, the specification of input and output file formats, default input generators, test codes that check the correctness of the output for a given input, driver code that can be linked with implementations, a baseline sequential implementation, a baseline multicore implementation, and scripts for running timings (and checks) and outputting the results in a standard format. The current suite includes the following problems: integer sort, comparison sort, remove duplicates, dictionary, breadth first search, spanning forest, minimum spanning forest, maximal independent set, maximal matching, K-nearest neighbors, Delaunay triangulation, convex hull, suffix arrays, n-body, and ray casting. For each problem, we report the performance of our baseline multicore implementation on a 40-core machine.

196 citations

Proceedings Article

Go Ahead: A Partial Reconfiguration Framework

Christian Beckhoff¹, Dirk Koch¹, Jim Torresen¹•Institutions (1)

University of Oslo¹

29 Apr 2012

TL;DR: The tool Go Ahead is introduced that is able to implement run-time reconfigurable systems for all recent Xilinx FPGAs and provides a scripting interface and all features can be accessed remotely.

Abstract: Exploiting the benefits of partial run-time reconfiguration requires efficient tools. In this paper, we introduce the tool Go Ahead that is able to implement run-time reconfigurable systems for all recent Xilinx FPGAs. This includes in particular support for low cost and low power Spartan-6 FPGAs. Go Ahead assists during floor planning and automates the constraint generation. It interacts with the Xilinx vendor tools and triggers the physical implementation phases all the way down to the final configuration bit streams. Go Ahead enables the building of flexible systems for integrating many reconfigurable modules very efficiently into a system. The tool targets (re)usability, portability to future devices, and migration paths among reconfigurable systems featuring different FPGAs or even FPGA families. Moreover, it provides a scripting interface and all features can be accessed remotely.

138 citations

Proceedings Article

Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs

Dan Han¹, Chenlei Zhang¹, Xiaochao Fan¹, Abram Hindle¹, Kenny Wong¹, Eleni Stroulia¹•Institutions (1)

University of Alberta¹

15 Oct 2012

TL;DR: This paper examines how fragmentation is manifested within the Android project, proposes a method for tracking fragmentation using feature analysis on project repositories, and finds that Labeled-LDA produces better, i.e., more feature-oriented, topics than LDA.

Abstract: The fragmentation of the Android ecosystem causes portability and compatibility issues within the entire Android platform, which increases developer workload, delays application deployment, and ultimately disappoints users. This subject is discussed in the press and in scientific publications but it has yet to be systematically examined. The Android bug reports, as submitted by Android-device users, span across operating-system versions and hardware platforms and can provide interesting evidence about the problem. In this paper, we analyze the bug reports related to two popular vendors, HTC and Motorola. First, we manually label the bug reports. Next, we use Labeled-LDA (Latent Dirichlet Allocation) on the labeled data and LDA on the original data, to infer topics. Finally, by examining the relevance of the top 18 bug topics for each vendor's bug reports over time, we classify topics as common or unique (vendor-specific). The latter category constitutes evidence of fragmentation and lack of portability. By comparing Labeled-LDA against LDA, we find that Labeled-LDA produced better, i.e., more feature oriented, topics than LDA. In this paper we find out how fragmentation is manifested within the Android project and we propose a method for tracking fragmentation using feature analysis on project repositories.

125 citations

Proceedings Article

NeuFlow: Dataflow vision processing system-on-a-chip

Phi-Hung Pham¹, Darko Jelaca¹, Clement Farabet², Berin Martini¹, Yann LeCun², Eugenio Culurciello¹•Institutions (2)

Purdue University¹, New York University²

05 Sep 2012

TL;DR: The neuFlow SoC was designed to accelerate neural networks and other complex vision algorithms based on large numbers of convolutions and matrix-to-matrix operations and post-layout characterization shows that the system delivers up to 320 GOPS with an average power consumption of 0.6 W.

Abstract: This paper presents a bio-inspired vision system-on-a-chip, the neuFlow SoC, implemented in the IBM 45 nm SOI process. The neuFlow SoC was designed to accelerate neural networks and other complex vision algorithms based on large numbers of convolutions and matrix-to-matrix operations. Post-layout characterization shows that the system delivers up to 320 GOPS with an average power consumption of 0.6 W. The power efficiency and portability of this system make it well suited to embedded vision-based devices, such as driver assistance and robotic vision.

Journal Article

Model-driven engineering techniques for the development of multi-agent systems

José Manuel Gascueña, Elena Navarro¹, Antonio Fernández-Caballero¹•Institutions (1)

University of Castilla–La Mancha¹

01 Feb 2012-Engineering Applications of Artificial Intelligence

TL;DR: In this paper, agent-oriented software development (AOSD) and MDE paradigms are fully integrated for the development of MAS and meta-modeling techniques are explicitly used to speed up several phases of the process.

Proceedings Article

OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures

Gihan R. Mudalige¹, Michael B. Giles¹, Istvan Z. Reguly², Carlo Bertolli³, Paul H. J. Kelly³•Institutions (3)

University of Oxford¹, Pázmány Péter Catholic University², Imperial College London³

13 May 2012

TL;DR: It is demonstrated that an application written once at a high-level using the OP2 API can be easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.

Abstract: OP2 is an “active” library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into different parallel implementations for execution on different back-end hardware platforms. In this paper we present the design of the current OP2 library, and investigate its capabilities in achieving performance portability, near-optimal performance, and scaling on modern multi-core and many-core processor based systems. A key feature of this work is OP2's recent extension facilitating the development and execution of applications on a distributed memory cluster of GPUs. We discuss the main design issues in parallelizing unstructured mesh based applications on heterogeneous platforms. These include handling data dependencies in accessing indirectly referenced data, the impact of unstructured mesh data layouts (array of structs vs. struct of arrays) and design considerations in generating code for execution on a cluster of GPUs. A representative CFD application written using the OP2 framework is utilized to provide a contrasting benchmarking and performance analysis study on a range of multi-core/many-core systems. These include multi-core CPUs from Intel (Westmere and Sandy Bridge) and AMD (Magny-Cours), GPUs from NVIDIA (GTX560Ti, Tesla C2070), a distributed memory CPU cluster (Cray XE6) and a distributed memory GPU cluster (Tesla C2050 GPUs with InfiniBand). OP2's design choices are explored with quantitative insights into their contributions to performance. We demonstrate that an application written once at a high-level using the OP2 API can be easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.

Proceedings Article

Accelerating Hydrocodes with OpenACC, OpenCL and CUDA

J. A. Herdman, Wayne Gaudin, Simon McIntosh-Smith¹, Michael Boulton¹, David Beckingsale², A. C. Mallinson², Stephen A. Jarvis²•Institutions (2)

University of Bristol¹, University of Warwick²

10 Nov 2012

TL;DR: It is found that OpenACC is an extremely viable programming model for accelerator devices, improving programmer productivity and achieving better performance than OpenCL and CUDA.

Abstract: Hardware accelerators such as GPGPUs are becoming increasingly common in HPC platforms and their use is widely recognised as being one of the most promising approaches for reaching exascale levels of performance. Large HPC centres, such as AWE, have made huge investments in maintaining their existing scientific software codebases, the vast majority of which were not designed to effectively utilise accelerator devices. Consequently, HPC centres will have to decide how to develop their existing applications to take best advantage of future HPC system architectures. Given limited development and financial resources, it is unlikely that all potential approaches will be evaluated for each application. We are interested in how this decision making can be improved, and this work seeks to directly evaluate three candidate technologies (OpenACC, OpenCL and CUDA) in terms of performance, programmer productivity, and portability using a recently developed Lagrangian-Eulerian explicit hydrodynamics mini-application. We find that OpenACC is an extremely viable programming model for accelerator devices, improving programmer productivity and achieving better performance than OpenCL and CUDA.

Journal Article

Algorithmic skeletons for multi-core, multi-GPU systems and clusters

Steffen Ernsting¹, Herbert Kuchen¹•Institutions (1)

University of Münster¹

01 Apr 2012

TL;DR: This paper presents the skeleton library Muesli, which not only simplifies parallel programming but also allows a single application to be written once and executed on a variety of parallel machines, ranging from simple multi-core processors with shared memory to clusters of multi- and many-core processors with distributed memory, as well as multi-GPU systems and GPU clusters.

Abstract: Due to the lack of high-level abstractions, developers of parallel applications have to deal with low-level details such as coordinating threads or synchronising processes. Thus, parallel programming still remains a difficult and error-prone task. In order to shield the user from these low-level details, algorithmic skeletons have been proposed. They encapsulate typical parallel programming patterns and have emerged as an efficient approach to simplifying the development of parallel applications. In this paper, we present our skeleton library Muesli, which not only simplifies parallel programming but also allows a single application to be written once and executed on a variety of parallel machines, ranging from simple multi-core processors with shared memory to clusters of multi- and many-core processors with distributed memory, as well as multi-GPU systems and GPU clusters. This level of platform independence is not reached by other existing approaches that simplify parallel programming. Internally, the skeletons are based on MPI, OpenMP and CUDA. We demonstrate portability and efficiency of our approach by providing experimental results.
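
Illustration: the idea of an algorithmic skeleton, a reusable parallel pattern whose execution backend can change without touching application code, can be sketched in Python. This is not Muesli's interface (the abstract says Muesli builds on MPI, OpenMP and CUDA); the names `map_skeleton`, `fold_skeleton`, `sum_of_squares`, and the `backend` parameter are hypothetical, and a thread pool merely stands in for a parallel backend.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_skeleton(fn, items, backend="sequential", workers=4):
    """Apply fn to every item; the backend decides how the work is executed."""
    if backend == "threads":
        # Stand-in parallel backend: a pool of worker threads.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fn, items))  # pool.map preserves input order
    return [fn(x) for x in items]

def fold_skeleton(op, items, initial):
    """Combine all items with a binary operator (sequential reduction)."""
    return reduce(op, items, initial)

def sum_of_squares(values, backend):
    # Application code: written once against the skeletons, with no
    # thread or synchronisation details visible.
    squares = map_skeleton(lambda v: v * v, values, backend=backend)
    return fold_skeleton(lambda a, b: a + b, squares, 0)

print(sum_of_squares(range(10), "sequential"))
print(sum_of_squares(range(10), "threads"))
```

The application function is unchanged across backends; only the `backend` argument differs, which mirrors the platform independence the abstract claims for skeleton-based programs.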

Proceedings Article

PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes

Florian Rathgeber¹, Graham Markall¹, Lawrence Mitchell², Nicolas Loriant¹, David A. Ham¹, Carlo Bertolli¹, Paul H. J. Kelly¹•Institutions (2)

Imperial College London¹, University of Edinburgh²

10 Nov 2012

TL;DR: This paper presents work in progress on PyOP2, a high-level embedded domain-specific language for mesh-based simulation codes that executes numerical kernels in parallel over unstructured meshes; kernels for finite element computations can be generated automatically from equations given in the domain-specific Unified Form Language.

Abstract: Emerging many-core platforms are very difficult to program in a performance portable manner whilst achieving high efficiency on a diverse range of architectures. We present work in progress on PyOP2, a high-level embedded domain-specific language for mesh-based simulation codes that executes numerical kernels in parallel over unstructured meshes. Just-in-time kernel compilation and parallel scheduling are delayed until runtime, when problem-specific parameters are available. Using generative metaprogramming, performance portability is achieved, while details of the parallel implementation are abstracted from the programmer. PyOP2 kernels for finite element computations can be generated automatically from equations given in the domain-specific Unified Form Language. Interfacing to the multi-phase CFD code Fluidity through a very thin layer on top of PyOP2 yields a general purpose finite element solver with an input notation very close to mathematical formulae. Preliminary performance figures show speedups of up to 3.4× compared to Fluidity's built-in solvers when running in parallel.

Proceedings Article

Open data kit sensors: a sensor integration framework for android at the application-level

Waylon Brunette¹, Rita Sodt¹, Rohit Chaudhri¹, Mayank Goel¹, Michael Falcone¹, Jaylen Van Orden¹, Gaetano Borriello¹•Institutions (1)

University of Washington¹

25 Jun 2012

TL;DR: A framework to simplify the interface between a variety of external sensors and consumer Android devices is presented and three alternative architectures for application-level drivers are explored to understand trade-offs in performance, device portability, simplicity, and deployment ease.

Abstract: Smartphones can now connect to a variety of external sensors over wired and wireless channels. However, ensuring proper device interaction can be burdensome, especially when a single application needs to integrate with a number of sensors using different communication channels and data formats. This paper presents a framework to simplify the interface between a variety of external sensors and consumer Android devices. The framework simplifies both application and driver development with abstractions that separate responsibilities between the user application, sensor framework, and device driver. These abstractions facilitate a componentized framework that allows developers to focus on writing minimal pieces of sensor-specific code enabling an ecosystem of reusable sensor drivers. The paper explores three alternative architectures for application-level drivers to understand trade-offs in performance, device portability, simplicity, and deployment ease. We explore these tradeoffs in the context of four sensing applications designed to support our work in the developing world. They highlight a range of sensor usage models for our application-level driver framework that vary data types, configuration methods, communication channels, and sampling rates to demonstrate the framework's effectiveness.

Proceedings Article

A work-stealing scheduler for X10's task parallelism with suspension

Olivier Tardieu¹, Haichuan Wang², Haibo Lin¹•Institutions (2)

IBM¹, University of Illinois at Urbana–Champaign²

25 Feb 2012

TL;DR: It is demonstrated that work-stealing scheduling principles are applicable to a rich programming language such as X10, achieving performance at scale without compromising expressivity, ease of use, or portability.

Abstract: The X10 programming language is intended to ease the programming of scalable concurrent and distributed applications. X10 augments a familiar imperative object-oriented programming model with constructs to support light-weight asynchronous tasks as well as execution across multiple address spaces. A crucial aspect of X10's runtime system is the scheduling of concurrent tasks. Work-stealing schedulers have been shown to efficiently load balance fine-grain divide-and-conquer task-parallel programs on SMPs and multicores. But X10 is not limited to shared-memory fork-join parallelism. X10 permits tasks to suspend and synchronize by means of conditional atomic blocks and remote task invocations. In this paper, we demonstrate that work-stealing scheduling principles are applicable to a rich programming language such as X10, achieving performance at scale without compromising expressivity, ease of use, or portability. We design and implement a portable work-stealing execution engine for X10. While this engine is biased toward the efficient execution of fork-join parallelism in shared memory, it handles the full X10 language, especially conditional atomic blocks and distribution. We show that this engine improves the run time of a series of benchmark programs by several orders of magnitude when used in combination with the C++ backend compiler and runtime for X10. It achieves scaling comparable to state-of-the-art work-stealing scheduler implementations (the Cilk++ compiler and the Java fork/join framework) despite the dramatic increase in generality.

Journal Article

Compiler mitigations for time attacks on modern x86 processors

Jeroen Van Cleemput¹, Bart Coppens¹, Bjorn De Sutter¹•Institutions (1)

Ghent University¹

26 Jan 2012

TL;DR: The extent to which automated compiler techniques can defend against timing-based side channel attacks on modern x86 processors is evaluated, and the extent to which compiler backends are a suitable tool to provide automated support for the proposed mitigations is discussed.

Abstract: This paper studies and evaluates the extent to which automated compiler techniques can defend against timing-based side channel attacks on modern x86 processors. We study how modern x86 processors can leak timing information through side channels that relate to data flow. We study the efficiency, effectiveness, portability, predictability and sensitivity of several mitigating code transformations that eliminate or minimize key-dependent execution time variations. Furthermore, we discuss the extent to which compiler backends are a suitable tool to provide automated support for the proposed mitigations.

Proceedings Article•DOI•

MAClets: active MAC protocols over hard-coded devices

Giuseppe Bianchi¹, Pierluigi Gallo², Domenico Garlisi², Fabrizio Giuliano², Francesco Gringoli³, Ilenia Tinnirello²•Institutions (3)

University of Rome Tor Vergata¹, University of Palermo², University of Brescia³

10 Dec 2012

TL;DR: This work introduces MAClets, software programs uploaded and executed on-demand over wireless cards and devised to change the card's real-time medium access control operation, and envisions a new architecture for wireless cards based on a protocol interpreter and a powerful API.

Abstract: We introduce MAClets, software programs uploaded and executed on-demand over wireless cards, and devised to change the card's real-time medium access control operation. MAClets permit seamless reconfiguration of the MAC stack, so as to adapt it to changed context and spectrum conditions and to perform tailored performance optimizations that a once-for-all protocol stack design can hardly account for. Following traditional active networking principles, MAClets can be directly conveyed within data packets and executed on hard-coded devices acting as virtual MAC machines. Indeed, rather than executing a pre-defined protocol, we envision a new architecture for wireless cards based on a protocol interpreter (enabling code portability) and a powerful API. Experiments involving the distribution of MAClets within data packets, and their execution over commodity WLAN cards, show the flexibility and viability of the proposed concept.

Journal Article•DOI•

Parallel computing in experimental mechanics and optical measurement: A review

Wenjing Gao¹, Qian Kemao¹•Institutions (1)

Nanyang Technological University¹

01 Apr 2012-Optics and Lasers in Engineering

TL;DR: The effects of CPU and GPU parallel computing are reviewed across a broad range of experimental mechanics and optical measurement (EM & OM) applications, including digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging.

Proceedings Article•DOI•

OpenCL and the 13 dwarfs: a work in progress

Wu-chun Feng¹, Heshan Lin¹, Thomas R. W. Scogland¹, Jing Zhang¹•Institutions (1)

Virginia Tech¹

22 Apr 2012

TL;DR: The goal of this combination "Work-in-Progress and Vision" paper is to delineate application requirements in a manner that is not overly specific to individual applications or the optimizations used for certain hardware platforms, so that the authors can draw broader conclusions about hardware requirements.

Abstract: In the past, evaluating the architectural innovation of parallel computing devices relied on a benchmark suite based on existing programs, e.g., EEMBC or SPEC. However, with the growing ubiquity of parallel computing devices, we argue that it is unclear how best to express parallel computation, and hence, a need exists to identify a higher level of abstraction for reasoning about parallel application requirements. Therefore, the goal of this combination "Work-in-Progress and Vision" paper is to delineate application requirements in a manner that is not overly specific to individual applications or the optimizations used for certain hardware platforms, so that we can draw broader conclusions about hardware requirements. Our initial effort, dubbed "OpenCL and the 13 Dwarfs" or OCD for short, realizes Berkeley's 13 computational dwarfs of scientific computing in OpenCL, where each dwarf captures a pattern of computation and communication that is common to a class of important applications.

Book Chapter•DOI•

BPMN4TOSCA: A Domain-Specific Language to Model Management Plans for Composite Applications

Oliver Kopp¹, Tobias Binz¹, Uwe Breitenbücher¹, Frank Leymann¹•Institutions (1)

University of Stuttgart¹

12 Sep 2012

TL;DR: This paper analyzes TOSCA with the focus on requirements on workflow modeling languages to come up with a strong link to the application topology with the goal to improve modeling support.

Abstract: TOSCA is an upcoming standard to capture cloud application topologies and their management in a portable way. Management aspects include provisioning, operation and deprovisioning of an application. Management plans capture these aspects in workflows. BPMN 2.0, as a general-purpose language, can be used to model these workflows. There is, however, no tailored support for management plans in BPMN. This paper analyzes TOSCA with the focus on requirements on workflow modeling languages to come up with a strong link to the application topology with the goal to improve modeling support. To simplify the modeling of management plans, we introduce BPMN4TOSCA, which extends BPMN with four TOSCA-specific elements: TOSCA Topology Management Task, TOSCA Node Management Task, TOSCA Script Task, and TOSCA Data Object. Portability is ensured by a transformation of BPMN4TOSCA to plain BPMN. A prototypical modeling tool supports the strong link between the management plan and the TOSCA topology.

Journal Article•DOI•

Towards a Platform-Independent Cooperative Human Robot Interaction System: III An Architecture for Learning and Executing Actions and Shared Plans

Stephane Lallee¹, Ugo Pattacini², Séverin Lemaignan, Alexander Lenz, Chris Melhuish, Lorenzo Natale², Sergey Skachek, Katharina Hamann³, Jasmin Steinwender³, Emrah Akin Sisbot, Giorgio Metta², Julien Guitton, Rachid Alami, Matthieu Warnier, Tony Pipe, Felix Warneken⁴, Peter Ford Dominey¹•Institutions (4)

French Institute of Health and Medical Research¹, Istituto Italiano di Tecnologia², Max Planck Society³, Harvard University⁴

01 Sep 2012-IEEE Transactions on Autonomous Mental Development

TL;DR: A cooperative human-robot interaction system that has been specifically developed for portability between different humanoid platforms, by abstraction layers at the perceptual and motor interfaces is presented.

Abstract: Robots should be capable of interacting in a cooperative and adaptive manner with their human counterparts in open-ended tasks that can change in real-time. An important aspect of the robot behavior will be the ability to acquire new knowledge of the cooperative tasks by observing and interacting with humans. The current research addresses this challenge. We present results from a cooperative human-robot interaction system that has been specifically developed for portability between different humanoid platforms, by abstraction layers at the perceptual and motor interfaces. In the perceptual domain, the resulting system is demonstrated to learn to recognize objects and to recognize actions as sequences of perceptual primitives, and to transfer this learning and recognition between different robotic platforms. For execution, composite actions and plans are shown to be learnt on one robot and executed successfully on a different one. Most importantly, the system provides the ability to link actions into shared plans that form the basis of human-robot cooperation, applying principles from human cognitive development to the domain of robot cognitive systems.

Journal Article•DOI•

COSMOS: A lightweight coastal video monitoring system

Rui Taborda¹, Ana Silva¹•Institutions (1)

University of Lisbon¹

01 Dec 2012-Computers & Geosciences

TL;DR: A new lightweight video monitoring system (COSMOS) that has been developed to target several key characteristics, including portability, low cost, robustness and easy installation, is presented.

Proceedings Article•DOI•

Programmability and performance portability aspects of heterogeneous multi-/manycore systems

Christoph Kessler¹, Usman Dastgeer¹, Samuel Thibault², Raymond Namyst², Andrew Richards, Uwe Dolinsky, Siegfried Benkner³, Jesper Larsson Träff³, Sabri Pllana³•Institutions (3)

Linköping University¹, University of Bordeaux², University of Vienna³

12 Mar 2012

TL;DR: Three complementary approaches that can provide both portability and an increased level of abstraction for the programming of heterogeneous multicore systems are discussed, and it is shown how they could complement each other in an integrational programming framework for heterogeneous multicore systems.

Abstract: We discuss three complementary approaches that can provide both portability and an increased level of abstraction for the programming of heterogeneous multicore systems. Together, these approaches also support performance portability, as currently investigated in the EU FP7 project PEPPHER. In particular, we consider (1) a library-based approach, here represented by the integration of the SkePU C++ skeleton programming library with the StarPU runtime system for dynamic scheduling and dynamic selection of suitable execution units for parallel tasks; (2) a language-based approach, here represented by the Offload-C++ high-level language extensions and Offload compiler to generate platform-specific code; and (3) a component-based approach, specifically the PEPPHER component system for annotating user-level application components with performance metadata, thereby preparing them for performance-aware composition. We discuss the strengths and weaknesses of these approaches and show how they could complement each other in an integrational programming framework for heterogeneous multicore systems.

Proceedings Article•DOI•

VirtualRC: a virtual FPGA platform for applications and tools portability

Robert Kirchgessner¹, Greg Stitt¹, Alan D. George¹, Herman Lam¹•Institutions (1)

University of Florida¹

22 Feb 2012

TL;DR: This paper addresses the portability challenge by introducing a framework of architecture and middleware for virtualization of FPGA platforms, collectively named VirtualRC, and enabling portability of 11 applications and two high-level synthesis tools across three physical platforms.

Abstract: Numerous studies have shown significant performance and power benefits of field-programmable gate arrays (FPGAs). Despite these benefits, FPGA usage has been limited by application design complexity caused largely by the lack of code and tool portability across different FPGA platforms, which prevents design reuse. This paper addresses the portability challenge by introducing a framework of architecture and middleware for virtualization of FPGA platforms, collectively named VirtualRC. Experiments show modest overhead of 5-6% in performance and 1% in area, while enabling portability of 11 applications and two high-level synthesis tools across three physical platforms.

Proceedings Article•DOI•

Performance Portability with the Chapel Language

Albert Sidelnik¹, Saeed Maleki¹, Bradford L. Chamberlain², María J. Garzarán¹, David Padua¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, Cray²

21 May 2012

TL;DR: Novel methods and compiler transformations that increase programmer productivity by enabling users of the language Chapel to provide a single code implementation that the compiler can then use to target not only conventional multiprocessors, but also high-throughput and hybrid machines are presented.

Abstract: It has been widely shown that high-throughput computing architectures such as GPUs offer large performance gains compared with their traditional low-latency counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, loss of portability across different architectures, explicit data movement, and challenges in performance optimization. This paper presents novel methods and compiler transformations that increase programmer productivity by enabling users of the language Chapel to provide a single code implementation that the compiler can then use to target not only conventional multiprocessors, but also high-throughput and hybrid machines. Rather than resorting to different parallel libraries or annotations for a given parallel platform, this work leverages a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of providing portability across different parallel architectures. Finally, this work presents experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA on both GPUs and multicore platforms.

Journal Article•DOI•

Manycore performance-portability: Kokkos multidimensional array library

H. Carter Edwards¹, Daniel Sunderland¹, Vicki Porter¹, Chris Amsler², Sam Mish³•Institutions (3)

Sandia National Laboratories¹, Kansas State University², California State University, Los Angeles³

01 Apr 2012-Scientific Programming

TL;DR: The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices.

Abstract: Large, complex scientific and engineering application codes have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data parallel kernels, and (3) multidimensional arrays. Kernel execution performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. The optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
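
The separation of data access patterns from kernels described in this abstract can be sketched as follows. This is not the Kokkos API: `Array2D`, `fill`, `scale_rows`, and the layout names are hypothetical, and the layout is chosen at construction time here, whereas the abstract describes mappings introduced when a kernel is compiled. The kernel indexes the array only as `a[i, j]`, and the array decides how that pair maps to a flat memory offset.

```python
class Array2D:
    """Hypothetical 2-D array whose memory layout is chosen at construction."""

    def __init__(self, rows, cols, layout):
        if layout not in ("row_major", "col_major"):
            raise ValueError(f"unknown layout: {layout}")
        self.rows, self.cols, self.layout = rows, cols, layout
        self.data = [0.0] * (rows * cols)  # flat backing storage

    def _offset(self, i, j):
        # The only place that knows how (i, j) maps to memory.
        if self.layout == "row_major":
            return i * self.cols + j
        return j * self.rows + i

    def __getitem__(self, ij):
        return self.data[self._offset(*ij)]

    def __setitem__(self, ij, value):
        self.data[self._offset(*ij)] = value


def fill(a):
    """Store a distinct value at every logical position (i, j)."""
    for i in range(a.rows):
        for j in range(a.cols):
            a[i, j] = float(i * a.cols + j)


def scale_rows(a, factors):
    """Kernel written only against a[i, j]; it never sees the layout."""
    for i in range(a.rows):
        for j in range(a.cols):
            a[i, j] *= factors[i]


r = Array2D(2, 3, "row_major")
c = Array2D(2, 3, "col_major")
for arr in (r, c):
    fill(arr)
    scale_rows(arr, [2.0, 10.0])
```

After this runs, `r` and `c` hold the same logical values at every `(i, j)` while their flat `data` lists are ordered differently, which is the property that lets one kernel source be paired with whichever layout suits a given device.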

Journal Article•DOI•

Review: Energy-aware performance analysis methodologies for HPC architectures-An exploratory study

Shajulin Benedict¹•Institutions (1)

Anna University¹

01 Nov 2012-Journal of Network and Computer Applications

TL;DR: This research work will help HPC application developers to select an apt monitoring mechanism, and HPC tool developers to add the energy monitoring mechanisms that fit well with their basic monitoring infrastructures and to validate the existing tools in terms of overhead, portability, and user-friendliness.
