
Showing papers on "Software portability" published in 2018


Proceedings ArticleDOI
08 Oct 2018
TL;DR: TVM as discussed by the authors is a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends, such as mobile phones, embedded devices, and accelerators.
Abstract: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms - such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) - requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
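For concreteness, the sketch below shows TVM's operator-level workflow in Python: the computation is declared once, a schedule describes how to map it to hardware, and only the target string changes per back-end. API names follow recent Apache TVM releases and may differ from the version described in this paper; this is an illustrative sketch, not code from the paper.

import tvm
from tvm import te

# Declare the algorithm (what to compute) separately from the schedule (how).
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

s = te.create_schedule(C.op)
s[C].parallel(C.op.axis[0])          # one example scheduling decision

# Retargeting is a one-line change: "llvm" (CPU), "cuda", "opencl", ...
fadd_cpu = tvm.build(s, [A, B, C], target="llvm", name="vector_add")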

991 citations


Journal ArticleDOI
TL;DR: This paper provides the first comprehensive overview of RIOT, covering the key components of interest to potential developers and users: the kernel, hardware abstraction, and software modularity, both conceptually and in practice for various example configurations.
Abstract: As the Internet of Things (IoT) emerges, compact operating systems (OSs) are required on low-end devices to ease development and portability of IoT applications. RIOT is a prominent free and open source OS in this space. In this paper, we provide the first comprehensive overview of RIOT. We cover the key components of interest to potential developers and users: the kernel, hardware abstraction, and software modularity, both conceptually and in practice for various example configurations. We explain operational aspects like system boot-up, timers, power management, and the use of networking. Finally, the relevant APIs as exposed by the OS are discussed along with the larger ecosystem around RIOT, including development and open source community aspects.

181 citations


Posted Content
TL;DR: TVM is proposed, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends; the optimization challenges specific to deep learning that TVM solves are also discussed.
Abstract: Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive the current popularity and utility of deep learning. However, these frameworks are optimized for a narrow range of server-class GPUs and deploying workloads to other platforms such as mobile phones, embedded devices, and specialized accelerators (e.g., FPGAs, ASICs) requires laborious manual effort. We propose TVM, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. We discuss the optimization challenges specific to deep learning that TVM solves: high-level operator fusion, low-level memory reuse across threads, mapping to arbitrary hardware primitives, and memory latency hiding. Experimental results demonstrate that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art libraries for low-power CPU and server-class GPUs. We also demonstrate TVM's ability to target new hardware accelerator back-ends by targeting an FPGA-based generic deep learning accelerator. The compiler infrastructure is open sourced.

161 citations


Posted Content
TL;DR: TVM is a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends and automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations.
Abstract: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.

136 citations


Journal ArticleDOI
TL;DR: Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends the existing graphics processing unit (GPU) accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability.
Abstract: We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
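The vendor independence comes from OpenCL's host/kernel split, sketched below with pyopencl. This is only an illustration of the host-side pattern (the kernel, buffer, and sizes are made up), not the paper's simulation code: the same kernel source is compiled at runtime for whatever CPU or GPU the installed OpenCL platforms expose.

import numpy as np
import pyopencl as cl

kernel_src = """
__kernel void scale(__global float *x, const float a) {
    int i = get_global_id(0);
    x[i] *= a;
}
"""

ctx = cl.create_some_context()        # any available OpenCL device (CPU or GPU)
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, kernel_src).build()

x = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=x)
prog.scale(queue, x.shape, None, buf, np.float32(2.0))
cl.enqueue_copy(queue, x, buf)        # x now holds the scaled values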

130 citations


Proceedings ArticleDOI
Jiawen Liu1, Hengyu Zhao1, Matheus Ogleari1, Dong Li1, Jishen Zhao1 
20 Oct 2018
TL;DR: This work proposes a software/hardware co-design of a heterogeneous processing-in-memory (PIM) system that enables high program portability and easy program maintenance across various heterogeneous hardware, optimizes system energy efficiency, and improves hardware utilization.
Abstract: Neural networks (NNs) have been adopted in a wide range of application domains, such as image classification, speech recognition, object detection, and computer vision. However, training NNs - especially deep neural networks (DNNs) - can be energy and time consuming, because of frequent data movement between processor and memory. Furthermore, training involves massive fine-grained operations with various computation and memory access characteristics. Exploiting high parallelism with such diverse operations is challenging. To address these challenges, we propose a software/hardware co-design of heterogeneous processing-in-memory (PIM) system. Our hardware design incorporates hundreds of fix-function arithmetic units and ARM-based programmable cores on the logic layer of a 3D die-stacked memory to form a heterogeneous PIM architecture attached to CPU. Our software design offers a programming model and a runtime system that program, offload, and schedule various NN training operations across compute resources provided by CPU and heterogeneous PIM. By extending the OpenCL programming model and employing a hardware heterogeneity-aware runtime system, we enable high program portability and easy program maintenance across various heterogeneous hardware, optimize system energy efficiency, and improve hardware utilization.

108 citations


Journal ArticleDOI
TL;DR: In this article, a survey of the state-of-the-art software-defined radio (SDR) platforms in the context of wireless communication protocols is presented, with a focus on programmability, flexibility, portability, and energy efficiency.

91 citations


Journal ArticleDOI
31 Jul 2018
TL;DR: If autotuning is to be widely used in the HPC community, researchers must address the software engineering challenges, manage configuration overheads, and continue to demonstrate significant performance gains and portability across architectures.
Abstract: Autotuning refers to the automatic generation of a search space of possible implementations of a computation that are evaluated through models and/or empirical measurement to identify the most desirable implementation. Autotuning has the potential to dramatically improve the performance portability of petascale and exascale applications. To date, autotuning has been used primarily in high-performance applications through tunable libraries or previously tuned application code that is integrated directly into the application. This paper draws on the authors’ extensive experience applying autotuning to high-performance applications, describing both successes and future challenges. If autotuning is to be widely used in the HPC community, researchers must address the software engineering challenges, manage configuration overheads, and continue to demonstrate significant performance gains and portability across architectures. In particular, tools that configure the application must be integrated into the application build process so that tuning can be reapplied as the application and target architectures evolve.
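As a toy illustration of the empirical-search core of autotuning (the kernel and the tunable parameter below are hypothetical; production autotuners add performance models, search heuristics, and the build-system integration the authors call for):

import time
import numpy as np

def matmul_blocked(A, B, block):
    # Candidate implementation parameterized by a tunable block size.
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for k in range(0, n, block):
            for j in range(0, n, block):
                C[i:i+block, j:j+block] += A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
    return C

A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
best = None
for block in (16, 32, 64, 128):               # the search space
    t0 = time.perf_counter()
    matmul_blocked(A, B, block)
    elapsed = time.perf_counter() - t0        # empirical measurement
    if best is None or elapsed < best[1]:
        best = (block, elapsed)
print("best block size:", best[0])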

87 citations


Journal ArticleDOI
26 Oct 2018
TL;DR: This paper focuses on two fundamental problems that software developers are faced with: performance portability across the ever-changing landscape of parallel platforms and correctness guarantees for sophisticated floating-point code.
Abstract: In this paper, we address the question of how to automatically map computational kernels to highly efficient code for a wide range of computing platforms and establish the correctness of the synthesized code. More specifically, we focus on two fundamental problems that software developers are faced with: performance portability across the ever-changing landscape of parallel platforms and correctness guarantees for sophisticated floating-point code. The problem is approached as follows: We develop a formal framework to capture computational algorithms, computing platforms, and program transformations of interest, using a unifying mathematical formalism we call operator language (OL). Then we cast the problem of synthesizing highly optimized computational kernels for a given machine as a strongly constrained optimization problem that is solved by search and a multistage rewriting system. Since all rewrite steps are semantics preserving, our approach establishes equivalence between the kernel specification and the synthesized program. This approach is implemented in the SPIRAL system, and we demonstrate it with a selection of computational kernels from the signal and image processing domain, software-defined radio, and robotic vehicle control. Our target platforms range from mobile devices, desktops, and server multicore processors to large-scale high-performance and supercomputing systems, and we demonstrate performance comparable to expertly hand-tuned code across kernels and platforms.
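As an example of the kind of semantics-preserving rule such a rewriting system applies (this is the standard Cooley-Tukey factorization, stated here for illustration rather than quoted from the paper), the DFT of size mn can be broken down as

$$ \mathrm{DFT}_{mn} = (\mathrm{DFT}_m \otimes I_n)\, T^{mn}_n\, (I_m \otimes \mathrm{DFT}_n)\, L^{mn}_m $$

where T^{mn}_n is a diagonal matrix of twiddle factors and L^{mn}_m a stride permutation; recursive, search-guided application of rules like this is what yields the platform-specific implementations.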

84 citations


Proceedings ArticleDOI
24 Feb 2018
TL;DR: This paper demonstrates how complex multidimensional stencil code and optimizations such as tiling are expressible using compositions of simple 1D Lift primitives, and shows that this approach outperforms existing compiler approaches and hand-tuned codes.
Abstract: Stencil computations are widely used from physical simulations to machine-learning. They are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing Units. Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging. Domain Specific Languages (DSLs) have raised the programming abstraction and offer good performance. However, this places the burden on DSL implementers who have to write almost full-fledged parallelizing compilers and optimizers. Lift has recently emerged as a promising approach to achieve performance portability and is based on a small set of reusable parallel primitives that DSL or library writers can build upon. Lift’s key novelty is in its encoding of optimizations as a system of extensible rewrite rules which are used to explore the optimization space. However, Lift has mostly focused on linear algebra operations and it remains to be seen whether this approach is applicable for other domains. This paper demonstrates how complex multidimensional stencil code and optimizations such as tiling are expressible using compositions of simple 1D Lift primitives. By leveraging existing Lift primitives and optimizations, we only require the addition of two primitives and one rewrite rule to do so. Our results show that this approach outperforms existing compiler approaches and hand-tuned codes.
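A rough Python analogue (not Lift syntax; the primitive names here only mirror the idea) of the compositional approach: boundary handling, neighbourhood creation, and the stencil function stay separate, reusable pieces.

def pad(left, right, value, xs):
    # Boundary handling: extend the input with a constant value.
    return [value] * left + list(xs) + [value] * right

def slide(size, step, xs):
    # Neighbourhood creation: overlapping windows over the input.
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]

def jacobi3(window):
    # The stencil function applied to each 3-point neighbourhood.
    return sum(window) / 3.0

def stencil_1d(xs):
    return [jacobi3(w) for w in slide(3, 1, pad(1, 1, 0.0, xs))]

print(stencil_1d([1.0, 2.0, 3.0, 4.0]))   # output has the same length as the input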

83 citations


Journal ArticleDOI
TL;DR: A new graphical user interface for ToxPi (Toxicological Prioritization Index) is presented that provides interactive visualization, analysis, reporting, and portability, and introduces several features, from flexible data import formats and similarity-based clustering to options for high-resolution graphical output.
Abstract: Drawing integrated conclusions from diverse source data requires synthesis across multiple types of information. The ToxPi (Toxicological Prioritization Index) is an analytical framework that was developed to enable integration of multiple sources of evidence by transforming data into integrated, visual profiles. Methodological improvements have advanced ToxPi and expanded its applicability, necessitating a new, consolidated software platform to provide functionality, while preserving flexibility for future updates. We detail the implementation of a new graphical user interface for ToxPi (Toxicological Prioritization Index) that provides interactive visualization, analysis, reporting, and portability. The interface is deployed as a stand-alone, platform-independent Java application, with a modular design to accommodate inclusion of future analytics. The new ToxPi interface introduces several features, from flexible data import formats (including legacy formats that permit backward compatibility) to similarity-based clustering to options for high-resolution graphical output. We present the new ToxPi interface for dynamic exploration, visualization, and sharing of integrated data models. The ToxPi interface is freely-available as a single compressed download that includes the main Java executable, all libraries, example data files, and a complete user manual from http://toxpi.org .

Proceedings ArticleDOI
21 Apr 2018
TL;DR: This work presents Project Zanzibar: a flexible mat that can locate, uniquely identify and communicate with tangible objects placed on its surface, as well as sense a user's touch and hover hand gestures, and describes the underlying technical contributions.
Abstract: We present Project Zanzibar: a flexible mat that can locate, uniquely identify and communicate with tangible objects placed on its surface, as well as sense a user's touch and hover hand gestures. We describe the underlying technical contributions: efficient and localised Near Field Communication (NFC) over a large surface area; object tracking combining NFC signal strength and capacitive footprint detection, and manufacturing techniques for a rollable device form-factor that enables portability, while providing a sizable interaction area when unrolled. In addition, we detail design patterns for tangibles of varying complexity and interactive capabilities, including the ability to sense orientation on the mat, harvest power, provide additional input and output, stack, or extend sensing outside the bounds of the mat. Capabilities and interaction modalities are illustrated with self-generated applications. Finally, we report on the experience of professional game developers building novel physical/digital experiences using the platform.

Journal ArticleDOI
TL;DR: A comprehensive analysis of Pilot-Job systems is presented in this paper, with a focus on the motivations, evolution, properties, and implementation of Pilot-Jobs, and an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern.
Abstract: Pilot-Job systems play an important role in supporting distributed scientific computing. They are used to execute millions of jobs on several cyberinfrastructures worldwide, consuming billions of CPU hours a year. With the increasing importance of task-level parallelism in high-performance computing, Pilot-Job systems are also witnessing an adoption beyond traditional domains. Notwithstanding the growing impact on scientific research, there is no agreement on a definition of Pilot-Job system and no clear understanding of the underlying abstraction and paradigm. Pilot-Job implementations have proliferated with no shared best practices or open interfaces and little interoperability. Ultimately, this is hindering the realization of the full impact of Pilot-Jobs by limiting their robustness, portability, and maintainability. This article offers a comprehensive analysis of Pilot-Job systems critically assessing their motivations, evolution, properties, and implementation. The three main contributions of this article are as follows: (1) an analysis of the motivations and evolution of Pilot-Job systems; (2) an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern; and (3) the description of core and auxiliary properties of Pilot-Jobs systems and the analysis of six exemplar Pilot-Job implementations. Together, these contributions illustrate the Pilot paradigm, its generality, and how it helps to address some challenges in distributed scientific computing.
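A minimal sketch of the Pilot pattern itself may help fix the idea (illustrative only; the task names are made up and threads stand in for jobs submitted to a batch scheduler): the pilot acquires resources once and then pulls many small application tasks from a queue, decoupling task execution from batch-system scheduling.

import queue
import threading

tasks = queue.Queue()
for i in range(20):
    tasks.put(("simulate", i))             # application tasks, not batch jobs

def pilot(worker_id):
    # In production this loop would run inside a placeholder job that the
    # batch scheduler has already placed on a compute node.
    while True:
        try:
            name, arg = tasks.get_nowait()
        except queue.Empty:
            return
        print(f"pilot {worker_id} runs {name}({arg})")
        tasks.task_done()

pilots = [threading.Thread(target=pilot, args=(w,)) for w in range(4)]
for p in pilots:
    p.start()
for p in pilots:
    p.join()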

Proceedings ArticleDOI
19 Apr 2018
TL;DR: A conceptual framework is proposed to enable analysts to explore data items, track interaction histories, and alter visualization configurations through mechanisms using both devices in combination to support visual data analysis.
Abstract: We explore the combination of smartwatches and a large interactive display to support visual data analysis. These two extremes of interactive surfaces are increasingly popular, but feature different characteristics-display and input modalities, personal/public use, performance, and portability. In this paper, we first identify possible roles for both devices and the interplay between them through an example scenario. We then propose a conceptual framework to enable analysts to explore data items, track interaction histories, and alter visualization configurations through mechanisms using both devices in combination. We validate an implementation of our framework through a formative evaluation and a user study. The results show that this device combination, compared to just a large display, allows users to develop complex insights more fluidly by leveraging the roles of the two devices. Finally, we report on the interaction patterns and interplay between the devices for visual exploration as observed during our study.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: Relay as mentioned in this paper is a purely functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability for machine learning models across an array of heterogeneous hardware devices.
Abstract: Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MxNet.
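A small sketch of what building a model in a Relay-style functional IR looks like (API names follow the relay frontend shipped in later Apache TVM releases; the 2018 prototype described here was part of the NNVM framework and its surface API differed):

import tvm
from tvm import relay

x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(32, 64), dtype="float32")
y = relay.nn.relu(relay.nn.dense(x, w))   # expressions, not mutable graph nodes
f = relay.Function([x, w], y)             # a typed, purely functional program
mod = tvm.IRModule.from_expr(f)
print(mod)                                # the IR can be inspected and optimized as a whole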

Proceedings ArticleDOI
TL;DR: This work proposes a new high-level intermediate representation (IR) called Relay, being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability.
Abstract: Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MxNet.

Journal ArticleDOI
TL;DR: A comprehensive survey of noninvasive/pain-free blood glucose monitoring methods from the recent five years is presented; AI-based estimation and decision models hold the future of noninvasive glucose monitoring in terms of accuracy, cost effectiveness, portability, and efficiency.
Abstract: Keeping track of blood glucose levels non-invasively is now possible due to diverse breakthroughs in wearable sensors technology coupled with advanced biomedical signal processing. However, each user might have different requirements and priorities when it comes to selecting a self-monitoring solution. After extensive research and careful selection, we have presented a comprehensive survey on noninvasive/pain-free blood glucose monitoring methods from the recent five years (2012–2016). Several techniques, from bioinformatics, computer science, chemical engineering, microwave technology, etc., are discussed in order to cover a wide variety of solutions available for different scales and preferences. We categorize the noninvasive techniques into nonsample- and sample-based techniques, which we further grouped into optical, nonoptical, intermittent, and continuous. The devices manufactured or being manufactured for noninvasive monitoring are also compared in this paper. These techniques are then analyzed based on certain constraints, which include time efficiency, comfort, cost, portability, power consumption, etc., a user might experience. Recalibration, time, and power efficiency are the biggest challenges that require further research in order to satisfy a large number of users. In order to solve these challenges, artificial intelligence (AI) has been employed by many researchers. AI-based estimation and decision models hold the future of noninvasive glucose monitoring in terms of accuracy, cost effectiveness, portability, efficiency, etc. The significance of this paper is twofold: first, to bridge the gap between IT and medical field; and second, to bridge the gap between end users and the solutions (hardware and software).

Journal ArticleDOI
TL;DR: This article presents SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems, and proposes a new skeleton, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations.
Abstract: In this article we present SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems. We critically examine the design and limitations of the SkePU 1 programming interface. We present a new, flexible and type-safe, interface for skeleton programming in SkePU 2, and a source-to-source transformation tool which knows about SkePU 2 constructs such as skeletons and user functions. We demonstrate how the source-to-source compiler transforms programs to enable efficient execution on parallel heterogeneous systems. We show how SkePU 2 enables new use-cases and applications by increasing the flexibility from SkePU 1, and how programming errors can be caught earlier and easier thanks to improved type safety. We propose a new skeleton, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We also discuss how the source-to-source compiler can enable a new optimization opportunity by selecting among multiple user function specializations when building a parallel program. Finally, we show that the performance of our prototype SkePU 2 implementation closely matches that of SkePU 1.

Journal ArticleDOI
TL;DR: Some of the most recent advances, as well as the remaining challenges and future prospects, in electrochemical biosensing development that could make an impact on the future global market are discussed.

Proceedings ArticleDOI
10 Feb 2018
TL;DR: It is concluded that the HPVM representation is a promising basis for achieving performance portability and for implementing parallelizing compilers for heterogeneous parallel systems.
Abstract: We propose a parallel program representation for heterogeneous systems, designed to enable performance portability across a wide range of popular parallel hardware, including GPUs, vector instruction sets, multicore CPUs and potentially FPGAs. Our representation, which we call HPVM, is a hierarchical dataflow graph with shared memory and vector instructions. HPVM supports three important capabilities for programming heterogeneous systems: a compiler intermediate representation (IR), a virtual instruction set (ISA), and a basis for runtime scheduling; previous systems focus on only one of these capabilities. As a compiler IR, HPVM aims to enable effective code generation and optimization for heterogeneous systems. As a virtual ISA, it can be used to ship executable programs, in order to achieve both functional portability and performance portability across such systems. At runtime, HPVM enables flexible scheduling policies, both through the graph structure and the ability to compile individual nodes in a program to any of the target devices on a system. We have implemented a prototype HPVM system, defining the HPVM IR as an extension of the LLVM compiler IR, compiler optimizations that operate directly on HPVM graphs, and code generators that translate the virtual ISA to NVIDIA GPUs, Intel's AVX vector units, and to multicore X86-64 processors. Experimental results show that HPVM optimizations achieve significant performance improvements, HPVM translators achieve performance competitive with manually developed OpenCL code for both GPUs and vector hardware, and that runtime scheduling policies can make use of both program and runtime information to exploit the flexible compilation capabilities. Overall, we conclude that the HPVM representation is a promising basis for achieving performance portability and for implementing parallelizing compilers for heterogeneous parallel systems.

Proceedings ArticleDOI
02 Jun 2018
TL;DR: The benefits of XMem are demonstrated using two use cases: improving the performance portability of software-based cache optimization by expressing the semantics of data locality in the optimization, and improving the performance of OS-based page placement in DRAM by leveraging the semantics of data structures and their access properties.
Abstract: This paper makes a case for a new cross-layer interface, Expressive Memory (XMem), to communicate higher-level program semantics from the application to the system software and hardware architecture. XMem provides (i) a flexible and extensible abstraction, called an Atom, enabling the application to express key program semantics in terms of how the program accesses data and the attributes of the data itself, and (ii) new cross-layer interfaces to make the expressed higher-level information available to the underlying OS and architecture. By providing key information that is otherwise unavailable, XMem exposes a new, rich view of the program data to the OS and the different architectural components that optimize memory system performance (e.g., caches, memory controllers). By bridging the semantic gap between the application and the underlying memory resources, XMem provides two key benefits. First, it enables architectural/system-level techniques to leverage key program semantics that are challenging to predict or infer. Second, it improves the efficacy and portability of software optimizations by alleviating the need to tune code for specific hardware resources (e.g., cache space). While XMem is designed to enhance and enable a wide range of memory optimizations, we demonstrate the benefits of XMem using two use cases: (i) improving the performance portability of software-based cache optimization by expressing the semantics of data locality in the optimization and (ii) improving the performance of OS-based page placement in DRAM by leveraging the semantics of data structures and their access properties.

Proceedings ArticleDOI
22 Jul 2018
TL;DR: A performance evaluation of Docker and Singularity on bare metal nodes in the Chameleon cloud and an analysis of mapping elements of parallel workloads to the containers for optimal resource management with container-ready orchestration tools show that scientific workloads for both Docker- and Singularity-based containers can achieve near-native performance.
Abstract: The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance as they have significantly better performance compared to full-scale virtualization, support for microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution that is of interest as it is designed specifically for scientific applications. It is important to conduct performance and feature analysis of the container technologies to understand their applicability for each application and target execution environment. This paper presents a (1) performance evaluation of Docker and Singularity on bare metal nodes in the Chameleon cloud (2) mechanism by which Docker containers can be mapped with InfiniBand hardware with RDMA communication and (3) analysis of mapping elements of parallel workloads to the containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions on choosing the container technologies and approaches that are suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads for both Docker and Singularity based containers can achieve near-native performance. Singularity is designed specifically for HPC workloads. However, Docker still has advantages over Singularity for use in clouds as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resources allocation. Both Docker and Singularity make it possible to directly use the underlying network fabric from the containers for coarse-grained resource allocation.

Journal ArticleDOI
TL;DR: A novel deep learning network, the Hybrid Deep Learning Network (HDLN), is constructed and used to detect code injection attacks on mobile phones; it outperforms traditional classifiers and achieves higher average precision than other detection methods.

Journal ArticleDOI
TL;DR: From these results, platform providers can not only obtain an understanding of how investments in interoperability and portability impact cost, enable cost-effective service integration, and create value, but also design new strategies for optimizing investments.

Proceedings ArticleDOI
01 Nov 2018
TL;DR: The Roofline model is extended so that it empirically captures a more realistic set of performance bounds for CPUs and GPUs, factors in the true cost of different floating-point instructions when counting FLOPs, incorporates the effects of different memory access patterns, and facilitates the performance portability analysis.
Abstract: System and node architectures continue to diversify to better balance on-node computation, memory capacity, memory bandwidth, interconnect bandwidth, power, and cost for specific computational workloads. For many application developers, achieving performance portability (effectively exploiting the capabilities of multiple architectures) is a desired goal. Unfortunately, dramatically different per-node performance coupled with differences in machine balance can lead to developers being unable to determine whether they have attained performance portability or simply written portable code. The Roofline model provides a means of quantitatively assessing how well a given application makes use of a target platform’s computational capabilities. In this paper, we extend the Roofline model so that it 1) empirically captures a more realistic set of performance bounds for CPUs and GPUs, 2) factors in the true cost of different floating-point instructions when counting FLOPs, 3) incorporates the effects of different memory access patterns, and 4) with appropriate pairing of code performance and Roofline ceiling, facilitates the performance portability analysis.
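For reference, the classical Roofline bound that the paper refines states that a kernel with arithmetic intensity I (FLOPs per byte moved to or from memory) can attain at most

$$ P_{\mathrm{attainable}} = \min\bigl(P_{\mathrm{peak}},\; I \cdot B_{\mathrm{peak}}\bigr) $$

the extensions above replace the theoretical P_peak and B_peak with empirically measured ceilings and make the FLOP count sensitive to the instruction mix and access pattern.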

Proceedings ArticleDOI
21 May 2018
TL;DR: In this paper, a convolutional neural network is trained to provide accurate hand gesture recognition, and a finite-state machine based deterministic model performs efficient gesture-to-instruction mapping and further improves robustness of the interaction scheme.
Abstract: This paper presents a real-time programming and parameter reconfiguration method for autonomous underwater robots in human-robot collaborative tasks. Using a set of intuitive and meaningful hand gestures, we develop a syntactically simple framework that is computationally more efficient than a complex, grammar-based approach. In the proposed framework, a convolutional neural network is trained to provide accurate hand gesture recognition; subsequently, a finite-state machine-based deterministic model performs efficient gesture-to-instruction mapping and further improves robustness of the interaction scheme. The key aspect of this framework is that it can be easily adopted by divers for communicating simple instructions to underwater robots without using artificial tags such as fiducial markers or requiring memorization of a potentially complex set of language rules. Extensive experiments are performed both on field-trial data and through simulation, which demonstrate the robustness, efficiency, and portability of this framework in a number of different scenarios. Finally, a user interaction study is presented that illustrates the gain in the ease of use of our proposed interaction framework compared to the existing methods for the underwater domain.
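To make the gesture-to-instruction idea concrete (the states, gesture labels, and instructions below are hypothetical, not the paper's vocabulary), a deterministic transition table is enough once the CNN has produced a gesture label per frame:

# Hypothetical finite-state machine: (state, gesture) -> (next state, instruction)
TRANSITIONS = {
    ("IDLE", "ok"):         ("RECORDING", None),
    ("RECORDING", "left"):  ("RECORDING", "turn_left"),
    ("RECORDING", "right"): ("RECORDING", "turn_right"),
    ("RECORDING", "ok"):    ("IDLE", "execute_program"),
}

def gestures_to_program(gestures):
    state, program = "IDLE", []
    for g in gestures:                     # g would come from the CNN classifier
        state, instruction = TRANSITIONS.get((state, g), (state, None))
        if instruction:
            program.append(instruction)
    return program

print(gestures_to_program(["ok", "left", "left", "right", "ok"]))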

Proceedings ArticleDOI
19 Mar 2018
TL;DR: This work outlines a novel 5G PPP-compliant software framework specifically tailored to the energy domain, which combines i) trusted, scalable and lock-in free plug ‘n’ play support for a variety of constrained devices and ii) 5G devices’ abstractions.
Abstract: The energy sector represents undoubtedly one of the most significant “test cases” for 5G enabling technologies, due to the need of addressing a huge range of very diverse requirements to deal with across a variety of applications (stringent capacity for smart metering/AMI versus latency for supervisory control and fault localization). However, to effectively support energy utilities along their transition towards more decentralized renewable-oriented systems, several open issues still remain as to 5G networks management automation, security, resilience, scalability and portability. To face these issues, we outline a novel 5G PPP-compliant software framework specifically tailored to the energy domain, which combines i) trusted, scalable and lock-in free plug ‘n’ play support for a variety of constrained devices; ii) 5G devices’ abstractions to demonstrate mMTC (massive Machine Type Communications), uMTC (critical MTC) and xMBB (Extended Massive BroadBand) communications coupled with partially distributed, trusted, end-to-end security and MCM to enable secure, scalable and energy efficient communications; iii) extended Mobile Edge Computing (xMEC) micro-clouds to reduce backhaul load, increase the overall network capacity and reduce delays, while facilitating the deployment of generic MTC related NFVs (Network Function Virtualisation) and utility-centric VNFs (Virtual Network Functions).

Journal ArticleDOI
01 Jul 2018
TL;DR: AIDA emulates the syntax and semantics of popular data science packages but transparently executes the required transformations and computations inside the RDBMS, and supports the seamless use of both relational and linear algebra operations using a unified abstraction.
Abstract: With the tremendous growth in data science and machine learning, it has become increasingly clear that traditional relational database management systems (RDBMS) are lacking appropriate support for the programming paradigms required by such applications, whose developers prefer tools that perform the computation outside the database system. While the database community has attempted to integrate some of these tools in the RDBMS, this has not swayed the trend as existing solutions are often not convenient for the incremental, iterative development approach used in these fields. In this paper, we propose AIDA - an abstraction for advanced in-database analytics. AIDA emulates the syntax and semantics of popular data science packages but transparently executes the required transformations and computations inside the RDBMS. In particular, AIDA works with a regular Python interpreter as a client to connect to the database. Furthermore, it supports the seamless use of both relational and linear algebra operations using a unified abstraction. AIDA relies on the RDBMS engine to efficiently execute relational operations and on an embedded Python interpreter and NumPy to perform linear algebra operations. Data reformatting is done transparently and avoids data copy whenever possible. AIDA does not require changes to statistical packages or the RDBMS facilitating portability.

Journal ArticleDOI
TL;DR: Important TOSCA concepts and benefits in the context of commonly understood cloud use cases are introduced as a foundation to future discussions regarding advanced TOSCA concepts and additional breakthrough issues.
Abstract: TOSCA, the Topology and Orchestration Specification for Cloud Applications offers an OASIS-recognized, open standard domain-specific language (DSL) that enables portability and automated management of applications, services, and resources regardless of underlying cloud platform, software defined environment, or infrastructure. With a growing, interoperable eco-system of open source projects, solutions from leading cloud platform and service providers, and research, TOSCA empowers the definition and modeling of applications and their services (microservices or traditional services) across their entire lifecycle by describing their components, relationships, dependencies, requirements, and capabilities for orchestrating software in the context of associated operational policies. The authors introduce important TOSCA concepts and benefits in the context of commonly understood cloud use cases as a foundation to future discussions regarding advanced TOSCA concepts and additional breakthrough issues.

Journal ArticleDOI
TL;DR: A novel automated, modular, multi-layer, and portable cloud monitoring framework is presented that is capable of automatically adapting when elasticity actions are enforced on either the cloud service or the monitoring topology, and that is recoverable from faults introduced in the monitoring configuration, with proven scalability and a low runtime footprint.
Abstract: Automatic resource provisioning is a challenging and complex task. It requires for applications, services and underlying platforms to be continuously monitored at multiple levels and time intervals. The complex nature of this task lays in the ability of the monitoring system to automatically detect runtime configurations in a cloud service due to elasticity action enforcement. Moreover, with the adoption of open cloud standards and library stacks, cloud consumers are now able to migrate their applications or even distribute them across multiple cloud domains. However, current cloud monitoring tools are either bounded to specific cloud platforms or limit their portability to provide elasticity support. In this article, we describe the challenges when monitoring elastically adaptive multi-cloud services. We then introduce a novel automated, modular, multi-layer and portable cloud monitoring framework. Experiments on multiple clouds and real-life applications show that our framework is capable of automatically adapting when elasticity actions are enforced to either the cloud service or to the monitoring topology. Furthermore, it is recoverable from faults introduced in the monitoring configuration with proven scalability and low runtime footprint. Most importantly, our framework is able to reduce network traffic by 41 percent, and consequently the monitoring cost, which is both billable and noticeable in large-scale multi-cloud services.