scispace - formally typeset
Search or ask a question
Topic

Software portability

About: Software portability is a research topic. Over the lifetime, 8987 publications have been published within this topic receiving 164922 citations. The topic is also known as: portability.


Papers
More filters
Proceedings ArticleDOI
24 Feb 2018
TL;DR: This paper demonstrates how complex multidimensional stencil code and optimizations such as tiling are expressible using compositions of simple 1D Lift primitives, and shows that this approach outperforms existing compiler approaches and hand-tuned codes.
Abstract: Stencil computations are widely used from physical simulations to machine-learning. They are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing Units. Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging. Domain Specific Languages (DSLs) have raised the programming abstraction and offer good performance. However, this places the burden on DSL implementers who have to write almost full-fledged parallelizing compilers and optimizers. Lift has recently emerged as a promising approach to achieve performance portability and is based on a small set of reusable parallel primitives that DSL or library writers can build upon. Lift’s key novelty is in its encoding of optimizations as a system of extensible rewrite rules which are used to explore the optimization space. However, Lift has mostly focused on linear algebra operations and it remains to be seen whether this approach is applicable for other domains. This paper demonstrates how complex multidimensional stencil code and optimizations such as tiling are expressible using compositions of simple 1D Lift primitives. By leveraging existing Lift primitives and optimizations, we only require the addition of two primitives and one rewrite rule to do so. Our results show that this approach outperforms existing compiler approaches and hand-tuned codes.

83 citations

Proceedings ArticleDOI
02 Apr 2011
TL;DR: This work presents a synergistic auto-vectorizing compilation scheme that leverages the optimized intermediate results provided by the first stage across disparate SIMD architectures from different vendors, having distinct characteristics ranging from different vector sizes, memory alignment and access constraints, to special computational idioms.
Abstract: Just-in-Time (JIT) compiler technology offers portability while facilitating target- and context-specific specialization. Single-Instruction-Multiple-Data (SIMD) hardware is ubiquitous and markedly diverse, but can be difficult for JIT compilers to efficiently target due to resource and budget constraints. We present our design for a synergistic auto-vectorizing compilation scheme. The scheme is composed of an aggressive, generic offline stage coupled with a lightweight, target-specific online stage. Our method leverages the optimized intermediate results provided by the first stage across disparate SIMD architectures from different vendors, having distinct characteristics ranging from different vector sizes, memory alignment and access constraints, to special computational idioms. We demonstrate the effectiveness of our design using a set of kernels that exercise innermost loop, outer loop, as well as straight-line code vectorization, all automatically extracted by the common offline compilation stage. This results in performance comparable to that provided by specialized monolithic offline compilers. Our framework is implemented using open-source tools and standards, thereby promoting interoperability and extendibility.

83 citations

Proceedings ArticleDOI
25 Jul 2011
TL;DR: This system uses embedded system, 3G, and ZIGBEE technologies to overcome the drawbacks of current smart home systems such as discrete functions, poor portability, weak updating capability, and personal computer dependence.
Abstract: With more and more applications of Internet of Things in many domains, it also steps into Smart Homes. In this paper, we propose an Internet of Things-based smart home system for home comfort, leisure and safety. This system uses embedded system, 3G, and ZIGBEE technologies to overcome the drawbacks of current smart home systems such as discrete functions, poor portability, weak updating capability, and personal computer dependence. Moreover, the system architecture is presented, and the design of its gateway is shown in detail from hardware to software.

83 citations

Proceedings ArticleDOI
17 Sep 2005
TL;DR: Three optimization techniques for global address space programs with fine-grained communication are presented: redundancy elimination, use of split-phase communication, and communication coalescing.
Abstract: Global address space languages like UPC exhibit high performance and portability on a broad class of shared and distributed memory parallel architectures. The most scalable applications use bulk memory copies rather than individual reads and writes to the shared space, but finer-grained sharing can be useful for scenarios such as dynamic load balancing, event signaling, and distributed hash tables. In this paper we present three optimization techniques for global address space programs with fine-grained communication: redundancy elimination, use of split-phase communication, and communication coalescing. Parallel UPC programs are analyzed using static single assignment form and a dataflow graph, which are extended to handle the various shared and private pointer types that are available in UPC. The optimizations also take advantage of UPC's relaxed memory consistency model, which reduces the need for cross thread analysis. We demonstrate the effectiveness of the analysis and optimizations using several benchmarks, which were chosen to reflect the kinds of finegrained, communication-intensive phases that exist in some larger applications. The optimizations show speedups of up to 70% on three parallel systems, which represent three different types of cluster network technologies.

83 citations

Proceedings ArticleDOI
10 Nov 2012
TL;DR: It is found that OpenACC is an extremely viable programming model for accelerator devices, improving programmer productivity and achieving better performance than OpenCL and CUDA.
Abstract: Hardware accelerators such as GPGPUs are becoming increasingly common in HPC platforms and their use is widely recognised as being one of the most promising approaches for reaching exascale levels of performance. Large HPC centres, such as AWE, have made huge investments in maintaining their existing scientific software codebases, the vast majority of which were not designed to effectively utilise accelerator devices. Consequently, HPC centres will have to decide how to develop their existing applications to take best advantage of future HPC system architectures. Given limited development and financial resources, it is unlikely that all potential approaches will be evaluated for each application. We are interested in how this decision making can be improved, and this work seeks to directly evaluate three candidate technologies-OpenACC, OpenCL and CUDA-in terms of performance, programmer productivity, and portability using a recently developed Lagrangian-Eulerian explicit hydrodynamics mini-application. We find that OpenACC is an extremely viable programming model for accelerator devices, improving programmer productivity and achieving better performance than OpenCL and CUDA.

82 citations


Network Information
Related Topics (5)
Software
130.5K papers, 2M citations
90% related
Cloud computing
156.4K papers, 1.9M citations
83% related
The Internet
213.2K papers, 3.8M citations
83% related
Wireless sensor network
142K papers, 2.4M citations
82% related
Artificial neural network
207K papers, 4.5M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20241
2023580
20221,257
2021290
2020308
2019381