Showing papers on "Software portability published in 2021"


Journal ArticleDOI
TL;DR: MFEM as mentioned in this paper is an open-source, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches, and an emphasis on usability, portability, and high-performance computing efficiency.
Abstract: MFEM is an open-source, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing efficiency. MFEM’s goal is to provide application scientists with access to cutting-edge algorithms for high-order finite element meshing, discretizations and linear solvers, while enabling researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel and GPU-accelerated settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library.
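To give a sense of the library's abstractions, here is a minimal sketch of a high-order Poisson solve in the style of MFEM's first example, written against the PyMFEM Python bindings rather than the C++ API. Class and method names mirror the C++ library, but exact constructors and signatures vary between releases, so treat this as illustrative only.

```python
import mfem.ser as mfem  # PyMFEM serial bindings (assumed installed)

# Simple 2D mesh and a 4th-order H1 finite element space.
mesh = mfem.Mesh(10, 10, "TRIANGLE")
fec = mfem.H1_FECollection(4, mesh.Dimension())
fespace = mfem.FiniteElementSpace(mesh, fec)

# Homogeneous Dirichlet conditions on all boundary attributes.
ess_tdof_list = mfem.intArray()
ess_bdr = mfem.intArray([1] * mesh.bdr_attributes.Max())
fespace.GetEssentialTrueDofs(ess_bdr, ess_tdof_list)

one = mfem.ConstantCoefficient(1.0)
b = mfem.LinearForm(fespace)               # load vector (1, v)
b.AddDomainIntegrator(mfem.DomainLFIntegrator(one))
b.Assemble()

x = mfem.GridFunction(fespace)
x.Assign(0.0)
a = mfem.BilinearForm(fespace)             # stiffness matrix (grad u, grad v)
a.AddDomainIntegrator(mfem.DiffusionIntegrator(one))
a.Assemble()

A = mfem.OperatorPtr()
B = mfem.Vector()
X = mfem.Vector()
a.FormLinearSystem(ess_tdof_list, x, b, A, X, B)
AA = mfem.OperatorHandle2SparseMatrix(A)   # helper name per PyMFEM examples
M = mfem.GSSmoother(AA)
mfem.PCG(AA, M, B, X, 1, 200, 1e-12, 0.0)  # preconditioned CG solve
a.RecoverFEMSolution(X, b, x)
```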

144 citations


Posted ContentDOI
TL;DR: In this paper, the authors present a workflow that retains Python's high productivity while achieving portable performance across different architectures, including CPU, GPU, FPGA, and the Piz Daint supercomputer.
Abstract: Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. In this work, we present a workflow that retains Python's high productivity while achieving portable performance across different architectures. The workflow's key features are HPC-oriented language extensions and a set of automatic optimizations powered by a data-centric intermediate representation. We show performance results and scaling across CPU, GPU, FPGA, and the Piz Daint supercomputer (up to 23,328 cores), with 2.47x and 3.75x speedups over previous-best solutions, first-ever Xilinx and Intel FPGA results of annotated Python, and up to 93.16% scaling efficiency on 512 nodes.
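The workflow described is that of the data-centric DaCe toolchain, whose published interface centers on annotating NumPy-style functions. A minimal sketch of what such annotated Python looks like (decorator and type-annotation details may differ between versions):

```python
import dace
import numpy as np

N = dace.symbol("N")  # symbolic size, bound when the program is called

@dace.program
def scale_and_sum(A: dace.float64[N, N], alpha: dace.float64):
    # Plain NumPy semantics; DaCe lowers the body to a data-centric IR,
    # applies automatic optimizations, and generates CPU/GPU/FPGA code.
    return np.sum(alpha * A)

A = np.random.rand(1000, 1000)
print(scale_and_sum(A, alpha=2.0))  # JIT-compiles on first call
```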

108 citations


Journal ArticleDOI
TL;DR: A structured and analytical overview is offered of existing popular object detection models that can be re-purposed for defect detection, such as region-based CNNs, YOLO (You Only Look Once), SSD (Single Shot Detectors), and cascaded architectures.

63 citations


Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed an efficient architecture by distilling knowledge from well-trained medical image segmentation networks to train another lightweight network, which empowers the lightweight network to get a significant improvement on segmentation capability while retaining its runtime efficiency.
Abstract: Recent advances have been made in applying convolutional neural networks to achieve more precise prediction results for medical image segmentation problems. However, the success of existing methods has relied heavily on huge computational complexity and massive storage, which is impractical in real-world scenarios. To deal with this problem, we propose an efficient architecture by distilling knowledge from well-trained medical image segmentation networks to train another lightweight network. This architecture empowers the lightweight network to achieve a significant improvement in segmentation capability while retaining its runtime efficiency. We further devise a novel distillation module tailored for medical image segmentation to transfer semantic region information from the teacher to the student network. It forces the student network to mimic the extent of difference between representations calculated from different tissue regions. This module avoids the ambiguous boundary problem encountered in medical imaging and instead encodes the internal information of each semantic region for transfer. Benefiting from our module, the lightweight network can receive an improvement of up to 32.6% in our experiments while maintaining its portability in the inference phase. The entire structure has been verified on two widely accepted public CT datasets, LiTS17 and KiTS19. We demonstrate that a lightweight network distilled by our method has non-negligible value in scenarios that require relatively high operating speed and low storage usage.
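For orientation, the generic form of such teacher-student distillation fits in a few lines of PyTorch. This sketch shows only the standard soft-label loss, not the paper's region-affinity module; all names are illustrative.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-label distillation for segmentation logits of shape
    (batch, classes, H, W): mix the usual hard-label loss with a
    temperature-softened KL term against the teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * soft + (1.0 - alpha) * hard
```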

58 citations


Journal ArticleDOI
TL;DR: Workflow managers have been developed to simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing as mentioned in this paper.
Abstract: The rapid growth of high-throughput technologies has transformed biomedical research. With the increasing amount and complexity of data, scalability and reproducibility have become essential not just for experiments, but also for computational analysis. However, transforming data into information involves running a large number of tools, optimizing parameters, and integrating dynamically changing reference data. Workflow managers were developed in response to such challenges. They simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing. In this Perspective, we highlight key features of workflow managers, compare commonly used approaches for bioinformatics workflows, and provide a guide for computational and noncomputational users. We outline community-curated pipeline initiatives that enable novice and experienced users to perform complex, best-practice analyses without having to manually assemble workflows. In sum, we illustrate how workflow managers contribute to making computational analysis in biomedical research shareable, scalable, and reproducible.
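As a concrete taste of the approach, here is a toy two-step pipeline in Snakemake, one of the workflow managers commonly compared in this space; the file names and tools (bwa, samtools) are illustrative only.

```python
# Snakefile: declare the desired outputs, and the manager infers the
# execution DAG, parallelizes independent jobs, and can resume after
# interruption.
rule all:
    input:
        "results/sample1.sorted.bam"

rule align:
    input:
        reads="data/sample1.fastq.gz",
        ref="ref/genome.fa"
    output:
        temp("results/sample1.bam")
    shell:
        "bwa mem {input.ref} {input.reads} | samtools view -b - > {output}"

rule sort:
    input:
        "results/sample1.bam"
    output:
        "results/sample1.sorted.bam"
    shell:
        "samtools sort -o {output} {input}"
```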

52 citations


Journal ArticleDOI
TL;DR: Rapid on-site detection of pathogenic bacteria with high sensitivity and specificity is becoming an urgent need in public health assurance, medical diagnostics, environmental monitoring, and food safety; this progress report surveys the state of the field.

39 citations


Journal ArticleDOI
TL;DR: This work considers the efficiency of solving two identical MD models (generic for material science and biomolecular studies) using different software and hardware combinations, and describes the experience of porting the CUDA backend of LAMMPS to ROCm HIP, which shows considerable benefits for AMD GPUs compared to the OpenCL backend.
Abstract: Classical molecular dynamics (MD) calculations represent a significant part of the utilization time of high-performance computing systems. As usual, the efficiency of such calculations is based on ...

38 citations


Journal ArticleDOI
TL;DR: A TRNG is designed whose randomness is generated by the oscillation of self-timed rings (STRs) and accurately extracted by a jitter-latch structure; it shows excellent performance in randomness, robustness, and portability, and its throughput reaches 100 Mbps.
Abstract: Under the requirement of highly reliable encryption, the design of true random number generators (TRNGs) based on field-programmable gate arrays (FPGAs) is receiving increased attention. Although TRNGs based on ring oscillators (ROs) and phase-locked loops (PLLs) have the advantages of small resource overhead and high throughput, there are problems such as instability of randomness and poor portability. To improve the randomness, portability, and throughput of a random number generator, we design a TRNG whose randomness is generated by the oscillation of self-timed rings (STRs) and accurately extracted by a jitter-latch structure. The portability of the structure is verified by electronic design automation (EDA) tools. Under the condition of 0°C–80°C ambient temperature and 1.0 ± 0.1 V output voltage, the proposed structure is tested many times on Xilinx Spartan-6 and Virtex-6 FPGAs with an automatic routing mode. Theoretical analysis shows that this method can effectively improve the coverage of jitter and reduce the migration phenomenon. Experimental results show excellent performance in randomness, robustness, and portability, and the throughput reaches 100 Mbps.
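The randomness claims of such a generator are typically checked with the NIST SP 800-22 suite. As a minimal illustration, the frequency (monobit) test fits in a few lines of Python; the capture loader below is a hypothetical helper, not part of any library.

```python
import math

def monobit_pvalue(bits):
    """NIST SP 800-22 frequency (monobit) test: p-values above ~0.01
    are consistent with an unbiased bit stream."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # map bits to +1/-1 and sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

# bits = load_trng_capture("capture.bin")  # hypothetical capture loader
# print(monobit_pvalue(bits))
```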

33 citations


Journal ArticleDOI
TL;DR: This work combines an existing magnetohydrodynamics CPU code with a performance portable on-node parallel programming paradigm into K-Athena to allow efficient simulations on multiple architectures using a single codebase and presents profiling and scaling results for different platforms.
Abstract: Large scale simulations are a key pillar of modern research and require ever-increasing computational resources. Different novel manycore architectures have emerged in recent years on the way towards the exascale era. Performance portability is required to prevent repeated non-trivial refactoring of a code for different architectures. We combine Athena++, an existing magnetohydrodynamics (MHD) CPU code, with Kokkos, a performance portable on-node parallel programming paradigm, into K-Athena to allow efficient simulations on multiple architectures using a single codebase. We present profiling and scaling results for different platforms including Intel Skylake CPUs, Intel Xeon Phis, and NVIDIA GPUs. K-Athena achieves more than 10^8 cell-updates/s on a single V100 GPU for second-order double precision MHD calculations, and a speedup of 30 on up to 24,576 GPUs on Summit (compared to 172,032 CPU cores), reaching 1.94 × 10^12 total cell-updates/s at 76 percent parallel efficiency. Using a roofline analysis we demonstrate that the overall performance is currently limited by DRAM bandwidth, and we calculate a performance portability metric of 62.8 percent. Finally, we present the implementation strategies used and the challenges encountered in maximizing performance. This will provide other research groups with a straightforward approach to prepare their own codes for the exascale era. K-Athena is available at https://gitlab.com/pgrete/kathena .
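The performance portability figure quoted above is conventionally computed as the harmonic mean of per-platform efficiencies (the metric of Pennycook et al.); a small sketch, with made-up efficiency numbers rather than the paper's data:

```python
def performance_portability(efficiencies):
    """Harmonic mean of efficiencies e_i over a platform set H;
    defined as zero if the code fails to run on any platform."""
    if not efficiencies or any(e == 0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)

# Illustrative only: efficiencies on, say, a CPU, a Xeon Phi, and a GPU.
print(performance_portability([0.71, 0.55, 0.64]))  # ~0.63
```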

29 citations


Posted ContentDOI
TL;DR: A widely used standard for portable multilingual data analysis pipelines would enable considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and to the environment.
Abstract: A widely used standard for portable multilingual data analysis pipelines would enable considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and the environment. Published research that used multiple computer languages for its analysis pipelines would include a complete and reusable description of that analysis, runnable on a diverse set of computing environments. Researchers would be able to collaborate more easily and reuse these pipelines, adding or exchanging components regardless of the programming language used; collaborations with and within industry would be easier; and approval of new medical interventions that rely on such pipelines would be faster. Time would be saved and environmental impact reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for modern society. In many domains the analysis components are numerous and written in multiple different computer languages by third parties. However, without a standard for reusable and portable multilingual workflows, reusing published multilingual workflows, collaborating on open problems, and optimizing their execution are severely hampered. Moreover, only a widely used standard for multilingual data analysis pipelines would enable considerable benefits for research-industry collaboration, regulatory cost control, and preserving the environment. Prior to the start of the CWL project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today, although there exist hundreds of single-vendor and other single-source systems that run workflows, none is a general, community-driven, consensus-built standard.

Journal ArticleDOI
TL;DR: This timely survey investigates the landscape of state-of-the-art container scheduling techniques, aiming to inspire more work in this active research area, and highlights fertile future research opportunities for realizing the full potential of the emergent container technology.

Journal ArticleDOI
14 Feb 2021 - Symmetry
TL;DR: In this paper, the authors summarize the issues involved in semantic cloud portability and interoperability in the multi-cloud environment and define the standardization effort imminently needed for migrating and collaborating services.
Abstract: The increasing demand for cloud computing has shifted business toward a huge demand for cloud services, which offer platform, software, and infrastructure for the day-to-day use of cloud consumers. Numerous new cloud service providers have been introduced to the market with unique features that assist service developers in collaborating and migrating services among multiple cloud service providers to address the varying requirements of cloud consumers. Many interfaces and proprietary application programming interfaces (APIs) are available for migration and collaboration services among cloud providers, but standardization efforts are lacking. The aim of this work is to summarize the issues involved in semantic cloud portability and interoperability in the multi-cloud environment and to define the standardization effort imminently needed for migrating and collaborating services in the multi-cloud environment.

Journal ArticleDOI
TL;DR: This review analyzes various metabolites measured with molecularly imprinted polymer (MIP)-based electrochemical methods, assessing their potential use in POCT and biomarker research based on the requirements of targeted metabolomics analysis.
Abstract: In recent years, metabolomics, the identification and profiling of metabolites, has gained broad interest compared to other omics technologies and is progressively being utilized for biomarker discovery. The application of metabolomics in different fields is therefore increasing day by day because of its high-throughput results. However, the application of metabolomics requires state-of-the-art analytical approaches for the analysis. The complexity and limited availability of these instruments are restricting parameters for applying metabolomics studies in routine analysis. This problem may be overcome with molecularly imprinted polymer (MIP)-based electrochemical sensors, since they have high selectivity, sensitivity, easy applicability, portability, and low cost. This is the final step before developing point-of-care tests (POCT), which patients can easily apply. MIP sensors will have more applications in targeted metabolomics analysis to develop POCT systems. This review analyzes various metabolites using MIP-based electrochemical methods for their potential usage in POCT and biomarker research based on targeted metabolomics analysis requirements. Future applications for the sensitive assay of metabolites in medicine and clinical trials are also discussed.

Posted Content
TL;DR: NekRS as discussed by the authors is a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM) which leverages scalable developments in the SEM code Nek5000 and in libParanumal, which is a library of high-performance kernels for high-order discretizations and PDE-based miniapps.
Abstract: The development of NekRS, a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM), is described. For performance portability, the code is based on the Open Concurrent Compute Abstraction (OCCA) and leverages scalable developments in the SEM code Nek5000 and in libParanumal, a library of high-performance kernels for high-order discretizations and PDE-based miniapps. Critical performance sections of the Navier-Stokes time advancement are addressed. Performance results on several platforms are presented, including scaling to 27,648 V100s on OLCF Summit, for calculations of up to 60B gridpoints.

Proceedings ArticleDOI
17 Feb 2021
TL;DR: FABulous as mentioned in this paper is an embedded open-source FPGA framework that provides templates for logic, arithmetic, memory and I/O blocks that can be easily stitched together, whilst enabling users to add their own fully customized blocks and primitives.
Abstract: At the end of CMOS scaling, the role of architecture design is gaining ever more importance. Supporting this trend, customizable embedded FPGAs are an ingredient in ASIC architectures that provides the advantages of reconfigurable hardware exactly where and how it is most beneficial. To enable this, we are introducing the FABulous embedded open-source FPGA framework. FABulous is designed to fulfill the objectives of ease of use, maximum portability to different process nodes, good control for customization, and delivering good area, power, and performance characteristics of the generated FPGA fabrics. The framework provides templates for logic, arithmetic, memory, and I/O blocks that can be easily stitched together, whilst enabling users to add their own fully customized blocks and primitives. The FABulous ecosystem generates the embedded FPGA fabric for chip fabrication, integrates Yosys, ABC, VPR, and nextpnr as FPGA CAD tools, and handles bitstream generation and after-fabrication tests. Additionally, we provide an emulation path for system development. FABulous was demonstrated for an ASIC integrating a RISC-V core with an embedded FPGA fabric for custom instruction set extensions using a TSMC 180nm process and an open-source 45nm process node.

Journal ArticleDOI
TL;DR: The Kokkos EcoSystem as discussed by the authors is a portable software stack based on the Kokkos Core Programming Model, which provides math libraries, interoperability capabilities with Python and Fortran, and Tools for analyzing, debugging, and optimizing applications.
Abstract: State-of-the-art engineering and science codes have grown in complexity dramatically over the last two decades. Application teams have adopted more sophisticated development strategies, leveraging third party libraries, deploying comprehensive testing, and using advanced debugging and profiling tools. In today’s environment of diverse hardware platforms, these applications also desire performance portability—avoiding the need to duplicate work for various platforms. The Kokkos EcoSystem provides that portable software stack. Based on the Kokkos Core Programming Model, the EcoSystem provides math libraries, interoperability capabilities with Python and Fortran, and Tools for analyzing, debugging, and optimizing applications. In this article, we overview the components, discuss some specific use cases, and highlight how codesigning these components enables a more developer friendly experience.

Posted Content
TL;DR: A comprehensive literature review of existing machine learning-based container orchestration approaches is presented in this article, where a comparative analysis of the reviewed techniques is conducted according to the proposed taxonomies with emphasis on their key characteristics.
Abstract: Containerization is a lightweight application virtualization technology, providing high environmental consistency, operating system distribution portability, and resource isolation. Existing mainstream cloud service providers have prevalently adopted container technologies in their distributed system infrastructures for automated application management. To handle the automation of deployment, maintenance, autoscaling, and networking of containerized applications, container orchestration is proposed as an essential research problem. However, the highly dynamic and diverse nature of cloud workloads and environments considerably raises the complexity of orchestration mechanisms. Machine learning algorithms are accordingly employed by container orchestration systems for behavior modelling and prediction of multi-dimensional performance metrics. Such insights could further improve the quality of resource provisioning decisions in response to changing workloads under complex environments. In this paper, we present a comprehensive literature review of existing machine learning-based container orchestration approaches. Detailed taxonomies are proposed to classify the current research by common features. Moreover, the evolution of machine learning-based container orchestration technologies from 2016 to 2021 is charted based on objectives and metrics. A comparative analysis of the reviewed techniques is conducted according to the proposed taxonomies, with emphasis on their key characteristics. Finally, various open research challenges and potential future directions are highlighted.
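To make the idea concrete, here is a toy sketch of the kind of workload prediction an ML-driven orchestrator might consult before a horizontal-scaling decision. The data is synthetic and the 80% threshold is illustrative, not drawn from the survey.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic CPU-utilization trace (%); a real system would use metrics
# scraped from the container runtime or a monitoring stack.
rng = np.random.default_rng(0)
cpu = 50 + 20 * np.sin(np.arange(500) / 20) + rng.normal(0, 3, 500)

# Regress the next sample on a sliding window of past utilization.
window = 12
X = np.array([cpu[i:i + window] for i in range(len(cpu) - window)])
y = cpu[window:]

model = Ridge(alpha=1.0).fit(X[:-50], y[:-50])
predicted = model.predict(X[-50:])

# Scale out if the forecast exceeds, e.g., 80% of one core per replica.
replicas = int(np.ceil(predicted.max() / 80.0))
print(f"forecast peak: {predicted.max():.1f}%, replicas: {replicas}")
```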

Book ChapterDOI
18 May 2021
TL;DR: In this paper, the authors propose using WebAssembly to implement lightweight containers and deliver the required portability for liquid IoT applications, which can offer seamless, hassle-free use of multiple devices.
Abstract: Going all the way to IoT with web technologies opens up the door to isomorphic IoT system architectures, which deliver flexible deployment and live migration of code between any device in the overall system. In this vision paper, we propose using WebAssembly to implement lightweight containers and deliver the required portability. Our long-term vision is to use the technology to support developers of liquid IoT applications offering seamless, hassle-free use of multiple devices.
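A hedged sketch of the host side of such a WebAssembly "lightweight container", using the wasmtime Python bindings to load and invoke a sandboxed module; API details vary between wasmtime releases.

```python
from wasmtime import Store, Module, Instance

# A trivial guest module in WebAssembly text format; in the IoT setting
# this would be the migratable application payload.
wat = """
(module
  (func (export "add") (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add))
"""

store = Store()
module = Module(store.engine, wat)       # compile the module
instance = Instance(store, module, [])   # instantiate in the sandbox
add = instance.exports(store)["add"]
print(add(store, 2, 3))  # -> 5
```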

Journal ArticleDOI
27 Mar 2021
TL;DR: The Elk Audio OS is a Linux-based OS optimized for ultra-low-latency and high-performance audio and sensor processing on embedded hardware, as well as for handling wireless connectivity to local and remote networks.
Abstract: As the Internet of Musical Things (IoMusT) emerges, audio-specific operating systems (OSs) are required on embedded hardware to ease development and portability of IoMusT applications. Despite the increasing importance of IoMusT applications, in this article, we show that there is no OS able to fulfill the diverse requirements of IoMusT systems. To address such a gap, we propose the Elk Audio OS as a novel and open source OS in this space. It is a Linux-based OS optimized for ultra-low-latency and high-performance audio and sensor processing on embedded hardware, as well as for handling wireless connectivity to local and remote networks. Elk Audio OS uses the Xenomai real-time kernel extension, which makes it suitable for the most demanding of low-latency audio tasks. We provide the first comprehensive overview of Elk Audio OS, describing its architecture and the key components of interest to potential developers and users. We explain operational aspects like the configuration of the architecture and the control mechanisms of the internal sound engine, as well as the tools that enable an easier and faster development of connected musical devices. Finally, we discuss the implications of Elk Audio OS, including the development of an open source community around it.

Journal ArticleDOI
TL;DR: The heterogeneous implementation model described in this work is a general-purpose approach that is well suited for various subroutines in numerical simulation codes and is not limited to a particular kind of numerical method or set of governing equations.

Journal ArticleDOI
TL;DR: Workflow management systems represent, manage, and execute multistep computational analyses and offer many benefits to bioinformaticians as mentioned in this paper, including the ability to selectively re-execute parts of a workflow in the presence of additional inputs or changes in configuration and to resume execution from where a workflow previously stopped.
Abstract: Workflow management systems represent, manage, and execute multistep computational analyses and offer many benefits to bioinformaticians. They provide a common language for describing analysis workflows, contributing to reproducibility and to building libraries of reusable components. They can support both incremental build and re-entrancy: the ability to selectively re-execute parts of a workflow in the presence of additional inputs or changes in configuration, and to resume execution from where a workflow previously stopped. Many workflow management systems enhance portability by supporting the use of containers, high-performance computing (HPC) systems, and clouds. Most importantly, workflow management systems allow bioinformaticians to delegate how their workflows are run to the workflow management system and its developers. This frees the bioinformaticians to focus on what these workflows should do, on their data analyses, and on their science. RiboViz is a package to extract biological insight from ribosome profiling data to help advance understanding of protein synthesis. At the heart of RiboViz is an analysis workflow, implemented in a Python script. To conform to best practices for scientific computing, which recommend the use of build tools to automate workflows and to reuse code instead of rewriting it, the authors reimplemented this workflow within a workflow management system. To select a workflow management system, a rapid survey of available systems was undertaken, and candidates were shortlisted: Snakemake, cwltool, Toil, and Nextflow. Each candidate was evaluated by quickly prototyping a subset of the RiboViz workflow, and Nextflow was chosen. The selection process took 10 person-days, a small cost for the assurance that Nextflow satisfied the authors' requirements. The use of prototyping can offer a low-cost way of making a more informed selection of software to use within projects, rather than relying solely upon reviews and recommendations by others.

Proceedings ArticleDOI
19 Jun 2021
TL;DR: In this article, the authors present the design and implementation of a novel visual assistance system that employs deep learning and point cloud processing to perform advanced perception tasks on a cost-effective, low-power mobile computing platform.
Abstract: Despite significant recent developments, visual assistance systems are still severely constrained by sensor capabilities, form factor, battery power consumption, computational resources, and the use of traditional computer vision algorithms. Current visual assistance systems cannot adequately perform complex computer vision tasks that entail deep learning. We present the design and implementation of a novel visual assistance system that employs deep learning and point cloud processing to perform advanced perception tasks on a cost-effective, low-power mobile computing platform. The proposed system design circumvents the need for the expensive, power-intensive Graphics Processing Unit (GPU)-based hardware required by most deep learning algorithms for real-time inference by employing instead edge Artificial Intelligence (AI) accelerators such as the Neural Compute Stick-2 (NCS2), model optimization toolkits such as OpenVINO and TensorFlow Lite, and smart depth sensors such as the OpenCV AI Kit-Depth (OAK-D). Critical system design challenges such as training data collection, real-time capability, computational efficiency, power consumption, portability, and reliability are addressed. The proposed system includes more advanced functionality than existing systems, such as assessment of traffic conditions and detection and localization of hanging obstacles, crosswalks, moving obstacles, and sudden elevation changes. The proposed system design incorporates an AI-based voice interface that allows for user-friendly interaction and control and is shown to realize a simple, cost-effective, power-efficient, portable, and unobtrusive visual assistance device.
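A minimal sketch of how a compiled model is dispatched to the NCS2 with OpenVINO's 2021-era Python API; the model paths and input shape are placeholders, and newer OpenVINO releases expose a different (openvino.runtime) interface.

```python
import numpy as np
from openvino.inference_engine import IECore

# Load an IR-format model and target the NCS2 ("MYRIAD" device plugin).
ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Run inference on one preprocessed frame (placeholder input).
input_name = next(iter(net.input_info))
frame = np.zeros((1, 3, 300, 300), dtype=np.float32)
result = exec_net.infer(inputs={input_name: frame})
```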

Journal ArticleDOI
TL;DR: In this article, an innovative portable device for both quantitative and semi-quantitative electrochemical analysis is presented, which can operate autonomously without the need of relying on other devices (e.g., PC, tablets, or smartphones).
Abstract: Recent advances in Internet-of-Things technology have opened the doors to new scenarios for biosensor applications. Flexibility, portability, and remote control and access are of utmost importance to move these devices to people's homes or into a Point-of-Care context and rapidly share the results with users and their physicians. In this paper, an innovative portable device for both quantitative and semi-quantitative electrochemical analysis is presented. This device can operate autonomously, without the need to rely on other devices (e.g., PCs, tablets, or smartphones), thanks to built-in Wi-Fi connectivity. The developed hardware is integrated into a cloud-based platform, exploiting the cloud's computational power to perform innovative algorithms for calibration (e.g., Machine Learning tools). Results and configurations can be accessed through a web page without the installation of dedicated apps or software. The electrical input/output characteristic was measured with a dummy cell as a load, achieving excellent linearity. Furthermore, the device's response to five different concentrations of a potassium ferri/ferrocyanide redox probe was compared with a bench-top laboratory instrument; no difference in analytical sensitivity was found. Some examples of application-specific tests were also set up to demonstrate use in real-case scenarios. In addition, a Support Vector Machine algorithm was applied to semi-quantitative analyses to classify the input samples into four classes, achieving an average accuracy of 98.23%. Finally, COVID-19 related tests are presented and discussed.
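The four-class semi-quantitative step can be sketched with a standard SVM pipeline. The snippet below uses synthetic stand-in data, not the paper's measurements, and the feature set is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Stand-in electrochemical features (e.g., peak currents/potentials)
# and four concentration classes; real data would come from the device.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = rng.integers(0, 4, size=200)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```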

Proceedings ArticleDOI
03 Jun 2021
TL;DR: PyKokkos as discussed by the authors is a new Python framework for writing performance portable applications entirely in Python; it provides Kokkos-like abstractions for data management and common parallel operations, allowing developers to write portable high performance code with minimal knowledge of architecture-specific details.
Abstract: Kokkos is a programming model for writing performance portable applications for all major high performance computing platforms. It provides abstractions for data management and common parallel operations, allowing developers to write portable high performance code with minimal knowledge of architecture-specific details. Kokkos is implemented as a heavily-templated C++ library. However, C++ is not ideal for rapid prototyping and quick algorithmic exploration. An increasing number of developers use Python for scientific computing, machine learning, and data analytics. In this paper, we present a new Python framework, dubbed PyKokkos, for writing performance portable applications entirely in Python. PyKokkos provides Kokkos-like abstractions that are easier to use and more concise than the C++ interface. We implemented PyKokkos by building a translator from a subset of Python to C++ Kokkos and bridging necessary function calls via automatically generated Python bindings. PyKokkos is also compatible with NumPy, a widely-used high performance Python library. By porting several existing Kokkos applications to PyKokkos, including ExaMiniMD (∼3k lines of code in C++), we show that the latter can achieve efficient execution with low performance overhead.
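A reduction kernel in the PyKokkos style, adapted from the examples in the paper; the decorator, accumulator, and View APIs shown here follow the paper's listings and may have evolved in current releases.

```python
import pykokkos as pk

@pk.workunit
def dot(i: int, acc: pk.Acc[pk.double],
        x: pk.View1D[pk.double], y: pk.View1D[pk.double]):
    # Translated to a C++ Kokkos kernel and invoked via generated bindings.
    acc += x[i] * y[i]

n = 10
x = pk.View([n], pk.double)
y = pk.View([n], pk.double)
for i in range(n):
    x[i] = 1.0
    y[i] = 2.0
print(pk.parallel_reduce(n, dot, x=x, y=y))  # expect 20.0
```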

Journal ArticleDOI
TL;DR: In this paper, the authors developed an open-source software called Single-frequency Uncombined PREcise Point Positioning for Multi-parameter Estimation (SUPREME), which can process multi-GNSS observations collected from a low-cost single-frequency receiver.
Abstract: With the rapid development of Global Navigation Satellite Systems (GNSS), precise point positioning (PPP) has been widely used for positioning, navigation, timing (PNT), and atmospheric sensing. Currently, the demand for low-cost single-frequency hardware is increasing. However, open-source single-frequency PPP (SFPPP) software, in particular those that employ uncombined GNSS observations, is scarce. In view of this fact, we developed an open-source software, called Single-frequency Uncombined PREcise point positioning for Multi-parameter Estimation (SUPREME), which can process multi-GNSS observations collected from a low-cost single-frequency receiver. It can simultaneously estimate receiver coordinates, receiver clock offsets, zenith tropospheric delays, and ionospheric delays utilizing a least-squares filter. SUPREME is developed in the C/C++ computer language; thus, it has good portability and cross-platform capability. The formatted output is also beneficial for the post-analysis of results. We evaluated SUPREME using several experiments based on real single-frequency data, and the results indicate that it provides high accuracy and robust performance.
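The least-squares core of such processing can be illustrated with a toy single-epoch pseudorange solution. This is a generic textbook sketch, not SUPREME's implementation, and the tropospheric and ionospheric states that SUPREME also estimates are omitted for brevity.

```python
import numpy as np

def solve_epoch(sat_pos, pseudoranges, x0=None, iters=8):
    """Iterated least squares for [X, Y, Z, c*dt]: linearize
    P = |r_sat - r| + c*dt around the current estimate and update."""
    x = np.zeros(4) if x0 is None else x0.astype(float)
    for _ in range(iters):
        rho = np.linalg.norm(sat_pos - x[:3], axis=1)
        H = np.hstack([(x[:3] - sat_pos) / rho[:, None],   # d(rho)/d(pos)
                       np.ones((len(rho), 1))])            # d(P)/d(c*dt)
        dz = pseudoranges - (rho + x[3])                   # prefit residuals
        x += np.linalg.lstsq(H, dz, rcond=None)[0]
    return x
```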

Journal ArticleDOI
TL;DR: In this article, an IoT centric cyber-physical twin architecture has been proposed for 6G technology, which helps out in serving stronger communication and also contains several features that help out in assisting communication like maintaining a log record of network data and managing all digital assets like images, audio, video, and so forth.
Abstract: With the rapid growth of Internet of Everything, there is a huge rise in the transportable Internet traffic due to which its associated resources have exceptional obstacles which include reliability, security, expandability, security, and portability which the current available network architectures are unable to deal with. In this paper, an IoT centric cyber-physical twin architecture has been proposed for 6G Technology. The cyber-twin technology helps out in serving stronger communication and also contains several features that help out in assisting communication like maintaining a log record of network data and managing all digital assets like images, audio, video, and so forth. These features of the cyber-twin technology enable the proposed network to deal with those exceptional obstacles and make the system more reliable, safe, workable, and adaptable.

Journal ArticleDOI
TL;DR: In this paper, the authors assess the economic implications of data portability in the light of the extant economic literature and with a focus on competition and innovation in the digital platform economy.
Abstract: Article 20 of the General Data Protection Regulation (GDPR) gave consumers in the European Union the right to port their personal data between digital service providers. We critically assess the economic implications of this new right in light of the extant economic literature, with a focus on competition and innovation in the digital platform economy. In particular, we conclude that observed user behaviour data should clearly fall under the scope of data portability and that, above and beyond the regulations set out under the GDPR, a right to port personal data continuously and in real time would be necessary to truly empower consumers in the context of the digital platform economy. We also discuss the economics of Personal Information Management Systems (PIMSs), which many policymakers see as an essential tool for consumers in an economy where data portability becomes more widespread. However, we are sceptical that PIMSs will be self-sustainable, and we instead advocate facilitating the development of open-source projects, which have made little progress so far due to a lack of interfaces (which would come about with a right to continuous data portability) and a lack of common standards.

Journal ArticleDOI
TL;DR: In this paper, the authors provide a detailed insight into various compression techniques widely disseminated in the deep learning regime and highlight challenges encountered while training RNNs in the context of the IoT.
Abstract: Recurrent Neural Networks are ubiquitous and pervasive in many artificial intelligence applications such as speech recognition, predictive healthcare, creative art, and so on. Although they provide accurate, superior solutions, they pose a massive challenge: "training havoc." The current expansion of the IoT demands intelligent models to be deployed at the edge, precisely to handle increasing model sizes and complex network architectures. Design efforts to meet these demands for greater performance have had inverse effects on portability on edge devices with real-time constraints of memory, latency, and energy. This article provides a detailed insight into the various compression techniques widely disseminated in the deep learning regime, which have become key to mapping powerful RNNs onto resource-constrained devices. While compression of RNNs is the main focus of the survey, it also highlights challenges encountered while training, since the training procedure directly influences model performance and compression alongside. Recent advancements to overcome the training challenges, with their strengths and drawbacks, are discussed. In short, the survey covers the three-step process: architecture selection, an efficient training process, and a suitable compression technique applicable to a resource-constrained environment. It is thus a comprehensive survey guide that a developer can adapt for a time-series problem context and an RNN solution for the edge.
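Two of the techniques such surveys cover, magnitude pruning and post-training quantization, can be sketched in a few lines of PyTorch; the 70% sparsity target below is illustrative, not a recommendation from the article.

```python
import torch
import torch.nn as nn

# Unstructured magnitude pruning of an LSTM's weights, followed by
# 8-bit dynamic quantization, two common steps toward edge deployment.
lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)

with torch.no_grad():
    for name, p in lstm.named_parameters():
        if name.startswith("weight"):
            mask = p.abs() >= p.abs().quantile(0.7)  # keep the top 30%
            p.mul_(mask)                             # zero small weights

quantized = torch.quantization.quantize_dynamic(
    lstm, {nn.LSTM}, dtype=torch.qint8)
```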

Book ChapterDOI
24 Jun 2021
TL;DR: In this paper, the authors analyse the performance of the most promising modern parallel programming models, such as SYCL and Kokkos, on a diverse range of contemporary high-performance hardware, using a compute-bound molecular docking mini-app.
Abstract: Performance portability is becoming more-and-more important as next-generation high performance computing systems grow increasingly diverse and heterogeneous. Several new approaches to parallel programming, such as SYCL and Kokkos, have been developed in recent years to tackle this challenge. While several studies have been published evaluating these new programming models, they have tended to focus on memory-bandwidth bound applications. In this paper we analyse the performance of what appear to be the most promising modern parallel programming models, on a diverse range of contemporary high-performance hardware, using a compute-bound molecular docking mini-app.