Showing papers on "Multi-core processor published in 2022"


Journal ArticleDOI
TL;DR: In this article, the authors present DFT-FE 1.0, which improves the real-space formulation via a refined treatment of the electrostatic interactions that substantially enhances the computational efficiency.

32 citations


Proceedings ArticleDOI
01 Mar 2022
TL;DR: Fast readouts enabling real-time tracking of the DSP implementation are performed, showing that coupled-core fibers are compatible with real-time DSP implementations.
Abstract: We perform parallel continuous measurements of deployed SDM fibers using real-time coherent receivers implemented on FPGAs. Fast readouts enable real-time tracking of the DSP implementation, showing that coupled-core fibers are compatible with real-time DSP implementations. © 2022 The Author(s)

18 citations


Proceedings ArticleDOI
01 Mar 2022
TL;DR: In this article, the authors demonstrate the first 15.2 km prototype of submarine cable with 4-core MCF and confirm that MC-EDFA integration into the cable improves the Q-value by 0.6 dB through real-time 5,350 km transmission.
Abstract: We demonstrate the first 15.2 km prototype of submarine cable with 4-core MCF. Changes to the cabled MCF are negligible. We confirmed that MC-EDFA integration into the cable improves the Q-value by 0.6 dB through real-time 5,350 km transmission. © 2022 The Author(s)

15 citations


Journal ArticleDOI
TL;DR: In this article, the authors define a taxonomy based on fourteen aspects grouped into four macro-categories (general aspects, host coupling, architecture, and software aspects), and discuss prominent open challenges that accelerators face, analyzing state-of-the-art solutions and suggesting prospective research directions for the future.

15 citations


Proceedings ArticleDOI
01 Mar 2022
TL;DR: In this paper, the authors demonstrate the first transmission experiment using 7-core coupled-core fiber with in-line coupled-core multi-core amplifiers and real-time DSP.
Abstract: We demonstrate the first transmission experiment using 7-core coupled-core fiber with in-line coupled-core multi-core amplifiers and real-time DSP. The real-time DSP is implemented using a single FPGA that performs MIMO processing across all 7 cores. © 2022 The Author(s)

12 citations


Journal ArticleDOI
TL;DR: C-Laius, proposed in this paper, is a runtime system that carefully allocates compute resources to co-located applications, maximizing the throughput of batch applications while guaranteeing the required QoS of user-facing services.
Abstract: Datacenters use GPUs to provide the significant computing throughput required by emerging user-facing services. The diurnal user access pattern of user-facing services provides a strong incentive to co-locate applications for better GPU utilization, and prior work has focused on enabling co-location on multicore processors and traditional non-preemptive GPUs. However, current GPUs are evolving towards spatial multitasking and introduce a new set of challenges to eliminate QoS violations. To address this open problem, we explore the underlying causes of QoS violation on spatial multitasking GPUs. In response to these causes, we propose C-Laius, a runtime system that carefully allocates the computation resource to co-located applications for maximizing the throughput of batch applications while guaranteeing the required QoS of user-facing services. C-Laius not only allows co-locating one user-facing application with multiple batch applications, but also supports the co-location of multiple user-facing applications with batch applications. In the case of a single co-located user-facing application, our evaluation on an Nvidia RTX 2080Ti GPU shows that C-Laius improves the utilization of spatial multitasking GPUs by 20.8 percent, while achieving the 99%-ile latency target for user-facing services. As for the case of multiple co-located user-facing applications, C-Laius ensures no violation of QoS while improving the accelerator utilization by 35.9 percent on average.
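As an illustration of the kind of QoS-aware resource arbitration the abstract describes, below is a minimal, hypothetical Python sketch (not the C-Laius implementation): a feedback loop that shrinks the batch jobs' GPU compute share when the user-facing service's measured tail latency nears its target and grows it when there is slack. The function name, thresholds, and step size are illustrative assumptions.

```python
# Hypothetical sketch of QoS-aware GPU sharing between a user-facing
# service and batch jobs; not the C-Laius implementation.

def rebalance_quota(p99_latency_ms, qos_target_ms, batch_quota, step=0.05):
    """Shrink the batch jobs' compute share when the service's tail
    latency approaches its QoS target; grow it when there is slack."""
    headroom = (qos_target_ms - p99_latency_ms) / qos_target_ms
    if headroom < 0.05:          # QoS at risk: give resources back
        batch_quota = max(0.0, batch_quota - step)
    elif headroom > 0.20:        # ample slack: let batch jobs use more
        batch_quota = min(0.9, batch_quota + step)
    return batch_quota

# Example: latency creeping toward the target forces the batch share down.
quota = 0.5
for p99 in [60, 72, 78, 79, 65]:           # measured 99%-ile latency (ms)
    quota = rebalance_quota(p99, qos_target_ms=80, batch_quota=quota)
    print(f"p99={p99} ms -> batch quota {quota:.2f}")
```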

12 citations



Journal ArticleDOI
TL;DR: In this article, a real-time transoceanic space-division multiplexed transmission with coupled-core multicore fibers was demonstrated using field programmable gate array circuits.
Abstract: In this work, we present an experimental demonstration of real-time transoceanic space-division multiplexed transmission with coupled-core multicore fibers. To compensate for modal coupling in the coupled-core multicore fibers, we implement real-time multiple-input multiple-output (MIMO) digital signal processing based on field programmable gate array circuits. Using optical receivers with a real-time MIMO, we demonstrate 16-channel wavelength-division multiplexed coupled four-core fiber transmission over 7200 km. The results show the feasibility of real-time coupled-core multicore fiber transmission.
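The abstract does not detail the receiver DSP; as background, the following is a hedged sketch of the kind of adaptive MIMO equalization such real-time receivers typically perform (a standard single-tap LMS update in Python, not the authors' FPGA design). The channel matrix, step size, and QPSK training symbols are illustrative assumptions.

```python
# Hedged sketch of adaptive MIMO equalization of the kind such real-time
# receivers perform (standard single-tap LMS update); not the authors'
# FPGA design. Channel, step size and training symbols are assumptions.
import numpy as np

def mimo_lms_step(W, x, d, mu):
    """W: (n_out, n_in) complex taps, x: received samples from the
    coupled cores, d: known training symbols."""
    y = W @ x                              # equalizer outputs
    e = d - y                              # error against training symbols
    W = W + mu * np.outer(e, np.conj(x))   # LMS tap update
    return W, e

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))   # core coupling
W = np.eye(4, dtype=complex)
for _ in range(20000):
    d = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=4)
    x = H @ d + 0.01 * (rng.normal(size=4) + 1j * rng.normal(size=4))
    W, e = mimo_lms_step(W, x, d, mu=2e-3)
print(np.round(np.abs(W @ H), 2))   # ideally close to the identity matrix
```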

10 citations


Journal ArticleDOI
05 Apr 2022-PeerJ
TL;DR: GPUTreeShap, presented in this paper, is a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units (GPUs); it achieves speedups of up to 19× for SHAP values and up to 340× for SHAP interaction values over a state-of-the-art multi-core CPU implementation.
Abstract: SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second; equivalent CPU-based performance is estimated to require 6850 CPU cores.
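To make the bin-packing step concrete, here is a minimal first-fit-decreasing sketch in Python that groups variable-sized sub-problems into fixed-width SIMT work groups; it illustrates the general idea only and is not the GPUTreeShap code (the 32-lane capacity and the example sizes are assumptions).

```python
# Hypothetical first-fit-decreasing packing of variable-sized tree
# sub-problems into fixed-width SIMT work groups (e.g. 32 lanes).
# Illustrative only; not the GPUTreeShap implementation.

def pack_subproblems(sizes, bin_capacity=32):
    """Return a list of bins, each a list of (index, size) pairs whose
    sizes sum to at most bin_capacity."""
    bins = []
    for idx, size in sorted(enumerate(sizes), key=lambda t: -t[1]):
        for b in bins:
            if sum(s for _, s in b) + size <= bin_capacity:
                b.append((idx, size))
                break
        else:
            bins.append([(idx, size)])
    return bins

# Example: sub-problem sizes, e.g. derived from per-leaf path lengths.
print(pack_subproblems([7, 20, 12, 5, 30, 9, 3], bin_capacity=32))
```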

10 citations


Journal ArticleDOI
01 Feb 2022
TL;DR: In this article, the authors calculate the fractal dimension of the border of India and the coastline of India with a novel multicore parallel processing algorithm, using both the divider method and the box-counting method.
Abstract: Fractal dimension measures the degree of geometric irregularity present in the objects. The divider method and the box-counting method are two classical approaches for computing fractal dimension of natural objects and other fractals. In this article we calculate the fractal dimension of border of India and coastline of India using a novel multicore parallel processing algorithm by both these methods. The reliability of the coastline length (in use at present) is also discussed by recovering the power law from the computational results. Simulations are done on a parallel computer with the QGIS software, the R programming language and Python codes.
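For reference, a minimal box-counting sketch parallelized with a Python process pool is shown below; it estimates the dimension by fitting log(box count) against log(1/box size). It is not the authors' QGIS/R/Python pipeline, and the synthetic curve merely stands in for real border or coastline data.

```python
# Minimal parallel box-counting sketch (not the authors' pipeline):
# count occupied boxes for several box sizes in parallel, then fit
# log(count) vs log(1/size) to estimate the fractal dimension.
import numpy as np
from multiprocessing import Pool

def count_boxes(args):
    points, size = args
    occupied = set(map(tuple, np.floor(points / size).astype(int)))
    return size, len(occupied)

def fractal_dimension(points, sizes):
    with Pool() as pool:                       # one task per box size
        counts = pool.map(count_boxes, [(points, s) for s in sizes])
    x = np.log([1.0 / s for s, _ in counts])
    y = np.log([n for _, n in counts])
    slope, _ = np.polyfit(x, y, 1)             # slope = dimension estimate
    return slope

if __name__ == "__main__":
    t = np.linspace(0, 4 * np.pi, 20000)
    curve = np.column_stack([t, np.sin(3 * t)])   # stand-in for a digitized coastline
    print(fractal_dimension(curve, sizes=[0.5, 0.25, 0.1, 0.05, 0.02]))
```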

10 citations


Proceedings ArticleDOI
01 Mar 2022
TL;DR: In this paper, core and wavelength allocation schemes of SNS-QKD for future metropolitan transmission over multicore fiber are proposed, which suppress noise photons by up to 57.54% compared to conventional channel allocation.
Abstract: We propose core and wavelength allocation schemes of SNS-QKD for future metropolitan transmission over multicore fiber. Experiments verify that the proposed schemes can suppress noise photons up to 57.54% compared to conventional channel allocation. © 2021 The Author(s)

Journal ArticleDOI
01 Feb 2022-Sensors
TL;DR: This paper presents a hardware and software platform for signal processing (SPP) in long-range, multi-spectral, electro-optical systems (MSEOS) and gives remarks regarding upgrading SPPs as novel FPGAs, MCuPs and GPUs become available.
Abstract: In this paper, we present a hardware and software platform for signal processing (SPP) in long-range, multi-spectral, electro-optical systems (MSEOS). Such systems integrate various cameras such as lowlight color, medium or long-wave-infrared thermal and short-wave-infrared cameras together with other sensors such as laser range finders, radars, GPS receivers, etc. on rotational pan-tilt positioner platforms. An SPP is designed with the main goal to control all components of an MSEOS and execute complex signal processing algorithms such as video stabilization, artificial intelligence-based target detection, target tracking, video enhancement, target illumination, multi-sensory image fusion, etc. Such algorithms might be very computationally demanding, so an SPP enables them to run by splitting processing tasks between a field-programmable gate array (FPGA) unit, a multicore microprocessor (MCuP) and a graphic processing unit (GPU). Additionally, multiple SPPs can be linked together via an internal Gbps Ethernet-based network to balance the processing load. A detailed description of the SPP system and experimental results of workloads for typical algorithms on demonstrational MSEOS are given. Finally, we give remarks regarding upgrading SPPs as novel FPGAs, MCuPs and GPUs become available.

Journal ArticleDOI
TL;DR: This study presents a method for optimizing the FCM algorithm for high-speed field-programmable gate array (FPGA) technology using a high-level C-like programming language called Open Computing Language (OpenCL).
Abstract: Fuzzy C-Means (FCM) is a widely used clustering algorithm that performs well in various scientific applications. Implementing FCM involves a massive number of computations, and many parallelization techniques based on GPUs and multicore systems have been suggested. In this study, we present a method for optimizing the FCM algorithm for high-speed field-programmable gate array (FPGA) technology using a high-level C-like programming language called Open Computing Language (OpenCL). The method was designed to enable the high-level compiler/synthesis tool to manipulate a task-parallelism model and create an efficient design. Our experimental results (based on several datasets) show that the proposed method makes the FCM execution time more than 186 times faster than the conventional design running on a single-core CPU platform. Also, its processing power reached 89 giga floating-point operations per second (GFLOPS).
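As a reminder of the computation the OpenCL/FPGA design parallelizes, here is a minimal NumPy sketch of one FCM iteration (membership update followed by centroid update); it is a plain software reference, not the paper's FPGA implementation, and the data and cluster count are illustrative.

```python
# Minimal NumPy sketch of the Fuzzy C-Means update step that an
# OpenCL/FPGA design would parallelize; not the paper's implementation.
import numpy as np

def fcm_step(X, centers, m=2.0, eps=1e-9):
    """One FCM iteration: recompute memberships, then centroids."""
    # Distances from every sample to every cluster center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    # Membership u[k, i] = 1 / sum_j (d[k, i] / d[k, j])^(2/(m-1)).
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    # Fuzzy-weighted centroid update.
    w = u ** m
    centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, centers

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                        # toy dataset
u, c = fcm_step(X, X[rng.choice(200, 3, replace=False)])
print(c)
```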

Journal ArticleDOI
TL;DR: In this paper, the authors propose a priority-driven scheduling algorithm for real-time applications on an FPGA-based multicore structure, with the objective of minimizing the makespan under hardware resource constraints.

Journal ArticleDOI
TL;DR: In this article, the potential suitability of, and strength of the arguments for, applying multicore optical fibers in high-capacity submarine cable systems are analyzed via transmission and techno-economic modeling in the context of a trans-Atlantic link-length system.
Abstract: We analyze the potential suitability or strength of arguments for the application of multicore optical fibers in high capacity submarine cable systems via transmission and techno-economic modeling. We consider hypothetical multicore fibers (MCFs) with 2–4 weakly coupled cores and compare capacity and cost/bit against conventional single-core fibers (SCFs). The analysis is performed in the context of a trans-Atlantic link length system and we evaluate the relative fiber performance with three different, but related, system design approaches. Two SCF coating diameters are assessed in terms of how this parameter affects the cost/bit through fiber density in submarine cables and resulting cable cost. We find that MCFs may enable higher cable capacity when fiber pair limits are imposed, but likely not at lower cost/bit unless optimistic and best case assumptions are made with respect to MCF relative fiber cost. We also find that reduced diameter SCFs can deliver much of the density and cable cost savings that motivates interest in MCF without the challenges of a new eco-system as required by MCF. However, MCF may enable the design of the largest cable capacities such as 1 Pb/s or more that might not be attainable with SCFs without significant cable changes.

Journal ArticleDOI
TL;DR: In this paper, a three-step scheme to detect and fight against thermal covert channel (TCC) attacks is presented, in which each core calculates the spectrum of its own CPU workload traces collected over a few fixed time intervals and then applies a frequency scanning method to detect whether any TCC attack exists.
Abstract: The thermal covert channels (TCCs) in many-core systems can cause detrimental data breaches. In this article, we present a three-step scheme to detect and fight against such TCC attacks. Specifically, in the detection step, each core calculates the spectrum of its own CPU workload traces that are collected over a few fixed time intervals, and then it applies a frequency scanning method to detect if there exists any TCC attack. In the next positioning step, the logical cores running the transmitter threads are located. In the last step, the physical CPU cores suspiciously engaging in a TCC attack have to undertake dynamic voltage frequency scaling (DVFS) such that any possible TCC trace will be essentially wiped out. Our experiments have confirmed that on average 97% of the TCC attacks can be detected, and with the proposed defense, the packet error rate (PER) of a TCC attack can soar to more than 70%, literally shutting down the attack in practical terms. The performance penalty caused by the inclusion of the proposed DVFS countermeasures is found to be only 3% for an 8×8 many-core system.
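A hypothetical sketch of the detection idea follows: take the spectrum of a core's workload trace and flag a dominant narrowband peak, which would hint at an on/off-keyed thermal covert channel. The sampling rate, peak-to-background threshold, and synthetic trace are assumptions, not the paper's parameters.

```python
# Hypothetical sketch of the detection idea: look for a narrowband peak
# in the spectrum of a core's workload trace, which would hint at an
# on/off-keyed thermal covert channel. Threshold is an assumption.
import numpy as np

def detect_covert_carrier(trace, sample_rate_hz, peak_ratio=8.0):
    """Return the suspicious frequency (Hz) if one spectral bin
    dominates the rest of the spectrum, else None."""
    spectrum = np.abs(np.fft.rfft(trace - np.mean(trace)))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / sample_rate_hz)
    peak = np.argmax(spectrum[1:]) + 1            # skip the DC bin
    background = np.median(spectrum[1:])
    if spectrum[peak] > peak_ratio * background:
        return freqs[peak]
    return None

# Example: a 5 Hz on/off load pattern hidden in noisy utilization samples.
t = np.arange(0, 10, 0.01)
trace = 0.4 + 0.3 * (np.sin(2 * np.pi * 5 * t) > 0) + 0.05 * np.random.randn(len(t))
print(detect_covert_carrier(trace, sample_rate_hz=100))
```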

Journal ArticleDOI
TL;DR: This article presents a novel approach for runtime power optimization on modern multi-core systems that employs reinforcement learning to automatically explore the energy-performance optimization space from training programs and adapts the chip's power budget and uncore frequency to match the changing program phases of any new, previously unseen program.
Abstract: Power and energy is the first-class design constraint for multi-core processors and is a limiting factor for future-generation supercomputers. While modern processor design provides a wide range of mechanisms for power and energy optimization, it remains unclear how software can make the best use of them. This article presents a novel approach for runtime power optimization on modern multi-core systems. Our policy combines power capping and uncore frequency scaling to match the hardware power profile to the dynamically changing program behavior at runtime. We achieve this by employing reinforcement learning (RL) to automatically explore the energy-performance optimization space from training programs, learning the subtle relationships between the hardware power profile, the program characteristics, power consumption and program running times. Our RL framework then uses the learned knowledge to adapt the chip's power budget and uncore frequency to match the changing program phases for any new, previously unseen program. We evaluate our approach on two computing clusters by applying our techniques to 11 parallel programs that were not seen by our RL framework at the training stage. Experimental results show that our approach can reduce the system-level energy consumption by 12 percent, on average, with less than 3 percent of slowdown on the application performance. By lowering the uncore frequency to leave more energy budget to allow the processor cores to run at a higher frequency, our approach can reduce the energy consumption by up to 17 percent while improving the application performance by 5 percent for specific workloads.
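To illustrate the control loop described above, here is a minimal tabular Q-learning sketch that picks a (power cap, uncore frequency) pair per program phase; the action grid, reward (negative energy-delay product), and the measure() stub are assumptions and stand in for the paper's RL framework and hardware counters.

```python
# Minimal tabular Q-learning sketch of the control idea (choose a power
# cap and uncore frequency per program phase); reward shaping, state
# encoding, and the measure() stub are assumptions, not the paper's
# framework.
import random
from collections import defaultdict

ACTIONS = [(cap, unc) for cap in (90, 120, 150)        # watts (hypothetical)
           for unc in (1.2, 1.8, 2.4)]                  # GHz  (hypothetical)

Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose(state):
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

def learn(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def measure(state, action):
    """Stand-in for running a program phase under (cap, uncore) and
    returning (energy_J, time_s); replace with RAPL/uncore readings."""
    cap, unc = ACTIONS[action]
    time_s = 1.0 + 0.4 / unc + (0.2 if cap < 100 else 0.0)
    return cap * time_s * 0.7, time_s

state = "memory_bound"                                  # hypothetical phase label
for _ in range(200):
    a = choose(state)
    energy, t = measure(state, a)
    learn(state, a, -energy * t, state)                 # reward = -EDP
print(ACTIONS[max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])])
```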

Journal ArticleDOI
TL;DR: In this paper, an adaptive core mapping technique is implemented to minimize the latency and improve the system performance, where the core mapping region is obtained from the calculation of NAD and PVR, while the communication energy between the cores is obtained through WMD.
Abstract: In the trending technology of network-on-chip, the number of cores embedded on-chip has grown rapidly, resulting in performance degradation. Many methodologies have come into existence to improve the performance of a network. In this research paper, an adaptive core mapping technique is implemented to minimise the latency and improve the system performance. The traditional calculation of the NAD, PVR and WMD parameters is evaluated in this core mapping strategy. The core mapping region is obtained from the calculation of NAD and PVR, whereas the communication energy between the cores is obtained through WMD. The cores are mapped in ascending order of communication energy, i.e., lowest first, and this mapping technique is applied to the PARSEC benchmark suite for evaluation. The proposed adaptive core mapping technique is evaluated using the GEM5 simulator. The simulation results show that the technique outperforms the FASA, FARM and NMAP techniques in both latency and system performance. Simulations and synthesis carried out through the Vivado Design Suite 2018.3 show improvements in metrics such as total latency, power consumption and core mapping time compared to other algorithms.
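A hypothetical sketch of the mapping order is shown below, assuming communication energy grows with traffic volume times Manhattan hop distance on a 2-D mesh; it illustrates ranking in ascending energy only and is not the paper's exact NAD/PVR/WMD formulation.

```python
# Hypothetical sketch of the mapping order: rank communicating task
# pairs by estimated communication energy (traffic volume x Manhattan
# hop distance on a 2-D mesh) and map the cheapest pairs first.
# Not the paper's exact NAD/PVR/WMD formulation.

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# traffic[(src, dst)] = flits exchanged; placement = candidate mesh tiles.
traffic = {("t0", "t1"): 40, ("t1", "t2"): 25, ("t0", "t2"): 5}
placement = {"t0": (0, 0), "t1": (0, 1), "t2": (1, 1)}

energy = {pair: vol * manhattan(placement[pair[0]], placement[pair[1]])
          for pair, vol in traffic.items()}
mapping_order = sorted(energy, key=energy.get)          # ascending energy
print(mapping_order, [energy[p] for p in mapping_order])
```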

Journal ArticleDOI
TL;DR: This work introduces AdaMICA (Adaptive Multicore Intermittent Computing) runtime that supports, for the first time, parallel intermittent computing and provides the highest degree of flexibility of programmable general-purpose multiple cores.
Abstract: Recent studies on intermittent computing target single-core processors and underestimate the efficient parallel execution of highly-parallelizable machine learning tasks. Even though general-purpose multicore processors provide a high degree of parallelism and programming flexibility, intermittent computing has not exploited them yet. Filling this gap, we introduce AdaMICA (Adaptive Multicore Intermittent Computing) runtime that supports, for the first time, parallel intermittent computing and provides the highest degree of flexibility of programmable general-purpose multiple cores. AdaMICA is adaptive since it responds to the changes in the environmental power availability by dynamically reconfiguring the underlying multicore architecture to use the power most optimally. Our results demonstrate that AdaMICA significantly increases the throughput (52% on average) and decreases the latency (31% on average) by dynamically scaling the underlying architecture, considering the variations in the unpredictable harvested energy.
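As a rough illustration of such adaptivity, the sketch below picks how many cores to power from the currently harvested power budget; the per-core power draw and parallel-efficiency factor are invented placeholders, not AdaMICA's actual policy.

```python
# Hypothetical sketch of an adaptive policy in the spirit of the
# abstract: pick how many cores to power given the currently harvested
# power budget; per-core power and parallel efficiency are assumptions.

PER_CORE_MW = 1.8          # hypothetical active-core power draw (mW)
EFFICIENCY = 0.85          # hypothetical efficiency of each extra core

def cores_to_enable(harvested_mw, max_cores=4):
    """Choose the largest core count that fits the power budget and
    estimate the relative throughput it would deliver."""
    affordable = min(max_cores, int(harvested_mw // PER_CORE_MW))
    n = max(1, affordable)                  # keep at least one core alive
    throughput = sum(EFFICIENCY ** i for i in range(n))
    return n, throughput

for power in (1.0, 3.0, 5.5, 9.0):          # mW, e.g. varying RF/solar harvest
    print(power, cores_to_enable(power))
```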

Journal ArticleDOI
TL;DR: In this paper, the authors investigate how the self-awareness characteristic of autonomic computing, paired with existing performance optimization rules, may be used in applications to minimize multi-core processor performance concerns.
Abstract: The goal of this work is to investigate how the self-awareness characteristic of autonomic computing, paired with existing performance optimization rules, may be used in applications to minimise multi-core processor performance concerns. The suggested self-awareness technique can assist applications in self-execution while also assisting other applications in executing in the system with optimal resource usage and reduced conflicts in a collaborative manner. That is, self-awareness is created in the application so that it can acquire resources, schedule itself, and run autonomically with respect to runtime variations in the application and system parameters. Further, self-aware applications would help collaboratively resolve performance issues such as contention, bandwidth bottlenecks, efficient task-to-core mapping, and performance monitoring through analysis in a dynamic, collaborative multi-core environment. To show the usefulness of this research, an "Autonomic Computing Interface" (ACI) for multicore systems is proposed. The proposed firmware, acting as an interface between applications and the multi-core system, can dynamically handle performance issues in coordination with the applications in an autonomic way. Also, in a highly dynamic execution environment, the proposed runtime parameter tuning, existing performance improvement policies and the self-awareness approach, taken together for monitoring, analysis and performance improvement, would be more effective than existing approaches that work in isolation.

Journal ArticleDOI
TL;DR: In this paper, the authors highlight the most interesting features of the new Hopper graphics processing unit (GPU) and Grace central processing unit (CPU) computer chips and discuss some of the history behind Nvidia technologies and their most useful features for computational scientists.
Abstract: At GTC 2022, Nvidia announced a new product family that aims to cover everything from small enterprise workloads through exascale high performance computing (HPC) and trillion-parameter AI models. This column highlights the most interesting features of their new Hopper graphics processing unit (GPU) and Grace central processing unit (CPU) computer chips and the Hopper product family. We also discuss some of the history behind Nvidia technologies and their most useful features for computational scientists, such as the Hopper DPX dynamic programming (DP) instruction set, increased number of SMs, and FP8 tensor core availability. Also included are descriptions of the new Hopper Clustered SMs architecture and updated NVSwitch technologies that integrate their new ARM-based Grace CPU.

Journal ArticleDOI
TL;DR: In this article, the authors explore interlinked multicore architectures through analytic and numerical modeling, assess the potential for quantum advantage using such devices in the noisy intermediate-scale quantum era and beyond, and conclude that error-mitigation techniques impressively suppress imperfections in both the inter- and intracore operations.
Abstract: Any architecture for practical quantum computing must be scalable. An attractive approach is to create multiple cores, computing regions of fixed size that are well spaced but interlinked with communication channels. This exploded architecture can relax the demands associated with a single monolithic device: the complexity of control, cooling and power infrastructure, as well as the difficulties of crosstalk suppression and near-perfect component yield. Here we explore interlinked multicore architectures through analytic and numerical modeling. While elements of our analysis are relevant to diverse platforms, our focus is on semiconductor electron spin systems in which numerous cores may exist on a single chip within a single fridge. We model shuttling and microwave-based interlinks and estimate the achievable fidelities, finding values that are encouraging but markedly inferior to intracore operations. We therefore introduce optimized entanglement purification to enable high-fidelity communication, finding that 99.5% is a very realistic goal. We then assess the prospects for quantum advantage using such devices in the noisy intermediate-scale quantum era and beyond: we simulate recently proposed exponentially powerful error mitigation schemes in the multicore environment and conclude that these techniques impressively suppress imperfections in both the inter- and intracore operations.

Journal ArticleDOI
TL;DR: In this paper, reconfigurable two-dimensional dispersion-managed signal processing performed by a novel dispersion-diversity heterogeneous multicore fiber is presented; the fiber comprises seven different trench-assisted cores, each featuring a different refractive index profile in terms of both radial geometry and core dopant concentration.
Abstract: Beyond playing a primary role in high-capacity communication networks, multicore optical fibers can bring many advantages to optical and microwave signal processing, as not only space but also chromatic dispersion are introduced as new degrees of freedom. The key lies in developing radically new multicore fibers where the refractive index profile of each individual core is tailored properly to provide parallel dispersion-diversity signal processing with application in a variety of scenarios such as parallel channel equalization, analogue-to-digital conversion, optical computing, pulse generation and shaping, multiparameter fiber sensing, medical imaging, optical coherence tomography, broadband measurement instrumentation, and next-generation fiber-wireless communications. Here, we experimentally prove, for the first time to our knowledge, reconfigurable two-dimensional dispersion-managed signal processing performed by a novel dispersion-diversity heterogeneous multicore fiber. The fiber comprises seven different trench-assisted cores featuring a different refractive index profile in terms of both radial geometry and core dopant concentration. As a representative application case, we demonstrate reconfigurable microwave signal filtering with increased compactness as well as performance flexibility and versatility as compared to previous technologies.

Journal ArticleDOI
TL;DR: This work presents a novel parallel algorithmic framework for updating the Single Source Shortest Path in large-scale dynamic networks and implements it on the shared-memory and GPU platforms.
Abstract: The Single Source Shortest Path (SSSP) problem is a classic graph theory problem that arises frequently in various practical scenarios; hence, many parallel algorithms have been developed to solve it. However, these algorithms operate on static graphs, whereas many real-world problems are best modeled as dynamic networks, where the structure of the network changes with time. This gap between the dynamic graph modeling and the assumed static graph model in the conventional SSSP algorithms motivates this work. We present a novel parallel algorithmic framework for updating the SSSP in large-scale dynamic networks and implement it on the shared-memory and GPU platforms. The basic idea is to identify the portion of the network affected by the changes and update the information in a rooted tree data structure that stores the edges of the network that are most relevant to the analysis. Extensive experimental evaluations on real-world and synthetic networks demonstrate that our proposed parallel updating algorithm is scalable and, in most cases, requires significantly less execution time than the state-of-the-art recomputing-from-scratch algorithms.
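A minimal sequential sketch of the incremental idea follows: after an edge insertion, relax distances outward only from the affected endpoint instead of recomputing from scratch. It captures the concept only and is not the paper's parallel shared-memory/GPU implementation.

```python
# Minimal sequential sketch of the incremental idea: after an edge
# insertion, relax outward only from the affected endpoint instead of
# recomputing SSSP from scratch. Not the paper's parallel implementation.
import heapq

def update_after_insert(adj, dist, u, v, w):
    """adj: dict node -> list of (nbr, weight); dist: current SSSP
    distances. Insert edge (u, v, w) and propagate improvements."""
    adj.setdefault(u, []).append((v, w))
    if dist.get(u, float("inf")) + w >= dist.get(v, float("inf")):
        return dist                          # nothing affected
    dist[v] = dist[u] + w
    heap = [(dist[v], v)]
    while heap:                              # Dijkstra-style ripple
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for y, wy in adj.get(x, []):
            if d + wy < dist.get(y, float("inf")):
                dist[y] = d + wy
                heapq.heappush(heap, (dist[y], y))
    return dist

adj = {0: [(1, 4), (2, 1)], 2: [(1, 2)], 1: [(3, 5)]}
dist = {0: 0, 1: 3, 2: 1, 3: 8}              # SSSP distances from node 0
print(update_after_insert(adj, dist, 2, 3, 1))   # new shortcut 2 -> 3
```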

Journal ArticleDOI
TL;DR: In this article, the authors investigate the nonlinear impacts that core-frequency and cache-partitioning have on task-executions in a heterogeneous multicore environment, and propose an algorithm that exploits this relationship to effectively allocate tasks to specific cores and core-types, and determine the number of cache partitions for each core.
Abstract: The adoption of heterogeneous multicore architectures into deadline-constrained embedded systems has various benefits in terms of schedulability and energy-efficiency. Existing energy-efficient algorithms, in this domain, allocate tasks to their energy-favorable core-types while using dynamic voltage and frequency scaling to reduce energy consumption. However, the practicality of such algorithms is limited due to the underlying assumptions made to simplify the analysis. This article paves the way for more practical approaches to minimize the energy consumption on heterogeneous multicores. Specifically, we investigate the nonlinear impacts that core-frequency and cache-partitioning have on task-executions in a heterogeneous multicore environment. In doing so, we propose an algorithm that exploits this relationship to effectively allocate tasks to specific cores and core-types, and determine the number of cache-partitions for each core. Extensive simulations using real-world benchmarks show the proficiency of our approach by achieving an average and maximum energy savings of 14.9 and 20.4 percent, respectively for core-level energy consumption, and 20.2 and 60.4 percent, respectively for system-level energy consumption.

Journal ArticleDOI
TL;DR: The authors provide a parallel algorithm for finding Pearson's correlation coefficient between genes measured in the Affymetrix microarrays and reveal that the ForkJoinPcc algorithm achieves a substantial speedup of 62× on the cluster platform compared with a 3.8× speedup on the multicore platform.
Abstract: High-throughput microarrays contain a huge number of genes. Determining the relationships between all these genes is a time-consuming computation. In this paper, the authors provide a parallel algorithm for finding the Pearson’s correlation coefficient between genes measured in the Affymetrix microarrays. The main idea in the proposed algorithm, ForkJoinPcc, mimics the well-known parallel programming model: the fork–join model. The parallel MATLAB APIs have been employed and evaluated on shared or distributed multiprocessing systems. Two performance metrics—the processing and communication times—have been used to assess the performance of the ForkJoinPcc. The experimental results reveal that the ForkJoinPcc algorithm achieves a substantial speedup on the cluster platform of 62× compared with a 3.8× speedup on the multicore platform.
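The paper uses parallel MATLAB APIs; as an analogous illustration, the Python sketch below forks the correlation of gene-row blocks against the full matrix to a process pool and joins the blocks into the gene-gene matrix. The dataset shape and worker count are illustrative.

```python
# Analogous fork-join sketch in Python (the paper uses parallel MATLAB
# APIs): fork correlation of row blocks against the full matrix to a
# process pool, then join the blocks into the full gene-gene matrix.
import numpy as np
from multiprocessing import Pool

def corr_block(args):
    block, data = args                      # rows are genes, columns samples
    return np.corrcoef(np.vstack([block, data]))[:len(block), len(block):]

def forkjoin_pcc(data, n_workers=4):
    blocks = np.array_split(data, n_workers)
    with Pool(n_workers) as pool:           # fork
        parts = pool.map(corr_block, [(b, data) for b in blocks])
    return np.vstack(parts)                 # join

if __name__ == "__main__":
    genes = np.random.rand(200, 50)         # 200 genes x 50 arrays
    pcc = forkjoin_pcc(genes)
    print(pcc.shape, np.allclose(pcc, np.corrcoef(genes)))
```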


Journal ArticleDOI
TL;DR: In this article, the authors optimize the single-core raised-cosine (RC) profile to support 6 linearly polarized (LP) modes with large effective mode area (min Aeff = 96 μm²), low differential mode delay (DMD), and high intermodal separation (≥2×10⁻⁴).


Proceedings ArticleDOI
01 Mar 2022
TL;DR: In this article, a frequency-domain method is proposed to measure group delays, chromatic dispersion and skews of multicore fibers; the results agree well with the time-domain method.
Abstract: A frequency domain method is proposed to measure group delays, chromatic dispersion and skews of multicore fibers. We present detailed studies through measuring a 2×2 multicore fiber which agree well with the time domain method. © 2021 The Author(s)