
Showing papers on "Overhead (computing) published in 2022"


Journal ArticleDOI
TL;DR: In this article, a multi-task federated learning (MTFL) algorithm is proposed that introduces non-federated batch-normalization (BN) layers into the federated DNN.
Abstract: Federated Learning (FL) is an emerging approach for collaboratively training Deep Neural Networks (DNNs) on mobile devices, without private user data leaving the devices. Previous works have shown that non-Independent and Identically Distributed (non-IID) user data harms the convergence speed of the FL algorithms. Furthermore, most existing work on FL measures global-model accuracy, but in many cases, such as user content-recommendation, improving individual User model Accuracy (UA) is the real objective. To address these issues, we propose a Multi-Task FL (MTFL) algorithm that introduces non-federated Batch-Normalization (BN) layers into the federated DNN. MTFL benefits UA and convergence speed by allowing users to train models personalised to their own data. MTFL is compatible with popular iterative FL optimisation algorithms such as Federated Averaging (FedAvg), and we show empirically that a distributed form of Adam optimisation (FedAvg-Adam) benefits convergence speed even further when used as the optimisation strategy within MTFL. Experiments using MNIST and CIFAR10 demonstrate that MTFL is able to significantly reduce the number of rounds required to reach a target UA, by up to 5× when using existing FL optimisation strategies, and with a further 3× improvement when using FedAvg-Adam. We compare MTFL to competing personalised FL algorithms, showing that it is able to achieve the best UA for MNIST and CIFAR10 in all considered scenarios. Finally, we evaluate MTFL with FedAvg-Adam on an edge-computing testbed, showing that its convergence and UA benefits outweigh its overhead.
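
To make the idea concrete, here is a minimal PyTorch-style sketch of how non-federated BN layers can be kept out of the aggregation step. The model, the helper names, and the plain FedAvg averaging are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):            # illustrative MNIST-sized model
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, 3, padding=1)
        self.bn = nn.BatchNorm2d(16)   # private: never uploaded
        self.fc = nn.Linear(16 * 28 * 28, 10)

    def forward(self, x):
        x = torch.relu(self.bn(self.conv(x)))
        return self.fc(x.flatten(1))

def is_private(model, tensor_name):
    # A tensor stays on-device iff it belongs to a BatchNorm module.
    owner = tensor_name.rsplit(".", 1)[0]
    return isinstance(model.get_submodule(owner),
                      nn.modules.batchnorm._BatchNorm)

def upload_state(model):
    # Only non-BN parameters/buffers take part in federated averaging.
    return {k: v for k, v in model.state_dict().items()
            if not is_private(model, k)}

def fedavg(states):
    return {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}

def download_state(model, global_state):
    # strict=False leaves the local (personalised) BN layers untouched.
    model.load_state_dict(global_state, strict=False)
```

A server would call fedavg on the uploaded states and broadcast the result; each user's BN statistics and affine parameters never leave the device.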

85 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose Overlap-FedAvg, an innovative framework that loosens the chain-like constraint of federated learning and runs the model training phase in parallel with the model communication phase, so that the latter can be completely hidden behind the former.
Abstract: While petabytes of data are generated each day by independent computing devices, only a small fraction can ultimately be collected and used for deep learning (DL) because of concerns about data security and privacy leakage, which seriously hampers the spread of DL. In this circumstance, federated learning (FL) was proposed to train models on multiple clients' combined data without sharing datasets within the cluster. Nevertheless, federated learning with periodic model averaging (FedAvg) introduces massive communication overhead, as the data synchronized in each iteration is about the same size as the model, leading to low communication efficiency. Consequently, various proposals focusing on reducing communication rounds and on data compression were made to decrease the communication overhead of FL. In this article, we propose Overlap-FedAvg, an innovative framework that loosens the chain-like constraint of federated learning and runs the model training phase in parallel with the model communication phase (i.e., uploading local models and downloading the global model), so that the latter can be totally covered by the former. Compared to vanilla FedAvg, Overlap-FedAvg is further developed with a hierarchical computing strategy, a data compensation mechanism, and a Nesterov accelerated gradients (NAG) algorithm. In particular, Overlap-FedAvg is orthogonal to many other compression methods, so they can be applied together to maximize the utilization of the cluster. Besides, a theoretical analysis is provided to prove the convergence of the proposed framework. Extensive experiments conducted on both image classification and natural language processing tasks with multiple models and datasets also demonstrate that the proposed framework substantially reduces the communication overhead and accelerates the federated learning process.
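
A minimal sketch of the overlapping idea, assuming a placeholder communicate() callable for the server exchange; the hierarchical computing strategy, data compensation, and NAG steps from the paper are only indicated in a comment:

```python
import copy
import threading

def local_train(model, optimizer, batches, loss_fn):
    for x, y in batches:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

def overlapped_round(model, optimizer, batches, loss_fn, communicate):
    snapshot = copy.deepcopy(model.state_dict())     # round-t model to upload

    result = {}
    def exchange():                                  # runs concurrently
        result["global"] = communicate(snapshot)     # upload + download

    comm = threading.Thread(target=exchange)
    comm.start()
    local_train(model, optimizer, batches, loss_fn)  # hidden behind transfer
    comm.join()

    # The downloaded global model is one round stale; Overlap-FedAvg applies
    # a data-compensation step and NAG here, which this sketch omits.
    model.load_state_dict(result["global"])
```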

34 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose an optimized solution for user assignment and resource allocation over a hierarchical federated learning (FL) architecture for heterogeneous IoT systems. Hierarchical FL is a promising approach for telemonitoring systems that demand intensive data collection from different locations for the detection, classification, and prediction of future events, while maintaining strict privacy constraints.

21 citations


Journal ArticleDOI
TL;DR: This work proposes EDL, which enables elastic deep learning with a simple API and can be easily integrated with existing deep learning frameworks such as TensorFlow and PyTorch. EDL also incorporates techniques necessary to reduce the overhead of parallelism adjustments, such as stop-free scaling and a dynamic data pipeline.
Abstract: We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i.e., the number of GPUs), for deep neural network (DNN) training in a GPU cluster. Elasticity can benefit multi-tenant GPU cluster management in many ways, for example, achieving various scheduling objectives (e.g., job throughput, job completion time, GPU efficiency) according to cluster load variations, utilizing transient idle resources, and supporting performance profiling, job migration, and straggler mitigation. We propose EDL, which enables elastic deep learning with a simple API and can be easily integrated with existing deep learning frameworks such as TensorFlow and PyTorch. EDL also incorporates techniques that are necessary to reduce the overhead of parallelism adjustments, such as stop-free scaling and dynamic data pipeline. We demonstrate with experiments that EDL can indeed bring significant benefits to the above-listed applications in GPU cluster management.
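
As a rough illustration (not EDL's actual API, which the abstract does not detail), stop-free scaling plus a dynamic data pipeline amount to re-sharding the remaining data whenever the world size changes, without restarting the job:

```python
def shard(dataset, rank, size, start=0):
    # Each worker owns every size-th remaining sample, beginning at `start`.
    return [dataset[i] for i in range(start, len(dataset)) if i % size == rank]

# Worker 0 of 4 trains for 10 steps (40 samples consumed cluster-wide), then
# the cluster grows to 8 workers; the pipeline is simply re-sharded over the
# remaining data and training continues with no job restart.
data = list(range(1_000))
first_phase = shard(data, rank=0, size=4)[:10]
second_phase = shard(data, rank=0, size=8, start=40)
```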

19 citations


Journal ArticleDOI
TL;DR: In this article, a multi-objective RL algorithm was proposed to minimize the application completion time, energy consumption of the mobile device, and usage charge for edge computing, subject to dependency constraints.
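
The TL;DR does not give the exact formulation, but a common way to set up such a multi-objective RL problem is a weighted scalarized reward with a penalty for violating dependency constraints; a hypothetical sketch (weights and penalty are assumptions):

```python
def reward(completion_time, energy, charge, deps_satisfied,
           w=(0.4, 0.3, 0.3), penalty=1e3):
    # Negative weighted cost: maximizing reward minimizes the three
    # objectives; the weights w are illustrative, not from the paper.
    r = -(w[0] * completion_time + w[1] * energy + w[2] * charge)
    return r - (0.0 if deps_satisfied else penalty)

print(reward(12.0, 3.5, 0.8, True))   # approximately -6.09
```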

16 citations


Journal ArticleDOI
01 Jan 2022
TL;DR: A comprehensive survey of recent advancements in clustering schemes for vehicular networks is presented in this article. The authors take a holistic approach and classify the algorithms by focusing on (i) the objective of clustering mechanisms (i.e., reliability, scalability, stability, routing overhead, and delay), (ii) general-purpose clustering algorithms, (iii) application-based (i.e., QoS, MAC, security, etc.) clustering, and (iv) technology-based clustering (machine learning-based, nature-inspired, fuzzy logic-based, and software-defined networking-based).
Abstract: Vehicular networks are on the verge of deployment, thanks to advancements in computation and communication technologies. This breed of ad hoc networks leverages vehicles as nodes with the Vehicle-to-everything (V2X) communication paradigm. Clustering is considered one of the most important techniques used to enhance network stability, reliability, and scalability. Furthermore, clustering enables bandwidth optimization by reducing overhead and transmission delay, and helps mitigate the hidden node problem. To date, extensive research has been done to address clustering issues in vehicular networks, and several surveys have been published in the literature. However, a holistic approach towards clustering in vehicular networks is still lacking. In this regard, we conduct a comprehensive survey of the recent advancements in clustering schemes for vehicular networks. We take a holistic approach to classify the algorithms by focusing on (i) the objective of clustering mechanisms (i.e., reliability, scalability, stability, routing overhead, and delay), (ii) general-purpose clustering algorithms, (iii) application-based (i.e., QoS, MAC, security, etc.) clustering, and (iv) technology-based clustering (machine learning-based, nature-inspired, fuzzy logic-based, and software-defined networking-based). We investigate the existing clustering mechanisms with factors such as cluster formation, maintenance, and management in mind. Additionally, we present a comprehensive set of parameters for selecting cluster heads and discuss the role of enabling technologies in cluster maintenance. Finally, we identify future research trends in clustering techniques for vehicular networks and their various breeds. This survey will act as a one-stop shop for researchers, practitioners, and system designers selecting the right clustering mechanism for their applications, services, or research. As a result of this survey, we can see that clustering is heavily dependent on the underlying application, context, environment, and communication paradigm. Furthermore, clustering in vehicular networks can greatly benefit from enabling technologies such as artificial intelligence.

16 citations


Journal ArticleDOI
TL;DR: An RNA genetic algorithm with hairpin genetic operators (hRNA-GA), inspired by the hairpin structure of RNA molecules, is proposed. The hRNA-GA has better search ability and is applied to find the optimal parameters of ANFISs for modeling an actual overhead crane system.
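
The TL;DR does not specify the operators, but in RNA-inspired genetic algorithms a chromosome is a base string, and a hairpin-style operator typically folds a segment onto its reverse complement, mimicking hairpin self-pairing. A purely illustrative sketch of one such mutation:

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def hairpin_mutate(chrom, stem=4):
    # Pick a stem, then overwrite the following segment with the stem's
    # reverse complement, as if the strand folded back on itself.
    i = random.randrange(0, len(chrom) - 2 * stem)
    left = chrom[i:i + stem]
    folded = "".join(COMPLEMENT[b] for b in reversed(left))
    return chrom[:i + stem] + folded + chrom[i + 2 * stem:]

print(hairpin_mutate("AUGGCUACGUAGCUAG"))
```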

15 citations



Journal ArticleDOI
TL;DR: In this article, the authors proposed a data replica creation scheme based on a Level of Privacy (LoP) defined by data owners and the service capacity of fog nodes, which can significantly achieve efficient replicas privacy, prediction accuracy, and outperform the existing state-of-the-art schemes in terms of computational and memory costs.

5 citations


Journal ArticleDOI
TL;DR: Data-Flow Integrity (DFI) is a well-known approach to effectively detecting a wide range of software attacks as discussed by the authors, however, its real-world application has been quite limited so far because of the prohib...
Abstract: Data-Flow Integrity (DFI) is a well-known approach to effectively detecting a wide range of software attacks. However, its real-world application has been quite limited so far because of the prohib...
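
For context, the classic DFI scheme (Castro et al.) assigns an ID to every store site and checks, at each load, that the last writer of the address belongs to a statically computed reaching-definitions set. A toy Python simulation of that check (an illustration of the general technique, not this paper's mechanism):

```python
last_writer = {}   # runtime definitions table: address -> store-site ID

REACHING = {       # static reaching-definitions sets, one per load site
    "load_balance": {"store_init", "store_deposit"},
}

def store(addr, value, site, memory):
    memory[addr] = value
    last_writer[addr] = site

def load(addr, site, memory):
    if last_writer.get(addr) not in REACHING[site]:
        raise RuntimeError(f"DFI violation at {site}: "
                           f"tainted by {last_writer.get(addr)}")
    return memory[addr]

mem = {}
store(0x10, 100, "store_init", mem)
print(load(0x10, "load_balance", mem))    # ok -> 100
store(0x10, -1, "store_overflow", mem)    # simulated attacker write
try:
    load(0x10, "load_balance", mem)
except RuntimeError as e:
    print(e)                              # DFI violation detected
```

Maintaining and checking the definitions table on every memory access is exactly where the overhead mentioned in the abstract comes from.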

5 citations


Journal ArticleDOI
TL;DR: ViTrack is a framework for efficient multi-video tracking that uses computation resources at the edge for commodity video surveillance systems; it leverages a Markov-model-based approach to efficiently recover missing information and derive complete trajectories.
Abstract: Nowadays, video surveillance systems are widely deployed in various places, e.g., schools, parks, airports, and roads. However, existing video surveillance systems are far from fully utilized due to the high computation overhead of video processing. In this work, we present ViTrack, a framework for efficient multi-video tracking that uses computation resources at the edge for commodity video surveillance systems. At the heart of ViTrack lies a two-layer spatial/temporal compressed target detection method that significantly reduces the computation overhead by combining videos from multiple cameras. Further, ViTrack derives the video relationships and camera information even in the absence of camera locations, directions, etc. To alleviate the impact of varying video quality and missing targets, ViTrack leverages a Markov-model-based approach to efficiently recover missing information and finally derive the complete trajectory. We implement ViTrack on a real deployed video surveillance system with 110 cameras. The experimental results demonstrate that ViTrack provides efficient trajectory tracking with 45× less processing time than the existing approach. For 110 video cameras, ViTrack can run on a Dell OptiPlex 390 computer to track given targets in almost real time. We believe ViTrack can enable practical video analysis for widely deployed commodity video surveillance systems.
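
A simplified sketch of the Markov-model idea: camera-to-camera transition probabilities learned from history can fill a missing detection between two observed cameras. The probabilities and the single-gap assumption are illustrative; the paper's model is more elaborate:

```python
P = {  # P[a][b]: probability of a target moving from camera a to camera b
    "cam1": {"cam2": 0.7, "cam3": 0.3},
    "cam2": {"cam4": 0.9, "cam3": 0.1},
    "cam3": {"cam4": 1.0},
}

def fill_gap(before, after):
    # Most likely single intermediate camera m, maximizing P[b][m] * P[m][a].
    score, cam = max((P[before][m] * P[m].get(after, 0.0), m)
                     for m in P[before])
    return cam if score > 0 else None

trajectory = ["cam1", None, "cam4"]
trajectory[1] = fill_gap("cam1", "cam4")   # "cam2": 0.7*0.9 beats 0.3*1.0
```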

Journal ArticleDOI
TL;DR: In this article, a multi-subset data aggregation scheme for the smart grid is proposed that requires no trusted third party: the control center learns the number of users in each subset and the sum of electricity consumption in each subset, while individual users' data privacy is still preserved.
Abstract: Data aggregation has been widely researched to address the privacy concern when data is published; however, data aggregation only obtains the sum or average in an area. In reality, more fine-grained data brings more value for data consumers, enabling, for example, more accurate management and dynamic price adjustment in the grid system. In this paper, a multi-subset data aggregation scheme (MSDA) for the smart grid is proposed without a trusted third party, in which the control center collects the number of users in different subsets and obtains the sum of electricity consumption in each subset, while individual users' data privacy is still preserved. In addition, a dynamic and flexible user management mechanism is guaranteed through a secret-key negotiation process among users. The analysis shows that MSDA not only protects users' privacy against various attacks but also achieves additional functionality: multi-subset aggregation, no reliance on any trusted third party, and dynamic user management. Performance evaluation demonstrates that MSDA is efficient and practical in terms of communication and computation overhead.
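
The abstract does not spell out the protocol, but the standard trick behind aggregation without a trusted third party is pairwise masks, derived from negotiated keys, that cancel in the sum. A toy sketch of that trick (MSDA's actual scheme additionally encodes subset membership and counts):

```python
import random

def masked_reading(uid, reading, peers, shared_key, modulus=2**32):
    m = reading
    for v in peers:
        pad = shared_key(uid, v)          # same value computed by both ends
        m = (m + pad if uid < v else m - pad) % modulus
    return m

users = [1, 2, 3]
readings = {1: 30, 2: 42, 3: 17}

keys = {}
def shared_key(a, b):                     # stand-in for a DH key agreement
    return keys.setdefault((min(a, b), max(a, b)), random.getrandbits(31))

reports = [masked_reading(u, readings[u],
                          [v for v in users if v != u], shared_key)
           for u in users]
print(sum(reports) % 2**32)               # 89: the pairwise masks cancel
```

Each individual report looks random to the control center; only the sum is meaningful, which is exactly the privacy property the scheme needs.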

Journal ArticleDOI
TL;DR: In this paper, an innovative reliability-aware task mapping technique is presented, based on a hybridization between Multi-Objective Optimization (MOO) and Reinforcement Learning (RL).

Journal ArticleDOI
Yi Xu, Changgen Peng, Weijie Tan, Youliang Tian, Minyao Ma, Kun Niu
TL;DR: Xu et al. propose a non-interactive verifiable privacy-preserving FL scheme based on a dual-server architecture (NIVP-DS), which improves the efficiency and security of the system and is robust to clients dropping out, under the constraint that the communication overhead between client and server is no more than 2× that of plaintext computation.

Journal ArticleDOI
TL;DR: In this article, the authors propose a scalable hierarchical control-plane architecture for SDN/NFV-based next-generation application domains, such as immersive media delivery systems, and implement the proposed architecture on top of the well-known ZeroSDN controller.
Abstract: The rigidity of traditional network architectures, with tightly coupled control and data planes, impairs their ability to adapt to the highly dynamic requirements of future application domains. While Software-Defined Networking (SDN) can provide the required dynamism, it suffers from scalability issues. Therefore, efforts have been made to propose alternative decentralized solutions, such as the flat distributed SDN architecture. Such alternatives address the scalability problem mainly for local flows, but are impaired by a substantial increase in overhead for cross-domain flow setup. To manage the trade-off between scalability and overhead, intermediate hierarchical solutions are needed; however, these have not been explored to their full potential so far. Furthermore, the Network Function Virtualization (NFV) paradigm complements SDN by offering computational and storage services in the form of Virtual Network Functions (VNFs). When integrated seamlessly, SDN and NFV together can solve the problems posed by highly dynamic application domains. Hence, this work proposes a scalable hierarchical SDN control-plane architecture for SDN/NFV-based next-generation application domains such as immersive media delivery systems. We have implemented the proposed architecture based on the well-known state-of-the-art ZeroSDN controller. To evaluate its performance, we have implemented an on-demand immersive media (point cloud) streaming application and varied the load on the control plane using background traffic. To benchmark our solution, we have evaluated its performance against centralized and flat distributed architectures. We show that the proposed architecture performs better than the rest in terms of scalability, lost flows, and processing latency. Our study shows that the proposed architecture, when distributed across three controllers, accepts 23% more flows with almost 70% lower processing latency compared to the state-of-the-art ONOS controller.
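
A toy sketch of why a hierarchy keeps most flow-setup overhead local: a leaf controller installs intra-domain flows itself and escalates only cross-domain requests to its parent. The escalation logic below is illustrative, not the paper's implementation:

```python
class Controller:
    def __init__(self, domain, parent=None):
        self.domain, self.parent = domain, parent

    def setup_flow(self, src, dst):
        if src in self.domain and dst in self.domain:
            return f"installed locally in {sorted(self.domain)}"
        return self.parent.setup_flow(src, dst)   # escalate one level up

root = Controller(domain={"h1", "h2", "h3", "h4"})
left = Controller(domain={"h1", "h2"}, parent=root)
print(left.setup_flow("h1", "h2"))   # handled by the leaf controller
print(left.setup_flow("h1", "h3"))   # crosses domains -> parent resolves it
```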

Journal ArticleDOI
TL;DR: In this article, a method for searching generic tuning spaces is proposed; the tuning spaces can contain tuning parameters that change any user-defined property of the source code, and the method navigates the search process towards faster implementations.



Journal ArticleDOI
TL;DR: Wang et al. propose an efficient privacy-preserving homoglyph search scheme supporting arbitrary languages (POSA), which enhances the performance of fuzzy keyword search in three aspects.
Abstract: Searchable encryption is an effective way to ensure the security and availability of encrypted outsourced cloud data. Among existing solutions, keyword exact-search schemes are relatively inflexible, while fuzzy keyword search schemes either have a high index overhead or suffer from false positives. Furthermore, no existing fuzzy keyword search solution considers homoglyph search on encrypted data. In this paper, we propose an efficient privacy-preserving homoglyph search scheme supporting arbitrary languages (POSA for short). We enhance the performance of fuzzy keyword search in three aspects. Firstly, we formulate the similarity of homoglyphs and propose a privacy-preserving homoglyph search. Secondly, we put forward an index-building mechanism without false positives, which reduces the storage overhead of the index and is suitable for arbitrary languages. Thirdly, POSA returns exactly what the user searches for: all returned documents contain the search keyword or one of its homoglyphs. The theoretical analysis and experimental evaluations on real-world datasets demonstrate the effectiveness and efficiency of POSA.
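
The core homoglyph intuition can be illustrated with a canonicalization map applied before indexing and searching. This is a plain-text illustration only; POSA's similarity formulation and encrypted index are not reproduced here:

```python
HOMOGLYPHS = {
    "а": "a", "е": "e", "о": "o", "р": "p", "с": "c",   # Cyrillic -> Latin
    "ѕ": "s", "і": "i", "0": "o", "1": "l",             # digit look-alikes
}

def canonical(keyword):
    # Map every visually confusable character to one canonical form, so
    # homoglyph variants of a keyword produce the same index token.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in keyword.lower())

# "раypal" spelled with Cyrillic р/а collapses to the same token as "paypal".
assert canonical("раypal") == canonical("paypal")
```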

Journal ArticleDOI
TL;DR: Zhang et al. propose a lightweight framework for finding nearest neighbours that progressively expands the investigation region as a series of concentric squares, with the query point's cell as the centre, and that optimizes the search space by exploiting the distance between the query point and distant candidate neighbours.
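
A runnable sketch of the described expansion strategy over a grid index, with a simplified version of the pruning rule (the framework's exact optimization is not given in the TL;DR):

```python
import math

def ring(cx, cy, r):
    # Cells at Chebyshev distance exactly r from (cx, cy).
    if r == 0:
        return [(cx, cy)]
    cells = [(cx + dx, cy + dy) for dx in (-r, r) for dy in range(-r, r + 1)]
    cells += [(cx + dx, cy + dy) for dy in (-r, r) for dx in range(-r + 1, r)]
    return cells

def nearest(query, grid, cell_size, max_r=64):
    # grid: dict mapping (i, j) cell coordinates to lists of points.
    cx, cy = int(query[0] // cell_size), int(query[1] // cell_size)
    best, best_d = None, math.inf
    for r in range(max_r):
        for cell in ring(cx, cy, r):
            for p in grid.get(cell, []):
                d = math.dist(p, query)
                if d < best_d:
                    best, best_d = p, d
        # Every unexplored cell lies at least r * cell_size away.
        if best is not None and best_d <= r * cell_size:
            return best
    return best
```

The stopping test is conservative: once the best candidate is closer than r × cell_size, no unexplored ring can contain a closer point, so the expansion halts.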

Journal ArticleDOI
TL;DR: The simulation results show that the proposed DWBA scheme mitigates the frame-rearrangement problem compared to existing DWBA algorithms and proves more efficient in terms of average end-to-end delay and completion time.

Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, the authors propose a Cloud-Machine Learning (CloudML) model for encrypted heart disease datasets that employs a privacy-preservation scheme, designed so that clustering accuracy does not vary when operating on the encrypted datasets.
Abstract: Cloud computing is a necessity of the twenty-first century, given the exponential increase in the volume of data. Compared to other technologies, the cloud has seen the fastest adoption in industry. The popularity of the cloud is closely linked to the benefits it offers, which extend from small groups of stakeholders to large numbers of entrepreneurs. It enables prominent features such as elasticity, scalability, high availability, and accessibility. The increase in the cloud's popularity is thus linked to an influx of data, involving big data along with specialized techniques and tools. Many data analysis applications use clustering techniques combined with machine learning to derive useful information by grouping similar data, especially in the healthcare and medical domain for predicting symptoms of diseases. However, securing healthcare data while a machine learning model classifies patient information and genetic data is a major concern. To solve this problem, this paper proposes a Cloud-Machine Learning (CloudML) model for encrypted heart disease datasets that employs a privacy-preservation scheme. The model is designed in such a way that its clustering accuracy does not vary on encrypted datasets. The performance analysis shows that the proposed approach yields significant results in terms of communication overhead, storage overhead, runtime, scalability, and encryption cost.

Journal ArticleDOI
TL;DR: In this article, the authors propose a traffic anomaly detection scheme, analyzing and defining the specific security threat faced by the SDON: the non-directional denial-of-service (ND-DoS) attack.

Book ChapterDOI
01 Jan 2022
TL;DR: A Holistically-Nested Edge Detection (HED) network model is built and trained on the TensorFlow deep learning framework, and pixel-level segmentation of the wires in the image is realized through pixel-level classification.
Abstract: Power line detection is an important part of power system inspection, and is of great significance for UAV inspection, obstacle avoidance, and visual measurement. Traditional power line detection algorithms are mostly based on straight-line detection, with the power lines further extracted from the detected straight lines. Consequently, they are strongly disturbed by background information, and their detection of curved lines is unsatisfactory. Classical deep learning object detection algorithms are not suitable for detecting elongated objects such as overhead power lines. Based on the Holistically-Nested Edge Detection (HED) algorithm, this paper uses the TensorFlow deep learning framework to build and train an HED network model. Through pixel-level classification of the image, pixel-level segmentation of the wires is realized. Experimental verification shows that the power line detection achieves a high accuracy rate.
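
A minimal tf.keras stand-in for an HED-style network: each convolutional stage emits a side output that is upsampled to input resolution, and a 1×1 convolution fuses the side outputs into the final wire/background probability map. The layer sizes and the plain binary cross-entropy loss are simplifying assumptions (HED as published uses a VGG backbone with deep supervision and a class-balanced loss):

```python
import tensorflow as tf
from tensorflow.keras import layers

def hed_like(input_shape=(320, 320, 3), stages=(32, 64, 128)):
    x = inp = layers.Input(input_shape)
    sides = []
    for i, ch in enumerate(stages):
        x = layers.Conv2D(ch, 3, padding="same", activation="relu")(x)
        side = layers.Conv2D(1, 1)(x)                 # per-stage prediction
        sides.append(layers.UpSampling2D(2 ** i,
                                         interpolation="bilinear")(side))
        x = layers.MaxPool2D(2)(x)
    fused = layers.Conv2D(1, 1, activation="sigmoid")(
        layers.Concatenate()(sides))                  # fuse all side outputs
    return tf.keras.Model(inp, fused)

model = hed_like()
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(images, wire_masks) trains against pixel-level wire labels.
```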

Journal ArticleDOI
TL;DR: In this paper, an algorithm using auxiliary data structures to crack the Adaptive Radix Tree (ART) index structure of in-memory databases has been proposed, which makes it possible to build up an ART index step by step with incessant queries, and hence avoids the poor instant availability of a complete index which is constructed once and for all.

Journal ArticleDOI
TL;DR: In this article, the authors propose a micro-architecture with lightweight out-of-order execution capability enabling instruction-level parallelism to complement the conventional Thread-Level Parallelism model.
Abstract: GPU is the dominant platform for accelerating general-purpose workloads due to its computing capacity and cost-efficiency. GPU applications cover an ever-growing range of domains. To achieve high throughput, GPUs rely on massive multi-threading and fast context switching to overlap computations with memory operations. We observe that among the diverse GPU workloads, there exists a significant class of kernels that fail to maintain a sufficient number of active warps to hide the latency of memory operations, and thus suffer from frequent stalling. We argue that the dominant Thread-Level Parallelism model is not enough to efficiently accommodate the variability of modern GPU applications. To address this inherent inefficiency, we propose a novel micro-architecture with lightweight Out-Of-Order execution capability enabling Instruction-Level Parallelism to complement the conventional Thread-Level Parallelism model. To minimize the hardware overhead, we carefully design our extension to highly re-use the existing micro-architectural structures and study various design trade-offs to contain the overall area and power overhead, while providing improved performance. We show that the proposed architecture outperforms traditional platforms by 23 percent on average for low-occupancy kernels, with an area and power overhead of 1.29 and 10.05 percent, respectively. Finally, we establish the potential of our proposal as a micro-architecture alternative by providing 16 percent speedup over a wide collection of 60 general-purpose kernels.

Journal ArticleDOI
TL;DR: In this paper, the authors optimize the power allocation factors to improve random beamforming performance in mmWave-NOMA networks and analyze the impact of the non-line-of-sight (NLOS) path alongside the line-of-sight (LOS) path in terms of outage probability.

Journal ArticleDOI
TL;DR: In this paper, simple modifications to van der Hoeven's forward and inverse truncated Fourier transforms are shown to allow the algorithms to be performed in place, with only a linear overhead in complexity.


Journal ArticleDOI
TL;DR: In this article, the authors present an algorithm for a high performance, unbounded, portable, multi-producer/multi-consumer, lock-free FIFO (first-in first-out) queue.
Abstract: In this article we present an algorithm for a high performance, unbounded, portable, multi-producer/multi-consumer, lock-free FIFO (first-in first-out) queue. Aside from its competitive performance on current hardware, it is further characterized by its integrated memory reclamation mechanism, which is able to reliably and deterministically de-allocate nodes as soon as the final operation with a reference has concluded, similar to reference counting. This differentiates our approach from most other lock-free data structures, which usually require external (generic) memory reclamation or garbage collection mechanisms such as hazard pointers. Our deterministic memory reclamation mechanism completely prevents the build-up of memory awaiting reclamation and is hence very memory efficient, yet it does not introduce any substantial performance overhead. By utilizing concrete knowledge about the internal structure and access patterns of our queue, we are able to construct and constrain the reclamation mechanism in such a way that keeps the overhead for memory management almost entirely out of the common fast path. The presented algorithm is portable to all modern 64-bit processor architectures, as it only relies on the commonly available and lock-free atomic synchronization primitives compare-and-swap and fetch-and-add.