Journal ArticleDOI

Parallel Multidimensional Lookahead Sorting Algorithm

05 Jun 2019-IEEE Access (Institute of Electrical and Electronics Engineers (IEEE))-Vol. 7, pp 75446-75463
TL;DR: The proposed technique is ideally suited for general-purpose graphics processing units and shared-memory massively parallel processor systems; it ensures that the data being processed exhibits temporal and spatial locality to maximize processor-cache utilization.
Abstract: This paper presents a new parallel structured lookahead multidimensional sorting algorithm. Our algorithm can be based on any sequential sorting algorithm. The amount of parallelism can be controlled through several parameters, such as the number of threads, the word size, the memory/processor communication overhead, and the dimension of the algorithm. The proposed technique is ideally suited for general-purpose graphics processing units and shared-memory massively parallel processor systems. It ensures that the data being processed exhibits temporal and spatial locality, maximizing the utilization of the processor cache. The algorithm achieves a speedup even when a single processor is used. A lookahead algorithm is also proposed to achieve even higher speedup. The performance of the proposed algorithm is verified numerically and experimentally.
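The partition/sort/merge pattern the abstract describes can be sketched in a few lines. This is a generic illustration only, not the paper's algorithm: the `num_threads` parameter is a hypothetical stand-in for the paper's parallelism controls, and the multidimensional structure and lookahead stage are omitted.

```python
# Generic chunk-and-merge parallel sort: split the input into contiguous
# chunks (spatial locality), sort each chunk with any sequential sort,
# then k-way merge the sorted runs.
from concurrent.futures import ThreadPoolExecutor
import heapq

def parallel_chunk_sort(data, num_threads=4):
    n = len(data)
    size = max(1, -(-n // num_threads))  # ceil(n / num_threads)
    chunks = [data[i:i + size] for i in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        runs = list(pool.map(sorted, chunks))  # any sequential sort works here
    return list(heapq.merge(*runs))  # k-way merge of sorted runs

print(parallel_chunk_sort([5, 3, 8, 1, 9, 2, 7, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

Note that in CPython, threads mainly pay off when the per-chunk sort releases the GIL or processes are used instead; the sketch shows the structure, not peak performance.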


Citations
Journal ArticleDOI
TL;DR: The experimental results show that the B_LSD_RS parallel algorithm based on OpenCL not only achieves high performance but also achieves performance portability among different GPU computing platforms.
Abstract: Radix sorting is an essential basic data-processing operation in many computer fields, and accelerating it on the Graphics Processing Unit (GPU) has important practical significance. Heterogeneous parallel computing attracts much attention and is widely applied for its effective computation efficiency and parallel real-time data-processing capability. Taking advantage of the parallelism of the GPU in numerical computation, a parallelization design method for the Binary Least Significant Digit (LSD) first Radix Sorting (B_LSD_RS) algorithm based on Open Computing Language (OpenCL) is proposed. The radix sorting algorithm is divided into multiple kernel tasks, and the kernels are sequentially controlled through event information transfer. The parallel algorithm is implemented and verified on a GPU + CPU heterogeneous platform. The experimental results show that, compared with the B_LSD_RS sequential algorithm on an AMD Ryzen 5 1600X CPU, the B_LSD_RS parallel algorithm based on Open Multi-Processing (OpenMP), and the B_LSD_RS parallel algorithm based on Compute Unified Device Architecture (CUDA), the OpenCL-based B_LSD_RS parallel algorithm achieves speedups of 28.86×, 11.01×, and 2.14×, respectively, on the NVIDIA GTX 1070 computing platform; it not only delivers high performance but also achieves performance portability across different GPU computing platforms.
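A minimal sequential sketch of the algorithm being parallelized may help: binary LSD radix sort stably partitions the keys by each bit, from least to most significant. This is an illustrative Python version for non-negative integers, not the paper's OpenCL implementation, which splits each pass into kernels coordinated through events.

```python
# Binary LSD radix sort: one stable two-bucket partition per bit,
# from the least significant bit to the most significant.
def binary_lsd_radix_sort(values):
    if not values:
        return values
    for b in range(max(values).bit_length()):
        zeros = [v for v in values if not (v >> b) & 1]
        ones = [v for v in values if (v >> b) & 1]
        values = zeros + ones  # stable: order within each bucket is preserved
    return values

print(binary_lsd_radix_sort([170, 45, 75, 90, 2, 24, 802, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```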

3 citations

Journal ArticleDOI
TL;DR: Results of three controlled experiments demonstrate that LS can effectively control monotonicity and boundedness, achieves lower time consumption than quicksort and Google's learned sorting, and remains stable as the data size or the number of repetitive elements increases.
Abstract: The so-called learned sorting, first proposed by Google, achieves data sorting by predicting the placement positions of unsorted data elements in a sorted sequence using machine learning models. Learned sorting pioneers a new generation of sorting algorithms and shows great potential because of its favorable theoretical time complexity and easy access to hardware-driven acceleration approaches. However, learned sorting faces two problems: controlling the monotonicity and boundedness of the predicted placement positions, and dealing with placement conflicts of repetitive elements. In this paper, a new learned sorting algorithm named LS is proposed. We integrate a back-propagation neural network with a look-up-table technique in LS to guarantee the monotonicity and boundedness of the predicted placement positions. We design a data structure called the self-regulating index in LS to tentatively store and duly update placement positions, eliminating potential placement conflicts. Results of three controlled experiments demonstrate that LS effectively controls monotonicity and boundedness, achieves lower time consumption than quicksort and Google's learned sorting, and remains stable as the data size or the number of repetitive elements increases.
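The core idea, model-predicted placement with conflict fix-up, can be sketched without the paper's machinery. Here a simple min-max linear model is a loose stand-in for LS's back-propagation network and look-up table (an assumption for illustration, not the paper's method); clamping the prediction supplies boundedness, and per-bucket sorting resolves the placement conflicts of repetitive elements.

```python
# Learned-sorting sketch: predict each element's sorted position with a
# model (here: linear interpolation between min and max), place elements
# in the predicted buckets, then fix up conflicts locally.
def learned_sort(values):
    n = len(values)
    if n < 2:
        return list(values)
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # all keys equal: already sorted
    buckets = [[] for _ in range(n)]
    for v in values:
        pos = min(n - 1, int((v - lo) / (hi - lo) * (n - 1)))  # bounded prediction
        buckets[pos].append(v)  # repetitive/nearby keys land in the same bucket
    out = []
    for b in buckets:
        out.extend(sorted(b))  # local fix-up resolves placement conflicts
    return out

print(learned_sort([0.7, 0.1, 0.5, 0.9, 0.3, 0.5]))  # [0.1, 0.3, 0.5, 0.5, 0.7, 0.9]
```

The linear model is only accurate for roughly uniform data; a learned model that adapts to the empirical key distribution is what makes the approach competitive.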

1 citation

Journal ArticleDOI
08 Apr 2022
TL;DR: Zhang et al. as discussed by the authors proposed a new rule mining method, named genetic network programming (GNP), to solve the prediction problem using the evolutionary algorithm, which provides many advantages in financial prediction, since it can discover relationships among the attributes of different transactions.
Abstract: Evolutionary computation and data mining are two fascinating fields that have attracted many researchers. This paper proposes a new rule mining method, named genetic network programming (GNP), to solve the prediction problem using the evolutionary algorithm. Compared with the conventional association rule methods that do not consider the weight factor, the proposed algorithm provides many advantages in financial prediction, since it can discover relationships among the attributes of different transactions. Experimental results on data from the New York Exchange Market show that the new method outperforms other conventional models in terms of both accuracy and profitability, and the proposed method can establish more important and accurate rules than the conventional methods. The results confirmed the effectiveness of the proposed data mining method in financial prediction.

1 citation

References
Proceedings ArticleDOI
30 Apr 1968
TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
Abstract: To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently. A major problem in the design of such a computing system is the connecting together of the various parts of the system (the I/O devices, memories, processing units, etc.) in such a way that all the required data transfers can be accommodated. One common scheme is a high-speed bus which is time-shared by the various parts; speed of available hardware limits this scheme. Another scheme is a cross-bar switch or matrix; limiting factors here are the amount of hardware (an m × n matrix requires m × n cross-points) and the fan-in and fan-out of the hardware.

2,553 citations

Proceedings ArticleDOI
23 May 2009
TL;DR: The design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA, are described, which are the fastest GPU sort and the fastest comparison-based sort reported in the literature.
Abstract: We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-based sort reported in the literature. Our radix sort is up to 4 times faster than the graphics-based GPUSort and greater than 2 times faster than other CUDA-based radix sorts. It is also 23% faster, on average, than even a very carefully optimized multicore CPU sorting routine. To achieve this performance, we carefully design our algorithms to expose substantial fine-grained parallelism and decompose the computation into independent tasks that perform minimal global communication. We exploit the high-speed on-chip shared memory provided by NVIDIA's GPU architecture and efficient data-parallel primitives, particularly parallel scan. While targeted at GPUs, these algorithms should also be well-suited for other manycore processors.
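The "efficient data-parallel primitives, particularly parallel scan" are the heart of such radix sorts: an exclusive prefix sum over per-bit flags gives every element its scatter address. The sketch below shows that split primitive sequentially for clarity; it illustrates the technique, not Satish et al.'s CUDA code, in which the scan itself runs in parallel.

```python
# Scan-based radix sort: per pass, an exclusive prefix sum over the bit
# flags yields each element's destination (a stable 0/1 split).
from itertools import accumulate

def split_by_bit(values, bit):
    flags = [(v >> bit) & 1 for v in values]
    zeros = flags.count(0)
    scan0 = list(accumulate((1 - f for f in flags), initial=0))  # dest for 0-keys
    scan1 = list(accumulate(flags, initial=0))                   # offset for 1-keys
    out = [None] * len(values)
    for i, v in enumerate(values):
        out[zeros + scan1[i] if flags[i] else scan0[i]] = v
    return out

def radix_sort_scan(values, bits=10):
    for b in range(bits):  # bits must cover the largest key
        values = split_by_bit(values, b)
    return values

print(radix_sort_scan([170, 45, 75, 90, 2, 24, 802, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```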

684 citations


"Parallel Multidimensional Lookahead..." refers background in this paper

  • ...Figure 23 shows the speedup of our proposed algorithm on different GPUs as compared to the quick sort algorithm....

  • ...We compared the efficiency of the algorithm on the Intel Core i7-3770K CPU with a frequency of 3.5 GHz and on NVIDIA GPUs GeForce GT 720M, GeForce GTX 980, and GeForce GTX 1080....

  • ...[20] L. Ha, J. Krüger, and C. T. Silva, ‘‘Fast four-way parallel radix sorting on GPUs,’’ Comput....

  • ...Satish et al. proposed a parallel radix sort algorithm for manycore GPUs [19]....

  • ...[26] Z. Yildiz, M. Aydin, and G. Yilmaz, ‘‘Parallelization of bitonic sort and radix sort algorithms on many core GPUs,’’ in Proc....

Book
01 Mar 2004
TL;DR: This chapter discusses distributed shared memory systems and programming, and their applications.

619 citations

Journal ArticleDOI
TL;DR: Two algorithms are presented for sorting n² elements on an n × n mesh-connected processor array that require O(n) routing and comparison steps and are shown to be optimal in time to within small constant factors.
Abstract: Two algorithms are presented for sorting n² elements on an n × n mesh-connected processor array that require O(n) routing and comparison steps. The best previous algorithm takes time O(n log n). The algorithms of this paper are shown to be optimal in time to within small constant factors. Extensions to higher-dimensional arrays are also given.
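For a feel of mesh sorting, a simpler classic in the O(n log n) class that this paper improves on is shearsort, which alternates snake-order row sorts with column sorts. The sketch below is illustrative only; it is not one of the paper's two O(n) algorithms.

```python
# Shearsort on an n x n mesh: alternate row phases (snake order: even
# rows ascending, odd rows descending) with column phases; after about
# log2(n) + 1 repetitions plus a final row phase the mesh is sorted
# in snake order.
import math

def shearsort(grid):
    n = len(grid)
    for _ in range(math.ceil(math.log2(n)) + 1):
        for r in range(n):                      # row phase (snake order)
            grid[r].sort(reverse=(r % 2 == 1))
        for c in range(n):                      # column phase (top to bottom)
            col = sorted(grid[r][c] for r in range(n))
            for r in range(n):
                grid[r][c] = col[r]
    for r in range(n):                          # final row phase
        grid[r].sort(reverse=(r % 2 == 1))
    return grid

print(shearsort([[4, 16, 1, 9], [7, 3, 14, 6], [12, 10, 5, 15], [2, 8, 13, 11]]))
# [[1, 2, 3, 4], [8, 7, 6, 5], [9, 10, 11, 12], [16, 15, 14, 13]]
```

On a real mesh, each row or column sort runs as O(n) parallel odd-even transposition steps, giving the O(n log n) total the abstract mentions.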

489 citations

Journal ArticleDOI
01 Dec 2009
TL;DR: An efficient data-parallel algorithm for building large hash tables of millions of elements in real-time, which considers a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations.
Abstract: We demonstrate an efficient data-parallel algorithm for building large hash tables of millions of elements in real-time. We consider two parallel algorithms for the construction: a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations. Our construction is a hybrid approach that uses both algorithms. We measure the construction time, access time, and memory usage of our implementations and demonstrate real-time performance on large datasets: for 5 million key-value pairs, we construct a hash table in 35.7 ms using 1.42 times as much memory as the input data itself, and we can access all the elements in that hash table in 15.3 ms. For comparison, sorting the same data requires 36.6 ms, but accessing all the elements via binary search requires 79.5 ms. Furthermore, we show how our hashing methods can be applied to two graphics applications: 3D surface intersection for moving data and geometric hashing for image matching.
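The denser of the two schemes, cuckoo hashing, is easy to sketch on the CPU. This illustrative Python version (table size and hash functions are arbitrary choices for the example, not the paper's GPU parameters) shows the defining move: an insert that lands on an occupied slot evicts the resident, which then retries at its alternate slot.

```python
# Cuckoo hashing sketch: two candidate slots per key; insertion evicts
# on collision and bounces the evicted key to its other slot, giving
# worst-case O(1) lookups (at most two probes).
class CuckooTable:
    def __init__(self, size=13):
        self.size = size
        self.slots = [None] * size

    def _h1(self, key):
        return key % self.size

    def _h2(self, key):
        return (key // self.size + 7) % self.size

    def insert(self, key, max_kicks=32):
        slot = self._h1(key)
        for _ in range(max_kicks):
            if self.slots[slot] is None:
                self.slots[slot] = key
                return True
            # Evict the occupant; it retries at its alternate slot.
            key, self.slots[slot] = self.slots[slot], key
            slot = self._h2(key) if slot == self._h1(key) else self._h1(key)
        return False  # likely a cycle: a real table rebuilds with new hashes

    def lookup(self, key):
        return self.slots[self._h1(key)] == key or self.slots[self._h2(key)] == key

t = CuckooTable()
for k in [1, 14, 27, 5, 30]:   # 1, 14, and 27 all collide in slot 1, forcing evictions
    t.insert(k)
print(t.lookup(14), t.lookup(99))  # True False
```

The eviction chain is inherently sequential per key, which is why the paper's GPU construction is a hybrid and measures construction time carefully.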

194 citations