Proceedings Article

Efficient sorting algorithms for the cell broadband engine

06 Jul 2008, pp. 736-741
TL;DR: This paper proposes several novel sorting mechanisms specific to the Cell Broadband Engine and compares these algorithms with similar ones implemented on the Itanium 2 and Pentium 4 processors.
Abstract: The problem of sorting has been studied extensively, and many algorithms for it have been proposed in the literature. The literature on parallel sorting is abundant. Many of the proposed algorithms, though theoretically important, may not perform satisfactorily in practice owing to large constants in their time bounds. The algorithms presented in this paper have the potential to be practical. We suggest some novel sorting mechanisms specific to the Cell Broadband Engine and try to exploit the specifics of its architecture to obtain the best performance. As part of our comparative analysis, we juxtapose these algorithms with similar ones implemented on the Itanium 2 and Pentium 4 processors.
Citations
Proceedings Article
22 Jun 2010
TL;DR: This paper considers the sorting of a large number of multifield records on the Cell Broadband Engine and shows that the proposed method outperforms previously proposed sort methods that use either comb sort or bitonic sort for run generation followed by a 2-way odd-even merging of runs.
Abstract: We consider the sorting of a large number of multifield records on the Cell Broadband Engine. We show that our method, which generates runs using a 2-way merge and then merges these runs using a 4-way merge, outperforms previously proposed sort methods that use either comb sort or bitonic sort for run generation followed by a 2-way odd-even merging of runs. Interestingly, the best performance is achieved by using scalar memory copy instructions rather than vector instructions.
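A scalar sketch of the two-phase strategy described in this abstract: runs are first produced by 2-way merging inside fixed-size blocks (standing in for data that fits in an SPU's local store), and the runs are then combined by repeated 4-way merges. The block size `RUN_SIZE`, the helper names, and the use of plain Python lists of integers are illustrative assumptions; the paper's multifield record layout, SIMD instructions, and DMA transfers are not modeled.

```python
from typing import List

RUN_SIZE = 8  # hypothetical block size; a real SPU would size runs to its local store


def merge2(a: List[int], b: List[int]) -> List[int]:
    """Stable 2-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def generate_run(block: List[int]) -> List[int]:
    """Sort one block by bottom-up 2-way merging of ever-longer sublists."""
    runs = [[x] for x in block]
    while len(runs) > 1:
        runs = [merge2(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []


def merge4(runs: List[List[int]]) -> List[int]:
    """Merge up to four sorted runs by repeatedly taking the smallest head."""
    idx = [0] * len(runs)
    out = []
    while True:
        best = -1
        for r, run in enumerate(runs):
            if idx[r] < len(run) and (best < 0 or run[idx[r]] < runs[best][idx[best]]):
                best = r
        if best < 0:  # every run is exhausted
            return out
        out.append(runs[best][idx[best]])
        idx[best] += 1


def two_phase_sort(data: List[int]) -> List[int]:
    # Phase 1: run generation with 2-way merges inside each block.
    runs = [generate_run(data[k:k + RUN_SIZE]) for k in range(0, len(data), RUN_SIZE)]
    # Phase 2: each pass of 4-way merges cuts the number of runs by about 4x.
    while len(runs) > 1:
        runs = [merge4(runs[k:k + 4]) for k in range(0, len(runs), 4)]
    return runs[0] if runs else []
```

A 4-way merge halves the number of merge passes compared with a 2-way merge, at the cost of more comparisons per output element; the abstract reports that this trade-off pays off on the Cell.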

7 citations

Journal Article
TL;DR: A hybrid thread-level and process-level parallel algorithm for sorting multisets is presented for heterogeneous clusters whose multi-core nodes differ in number of processing cores, computation and communication capabilities, and main-memory size.
Abstract: By adaptively distributing data blocks to the processing cores to balance their computational loads, and by applying the strategy of “the extremum of the extremums” to select data with the same keys, a cache-efficient, thread-level parallel algorithm for sorting multisets on a multi-core computer is proposed. For the multiset sorting problem, an aperiodic multi-round data distribution model is presented in which the first scheduling round assigns data blocks to the slave multi-core nodes according to a given distribution order, and subsequent rounds distribute data blocks to the slave nodes using a first-request-first-distribution strategy. This scheduling technique ensures that each slave node receives its next data block before it finishes sorting the current block in its own main memory. A hybrid thread-level and process-level parallel algorithm for sorting multisets is presented for heterogeneous clusters whose multi-core nodes differ in number of processing cores, computation and communication capabilities, and main-memory size. Experimental results on a single multi-core computer and on a heterogeneous cluster of multi-core computers show that the presented parallel multiset sorting algorithms are efficient and achieve good speedup and scalability.
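One possible reading of “the extremum of the extremums” is sketched below: each thread has already sorted its own block, the smallest key among all block heads (the extremum of the per-block extrema) is selected, and every element with that key is gathered from all blocks at once, yielding (key, multiplicity) pairs. This is an interpretation of the abstract, not the authors' code; the function name and the (key, count) output format are assumptions.

```python
from typing import List, Tuple


def merge_multiset_blocks(blocks: List[List[int]]) -> List[Tuple[int, int]]:
    """Merge individually sorted blocks into (key, count) pairs in ascending key order."""
    pos = [0] * len(blocks)
    result: List[Tuple[int, int]] = []
    while True:
        # Per-block extrema: the smallest unconsumed key of each non-exhausted block.
        heads = [b[p] for b, p in zip(blocks, pos) if p < len(b)]
        if not heads:
            return result
        key = min(heads)  # the extremum of the extremums
        count = 0
        for i, b in enumerate(blocks):
            # Equal keys are adjacent in a sorted block, so consume them in one sweep.
            while pos[i] < len(b) and b[pos[i]] == key:
                pos[i] += 1
                count += 1
        result.append((key, count))
```

When the multiset has few distinct keys, each key is selected only once, so the merge does work proportional to the number of distinct keys times the number of blocks plus the number of elements.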

5 citations

Proceedings Article
05 Jul 2009
TL;DR: Experimental results indicate that the merge sort adaptation is faster than other sort algorithms proposed for the SPU as well as faster than the authors' SPU adaptations of shaker sort and brick sort.
Abstract: We adapt merge sort for a single SPU of the Cell Broadband Engine. This adaptation takes advantage of the vector instructions supported by the SPU. Experimental results indicate that our merge sort adaptation is faster than other sort algorithms (e.g., AA sort, Cell sort, quick sort) proposed for the SPU as well as faster than our SPU adaptations of shaker sort and brick sort. An added advantage is that our merge sort adaptation is a stable sort whereas none of the other sort adaptations is stable.
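Shaker sort and brick sort, the two baselines named in this abstract, are simple exchange sorts. The scalar sketch below shows brick sort (odd-even transposition sort), whose compare-exchanges within one phase touch disjoint pairs and therefore map naturally onto SIMD lanes, and shaker sort (cocktail sort), which alternates bubble passes in both directions. It is a textbook version and does not reproduce the authors' vectorized SPU code.

```python
from typing import List


def brick_sort(a: List[int]) -> List[int]:
    """Odd-even transposition sort: alternate even and odd compare-exchange phases."""
    a = list(a)
    n = len(a)
    for phase in range(n):  # n phases always suffice
        start = phase % 2
        # Pairs (start, start+1), (start+2, start+3), ... are disjoint,
        # so all compare-exchanges in one phase could run in parallel.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def shaker_sort(a: List[int]) -> List[int]:
    """Cocktail shaker sort: bubble passes alternating left-to-right and right-to-left."""
    a = list(a)
    lo, hi = 0, len(a) - 1
    while lo < hi:
        last = lo
        for i in range(lo, hi):  # forward pass pushes the maximum to the right
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                last = i
        hi = last  # everything after the last swap is already in place
        last = hi
        for i in range(hi, lo, -1):  # backward pass pulls the minimum to the left
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
                last = i
        lo = last  # everything before the last swap is already in place
    return a
```

Both algorithms perform O(n^2) comparisons in the worst case, which is consistent with the abstract's finding that the merge sort adaptation is faster.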

4 citations


Cites methods from "Efficient sorting algorithms for th..."

  • ...Regardless of whether we sort large data sets using the hierarchical strategy of [5] or the master-slave strategy of [13],...


Proceedings Article
18 Dec 2010
TL;DR: Experimental results on a heterogeneous cluster of multi-core computers show that the presented parallel multiset sorting algorithm is efficient and achieves good speedup and scalability.
Abstract: To efficiently sort multisets in parallel on heterogeneous multi-core clusters whose nodes differ in number of processing cores, computing and communication capabilities, and main-memory size, a novel aperiodic multi-round data distribution model is presented, and a parallel multiset sorting algorithm is designed that exploits multi-threading and the multi-level caches of multi-core architectures. The first scheduling round assigns data blocks to the slave multi-core nodes according to a given distribution order, and subsequent rounds distribute data blocks to the slave nodes using a First Request First Distribution (FRFD) strategy, which ensures that each slave node receives its next data block before it finishes sorting the current block in its own main memory. Experimental results on a heterogeneous cluster of multi-core computers show that the presented parallel multiset sorting algorithm is efficient and achieves good speedup and scalability.
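A small event-driven simulation can make the FRFD policy concrete. The first round hands one block to each node in a fixed order; afterwards, whichever node finishes (and therefore requests) first receives the next block. Node speeds, block sizes, and the function name are hypothetical, and communication time is ignored, so this sketch only illustrates the load-balancing effect, not the paper's guarantee that the next block arrives before the current one is finished.

```python
import heapq
from typing import Dict, List


def frfd_schedule(block_sizes: List[int], node_speeds: List[float]) -> Dict[int, List[int]]:
    """Return the block indices each node receives under first-request-first-distribution."""
    assignment: Dict[int, List[int]] = {n: [] for n in range(len(node_speeds))}
    events = []  # min-heap of (time at which the node requests its next block, node id)
    next_block = 0
    # Round 1: assign blocks in the given distribution order (node 0, 1, 2, ...).
    for node, speed in enumerate(node_speeds):
        if next_block == len(block_sizes):
            break
        assignment[node].append(next_block)
        heapq.heappush(events, (block_sizes[next_block] / speed, node))
        next_block += 1
    # Later rounds: the earliest requester gets the next block (ties go to the lower id).
    while next_block < len(block_sizes):
        t, node = heapq.heappop(events)
        assignment[node].append(next_block)
        heapq.heappush(events, (t + block_sizes[next_block] / node_speeds[node], node))
        next_block += 1
    return assignment
```

For example, with six equal blocks and two nodes whose speeds are in a 2:1 ratio, `frfd_schedule([10] * 6, [2.0, 1.0])` gives the faster node four of the six blocks, without the scheduler needing to know the speeds in advance.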

3 citations


Cites methods from "Efficient sorting algorithms for th..."

  • ...Six parallel algorithms for sorting integers were implemented on Cell processors and their speedups were analyzed in [9]....


01 Jan 2013
TL;DR: This paper presents a sorting algorithm that uses a modified selection sort and a modified bubble sort, and explains the algorithm's procedure together with its implementation.
Abstract: The paper presents a sorting algorithm that uses a modified selection sort and a modified bubble sort. It explains the procedural concept of the algorithm along with its implementation, derives the algorithm's time complexity, and highlights the key benefits of using this sorting algorithm. General Terms: Algorithm.
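The abstract does not say how the selection and bubble sorts are modified. As a hedged illustration only, the sketch below shows two common textbook modifications, which may well differ from the paper's: a selection sort that places both the minimum and the maximum on each pass, and a bubble sort that stops as soon as a pass makes no swap.

```python
from typing import List


def double_selection_sort(a: List[int]) -> List[int]:
    """Each pass moves the smallest remaining item left and the largest right."""
    a = list(a)
    lo, hi = 0, len(a) - 1
    while lo < hi:
        i_min = i_max = lo
        for i in range(lo, hi + 1):
            if a[i] < a[i_min]:
                i_min = i
            if a[i] > a[i_max]:
                i_max = i
        a[lo], a[i_min] = a[i_min], a[lo]
        if i_max == lo:  # the maximum was just swapped to position i_min
            i_max = i_min
        a[hi], a[i_max] = a[i_max], a[hi]
        lo += 1
        hi -= 1
    return a


def early_exit_bubble_sort(a: List[int]) -> List[int]:
    """Bubble sort that terminates as soon as a full pass performs no swap."""
    a = list(a)
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:  # already sorted: O(n) on sorted input
            break
    return a
```

Both variants remain O(n^2) in the worst case; the changes only reduce constant factors or help on nearly sorted input.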

3 citations

References
01 Jan 2001
TL;DR: Here the authors haven’t even started the project yet, and already they’re forced to answer many questions: what will this thing be named, what directory will it be in, what type of module is it, how should it be compiled, and so on.
Abstract: Writers face the blank page, painters face the empty canvas, and programmers face the empty editor buffer. Perhaps it’s not literally empty—an IDE may want us to specify a few things first. Here we haven’t even started the project yet, and already we’re forced to answer many questions: what will this thing be named, what directory will it be in, what type of module is it, how should it be compiled, and so on.

6,547 citations


"Efficient sorting algorithms for th..." refers to methods in this paper

  • ...Optimal algorithms like QUICK SORT and HEAP SORT whose run times match this lower bound can be found in the literature [4, 5]....
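For reference, heap sort attains the O(n log n) comparison lower bound in the worst case, whereas quick sort attains it on average. A minimal in-place heap sort using sift-down on a max-heap is sketched below; it is a standard textbook version, not code from the cited paper.

```python
from typing import List


def sift_down(a: List[int], root: int, end: int) -> None:
    """Restore the max-heap property for the subtree at `root` within a[:end]."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1  # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a: List[int]) -> List[int]:
    a = list(a)
    n = len(a)
    for root in range(n // 2 - 1, -1, -1):  # build a max-heap bottom-up in O(n)
        sift_down(a, root, n)
    for end in range(n - 1, 0, -1):  # move the current maximum to its final slot
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```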


Journal Article
TL;DR: It is shown that the Cell Broadband Engine (Cell/B.E.) processor can outperform other modern processors by approximately an order of magnitude, and by even more in some cases.
Abstract: The Cell Broadband Engine™ (Cell/B.E.) processor is the first implementation of the Cell Broadband Engine Architecture (CBEA), developed jointly by Sony, Toshiba, and IBM. In addition to use of the Cell/B.E. processor in the Sony Computer Entertainment PLAYSTATION® 3 system, there is much interest in using it for workstations, media-rich electronics devices, and video and image processing systems. The Cell/B.E. processor includes one PowerPC® processor element (PPE) and eight synergistic processor elements (SPEs). The CBEA is designed to be well suited for a wide variety of programming models, and it allows for partitioning of work between the PPE and the eight SPEs. In this paper we show that the Cell/B.E. processor can outperform other modern processors by approximately an order of magnitude and by even more in some cases.

401 citations


"Efficient sorting algorithms for th..." refers to background in this paper

  • ...The Cell BE is made up of one 64-bit Power Processor Element (PPE), 8 specialized coprocessors called Synergistic Processing Elements (SPEs), a high-bandwidth bus interface and a high-speed memory controller, all on a single chip [7] (fig....
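The division of labor between one PPE and eight SPEs suggests a master-slave sorting pattern: the master splits the input into one chunk per worker, each worker sorts its chunk, and the master merges the sorted chunks. The sketch below simulates that pattern on an ordinary CPU, with a thread pool standing in for the SPEs. `NUM_SPES`, the chunking rule, and the use of `heapq.merge` are illustrative assumptions, and no Cell SDK facilities (DMA transfers, SIMD intrinsics, local-store limits) are modeled.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List

NUM_SPES = 8  # the Cell/B.E. has eight SPEs; here they are just worker threads


def master_slave_sort(data: List[int], workers: int = NUM_SPES) -> List[int]:
    # Master (PPE role): split the input into roughly equal contiguous chunks.
    size = -(-len(data) // workers) if data else 1  # ceiling division
    chunks = [data[k:k + size] for k in range(0, len(data), size)]
    # Slaves (SPE role): each worker sorts its own chunk independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Master: k-way merge of the sorted chunks into the final output.
    return list(heapq.merge(*sorted_chunks))
```

Because of CPython's global interpreter lock, this version shows the structure of the computation rather than a real speedup.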


Book Chapter
01 Jan 2007
TL;DR: The transition of programming from an art to a discipline has been a continually recurring theme over the years; for example, in 1970 one reads of the “first steps toward transforming the art of programming into a science.”
Abstract: When Communications of the ACM began publication in 1959, the members of ACM's Editorial Board made the following remark as they described the purposes of ACM's periodicals [2]: “If computer programming is to become an important part of computer research and development, a transition of programming from an art to a disciplined science must be effected.” Such a goal has been a continually recurring theme during the ensuing years; for example, we read in 1970 of the “first steps toward transforming the art of programming into a science” [26]. Meanwhile we have actually succeeded in making our discipline a science, and in a remarkably simple way: merely by deciding to call it “computer science.”

222 citations

Proceedings Article
26 Mar 2007
TL;DR: The findings show that the Cell is an ideal candidate to tackle modern security needs: two processing elements alone, out of the eight available on one Cell processor, provide sufficient computational power to filter a network link with bit rates in excess of 10 Gbps.
Abstract: The security of your data and of your network is in the hands of intrusion detection systems, virus scanners and spam filters, which all depend critically on string matching. But network links keep getting faster, and string matching is getting increasingly difficult to perform in real time. Traditional processors are not keeping up with the performance demands, whereas specialized hardware will never be able to compete with commodity hardware in terms of cost effectiveness, reusability and ease of programming. Advanced multi-core architectures like the IBM Cell Broadband Engine promise unprecedented performance at a low cost, thanks to their popularity and production volume. Nevertheless, the suitability of the Cell processor for string matching has not been investigated so far. In this paper we investigate the performance attainable by the Cell processor when employed for string matching algorithms based on deterministic finite-state automata (DFA). Our findings show that the Cell is an ideal candidate to tackle modern security needs: two processing elements alone, out of the eight available on one Cell processor, provide sufficient computational power to filter a network link with bit rates in excess of 10 Gbps.
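A DFA-based matcher precomputes a complete transition table, so scanning costs one table lookup per input byte regardless of how many patterns are searched for. The sketch below builds such a table with the Aho-Corasick construction (trie transitions and failure links folded into a full DFA) and scans a byte string with it. The abstract does not specify how the automaton is built on the Cell, so Aho-Corasick is an assumed, standard choice here, and `build_dfa` and `scan` are hypothetical names.

```python
from collections import deque
from typing import List, Set, Tuple


def build_dfa(patterns: List[bytes]) -> Tuple[List[List[int]], List[Set[int]]]:
    """Return (delta, out): delta[state][byte] -> next state, out[state] -> matched pattern ids."""
    delta: List[List[int]] = [[0] * 256]
    out: List[Set[int]] = [set()]
    for pid, pat in enumerate(patterns):  # build the trie; state 0 is the root
        s = 0
        for c in pat:
            if delta[s][c] == 0:  # no child yet (the root is never a child)
                delta.append([0] * 256)
                out.append(set())
                delta[s][c] = len(delta) - 1
            s = delta[s][c]
        out[s].add(pid)
    fail = [0] * len(delta)
    queue = deque(delta[0][c] for c in range(256) if delta[0][c] != 0)
    while queue:  # breadth-first: shallower states are complete before deeper ones
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit matches that end at the failure state
        for c in range(256):
            t = delta[s][c]
            if t != 0:  # trie edge: compute the child's failure link
                fail[t] = delta[fail[s]][c]
                queue.append(t)
            else:  # missing edge: copy the failure state's transition
                delta[s][c] = delta[fail[s]][c]
    return delta, out


def scan(delta: List[List[int]], out: List[Set[int]], text: bytes) -> List[Tuple[int, int]]:
    """Return (end_index, pattern_id) for every match, using one lookup per byte."""
    s, hits = 0, []
    for i, c in enumerate(text):
        s = delta[s][c]
        for pid in sorted(out[s]):
            hits.append((i, pid))
    return hits
```

For instance, with patterns `b"he"`, `b"she"`, `b"his"`, `b"hers"`, scanning `b"ushers"` reports `he` and `she` ending at index 3 and `hers` ending at index 5. The inner scan loop has no data-dependent branching beyond the table lookup, which is the property that makes DFA matching attractive on SIMD-oriented cores.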

54 citations

01 Jan 1994
TL;DR: This paper identifies techniques that have been employed in the design of sorting and selection algorithms for various interconnection networks and considers both randomized and deterministic techniques.
Abstract: In this paper we identify techniques that have been employed in the design of sorting and selection algorithms for various interconnection networks. We consider both randomized and deterministic techniques. Interconnection networks of interest include the mesh, the mesh with fixed and reconfigurable buses, the hypercube family, and the star graph. For comparison, we also list PRAM algorithms.
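Many sorting algorithms for such networks are built from compare-exchange steps between processors whose indices differ in a single bit; Batcher's bitonic sort on the hypercube is the classic example. The sequential simulation below performs, at each step, the compare-exchange that node `i` would carry out with its neighbor `i ^ j` across one hypercube dimension. It assumes the input length is a power of two and is a textbook sketch, not an algorithm taken from the cited survey.

```python
from typing import List


def bitonic_sort(a: List[int]) -> List[int]:
    """Batcher's bitonic sort; len(a) must be a power of two."""
    a = list(a)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:  # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:  # partner distance: one hypercube dimension per step
            for i in range(n):
                partner = i ^ j
                if partner > i:  # handle each pair once
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for this block's direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

On a hypercube with one element per node, every step above is a single parallel compare-exchange along one dimension, giving O(log^2 n) parallel steps in total.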

36 citations