What is the function of a multi-processor?

Multi-processor: By adding processing logic to perform additions, subtractions, etc., on groups of adjacent words of a sorting memory, one can implement a multi-processor.

What is the advantage of a sorting network over a normal crossbar?

It has the advantages over a normal crossbar of requiring less hardware (an n-input n-output switching network can be built with approximately $\frac{1}{4} n (\log_2 n)^2$ elements versus $n^2$ in a normal crossbar) and of having a constant fan-in and fan-out requirement on its elements.

What is the function of the multi-access content addressable memory?

Multi-access content addressable memory: By adding facilities for shifting the bits within the words in the aforementioned memory, different fields of the words can be brought into the more-significant positions which govern the ordering of the words.

How many inputs and outputs can be accommodated in a complete cycle?

While a complete cycle may be long in this memory (50-bit words at 100 nanoseconds/bit = 5 microseconds/recirculation = 10 microseconds/complete cycle) many inputs and outputs can be accommodated in each cycle.

What is the use of fast sorting capability?

Such fast sorting capability can be used to manipulate large sets of data quickly and solve some of the communications problems associated with large-scale computing systems.

What is the s+t output of the merging network?

The s + t outputs of the merging network present the s + t numbers of the merged lists in ascending order, $c_1, c_2, \ldots, c_{s+t}$. A "1 by 1" merging network is simply one comparison element.

(Open Access) Sorting networks and their applications (1968) | Kenneth E. Batcher

Q: What are the contributions in this paper?

This paper describes networks that have a fast sorting or ordering capability ( sorting networks or sorting memories ).

Q: What is the simplest way to sort a bitonic sequence?

Since any two monotonic sequences can be put together to form a bitonic sequence a network which rearranges a bitonic sequence into monotonic order (a bitonic sorter) can be used as a merging network.

Q: How many bits can be placed between different levels?

Parallel-input-parallel-output registers of 1024 bits each can be placed between certain levels to perform this task, or the re-clocking may be incorporated within each comparison element with a pair of flip-flops on the outputs.

Q: What is the function of the adjacent word transfer?

The adjacent word transfer sends back signals over each path to signal each input and output line whether or not a connection has been established.

Q: What is the effect of doubling the size of a merge?

Doubling the size of a merge only increases the longest path by unity so the merging time increases slowly with the size of the network.

Sorting networks and their applications

K. E. BATCHER

Goodyear Aerospace Corporation

Akron, Ohio

INTRODUCTION

To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently. A major problem in the design of such a computing system is the connecting together of the various parts of the system (the I/O devices, memories, processing units, etc.) in such a way that all the required data transfers can be accommodated. One common scheme is a high-speed bus which is time-shared by the various parts; speed of available hardware limits this scheme. Another scheme is a cross-bar switch or matrix; limiting factors here are the amount of hardware (an m × n matrix requires m × n cross-points) and the fan-in and fan-out of the hardware.

This paper describes networks that have a fast sorting or ordering capability (sorting networks or sorting memories). In $\frac{1}{2}p(p+1)$ steps $2^p$ words can be ordered. A sorting network can be used as a multiple-input, multiple-output switching network. It has the advantages over a normal crossbar of requiring less hardware (an n-input n-output switching network can be built with approximately $\frac{1}{4} n (\log_2 n)^2$ elements versus $n^2$ in a normal crossbar) and of having a constant fan-in and fan-out requirement on its elements. Thus, a sorting network should be useful as a flexible means of tying together the various parts of a large-scale computing system. Thousands of input and output lines can be accommodated with a reasonable amount of hardware.

Other applications of sorting memories are as a switching network with buffering, a multiaccess memory, a multiaccess content-addressable memory and as a multiprocessor. Of course, the networks also may be used just for sorting and merging.

Comparison elements

The basic element of sorting networks is the comparison element (Figure 1). It receives two numbers over its inputs, A and B, and presents their minimum on its L output and their maximum on its H output.

Figure 1 - Symbol for a comparison element (inputs A and B; outputs H = MAX(A,B) and L = MIN(A,B); primed terminals H', L', A', B' belong to the bi-directional version)

If the numbers in and out of the element are transmitted serially most-significant bit first, the element has the state diagram of Figure 2. A reset input places the element in the $A = B$ state and as long as the $A$ and $B$ bits agree it remains in this state with its outputs equal to its inputs. When the $A$ and $B$ bits disagree the element goes to the $A < B$ or the $A > B$ state and remains there until the next reset input. In the $A > B$ state the output $H$ equals the input $A$ and the output $L$ equals the input $B$. In the $A < B$ state the opposite situation occurs.

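Figure 2 - State diagram for a serial comparison element (most-significant-bit first). RESET enters the (A = B) state, where H = L = A; input bits A = 0, B = 1 lead to the (A < B) state, where H = B and L = A; input bits A = 1, B = 0 lead to the (A > B) state, where H = A and L = B.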


308 Spring Joint Computer Conference, 1968

A serial comparison element can be implemented with 13 NORs and can be put on one integrated circuit chip. When used in sorting networks each H and L output will feed an A or B input of another element, so the fan-out is constant regardless of network size; this fact could be used to simplify the design of the chip. With several of the currently available logic families, speeds of 100 nanoseconds/bit with a propagation delay from inputs to outputs of 40 nanoseconds are easily achieved.

Faster operation can be attained by treating several bits in parallel in each step with more complex comparison elements.

Some of the applications described below will require "bi-directional" comparison elements. Besides the $A$ and $B$ inputs and the $H$ and $L$ outputs there are $H'$ and $L'$ inputs and $A'$ and $B'$ outputs (see Figure 1). If $A > B$ then $A' = H'$ and $B' = L'$, if $A < B$ then $A' = L'$ and $B' = H'$, otherwise $A'$ and $B'$ are left undefined. Information flows from left-to-right over the solid lines and from right-to-left over the dotted lines.

Odd-even merging networks

Merging is the process of arranging two ascendingly-ordered lists of numbers into one ascendingly-ordered list. Figure 3 shows a symbol for an "s by t" merging network in which the s numbers of one ascendingly-ordered list, $a_1, a_2, \ldots, a_s$, are presented over s inputs simultaneously with the t numbers of another ascendingly-ordered list, $b_1, b_2, \ldots, b_t$, over another t inputs. The s + t outputs of the merging network present the s + t numbers of the merged lists in ascending order, $c_1, c_2, \ldots, c_{s+t}$.

A "1 by 1" merging network is simply one comparison element. Larger networks can be built by using the iterative rule shown in Figure 4. An "s by t" merging network can be built by presenting the odd-indexed numbers of the two input lists to one small merging network (the odd merge), presenting the even-indexed numbers to another small merging network (the even merge) and then comparing the outputs of these small merges with a row of comparison elements. The lowest output of the odd merge is left alone and becomes the lowest number of the final list. The $i$-th output of the even merge is compared with the $(i+1)$-th output of the odd merge to form the $2i$-th and $(2i+1)$-th numbers of the final list for all applicable i's. This may or may not exhaust all the outputs of the odd and even merges; if an output remains in the odd or even merge it is left alone and becomes the highest number in the final list.
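The iterative rule can be written as a short recursive routine. This is a sketch under the assumption that lists are ordinary Python lists and that `min`/`max` on a pair plays the role of one comparison element; the name `odd_even_merge` is chosen here for illustration. Python's 0-based slices `[0::2]` and `[1::2]` pick the odd-indexed and even-indexed numbers in the paper's 1-based numbering.

```python
def odd_even_merge(a, b):
    """Merge two ascending lists using the odd-even iterative rule."""
    if not a or not b:
        return list(a) + list(b)           # nothing to compare
    if len(a) == 1 and len(b) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]   # "1 by 1": one element

    odd = odd_even_merge(a[0::2], b[0::2])    # 1st, 3rd, 5th, ... of each list
    even = odd_even_merge(a[1::2], b[1::2])   # 2nd, 4th, 6th, ... of each list

    result = [odd[0]]                      # lowest odd output is left alone
    i = 0
    while i < len(even) and i + 1 < len(odd):
        # One comparison element per pair (even output i, odd output i+1).
        result.append(min(even[i], odd[i + 1]))
        result.append(max(even[i], odd[i + 1]))
        i += 1
    # At most one output remains; it becomes the highest number.
    result += odd[i + 1:] + even[i:]
    return result


print(odd_even_merge([1, 4, 6, 9], [2, 3, 7, 8]))
# [1, 2, 3, 4, 6, 7, 8, 9]
```

The recursion mirrors the hardware: the two recursive calls are the odd and even merges, and the `while` loop is the final row of comparison elements.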

Figure 3 - Symbol for an "s by t" merging network

Appendix A sketches the proof of this iterative rule. Figure 5 shows a "2 by 2" and a "4 by 4" merging network constructed by this rule.

A "$2^p$ by $2^p$" merging network constructed by this rule uses $p \cdot 2^p + 1$ comparison elements. The longest path goes through $p + 1$ comparison elements and the shortest path through one element. Doubling the size of a merge only increases the longest path by unity, so the merging time increases slowly with the size of the network.
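The element count and longest path can be checked against the iterative rule with a small recurrence. The sketch below assumes that a "$2^p$ by $2^p$" merge is built from two half-size merges plus a final row of $2^p - 1$ comparison elements (the lowest odd output and the highest even output are left alone); `merge_cost` is a name introduced here for illustration.

```python
def merge_cost(p):
    """Return (elements, longest_path) for a 2**p by 2**p odd-even merge."""
    if p == 0:
        return 1, 1                      # a "1 by 1" merge is one element
    elements, depth = merge_cost(p - 1)  # cost of each half-size merge
    final_row = 2 ** p - 1               # comparisons after the two merges
    return 2 * elements + final_row, depth + 1


for p in range(5):
    print(p, merge_cost(p), (p * 2 ** p + 1, p + 1))
```

For every `p` the two printed tuples agree, matching the closed forms $p \cdot 2^p + 1$ and $p + 1$ quoted in the text.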


Figure 4 - Iterative rule for odd-even merging networks

Figure 5 - Construction of "2 by 2" and "4 by 4" odd-even merging networks

Bitonic sorters

Another way of constructing merging networks from comparison elements is presented here. While requiring somewhat more elements than the odd-even merging networks, they have the advantage of flexibility (one network can accommodate input lists of various lengths) and of modularity (a large network can be split up into several identical modules).

We will call a sequence of numbers bitonic if it is the juxtaposition of two monotonic sequences, one ascending, the other descending. We also say it remains bitonic if it is split anywhere and the two parts interchanged. Since any two monotonic sequences can be put together to form a bitonic sequence, a network which rearranges a bitonic sequence into monotonic order (a bitonic sorter) can be used as a merging network.

Appendix B shows that if a sequence of 2n numbers, $a_1, a_2, \ldots, a_{2n}$, is bitonic and if we form the two n-number sequences:

$$\min(a_1, a_{n+1}),\ \min(a_2, a_{n+2}),\ \ldots,\ \min(a_n, a_{2n}) \qquad (1)$$

and

$$\max(a_1, a_{n+1}),\ \max(a_2, a_{n+2}),\ \ldots,\ \max(a_n, a_{2n}) \qquad (2)$$

that each of these sequences is bitonic and no number of (1) is greater than any number of (2).

This fact gives us the iterative rule illustrated in Figure 6. A bitonic sorter for 2n numbers can be constructed from n comparison elements and two bitonic sorters for n numbers. The comparison elements form the sequences (1) and (2) and since each is bitonic they are sorted by the two n-number bitonic sorters. Since no number of (1) is greater than any number of (2), the output of one bitonic sorter is the lower half of the sort and the output of the other is the upper half.
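A recursive sketch of this rule follows, assuming the input is a bitonic Python list whose length is a power of two; the names `bitonic_sort` and `bitonic_merge` are chosen here for illustration. The merge helper reverses the second list so that the juxtaposition ascends and then descends.

```python
def bitonic_sort(seq):
    """Sort a bitonic sequence (length a power of two) into ascending order."""
    n = len(seq) // 2
    if n == 0:
        return list(seq)
    # n comparison elements form sequences (1) and (2).
    lower = [min(seq[i], seq[i + n]) for i in range(n)]
    upper = [max(seq[i], seq[i + n]) for i in range(n)]
    # Each half is bitonic, and no number of (1) exceeds any number of (2).
    return bitonic_sort(lower) + bitonic_sort(upper)


def bitonic_merge(a, b):
    """Merge two ascending lists whose total length is a power of two."""
    return bitonic_sort(list(a) + list(b)[::-1])


print(bitonic_sort([1, 4, 6, 9, 8, 7, 3, 2]))
# [1, 2, 3, 4, 6, 7, 8, 9]
```

Note that `bitonic_merge` only requires the total length to be a power of two, not the individual lists, which is the flexibility the text attributes to bitonic sorters.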

A bitonic sorter for 2 numbers is simply a comparison element and using the iterative rule bitonic sorters for $2^p$ numbers can be constructed for any p. Figure 7 shows bitonic sorters for 4 numbers and 8 numbers.



A $2^p$-number bitonic sorter requires p levels of $2^{p-1}$ elements each for a total of $p \cdot 2^{p-1}$ elements. It can act as a merging network for any two input lists whose total length equals $2^p$.

Large bitonic sorters can be constructed from a number of smaller bitonic sorters; for instance, a 16-number bitonic sorter can be constructed from eight 4-number bitonic sorters, as shown in Fig. 8. This allows large networks to be built of standard modules of convenient size.

Readers may recognize the similarity between the topologies of the bitonic sort and the fast-Fourier-transform.

Figure 6 - Iterative rule for bitonic sorters (n comparison elements followed by two n-item bitonic sorters)

Sorting networks

A sorter for arbitrary sequences can be constructed from odd-even merges or bitonic sorters using the well-known sorting-by-merging scheme: the numbers are combined two at a time to form ordered lists of length two; these lists are merged two at a time to form ordered lists of length four, etc., until all numbers are merged into one ordered list.
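The sorting-by-merging scheme can be sketched as follows. This is only an illustration of the scheme itself: the standard-library `heapq.merge` is used as a software stand-in for a hardware merging network (an assumption of this sketch, not the paper's construction), and `sort_by_merging` is a name introduced here.

```python
from heapq import merge  # stand-in for a hardware merging network


def sort_by_merging(numbers):
    """Sort by repeatedly merging ordered lists two at a time."""
    lists = [[x] for x in numbers]       # ordered lists of length one
    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists) - 1, 2):
            merged.append(list(merge(lists[i], lists[i + 1])))
        if len(lists) % 2:               # an unpaired list passes through
            merged.append(lists[-1])
        lists = merged
    return lists[0] if lists else []


print(sort_by_merging([5, 2, 7, 1, 8, 3, 6, 4]))
# [1, 2, 3, 4, 5, 6, 7, 8]
```

With $2^p$ inputs the loop runs p times, one pass per merge stage of the network.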

To sort $2^p$ numbers using odd-even merges requires $2^{p-1}$ comparison elements followed by $2^{p-2}$ "2-by-2" merging networks followed by $2^{p-3}$ "4-by-4" merging networks, etc. The longest path will go through $\frac{1}{2}p(p+1)$ elements and the shortest path through p elements. The network requires $(p^2 - p + 4)2^{p-2} - 1$ comparison elements.

To sort $2^p$ numbers using bitonic sorters requires $\frac{1}{2}p(p+1)$ levels each with $2^{p-1}$ elements for $(p^2 + p)2^{p-2}$ elements. Each path goes through $\frac{1}{2}p(p+1)$ levels.
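These totals can be cross-checked by summing the per-merge counts over the stages of the sort. The sketch below assumes a "$2^k$ by $2^k$" odd-even merge uses $k \cdot 2^k + 1$ elements and a $2^k$-number bitonic sorter uses $k \cdot 2^{k-1}$ elements, as stated in the preceding sections; the function names are introduced here for illustration.

```python
def odd_even_sorter_elements(p):
    """Elements in a 2**p-number sorter built from odd-even merges."""
    # Stage k uses 2**(p-1-k) merges of size 2**k by 2**k.
    return sum(2 ** (p - 1 - k) * (k * 2 ** k + 1) for k in range(p))


def bitonic_sorter_elements(p):
    """Elements in a 2**p-number sorter built from bitonic sorters."""
    # Stage k uses 2**(p-k) bitonic sorters of 2**k numbers each.
    return sum(2 ** (p - k) * (k * 2 ** (k - 1)) for k in range(1, p + 1))


def longest_path(p):
    """Levels on the longest path: 1 + 2 + ... + p."""
    return p * (p + 1) // 2


p = 10  # 1024 numbers
print(longest_path(p))              # 55
print(odd_even_sorter_elements(p))  # 24063
print(bitonic_sorter_elements(p))   # 28160
```

For p = 10 the sums agree with the closed forms $(p^2 - p + 4)2^{p-2} - 1$ and $(p^2 + p)2^{p-2}$.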

Figure 7 - Construction of bitonic sorters for 4 numbers and for 8 numbers

Figure 8 - A 16-number bitonic sorter constructed from eight 4-number bitonic sorters

A sorter of 1024 numbers will have 55 levels and 24,063 elements with odd-even merges or 28,160 elements with bitonic sorters. With a 40 nanosecond propagation delay per level the total delay is 2.2 microseconds. Serial transmission of the bits would require about this much time between successive bits of the numbers unless re-clocking occurs within the network. Parallel-input-parallel-output registers of 1024 bits each can be placed between certain levels to perform this task, or the re-clocking may be incorporated within each comparison element with a pair of flip-flops on the outputs. The latter scheme does not add to the terminal count of the comparison element, so the cost of the added flip-flops on the comparison element chip is small. One can use any of the familiar techniques for driving shift registers such as the "A-B" technique where successive levels are clocked out-of-phase with each other. With present circuit and wiring techniques a bit rate of 10 megahertz may be possible with 50 nanosecond delay per level (2.75 microsecond delay from input to output of a 1024-word sorter).

With re-clocking in the element and odd-even merges, extra elements are needed to balance the unequal-length paths. Bitonic sorters do not have this problem.

Applications

The fast sorting capability of these networks allows their use in solving other problems where large sets of data must be manipulated. Some of these applications are sketched below.

Switching network

A sorting network can connect its input lines to its output lines with any permutation. The connection is made by numbering the output lines in order and presenting the desired output address for each input line at the input. The sorting network sorts the addresses and in the process makes a connection from each input line to its desired output line for the transmission of data. Bi-directional paths will be obtained if bi-directional comparison elements are used.
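The idea of routing by sorting addresses can be illustrated in a few lines. In this sketch Python's built-in `sorted` stands in for the hardware sorting network, each input line carries an (address, data) pair, and every output address is assumed to be requested exactly once; the name `route` is hypothetical.

```python
def route(dest_addresses, data):
    """Deliver data[i] to output line dest_addresses[i] by sorting on address.

    Assumes dest_addresses is a permutation of 0..n-1 (no conflicts).
    Returns a list whose k-th entry is the data arriving at output line k.
    """
    tagged = sorted(zip(dest_addresses, data), key=lambda item: item[0])
    return [payload for _, payload in tagged]


print(route([2, 0, 3, 1], ["a", "b", "c", "d"]))
# ['b', 'd', 'a', 'c']
```

Because the addresses end up in order, the item in position k of the sorted list is exactly the one that asked for output line k.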

An alternative permuting network has been shown in the recent literature which has fewer elements [$(p-1)2^p + 1$ versus $(p^2 - p + 4)2^{p-2} - 1$ for permuting $2^p$ items] but a more complex set-up algorithm.

Switching network with conflict resolution

The aforementioned switching network assumes each input wants a unique output line. In many applications conflicts between inputs occur and must be resolved by inhibiting conflicting inputs. Figure 9 sketches an m-input, n-output network that performs this task. Each input line inserts a word containing the output address desired (or zeroes if the line is inactive), a control bit equal to 1 and a priority number into an m-item sorting network with bi-directional elements. This orders the items so input items with the same output address are grouped together and ordered by their priority number. The ordered set of m input items is merged with a set of n items, each containing a fixed output address and a control bit equal to 0. At the right side of the m by n merge the m + n items are in one ordered list; each address-inserter item will be directly below any input items with the same address. The adjacent word transfer network, looking at the control bits, connects each address-inserter item to the input item directly above it if one exists (the input item with lowest priority number is picked in each case). The elements in the sort and the merge are bi-directional so two-way paths are formed from input to output. The adjacent word transfer sends back signals over each path to signal each input and output line whether or not a connection has been established. Data can then be transmitted over each of the connected input lines.
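The conflict-resolution scheme can be mimicked in software. This is a rough sketch with several assumptions that are not in the text: output lines are numbered 1..n, every active request names a valid output, inactive lines use address 0 and priority 0, and `sorted` stands in for the combined sort and merge. The name `resolve_conflicts` and the tuple layout are hypothetical.

```python
def resolve_conflicts(requests, n_outputs):
    """Grant each output line to at most one requesting input line.

    requests[i] is (desired_output, priority) or None if line i is inactive.
    Returns a dict mapping output address -> index of the connected input.
    """
    words = []
    for line, request in enumerate(requests):
        address, priority = request if request is not None else (0, 0)
        words.append((address, 1, priority, line))   # input item, control 1
    for address in range(1, n_outputs + 1):
        words.append((address, 0, 0, None))          # address inserter, control 0

    # Order by (address, control bit, priority): each address-inserter item
    # lands next to the lowest-priority-number input item for its address.
    ordered = sorted(words, key=lambda w: w[:3])

    connections = {}
    for here, neighbour in zip(ordered, ordered[1:]):
        # Adjacent word transfer: only the control bits are inspected.
        if here[1] == 0 and neighbour[1] == 1:
            connections[here[0]] = neighbour[3]
    return connections


requests = [(2, 5), (1, 3), (2, 1), None, (1, 7)]
print(resolve_conflicts(requests, 3))
# {1: 1, 2: 2}
```

Input lines 0 and 4 lose their conflicts and stay unconnected, and output 3 stays idle because nobody requested it.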

Figure 9 - An m-input, n-output switching network with conflict resolution (m-item sorting network, address inserter, "m by n" merging network, adjacent word transfer)

Multi-access memory

Re-clocking delays in the comparison elements give a sorting network some storage capability which can be augmented if needed with shift registers on the outputs. When the output lines are fed back to the input lines a recirculating self-sorting store is created (Figure 10). In each recirculation cycle word positions are changed to keep the memory in order.

Inputs to the memory can be made by breaking the recirculation paths of some words and inserting new words. To prevent destroying old information during input we use the convention that words with all bits equal to "one" are "empty" and contain no information: these will automatically collect at the "high end" of memory where input lines can use them to insert new words.
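The input convention can be illustrated with a toy model of the recirculating store. This sketch assumes 8-bit words, represents the memory as a Python list, and uses `sorted` as a stand-in for one pass through the sorting network; `WORD_BITS`, `EMPTY` and `recirculate` are names introduced for the example.

```python
WORD_BITS = 8                    # assumed word length for this toy model
EMPTY = (1 << WORD_BITS) - 1     # all bits "one" marks an empty word


def recirculate(memory, new_words=()):
    """One recirculation cycle: insert new words at the high end, re-sort."""
    memory = sorted(memory)              # empty words collect at the high end
    new_words = list(new_words)
    free = sum(1 for word in memory if word == EMPTY)
    if len(new_words) > free:
        raise ValueError("not enough empty words for the requested inputs")
    if new_words:
        # Break the recirculation paths of the highest (empty) words.
        memory[-len(new_words):] = new_words
    return sorted(memory)                # next pass restores the order


store = [EMPTY] * 4
store = recirculate(store, [9, 3])
print(store)   # [3, 9, 255, 255]
store = recirculate(store, [5])
print(store)   # [3, 5, 9, 255]
```

Only empty words are ever overwritten, so stored information survives each input, which is the point of the all-ones convention.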

Outputs from the memory can be accommodated by reserving the most-significant-bit (MSB) of each word: "1" for normal words and "0" for words to be outputted. Words for output will automatically collect at the "low end" of memory where output lines can read them. Selection of which words to output is accommodated by reserving the least-significant-bit (LSB) of each word; "1" for normal words and "0"


References

On the Synthesis of Signal Switching Networks with Transient Blocking
