Bitonic merge

Bitonic merge / Even-odd merge (optional)

Drawback of merge with rank-based algorithms:

  • reading elements at the same time (not allowed in EREW PRAMs).
  • Inefficient when only O(p)\small O(p) elements must be merged on p\small p processors.

Bitonic sequence (Definition)

Sequence (a0,a1,,an1)\small (a_0, a_1, \dots ,a_{n-1}) is bitonic if

either there is an index i\small i such that

a0a1ai\small a_0 \leq a_1 \leq \dots \leq a_i

ai+1ai+2an1 \small a_{i+1} \geq a_{i+2} \geq \dots \geq a_{n-1} 

or there is a cyclic shift such that the first case holds (first case holds if we just shift all elements to left or right)

Lemma

Let a=(a0,a1,,an1)\small a = (a_0, a_1, \dots ,a_{n-1}) be a bitonic sequence with even length n\small n .

Ifn\small nis odd we can put the realan1\small a_{n-1}in eithera\small a'or shifteda\small a''→ the algorithm is then not oblivious anymore.

Let a,a\small a', a'' be two sequences of length n/2\small n/2 :

a=(min(a0,an/2),min(a1,an/2+1),,min(an,an1)) \footnotesize a' = (\min(a_0, a_{n/2}),~\min(a_1, a_{n/2+1}),~\dots,~\min(a_n, a_{n-1})) 

a=(max(a0,an/2),max(a1,an/2+1),,max(an,an1)) \footnotesize a'' = (\max(a_0, a_{n/2}),~\max(a_1, a_{n/2+1}),~\dots,~\max(a_n, a_{n-1})) 

This makes a,a\small a', a''

  1. bitonic
  1. aa\small a' \leq a'' for all elements they contain

  • Proof of lemma
    1. Any subsequence of a bitonic sequence is also bitonic.

      a,a\small a', a'' are also (disjoint) subsequences of a\small a .

      Concatinating them makes a\small a .

    1. Contradiction: Assume that there is some ai>aj\small a'_i > a''_j

      ai=min(ai,ai+n/2)=ai \small a'_i = \min (a_i,~a_{i+n/2}) = a_i 

      then ai=ai>aj\small a'_i = a_i> a''_j

      This contradicts that either ai\small a'_i is min\small \min or aj\small a''_j is max\small \max .

Examples

  • visualization

    n=6\small n = 6

    a = a_0,   a_1,   a_2,   a_3,   a_4,   a_5
    ↓      ↓      ↓      ↓      ↓      ↓
    0      1      2     n/2   n/2+1  n/2+2     indices
    |____________________|
    |____________________|              comparisions
    |____________________|
  • Example 1

    a=(1,1,2,3,4,7,7,6,5,4,4,3)=(1,1,2,3,4,7)(7,6,5,4,4,3) \small a = (1,1,2,3,4,7,7,6,5,4,4,3) = (1,1,2,3,4,7)~||~(7,6,5,4,4,3) 

    a=(1,1,2,3,4,3)\small a' = (1,1,2,3,4,3)

    a=(7,6,5,4,4,7)\small a'' = (7,6,5,4,4,7)

  • Example 2 (cyclically shifted)

    a=(3,4,7,7,6,5,4,4,3,1,1,2)=(3,4,7,7,6,5)(4,4,3,1,1,2) \small a = (3,4,7,7,6,5,4,4,3,1,1,2) = (3,4,7,7,6,5)~||~(4,4,3,1,1,2) 

    a=(3,4,3,1,1,2)\small a' = (3,4,3,1,1,2)

    a=(4,4,7,7,6,5)\small a'' = (4,4,7,7,6,5)

Ordering bitonic sequences

= oblivious merge

bitonic_merge(int a[], int n) {
if (n==1) {
return;
}// n is odd
int nn = n/2;                 // offset
int s = n/2;	                // size of (first half) a'if (n%2 == 1) {
nn++;
if (a[n/2] < a[n-1]) {      // a[n/2] will be in a' and a[n-1] in a''
s++;
}
}for (i=0; i<n/2; i++) {       // parallelizable
int mina, maxa;
mina = min(a[i], a[i+nn]);
maxa = max(a[i], a[i+nn]);
a[i] = mina;                // (first half) a'
a[i+nn] = maxa;             // (second half) a''
}
// recursive call of each half
bitonic_merge(a,s);
bitonic_merge(a+s,n-s);
}

Given sequence a\small a split into a,a\small a', a'' with the definition given above.

Then recursively order a,a\small a', a'' .

Hint: Our pseudocode is written in C.

Therefore in the last instruction bitonic_merge(a+s, s-n); we are passing the start index of the array +s and the length s-n .

Work

Wpar(1)O(1)\small \text{Wpar}(1) \in O(1)

Wpar(n)O(n)+2Wpar(n/2)=O(nlogn) \small \text{Wpar}(n) \in O(n) + 2\cdot \text{Wpar}(n/2) = O(n \log n) 


Wpar(n)=n+2(n/2log2(n/2))=n+nlog2(n/2)=n+nlog2nn=nlog2n \small \text{Wpar}(n) = n + 2\cdot (n/2 \cdot \log_2(n/2)) = n+ n \log_2(n/2) = n + n \log_2 n - n = n \log_2 n 

Wpar(n)=nlog2n\color{pink}\small \text{Wpar}(n) = n \log_2 n

Theorem

On a shared-memory system a bitonic sequence of length n\small n can be sorted in

log2n\small \lceil \log_2n\rceil parallel steps

O(nlogn)\small O(n \log n) operations

time of O((nlogn)/p+logp)\small O((n \log n) / p + \log p)

space of O(n)\small O(n)(this means it can be done in place)

The sorting is not stable.

When n>p\small n> p

call bitonic_merge() recursively until

n/2k=n/pk=log2p \small n/2^k = n/p \quad \Longleftrightarrow \quad k = \log_2p 

Then merge subsequences of length n/p\small n/p sequentially on the p\small p processors.

Work per execution step is O(n)\small O(n) , therefore total work is O(nlogp)\small O(n \log p) .

Wpar(n)=O(nlogp)\color{pink}\small \text{Wpar}(n) = O(n \log p)

Not work optimal.

Output is bitonic but n+m\small n+m may not be a power of two.

Add padding in the beginning out output (value: \small -\infin ) as a virtual element.

Comparator Networks

Comparator networks as a model for parallel sorting

Comparator min/max operation

Depth longest path from input to output

Size number of comparators/operations

Example: Bather’s bitonic sorting network

Depth O(log(n)2)\small O(\log (n)^2)

Size O(nlog(n)2)\small O(n \cdot \log (n)^2)

Question: What is the minimal depth and size for sortingn\small nnumbers with a sorting network

Question remains unsolved.

Depth O(logn)\small O(\log n) and size O(nlogn)\small O(n \log n) ? → No.

Size O(nlogn)\small O(n \log n) ? → Possible with AKS network but not practical.

O((nlogn)/p+logn)\small O((n \log n) / p + \log n) → Possible with EREW PRAM mergesort but not practical.