Bitonic merge
Bitonic merge / Even-odd merge (optional)
Drawback of merge with rank-based algorithms:
- reading elements at the same time (not allowed in EREW PRAMs).
- Inefficient when only elements must be merged on processors.
Bitonic sequence (Definition)
Sequence is bitonic if
either there is an index such that
or there is a cyclic shift such that the first case holds (first case holds if we just shift all elements to left or right)
Lemma
Let be a bitonic sequence with even length .
Ifis odd we can put the realin eitheror shifted→ the algorithm is then not oblivious anymore.
Let be two sequences of length :
This makes
- bitonic
- for all elements they contain
Proof of lemma
-
Any subsequence of a bitonic sequence is also bitonic.
are also (disjoint) subsequences of .
Concatinating them makes .
-
Contradiction: Assume that there is some
then
This contradicts that either is or is .
-
Any subsequence of a bitonic sequence is also bitonic.
Examples
visualization
a = a_0, a_1, a_2, a_3, a_4, a_5 ↓ ↓ ↓ ↓ ↓ ↓ 0 1 2 n/2 n/2+1 n/2+2 indices |____________________| |____________________| comparisions |____________________|
Example 1
Example 2 (cyclically shifted)
Ordering bitonic sequences
= oblivious merge
bitonic_merge(int a[], int n) {
if (n==1) {
return;
}// n is odd
int nn = n/2; // offset
int s = n/2; // size of (first half) a'if (n%2 == 1) {
nn++;
if (a[n/2] < a[n-1]) { // a[n/2] will be in a' and a[n-1] in a''
s++;
}
}for (i=0; i<n/2; i++) { // parallelizable
int mina, maxa;
mina = min(a[i], a[i+nn]);
maxa = max(a[i], a[i+nn]);
a[i] = mina; // (first half) a'
a[i+nn] = maxa; // (second half) a''
}
// recursive call of each half
bitonic_merge(a,s);
bitonic_merge(a+s,n-s);
}
Given sequence split into with the definition given above.
Then recursively order .
Hint: Our pseudocode is written in C.
Therefore in the last instruction
bitonic_merge(a+s, s-n);
we are passing the start index of the array
+s
and the length
s-n
.
Work
Theorem
On a shared-memory system a bitonic sequence of length can be sorted in
parallel steps
operations
time of
space of (this means it can be done in place)
The sorting is not stable.
When
call
bitonic_merge()
recursively until
Then merge subsequences of length sequentially on the processors.
Work per execution step is , therefore total work is .
Not work optimal.
Output is bitonic but may not be a power of two.
Add padding in the beginning out output (value: ) as a virtual element.
Comparator Networks
Comparator networks as a model for parallel sorting
Comparator min/max operation
Depth longest path from input to output
Size number of comparators/operations
Example: Bather’s bitonic sorting network
Depth
Size
Question: What is the minimal depth and size for sortingnumbers with a sorting network
Question remains unsolved.
Depth and size ? → No.
Size ? → Possible with AKS network but not practical.
→ Possible with EREW PRAM mergesort but not practical.