Quicksort
Quicksort bottleneck: Partitioning
Idea:
- All processes get pivot element and partition locally.
- Exclusive scan to compute indices of elements smaller, equal, larger than pivot
- Redistribution to new distributed array
- Split set of processes into those with small and large elements, recurse
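A minimal sketch of this recursion in MPI C, assuming double-valued elements; choose_pivot, partition_local, redistribute and split_color are hypothetical helper names standing in for the steps detailed in the two algorithms below.

```c
#include <mpi.h>
#include <stdlib.h>

/* Hypothetical helpers standing in for the steps described below. */
double choose_pivot(double *block, int n, MPI_Comm comm);                          /* pivot selection + Bcast     */
void   partition_local(double *block, int n, double pivot, int *n_sm, int *n_lg);  /* local partition             */
void   redistribute(double **block, int *n, int n_sm, int n_lg, MPI_Comm comm);    /* move groups to "their" half */
int    split_color(int rank, int size);  /* rank % 2 (Algorithm 1) or rank < middle (Algorithm 2) */
int    cmp_double(const void *x, const void *y);

/* Recursive driver: partition globally, split the communicator, recurse. */
void par_quicksort(double **block, int *n, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (size == 1) {                                   /* alone in the communicator: sort locally */
        qsort(*block, *n, sizeof(double), cmp_double);
        return;
    }

    double pivot = choose_pivot(*block, *n, comm);     /* same pivot on all processes  */
    int n_sm, n_lg;
    partition_local(*block, *n, pivot, &n_sm, &n_lg);  /* split the block at the pivot */
    redistribute(block, n, n_sm, n_lg, comm);          /* exclusive scan + exchange    */

    MPI_Comm subcomm;                                  /* processes with small elements vs. large ones */
    MPI_Comm_split(comm, split_color(rank, size), rank, &subcomm);
    par_quicksort(block, n, subcomm);
    MPI_Comm_free(&subcomm);
}
```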
Initial data distribution: each process gets a block of size n/p.
Intermediate result: each process has sorted its block, but its size is in general not n/p.
Result: each process has sorted its block of size n/p.
Algorithm 1) Using hypercube algorithm
- Choose pivot, distribute to all processes: process 0 chooses the pivot, then it is distributed to all via Bcast (alternatively: each process chooses a pivot candidate and the median is computed after an Allgather).
- Local partition: each process finds the elements smaller than, equal to, and larger than the pivot element.
- Global partition: pairwise (hypercube) exchange (see the code sketch after this list)
  - Even numbered process i sends its "larger" group to process i+1 and receives the "smaller" group from it.
  - Odd numbered process i receives the "larger" group from process i-1 and sends back its own "smaller" group.
- Split communicator, recurse: MPI_Comm_split with color rank % 2 is used:
  - Even numbered processes work on the elements smaller than the pivot.
  - Odd numbered processes work on the elements larger than the pivot.
  After log2 p steps each process is in a communicator by itself.
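The four steps, sketched in MPI C under the assumption of double-valued elements. The helper names (bcast_pivot, partition3, pairwise_exchange, split_even_odd) and the placeholder pivot choice are illustrative, not fixed by the slides; reassembling the new local block from the kept group, the "equal" group and the received elements is left to the caller.

```c
#include <mpi.h>
#include <stdlib.h>

static int cmp_double(const void *x, const void *y)
{
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Step 1: process 0 picks a pivot and broadcasts it; the alternative (every
   process contributes a candidate, Allgather, take the median) is shown below. */
static double bcast_pivot(const double *block, int n, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);
    double pivot = 0.0;
    if (rank == 0)
        pivot = block[n / 2];                 /* placeholder choice on rank 0 */
    MPI_Bcast(&pivot, 1, MPI_DOUBLE, 0, comm);
    return pivot;
}

static double median_of_candidates_pivot(const double *block, int n, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);
    double cand = block[n / 2];
    double *all = malloc(size * sizeof(double));
    MPI_Allgather(&cand, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, comm);
    qsort(all, size, sizeof(double), cmp_double);
    double pivot = all[size / 2];
    free(all);
    return pivot;
}

/* Step 2: three-way local partition into "smaller", "equal", "larger" buffers
   (not in place, for clarity; the caller provides three buffers of length n). */
static void partition3(const double *block, int n, double pivot,
                       double *sm, int *n_sm, double *eq, int *n_eq,
                       double *lg, int *n_lg)
{
    *n_sm = *n_eq = *n_lg = 0;
    for (int k = 0; k < n; k++) {
        if      (block[k] < pivot) sm[(*n_sm)++] = block[k];
        else if (block[k] > pivot) lg[(*n_lg)++] = block[k];
        else                       eq[(*n_eq)++] = block[k];
    }
}

/* Step 3: pairwise exchange between ranks 2k and 2k+1. The even rank ships its
   "larger" group and receives the partner's "smaller" group; counts are
   exchanged first so the receive buffer can be sized. */
static double *pairwise_exchange(int rank, MPI_Comm comm,
                                 double *sm, int n_sm, double *lg, int n_lg,
                                 int *recv_n)
{
    int even    = (rank % 2 == 0);
    int partner = even ? rank + 1 : rank - 1;
    double *send_buf = even ? lg : sm;        /* even sends "larger", odd sends "smaller" */
    int     send_n   = even ? n_lg : n_sm;

    MPI_Sendrecv(&send_n, 1, MPI_INT, partner, 0,
                 recv_n,  1, MPI_INT, partner, 0, comm, MPI_STATUS_IGNORE);
    double *recv_buf = malloc((*recv_n > 0 ? *recv_n : 1) * sizeof(double));
    MPI_Sendrecv(send_buf, send_n,  MPI_DOUBLE, partner, 1,
                 recv_buf, *recv_n, MPI_DOUBLE, partner, 1, comm, MPI_STATUS_IGNORE);
    return recv_buf;                          /* new local block = kept group + recv_buf */
}

/* Step 4: split by color rank % 2 and recurse on the half we belong to;
   using the old rank as key keeps the relative process order. */
static MPI_Comm split_even_odd(int rank, MPI_Comm comm)
{
    MPI_Comm subcomm;
    MPI_Comm_split(comm, rank % 2, rank, &subcomm);
    return subcomm;
}
```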
Drawbacks:
- Only useful if the number of processes is a power of two.
- Load balance might be arbitrarily bad: with an unlucky pivot, one process could end up doing all the work.
Analysis: assuming the exact median pivot is found in every step.
Speedup: Θ(p) for sufficiently large n (a rough cost sketch follows).
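A rough cost sketch under this assumption (exact median pivots, so the recursion depth is log2 p and blocks stay balanced at n/p); the communication terms are simplified and constants are ignored.

```latex
% log2(p) partition rounds (pivot broadcast + local partition + exchange),
% followed by one local sort of the final block of size n/p:
T_{\mathrm{par}}(n,p) \;\approx\;
   \log_2 p \cdot \Bigl( \mathcal{O}\bigl(\tfrac{n}{p}\bigr) + \mathcal{O}(\log p) \Bigr)
   \;+\; \mathcal{O}\Bigl(\tfrac{n}{p}\,\log\tfrac{n}{p}\Bigr)
   \;=\; \mathcal{O}\Bigl(\tfrac{n}{p}\,\log n + \log^2 p\Bigr)

% compared with T_{\mathrm{seq}}(n) = \mathcal{O}(n \log n):
S(p) \;=\; \frac{T_{\mathrm{seq}}(n)}{T_{\mathrm{par}}(n,p)} \;=\; \Theta(p)
   \quad \text{for } n \in \Omega(p \log p).
```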
Algorithm 2) Using compaction
- Choose pivot, distribute to all processes
- Local partition
- Global partition: compaction (instead of the pairwise hypercube exchange); see the code sketch after this list.
  - Process i computes the total size of the "smaller" groups on all preceding processes with MPI_Exscan; the result is the global start offset of its own "smaller" group.
  - Elements in the "smaller" group get sent to the process whose block contains that offset, and possibly also to the following process (if the group straddles a block boundary). But before that, the sending processes have to inform their receivers via Alltoall how many elements are getting sent.
  - Redistribution through Alltoallv or send and receive.
  - Repeat for the "larger" group.
- Split communicator, recurse: MPI_Comm_split with color rank < middle ? 0 : 1 is used.
  After about log2 p steps each process is in a communicator by itself: sort locally.
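A sketch of the compaction-based redistribution for the "smaller" group and of the subsequent split, again in MPI C with double elements. B (assumed to be the block size n/p), middle, and the helper names are illustrative assumptions; the "larger" group is handled the same way, shifted by the global number of smaller elements.

```c
#include <mpi.h>
#include <stdlib.h>

/* Pack the "smaller" groups, block by block of size B, at the front of the
   conceptual global array: MPI_Exscan gives each process the global start
   offset of its group, MPI_Alltoall announces the per-receiver counts, and
   MPI_Alltoallv moves the elements. Returns the received elements. */
static double *compact_smaller(double *sm, int n_sm, int B, MPI_Comm comm, int *recv_total)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    long my_n = n_sm, offset = 0;
    MPI_Exscan(&my_n, &offset, 1, MPI_LONG, MPI_SUM, comm);
    if (rank == 0) offset = 0;               /* MPI_Exscan leaves rank 0's result undefined */

    /* My group occupies global positions [offset, offset + n_sm) and therefore
       goes to process offset / B, and possibly also to the following one(s). */
    int *send_counts = calloc(size, sizeof(int));
    int *send_displs = calloc(size, sizeof(int));
    long pos = offset;
    int  done = 0;
    while (done < n_sm) {
        int  dest = (int)(pos / B);
        long room = (long)(dest + 1) * B - pos;              /* space left in dest's block */
        int  cnt  = (n_sm - done < room) ? (n_sm - done) : (int)room;
        send_counts[dest] = cnt;
        send_displs[dest] = done;
        done += cnt;
        pos  += cnt;
    }

    /* Receivers first learn how much each sender contributes ... */
    int *recv_counts = calloc(size, sizeof(int));
    int *recv_displs = calloc(size, sizeof(int));
    MPI_Alltoall(send_counts, 1, MPI_INT, recv_counts, 1, MPI_INT, comm);
    *recv_total = 0;
    for (int r = 0; r < size; r++) { recv_displs[r] = *recv_total; *recv_total += recv_counts[r]; }

    /* ... then the elements are redistributed in one collective call. */
    double *recv_buf = malloc((*recv_total > 0 ? *recv_total : 1) * sizeof(double));
    MPI_Alltoallv(sm,       send_counts, send_displs, MPI_DOUBLE,
                  recv_buf, recv_counts, recv_displs, MPI_DOUBLE, comm);

    free(send_counts); free(send_displs); free(recv_counts); free(recv_displs);
    return recv_buf;
}

/* Split step: ranks below `middle` continue with the "smaller" part, the rest
   with the "larger" part; how `middle` is chosen (e.g. proportional to the
   global number of smaller elements) is not fixed here. */
static MPI_Comm split_at_middle(int rank, int middle, MPI_Comm comm)
{
    MPI_Comm subcomm;
    MPI_Comm_split(comm, rank < middle ? 0 : 1, rank, &subcomm);
    return subcomm;   /* recurse until MPI_Comm_size(subcomm) == 1, then sort locally */
}
```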
Analysis: same as above, but with better load balance.