Bucket-sort
Sequential algorithm
Explanation:https://www.youtube.com/watch?v=ELrhrrCjDOA
Given: integers in range stored in array .
-
Count number of elements of each “key” (magnitude)
for (i=0; i<R; i++) bucket[i] = 0; // initialize range for (i=0; i<n; i++) bucket[A[i]]++; // count occurences of each value in range
-
Compute start index of each bucket with exclusive prefix-sums
Implementation
for (i=1; i<R; i++) bucket[i] += bucket[i-1]; // add previous value to this one // shift all entries to right for (i=R-1; i>0; i--) bucket[i] = bucket[i-1]; bucket[0] = 0;
Exscan(bucket);
-
Put sorted
into
for (i=0; i<n; i++) { int tmp = bucket[A[i]]; // value in prefixed bucket B[tmp] = A[i]; }
Now contains the sorted list.
- Finally: copy back to if necessary
Analysis
This implementation is stable (relative order of elements is preserved).
Parallel algorithm
Works just as above, but we do the counting and
Exscan
part distributedly.
integers distributed evenly across processes
per process
-
Find occurences of elements and store them in
B
(previously calledbucket
)
-
Call
Allreduce
can be optimized by using
MPI_BCast
MPI_Allreduce( B, // local array with number of occurences AllB, // distributed array with number of occurences (of all processes) R, // size equal to R (range) MPI_INT, MPI_SUM, // addition comm );
Now vector
AllB
contains the sizes of all buckets over all processes (stored in all processes).
-
Call
Exscan(B)
(distributedly)MPI_Exscan( B, // local array with number of occurences RelB, // output R, // size equal to R (range) MPI_INT, MPI_SUM, // addition comm );
RelB
is the relative position of every ranks elements in their buckets.It is different in every process and is calculated distributedly.
-
Call
Exscan(AllB)
(locally)Exscan(AllB)
j'
= number of local elements with keyA[j]
beforej
Local element
A[j]
needs to go to positionAllB[A[j]]+RelB[A[j]]+j'
(forA[j]>0
).
-
Compute number of elements to be sent to each process:
sendelts[i]
with
-
Call
Alltoall
all → all (make data of all processes available to eachother)
This opeartion transposes the data.
MPI_Alltoall( sendelts, 1, MPI_INT, // send recvelts, 1, MPI_INT, // receive comm );
-
Call
Alltoallv
Transpose again
MPI_Alltoallv( A,sendelts,sdispls,MPI_INT, // send C,recvelts, // receive …,comm );
-
Finally: Reorder elements from
C
back toA
If Analysis
If the implementation is stable then it is “Radixsort”