📎

Bucket-sort

Sequential algorithm

Explanation:https://www.youtube.com/watch?v=ELrhrrCjDOA

Given: n\small n integers in range [0,R1]\small [0, R-1] stored in array A\small A .

  1. Count number of elements of each “key” (magnitude)
    for (i=0; i<R; i++) bucket[i] = 0; // initialize range
    for (i=0; i<n; i++) bucket[A[i]]++; // count occurences of each value in range

  1. Compute start index of each bucket with exclusive prefix-sums
    • Implementation
      for (i=1; i<R; i++) bucket[i] += bucket[i-1]; // add previous value to this one
      // shift all entries to right
      for (i=R-1; i>0; i--) bucket[i] = bucket[i-1];
      bucket[0] = 0;
    Exscan(bucket);

  1. Put sorted A\small A into B\small B
    for (i=0; i<n; i++) {
    int tmp = bucket[A[i]]; // value in prefixed bucket
    B[tmp] = A[i];
    }

    Now B\small B contains the sorted list.

  1. Finally: copy B\small B back to A\small A if necessary

Analysis

This implementation is stable (relative order of elements is preserved).

Parallel algorithm

Works just as above, but we do the counting and Exscan part distributedly.

n\small n integers distributed evenly across p\small p processes

m=n/p\small m = n/p per process

  1. Find occurences of elements and store them in B(previously calledbucket)

  1. Call Allreduce

    can be optimized by usingMPI_BCast

    MPI_Allreduce(
    B, // local array with number of occurences
    AllB, // distributed array with number of occurences (of all processes)
    R, // size equal to R (range)
    MPI_INT, MPI_SUM, // addition
    comm
    );

    Now vector AllB contains the sizes of all buckets over all processes (stored in all processes).

  1. Call Exscan(B)(distributedly)
    MPI_Exscan(
    B, // local array with number of occurences
    RelB, // output
    R, // size equal to R (range)
    MPI_INT, MPI_SUM, // addition
    comm
    );

    RelB is the relative position of every ranks elements in their buckets.

    It is different in every process and is calculated distributedly.

  1. Call Exscan(AllB)(locally)
    Exscan(AllB)

    j' = number of local elements with key A[j] before j

    Local element A[j] needs to go to position AllB[A[j]]+RelB[A[j]]+j' (for A[j]>0 ).

    💡
    Now we have the required data to sort the total array and only have to redistribute the data.

  1. Compute number of elements to be sent to each process: sendelts[i] with i[0;p1]\small i\in [0;~p-1]

  1. Call Alltoall

    all → all (make data of all processes available to eachother)

    This opeartion transposes the data.

    MPI_Alltoall(
    sendelts,	1, MPI_INT, // send
    recvelts,	1, MPI_INT, // receive
    comm
    );

  1. Call Alltoallv

    Transpose again

    MPI_Alltoallv(
    A,sendelts,sdispls,MPI_INT, // send
    C,recvelts, // receive
    …,comm
    );

  1. Finally: Reorder elements from C back to A

If Analysis

If the implementation is stable then it is “Radixsort”