Clik

https://en.m.wikipedia.org/wiki/Cilk

https://cilk.mit.edu/

https://cilk.mit.edu/programming/

Basics

Clik

C/C++ extension for task-parallel programming.

Efficient work-stealing runtime for strict or fully-strict DAGs.

Deprecated since 2018

Extensions

clik_spawn spawn tasks, as #pragma omp task

Instructions afterclik_spawnare calledcontinuation.

clik_sync wait for completion of children, as #pragma omp taskwait

clik_for loop tasks, as #pragma omp for

Examples

For both MergeSort and QuickSort there is a bottleneck with O(n)\small O(n) (for the partitioning or merging part) that limits the speed-up.

For both of them

W(n)=O(nlogn)\small W(n) = O(n \log n)

T(n)=n\small T(n) = n

We want to improve that part.


  • MergeSort and merging
    void mergesort(int x[], int n) {
    if (n==1) return;cilk_spawn mergesort(x,n/2); // left half
    cilk_spawn mergesort(x+n/2,n-n/2); // right half
    cilk_sync;
    parmerge(x,n/2,x+n/2,n-n/2,y);
    parcopy(y,n,x); // y --> x
    }

    1. Merge function: data parallel with binary search → see 2nd lecture
      • Complexity, Parallelism

        Complexity of ranking step: O(logchunks)+O(logn+logm) \small O(\log \text{chunks}) + O(\log n + \log m)  .

        Overall complexity does not change with loop parallelization (because it has overhead).

        Parallelism: O(n/logn)\small O(n/\log n)

      // data parallel loop
      for (i=0; i<chunks; i++) {
      rankA[i] = binsearch(A[i*n/chunks],B,m);
      rankB[i] = binsearch(B[i*m/chunks],A,n);
      }... // use ranks to determine which blocks from A and B to merge

    1. Merge function: Divide and conquer

      This is a general function that takes 2 array inputs of arbitrary length (it’s also used in the example above, but there both halves always have the same length).

      • Complexity, Work, Parallelism

        Complexity O(log(n)2)\small O(\log(n)^2)

        Work O(n)O(n)

        Parallelism O(n/log(n)2)\small O(n/ \log(n)^2)

      // called with:     x,   n/2,   x+n/2, n-n/2,       x
      void parmerge(int A[], int n, int B[], int m, int C[]) {
      if (n==0) {
      copy(B,m,C); return;
      }
      if (m==0) {
      copy(A,n,C); return;
      }
      if (n>=m) {
      int s = binsearch(A[n/2],B,m); // rank A[n/2] in B
      C[(n/2)+s] = A[n/2];// recursive call
      cilk_spawn parmerge(A, (n/2), B, s, C);
      cilk_spawn parmerge(A+(n/2)+1, n-(n/2)-1, B+s, m-s, C+(n/2)+s+1);
      } else { ... }
      cilk_sync; // not necessary, in Cilk implicitly done
      }

  • QuickSort and partitioning
    void QuickSort(int x[],int n) {
    if (n<=1) return;
    pivot = choosepivot(x,n);
    ix = partition(x,n,pivot); // partition in O(n)cilk_spawn QuickSort(x,ix);
    cilk_spawn QuickSort(x+ix+1,n-1-ix);
    // no cilk_sync; -> we synchronize in main
    }

    Partition function:

    Using prefix-sums index computation (compaction, see prefix sums lecture)

    int partition(int x[], int n, int pivot) {
    ...// data parallel loop
    for (i=0; i<n; i++) {
    if (x[i] < pivot) {
    index[i] = 1; mark[i] = true
    } else {
    index[i] = 0; mark[i] = false;
    }
    }Exscan(index,n);for (i=0; i<n; i++) {
    if (mark[i]) {
    compacted[index[i]] = x[i];
    }
    }... // elements larger-equal to pivot
    }