Clik
https://en.m.wikipedia.org/wiki/Cilk
https://cilk.mit.edu/programming/
Basics
Clik
C/C++ extension for task-parallel programming.
Efficient work-stealing runtime for strict or fully-strict DAGs.
Deprecated since 2018
Extensions
clik_spawn
spawn tasks, as
#pragma omp task
Instructions afterclik_spawn
are calledcontinuation.
clik_sync
wait for completion of children, as
#pragma omp taskwait
clik_for
loop tasks, as
#pragma omp for
Examples
For both MergeSort and QuickSort there is a bottleneck with (for the partitioning or merging part) that limits the speed-up.
For both of them
We want to improve that part.
MergeSort and merging
void mergesort(int x[], int n) { if (n==1) return;cilk_spawn mergesort(x,n/2); // left half cilk_spawn mergesort(x+n/2,n-n/2); // right half cilk_sync; parmerge(x,n/2,x+n/2,n-n/2,y); parcopy(y,n,x); // y --> x }
- Merge function: data parallel with binary search →
see 2nd lecture
Complexity, Parallelism
Complexity of ranking step: .
Overall complexity does not change with loop parallelization (because it has overhead).
Parallelism:
// data parallel loop for (i=0; i<chunks; i++) { rankA[i] = binsearch(A[i*n/chunks],B,m); rankB[i] = binsearch(B[i*m/chunks],A,n); }... // use ranks to determine which blocks from A and B to merge
- Merge function: Divide and conquer
This is a general function that takes 2 array inputs of arbitrary length (it’s also used in the example above, but there both halves always have the same length).
Complexity, Work, Parallelism
Complexity
Work
Parallelism
// called with: x, n/2, x+n/2, n-n/2, x void parmerge(int A[], int n, int B[], int m, int C[]) { if (n==0) { copy(B,m,C); return; } if (m==0) { copy(A,n,C); return; } if (n>=m) { int s = binsearch(A[n/2],B,m); // rank A[n/2] in B C[(n/2)+s] = A[n/2];// recursive call cilk_spawn parmerge(A, (n/2), B, s, C); cilk_spawn parmerge(A+(n/2)+1, n-(n/2)-1, B+s, m-s, C+(n/2)+s+1); } else { ... } cilk_sync; // not necessary, in Cilk implicitly done }
- Merge function: data parallel with binary search →
see 2nd lecture
QuickSort and partitioning
void QuickSort(int x[],int n) { if (n<=1) return; pivot = choosepivot(x,n); ix = partition(x,n,pivot); // partition in O(n)cilk_spawn QuickSort(x,ix); cilk_spawn QuickSort(x+ix+1,n-1-ix); // no cilk_sync; -> we synchronize in main }
Partition function:
Using prefix-sums index computation (compaction, see prefix sums lecture)
int partition(int x[], int n, int pivot) { ...// data parallel loop for (i=0; i<n; i++) { if (x[i] < pivot) { index[i] = 1; mark[i] = true } else { index[i] = 0; mark[i] = false; } }Exscan(index,n);for (i=0; i<n; i++) { if (mark[i]) { compacted[index[i]] = x[i]; } }... // elements larger-equal to pivot }