Clik

Basics

Clik

C/C++ extension for task-parallel programming.

Efficient work-stealing runtime for strict or fully-strict DAGs.

Deprecated since 2018

Extensions

clik_spawn spawn tasks, as #pragma omp task

Instructions afterclik_spawnare calledcontinuation.

clik_sync wait for completion of children, as #pragma omp taskwait

clik_for loop tasks, as #pragma omp for

Examples

For both MergeSort and QuickSort there is a bottleneck with $\small O(n)$ (for the partitioning or merging part) that limits the speed-up.

For both of them

$\small W(n) = O(n \log n)$

$\small T(n) = n$

We want to improve that part.

MergeSort and merging
```
void mergesort(int x[], int n) {
if (n==1) return;cilk_spawn mergesort(x,n/2); // left half
cilk_spawn mergesort(x+n/2,n-n/2); // right half
cilk_sync;
parmerge(x,n/2,x+n/2,n-n/2,y);
parcopy(y,n,x); // y --> x
}
```
1. Merge function: data parallel with binary search → see 2nd lecture
  - Complexity, Parallelism
    Complexity of ranking step: $\small O(\log \text{chunks}) + O(\log n + \log m)$ .
    Overall complexity does not change with loop parallelization (because it has overhead).
    Parallelism: $\small O(n/\log n)$
```
// data parallel loop
for (i=0; i<chunks; i++) {
rankA[i] = binsearch(A[i*n/chunks],B,m);
rankB[i] = binsearch(B[i*m/chunks],A,n);
}... // use ranks to determine which blocks from A and B to merge
```
1. Merge function: Divide and conquer
  This is a general function that takes 2 array inputs of arbitrary length (it’s also used in the example above, but there both halves always have the same length).
  - Complexity, Work, Parallelism
    Complexity $\small O(\log(n)^2)$
    Work $O(n)$
    Parallelism $\small O(n/ \log(n)^2)$
```
// called with:     x,   n/2,   x+n/2, n-n/2,       x
void parmerge(int A[], int n, int B[], int m, int C[]) {
if (n==0) {
copy(B,m,C); return;
}
if (m==0) {
copy(A,n,C); return;
}
if (n>=m) {
int s = binsearch(A[n/2],B,m); // rank A[n/2] in B
C[(n/2)+s] = A[n/2];// recursive call
cilk_spawn parmerge(A, (n/2), B, s, C);
cilk_spawn parmerge(A+(n/2)+1, n-(n/2)-1, B+s, m-s, C+(n/2)+s+1);
} else { ... }
cilk_sync; // not necessary, in Cilk implicitly done
}
```

QuickSort and partitioning

void QuickSort(int x[],int n) {
if (n<=1) return;
pivot = choosepivot(x,n);
ix = partition(x,n,pivot); // partition in O(n)cilk_spawn QuickSort(x,ix);
cilk_spawn QuickSort(x+ix+1,n-1-ix);
// no cilk_sync; -> we synchronize in main
}

Partition function:

Using prefix-sums index computation (compaction, see prefix sums lecture)

int partition(int x[], int n, int pivot) {
...// data parallel loop
for (i=0; i<n; i++) {
if (x[i] < pivot) {
index[i] = 1; mark[i] = true
} else {
index[i] = 0; mark[i] = false;
}
}Exscan(index,n);for (i=0; i<n; i++) {
if (mark[i]) {
compacted[index[i]] = x[i];
}
}... // elements larger-equal to pivot
}