📎

Simple algorithm examples

Algorithm design using PRAM:

  • PRAM conflict variant must be determined
  • We assume there are always as many processors as convenient

    Can be based on input size p=f(n)\small p = f(n) or fixed.

Basics

Initializing array

Sequential:

O(n)\small O(n) operations

Parallel:

If a processor is available for each i then O(1)\small O(1) .

par (0 <= i < n) {
a[i] = i*i;
}

For loop

Sequential:

Total number of operations O(nlog2n)\small O(n \log_2 n)

The loop iterates O(log2n)\small O(\log_2n) times, each time executing O(n)\small O(n) instructions.

Parallel:

At most O(log2n)\small \lceil O(\log_2 n) \rceil parallel steps.

Each instruction in loop is O(1)\small O(1) .

for (k = 1; k < n; k <<= 1) { // k = k*2
par (0 <= i <= n) a[i] = i/k; // k is a constant in this context
}

Finding max in array

Sequential: Performn2\small n^2comparisions (a[i]\small a[i]vs.a[j]\small a[j]) and find maximum.

Finding maximum in array - V1

par (0<=i<n) b[i]=true; // stays true if maxpar (0<=i<n, 0<=j<n)
if (a[i] < a[j]) b[i]=false; // check if smaller entry exists
par (0<=i<n) if (b[i]) x=a[i];

Complexity

O(1)\small O(1) for each parallel instruction - each with n\small n , n2\small n^2 and n\small n processors.

Therefore p=n2\small p = n^2 .


On a CRCW PRAM with n\small n numbers, the max can be found in O(1)\small O(1) time steps and O(n2)\small O(n^2) operations in total.

This is a constant time algorithm with polynomial resources (operations, processors).

Finding maximum in array - V2

concurrent read, exclusive write

num = n;while (num > 1) {
half = ⌈num / 2⌉par (0 <= i < half) {
if (i + half < num) a[i] = max(a[i], a[i + half]); // boundary check
}num = half;
}

Complexity

The while loop performs O(log2n)\small O(\log_2 n) iterations because in each iteration we half num\small \texttt{num} until num1\small \texttt{num} \leq 1

num:=half:=num/2 \small \texttt{num} :=\texttt{half} := \lceil \texttt{num}/ 2\rceil  until num1\small \texttt{num} \leq 1

In the parallel part of the loop we perform half\small \texttt{half} operations at once O(1)\small O(1) with half\small \texttt{half} processors.

Therefore in total we have O(log2n)\small O(\log_2n)

Number of (sequential) operations

The while loop performs O(log2n)\small O(\log_2 n) iterations, each with a parallel part.

In the parallel part of the loop we perform half\small \texttt{half} operations:

half=n/2,n/4,n/8,\small \texttt{half} = n/2,~n/4,~ n/8,~…

That means in total we do this:

i=2log2(n)n/iO(n)\small \sum_{i=2}^{\log_2 (n)}~n/i \in O(n)


On a CREW (but also EREW if modified) PRAM with n\small n numbers, the max can be found in O(logn)\small O(\log n) time steps and O(n)\small O(n) operations.

Finding maximum in array - V3

This time we are wasting processors.

num = n;while (num > 1) {
half = ⌈num / 2⌉par (0 <= i < n) { // n instead of half
if (i + half < num) a[i] = max(a[i], a[i + half]); // boundary check
}num = half;
}

On a CREW PRAM with n\small n numbers, the max can be found in O(logn)\small O(\log n) time steps, and O(nlogn)\small O(n \log n)operations in total using n\small nprocessors .

Finding maximum in array - V4

This time we are also using redundant processors for the parallel part.

num = n;while (num > 1) {
half = ⌈num / 2⌉par (0 <= i < num) {
for (j = i; j < i+(num/p); j++) {
if (j + half < num) a[i] = max(a[j], a[j + half]); //same as older version
}
}num = half;
}

On a CREW PRAM with n\small n numbers, the max can be found in O((n/p)logn)\small O((n/p)\log n) time steps and O(n)\small O(n) operations in total.

Matrix-Matrix multiplication

Matrix-Matrix multiplication

Calculate with nested parallelism

Input:

(n×l)\small (n \times l) matrix A\small A

(l×m)\small (l \times m) matrix B\small B

Output:

(n×m)\small (n \times m) matrix C\small C

C=AB\small C = A \cdot B

C[i,j]=A[i,k]B[k,j]\small C[i, j]=\sum A[i, k] \cdot B[k, j]

par (0<=i<n) {
par (0<=j<m) {
C[i,j] = 0;
for (k=0; k<l; k++) {
C[i,j] += A[i,k]*B[k,j];
}
}
}

On a CREW PRAM, the multiplication can be done in O(l)\small O(l) time steps and O(nml)\small O(nml) operations in total, using nm\small n\cdot m processors.