Simple algorithm examples

Algorithm design using PRAM:

The PRAM conflict variant (EREW, CREW or CRCW) must be determined first.

We assume that as many processors as convenient are always available.
The processor count can depend on the input size, $\small p = f(n)$, or be fixed.

Basics

Initializing array

Sequential:

$\small O(n)$ operations

Parallel:

If one processor is available for each $\small i$, the initialization takes $\small O(1)$ time.

par (0 <= i < n) {
a[i] = i*i;
}

For loop

Sequential:

Total number of operations $\small O(n \log_2 n)$

The loop iterates $\small O(\log_2n)$ times, each time executing $\small O(n)$ instructions.

Parallel:

At most $\small \lceil \log_2 n \rceil$ parallel steps, i.e. $\small O(\log_2 n)$.

Each parallel instruction in the loop takes $\small O(1)$ time with $\small n$ processors.

for (k = 1; k < n; k <<= 1) { // k = k*2
  par (0 <= i < n) a[i] = i/k; // k is a constant in this context
}
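The step count of the loop above can be checked with a small sequential simulation. This is only a sketch: the function name `doubling_loop` is made up for illustration, and each `par` instruction is modelled as one list comprehension that counts as a single parallel step and as `n` operations.

```python
def doubling_loop(n):
    """Simulate the doubling loop; return (a, parallel_steps, total_operations)."""
    a = [0] * n
    steps = 0  # number of parallel steps (one per loop iteration)
    ops = 0    # total number of assignments over all processors
    k = 1
    while k < n:
        # One parallel step: on a PRAM all n assignments happen at once.
        a = [i // k for i in range(n)]
        steps += 1
        ops += n
        k <<= 1  # k = k*2
    return a, steps, ops


a, steps, ops = doubling_loop(8)
print(steps, ops)  # 3 parallel steps, 24 operations in total
```

For `n = 8` the loop runs for `k = 1, 2, 4`, which matches the $\small \lceil \log_2 n \rceil$ parallel steps and $\small O(n \log_2 n)$ total operations stated above.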

Finding max in array

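Sequential (brute force): perform all $\small n^2$ comparisons ( $\small a[i]$ vs. $\small a[j]$ ); an entry is the maximum if no larger entry exists. A plain sequential scan would need only $\small n-1$ comparisons.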

Finding maximum in array - V1

par (0<=i<n) b[i]=true; // stays true if max
par (0<=i<n, 0<=j<n)
  if (a[i] < a[j]) b[i]=false; // check if a larger entry exists
par (0<=i<n) if (b[i]) x=a[i];

Complexity

Each of the three parallel instructions takes $\small O(1)$ time, using $\small n$ , $\small n^2$ and $\small n$ processors respectively.

Therefore $\small p = n^2$ processors are required.

On a CRCW PRAM with $\small n$ numbers, the max can be found in $\small O(1)$ time steps and $\small O(n^2)$ operations in total.

This is a constant time algorithm with polynomial resources (operations, processors).
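The three parallel instructions of V1 can be simulated sequentially to check the operation count. The sketch below is not a parallel implementation: the name `crcw_max` is hypothetical, the nested loops stand in for the $\small n^2$ processors, and a non-empty input is assumed.

```python
def crcw_max(a):
    """Sequentially simulate the CRCW max algorithm; return (max, comparisons)."""
    n = len(a)
    # Instruction 1 (n processors): assume every entry is the maximum.
    b = [True] * n
    comparisons = 0
    # Instruction 2 (n^2 processors, one per pair (i, j)).
    for i in range(n):
        for j in range(n):
            comparisons += 1
            if a[i] < a[j]:
                # Concurrent writes to b[i] all store the same value, False.
                b[i] = False
    # Instruction 3 (n processors): every surviving entry writes the same value.
    x = None
    for i in range(n):
        if b[i]:
            x = a[i]
    return x, comparisons


print(crcw_max([3, 7, 2, 7]))  # (7, 16)
```

With duplicates of the maximum, several processors write `x` in the last instruction, but they all write the same value, which is why a CRCW model is sufficient here.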

Finding maximum in array - V2

concurrent read, exclusive write

num = n;
while (num > 1) {
  half = ⌈num / 2⌉;
  par (0 <= i < half) {
    if (i + half < num) a[i] = max(a[i], a[i + half]); // boundary check
  }
  num = half;
}

Complexity

The while loop performs $\small O(\log_2 n)$ iterations because each iteration halves $\small \texttt{num}$ :

$\small \texttt{num} :=\texttt{half} := \lceil \texttt{num}/ 2\rceil$ until $\small \texttt{num} \leq 1$

In the parallel part of the loop, $\small \texttt{half}$ processors perform $\small \texttt{half}$ operations at once, in $\small O(1)$ time.

Therefore the total running time is $\small O(\log_2n)$ time steps.

Number of (sequential) operations

The while loop performs $\small O(\log_2 n)$ iterations, each with a parallel part.

In the parallel part of the loop we perform $\small \texttt{half}$ operations:

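$\small \texttt{half} = n/2,~n/4,~ n/8,~\dots$ (up to rounding)

Summed over all iterations, this gives a geometric series:

$\small \sum_{i=1}^{\lceil \log_2 n \rceil}~n/2^i < n$ , so the total is in $\small O(n)$ (rounding up adds at most one operation per iteration).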

On a CREW (but also EREW if modified) PRAM with $\small n$ numbers, the max can be found in $\small O(\log n)$ time steps and $\small O(n)$ operations.
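The halving scheme of V2 can be traced with a sequential simulation that counts rounds and `max` operations. This is a sketch under assumptions: the name `tree_max` is made up, the input is non-empty, and the inner `for` loop stands in for the `par` instruction. Simulating it in sequence is safe because processor `i` only writes `a[i]` with `i < half` and only reads `a[i]` and `a[i + half]`.

```python
def tree_max(values):
    """Simulate the halving reduction; return (max, rounds, max_operations)."""
    a = list(values)  # work on a copy, the algorithm overwrites a
    num = len(a)
    rounds = 0
    ops = 0
    while num > 1:
        half = (num + 1) // 2  # ceil(num / 2)
        for i in range(half):  # stands in for: par (0 <= i < half)
            if i + half < num:  # boundary check
                a[i] = max(a[i], a[i + half])
                ops += 1
        num = half
        rounds += 1
    return a[0], rounds, ops


print(tree_max([4, 9, 1, 7, 3, 8, 2, 6]))  # (9, 3, 7)
```

For 8 numbers this takes 3 rounds and 4 + 2 + 1 = 7 `max` operations, in line with the $\small O(\log n)$ time and $\small O(n)$ operation bounds.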

Finding maximum in array - V3

This time we are wasting processors.

num = n;
while (num > 1) {
  half = ⌈num / 2⌉;
  par (0 <= i < n) { // n instead of half
    if (i + half < num) a[i] = max(a[i], a[i + half]); // boundary check
  }
  num = half;
}

On a CREW PRAM with $\small n$ numbers, the max can be found in $\small O(\log n)$ time steps and $\small O(n \log n)$ operations in total, using $\small n$ processors.
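The difference between activating `half` processors and activating all `n` processors per round can be made concrete by summing the processor activations over all rounds. The helper name `work_counts` is hypothetical, and "work" here means the number of activated processors, including those that fail the boundary check.

```python
def work_counts(n):
    """Return (rounds, work_half, work_n) for an input of size n.

    work_half: processors activated when each round uses `half` processors.
    work_n:    processors activated when each round uses all n processors.
    """
    num = n
    rounds = 0
    work_half = 0
    work_n = 0
    while num > 1:
        half = (num + 1) // 2  # ceil(num / 2)
        work_half += half
        work_n += n
        num = half
        rounds += 1
    return rounds, work_half, work_n


print(work_counts(1024))  # (10, 1023, 10240)
```

For `n = 1024` both variants need 10 rounds, but the variant with `n` processors per round accumulates `n * log2(n) = 10240` activations instead of 1023.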

Finding maximum in array - V4

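This time only a fixed number of processors $\small p$ is available, so each processor handles a block of up to $\small \lceil \texttt{half}/p \rceil$ elements sequentially.

num = n;
while (num > 1) {
  half = ⌈num / 2⌉;
  chunk = ⌈half / p⌉; // elements per processor
  par (0 <= i < p) {
    for (j = i*chunk; j < min((i+1)*chunk, half); j++) {
      if (j + half < num) a[j] = max(a[j], a[j + half]); // same as older version
    }
  }
  num = half;
}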

On a CREW PRAM with $\small n$ numbers and $\small p \leq n$ processors, the max can be found in $\small O((n/p)\log n)$ time steps and $\small O(n)$ operations in total.
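One possible reading of V4 is that `p` processors each work through a contiguous block of indices in every round. The sketch below simulates that reading. The name `blocked_max`, the block assignment and the variable `chunk` are assumptions made for illustration, and a non-empty input with `p >= 1` is assumed.

```python
def blocked_max(values, p):
    """Simulate the reduction with p processors; return (max, time_steps)."""
    a = list(values)
    num = len(a)
    time_steps = 0
    while num > 1:
        half = (num + 1) // 2   # ceil(num / 2)
        chunk = -(-half // p)   # ceil(half / p): indices per processor
        for proc in range(p):   # stands in for: par (0 <= proc < p)
            start = proc * chunk
            end = min(start + chunk, half)
            for j in range(start, end):  # sequential work of one processor
                if j + half < num:       # boundary check
                    a[j] = max(a[j], a[j + half])
        time_steps += chunk     # a round lasts as long as its longest block
        num = half
    return a[0], time_steps


print(blocked_max(list(range(16)), 2))   # (15, 8)
print(blocked_max(list(range(16)), 16))  # (15, 4)
```

With `p = 2` the rounds cost 4 + 2 + 1 + 1 = 8 time steps, while `p = 16` gives one step per round. Each round costs at most $\small \lceil n/p \rceil$ steps, so the $\small O((n/p)\log n)$ bound holds for this reading.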

Matrix-Matrix multiplication

Calculate with nested parallelism

Input:

$\small (n \times l)$ matrix $\small A$

$\small (l \times m)$ matrix $\small B$

Output:

$\small (n \times m)$ matrix $\small C$

$\small C = A \cdot B$

$\small C[i, j]=\sum_{k=0}^{l-1} A[i, k] \cdot B[k, j]$

par (0<=i<n) {
  par (0<=j<m) {
    C[i,j] = 0;
    for (k=0; k<l; k++) {
      C[i,j] += A[i,k]*B[k,j];
    }
  }
}

On a CREW PRAM, the multiplication can be done in $\small O(l)$ time steps and $\small O(nml)$ operations in total, using $\small n\cdot m$ processors.
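The nested-parallel multiplication can be simulated sequentially to check the result and the operation count. This is only a sketch: `pram_matmul` is a made-up name, matrices are plain lists of lists, the two outer loops stand in for the $\small n \cdot m$ processors, and the time step count is taken to be the length `l` of the inner sequential loop.

```python
def pram_matmul(A, B):
    """Simulate the nested-parallel product; return (C, time_steps, operations)."""
    n, l, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    ops = 0
    for i in range(n):          # stands in for: par (0 <= i < n)
        for j in range(m):      # stands in for: par (0 <= j < m)
            for k in range(l):  # sequential loop inside each processor
                C[i][j] += A[i][k] * B[k][j]
                ops += 1
    time_steps = l  # all n*m processors run their inner loop simultaneously
    return C, time_steps, ops


C, steps, ops = pram_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, steps, ops)  # [[19, 22], [43, 50]] 2 8
```

Concurrent reads are needed because, for example, `A[i][k]` is read by all `m` processors that compute row `i` of `C`, while every `C[i][j]` is written by exactly one processor.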