
Matrix-Vector multiplication

Theory

$\vec w = M \cdot \vec v$

(w1w2)undefinedw=(abcd)undefinedM(ef)undefinedv \underbrace{ \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} }_{\vec w} = \underbrace{\small \begin{pmatrix} a & b \\ c & d \end{pmatrix} }_{M} \cdot \underbrace{ \begin{pmatrix} e \\ f \end{pmatrix} }_{\vec v} 

There are two ways to calculate this: https://www.youtube.com/watch?v=ebdfJwHM5vo

Method 1)

$w[i] = \sum_{0 \leq j < n} M[i][j] \cdot v[j]$

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot \begin{pmatrix} e \\ f \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} a & b \end{pmatrix} \cdot \begin{pmatrix} e \\ f \end{pmatrix} \\ \begin{pmatrix} c & d \end{pmatrix} \cdot \begin{pmatrix} e \\ f \end{pmatrix} \end{pmatrix}$

Method 2)

$w[j] = \sum_{c_0 \leq i < c_1} M[j][i] \cdot v[i] \;+\; \sum_{c_1 \leq i < c_2} M[j][i] \cdot v[i] \;+\; \dots \;+\; \sum_{c_{p-1} \leq i < c_p} M[j][i] \cdot v[i]$

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot \begin{pmatrix} e \\ f \end{pmatrix} = e \cdot \begin{pmatrix} a \\ c \end{pmatrix} + f \cdot \begin{pmatrix} b \\ d \end{pmatrix}$

  • The same idea also applies to matrices, slightly adapted: each column of the left factor is paired with the corresponding row of the right factor (a sum of outer products).

    $\begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix} \cdot \begin{pmatrix} g & h \\ i & j \\ k & l \end{pmatrix} = \begin{pmatrix} a \\ d \end{pmatrix} \cdot \begin{pmatrix} g & h \end{pmatrix} + \begin{pmatrix} b \\ e \end{pmatrix} \cdot \begin{pmatrix} i & j \end{pmatrix} + \begin{pmatrix} c \\ f \end{pmatrix} \cdot \begin{pmatrix} k & l \end{pmatrix}$

The sequential algorithm is in $O(n^2)$ with either method.

We now want to use collective operations to parallelize this.

Solution 1: Allgather

Based on method 1

$T_{\text{par}}(p,n) = O(n^2/p + n)$ for $n > \log p$

where:

$O(n^2/p)$ for the local multiplication

$O(n + \log p)$ for the Allgather

Linear speedup for $p \leq n$.

Parallel algorithm

  1. Distribute $n/p$ rows of $M$ and $n/p$ elements of $v$ to each process.

    Now process $k \in [0;~p-1]$ has the rows $M'$ of $M$ and the part $v'$ of $v$ with global row indices

    $k \cdot (n/p) \leq j < (k+1) \cdot (n/p)$

    where the local index $j' \in [0;~n/p-1]$ corresponds to the global row $j \in [0;~n-1]$.

    Therefore $j := j' + k \cdot (n/p)$.

  2. Use Allgather to reassemble $v$ from the pieces $v'$ on each process.

    Now each process has some rows of $M$ and the entire $v$.

  3. Calculate $M' \cdot v$ locally:

    $w'[j] = \sum_{0 \leq i < n} M'[j][i] \cdot v[i]$

    where $j$ is different for each process (and each iterates through $i$ locally).
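The three steps above can be sketched with MPI. This is a sketch, not a complete program: it assumes $n$ is divisible by $p$, and the function and variable names are illustrative.

```c
#include <mpi.h>
#include <stdlib.h>

/* Solution 1: row-distributed M, Allgather of v, then local mat-vec.
   Assumes n is divisible by p; all names are illustrative. */
void matvec_allgather(int n,
                      const float *M_local, /* n/p rows of M, row-major */
                      const float *v_local, /* n/p entries of v */
                      float *w_local,       /* n/p entries of the result */
                      MPI_Comm comm) {
    int p;
    MPI_Comm_size(comm, &p);
    int rows = n / p;

    /* Step 2: every process collects the entire vector v. */
    float *v = malloc(n * sizeof *v);
    MPI_Allgather(v_local, rows, MPI_FLOAT, v, rows, MPI_FLOAT, comm);

    /* Step 3: multiply the locally owned rows with the full v. */
    for (int j = 0; j < rows; j++) {
        w_local[j] = 0.0f;
        for (int i = 0; i < n; i++)
            w_local[j] += M_local[j * n + i] * v[i];
    }
    free(v);
}
```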

Solution 2: Reduce_scatter

Based on method 2

$T_{\text{par}}(p,n) = O(n^2/p + n)$ for $n > \log p$

where:

$O(n^2/p)$ for the local multiplication

$O(n + \log p)$ for the Reduce_scatter

Linear speedup for $p \leq n$.

Parallel algorithm

  1. Distribute $M$ column-wise and $v$ row-wise.

    Process $k \in [0;~p-1]$ has the columns $i \in [c_k;~c_{k+1})$ of $M$,

    where $c_k = k \cdot (n/p)$,

    and $n/p$ rows of $v$.

  2. Calculate $M' \cdot v'$ locally:

    $\sum_{c_k \leq i < c_{k+1}} M'[j][i] \cdot v'[i]$

  3. Use Reduce_scatter to sum the partial result vectors of all processes; each process then receives $n/p$ entries of the final result:

    $w[j] = \sum_{c_0 \leq i < c_1} M[j][i] \cdot v[i] + \sum_{c_1 \leq i < c_2} M[j][i] \cdot v[i] + \dots + \sum_{c_{p-1} \leq i < c_p} M[j][i] \cdot v[i]$

```c
// Variant A: every process receives the same block size n/p
MPI_Reduce_scatter_block(partial, result, n/p, MPI_FLOAT, MPI_SUM, comm);

// Variant B: explicit per-process receive counts (here all equal to n/p)
for (i = 0; i < p; i++) counts[i] = n/p;
MPI_Reduce_scatter(partial, result, counts, MPI_FLOAT, MPI_SUM, comm);
```