Matrix-Matrix multiplication

See 📎 Matrix-Vector multiplication for a detailed explanation.

$\small \begin{pmatrix} a & b & c\\ d & e & f \end{pmatrix} \cdot \begin{pmatrix} g & h\\ i & j \\ k & l \end{pmatrix} = \begin{pmatrix} a \\ d\end{pmatrix} \cdot \begin{pmatrix} g & h\end{pmatrix} + \begin{pmatrix} b \\ e\end{pmatrix} \cdot \begin{pmatrix} i & j\end{pmatrix} + \begin{pmatrix} c \\ f\end{pmatrix} \cdot \begin{pmatrix} k & l\end{pmatrix}$
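The decomposition above, with column $\small k$ of the left matrix multiplied by row $\small k$ of the right matrix, can be checked numerically. The following is a minimal sketch in plain Python; the function names `matmul`, `outer`, and `matmul_outer` are illustrative and not part of the notes.

```python
def matmul(A, B):
    """Standard definition: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def outer(col, row):
    """Outer product of a column vector and a row vector."""
    return [[x * y for y in row] for x in col]


def matmul_outer(A, B):
    """A * B as the sum over k of (column k of A) times (row k of B)."""
    rows, cols = len(A), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for k in range(len(B)):
        col_k = [A[i][k] for i in range(rows)]  # column k of A
        P = outer(col_k, B[k])                  # times row k of B
        for i in range(rows):
            for j in range(cols):
                C[i][j] += P[i][j]
    return C


A = [[1, 2, 3], [4, 5, 6]]       # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]  # 3 x 2
print(matmul_outer(A, B) == matmul(A, B))
```

Both routes give the same $\small 2 \times 2$ result, since each entry of the product is just the sum of the corresponding entries of the three outer products.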

Sequential solution in $\small O(n^3)$
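As a rough illustration of where the cubic bound comes from, here is a sketch of the usual triple loop for square matrices that also counts the scalar multiplications. The name `matmul_sequential` and the counter are assumptions made for this example.

```python
def matmul_sequential(A, B):
    """Multiply two n x n matrices; return the product and the
    number of scalar multiplications performed."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):          # row of C
        for j in range(n):      # column of C
            for k in range(n):  # inner dimension
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults


C, mults = matmul_sequential([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, mults)
```

Three nested loops of length $\small n$ perform exactly $\small n^3$ multiplications.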

Assume $\small A,B,C$ are $\small n\times n$ matrices.

Also assume $\small n \gg p$ and $\small p \mid n$.

The MPI processes are organised into a $\small \sqrt p \times \sqrt p$ grid through MPI_Cart_create.

Each process $\small (i,j)$ holds submatrices $\small A', B', C'$ of size $\small n/\sqrt p \times n/\sqrt p$.

There are

$\small \sqrt p$ communicators for rows

$\small \sqrt p$ communicators for columns
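The process layout can be sketched without an MPI runtime. The example below assumes the row-major rank ordering that MPI uses for Cartesian topologies and lists which ranks would share a row or a column communicator. In a real program these communicators would typically be derived from the Cartesian communicator, for example with MPI_Cart_sub; the helper names `cart_coords` and `row_and_column_groups` are hypothetical.

```python
import math


def cart_coords(rank, q):
    """Grid coordinates (i, j) of a rank in a q x q grid, row-major."""
    return divmod(rank, q)


def row_and_column_groups(p):
    """Ranks grouped by grid row and by grid column for p = q * q processes."""
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square")
    rows = [[i * q + j for j in range(q)] for i in range(q)]
    cols = [[i * q + j for i in range(q)] for j in range(q)]
    return rows, cols


rows, cols = row_and_column_groups(9)
print(rows)  # one group of ranks per row communicator
print(cols)  # one group of ranks per column communicator
```

For $\small p = 9$ this yields three row groups and three column groups of three ranks each, matching the $\small \sqrt p$ communicators per direction.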

Folklore: Blockwise algorithm

Analysis

Work: $\small O(n^3 /p+n^2 /\sqrt p)$

Communication: $\small 2\cdot (\log_2( \sqrt p) \cdot \alpha + \beta \cdot (\sqrt{p}-1) \cdot n^2 /p)$

Space: $\small \sqrt p \cdot n^2$ in total across all processes

This is very inefficient: memory use increases by a factor of $\small \sqrt p$ over the sequential algorithm.

Speedup: $\small p$ when $\small p \in O(n^2)$
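To get a feel for the size of these terms, the formulas can be evaluated for concrete values. This is a sketch only: it assumes $\small \alpha$ is the startup latency per message and $\small \beta$ the transfer time per matrix element, which the notes do not define here, and the function names are made up for this example.

```python
import math


def communication_time(n, p, alpha, beta):
    """2 * (log2(sqrt(p)) * alpha + beta * (sqrt(p) - 1) * n^2 / p)."""
    q = math.isqrt(p)       # sqrt(p), assuming p is a perfect square
    block = n * n / p       # elements in one local block
    return 2 * (math.log2(q) * alpha + beta * (q - 1) * block)


def total_space(n, p):
    """sqrt(p) * n^2 elements, summed over all processes."""
    return math.isqrt(p) * n * n


n, p = 1024, 16
print(communication_time(n, p, alpha=1.0, beta=0.0))  # latency term only
print(communication_time(n, p, alpha=0.0, beta=1.0))  # bandwidth term only
print(total_space(n, p) / (n * n))                    # factor over n^2
```

Separating the two terms this way shows that for $\small n \gg p$ the bandwidth term dominates the latency term.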

$\small q = \sqrt p$

Process $\small (i,j)$ computes $\small C'_{i,j} = \left(A'_{i, 0}, A'_{i, 1}, \ldots, A'_{i, q-1}\right) \cdot \begin{pmatrix} B'_{0, j} \\ B'_{1, j} \\ \vdots \\ B'_{q-1, j} \end{pmatrix} = \sum_{k=0}^{q-1} A'_{i,k} \cdot B'_{k,j}$
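The per-process computation can be simulated sequentially. The sketch below assumes that each process $\small (i,j)$ first collects block row $\small i$ of $\small A$ and block column $\small j$ of $\small B$ (which the communication term suggests is done with two allgather operations) and then multiplies the blocks locally. No MPI is involved, and the helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def block(M, i, j, b):
    """The b x b submatrix of M at block position (i, j)."""
    return [row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]


def blockwise_matmul(A, B, p):
    """Simulate the blockwise algorithm on a sqrt(p) x sqrt(p) grid."""
    n = len(A)
    q = math.isqrt(p)
    if q * q != p or n % q != 0:
        raise ValueError("need p = q * q and q dividing n")
    b = n // q
    C = [[0] * n for _ in range(n)]
    for i in range(q):
        for j in range(q):
            # Data held by process (i, j) after the gather steps.
            a_row = [block(A, i, k, b) for k in range(q)]
            b_col = [block(B, k, j, b) for k in range(q)]
            # Local work: C'_{i,j} = sum over k of A'_{i,k} * B'_{k,j}.
            c_local = [[0] * b for _ in range(b)]
            for k in range(q):
                prod = matmul(a_row[k], b_col[k])
                for r in range(b):
                    for c in range(b):
                        c_local[r][c] += prod[r][c]
            # Place the local result block into the global matrix.
            for r in range(b):
                C[i * b + r][j * b:(j + 1) * b] = c_local[r]
    return C


A = [[r * 4 + c for c in range(4)] for r in range(4)]
B = [[(r + 2 * c) % 5 for c in range(4)] for r in range(4)]
print(blockwise_matmul(A, B, 4) == matmul(A, B))
```

Each simulated process holds $\small 2q$ blocks instead of 2, which is where the extra factor $\small \sqrt p$ in the space requirement comes from.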