seeš
Matrix-Vector multiplication
for detailed explaination
(adābeācfā)ā
(gikāhjlā)=(gāhā)ā
(adā)+(iājā)ā
(beā)+(kālā)ā
(cfā)
Sequential solution in
O(n³)
Assume
A,B,C
are
nĆn
matrices.
and also
nā«p,pā£n
MPI processes organised into
pāĆpā
matrices through
MPI_Cart_create
.
Each process
(i,j)
contains submatrices
Aā²,Bā²,Cā²ļ»æ

There are
pā
communicators for rows
pā
communicators for columns
Folklore: Blockwise algorithm
Analysis
Work:
O(n3/p+n2/pā)
Communication:
2ā
(
lo
g2ā(pā)ā
α+βā
(pāā1)ā
n2/p)
Space:
pāā
n²
very inefficient - increases by factor
pā
over sequential algorithm
Speedup:
p
when
pāO(n2)
q=pā
Process
(i,j)
computes
Ci
j
āā²ā=(Ai,0ā²ā,Ai,1ā²ā,ā¦Ai,qā²ā)ā
(B0,jā²ā,B1,jā²ā,ā¦,Bq,jā²ā)
Algorithm
-
Get data
Allgather
Ai,āā
on rows
Allgather
Aā,
j
ā
on columns
-
Locally compute (sequentially)
Ci,
j
ā=
ā
1=0pāā1āAi,1āā
B1,
j
ā

Scalable Universal Matrix-Multiplication Algorithm (SUMMA)
Analysis
Work:
O(n3/p+n2/pā)
Communication:
2ā
pāā
(
lo
g2ā(pā)ā
α+βā
n2/p)
Space:
2ā
n²/p
Speedup:
p
when
pāO(n2)
Idea: Pipelining
Allgather
operations
Ci,jā
gets computed in
pā
communication rounds.
Main algorithm
In round
lā[0;pāā1]
we
-
broadcoast
Ai,1ā
in the row communicator
-
broadcoast
Bi,
j
ā
in the column communicator
-
Update
CĀ +=Ā Aā
B