see Matrix-Vector multiplication for a detailed explanation
$$\begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix}
\begin{pmatrix} g & h \\ i & j \\ k & l \end{pmatrix}
=
\begin{pmatrix} a \\ d \end{pmatrix}\begin{pmatrix} g & h \end{pmatrix}
+
\begin{pmatrix} b \\ e \end{pmatrix}\begin{pmatrix} i & j \end{pmatrix}
+
\begin{pmatrix} c \\ f \end{pmatrix}\begin{pmatrix} k & l \end{pmatrix}$$
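The identity above writes the product as a sum of outer products: column k of the left factor times row k of the right factor. A minimal NumPy check of this identity (the concrete numbers are illustrative, not from the source):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])   # 2x3: columns (a,d), (b,e), (c,f)
B = np.array([[ 7.,  8.],
              [ 9., 10.],
              [11., 12.]])     # 3x2: rows (g,h), (i,j), (k,l)

# Sum of outer products: column k of A times row k of B.
C = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

assert np.allclose(C, A @ B)
```

This outer-product view is exactly the per-round rank-update that SUMMA performs in parallel.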
Sequential solution in O(n³)
Assume A, B, C are n×n matrices, and also n ≫ p, √p ∣ n (so the n/√p × n/√p blocks divide evenly).
MPI processes organised into a √p × √p grid through MPI_Cart_create.
Each process (i,j) holds submatrices A′_{i,j}, B′_{i,j}, C′_{i,j} of size n/√p × n/√p.
There are √p communicators for rows and √p communicators for columns.
Folklore: Blockwise algorithm
Analysis
Work: O(n³/p + n²/√p)
Communication: 2·(log₂(√p)·α + β·(√p − 1)·n²/p)
Space: √p·n² — very inefficient: increases by a factor of √p over the sequential algorithm
Speedup: p when p ∈ O(n²)
Let q = √p. Process (i,j) computes
$$C'_{i,j} = \begin{pmatrix} A'_{i,0} & A'_{i,1} & \cdots & A'_{i,q-1} \end{pmatrix}
\begin{pmatrix} B'_{0,j} \\ B'_{1,j} \\ \vdots \\ B'_{q-1,j} \end{pmatrix}$$
Algorithm
- Get data
  Allgather A_{i,∗} on rows
  Allgather B_{∗,j} on columns
- Locally compute (sequentially)
  $$C_{i,j} = \sum_{l=0}^{\sqrt{p}-1} A_{i,l}\, B_{l,j}$$
Scalable Universal Matrix-Multiplication Algorithm (SUMMA)
Analysis
Work: O(n³/p + n²/√p)
Communication: 2·√p·(log₂(√p)·α + β·n²/p)
Space: 2·n²/p
Speedup: p when p ∈ O(n²)
Idea: pipelining the Allgather operations. C_{i,j} gets computed in √p communication rounds.
Main algorithm
In round l ∈ [0; √p − 1] we
- broadcast A_{i,l} in the row communicator
- broadcast B_{l,j} in the column communicator
- update C_{i,j} += A_{i,l} · B_{l,j}
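The √p SUMMA rounds can be sketched in a single process, with each broadcast simulated by reading the broadcast block directly; every round contributes one rank-(n/√p) update per process (grid and sizes are illustrative assumptions):

```python
import numpy as np

n, q = 8, 2                       # q = sqrt(p) communication rounds
b = n // q                        # block size n/sqrt(p)
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = np.zeros((n, n))

def blk(M, i, j):
    """The n/q x n/q block of M held by process (i, j)."""
    return M[i*b:(i+1)*b, j*b:(j+1)*b]

for l in range(q):                # one communication round per l
    for i in range(q):
        for j in range(q):
            # Process (i, j) receives A[i,l] from the row broadcast and
            # B[l,j] from the column broadcast, then updates its C block.
            C[i*b:(i+1)*b, j*b:(j+1)*b] += blk(A, i, l) @ blk(B, l, j)

assert np.allclose(C, A @ B)
```

Unlike the blockwise algorithm, each process only ever holds O(n²/p) data at a time, matching the 2·n²/p space bound in the analysis.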