Models
Architecture models
Algorithm in model → Implementation as program → Execution of it on a concrete architecture
Computational model / Architecture model / Machine model
Theoretical construct for designing and analyzing algorithms.
formally describes
- components of the machine (processors, memory, communication, synchronization)
- how they execute programs
- at what costs (serves as execution-cost-model)
Bridging models
Concept introduced by Valiant with the Bulk Synchronous Parallel (BSP) model.
A bridging model should resemble actual machines closely enough that an algorithm’s performance in the model predicts its performance on real hardware to some extent.
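Not worked out in these notes, but for orientation: the cost model commonly stated for BSP charges each superstep for its local work, its communication, and the closing barrier (the symbols below are the usual ones, not taken from these notes):

$T_{\text{superstep}} = w + h \cdot g + l, \qquad T_{\text{total}} = \sum_{s} \left( w_s + h_s \cdot g + l \right)$

where $w$ is the maximal local work of any processor in the superstep, $h$ the maximal number of words any processor sends or receives, $g$ the per-word communication cost, and $l$ the barrier synchronization cost.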
Minimal requirements
If algorithm A performs better than algorithm B in the model,
then A’s implementation should also perform better than B’s on the corresponding hardware.
Model gap
sequential computing
There are well-established bridging models, e.g. the RAM model.
Predicted performance corresponds to real performance to some degree.
parallel computing
No single, agreed upon model.
Performance portability problem: no model has good “bridging” properties across the many paradigms, languages, and architectures → parallel algorithms have to be designed, analyzed, and benchmarked for the concrete target instead.
RAM model
RAM
= random access machine, an abstraction of the von Neumann architecture.
The standard sequential model, but there are alternatives (e.g. the Turing machine).
Wrong assumption in the RAM model
= ALU instructions and memory access take the same time.
In reality, a memory access is much slower than an ALU instruction (= von Neumann bottleneck).
Caches are used to mitigate this, but they make the real cost behavior complex.
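The RAM model cannot tell the following two loops apart: both execute the same number of ALU instructions and memory accesses, yet on real hardware the column-wise traversal is typically much slower because it defeats the cache. A minimal sketch in C; the matrix size N is an arbitrary illustrative choice.

```c
#include <stdio.h>
#include <stddef.h>

#define N 2048                      /* illustrative size, not from the notes */

static double a[N][N];

/* cache-friendly: consecutive addresses, each cache line is fully reused */
static double sum_row_major(void) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* cache-unfriendly: stride of N doubles, almost every access misses */
static double sum_col_major(void) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    printf("%f %f\n", sum_row_major(), sum_col_major());
    return 0;
}
```

In the RAM cost model both functions cost exactly the same; measured wall-clock times usually differ by a large factor.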
PRAM model
PRAM
= parallel random access machine
a parallel computer-memory architecture model
Processors operate under the same clock (lock-step) : they all perform one instruction per clock tick
Wrong assumption in PRAM
= ALU instructions and memory access take the same time.
Even more inaccurate than in the RAM model because there are multiple processors etc. (= aggravated von Neumann bottleneck).
We try to overcome it by increasing the memory bandwidth: prefetching and banked / non-monolithic memory for parallel memory accesses / vector operations.
UMA
= Uniform memory access
Access time to memory does not depend on processor-location.
Monolithic shared memory
Synchronous, lock-step
Communication/coordination through shared memory
Using multiple ports, memory network
e.g. RAM, PRAM
NUMA
= Non-Uniform memory access
Access time to memory depends on processor-location.
Non-monolithic / banked shared memory
Asynchronous, no lock-step, no common clock
Synchronization through atomic operations and barriers (see the sketch below).
Memory-network
e.g. almost all real processors
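A minimal sketch of the two synchronization mechanisms named above, using OpenMP on shared memory (OpenMP is an assumed choice here, not prescribed by the notes); compile with OpenMP enabled, e.g. -fopenmp.

```c
#include <stdio.h>

int main(void) {
    int counter = 0;                 /* shared variable in shared memory */

    #pragma omp parallel
    {
        /* atomic operation: the increment happens indivisibly,
           so concurrent updates from different threads are not lost */
        #pragma omp atomic
        counter++;

        /* barrier: no thread proceeds until all threads have arrived */
        #pragma omp barrier

        /* exactly one thread reports the result */
        #pragma omp single
        printf("all %d threads have incremented the counter\n", counter);
    }
    return 0;
}
```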
NUMA with local memory
Some memory locations are closer to processor (faster access) than others
Local memory is not shared - only accessible by the processor to which it belongs
Distributed memory
Local memory + memory distributed over processors via network
Needs explicit communication
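A minimal sketch of explicit communication between local memories, using MPI as the (assumed) message-passing library; run with at least two processes, e.g. mpirun -np 2.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;                    /* lives in each process's local memory */
    if (rank == 0) {
        value = 42;
        /* process 1 cannot read this directly; it must be sent explicitly */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```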
PRAM conflict resolution
What happens if several processors in the same time-step access the same memory location in the PRAM model?
EREW PRAM ❌ reading exclusive ❌ writing exclusive
CREW PRAM ✔️ reading concurrent ❌ writing exclusive
CRCW PRAM ✔️ reading concurrent ✔️ writing concurrent
When access is exclusive, it is the algorithm designer’s responsibility to prevent conflicting accesses, not the hardware’s.
CRCW PRAM has 3 theoretical write-conflict resolutions:
common: all conflicting processors must write the same value
arbitrary: one of the conflicting writes succeeds; which one is unspecified
priority: a priority scheme decides which write succeeds
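The classic use of the common rule is computing the logical OR of n bits in one parallel step: every processor whose bit is 1 writes the same value 1 into one result cell. A sketch that emulates this with OpenMP threads and a C11 atomic store (an emulation on real hardware, not the PRAM’s unit-time guarantee):

```c
#include <stdio.h>
#include <stdatomic.h>

int main(void) {
    int x[8] = {0, 0, 1, 0, 1, 0, 0, 0};   /* example input bits */
    atomic_int result = 0;

    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
        if (x[i] == 1)
            atomic_store(&result, 1);       /* all writers store the same value */

    printf("OR = %d\n", atomic_load(&result));
    return 0;
}
```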
Flynn’s taxonomy
Classifies instruction streams and data streams in computing systems.
Classifies architectures by:
- how many instructions are carried out simultaneously
- how many data items / words are accessed simultaneously
SISD - single instruction, single data
single processor operates on single data per instruction
e.g. a sequential architecture like the RAM
SIMD - single instruction, multiple data
single processor operates on multiple data per instruction
e.g. traditional vector processors, SIMD extensions, some GPUs (see the vectorized-loop sketch after this taxonomy)
MISD - multiple instruction, single data
multiple processors operate on a single stream of data
disputed examples: some say none exist, others count pipelined architectures
MIMD - multiple instruction, multiple data
multiple processors operate on multiple data, each with its own instruction stream
e.g. PRAM, distributed-memory architectures
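A sketch of SIMD in code: one add instruction applied lane-wise to many elements. The omp simd pragma only asks the compiler to vectorize the loop; whether actual SIMD instructions are emitted depends on the compiler and target (an assumption of this sketch).

```c
#include <stdio.h>

#define N 1024

int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * (float)i; }

    /* single instruction, multiple data: the same addition for all lanes */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[10] = %f\n", c[10]);
    return 0;
}
```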
Parallel programming models
Parallel programming models
= programming paradigms and their interfaces / libraries, defining:
- parallel resources (processes, threads, tasks, ...) and granularity of parallelism
- memory (shared, distributed, hybrid)
- methods of synchronization and communication
Any of these programming models can be realized on any architecture, but a better fit between model and architecture is more efficient.
Using Flynn’s taxonomy for programming models
SIMD: one instruction operates on many different elements
MIMD: different processes can execute different programs
SPMD: same program operates on multiple data (is a subcase of MIMD)
All code in this lecture will be SPMD.
All processes execute the same program, but they can execute different parts of it at the same time.
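A minimal SPMD sketch with MPI (MPI is an assumed choice, consistent with the message-passing example above): every process runs the same executable but branches on its rank, so different processes execute different parts of the program at the same time.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many are we? */

    if (rank == 0)
        printf("rank 0 of %d: executing the coordinator part\n", size);
    else
        printf("rank %d of %d: executing a worker part\n", rank, size);

    MPI_Finalize();
    return 0;
}
```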