Compute

Parallel Computation and GPU

Image generated with ChatGPT
slides by Lukas Steinwender

Embarrassingly Parallel Problems

  • each task is independent of the others
  • little to no communication between tasks (memory can simply be shared)
  • easy to parallelize
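For example, applying the same function to many independent inputs can be parallelized with Python's standard-library multiprocessing module (a minimal sketch; the function and workload here are illustrative placeholders):

```python
from multiprocessing import Pool

def simulate(seed):
    """Stand-in for one independent task, e.g. a single simulation run."""
    return seed * seed  # placeholder workload

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # each input is processed independently -> embarrassingly parallel
        results = pool.map(simulate, range(8))
    print(results)
```

Because no task depends on another, the work splits cleanly across however many processes (or machines) are available.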

Threading
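In Python, threads all share one memory space and are easiest to use through the standard library's concurrent.futures module. Note that CPython's GIL means threads mainly help with I/O-bound work, not CPU-bound work (a minimal sketch; the task is an illustrative stand-in):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for an I/O-bound task such as a download."""
    # all threads share the same memory space of the parent process
    return (threading.current_thread().name, len(url))

urls = ["https://example.com/a", "https://example.com/bb"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fetch, urls))
print(results)
```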


MPI (Message Passing Interface)

  • more complex than threading
  • each process has its own memory
  • can scale across multiple machines
  • can adapt to more settings
  • Python: mpi4py
    • requires OpenMPI to be installed
    • already set up for you on OzSTAR
# running scripts with MPI
mpiexec -n <num processes> python3 <path/to/your/file.py>
flowchart LR
    main@{ shape: rounded, label: "Main"}
    worker1@{ shape: rounded, label: "Worker 1"}
    worker2@{ shape: rounded, label: "Worker 2"}
    workerN@{ shape: rounded, label: "Worker N"}
    out@{ shape: rect, label: "Output"}

    main -...->|assign work| worker1
    main -...->|assign work| worker2
    main -...->|assign work| workerN
    worker1 ---->|collect result| main
    worker2 ---->|collect result| main
    workerN ---->|collect result| main
    main ----->|combine results| out
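The main/worker pattern in the diagram can be sketched with mpi4py's scatter/gather. This is a sketch assuming mpi4py and OpenMPI are available (as on OzSTAR); split_work is an illustrative helper, not part of mpi4py:

```python
import os

def split_work(items, n_ranks):
    """Split a list into n_ranks nearly equal chunks (illustrative helper)."""
    return [items[r::n_ranks] for r in range(n_ranks)]

def main():
    from mpi4py import MPI  # requires OpenMPI (pre-installed on OzSTAR)
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # main assigns work to the workers ...
    chunks = split_work(list(range(100)), size) if rank == 0 else None
    my_chunk = comm.scatter(chunks, root=0)

    # ... each worker computes in its own memory ...
    partial = sum(x * x for x in my_chunk)

    # ... and main collects and combines the results
    partials = comm.gather(partial, root=0)
    if rank == 0:
        print("total:", sum(partials))

# only run when launched under OpenMPI's mpiexec, which sets this variable
if __name__ == "__main__" and "OMPI_COMM_WORLD_SIZE" in os.environ:
    main()  # run with: mpiexec -n 4 python3 <path/to/your/file.py>
```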

GPU Computing

Image Credit: Irfan et al. (2023)
  • GPU: highly specialized hardware
  • designed for embarrassingly parallel problems
  • huge in Deep Learning
    • batch-wise gradient computation
  • each block has its own memory (shared across the block)
    • each thread within a block has its own local memory
  • more specialized variants exist: TPU, PPU

software needs to be designed for GPU computation!
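In Python, one way to target the GPU is CuPy, which mirrors the NumPy array API, so the same array code can run on either device. A sketch (assumes CuPy and a CUDA GPU; falls back to NumPy on the CPU when they are absent):

```python
try:
    import cupy as xp   # GPU arrays; requires CUDA + CuPy (assumption)
except ImportError:
    import numpy as xp  # CPU fallback with the same array API

# batch-wise computation: one elementwise op over a whole batch at once,
# exactly the access pattern GPUs are built for
batch = xp.arange(1_000_000, dtype=xp.float32)
result = xp.sqrt(batch).sum()
print(float(result))
```

This is what "software needs to be designed for GPU computation" means in practice: the code must be written against a GPU-aware library rather than plain CPU loops.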


thread: each core can run several threads (e.g. a web browser uses many threads; Chrome actually isolates each tab in its own process)

TPU: Tensor Processing Unit

PPU: Physics Processing Unit