Compute

HPC on OzSTAR

Image generated with ChatGPT
slides by Lukas Steinwender

Components

Source: OzSTAR docs (2025-10)

Unit   OzSTAR (2018)   Ngarrgu Tindebeek (NT, 2023)
CPU    4140            11648
GPU    230             88
OS     AlmaLinux 9     AlmaLinux 9

the OzSTAR docs are comprehensive and well written
you might even find some things that are useful outside of HPC


Nodes

Source: OzSTAR docs (2025-10)
  • access points to the HPC

Node                    Type                  Application
farnakle1/farnakle2     LOGIN nodes (OzSTAR)  user interaction
tooarrana1/tooarrana2   LOGIN nodes (NT)      user interaction
Farnakle (john, bryan)  HEAD nodes (OzSTAR)   compute
Tooarrana (dave, gina)  HEAD nodes (NT)       compute
trevor                  cloud-compute         outside communication
data-mover              file transfer         large file transfers

SSH: Accessing OzSTAR

  • Secure SHell
  • protocol used for connecting to remote hosts
ssh <username>@<host>
  • remember this command?
quota
  • in case quotas are exceeded or you need specific custom software
#uploading big data to OzSTAR
rsync -avPxH --no-g --chmod=Dg+s <path/to/files> <username>@data-mover01.hpc.swin.edu.au:<path/to/remote/destination>
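Typing the full login command every time gets tedious; an entry in ~/.ssh/config lets you connect with a short alias. A minimal sketch (the alias `ozstar` is made up here, and <login-host>/<username> are placeholders to fill in from the docs):

```
# ~/.ssh/config: fill in the real login host (e.g. one of the
# farnakle nodes) and your username
Host ozstar
    HostName <login-host>
    User <username>
```

Afterwards `ssh ozstar` is enough.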

Modules

  • prepackaged stacks of software
  • can be loaded and used
  • some modules might depend on others
    • loading order matters!
module spider [<pattern>]   #list all modules (following <pattern>)
module load <module name>   #load a module
module list                 #show loaded modules
module avail                #list available modules
module purge                #unload all modules                                                 
#relevant modules for Cn5
module purge
module load gcc/12.2.0          #compiler (required for python)
module load python/3.11.2-bare  #minimal installation of python
module load ipython/9.3.0       #for interactive computing
# module load python-scientific/3.13.1-foss-2025a #scientific python packages                    
module load openmpi/4.1.5       #message passing interface

Let's Set Up a .bashrc on OzSTAR!

  • adding commonly used modules
  • adding aliases
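A minimal sketch of such a .bashrc (the module versions are taken from the modules slide; the aliases are just examples, adapt them to your own workflow):

```shell
# load commonly used modules (guarded so the file also works on
# machines without the module system)
if command -v module >/dev/null 2>&1; then
    module purge
    module load gcc/12.2.0
    module load python/3.11.2-bare
fi

# aliases for frequent commands
alias ll='ls -lah'            # detailed directory listing
alias sq='squeue -u $USER'    # show only my jobs in the queue
```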

Screen Sessions

screen -S <name>    #start new session with name `name`                                     
screen -R <name>    #connect if existing, else create new
screen -r <name>    #connect to existing screen
screen -ls          #list active screen sessions
screen -d <name>         #force detach a running session
screen -X -S <name> quit #force kill a session

even if you're logged out, your screen keeps running


SLURM (Simple Linux Utility for Resource Management)

By Unknown author - Own work; traced from PNG images at https://slurm.net/branding/ (EPS files mentioned on this website are unavailable), GPL, https://commons.wikimedia.org/w/index.php?curid=49374769
  • HPC resource manager and job scheduler
  • highly customizable (by admins)
  • manages jobs, queues, and compute resources

SLURM Commands

sbatch <path/to/slurm/script.sh>                                #launch a job
srun <path/to/slurm/script.sh>                                  #launch parallel job
sinfo [-s] [-N] [-l]                                            #get information about cluster components
squeue [-u <username>]                                          #check the queue
scancel [<job_id>] [-u <username>] [-t PD]                      #kill a job; `-t PD` cancels all pending jobs
sinteractive --time=0:20:00 --mem=16g --cpus-per-task=8 --x11   #launch interactive session
jobreport <job_id>                                              #current resource usage of job

Let's Play on OzSTAR!


Resource Requests

  • every job requires knowledge of
    • memory: peak memory consumption
    • time: total time required for execution
    • cores: maximum number of cores required at the same time
    • temporary storage: scratch space for intermediate files
  • estimating resources
    • memory: profile peak consumption (e.g. with mprof)
    • time
      • longest loop + buffer
      • I/O is detrimental to runtime
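For memory, a back-of-the-envelope estimate from the largest data structures is often enough. A sketch in plain shell arithmetic, using a made-up example of a 20000 x 20000 array of 8-byte floats plus a ~20% safety buffer:

```shell
# estimate memory for one 20000 x 20000 array of float64 (8 bytes each)
n=20000
bytes=$(( n * n * 8 ))
echo "array size:       $(( bytes / 1024 / 1024 )) MiB"    # 3051 MiB

# add a ~20% safety buffer before putting the number into --mem
buffered=$(( bytes + bytes / 5 ))
echo "request at least: $(( buffered / 1024 / 1024 )) MiB" # 3662 MiB
```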

Resource Estimate

source .venv/bin/activate   #activate environment
mprof run content/session3_02_hpc_ozstar/01_resource_estimate.py
mprof plot -o "mprofile_plot.png"

SLURM Scripts

  • check this repo for a more comprehensive template
#!/bin/bash

#SBATCH --mail-type=NONE                #set to ALL to receive e-mail notifications

#SBATCH --job-name=myjob     #job name

##SBATCH --array=0-2                     #slurm array to execute multiple jobs at once
#SBATCH --output=./execlogs/%x_%a.out
#SBATCH --error=./execlogs/%x_%a.err

#SBATCH --ntasks=1                      #number of tasks
#SBATCH --mem=4G                        #total amount of memory per node (applies per array task when using --array)
#SBATCH --time=0-00:05:00               #time limit
##SBATCH --gres=gpu:2                   #request GPUs (number of GPUs after the colon)
##SBATCH --tmp=150GB                    #temporary storage (if large files are accessed or many files are read and written)

#load modules
module load python-scientific/3.13.1-foss-2025a #scientific python packages

#run your stuff
# cp /path/to/file.txt $JOBFS             #copy large files to temporary directory

source ~/<path2env>/bin/activate        #activate environment
python3 <path2file.py>                  #run your files
deactivate                              #deactivate environment

# cp $JOBFS/path/to/output.txt /path/to/target/directory #don't forget to copy your results back                                      

Action!

  1. Launch Your First Job!

Ngarrgu Tindebeek: "knowledge of the void" in Woiwurrung (provided by Wurundjeri elders)

Tooarrana: endangered Australian animal

Farnakle: Australian slang for "wasting time or engaging in inconsequential activity that creates a false appearance of productivity"

`sinteractive` allows only a limited number of cores and a limited amount of memory

OzSTAR: if you request <4 GB of memory, your job will never get flagged