High Throughput Computing


Overview

Questions

  • What is High Throughput Computing?
  • How can I run lots of similar tasks at the same time?
  • What types of task are suitable for a job array?

Objectives

  • Be able to explain what HTC is
  • Be able to submit a job array to the scheduler
  • Be able to give an example of a task suitable for HTC

Overview


We have learned how to speed up a computational task by splitting the work across multiple cores that run simultaneously. That was parallel computing, or HPC.

There is another type of parallel workload, called High Throughput Computing (HTC), which uses a job array to run a series of very similar tasks.

A job array can be used when the tasks are independent of each other and do not all need to run at once: the job scheduler can then efficiently start queued tasks as the requested computational resources become available.

This type of task is often referred to as “embarrassingly parallel”, and common examples include Monte Carlo simulations, parameter sensitivity analyses, or batch file processing.

You can think of this as splitting out the iterations of a loop into an array of independent tasks.

Submitting a job array


Let’s submit a “hello world” job array to get familiar with the basics. We will need an additional Slurm #SBATCH line to set up the job array: the -a option indicates that this is an array job, and is followed by the range of values for the array task IDs to take, as shown in hello-jobarray.sh below.

BASH

#!/bin/bash
#SBATCH -p compute
#SBATCH -t 01:00
#SBATCH -a 1-5   # Set up a job array with 5 tasks

echo Hello from task ${SLURM_ARRAY_TASK_ID}

Recall that the syntax for accessing the value of an environment variable is $VARIABLE or ${VARIABLE}, so we can use ${SLURM_ARRAY_TASK_ID} to access the task ID in the command we execute.
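
The braces matter when the variable name is immediately followed by other text. Here is a quick illustration, using a made-up variable:

BASH

GREETING=Hello
echo $GREETING          # prints: Hello
echo ${GREETING}_world  # prints: Hello_world
echo $GREETING_world    # prints a blank line: bash looks for a variable named GREETING_world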

We then submit this as usual:

BASH

yourUsername@login:~$ sbatch hello-jobarray.sh

This sets up 5 jobs with the same base job ID but different task IDs, so their Slurm job IDs take the form JOBID_TASKID, and the corresponding log files are named slurm-JOBID_TASKID.out by default.
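
While the tasks are queued and running you can watch them with squeue; a pending array job is typically shown on a single line, with the range of waiting task IDs in brackets:

BASH

yourUsername@login:~$ squeue -u yourUsername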

The contents of these files look like:

OUTPUT

Hello from task 1

OUTPUT

Hello from task 2

and so on.

So the key idea is that we use the task ID variable SLURM_ARRAY_TASK_ID to vary the command that each of the job array tasks executes.
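
For example, a task could use its ID to select its own input file; the process_data command and the input file names here are hypothetical:

BASH

# Each task processes a different file: input_1.dat, input_2.dat, ...
process_data input_${SLURM_ARRAY_TASK_ID}.dat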

Materials Science example


We’re going to look at an example written in Python, which generates a folder structure and input files for the LAMMPS MD software, to explore the bulk modulus of various empirical potentials designed to mimic aluminium.

For each of six potentials, there is a series of folders in which the lattice parameter of an FCC crystal is varied about its equilibrium value. LAMMPS is then used to calculate the energy of each structure.

After this, there is some post-processing for each potential: harvesting the energy as a function of the lattice parameter, and then using this to calculate the bulk modulus of each potential.
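
(For reference, the bulk modulus is related to the curvature of the energy-volume curve: B = V d²E/dV², evaluated at the equilibrium volume. This is why scanning the energy over a range of lattice parameters is enough to extract it.)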

We can then generate a summary graph which compares the bulk moduli across the different potentials.

A typical approach to running LAMMPS over this combination of potentials and lattice parameters would be to iterate over the potentials in an outer loop, and over the lattice scaling values in a nested (inner) loop. A bit like this:

PYTHON

for potential in potential_values:
  for scale in lattice_scaling_factors:
    run_lammps(potential, scale)  # hypothetical helper that runs LAMMPS for this combination

The downside of this nested loop approach to covering the parameter space is that the iterations run sequentially, i.e. each one has to wait for the previous one to finish before it can start.

If only there were a way to run some/all of the iterations at once…

We’ll work up to completing this task in a series of smaller steps.

Challenge

Convert a nested loop to a single loop

Consider the Python code below, which iterates over two variables in a nested loop.

PYTHON

for num in [1, 2, 3]:
  for letter in ["a", "b", "c"]:
    print (f"{num}{letter}")

OUTPUT

1a
1b
1c
2a
2b
2c
3a
3b
3c

First, rewrite this code (in Python) to print the same output, but using just one big loop instead of the two nested loops.

PYTHON

for value in ["1a", "1b", "1c", "2a", "2b", "2c", "3a", "3b", "3c"]:
  print(value)
Challenge

Convert a loop into an array job

Now we’ve reworked the example to use a single loop, write a Slurm job array to print the same values, one per task, i.e. task 1 will print “1a”, task 2 will print “1b”, task 3 will print “1c”, and so on up to task 9.

Here is a faded example that reads individual lines from a text file, loop_values.txt:

BASH

#!/bin/bash
#SBATCH -p compute
#SBATCH -t 01:00
#SBATCH -_ _-_ # Add option to indicate a job array, and task ID values

# Task id 1 will read line 1 from loop_values.txt
# Task id 2 will read line 2 from loop_values.txt
# and so on...

# Use some Linux commands to save the value read from 'loop_values.txt' to
# a script variable named VALUE that we can use in other commands.
VALUE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" loop_values.txt)

echo ${VALUE}
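
The sed command may be unfamiliar: the -n flag suppresses sed's default behaviour of printing every line, and "Np" prints only line N, so each task reads the line matching its task ID. You can try it directly on the login node:

BASH

yourUsername@login:~$ sed -n "3p" loop_values.txt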

Your file loop_values.txt should contain the following lines:

OUTPUT

1a
1b
1c
2a
2b
2c
3a
3b
3c

and the job script should look like this:

BASH

#!/bin/bash
#SBATCH -p compute
#SBATCH -t 01:00
#SBATCH -a 1-9 # Add option to indicate a job array, and task ID values

# Task id 1 will read line 1 from loop_values.txt
# Task id 2 will read line 2 from loop_values.txt
# and so on...

# Use some Linux commands to save the value read from 'loop_values.txt' to
# a script variable named VALUE that we can use in other commands.
VALUE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" loop_values.txt)

echo ${VALUE}

If you inspect the output files from the jobscript, each one should contain one of the values in loop_values.txt.
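
Assuming the default output file naming, you can inspect them all at once:

BASH

yourUsername@login:~$ cat slurm-*_*.out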

Putting it all together


We now have all the pieces of the puzzle, and can tackle the bulk modulus example.

The first task will be to download the python scripts for the setup and post-processing. Recall from the transfer files episode that we can use wget for this.

BASH

yourUsername@login:~$ wget https://hpc-training.digitalmaterials-cdt.ac.uk/files/bulk_modulus.tar.gz

Recall also how to extract the files from the archive with

BASH

yourUsername@login:~$ tar -xzvf bulk_modulus.tar.gz

Extracting the tar file should create a new directory bulk_modulus, which we’ll move into using

BASH

yourUsername@login:~$ cd bulk_modulus

Before we can run any of the python scripts, we’ll need to set up a new venv with the dependencies for this example. We do that using

BASH

yourUsername@login:~$ module load python/3.13.5
yourUsername@login:~$ python3 -m venv .venv
yourUsername@login:~$ source .venv/bin/activate
(.venv) yourUsername@login:~$ pip install --upgrade pip
(.venv) yourUsername@login:~$ pip install -r requirements.txt

We will run the setup script directly on the login node, as it is very lightweight in terms of resources required.

BASH

(.venv) yourUsername@login:~$ python create_directories.py

This should produce a directory structure like this:

OUTPUT

Simulations/
    ├── potential_1/
    │    ├── Scale_0.980/
    │    │     ├── in.lmps
    │    │     └── Pot1.set
    │    ├── Scale_0.985/
    │    │     ├── in.lmps
    │    │     └── Pot1.set
    │    ├── ...
    ├── potential_2/
    │    ├── Scale_0.980/
    │    │     ├── in.lmps
    │    │     └── Pot2.eam.fs
    │    ├── Scale_0.985/
    │    │     ├── in.lmps
    │    │     └── Pot2.eam.fs
    │    ├── ...
    ├── ...

The files in each bottom-level folder are:

  • in.lmps – the LAMMPS input file
  • PotX.set, PotX.eam.fs, or PotX.eam.alloy – the empirical potential file
Challenge

Process the LAMMPS input files using a job array

With the directories set up, you should now be in a position to write a job array submission script that processes the input files using LAMMPS.

You should already be in the bulk_modulus directory, which contains a file dir_list.txt, listing the path to each directory in which LAMMPS needs to be run.

So the task is to:

  1. edit the jobscript below to set up a job array with the correct number of tasks i.e. complete the #SBATCH command on line 4. How many tasks do you need?
  2. replace row_number on line 9 with the correct variable name that gives the SLURM array task ID.

A copy of this jobscript should already be in your current directory, as jobarray.sh.

BASH

#!/bin/bash
#SBATCH -p compute
#SBATCH -t 10:00
#SBATCH 

module load lammps/20240829.3
source .venv/bin/activate

DIRECTORY=$(sed -n "${row_number}p" dir_list.txt)
cd ${DIRECTORY}
lmp -in in.lmps

The number of tasks corresponds to the number of lines in the file dir_list.txt. You could count them by opening the file with nano, but the wc command will count the lines for you:
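
BASH

yourUsername@login:~$ wc -l dir_list.txt

OUTPUT

54 dir_list.txt

The completed jobscript then looks like this: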

BASH

#!/bin/bash
#SBATCH -p compute
#SBATCH -t 10:00
#SBATCH -a 1-54

module load lammps/20240829.3
source .venv/bin/activate

DIRECTORY=$(sed -n "${SLURM_ARRAY_TASK_ID}p" dir_list.txt)
cd ${DIRECTORY}
lmp -in in.lmps

This is then submitted in the normal way with sbatch jobarray.sh.

To complete the example we’ll run the post-processing script to create two summary graphs, compare_potentials.png and energy_vs_scaling.png.

Challenge

Submit a jobscript to do the post-processing

Write and submit a jobscript to run the post-processing script: python post_processing.py

A suitable jobscript is given below.

BASH

#!/bin/bash
#SBATCH -p compute
#SBATCH -t 10:00

source .venv/bin/activate
python post_processing.py

This should produce summary graphs as follows:

[Figure: compare_potentials.png, comparing the bulk moduli calculated with each potential]
[Figure: energy_vs_scaling.png, energy as a function of the lattice scaling factor for each potential]

Disadvantages of array jobs


We have seen some of the advantages of job arrays in this episode, but we should also note a couple of disadvantages.

  • If a single task fails to run correctly, it can be awkward to identify and re-submit the failed tasks (see the example below).
  • If the tasks are very small, the scheduler can spend more time managing and queueing your tasks than computing them.
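
To help with the first point, sacct can report the state of every task in an array, and the -a option accepts a comma-separated list of task IDs, so you can re-run just the failures. The job ID and failed task IDs below are hypothetical:

BASH

# Show the state and exit code of each task in (hypothetical) job 123456
yourUsername@login:~$ sacct -j 123456 --format=JobID,State,ExitCode

# Re-submit only tasks 3 and 7; -a given on the command line overrides the #SBATCH line
yourUsername@login:~$ sbatch -a 3,7 jobarray.sh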
Key Points
  • High Throughput Computing is for running lots of similar, independent tasks
  • Iterating over input files or parameter combinations is a good example of HTC
  • Use #SBATCH -a X-Y to configure a job array
  • ${SLURM_ARRAY_TASK_ID} gives the Slurm task ID