What is an HPC cluster?
Last updated on 2025-10-21
Overview
Questions
- What is an HPC cluster?
- When would you use HPC instead of a laptop?
- How does HPC allocate resources to your job?
Objectives
- Explain some advantages and disadvantages of HPC over a laptop
- Explain some different types of HPC job
- Understand that HPC uses a scheduler to run jobs
Motivation
Some research computation is not suitable for running on your laptop: maybe it takes too long to run, needs more memory than you have available, or uses too much disk space.
Such problems needn’t limit the research you can do if you have access to a High Performance Computing (HPC) cluster.
We will first discuss working on a laptop before looking at what HPC entails.
Using a laptop
This is a familiar setup: it is accessed locally, and you can run a GUI right away. However, laptops have some significant limitations for research computing.
Let’s imagine that you have a resource-intensive job to run, perhaps a simulation or some data analysis. You might be able to do this using a laptop, but it requires your laptop to be left on, and probably prevents you from using it for much else until the job has finished.
You are likely also limited to only one “job” (simulation/analysis) at a time.
Consider the schematic below that represents typical resources you might have on a laptop—4 cores (CPUs) and 8GB of memory.
Using HPC
When logged in to an HPC cluster, things look a bit different.
Your work, or “jobs”, must be submitted to a scheduler and wait in a queue until resources are available.
Jobs run on high-end hardware, so you have access to lots of cores and memory. You can log out of the cluster while the jobs run, and many jobs can run at once.
Let’s now consider the HPC schematic below.
Users connect to the HPC system via a login node. This is a shared resource where users can submit batch (non-interactive) jobs. You shouldn’t run computational work on the login node, as this will slow it down for other users.
Jobs don’t (usually) run immediately; instead, they must be submitted to the job scheduler, which decides which compute node(s) will run the job.
“Nodes” are a bit like high-end workstations, with many CPU cores and lots of memory, linked together to make up a cluster. A cluster will typically have many nodes, suitable for varying types of job.
Compute nodes actually do the computation, and have (high performance) resources specifically allocated for the job.
Your files can be stored in your home directory (which is backed up), but jobs are usually submitted from scratch, a faster file system with more storage, though files there aren’t usually backed up.
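To make the scheduler step concrete, here is a minimal sketch of a job submission. It assumes the cluster runs the Slurm scheduler (other schedulers, such as PBS or Grid Engine, use different commands), and the job name, resource values, and program are hypothetical:

```bash
#!/bin/bash
#SBATCH --job-name=my-analysis   # a name to identify the job in the queue
#SBATCH --ntasks=1               # a serial job: a single task on a single core
#SBATCH --mem=4G                 # memory requested for the job
#SBATCH --time=02:00:00          # maximum run time (hh:mm:ss)

# run the analysis program (hypothetical command and input file)
./my_analysis input.dat
```

Saved as, say, my-job.sh, this script would be submitted from the login node with `sbatch my-job.sh`; the job then waits in the queue (visible with `squeue`) until the scheduler allocates it a compute node.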
Types of HPC jobs
An HPC cluster is usually composed of multiple types of hardware, each of which lends itself to different types of job.
The simplest type of HPC job is a serial job, which uses 1 CPU to complete the work.
High Performance Computing (HPC) describes solving computationally intensive problems quickly. This usually involves powerful CPUs and parallel processing (splitting work over multiple CPUs, or even over multiple nodes).
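As a sketch of what splitting work over multiple CPUs or nodes looks like to a scheduler (again assuming Slurm; the node and task counts and the program name are illustrative), a parallel job script might request several tasks spread over two nodes and launch them together:

```bash
#!/bin/bash
#SBATCH --job-name=parallel-sim
#SBATCH --nodes=2          # spread the work over two compute nodes
#SBATCH --ntasks=32        # 32 parallel processes in total
#SBATCH --time=06:00:00    # maximum run time (hh:mm:ss)

# launch one copy of the program per task; the copies cooperate via MPI
# (the simulation program is hypothetical)
srun ./my_mpi_simulation
```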
High Throughput Computing (HTC) means running the same task on multiple different inputs. The tasks are independent of each other so can be run at the same time. This might be processing multiple data files, or running simulations using different input parameter values each time.
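For HTC-style work, schedulers typically offer job arrays, which run many independent copies of the same script on different inputs. A sketch, again assuming Slurm, with a hypothetical program and input file naming scheme:

```bash
#!/bin/bash
#SBATCH --job-name=htc-example
#SBATCH --ntasks=1            # each array element is a small serial job
#SBATCH --time=00:30:00       # maximum run time per element
#SBATCH --array=1-100         # run 100 independent copies of this script

# each copy processes a different input file, chosen by its array index
./process_data input_${SLURM_ARRAY_TASK_ID}.dat
```

The scheduler runs as many of the 100 elements at once as resources allow, which is exactly the “many jobs can run at once” behaviour described above.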
Graphics Processing Units (GPUs) lend themselves well to tasks that involve a lot of matrix multiplication, such as image processing, machine learning, and physics simulations.
High Memory jobs run on nodes that have lots of memory available to them; these can be serial or parallel tasks.
Challenge 1: What are some pros and cons of using an HPC cluster?
- laptop takes too long to run / needs more memory / uses too much disk space
- HPC jobs can run for days without you being logged in
- can run multiple jobs at once
- can run on high end hardware (lots of cores/memory/GPU)
- types of HPC jobs (HTC, multi-core, multi-node (MPI), GPU)