Connecting to an HPC system
Overview
Questions
- How do I connect to an HPC cluster?
Objectives
- Be able to use ssh to connect to an HPC cluster
- Be aware that some IDEs can be configured to connect using ssh
Connecting to the HPC cluster
We learned in the previous episode that we access a cluster via the login node, but not how to connect to it. We will use an encrypted network protocol called ssh (secure shell), and the usual way to do this is via a terminal (also known as the command line or shell).
The shell is another way to interact with your computer instead of a Graphical User Interface (GUI). Instead of clicking buttons and navigating menus, you give instructions to the computer by executing commands at a prompt, which is a sequence of characters in a Command Line Interface (CLI) which indicates readiness to accept commands.
One of the advantages of a CLI is the ability to chain together commands to quickly create custom workflows, and automate repetitive tasks. A CLI will typically use fewer resources than a GUI, which is one of the reasons they’re used in HPC. A GUI can be more intuitive to use, but you’re more limited in terms of functionality.
There are pros and cons to both GUIs and CLIs, but suffice it to say that a CLI is how you interact with an HPC cluster.
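As a small taste of chaining (we’ll open a terminal in a moment), two commands can be joined with a pipe (|) so that the output of one becomes the input of the next:

BASH
ls | wc -l     # list the current directory and count how many items it contains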
We’ll first practice a little bit with the CLI by looking at the files on our laptops.
Open your terminal/Git Bash now and you’ll see that there is a prompt. This is usually something like:

[you@laptop ~]$
The information before the $ in the example above shows the logged-in user (you), the hostname of the machine (laptop), and the working directory (~). Your prompt may include something different (or nothing) before the $, and might finish with a different character such as # instead of $.
There is very likely a flashing cursor after your prompt – this is where you type commands. If the command takes a while to run, you’ll notice that your prompt disappears. Once the command has finished running, you’ll get your prompt back again.
Let’s run a couple of commands to view what is in our home directory:

BASH
[you@laptop ~]$ cd
The cd command changes the “working directory” which is
the directory where the shell will run any commands. Without any
arguments, cd will take you to your home directory. You can
think of this like a default setting for the cd
command.
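To see this in action, you could move into a subdirectory and then jump back home (this assumes you have a Documents directory; substitute any directory you do have):

BASH
[you@laptop ~]$ cd Documents     # move into the Documents subdirectory
[you@laptop ~/Documents]$ cd     # no arguments: go back to the home directory
[you@laptop ~]$ pwd              # print the working directory to confirm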
Now we’re going to list the contents of our home directory.
BASH
[you@laptop ~]$ ls

OUTPUT
Desktop Documents Downloads Music Pictures Public Videos
Your output will look different but will show the contents of your home directory. Notice that after the output from the command is printed, you get your prompt back.
Where is your home directory? We can print the current working directory with:

BASH
[you@laptop ~]$ pwd

OUTPUT
/home/username
Again, your output will look slightly different. When using a UNIX-type system such as Linux or macOS, the directory structure can be thought of as an inverted tree, with the root directory / at the top level and everything else branching out beneath it. Git Bash (or another terminal emulator) will treat your file system similarly, with the hard drive letter showing as a directory, so your home directory would be something like:
OUTPUT
/c/users/you/home
Let’s now connect to the HPC cluster, using the username and password emailed to you for this course, i.e. not your normal university username:

BASH
[you@laptop ~]$ ssh yourUsername@hpc-training.digitalmaterials-cdt.ac.uk
Forgotten your password?
Reset it here using your email address:
https://hpc-training.digitalmaterials-cdt.ac.uk:8443/reset-password/step1
If this is your first time connecting to the cluster, you will see a message like this:
OUTPUT
The authenticity of host 'hpc-training.digitalmaterials-cdt.ac.uk (46.62.206.78)' can't be established.
ED25519 key fingerprint is SHA256:..........
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
This is normal and expected, and just means that your computer hasn’t
connected to this server before. Type yes and then you will
see something like this:
OUTPUT
Warning: Permanently added 'hpc-training.digitalmaterials-cdt.ac.uk' (ED25519) to the list of known hosts.
At this point you will be prompted for your password. As you enter it you won’t see anything you type, so type carefully.
Once you’ve logged in you should see that your prompt changes to something like:

OUTPUT
userName@login:~$
Bear in mind that other HPC systems are likely to have a slightly different prompt – some examples are given below:
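For example (these are generic illustrations rather than the real prompts of any particular system):

OUTPUT
[yourUsername@login01 ~]$
yourUsername@cluster-login-2:/home/yourUsername>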
Note that while the values are different, it usually follows broadly the same pattern as the prompt on your laptop, showing your username, the hostname of the login node, and your current directory.
The command needed to connect to your local institution’s cluster
will look similar to the ssh command we just used, but each
HPC system should have documentation on how to get an account and
connect for the first time e.g. Manchester’s CSF
and the ARCHER2
national cluster.
Your home directory will usually be at a different location:
OUTPUT
/some/path/yourUsername
and it will contain different files and subdirectories (but might be empty if you’ve not logged in before).
Git Bash isn’t the only tool you can use to connect to HPC from Windows. MobaXterm is a popular tool with a CLI and some graphical tools for file transfer.
VS Code can be configured to connect via ssh, but follow the instructions for configuring it so that it doesn’t hog resources on the login node, e.g. https://ri.itservices.manchester.ac.uk/csf3/software/tools/vscode/
Explore resources on the cluster
A typical laptop might have 2-4 cores and 8-16GB of RAM (memory). Your laptop might have a bit more or a bit less, but we’ll use this as a reference to compare with the resources on the HPC cluster.
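If you want to check your own machine, the commands depend on the operating system: on Linux the same nproc and free commands we’ll use on the cluster work, macOS uses sysctl, and Windows users can look in Task Manager. For example:

BASH
# Linux
nproc                  # number of CPU cores
free -gh               # memory, in human-readable units

# macOS
sysctl -n hw.ncpu      # number of CPU cores
sysctl -n hw.memsize   # memory, in bytes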
Explore resources on the login node
We have already considered the resources of a typical laptop – how does that compare with the login node?
You should already be connected to the HPC cluster, but if not, log back in using:

BASH
[you@laptop ~]$ ssh yourUsername@hpc-training.digitalmaterials-cdt.ac.uk
See how many cores and how much memory the login node has, using the commands below. nproc gives the number of CPUs, and free -gh shows available memory.
You can get more information about the processors using
lscpu
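For example, to pick out just the core count and processor model from lscpu’s fairly long output (the exact field names can vary between systems):

BASH
userName@login:~$ lscpu | grep -E 'Model name|^CPU\(s\)'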
The output from the commands is shown below, which tells us that the login node has 4 cores and 15GB of memory, which is comparable to a typical laptop.
BASH
userName@login:~$ nproc
4
userName@login:~$ free -gh
total used free shared buff/cache available
Mem: 15Gi 401Mi 11Gi 5.0Mi 3.7Gi 14Gi
Swap: 0B 0B 0B
The login node on most clusters will have more CPUs and memory than your laptop, although as we will soon see, the compute nodes will have even more cores and RAM than the login nodes.
As noted in the setup section, the cluster used for this course is an exception to this rule and has very little computing resource.
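If your cluster uses the Slurm scheduler (an assumption here, although it is a very common choice), you can ask it to list each compute node along with its CPU count and memory, for example:

BASH
userName@login:~$ sinfo -N -O NodeHost,CPUs,Memory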
Your output should show something that looks a bit like this:
OUTPUT
HOSTNAMES CPUS MEMORY
compute01 4 15610
On most clusters you’re likely to see a few different types of compute node with different numbers of cores and memory, e.g.
OUTPUT
HOSTNAMES CPUS MEMORY
node1203 168 1546944
node1204 168 1546944
node1206 168 1546944
node1207 168 1546944
node1208 168 1546944
node904 32 191520
node600 24 128280
node791 32 1546104
node868 48 514944
node870 48 514944
Compare a laptop, the login node and the compute nodes
Consider the output of nproc and free -gh below, taken from the login node of a typical HPC cluster, rather than the numbers you got from the cluster used for this training course.
OUTPUT
96
OUTPUT
total used free shared buff/cache available
Mem: 754Gi 50Gi 216Gi 1.2Gi 495Gi 704Gi
Swap: 15Gi 0B 15Gi
Compare your laptop’s number of processors and memory with the numbers from a typical HPC system’s login node (above) and compute nodes (below).
OUTPUT
HOSTNAMES CPUS MEMORY
node1203 168 1546944 # memory is in MB, so roughly 1.5TB
node1204 168 1546944
node1206 168 1546944
node1207 168 1546944
node1208 168 1546944
node904 32 191520
node600 24 128280
node791 32 1546104
node868 48 514944
node870 48 514944
What implications do you think the differences might have on running your research work on the different systems and nodes?
Compute nodes are usually built with processors that have higher core counts than the login node or personal computers, in order to support highly parallel tasks. Compute nodes usually also have substantially more memory (RAM) installed than a personal computer. More cores tend to help jobs that involve work that is easy to perform in parallel, and more, faster memory is key for large or complex numerical tasks.
- The ssh protocol is used to connect to HPC clusters
- The cluster should have documentation detailing how to connect