HPC Cluster Basic Use Guide
The CHTC has partnered with the UW-Madison
Advanced Computing Initiative
(ACI) in order to provide support for using the campus high-performance computing
(HPC) cluster. Before using the campus-shared HPC Cluster, you will need to
obtain access by
filling out the
Large-Scale Computing Request Form on the ACI website.
Information regarding buy-in options for access to your own priority queue is also available on the ACI website.
Information About the Cluster
The HPC Cluster consists of two head nodes and many compute nodes
("servers"). There is one queue with access to separate "partitions" for
portions of hardware that are owned by different groups or that contain
different hardware generations, including the largest partitions, which are
available to anyone on campus for free.
Our first generation nodes ("univ" partition) each have 16 CPU cores of
2.2 GHz, and 64 GB of RAM (4 GB per CPU core). Our second generation nodes
(in the "univ2" partition) each have 20 CPU cores of 2.5 GHz, and 128 GB of
RAM. All users log in at a head node, and all user files on the shared file
system (Gluster) are accessible on all nodes. Additionally, all nodes are
tightly networked (56 Gbit/s Infiniband) so they can work together as a
single "supercomputer", depending on the number of CPUs you specify.
B. Logging In
You may log in to the cluster, submit jobs, and transfer/move data through either head node
(aci-service-1.chtc.wisc.edu or aci-service-2.chtc.wisc.edu). However, compiling should only be
performed on aci-service-2, where all compilers (including node-locked compilers)
are located. Do not run programs on the head nodes.
Small scripts and commands (to compress data, create directories, etc.) that run within a
few minutes on the head node are okay, but their use should be minimized when possible.
Computational work should always be submitted as a job. If you want to try out a program
interactively, request an interactive session as described later in this guide.
Only ssh connections from an on-campus network are allowed, so when
off-campus you may wish to first connect to an on-campus server with ssh or
VPN before connecting to either HPC head node.
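For example, assuming you have an account on some other on-campus Linux machine (the intermediate host name below is hypothetical), you could connect in two hops:
# from off campus: first ssh to an on-campus machine, then to a head node
ssh mylogin@campus-host.wisc.edu
ssh username@aci-service-1.chtc.wisc.edu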
C. Data Storage
Data space in the HPC file system is not
backed-up and should be treated as temporary by users. Only files
necessary for actively-running jobs should be kept on the file system,
and files should be removed from the cluster when jobs complete. A copy of any
essential files should be kept in an alternate, non-CHTC storage location.
Each user is initially allocated 100 GB of data storage space in their home directory
(/home/username/), though we can increase data quotas upon email request to
email@example.com with a description of the data space needed for concurrent, active work.
CHTC Staff reserve the right to remove any significant amounts of data
on the HPC Cluster in our efforts to maintain
filesystem performance for all users, though we will always first ask users
to remove excess data and minimize file counts before taking additional action.
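To check roughly how much space your home directory is using (du is a standard tool and just one way to check; replace the path with your own home directory), you can run:
[alice@service]$ du -sh /home/username/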
Local scratch space of 500 GB is available on each execute node in
/scratch/local/ and is automatically cleaned out upon completion of scheduled
job sessions (interactive or non-interactive). Local scratch is available on
the compiling node, aci-service-2, in the same location and should be cleaned
out by the user upon completion of compiling activities. CHTC staff will otherwise
clean this location of the oldest files when it reaches 80% capacity.
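For example, after a compiling session on aci-service-2 you might remove your build directory (the directory name below is hypothetical) from local scratch like this:
[alice@service]$ rm -rf /scratch/local/username/my-build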
D. Partition Configuration and Job Scheduling
The job scheduler on the HPC Cluster is SLURM. You can read more about
submitting jobs to the queue on
SLURM's website, but we have provided a simple guide below for getting started.
We have provisioned 3 freely-available submission partitions and
a small set of nodes prioritized for interactive testing. These
partitions can be thought of as different queues, and are
selected by the user at the time of job submission.
To promote fairness, there is a 600-core running limit per-user across
the entire cluster of partitions, with rare exceptions for researchers
who own more than this number of cores. Additionally, each user
may only have 10 jobs running at once. Users with many smaller (1-node or
2-node) jobs will find that they experience better throughput on CHTC's
high-throughput computing (HTC) system, and can email firstname.lastname@example.org to get access.
[Partition table: columns include the number of nodes (N), the default run time (t-default), the maximum nodes per job, the cores per node (n), and the RAM per node (GB); owner and pre-emptable nodes have 16 or 20 cores and 64 or 128 GB of RAM per node.]
*note: jobs not requesting a run time will be allotted the default value
(t-default) for that partition; jobs without a partition indicated will be run
in the "univ" partition.
The University (univ) partition is available to all UW-Madison
researchers, and jobs are run without being pre-empted for the
duration of time requested. This partition is best for running longer (multi-day)
jobs on any number of CPUs and will always have at least 32 nodes (512 cores),
but usually much more.
The Owner partitions actually consist of multiple group-specific
partitions for research groups who have paid into the cluster for a set number
of nodes. Each owner partition will have unique settings, and owned
nodes are backfilled by jobs from the "pre" queue.
The Interactive (int) partition consists of a few nodes meant for short
and immediate interactive testing on a single node (up to 16 CPUs, 64 GB RAM).
There is a specific command to access the "int" partition:
srun -n16 -N1 -p int --pty bash
The Pre-emptable (pre) partition is under-layed on the entire cluster
and is meant for more immediate turn-around of shorter and somewhat smaller jobs,
or for interactive sessions requiring more than the 30-minute limit of the "int" partition.
Pre-emptable jobs will run on any idle nodes (primarily Owner nodes, as the
University partition is likely to be full), but will be pre-empted by jobs
of other partitions with priority on those nodes. However, pre-empted jobs will
be re-queued if originally submitted with an sbatch script (see below).
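For instance, a job meant for the pre-emptable partition might include lines like these in its sbatch script (a minimal sketch; the 2-hour run time is only an illustration, and the full script format is shown later in this guide):
#SBATCH --partition=pre          # run as a pre-emptable job
#SBATCH --time=0-02:00:00        # illustrative run time in days-hh:mm:ss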
Job Priority Determinations
A. User priority decreases as the user accumulates hours of CPU time over the
last 21 days, across all queues. This "fair-share" policy means that users who
have run many/larger jobs in the near-past will have a lower priority, and users
with little recent activity will see their waiting jobs start sooner. We do NOT
have a strict "first-in-first-out" queue policy.
B. Job priority increases with job wait time. After the history-based user
priority calculation in (A), the next most important factor for each job's priority
is the amount of time that each job has already waited in the queue. For all
the jobs of a single user, these jobs will most closely follow a
first-in-first-out order.
C. Job priority increases with job size, in cores. This least important factor
slightly favors larger jobs, as a means of somewhat countering the inherently
longer wait time necessary for allocating more cores to a single job.
Basic Use of the Cluster
1. Log in to the cluster head node
Create an ssh connection to aci-service-1.chtc.wisc.edu using your UW-Madison
username and associated password.
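For example, from a terminal on your own machine (replace username with your UW-Madison username):
ssh username@aci-service-1.chtc.wisc.edu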
Checking partition availability
To see the partitions that you can submit to, use the following command:
[alice@service]$ sinfo
Using the "-a" argument to "sinfo" will show ALL partitions on the cluster,
including those you cannot submit to.
2. Software Capabilities
As part of our overall strategy for enabling users through computing, we
actually encourage users to install and compile their desired software (and
version), as they wish, within the /home/username location. Compiling should
be performed on aci-service-2.chtc.wisc.edu in order to keep
compute-intensive compilation from affecting queue activities. All
necessary compilers should be accessible from this head node. As some
codes compile with better performance using local scratch space than using the
cluster's shared file system (/home/username),
we have enabled local scratch space on aci-service-2 in
/scratch/local/username. Please email
email@example.com if you can't find the compiler you need or have other issues.
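As a rough sketch only (the program name and build steps are hypothetical; adapt them to your own software), a build on aci-service-2 that uses local scratch and installs into your home directory might look like:
[alice@service]$ mkdir -p /scratch/local/username && cd /scratch/local/username
[alice@service]$ tar xzf /home/username/myprogram-1.0.tar.gz && cd myprogram-1.0
[alice@service]$ ./configure --prefix=/home/username/myprogram
[alice@service]$ make && make install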
For more specific details on compiling and running MPI code, please see our
MPI Use Guide, which includes information about the availability of specific
libraries and how to load them.
3. Submitting jobs
A. Requesting an Interactive Job ("int" and "pre" partitions)
You may request up to a full node (16 CPUs, 64 GB RAM) when requesting an
interactive session in the "int" partition. Interactive sessions on the "int"
partition are allowed for 30 minutes, but you may request less time
(see the below example). Sessions in the "pre" partition are limited according to
the "Partition" table above, but are potentially subject to interruption.
[alice@service]$ srun -n16 -N1 -p int --pty bash
The above example indicates a request for 16 CPUs ("-n16") on a
single node ("-N1") in the "int" partition ("-p int"), and adding
"-t 15" would indicate a request for 15 minutes, if desired rather than the
30-minute default. After the interactive shell is created to a compute node with
the above command, you'll have access to files on the shared file system
and be able to execute code interactively as if you had
directly logged in to that node. It is important to exit the interactive shell
when you're done working by typing "exit".
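Putting these pieces together, a shorter interactive session might look like the following (the compute-node prompt and test program are hypothetical):
[alice@service]$ srun -n16 -N1 -p int -t 15 --pty bash
[alice@node]$ ./myprogram        # run your test interactively on the compute node
[alice@node]$ exit               # return to the head node when finished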
B. Submitting a Job to the Queue (all partitions)
To submit jobs to the queue for a given partition such that a connection
to the jobs is not maintained, you should use sbatch.
You will first want to create an sbatch script, which is
essentially just a shell script (sh, bash, etc.) with
"#SBATCH" lines at the top that describe the resources your job requests.
The following example requests a job slot with 16 CPU cores
on each of 2 nodes (32 cores total) for 4 hours and 30 minutes:
#!/bin/sh
#This file is called submit-script.sh
#SBATCH --partition=univ # default "univ", if not specified
#SBATCH --time=0-04:30:00 # run time in days-hh:mm:ss
#SBATCH --nodes=2 # require 2 nodes
#SBATCH --ntasks-per-node=16 # (by default, "ntasks"="cpus")
#SBATCH --mem-per-cpu=4000 # RAM per CPU core, in MB (default 4 GB/core)
#SBATCH --error=job.%j.err       # standard error file ("%j" becomes the job number)
#SBATCH --output=job.%j.out      # standard output file
#Make sure to change the above two lines to reflect your appropriate
# file locations for standard error and output
#Now list your executable command (or a string of them).
# Example for non-SLURM-compiled code:
module load mpi/gcc/openmpi-1.6.4
mpirun -n 32 /home/username/mpiprogram
You can then submit the script with the following command:
[alice@service]$ sbatch submit-script.sh
Other lines that you may wish to add to your script for specifying a number of total
tasks (equivalent to "cores" by default),
desired CPU cores per task (for multiple CPU cores per MPI task),
or total RAM per node are:
#SBATCH --mem=4000 # RAM per node, in MB (default 64000/node, max values in partition table)
#SBATCH --ntasks=32 # total number of "tasks" (cores) requested
#SBATCH --cpus-per-task=1 # default "1" if not specified
In any case, it is important to make sure that your request fits within the hardware
configuration of your chosen partition.
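For example, the same 32-core request could be expressed in terms of total tasks rather than nodes, letting SLURM decide the placement (a sketch only, using the second-generation "univ2" partition):
#SBATCH --partition=univ2        # 20-core, 128 GB nodes
#SBATCH --time=0-04:30:00        # run time in days-hh:mm:ss
#SBATCH --ntasks=32              # 32 total cores, placement left to SLURM
#SBATCH --cpus-per-task=1        # 1 CPU core per MPI task
#SBATCH --mem-per-cpu=4000       # RAM per CPU core, in MB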
C. Using srun and salloc
In early tests with the cluster, we encouraged running non-interactive jobs
without an sbatch script; however, doing so
creates and requires a persistent connection to your job as it runs, and
interrupted jobs are not re-queued if submitted this way (even when using
the "pre" partition). You are welcome to submit jobs in
these modes according to
guide, which has some awesome advanced features for complex MPI configurations.
Please remember to indicate partition and run time with the -p and
-t flags, respectively (see the interactive job command, above, for an
example using these flags).
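As an illustration only (not the recommended route), a non-interactive srun submission of the hypothetical MPI program from the earlier sbatch example, assuming it was compiled against a SLURM-aware MPI library, might look like:
[alice@service]$ srun -p pre -t 60 -n 32 /home/username/mpiprogram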
4. Viewing jobs in the queue
To view your jobs in the SLURM queue, enter the following:
[alice@service]$ squeue -u username
Entering "squeue" without the "-u" option will show all user jobs in the
queue. You can view all jobs for a particular partition with "squeue -p univ".
5. Removing jobs
Using the job number obtained from squeue, you can kill and/or remove your job from
the queue with the following:
[alice@service]$ scancel job#
where job# is the number shown for your job in the output of squeue.
This guide was updated on 2013-08-14 to indicate changes regarding partition
configuration, job compiling, and job submission.