Center for High Throughput Computing

HPC Cluster Basic Use Guide

The CHTC has partnered with the UW-Madison Advanced Computing Initiative (ACI) in order to provide support for using the campus high-performance computing (HPC) cluster. Before using the campus-shared HPC Cluster, you will need to obtain access by filling out the Large-Scale Computing Request Form on our website.

Information is also available regarding buy-in options for access to your own priority queue.

Information About the Cluster

A. Hardware

The HPC Cluster consists of two head nodes and many compute nodes ("servers"). There is one queue with access to separate "partitions" for portions of hardware that are owned by different groups or contain different hardware generations, including the largest partitions, which are available to anyone on campus for free.

Our first generation nodes ("univ" partition) each have 16 CPU cores at 2.2 GHz and 64 GB of RAM (4 GB per CPU core). Our second generation nodes ("univ2" partition) each have 20 CPU cores at 2.5 GHz and 128 GB of RAM. All users log in at a head node, and all user files on the shared file system (Gluster) are accessible on all nodes. Additionally, all nodes are tightly networked (56 Gbit/s Infiniband) so they can work together as a single "supercomputer", depending on the number of CPUs you specify.

B. Logging In

You may log in to the cluster, submit jobs, and transfer/move data through either head node (aci-service-1.chtc.wisc.edu or aci-service-2.chtc.wisc.edu). However, compiling should only be performed on aci-service-2, where all compilers (including node-locked compilers) are located. Do not run programs on the head nodes. Small scripts and commands (to compress data, create directories, etc.) that run within a few minutes on the head node are okay, but their use should be minimized when possible. Computational work should always be submitted as a job. If you want to try out a program interactively, request an interactive session as described later in this guide.

Only ssh connections from an on-campus network are allowed. When off-campus, first connect to the campus VPN or ssh to an on-campus server, and then connect to either HPC head node.
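As a rough sketch of the two login paths described above, the function below prints (rather than runs) the ssh command you might use. The jump-host name gateway.example.edu is a placeholder and not a real campus server, and the -J option assumes OpenSSH 7.3 or newer on your own computer.

```shell
# Print the ssh command for reaching an HPC head node.
# Usage: hpc_ssh_command USERNAME [JUMP_HOST]
hpc_ssh_command() {
    user="$1"
    jump_host="${2:-}"                          # optional on-campus server
    head_node="aci-service-1.chtc.wisc.edu"     # head node named in this guide
    if [ -n "$jump_host" ]; then
        # Off-campus: hop through an on-campus server first (-J = ProxyJump).
        echo "ssh -J ${user}@${jump_host} ${user}@${head_node}"
    else
        # On-campus or on the VPN: connect directly.
        echo "ssh ${user}@${head_node}"
    fi
}

hpc_ssh_command alice
hpc_ssh_command alice gateway.example.edu       # placeholder jump host
```

If your ssh client does not support -J, logging in to the on-campus server first and then running the direct command from there has the same effect.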

C. Data Storage

Data space in the HPC file system is not backed-up and should be treated as temporary by users. Only files necessary for actively-running jobs should be kept on the file system, and files should be removed from the cluster when jobs complete. A copy of any essential files should be kept in an alternate, non-CHTC storage location.

Each user is initially allocated 100 GB of data storage space in their home directory (/home/username/), though we can increase data quotas upon email request to chtc@cs.wisc.edu with a description of the data space needed for concurrent, active work.

CHTC Staff reserve the right to remove any significant amounts of data on the HPC Cluster in our efforts to maintain filesystem performance for all users, though we will always first ask users to remove excess data and minimize file counts before taking additional action.

Local scratch space of 500 GB is available on each execute node in /scratch/local/ and is automatically cleaned out upon completion of scheduled job sessions (interactive or non-interactive). Local scratch is available on the compiling node, aci-service-2, in the same location and should be cleaned out by the user upon completion of compiling activities. CHTC staff will otherwise clean this location of the oldest files when it reaches 80% capacity.
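One possible pattern for using local scratch inside a job is sketched below: copy input to a per-job directory, work there, and copy results back to the shared file system. The function name, the file names input.dat and output.dat, and the byte-count "computation" are all placeholders for illustration; only the /scratch/local/ location comes from this guide.

```shell
# Stage work through node-local scratch, then copy results back.
# Usage: run_in_scratch SCRATCH_ROOT INPUT_FILE RESULT_DIR
run_in_scratch() {
    scratch_root="$1"    # e.g. /scratch/local on a compute node
    input_file="$2"
    result_dir="$3"
    # SLURM sets SLURM_JOB_ID inside a job; fall back to the shell PID elsewhere.
    work_dir="${scratch_root}/job.${SLURM_JOB_ID:-$$}"

    mkdir -p "$work_dir" "$result_dir" || return 1
    cp "$input_file" "$work_dir/input.dat" || return 1

    # Placeholder for the real computation: count the bytes of the input.
    ( cd "$work_dir" && wc -c < input.dat | tr -d ' ' > output.dat ) || return 1

    # Copy results to the shared file system and tidy up the scratch space.
    cp "$work_dir/output.dat" "$result_dir/" || return 1
    rm -rf "$work_dir"
}
```

Within an sbatch script this might be called as run_in_scratch /scratch/local /home/username/data.in /home/username/results, with the paths adjusted to your own files.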

D. Partition Configuration and Job Scheduling

The job scheduler on the HPC Cluster is SLURM. You can read more about submitting jobs to the queue on SLURM's website, but we have provided a simple guide below for getting started.

We have provisioned 3 freely-available submission partitions and a small set of nodes prioritized for interactive testing. These partitions can be thought of as different queues, and are selected by the user at the time of job submission.

Per-user limitations

To promote fairness, there is a 600-core running limit per-user across the entire cluster of partitions, with rare exceptions for researchers who own more than this number of cores. Additionally, each user may only have 10 jobs running at once. Users with many smaller (1-node or 2-node) jobs will find that they experience better throughput on CHTC's high-throughput computing (HTC) system, and can email chtc@cs.wisc.edu to get access.

Partition                p-name  # nodes (N)  t-max    t-default  max nodes/job  cores/node (n)  RAM/node (GB)
University               univ    46           7 days   1 day      16             16              64
University 2             univ2   144          7 days   1 day      16             20              128
Owners                   unique  124          unique   unique     unique         16 or 20        64 or 128
Interactive              int     2            30 min   30 min     1              16              64
Pre-emptable (backfill)  pre     316          24 hrs   4 hrs      16             16 or 20        64 or 128
*note: jobs not requesting a run time will be allotted the default value (t-default) for that partition; jobs without a partition indicated will be run in the "univ" partition.

The University (univ) partition is available to all UW-Madison researchers, and jobs are run without being pre-empted for the duration of time requested. This partition is best for running longer (multi-day) jobs on any number of CPUs and will always have at least 32 nodes (512 cores), but usually many more.

The Owner partitions actually consist of multiple group-specific partitions for research groups who have paid into the cluster for a set number of nodes. Each owner partition will have unique settings, and owned nodes are backfilled by jobs from the "pre" queue.

The Interactive (int) partition consists of a few nodes meant for short and immediate interactive testing on a single node (up to 16 CPUs, 64 GB RAM). There is a specific command to access the "int" partition:
srun -n16 -N1 -p int --pty bash

The Pre-emptable (pre) partition is underlaid on the entire cluster and is meant for more immediate turn-around of shorter and somewhat smaller jobs, or for interactive sessions requiring more than the 30-minute limit of the "int" partition. Pre-emptable jobs will run on any idle nodes (primarily Owner nodes, as the University partition is likely to be full), but will be pre-empted by jobs from other partitions that have priority on those nodes. However, pre-empted jobs will be re-queued if originally submitted with an sbatch script (see below).

Job Priority Determinations

A. User priority decreases as the user accumulates hours of CPU time over the last 21 days, across all queues. This "fair-share" policy means that users who have run many/larger jobs in the near-past will have a lower priority, and users with little recent activity will see their waiting jobs start sooner. We do NOT have a strict "first-in-first-out" queue policy.
B. Job priority increases with job wait time. After the history-based user priority calculation in (A), the next most important factor for each job's priority is the amount of time that each job has already waited in the queue. For all the jobs of a single user, these jobs will most closely follow a "first-in-first-out" policy.
C. Job priority increases with job size, in cores. This least important factor slightly favors larger jobs, as a means of somewhat countering the inherently longer wait time necessary for allocating more cores to a single job.
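To make the ordering of factors (A) through (C) concrete, here is a toy calculation. The weights 100, 10, and 1 and the 0 to 100 factor scale are hypothetical, chosen only so that fair-share outweighs wait time, which outweighs job size; they are not the cluster's actual settings.

```shell
# Toy priority score; each factor is an integer from 0 to 100.
# The weights are hypothetical and only preserve the ordering
# fair-share > wait time > job size described above.
job_priority() {
    fairshare="$1"   # 100 = no recent usage, 0 = heavy recent usage
    age="$2"         # 100 = has waited the longest
    size="$3"        # 100 = largest core request
    echo $(( 100 * fairshare + 10 * age + 1 * size ))
}

job_priority 90 10 10   # light recent usage, just submitted, small job
job_priority 20 80 50   # heavy recent usage, long wait, larger job
```

With these made-up numbers the first job scores higher even though the second has waited longer and is larger, which is the behavior the fair-share policy is meant to produce. On the cluster, SLURM's sprio command should show the real factors for pending jobs.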

Basic Use of the Cluster

1. Log in to the cluster head node

Create an ssh connection to aci-service-1.chtc.wisc.edu using your UW-Madison username and associated password.

Checking partition availability

To see partitions that you can submit to, use the following command:

[alice@service]$ sinfo
Using the "-a" argument to "sinfo" will show ALL partitions.

2. Software Capabilities

As part of our overall strategy for enabling users through computing, we encourage users to install and compile their desired software (and version), as they wish, within the /home/username location. Compiling should be performed on aci-service-2.chtc.wisc.edu in order to keep compute-intensive compilation from affecting queue activities. All necessary compilers should be accessible from this head node. As some codes compile with better performance using local scratch space than using the cluster's shared file system (/home/username), we have enabled local scratch space on aci-service-2 in /scratch/local/username. Please email chtc@cs.wisc.edu if you can't find the compiler you need or have other issues.

For more specific details on compiling and running MPI code, please see our MPI Use Guide, which covers the availability of specific libraries and how to load them.

3. Submitting jobs

A. Requesting an Interactive Job ("int" and "pre" partitions)
You may request up to a full node (16 CPUs, 64 GB RAM) when requesting an interactive session in the "int" partition. Interactive sessions on the "int" partition are allowed for 30 minutes, but you may request less time (see the below example). Sessions in the "pre" partition are limited according to the "Partition" table above, but are potentially subject to interruption.

[alice@service]$ srun -n16 -N1 -p int --pty bash
The above example indicates a request for 16 CPUs (-n16) on a single node (-N1) in the "int" partition (-p int). Adding "-t 15" would request 15 minutes rather than the 30-minute default. After the interactive shell is created on a compute node with the above command, you'll have access to files on the shared file system and be able to execute code interactively as if you had logged in to that node directly. It is important to exit the interactive shell when you're done working by typing exit.
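For illustration, the helper below assembles the srun command for an interactive session with an explicit time limit and prints it instead of running it. The function name is hypothetical, and the 120-minute "pre" request is only an example value within that partition's 24-hour maximum.

```shell
# Print an srun command for a single-node interactive session.
# Usage: interactive_cmd PARTITION CPUS MINUTES
interactive_cmd() {
    partition="$1"
    cpus="$2"
    minutes="$3"    # a bare number passed to -t is read as minutes
    echo "srun -n${cpus} -N1 -p ${partition} -t ${minutes} --pty bash"
}

interactive_cmd int 16 15     # 15 minutes on the "int" partition
interactive_cmd pre 16 120    # a longer, pre-emptable session
```

Copy the printed line to the head-node prompt to start the session.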

B. Submitting a Job to the Queue (all partitions)
To submit jobs to the queue for a given partition such that a connection to the jobs is not maintained, you should use sbatch submission. You will first want to create an sbatch script, which is essentially just a shell script (sh, bash, etc.) with #SBATCH descriptor lines.
The following example requests a job slot with 16 CPU cores on each of 2 nodes (32 cores total) for 4 hours and 30 minutes:
#!/bin/sh
#This file is called submit-script.sh
#SBATCH --partition=univ		# default "univ", if not specified
#SBATCH --time=0-04:30:00		# run time in days-hh:mm:ss
#SBATCH --nodes=2			# require 2 nodes
#SBATCH --ntasks-per-node=16            # (by default, "ntasks"="cpus")
#SBATCH --mem-per-cpu=4000		# RAM per CPU core, in MB (default 4 GB/core)
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
#Make sure to change the above two lines to reflect your appropriate
# file locations for standard error and output

#Now list your executable command (or a string of them).
# Example for non-SLURM-compiled code:
module load mpi/gcc/openmpi-1.6.4
mpirun -n 32 /home/username/mpiprogram
You can then submit the script with the following command:
[alice@service]$ sbatch submit-script.sh
Other lines that you may wish to add to your script specify the total number of tasks (equivalent to "cores" by default), the CPU cores per task (for multiple CPU cores per MPI task), or the total RAM per node (use either --mem or --mem-per-cpu, not both):
#SBATCH --mem=4000         # RAM per node, in MB (default 64000/node, max values in partition table)
#SBATCH --ntasks=32        # total number of "tasks" (cores) requested
#SBATCH --cpus-per-task=1  # default "1" if not specified
In any case, it is important to make sure that your request fits within the hardware configuration of your chosen partition.
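A quick way to sanity-check a request before submitting is sketched below. The limits are copied from the partition table in this guide for the partitions with fixed values; the function itself is a hypothetical helper and not part of SLURM. Owner and "pre" partitions are left out because their node sizes vary.

```shell
# Check a request against the partition table in this guide.
# Usage: check_request PARTITION NODES TASKS_PER_NODE
check_request() {
    partition="$1"
    nodes="$2"
    tasks_per_node="$3"
    case "$partition" in
        univ)  max_nodes=16; cores_per_node=16 ;;
        univ2) max_nodes=16; cores_per_node=20 ;;
        int)   max_nodes=1;  cores_per_node=16 ;;
        *)     echo "no fixed limits recorded for partition: $partition"
               return 2 ;;
    esac
    if [ "$nodes" -gt "$max_nodes" ]; then
        echo "too many nodes: $nodes requested, $max_nodes allowed per job"
        return 1
    fi
    if [ "$tasks_per_node" -gt "$cores_per_node" ]; then
        echo "too many tasks per node: $tasks_per_node requested, $cores_per_node cores available"
        return 1
    fi
    echo "ok: $(( nodes * tasks_per_node )) cores"
}

check_request univ 2 16     # the 2-node sbatch example above
check_request univ2 16 20   # the largest single job on "univ2"
```

A request such as check_request univ 2 20 would be rejected, since "univ" nodes have only 16 cores each.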

C. Using srun and salloc
In early tests with the cluster, we encouraged running non-interactive jobs with SLURM's srun and salloc commands without an sbatch script; however, doing so creates and requires a persistent connection to your job as it runs, and interrupted jobs are not re-queued if submitted this way (even when using the "pre" partition). You are welcome to submit jobs in these modes according to SLURM's user guide, which has some awesome advanced features for complex MPI configurations.
Please remember to indicate partition and run time with the -p and -t flags, respectively (see the interactive job command, above, for an example using these flags).

4. Viewing jobs in the queue

To view your jobs in the SLURM queue, enter the following:

[alice@service]$ squeue -u username
Issuing squeue alone will show the jobs of all users in the queue. You can view all jobs for a particular partition with "squeue -p univ".

5. Removing jobs

After running squeue, you can kill and/or remove your job from the queue with the following:

[alice@service]$ scancel job#
where job# is the number shown for your job in the squeue output.
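If you would rather not look the job number up by hand, it can be taken from the confirmation line that sbatch prints at submission. The sketch below assumes the usual "Submitted batch job N" wording; the job number 123456 is sample text, and the squeue and scancel commands are printed rather than run.

```shell
# Extract the job ID from sbatch's confirmation line.
job_id_from_sbatch() {
    # The ID is the last whitespace-separated field of the message.
    echo "$1" | awk '{ print $NF }'
}

# On the cluster the message would come from: sbatch submit-script.sh
message="Submitted batch job 123456"    # sample text for illustration
job_id="$(job_id_from_sbatch "$message")"

echo "squeue -j ${job_id}"    # view just this job
echo "scancel ${job_id}"      # remove it from the queue
```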

This guide was updated on 2013-08-14 to indicate changes regarding partition configuration, job compiling, and job submission. You can see the previous guide version here.