Old HPC Cluster Basic Use Guide

Please note that this version of the guide was retired on 2013-08-14, when the cluster configuration, compiling instructions, and job submission instructions changed significantly. Please see the new guide here.

The CHTC has partnered with the UW-Madison Advanced Computing Infrastructure (ACI) in order to provide support for using the campus high-performance computing (HPC) cluster. Before using the campus-shared HPC Cluster, you will need to obtain access by filling out the Large-Scale Computing Request Form on the ACI website.

Information About the Cluster

Hardware

The HPC Cluster is made up of multiple nodes, each with 16 cores and 4 GB of memory (RAM) per core. All users log in at a "head node", and all nodes share a file system (Gluster), which means that your programs and files will be accessible on all of the nodes when your computing jobs run. Additionally, all nodes are tightly interconnected, so they can work together as a single "supercomputer" across however many cores you specify. You can read more about the hardware configuration on the ACI website.

Logging In

You may log in to the cluster and transfer data through either head node (aci-service-1.chtc.wisc.edu or aci-service-2.chtc.wisc.edu). However, jobs may only be submitted from aci-service-1, and compiling should only be performed on aci-service-2, where all compilers (including node-locked compilers) are located.
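For example, a typical session might begin like this sketch, logging in to aci-service-1 with your UW-Madison NetID and hopping to aci-service-2 only when you need to compile (replace NetID with your own NetID):

$ ssh NetID@aci-service-1.chtc.wisc.edu
$ ssh aci-service-2.chtc.wisc.edu    # run from aci-service-1, only when compiling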

Data Storage

Storage space in the HPC file system is not backed up and should be treated as temporary by users. Only files necessary for actively running jobs should be kept on the file system, and files should be removed once they are no longer needed for active jobs. A copy of any essential files should be kept in an alternate storage location.

Each user is allocated 1 TB of storage space in their home directory (/home/NetID/) and additional space for running jobs in a scratch directory (/scratch/NetID/). Inputs for jobs should be copied from home to scratch, and output should be written to scratch while a job is running, so that temporary files and output that won't be needed after the job completes do not fill the user's home directory quota.

When a job finishes, the scratch directory should be cleaned out. Input copies and unnecessary output data should be deleted from the scratch location and necessary output should be moved (not copied) to the user's home directory. When files are no longer needed for active jobs, they should be copied to another location and removed from the HPC file system.
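For example, a post-job cleanup might look like the following sketch (the job directory myjob, the output file output.dat, and the off-cluster destination my-desktop.wisc.edu are hypothetical):

$ mv /scratch/NetID/myjob/output.dat /home/NetID/    # move (don't copy) needed output to home
$ rm -r /scratch/NetID/myjob                         # clear the scratch job directory
$ scp /home/NetID/output.dat NetID@my-desktop.wisc.edu:backups/    # copy essential files off the cluster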

Job Scheduling

The job scheduler on the HPC Cluster is SLURM. You can read more about submitting jobs to the queue on SLURM's website, but we have provided a simple guide below for getting started.

Basic Use of the Cluster

1. Log in to the cluster head node

Create an ssh connection to aci-service-1.chtc.wisc.edu using your UW-Madison NetID and associated password.

2. Staging data

Necessary inputs for compiling code and running jobs should be initially copied into the user's home directory (/home/NetID/). Prior to compiling code or running jobs, necessary files should be copied to a suitable location in the user's scratch directory (/scratch/NetID/).
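For example, once your inputs are in your home directory, staging them for a job might look like this sketch (the job directory myjob and the input file input.dat are hypothetical):

$ mkdir -p /scratch/NetID/myjob
$ cp /home/NetID/input.dat /scratch/NetID/myjob/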

3. Compiling code

All compiling should be performed on aci-service-2.chtc.wisc.edu (which you can ssh to from aci-service-1). The necessary compilers should all be accessible from this head node, and your files will still be accessible on the Gluster file system.

If you're compiling MPI code
When compiling MPI code to run on the HPC Cluster, you must link against SLURM's MPI libraries by adding the -lpmi argument, as shown in the following command-line examples using the gcc and MPI compilers:

$ gcc [args] -lpmi executable
$ mpicc [args] -lpmi executable
$ mpic++ [args] -lpmi executable

In these examples, [args] stands for your usual compiling arguments, and executable is the name given to the executable. When naming the executable, it is best to use the absolute path, as in /scratch/lmichael/myjob/mpi.exe. Otherwise, make sure you have navigated to the location of the executable within your scratch directory.

Using the -lpmi argument will link your code with SLURM's built-in MPI libraries, such that you will not need to specify an MPI executable (mpirun, mpiexec, etc.) when submitting jobs to the queue. You can read more about SLURM's MPI compatibility in the software's MPI Use Guide.
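As a concrete sketch (the source file mpi_program.c, the job directory, and the output name are hypothetical), a compile on aci-service-2 might look like:

$ mpicc -O2 mpi_program.c -lpmi -o /scratch/NetID/myjob/mpi.exe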

If you're compiling code without MPI
Compile as you normally would after copying necessary files to a suitable location in your scratch directory.
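For instance, a sketch with a hypothetical C source file serial_program.c:

$ cd /scratch/NetID/myjob
$ gcc -O2 serial_program.c -o serial.exe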

In either case
If you get an error indicating that a suitable library was not found, try compiling again with an additional -static argument. If you're still having issues compiling (or need a compiler/version not already installed), please send us an email at chtc@cs.wisc.edu.

4. Submitting jobs

To submit your computing job to the SLURM queue, issue one of the following commands:

$ srun -n #cores executable [args]
$ srun -N #nodes executable [args]

In the first example, -n is used to indicate an integer number of cores (#cores) you'd like your executable to run on. Alternatively, you may wish to run on a set number of nodes (16 cores per node, 4 GB RAM per core, 64 GB RAM per node) as indicated in the second example with the -N argument. Remember that because you compiled with SLURM's MPI libraries, you do not need to indicate an MPI executable (mpirun, mpiexec, etc.).
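For example, to run a hypothetical executable (compiled as in step 3) on 32 cores, which spans two nodes:

$ srun -n 32 /scratch/NetID/myjob/mpi.exe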

If you would like to run a single job using 64 cores or more, please contact us first by sending an email to chtc@cs.wisc.edu so that we may help determine the best way to run your job.

If you would first like to test or debug your code on a single node (16 cores or fewer), you can start an interactive shell on a worker node by issuing the following command from the head node:

$ srun --pty bash

After the interactive shell is created, you'll have access to files in your home and scratch directories on the shared file system for interactive work. It is important to exit the interactive shell when you're done working by typing exit.
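A brief interactive session might look like this sketch (the job directory and test executable are hypothetical):

$ srun --pty bash
$ cd /scratch/NetID/myjob
$ ./serial.exe
$ exit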

5. Viewing jobs in the queue

To view jobs in the SLURM queue as they run, enter the following:

$ squeue [-u NetID]

In this example, including -u NetID will produce output showing only your jobs. Issuing squeue alone will show all user jobs in the queue.

6. Killing jobs

After running squeue, you can kill your job with the following:

$ scancel job#

where job# is the number shown for your job in the squeue output.
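For example, with a hypothetical job number taken from the squeue output for your NetID:

$ squeue -u NetID
$ scancel 6423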