
Jobs That Use GPUs


GPUs (Graphics Processing Units) are a special kind of computer processor optimized for running very large numbers of simple calculations in parallel, a pattern common in problems related to image processing or machine learning. Well-crafted GPU programs for suitable applications can outperform CPU implementations by a factor of ten or more, but only when the program is written and designed explicitly to run on GPUs using special libraries like CUDA. For researchers who have problems that are well-suited to GPU processing, it is possible to run jobs that use GPUs in CHTC. Read on to determine:

  1. GPUs available in CHTC
  2. Software Considerations
  3. Submit File Considerations
  4. Using GPUs on the Open Science Grid
This is the initial version of a guide about running GPU jobs in CHTC. If you have any suggestions for improvement, or any questions about using GPUs in CHTC, please email the research computing facilitators at chtc@cs.wisc.edu.

1. GPUs Available in CHTC

CHTC's high throughput computing (HTC) system has the following servers with GPU capabilities that are available to any CHTC user (as of 2017-04-13):

Server                 No. of GPUs  GPU Type          Current OS (as of 2017-11-07)  HasGluster
gpu0000.chtc.wisc.edu  2            Tesla C2050       CentOS 7                       no
gpu3000.chtc.wisc.edu  2            Tesla C2050       CentOS 7                       yes
gzk-1.chtc.wisc.edu    8            GeForce GTX 1080  SL 6                           no
gzk-2.chtc.wisc.edu    8            GeForce GTX 1080  SL 6                           no

You can also find out information about GPUs in CHTC through the condor_status command. All of our servers with GPUs have a TotalGPUs attribute that is greater than zero; thus we can query the pool to find GPU-enabled servers by running:

[alice@submit]$ condor_status -compact -constraint 'TotalGpus > 0'

To print out specific information, you can use the "auto-format" option and the names of specific server attributes. The table above can be recreated using the attributes Machine, TotalGpus and CUDADeviceName:

[alice@submit]$ condor_status -compact -constraint 'TotalGpus > 0' -af Machine TotalGpus CUDADeviceName

In addition, HTCondor tracks other GPU-related attributes for each server, including:

  • Gpus: Number of GPUs in an individual job slot on a server (one server can be divided into slots to run multiple jobs)
  • CUDARuntimeVersion: Version of the CUDA software libraries available on the server.

For highly optimized codes, the following attributes may be helpful:

  • CUDACapability: The GPU's compute capability version -- higher numbers indicate newer GPU architectures.
  • CUDAGlobalMemoryMb: Amount of global memory available on the GPU
  • CUDAComputeUnits, CUDACoresPerCU: Multiplied together, the number of GPU cores available on the GPU card
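
For example, to estimate the total number of GPU cores per card, the two attributes can be multiplied directly in the query, since the "auto-format" option accepts ClassAd expressions as well as plain attribute names (a sketch; which attributes are advertised can vary by HTCondor version):

```
[alice@submit]$ condor_status -compact -constraint 'TotalGpus > 0' \
    -af Machine CUDADeviceName 'CUDAComputeUnits * CUDACoresPerCU'
```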

To see this information printed out, add the attributes you want to the condor_status command shown above. A sample command (and output) might look like this:

[alice@submit]$ condor_status -constraint 'TotalGpus > 0' -af Name Gpus CUDADeviceName Cpus Memory
slot1@gpu0000.chtc.wisc.edu 2 Tesla C2050 0 1411
slot1@gpu3000.chtc.wisc.edu 2 Tesla C2050 0 1155
gpu3001.chtc.wisc.edu 1 Tesla K40c 16 46080
slot1@gzk-1.chtc.wisc.edu 7 GeForce GTX 1080 0 63291
slot1_1@gzk-1.chtc.wisc.edu 1 GeForce GTX 1080 16 1024
slot1_1@gzk-2.chtc.wisc.edu 1 GeForce GTX 1080 1 8192
slot1_2@gzk-2.chtc.wisc.edu 1 GeForce GTX 1080 1 8192
slot1_3@gzk-2.chtc.wisc.edu 1 GeForce GTX 1080 1 8192
slot1_4@gzk-2.chtc.wisc.edu 1 GeForce GTX 1080 1 8192
slot1_5@gzk-2.chtc.wisc.edu 1 GeForce GTX 1080 1 1024
slot1_6@gzk-2.chtc.wisc.edu 1 GeForce GTX 1080 1 1024
slot1_7@gzk-2.chtc.wisc.edu 1 GeForce GTX 1080 1 1024
slot1_8@gzk-2.chtc.wisc.edu 1 GeForce GTX 1080 1 1024

In the output above, "slot1" represents the remaining available resources on a server; if a server isn't running any jobs, this slot shows the total resources available on that server. As jobs start, their resources are taken away from "slot1" to run individual jobs, which you can see in "slot1_1" etc. above.

To see all possible attributes for a particular server, you can use the "long" option with condor_status and the name of the server/slot you want to see:

[alice@submit]$ condor_status -l slot1@gpu0000.chtc.wisc.edu
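
The full listing is long; to narrow it to just the GPU-related attributes, you can filter the output with a simple grep (shown as a sketch):

```
[alice@submit]$ condor_status -l slot1@gpu0000.chtc.wisc.edu | grep -i cuda
```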

2. Software Considerations

Before using GPUs in CHTC you should ensure that the use of GPUs will actually help your program run faster. This means that the code or software you are using has the special programming required to use GPUs and that your particular task will use this capability.

If this is the case, there are several ways to run GPU-enabled software in CHTC:

A. Compiled Code

You can use our conventional methods of creating a portable installation of a software package (as in our R/Python guides) to run on GPUs. Most software needs to be compiled on a machine with GPUs in order to use them correctly. To build your code on a GPU server, create a submit file that requests GPUs (see below) and submit it to run interactively by using the "-i" option with condor_submit:

[alice@submit]$ condor_submit -i gpu.sub

Some software will automatically detect the GPU capability of the server and use that information to compile; for other programs you may need to set specific flags or options.
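
As a sketch, a minimal submit file for such an interactive build session (here called gpu.sub; the file names and resource amounts are placeholders to adjust for your build) might look like:

```
# gpu.sub -- interactive build session on a GPU server
# (file names and resource values below are illustrative)
universe = vanilla
log = build.log
# source code to transfer to the GPU server for compilation
transfer_input_files = source_code.tar.gz
request_gpus = 1
request_cpus = 1
request_memory = 4GB
request_disk = 4GB
queue
```

Submitted with condor_submit -i gpu.sub, this starts a shell on a matching GPU server with your input files present, where you can compile your code by hand.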

B. Docker

Some of CHTC's GPU servers have "nvidia-docker" installed, a specific version of Docker that integrates Docker containers with GPUs. If you can find or create a Docker image with your software that is based on the nvidia-docker container, you can use this to run your jobs in CHTC. See our Docker guide for how to use Docker in CHTC.

Docker is not supported on the operating system version (Scientific Linux 6) that is running on most of our GPU servers. We hope to have more GPUs with a current operating system/support for Docker in the future.

C. Singularity (for Tensorflow)

If you are running jobs using Tensorflow, you can use a pre-made environment (in a Singularity container) to run jobs. See this guide for details.

3. Submit File Considerations


All jobs that use GPUs must request GPUs in their submit file (along with the usual requests for CPUs, memory and disk).

request_gpus = 1

It is important to still request at least one CPU per job to do the processing that is not well-suited to the GPU.

Note that HTCondor will make sure your job has access to the GPU -- you shouldn't need to set any environmental variables or other options related to the GPU, except what is needed inside your code.
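
Putting these requests together, a basic GPU job submit file might look like the following (the executable name and resource amounts are placeholders for illustration):

```
# gpu_job.sub -- example GPU job submit file
# (executable name and resource values are illustrative)
universe = vanilla
executable = run_gpu_program.sh
log = job.log
output = job.out
error = job.err
request_gpus = 1
request_cpus = 1
request_memory = 8GB
request_disk = 10GB
queue
```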


If your software or code requires a specific version of CUDA, a certain type of GPU, or has some other special requirement, you will need to add a "requirements" statement to your submit file that uses one of the attributes shown above.

For example, if your code needs to use at least version 4.0 of the CUDA libraries:

requirements = (CUDARuntimeVersion >= 4.0)

If you want a certain class of GPU, use CUDACapability:

requirements = (CUDACapability > 3.0)
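
If your code has more than one hard constraint, the conditions can be combined with && in a single requirements statement, for example:

```
requirements = (CUDARuntimeVersion >= 4.0) && (CUDACapability >= 3.0)
```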

It may be tempting to add requirements for specific machines or types of GPU cards. However, when possible, it is best to write your code so that it can run across GPU types and without needing the latest version of CUDA.

4. Using GPUs on the Open Science Grid

CHTC, as a member of the Open Science Grid (OSG), can access GPUs that are available on the OSG. See this guide to determine whether your jobs are good candidates for the OSG.