| Server | No. of GPUs | GPU Type |
|---|---|---|
| gzk-1.chtc.wisc.edu | 8 | GeForce GTX 1080 |
| gzk-2.chtc.wisc.edu | 8 | GeForce GTX 1080 |
You can also find information about GPUs in CHTC through the
condor_status command. All of our servers with GPUs have a
TotalGpus attribute that is greater than zero, so we can
query the pool for GPU-enabled servers by running:
[alice@submit]$ condor_status -compact -constraint 'TotalGpus > 0'
To print specific information, use the "auto-format" option (-af) followed by the
names of the server attributes you want. For example, the table above can be recreated with:
[alice@submit]$ condor_status -compact -constraint 'TotalGpus > 0' -af Machine TotalGpus CUDADeviceName
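Because -af prints plain whitespace-separated columns, its output is easy to post-process. The Python sketch below is one hypothetical way to turn the output of the command above into the server table; the function names are illustrative, not CHTC tools. It assumes each line holds the machine name, the TotalGpus value and the device name, in that order (the device name may contain spaces), and it keeps one row per machine in case a machine is listed more than once.

```python
import subprocess
from typing import Dict, List, Tuple

Row = Tuple[str, int, str]


def query_gpu_servers() -> str:
    """Run the condor_status command shown above (needs an HTCondor submit server)."""
    cmd = ["condor_status", "-compact", "-constraint", "TotalGpus > 0",
           "-af", "Machine", "TotalGpus", "CUDADeviceName"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def gpu_server_table(af_output: str) -> List[Row]:
    """Parse `-af Machine TotalGpus CUDADeviceName` output into sorted rows."""
    rows: Dict[str, Row] = {}
    for line in af_output.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # The device name may contain spaces, so split at most twice.
        machine, total_gpus, device = line.split(maxsplit=2)
        # Keep the first row seen for each machine.
        rows.setdefault(machine, (machine, int(total_gpus), device))
    return sorted(rows.values())


def format_table(rows: List[Row]) -> str:
    """Render rows with the same three columns as the table above."""
    lines = ["Server | No. of GPUs | GPU Type"]
    lines += [f"{machine} | {gpus} | {device}" for machine, gpus, device in rows]
    return "\n".join(lines)
```

On the submit server, `print(format_table(gpu_server_table(query_gpu_servers())))` would print the table; the two parsing functions run anywhere, which makes them easy to try on saved output.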
In addition, HTCondor tracks other GPU-related attributes for each server, including:
Gpus: Number of GPUs in an individual job slot on a server (one server can be divided into slots to run multiple jobs)
CUDARuntimeVersion: Version of the CUDA software libraries available on the server.
For highly optimized codes, the following attributes may be helpful:
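CUDACapability: The GPU's CUDA compute capability, which indicates how new the GPU is; higher numbers correspond to newer GPUs.
CUDAGlobalMemoryMb: Amount of global memory available on the GPU, in megabytes.
CUDAComputeUnits, CUDACoresPerCU: The number of compute units on the GPU and the number of cores in each compute unit; multiplied together, they give the total number of GPU cores on the card.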
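As a worked example of the last item, a card with 20 compute units and 128 cores per compute unit (the layout of a GeForce GTX 1080) has 20 × 128 = 2560 cores. The Python sketch below applies the same multiplication to output from a command such as `condor_status -af Name CUDAComputeUnits CUDACoresPerCU`. The function name and the slot name in the usage line are hypothetical, and lines whose values are not integers (for example `undefined`) are skipped.

```python
from typing import Dict


def cores_by_slot(af_output: str) -> Dict[str, int]:
    """Total GPU cores per slot from `-af Name CUDAComputeUnits CUDACoresPerCU` output.

    Total cores = CUDAComputeUnits * CUDACoresPerCU.
    """
    totals: Dict[str, int] = {}
    for line in af_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, units, cores_per_cu = fields
        try:
            totals[name] = int(units) * int(cores_per_cu)
        except ValueError:
            continue  # e.g. "undefined" when a slot lacks these attributes
    return totals
```

For example, `cores_by_slot("slot1@gzk-1.chtc.wisc.edu 20 128")` returns `{"slot1@gzk-1.chtc.wisc.edu": 2560}`.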
To see this information printed out, add the attributes you want to the
condor_status command shown above. A sample command (and output)
might look like this:
[alice@submit]$ condor_status -constraint 'TotalGpus > 0' -af Name Gpus CUDADeviceName Cpus Memory
firstname.lastname@example.org 1 Tesla K40m 16 62427
email@example.com 1 Tesla K40m 16 62427
firstname.lastname@example.org 2 Tesla C2050 0 3
email@example.com 0 Tesla C2050 1 4096
firstname.lastname@example.org 0 Tesla C2050 1 4096
email@example.com 0 Tesla C2050 4 2048
firstname.lastname@example.org 0 Tesla C2050 1 1024
email@example.com 0 Tesla C2050 1 512
gpu-3.chtc.wisc.edu 1 Tesla K40c 16 44033
firstname.lastname@example.org 0 GeForce GTX 1080 0 316
email@example.com 8 GeForce GTX 1080 16 64000
firstname.lastname@example.org 0 GeForce GTX 1080 0 316
email@example.com 8 GeForce GTX 1080 16 64000
In the output above, "slot1" represents the resources still available on a server; if a server isn't running any jobs, this slot shows the server's total resources. As jobs start, their resources are taken from "slot1" and assigned to individual job slots, which appear as "slot1_1", "slot1_2", and so on.
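The same slot naming can be used to tally GPU usage per server. The Python sketch below is a hypothetical helper, not a CHTC tool: it reads `-af Name Gpus` output and assumes slot names of the form `slot1@server` (remaining resources) and `slot1_N@server` (a slot carved out for a job), as described above. Lines whose name has no `@`, or whose GPU count is not an integer, are skipped.

```python
from collections import defaultdict
from typing import Dict


def gpu_usage_by_server(af_output: str) -> Dict[str, Dict[str, int]]:
    """Summarize `-af Name Gpus` output as free vs. in-use GPUs per server."""
    usage: Dict[str, Dict[str, int]] = defaultdict(lambda: {"free": 0, "in_use": 0})
    for line in af_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or "@" not in fields[0] or not fields[1].isdigit():
            continue  # skip blank lines, names without a slot prefix, odd values
        slot, server = fields[0].split("@", 1)
        gpus = int(fields[1])
        if "_" in slot:
            # slot1_1, slot1_2, ...: resources assigned to a running job
            usage[server]["in_use"] += gpus
        else:
            # slot1: resources on the server that are still unclaimed
            usage[server]["free"] += gpus
    return dict(usage)
```

With the two hypothetical lines `slot1@gzk-1.chtc.wisc.edu 0` and `slot1_1@gzk-1.chtc.wisc.edu 8`, the helper reports 0 free and 8 in-use GPUs for that server, matching the pattern of the GeForce GTX 1080 rows in the sample output.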
To see all possible attributes for a particular server, you can use the "long" option with
condor_status and the name of the server/slot you want to see:
[alice@submit]$ condor_status -l firstname.lastname@example.org
2. Software Considerations
Before using GPUs in CHTC, you should make sure that GPUs will actually help your program run faster. This means that the code or software you are using must have the special programming required to use GPUs, and that your particular task must use this capability.
If this is the case, there are several ways to run GPU-enabled software in CHTC:
A. Compiled Code
You can use our conventional methods of creating a portable installation
of a software package (as in our R/Python guides) to run on GPUs. Most
software needs to be compiled on a machine with GPUs in order to use them
correctly. To build your code on a GPU server, create a submit file that
requests GPUs (see below) and submit it to run
interactively by using the "-i" option with condor_submit:
[alice@submit]$ condor_submit -i gpu.sub
Some software will automatically detect the GPU capability of the server and use that information to compile; for other programs you may need to set specific flags or options.
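For code compiled with NVIDIA's nvcc compiler, one such flag is `-arch`, which selects the GPU architecture to build for. The Python sketch below assumes you are using nvcc (other build systems use different options) and shows one way to derive the flag from a server's CUDACapability value; for instance, the GeForce GTX 1080 cards listed above have compute capability 6.1, which corresponds to `-arch=sm_61`. The function name is hypothetical.

```python
from typing import Union


def nvcc_arch_flag(cuda_capability: Union[float, str]) -> str:
    """Turn a CUDACapability value such as 6.1 into nvcc's -arch=sm_61 flag."""
    # Format with one decimal place so 7.0 becomes "7.0" rather than "7".
    major, minor = f"{float(cuda_capability):.1f}".split(".")
    return f"-arch=sm_{major}{minor}"
```

The value can be passed as a number or as the string printed by condor_status; the resulting flag would then go on the nvcc command line, e.g. `nvcc -arch=sm_61 -o my_program my_program.cu` (file names hypothetical).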
B. Docker
All of CHTC's GPU servers have "nvidia-docker" installed, a version of Docker that integrates Docker containers with GPUs. If you can find or create a Docker image containing your software that is designed for nvidia-docker, you can use it to run your jobs in CHTC. See our Docker guide for how to use Docker in CHTC.
3. Submit File Considerations
All jobs that use GPUs must request GPUs in their submit file (along with the usual requests for CPUs, memory and disk).
request_gpus = 1
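The `request_gpus` line sits alongside the other resource requests in a GPU job's submit file. As a sketch of how these fit together, the hypothetical Python helper below writes out a minimal submit file; the executable name, the log/output/error file names and the default memory and disk values are placeholders to adjust, not CHTC recommendations. It refuses to build a file without at least one GPU and at least one CPU.

```python
from typing import Optional


def gpu_submit_file(executable: str, gpus: int = 1, cpus: int = 1,
                    memory: str = "4GB", disk: str = "2GB",
                    requirements: Optional[str] = None) -> str:
    """Return the text of a minimal HTCondor submit file for a GPU job.

    File names and resource defaults are placeholders, not CHTC policy.
    """
    if gpus < 1 or cpus < 1:
        raise ValueError("a GPU job needs at least one GPU and one CPU")
    lines = [
        f"executable = {executable}",
        "log = job_$(Cluster).log",
        "output = job_$(Cluster)_$(Process).out",
        "error = job_$(Cluster)_$(Process).err",
        f"request_gpus = {gpus}",
        f"request_cpus = {cpus}",
        f"request_memory = {memory}",
        f"request_disk = {disk}",
    ]
    if requirements:
        lines.append(f"requirements = {requirements}")
    lines.append("queue")
    return "\n".join(lines) + "\n"
```

Writing the result of `gpu_submit_file("run_gpu.sh")` to a file such as `gpu.sub` gives a file that can be submitted with `condor_submit gpu.sub`, or with `condor_submit -i gpu.sub` for an interactive build session.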
You can request up to the maximum number of GPUs available on one server. It is important to still request at least one CPU per job to do the processing that is not well-suited to the GPU.
Optional
If your software or code requires a specific version of CUDA, a certain type of GPU, or has some other special requirement, you will need to add a "requirements" statement to your submit file that uses one of the attributes shown above.
For example, if your code needs to use at least version 4.0 of the CUDA libraries:
requirements = (CUDARuntimeVersion >= 4.0)
If you want a certain class of GPU, use CUDACapability:
requirements = (CUDACapability >= 6.0)
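Conditions can also be combined in a single requirements statement with the ClassAd `&&` operator, for example `requirements = (CUDARuntimeVersion >= 4.0) && (CUDACapability >= 6.0)`. The hypothetical Python helper below assembles such an expression from whichever minimums you supply; it only builds the string and does not check it against any server.

```python
from typing import List, Optional


def gpu_requirements(min_cuda_runtime: Optional[float] = None,
                     min_capability: Optional[float] = None) -> str:
    """Build a ClassAd requirements expression from optional minimums."""
    clauses: List[str] = []
    if min_cuda_runtime is not None:
        clauses.append(f"(CUDARuntimeVersion >= {min_cuda_runtime})")
    if min_capability is not None:
        clauses.append(f"(CUDACapability >= {min_capability})")
    # && means every clause must hold on the server the job matches.
    return " && ".join(clauses)
```

An empty result means no requirements line is needed, which, as noted below, is often the most flexible choice.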
It may be tempting to add requirements for specific machines or types of GPU cards. However, when possible, it is best to write your code so that it can run across GPU types and without needing the latest version of CUDA.
For all user support, questions, and comments: email@example.com