| Server | No. of GPUs | GPU Type | Current OS (as of 2017-11-7) | HasGluster |
| --- | --- | --- | --- | --- |
| gpu-1.chtc.wisc.edu | 2 | Tesla C2050 | CentOS 7 | yes |
| gpu-2.chtc.wisc.edu | 2 | Tesla C2050 | CentOS 7 | no |
| gzk-1.chtc.wisc.edu | 8 | GeForce GTX 1080 | SL 6 | no |
| gzk-2.chtc.wisc.edu | 8 | GeForce GTX 1080 | SL 6 | no |
You can also find information about GPUs in CHTC through the
condor_status command. All of our servers with GPUs have a
TotalGpus attribute that is greater than zero, so you can
query the pool for GPU-enabled servers by running:
[alice@submit]$ condor_status -compact -constraint 'TotalGpus > 0'
To print out specific information, you can use the "auto-format" option and the
names of specific server attributes. For example, the table above can be recreated using the following command:
[alice@submit]$ condor_status -compact -constraint 'TotalGpus > 0' -af Machine TotalGpus CUDADeviceName
In addition, HTCondor tracks other GPU-related attributes for each server, including:
Gpus: Number of GPUs in an individual job slot on a server (one server can be divided into slots to run multiple jobs)
CUDARuntimeVersion: Version of the CUDA software libraries available on the server.
For highly optimized codes, the following attributes may be helpful:
CUDACapability: Represents how new the GPU is -- higher numbers are newer GPUs.
CUDAGlobalMemoryMb: Amount of global memory available on the GPU
CUDAComputeUnits, CUDACoresPerCU: Multiplied together, the number of GPU cores available on the GPU card
To see this information printed out, add the attributes you want to the
condor_status command shown above. A sample command (and output)
might look like this:
[alice@submit]$ condor_status -constraint 'TotalGpus > 0' -af Name Gpus CUDADeviceName Cpus Memory
email@example.com 1 Tesla K40m 16 62427
firstname.lastname@example.org 1 Tesla K40m 16 62427
email@example.com 2 Tesla C2050 0 3
firstname.lastname@example.org 0 Tesla C2050 1 4096
email@example.com 0 Tesla C2050 1 4096
firstname.lastname@example.org 0 Tesla C2050 4 2048
email@example.com 0 Tesla C2050 1 1024
firstname.lastname@example.org 0 Tesla C2050 1 512
gpu-3.chtc.wisc.edu 1 Tesla K40c 16 44033
email@example.com 0 GeForce GTX 1080 0 316
firstname.lastname@example.org 8 GeForce GTX 1080 16 64000
email@example.com 0 GeForce GTX 1080 0 316
firstname.lastname@example.org 8 GeForce GTX 1080 16 64000
In the output above "slot1" represents the remaining available resources on a server; if a server isn't running any jobs, this slot will show the total resources available on that server. As jobs start, their resources are taken away from "slot1" to run individual jobs, which you can see in "slot1_1" etc. above.
To see all possible attributes for a particular server, you can use the "long" option with
condor_status and the name of the server/slot you want to see:
[alice@submit]$ condor_status -l email@example.com
2. Software Considerations
Before using GPUs in CHTC you should ensure that the use of GPUs will actually help your program run faster. This means that the code or software you are using has the special programming required to use GPUs and that your particular task will use this capability.
If this is the case, there are several ways to run GPU-enabled software in CHTC:
A. Compiled Code
You can use our conventional methods of creating a portable installation
of a software package (as in our R/Python guides) to run on GPUs. Most
software needs to be compiled on a machine with GPUs in order to use them
correctly. To build your code on a GPU server, create a submit file that
requests GPUs (see below) and submit it to run
interactively by using the "-i" option with condor_submit:
[alice@submit]$ condor_submit -i gpu.sub
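A minimal gpu.sub for such an interactive build session might look like the following sketch; the file name and resource amounts are placeholders that you should adjust for your own code:

```
# gpu.sub -- sketch of a submit file for an interactive build session
# on a GPU server (input file name and resource requests are examples)
universe = vanilla
log = build.log

# source code to copy to the GPU server for compilation
transfer_input_files = source_code.tar.gz

request_gpus = 1
request_cpus = 1
request_memory = 4GB
request_disk = 4GB

queue
```

Because the job is submitted with "-i", no executable line is needed; HTCondor will place you in an interactive shell on the matched GPU server.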
Some software will automatically detect the GPU capability of the server and use that information to compile; for other programs you may need to set specific flags or options.
B. Docker
Some of CHTC's GPU servers have "nvidia-docker" installed, a specific version of Docker that integrates Docker containers with GPUs. If you can find or create a Docker image with your software that is based on the nvidia-docker container, you can use it to run your jobs in CHTC. See our Docker guide for how to use Docker in CHTC.
Docker is not supported on the operating system version (Scientific Linux 6) that is running on most of our GPU servers. We hope to have more GPUs with a current operating system/support for Docker in the future.
C. Singularity (for Tensorflow)
If you are running jobs using Tensorflow, you can use a pre-made environment (in a Singularity container) to run jobs. See this guide for details.
3. Submit File Considerations
All jobs that use GPUs must request GPUs in their submit file (along with the usual requests for CPUs, memory and disk).
request_gpus = 1
It is important to still request at least one CPU per job to do the processing that is not well-suited to the GPU.
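Putting these requests together, a complete submit file for a GPU job might look like the following sketch (the executable name, file names, and resource amounts are illustrative placeholders, not requirements):

```
# run_gpu_job.sub -- illustrative submit file for a batch GPU job
universe = vanilla
executable = run_analysis.sh
log = job.log
output = job.out
error = job.err

# request GPUs alongside the usual CPU, memory, and disk requests
request_gpus = 1
request_cpus = 1
request_memory = 8GB
request_disk = 10GB

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

queue
```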
Note that HTCondor will make sure your job has access to the GPU -- you shouldn't need to set any environment variables or other options related to the GPU, except what is needed inside your code.
Optional
If your software or code requires a specific version of CUDA, a certain type of GPU, or has some other special requirement, you will need to add a "requirements" statement to your submit file that uses one of the attributes shown above.
For example, if your code needs to use at least version 4.0 of the CUDA libraries:
requirements = (CUDARuntimeVersion >= 4.0)
If you want a certain class of GPU, use CUDACapability:
requirements = (CUDACapability > 3.0)
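Multiple GPU requirements can be combined with the && operator in a single requirements expression; the version numbers below are illustrative:

```
# ask for a recent CUDA runtime AND a newer class of GPU
request_gpus = 1
requirements = (CUDARuntimeVersion >= 4.0) && (CUDACapability > 3.0)
```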
It may be tempting to add requirements for specific machines or types of GPU cards. However, when possible, it is best to write your code so that it can run across GPU types and without needing the latest version of CUDA.
4. Using GPUs on the Open Science Grid
CHTC, as a member of the Open Science Grid (OSG), can access GPUs that are available on the OSG. See this guide to determine whether your jobs are good candidates for the OSG.
For all user support, questions, and comments: firstname.lastname@example.org