Scale Beyond Local HTC Capacity
This guide provides an introduction to running jobs outside of CHTC: why using these resources is beneficial, what resources are available, and how to use them.
Why run on additional capacity outside CHTC?
Running on other resources in addition to CHTC has one huge benefit: size!
In addition to what’s available at CHTC, UW-Madison groups and the national OSG Consortium make thousands of computers available for high throughput computing, including specialized hardware resources like GPUs.
Most CHTC users who submit jobs to CHTC, campus pools, and the OSPool can get more than 100,000 computer hours (more than 11 years of computing!) in a single day.
Is this capacity for you?
Many jobs on CHTC’s HTC system would benefit from extra computing capacity! Use the following recommendations to decide whether your jobs would be a good fit.
Characteristic | Recommendation | Details |
---|---|---|
Job Length | 10-24 hours (per job) | Your job should complete in under 10 hours: either it finishes in that amount of time, or it self-checkpoints at least that frequently. If you would like to implement self-checkpointing for longer-running code, we are happy to provide resources and guidance. |
Data Size | Up to 20 GB input/output per job | This covers input files that would normally be transferred from a /home directory, or from the /staging directory using an osdf:/// URL. |
Software | An Apptainer container (recommended) | Almost any software that runs in CHTC will run outside CHTC, but the best scenario is using a container to maintain a consistent software environment. |
If you have large data files, long jobs, or questions about containers, please contact us!
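For longer jobs, HTCondor's self-checkpointing support is one way to meet the job length recommendation. The fragment below is a minimal sketch, not a tested CHTC configuration. The executable and checkpoint file names are hypothetical, and it assumes your program periodically saves its state to a file, exits with code 85 after saving, and resumes from that file when it is restarted.

```
# Sketch of self-checkpointing options (file names are hypothetical).
executable = my_analysis.sh

# Assumption: my_analysis.sh exits with code 85 each time it has written
# a checkpoint. HTCondor then saves the listed files and restarts the job.
checkpoint_exit_code      = 85
transfer_checkpoint_files = checkpoint.dat
should_transfer_files     = YES

log    = job.log
output = job.out
error  = job.err

queue
```

Any other exit code is treated as the job finishing, so the program needs to exit normally once all of its work is done.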
External computing capacity accessible from CHTC
UW Campus Pools
CHTC has connections with other groups and centers on campus that run their own high throughput computing pools using HTCondor. These groups include departments (Biochemistry, Statistics) and large physics projects (IceCube, CMS). Through agreements with these groups, jobs submitted in CHTC can opt into running on these other campus pools when they are not fully utilized by their owners.
Open Science Pool (OSPool)
As the home for the PATh project, CHTC operates a national high throughput computing pool called the Open Science Pool, composed of computing capacity contributed by campuses, national labs, and other institutions across and beyond the United States. CHTC users submitting from a CHTC Access Point can opt into allowing their jobs to utilize this national pool.
How to use external capacity
If your jobs meet the characteristics above and you would like to use external HTC pools to run jobs, in addition to CHTC, you can add the following to your submit file:
Submit file option | What it does |
---|---|
want_campus_pools = true | Opts into sending jobs to other HTCondor pools on campus. Good for jobs that are less than ~12 hours, on average, or jobs with checkpointing. |
want_ospool = true | Opts into sending jobs to the OSPool. Good for jobs that are less than ~12 hours, on average, or jobs with checkpointing. |
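To show where these lines fit, here is a minimal sketch of a complete submit file with both options enabled. The executable name, file names, resource requests, and job count are placeholders chosen for illustration; only the two want_ lines come from this guide.

```
# Hypothetical submit file; names and resource requests are placeholders.
executable = run_job.sh
arguments  = $(Process)

log    = job_$(Cluster)_$(Process).log
output = job_$(Cluster)_$(Process).out
error  = job_$(Cluster)_$(Process).err

request_cpus   = 1
request_memory = 2GB
request_disk   = 4GB

# Opt in to capacity beyond CHTC (the two options described above).
want_campus_pools = true
want_ospool       = true

queue 100
```

Either option can be used on its own; including both simply gives the jobs more places to run.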
Testing jobs outside CHTC
To help your jobs run efficiently, please follow these steps whenever you submit a new type of job beyond CHTC.
- Run test jobs: You should run a set of test jobs (anywhere from 10 to 2,000 jobs) outside CHTC before submitting your full workflow. To do this, take a job submission that you know runs successfully on CHTC. Add the following options to the submit file and submit the test jobs:

      requirements = (Poolname =!= "CHTC")
      want_campus_pools = true
      want_ospool = true

  (If your submit file already has a requirements = line, you can append the Poolname requirement by using a double ampersand (&&) followed by the additional requirement.)
- Identify problems: If your jobs don't run successfully on the UW campus pools or the OSPool, please get in touch with a research computing facilitator.
- Scale up: Once you have tested your jobs and they run successfully, you are ready to submit a full batch of jobs that uses CHTC and the UW campus pools/OSPool. REMOVE the Poolname requirement from the test jobs but leave the want_campus_pools and want_ospool lines.
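The test setup described above can be sketched as follows for a submit file that already has a requirements line. The existing Arch requirement and the job count are assumptions made for illustration; substitute whatever your own submit file contains.

```
# Test-only settings: keep jobs off CHTC so they must run on external pools.
# Assumption: the submit file previously had
#   requirements = (Arch == "X86_64")
# so the Poolname requirement is appended with &&.
requirements = (Arch == "X86_64") && (Poolname =!= "CHTC")
want_campus_pools = true
want_ospool       = true

# Start with a small batch of test jobs.
queue 10
```

When scaling up, the requirements line in this sketch would go back to its original form, while the two want_ lines stay in place.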
Things to consider
Containers
Containers are the best way to ensure a consistent software environment for your jobs when running inside and outside CHTC. We generally recommend using Apptainer, since the OSPool provides the best support for Apptainer containers.
- Already using Apptainer? Great! No changes needed.
- Already using Docker? Convert your container to the Apptainer format and use staging and an osdf:/// URL to send it to your jobs. See this guide for details.
- Not using containers at all? Check out our docs or talk to CHTC facilitators about the best approach for your code.
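As a rough sketch of the Docker case, the submit file line below points a job at an Apptainer image stored in staging. The image name and the osdf:/// path are placeholders, not confirmed locations, and some HTCondor versions may also need universe = container; check the container guide for the exact form used at CHTC.

```
# Assumption: the image was built beforehand, for example with
#   apptainer build my-container.sif docker://<user>/<repository>:<tag>
# and then copied to your staging directory.
# The path below is a placeholder for your own staging location.
container_image = osdf:///chtc/staging/<username>/my-container.sif
```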
Data
You can use capacity outside of CHTC if you are transferring your data to jobs in either of these ways:
- from /home, using normal HTCondor file transfer
- from an individual /staging directory, using an osdf:/// URL

If you are using a different method to access your files, contact the facilitation team about how you might run your work outside CHTC.
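The two supported transfer methods can be combined in one submit file, as in the sketch below. All file names and the osdf:/// paths are hypothetical placeholders; use the paths that apply to your own /home and /staging directories.

```
# Hypothetical file names; the osdf:/// paths are placeholders.
# params.txt is a small file in the submit directory under /home.
# large_input.tar.gz is a larger file in an individual /staging directory.
transfer_input_files = params.txt, osdf:///chtc/staging/<username>/large_input.tar.gz

# Optionally send a large result back to staging instead of /home.
transfer_output_files  = results.tar.gz
transfer_output_remaps = "results.tar.gz = osdf:///chtc/staging/<username>/results.tar.gz"
```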