| ||Meant For||Input/Software File Sizes||Output File Sizes||Available for Jobs Running ...||File Security||Special Considerations|
|HTCondor file transfer||basic file delivery and return; see size limits at right||0 - 100 MB per file; <500 MB total per job||0 - 4 GB total||in CHTC, UW Grid, and OSG||available to your jobs, on CHTC and beyond||DO NOT USE for files in /mnt/gluster OR /squid|
|SQUID Web Proxy||large input or software shared by many jobs||100 MB - 1 GB per shared file||N/A||in CHTC, UW Grid, and OSG||files will be world-readable||large files unique to individual jobs are better for Gluster|
|Gluster File Share||largest software, input, and output||100 MB - TBs per unique file; 1GB - TBs per shared file||4 GB - TBs||in only a portion of CHTC||accessible ONLY to your jobs on specific CHTC servers||special submit "Requirements"|
SQUID Web Proxy
CHTC maintains a SQUID web proxy from which pre-staged input files and executables can be downloaded into jobs using CHTC's proxy HTTP address.
- Intended Use:
- The SQUID web proxy is best for cases where many jobs will use
the same large file (or few files), including large software. It
is not good for cases when each of many jobs needs a different large input file,
in which case "Gluster"
should be used. You are always better off pre-splitting a
large input file into smaller job-specific files if each job needs only some of
the large file's data. If each job needs a large set of many files, you should create a
.tar.gz file containing all the files; this file must still be less than 1 GB.
- Access to SQUID:
- is granted upon request to email@example.com. A user on CHTC submit
servers will be granted a user directory within
/squid, into which users should transfer data via the CHTC transfer server (transfer00.chtc.wisc.edu). As for all CHTC file space, users should minimize the amount of data on the SQUID web proxy and should clean files from the
/squid location regularly. CHTC staff reserve the right to remove any file from
/squid when needed to preserve availability and performance for all users.
- Files placed on the SQUID web proxy can be downloaded by jobs running anywhere, because the files are world-readable.
- Limitations and Policies:
- SQUID cannot be used for job output, as there is no way to change files in SQUID from within a job.
- SQUID is also only capable of delivering individual files up to 1 GB in size.
- A change you make to a file within your
/squid directory may not take effect immediately on the SQUID web proxy if you use the same filename. Therefore, it is important to use a new filename when replacing a file in your
/squid directory.
- Jobs should still ALWAYS and ONLY be submitted from within the
- Only the "http" address should be listed in the
"transfer_input_files" line of the submit file. File locations starting with
"/squid" should NEVER be listed in the submit file.
- Users should only have data in /squid that is being used for currently queued jobs; CHTC provides no backups of any data in CHTC systems, and our staff reserve the right to remove any data causing issues. It is the responsibility of users to keep copies of all essential data in preparation for potential data loss or file system corruption.
- Data Security:
- Files placed in SQUID can only be edited by the owner
of the user directory within
/squid, but will end up being world-readable on the SQUID web proxy in order to be readily downloadable by jobs (with the proper HTTP address); thus, large files that should be "private" should not be placed in your user directory in
/squid, and should instead use CHTC's Gluster location for large-file staging.
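The size limit and the submit-file rule above can be checked before staging. The following is a minimal sketch, not a CHTC-provided tool: the function names are hypothetical, and it assumes "1 GB" means 1024^3 bytes, which the policy above does not specify. It bundles a directory of inputs into one .tar.gz, confirms the result is under the per-file limit, and converts a /squid staging path into the "http" address that belongs in transfer_input_files.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not CHTC-provided tools.

# Assumption: "1 GB" is taken as 1024^3 bytes; the policy does not say which definition applies.
SQUID_MAX_BYTES=$((1024 * 1024 * 1024))

# Bundle a directory of inputs into a single .tar.gz.
# $1 = directory to bundle, $2 = output tarball path
make_bundle() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Succeed only if the file exists and is smaller than the SQUID per-file limit.
squid_size_ok() {
    [ -f "$1" ] || return 2
    size=$(( $(wc -c < "$1") ))   # byte count; arithmetic expansion strips padding
    [ "$size" -lt "$SQUID_MAX_BYTES" ]
}

# Translate a staging path under /squid into the address for transfer_input_files.
squid_path_to_url() {
    case "$1" in
        /squid/*) printf 'http://proxy.chtc.wisc.edu/SQUID/%s\n' "${1#/squid/}" ;;
        *) echo "not a /squid path: $1" >&2; return 1 ;;
    esac
}
```

For example, `squid_path_to_url /squid/username/lg_file.txt` prints `http://proxy.chtc.wisc.edu/SQUID/username/lg_file.txt`, while a path outside /squid is rejected with a non-zero exit status.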
Using SQUID to Deliver Input Files
- Request a directory in SQUID. Write to firstname.lastname@example.org describing the data you'd like to place in SQUID, and indicating your username and submit server hostname (e.g., submit-5.chtc.wisc.edu).
- Place files within your
/squid/username directory via a CHTC transfer server (if from your laptop/desktop) or on the submit server. From your laptop/desktop:
[username@computer]$ scp large_file.tar.gz email@example.com:/squid/username/
If the file already exists within your /home directory on a submit server:
[username@submit]$ cp large_file.tar.gz /squid/username/
Check the file from the submit server:
[username@submit]$ ls /squid/username/
- Have HTCondor download the file to the working job by using the
http://proxy.chtc.wisc.edu/SQUID address in the transfer_input_files line of your submit file:
transfer_input_files = other_file1,other_file2,http://proxy.chtc.wisc.edu/SQUID/username/lg_file.txt
Important: Make sure to replace "username" with your username in the above address. All other files should be staged before job submission.
If your large file is a
.tar.gz file that untars to include other files, remember to remove such files before the end of the job; otherwise, HTCondor will think that such files are new output that needs to be transferred back to the submit server. (HTCondor will not automatically transfer back directories.)
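One way to handle the cleanup described above is to record what the tarball contains when unpacking it, then remove those entries at the end of the job. This is a rough sketch with hypothetical function names, assuming the tarball holds only relative paths and is unpacked into the job's working directory; a real job script would run its own program between the two calls.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical. Assumes the tarball
# contains relative paths and is unpacked in the job's working directory.

# Unpack a tarball and save the list of its members to a manifest file.
# $1 = tarball, $2 = manifest file to write
unpack_and_track() {
    tar -tzf "$1" > "$2" && tar -xzf "$1"
}

# Remove everything the tarball created, then the tarball and manifest,
# so that only genuine job output remains in the working directory.
# $1 = tarball, $2 = manifest written by unpack_and_track
cleanup_unpacked() {
    # Reduce each member path to its top-level entry and delete each one once.
    sed 's|^\./||' "$2" | cut -d/ -f1 | sort -u | while IFS= read -r entry; do
        if [ -n "$entry" ]; then
            rm -rf -- "./$entry"
        fi
    done
    rm -f -- "$1" "$2"
}
```

In a job script this would look like `unpack_and_track lg_file.tar.gz manifest.txt`, followed by the actual work, followed by `cleanup_unpacked lg_file.tar.gz manifest.txt`. Files the job writes under other names are left in place for HTCondor to transfer back.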
For all user support, questions, and comments: firstname.lastname@example.org