File Availability with Squid Web Proxy

Which Option is the Best for Your Files?


| Option | Meant For | Input/Software File Sizes | Output File Sizes | Available for Jobs Running ... | File Security | Special Considerations |
|---|---|---|---|---|---|---|
| HTCondor file transfer | basic file delivery and return; see size limits at right | 0 - 100 MB per file; <500 MB total per job | 0 - 4 GB total | in CHTC, UW Grid, and OSG | available to your jobs, on CHTC and beyond | DO NOT USE for files in /mnt/gluster OR /squid |
| SQUID Web Proxy | large input or software shared by many jobs | 100 MB - 1 GB per shared file | N/A | in CHTC, UW Grid, and OSG | files will be world-readable | large files unique to individual jobs are better for Gluster |
| Gluster File Share | largest software, input, and output | 100 MB - TBs per unique file; 1 GB - TBs per shared file | 4 GB - TBs | in only a portion of CHTC | accessible ONLY to your jobs on specific CHTC servers | special submit "Requirements" |

SQUID Web Proxy

CHTC maintains a SQUID web proxy from which pre-staged input files and executables can be downloaded into jobs using CHTC's proxy HTTP address.

Applicability

Intended Use:
The SQUID web proxy is best for cases where many jobs will use the same large file (or a few files), including large software. It is not a good fit when each of many jobs needs a different large input file; in that case, Gluster should be used. Remember that you're always better off pre-splitting a large input file into smaller job-specific files if each job only needs some of the large file's data. If each job needs a large set of many files, you should create a .tar.gz file containing all the files; this file will still need to be less than 4 GB.
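As a rough illustration of the bundling advice above, the following sketch packs a directory of input files into a single .tar.gz and checks its size against the 4 GB limit. The function name bundle_inputs is hypothetical (it is not a CHTC tool), and the limit is read here as 4 GiB, which is an assumption.

```shell
#!/bin/sh
# Sketch only: bundle_inputs is a hypothetical helper, not a CHTC tool.
# The 4 GB limit is interpreted as 4 GiB here (assumption).
SQUID_MAX_BYTES=$((4 * 1024 * 1024 * 1024))

bundle_inputs() {
    # $1 = directory of input files, $2 = name of the tarball to create
    src_dir=$1
    tarball=$2
    # Archive the directory itself so it unpacks into one folder.
    tar -czf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Byte count of the finished tarball.
    size=$(wc -c < "$tarball" | tr -d ' ')
    if [ "$size" -ge "$SQUID_MAX_BYTES" ]; then
        echo "$tarball is $size bytes; too large for SQUID" >&2
        return 1
    fi
    echo "$tarball is $size bytes; within the SQUID limit"
}
```

For example, running bundle_inputs my_inputs my_inputs.tar.gz from within /home would create the tarball and report whether it is small enough to stage.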
Access to SQUID:
Access is granted upon request to chtc@cs.wisc.edu. A user on CHTC submit servers will be granted a user directory within /squid, which is only accessible via the CHTC submit server. As for all CHTC file space, users should minimize the amount of data on the SQUID web proxy, and should clean files from the /squid location regularly. CHTC staff reserve the right to remove any file from /squid when needed to preserve availability and performance for all users.
Advantages:
Files placed on the SQUID web proxy can be downloaded by jobs running anywhere, because the files are world-readable.
Limitations and Policies:
  • SQUID cannot be used for job output, as there is no way to change files in SQUID from within a job.
  • SQUID can only deliver individual files up to 4 GB in size.
  • A change you make to a file within your /squid directory may not take effect immediately on the SQUID web proxy if you use the same filename. Therefore, it is important to use a new filename when replacing a file in your /squid directory.
  • Jobs should still ALWAYS and ONLY be submitted from within the user's /home location.
  • Only the "http" address should be listed in the "transfer_input_files" line of the submit file. File locations starting with "/squid" should NEVER be listed in the submit file.
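The last few policies can be sketched with two small shell helpers: one that derives a new filename when you replace a file, and one that turns a /squid path into the "http" address for the submit file. Both function names are hypothetical (not CHTC tools), and the path-to-address mapping is inferred from the example address used in this guide.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not CHTC tools.

versioned_name() {
    # $1 = file name or path, $2 = version tag (e.g. v2 or a date)
    name=$(basename "$1")
    stem=${name%%.*}       # text before the first dot
    ext=${name#"$stem"}    # everything from the first dot onward
    printf '%s_%s%s\n' "$stem" "$2" "$ext"
}

squid_url() {
    # $1 = path under /squid; prints the proxy address for the same file.
    # The /squid -> /SQUID mapping is assumed from this guide's example.
    case $1 in
        /squid/*) printf 'http://proxy.chtc.wisc.edu/SQUID/%s\n' "${1#/squid/}" ;;
        *) echo "not a /squid path: $1" >&2; return 1 ;;
    esac
}
```

With these, versioned_name large_file.tar.gz v2 prints large_file_v2.tar.gz, and squid_url /squid/username/large_file_v2.tar.gz prints the address to list in transfer_input_files.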
Data Security:
Files placed in SQUID can only be edited by the owner of the user directory within /squid, but they will be world-readable on the SQUID web proxy so that jobs can readily download them (with the proper HTTP address). Large files that should be "private" should therefore not be placed in your user directory in /squid; use CHTC's Gluster location for large-file staging instead.

Using SQUID to Deliver Input Files

  1. Request a directory in SQUID. Write to chtc@cs.wisc.edu describing the data you'd like to place in SQUID, and indicating your username and submit server hostname (e.g., submit-5.chtc.wisc.edu).
  2. Place files within your /squid/username directory on your submit server. For example, from a location within your /home directory, you could type something like the following:
    [username@submit]$ cp large_file.tar.gz /squid/username/
    Check the file:
    [username@submit]$ ls /squid/username/
  3. Have HTCondor download the file to the working job using the http://proxy.chtc.wisc.edu/SQUID address in the transfer_input_files line of your submit file:
    transfer_input_files = other_file1,other_file2,http://proxy.chtc.wisc.edu/SQUID/username/large_file.tar.gz
    Important: Make sure to replace "username" with your username in the above address. All other files should be staged before job submission.

    If your large file is a .tar.gz file that untars to include other files, remember to remove such files before the end of the job; otherwise, HTCondor will think that such files are new output that needs to be transferred back to the submit server. (HTCondor will not automatically transfer back directories.)
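The cleanup described above could be handled in the job's wrapper script. The following is a minimal sketch, assuming the tarball is unpacked in the job's working directory; run_with_tarball is a hypothetical helper name, and removing the tarball itself is an added precaution rather than something this guide requires.

```shell
#!/bin/sh
# Sketch only: run_with_tarball is a hypothetical helper, not a CHTC tool.

run_with_tarball() {
    # $1 = tarball in the working directory; remaining args = command to run
    tarball=$1
    shift
    tar -xzf "$tarball" || return 1

    # Run the real work and remember its exit status.
    status=0
    "$@" || status=$?

    # Remove every top-level entry that came out of the tarball, so that
    # HTCondor does not see the unpacked files as new output.
    tar -tzf "$tarball" | sed 's|^\./||' | cut -d/ -f1 | sort -u |
    while IFS= read -r entry; do
        case $entry in
            ''|.|..) continue ;;
        esac
        rm -rf -- "$entry"
    done

    # Precaution (assumption): also remove the downloaded tarball.
    rm -f -- "$tarball"
    return "$status"
}
```

A wrapper might call run_with_tarball large_file.tar.gz ./my_program. Note that any output your program writes under the same name as a top-level entry in the tarball would be removed too, so write results to distinct filenames.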