Use and transfer data in jobs on the HTC system

Which Option is the Best for Your Files?


| Input Sizes | Output Sizes | File Location | Syntax for transfer_input_files | Availability, Security |
| --- | --- | --- | --- | --- |
| 0 - 100 MB per file, up to 500 MB per job | 0 - 5 GB per job | /home | No special syntax | CHTC and external pools |
| 100 MB - TBs per job-specific file; repeatedly-used files > 1GB | 4 GB - TBs per job | /staging | osdf:/// | CHTC and external pools |
| 100 MB - TBs per job-specific file | 4 GB - TBs per job | /staging/groups | file:/// | CHTC only |

Data Storage Locations

The HTC system has two primary locations where users can place their files:

/home

  • The default location for files and job submission
  • Efficiently handles many files
  • Smaller input files (<100 MB) should be placed here

/staging

  • Expandable storage system but cannot efficiently handle many small (few MB or less) files
  • Larger input files (>100 MB) should be placed here, including container images (.sif)

The data management mechanisms behind /home and /staging are different and are optimized to handle different file sizes and numbers of files. It’s important to place your files in the correct location, as doing so improves the speed and efficiency with which your data is handled and helps maintain the stability of the HTC filesystem.

If you need a /staging directory, request one here.

Transferring Data to Jobs with transfer_input_files

In the HTCondor submit file, always use transfer_input_files to tell HTCondor which files to transfer to each job, regardless of whether a file originates from your /home or /staging directory. However, the syntax that tells HTCondor where to fetch a file depends on the file's size and location.

| Input Sizes | File Location | Submit File Syntax to Transfer to Jobs |
| --- | --- | --- |
| 0 - 100 MB | /home | transfer_input_files = input.txt |
| 100 MB - 30 GB | /staging | transfer_input_files = osdf:///chtc/staging/NetID/input.txt |
| 100 MB - 100 GB | /staging/groups | transfer_input_files = file:///staging/groups/group_dir/input.txt |
| > 30 GB | /staging | transfer_input_files = file:///staging/NetID/input.txt |
| > 100 GB | | For larger datasets (100GB+ per job), contact the facilitation team about the best strategy to stage your data |
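The size thresholds in the table above can be sketched as a small Python helper that returns the matching transfer_input_files entry. This is only an illustration: the function name `transfer_entry`, its parameters, the use of decimal units (1 MB = 10^6 bytes), and the handling of sizes that fall exactly on a boundary are assumptions, not part of HTCondor or any CHTC tool.

```python
# Hypothetical helper; units and boundary handling are assumptions.
MB = 10**6
GB = 10**9

def transfer_entry(size_bytes, name, netid="NetID", group_dir=None):
    """Return a transfer_input_files entry that follows the size table."""
    if size_bytes > 100 * GB:
        # The guide asks users to contact the facilitation team in this case.
        raise ValueError("over 100 GB per job: contact the facilitation team")
    if size_bytes <= 100 * MB:
        # Small files live in /home and need no special syntax.
        return name
    if group_dir is not None:
        # Files in a group staging directory use file:///.
        return f"file:///staging/groups/{group_dir}/{name}"
    if size_bytes <= 30 * GB:
        # Files in a personal /staging directory up to 30 GB use osdf:///.
        return f"osdf:///chtc/staging/{netid}/{name}"
    # Files over 30 GB in a personal /staging directory use file:///.
    return f"file:///staging/{netid}/{name}"

print(transfer_entry(5 * MB, "input.txt"))
print(transfer_entry(2 * GB, "input.txt"))
print(transfer_entry(50 * GB, "input.txt"))
```

The helper only formats the entry; it does not check that the file actually exists in the location the entry points to.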

Multiple input files and file transfer protocols can be specified and delimited by commas, as shown below:

# My job submit file

transfer_input_files = file1, osdf:///chtc/staging/username/file2, file:///staging/username/file3, dir1, dir2/

... other submit file details ...

Ensure you are using the correct file transfer protocol for efficiency. Failure to use the right protocol can result in slow file transfers or overloading the system.

Important Note: File Transfers and Caching with osdf:///

The osdf:/// file transfer protocol uses a caching mechanism for input files to reduce file transfers over the network. This can affect users who refer to input files that are frequently modified.

If you are changing the contents of the input files frequently, you should rename the file or change its path to ensure the new version is transferred.
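One way to make sure a modified input file always gets a new name is to embed a short hash of its contents in the filename. The sketch below shows the idea; the function name `versioned_name` and the naming scheme are hypothetical, not a CHTC convention.

```python
import hashlib
from pathlib import PurePosixPath

def versioned_name(filename, contents, digest_len=8):
    """Insert a short content hash before the file extension.

    Hypothetical naming scheme: new contents give a new name, so a
    cached copy of an older version is never reused by mistake.
    """
    digest = hashlib.sha256(contents).hexdigest()[:digest_len]
    p = PurePosixPath(filename)
    return f"{p.stem}.{digest}{p.suffix}"

# Two versions of the same input file get two different names.
print(versioned_name("input.txt", b"first version"))
print(versioned_name("input.txt", b"second version"))
```

The renamed file would then be copied to /staging under its new name and referenced by that name in transfer_input_files.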

Transferring Data Back from Jobs to /home or /staging

Default Behavior for Transferring Output Files

When a job completes, HTCondor by default returns only the newly created or edited files in the job's top-level directory to your /home directory. Files in subdirectories are not transferred. To get those files back, move them to the top-level directory or bundle them into a tarball before the job ends.
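If your job is driven by a Python script, bundling a subdirectory into a top-level tarball could look like the minimal sketch below. The function name `bundle_subdirectory` and the `results` directory are hypothetical; the code uses only Python's standard tarfile module.

```python
import tarfile
from pathlib import Path

def bundle_subdirectory(subdir, archive_name=None, workdir="."):
    """Pack a subdirectory into a .tar.gz placed in the top-level directory."""
    workdir = Path(workdir)
    archive_name = archive_name or f"{Path(subdir).name}.tar.gz"
    archive_path = workdir / archive_name
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the subdirectory.
        tar.add(workdir / subdir, arcname=Path(subdir).name)
    return archive_path

# Example call at the end of a job (assumes a "results" subdirectory exists):
# bundle_subdirectory("results")   # writes results.tar.gz at the top level
```

Because the tarball is a new file in the top-level directory, it falls under the default output transfer described above.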

Specify Which Output Files to Transfer with transfer_output_files and transfer_output_remaps

To transfer only specific files rather than all of them, add the following to your HTCondor submit file:

transfer_output_files = file1.txt, file2.txt, file3.txt

To transfer a file or folder back to /staging, you will need an additional line in your HTCondor submit file:

transfer_output_remaps = "file1.txt = file:///staging/NetID/output1.txt; file2.txt = /home/NetID/outputs/output2.txt"

In the example above, file1.txt is remapped to the staging directory using the file:/// transfer protocol and simultaneously renamed to output1.txt. In addition, file2.txt is renamed to output2.txt and transferred to a different directory on /home. Ensure you use the right file transfer syntax (osdf:/// or file:///, depending on the anticipated file size).

If you have multiple files or folders to transfer back to /staging, use a semicolon (;) to separate each object:

transfer_output_remaps = "output1.txt = file:///staging/NetID/output1.txt; output2.txt = file:///staging/NetID/output2.txt"

Make sure to wrap the entire value of transfer_output_remaps in a single pair of quotation marks.
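When many output files need remapping, the line can be generated rather than typed by hand. The following is a minimal sketch, assuming a hypothetical helper named `output_remaps_line`; it joins the pairs with semicolons and adds exactly one set of quotation marks around the whole value.

```python
def output_remaps_line(remaps):
    """Build a transfer_output_remaps line from a {source: destination} dict."""
    for src, dest in remaps.items():
        # Quotes or semicolons inside a name would break the single quoted list.
        if any(ch in src + dest for ch in '";'):
            raise ValueError(f"unsupported character in remap: {src} = {dest}")
    pairs = "; ".join(f"{src} = {dest}" for src, dest in remaps.items())
    return f'transfer_output_remaps = "{pairs}"'

line = output_remaps_line({
    "output1.txt": "file:///staging/NetID/output1.txt",
    "output2.txt": "file:///staging/NetID/output2.txt",
})
print(line)
```

The printed line matches the two-file example above and can be pasted into a submit file.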
